infra/docs
Viktor Barzin 4c8d12229f mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP
Two coordinated fixes for the same root cause: Postfix's smtpd_upstream_proxy_protocol
listener fatals on every HAProxy health probe with `smtpd_peer_hostaddr_to_sockaddr:
... Servname not supported for ai_socktype` — the daemon respawns get throttled by
postfix master, and real client connections that land mid-respawn time out. We saw
this as ~50% timeout rate on public 587 from inside the cluster.

Layer 1 (book-search) — stacks/ebooks/main.tf:
  SMTP_HOST mail.viktorbarzin.me → mailserver.mailserver.svc.cluster.local
  Internal services should use ClusterIP, not hairpin through pfSense+HAProxy.
  12/12 OK in <28ms vs ~6/12 timeouts on the public path.

Layer 2 (pfSense HAProxy) — stacks/mailserver + scripts/pfsense-haproxy-bootstrap.php:
  Add 3 non-PROXY healthcheck NodePorts to mailserver-proxy svc:
    30145 → pod 25  (stock postscreen)
    30146 → pod 465 (stock smtps)
    30147 → pod 587 (stock submission)
  HAProxy uses `port <healthcheck-nodeport>` (per-server in advanced field) to
  redirect L4 health probes to those ports while real client traffic keeps
  going to 30125-30128 with PROXY v2.
  Result: 0 fatals/min (was 96), 30/30 probes OK on 587, e2e roundtrip 20.4s.
  Inter dropped 120000 → 5000 since log-spam concern is gone.

`option smtpchk EHLO` was tried first but flapped against postscreen (multi-line
greet + DNSBL silence + anti-pre-greet detection trip HAProxy's parser → L7RSP).
Plain TCP accept-on-port check is sufficient for both submission and postscreen.

Updated docs/runbooks/mailserver-pfsense-haproxy.md to reflect the new healthcheck
path and mark the "Known warts" entry as resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 19:45:33 +00:00
..
architecture phpipam-pfsense-import: every 5min → hourly 2026-04-26 22:48:43 +00:00
plans vault: record Phase 3 vault Released-PV cleanup 2026-04-25 23:08:45 +00:00
post-mortems vault: complete Phase 2 NFS-hostile migration; remove nfs-proxmox SC 2026-04-25 17:10:00 +00:00
runbooks mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP 2026-05-05 19:45:33 +00:00
README.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document Description
Overview Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking Network topology, VLANs, routing, and firewall rules
VPN Headscale mesh VPN and Cloudflare Tunnel configuration
Storage Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management
Authentication Authentik SSO, OIDC flows, and service integration
Security CrowdSec IPS, Kyverno policies, and security controls
Monitoring Prometheus, Grafana, Loki, and observability stack
Secrets Management HashiCorp Vault integration and secret rotation
CI/CD Woodpecker CI pipeline and deployment automation
Backup & DR Backup strategy, disaster recovery, and restore procedures
Compute Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest