Viktor Barzin 4c8d12229f mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP

Two coordinated fixes for the same root cause: Postfix's smtpd_upstream_proxy_protocol
listener fatals on every HAProxy health probe with `smtpd_peer_hostaddr_to_sockaddr:
... Servname not supported for ai_socktype` — the daemon respawns get throttled by
postfix master, and real client connections that land mid-respawn time out. We saw
this as ~50% timeout rate on public 587 from inside the cluster.

Layer 1 (book-search) — stacks/ebooks/main.tf:
  SMTP_HOST mail.viktorbarzin.me → mailserver.mailserver.svc.cluster.local
  Internal services should use ClusterIP, not hairpin through pfSense+HAProxy.
  12/12 OK in <28ms vs ~6/12 timeouts on the public path.

Layer 2 (pfSense HAProxy) — stacks/mailserver + scripts/pfsense-haproxy-bootstrap.php:
  Add 3 non-PROXY healthcheck NodePorts to mailserver-proxy svc:
    30145 → pod 25  (stock postscreen)
    30146 → pod 465 (stock smtps)
    30147 → pod 587 (stock submission)
  HAProxy uses `port <healthcheck-nodeport>` (per-server in advanced field) to
  redirect L4 health probes to those ports while real client traffic keeps
  going to 30125-30128 with PROXY v2.
  Result: 0 fatals/min (was 96), 30/30 probes OK on 587, e2e roundtrip 20.4s.
  Inter dropped 120000 → 5000 since log-spam concern is gone.

`option smtpchk EHLO` was tried first but flapped against postscreen (multi-line
greet + DNSBL silence + anti-pre-greet detection trip HAProxy's parser → L7RSP).
Plain TCP accept-on-port check is sufficient for both submission and postscreen.

Updated docs/runbooks/mailserver-pfsense-haproxy.md to reflect the new healthcheck
path and mark the "Known warts" entry as resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-05 19:45:33 +00:00


pfSense HAProxy for Mailserver — Runbook

Last updated: 2026-05-05 (non-PROXY healthcheck path added; Phase 6 complete)

What & why

External mail traffic (SMTP/IMAP) requires real client IP visibility for CrowdSec + Postfix rate-limiting. MetalLB cannot inject PROXY-protocol headers (see mailserver-proxy-protocol.md), so pfSense runs a small HAProxy that:

- Listens on the pfSense VLAN20 IP (10.0.20.1) on all 4 mail ports,
- Forwards each connection to a k8s node's NodePort with send-proxy-v2,
- Injects PROXY v2 framing so Postfix/Dovecot see the original client IP,
- TCP-checks every k8s worker via dedicated non-PROXY healthcheck NodePorts (30145/30146/30147 → pod stock 25/465/587 listeners, no PROXY required). This split path avoids the smtpd_peer_hostaddr_to_sockaddr fatal that used to fire on every PROXY-aware health probe and throttled real client connections.
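
The split between data port and health-check port comes down to one backend `server` line per node. The sketch below only prints such a line so the shape is visible; `render_server_line` is a hypothetical helper, `10.0.20.101` is a placeholder node address, and the real config is generated by pfSense from its XML rather than by this function.

```shell
# Print one HAProxy backend "server" line that sends client traffic with
# PROXY v2 to the data NodePort but health-checks a separate NodePort.
# render_server_line is a hypothetical helper, not part of the bootstrap script.
render_server_line() {
    name=$1 ip=$2 data_port=$3 check_port=$4 inter_ms=$5
    printf 'server %s %s:%s send-proxy-v2 check port %s inter %s\n' \
        "$name" "$ip" "$data_port" "$check_port" "$inter_ms"
}

# Submission (587): data on 30127, health probe on 30147, probe every 5000 ms.
# 10.0.20.101 is a placeholder node address.
render_server_line k8s-node1 10.0.20.101 30127 30147 5000
```

As far as the HAProxy documentation for `check-send-proxy` describes it, health checks stop inheriting the PROXY header once a `port` (or `addr`) override is present on the server line, which would explain why the probe can land on the stock listener without any extra option. Treat that as a reading of the manual to verify against the installed HAProxy version.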

Corresponding k8s-side setup (stacks/mailserver/modules/mailserver/):

- ConfigMap mailserver-user-patches → user-patches.sh appends 3 alt master.cf services to Postfix:
  - :2525 postscreen (alt :25) with postscreen_upstream_proxy_protocol=haproxy
  - :4465 smtpd (alt :465 SMTPS) with smtpd_upstream_proxy_protocol=haproxy
  - :5587 smtpd (alt :587 submission) with smtpd_upstream_proxy_protocol=haproxy
- ConfigMap mailserver.config adds Dovecot inet_listener imaps_proxy on port 10993 with haproxy = yes and haproxy_trusted_networks = 10.0.20.0/24.
- Service mailserver-proxy (NodePort, ETP:Cluster) exposes 4 PROXY data ports + 3 non-PROXY healthcheck ports:
  - Data (PROXY v2):
    - port 25 → targetPort 2525 → nodePort 30125
    - port 465 → targetPort 4465 → nodePort 30126
    - port 587 → targetPort 5587 → nodePort 30127
    - port 993 → targetPort 10993 → nodePort 30128
  - Healthcheck (no PROXY, stock SMTP/SMTPS/Submission listeners):
    - port 2500 → targetPort 25 → nodePort 30145 (smtp-check)
    - port 4650 → targetPort 465 → nodePort 30146 (smtps-check)
    - port 5870 → targetPort 587 → nodePort 30147 (sub-check)
- Service mailserver (ClusterIP): unchanged stock ports 25/465/587/993 for intra-cluster clients (Roundcube pod, email-roundtrip-monitor CronJob, book-search). These listeners are PROXY-free.
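
With seven NodePorts in play it is easy to pair a pool with the wrong check port. A minimal sketch that keeps the mailserver-proxy port map above as data and answers "which healthcheck NodePort belongs to this public port?"; the function names are hypothetical, and the range check assumes the Kubernetes default NodePort range of 30000-32767.

```shell
# Service mailserver-proxy port map, copied from the list above.
# Columns: kind  servicePort  targetPort  nodePort
port_map() {
    cat <<'EOF'
data   25   2525  30125
data   465  4465  30126
data   587  5587  30127
data   993  10993 30128
check  2500 25    30145
check  4650 465   30146
check  5870 587   30147
EOF
}

# Print the healthcheck NodePort whose stock listener matches a public port.
# Prints nothing for 993: the list above defines no IMAPS healthcheck port.
check_nodeport_for() {
    port_map | awk -v p="$1" '$1 == "check" && $3 == p { print $4 }'
}

# Fail if any nodePort is duplicated or outside the default 30000-32767 range.
validate_port_map() {
    port_map | awk '
        $4 < 30000 || $4 > 32767 { bad = 1 }
        seen[$4]++               { bad = 1 }
        END { exit bad + 0 }'
}

check_nodeport_for 587        # prints 30147
validate_port_map && echo "port map OK"
```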

bd: code-yiu.

Steady-state architecture

External mail (WAN) path — PROXY v2
┌─────────────────────────────────────────────────────────────────────┐
│  Client (real IP)                                                   │
│      │  SMTP/SMTPS/Sub/IMAPS                                        │
│      ▼                                                              │
│  pfSense WAN:{25,465,587,993}                                       │
│      │  NAT rdr → 10.0.20.1:{same}                                  │
│      ▼                                                              │
│  pfSense HAProxy  (mode tcp, 4 frontends, 4 backend pools)          │
│      │  data: send-proxy-v2 → :{30125..30128}  (PROXY-aware pod)   │
│      │  health: TCP-check    → :{30145..30147}  (no-PROXY pod)     │
│      │  inter 5000                                                  │
│      ▼                                                              │
│  k8s-node<1-4>:{30125..30128}   ← any node (ETP:Cluster)            │
│      │  kube-proxy SNAT (source IP lost on the wire)                │
│      ▼                                                              │
│  mailserver pod :{2525,4465,5587,10993}                             │
│      │  postscreen / smtpd / Dovecot parse PROXY v2 header          │
│      │  → real client IP recovered despite kube-proxy SNAT          │
│      ▼                                                              │
│  CrowdSec + Postfix / Dovecot see the true source IP ✓              │
└─────────────────────────────────────────────────────────────────────┘

Intra-cluster path — no PROXY
┌─────────────────────────────────────────────────────────────────────┐
│  Roundcube pod / email-roundtrip-monitor CronJob                    │
│      │  SMTP/IMAP                                                   │
│      ▼                                                              │
│  mailserver.mailserver.svc.cluster.local:{25,465,587,993}           │
│      │  ClusterIP — bypasses LoadBalancer/NodePort layer entirely   │
│      ▼                                                              │
│  mailserver pod stock :{25,465,587,993}  (PROXY-free)               │
└─────────────────────────────────────────────────────────────────────┘

Validation

# All HAProxy frontends listening
ssh admin@10.0.20.1 'sockstat -l | grep haproxy'
# Expect: *:25, *:465, *:587, *:993, *:2525 (test port)

# All backend pools healthy
ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio" \
  | awk 'NR>1 {print $3, $4, $6}'
# srv_op_state 2 = UP, 0 = DOWN

# Container listens on all 8 ports
kubectl exec -n mailserver -c docker-mailserver deployment/mailserver -- \
  ss -ltn | grep -E ':(25|2525|465|4465|587|5587|993|10993)\b'

# pf rdr points at pfSense (10.0.20.1), not <mailserver> alias
ssh admin@10.0.20.1 'pfctl -sn' | grep -E 'port = (25|submission|imaps|smtps)'

# E2E probe — Brevo → external MX :25 → IMAP fetch
kubectl create job --from=cronjob/email-roundtrip-monitor probe-test -n mailserver
kubectl wait --for=condition=complete --timeout=90s job/probe-test -n mailserver
kubectl logs job/probe-test -n mailserver | grep SUCCESS
kubectl delete job probe-test -n mailserver

# Real client IP in maillog post-delivery
kubectl logs -c docker-mailserver deployment/mailserver -n mailserver \
  | grep 'smtpd-proxy25.*CONNECT from' | tail -5
# Expect external source IPs (e.g., Brevo 77.32.148.x), NOT 10.0.20.x
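
The last check above can be turned into a pass/fail filter. The sketch below assumes postscreen's `CONNECT from [ip]:port` line shape and flags any source inside 10.0.20.0/24; `internal_sources` is a hypothetical name, and the sample lines are made up and abbreviated (real log lines carry timestamps, hostnames and a different process tag).

```shell
# Print source IPs from postscreen-style "CONNECT from [ip]:port" lines that
# fall inside 10.0.20.0/24. Any output suggests PROXY v2 recovery is not
# working for that connection.
internal_sources() {
    sed -n 's/.*CONNECT from \[\([^]]*\)].*/\1/p' | grep '^10\.0\.20\.'
}

# Made-up, abbreviated log lines for illustration only.
sample_log() {
    cat <<'EOF'
smtpd-proxy25/postscreen[101]: CONNECT from [77.32.148.10]:41000 to [10.10.0.5]:2525
smtpd-proxy25/postscreen[102]: CONNECT from [10.0.20.1]:41002 to [10.10.0.5]:2525
EOF
}

sample_log | internal_sources   # prints 10.0.20.1
```

Against the live pod, the same filter could be fed from `kubectl logs -c docker-mailserver deployment/mailserver -n mailserver | grep 'smtpd-proxy25.*CONNECT from'`; empty output is the healthy case.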

Bootstrap / restore from scratch

pfSense HAProxy config lives in /cf/conf/config.xml under <installedpackages><haproxy>. That file is scp'd nightly to /mnt/backup/pfsense/config-YYYYMMDD.xml by scripts/daily-backup.sh, then synced to Synology. To rebuild from source of truth (git):

scp infra/scripts/pfsense-haproxy-bootstrap.php admin@10.0.20.1:/tmp/
ssh admin@10.0.20.1 'php /tmp/pfsense-haproxy-bootstrap.php'

The script is idempotent — re-runs reset the mailserver frontends + backends to the declared state.

Expected output:

haproxy_check_and_run rc=OK
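
When the bootstrap runs from automation, the expected output line above is the only success signal the runbook documents. A minimal sketch of a guard around it; `bootstrap_succeeded` is a hypothetical helper that reads the script's output on stdin.

```shell
# Succeed only if the bootstrap output (on stdin) contains the success marker
# documented above.
bootstrap_succeeded() {
    grep -q 'haproxy_check_and_run rc=OK'
}

# Demo with canned output instead of a live pfSense run.
printf 'haproxy_check_and_run rc=OK\n' | bootstrap_succeeded && echo "bootstrap OK"
```

A live run might look like `ssh admin@10.0.20.1 'php /tmp/pfsense-haproxy-bootstrap.php' | bootstrap_succeeded || echo "bootstrap failed" >&2`.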

Operations

Change backend k8s node IPs / NodePorts

Edit infra/scripts/pfsense-haproxy-bootstrap.php — $NODES array + the build_pool() port arguments. Re-run the bootstrap command above. Don't hand-edit /var/etc/haproxy/haproxy.cfg — it is regenerated from XML on every apply.

Check health of backends

ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio"

srv_op_state=2 means UP, 0 means DOWN.
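
Reading the raw state dump by eye gets tedious with 4 pools of 4 servers. The sketch below lists only the servers that are not UP. It relies on the leading columns of `show servers state` (be_id, be_name, srv_id, srv_name, srv_addr, srv_op_state); the backend names and addresses in the sample are made up, and `down_servers` is a hypothetical helper.

```shell
# Print "backend/server" for every server whose srv_op_state is not 2 (UP).
# Input: output of "show servers state". Column positions used here:
#   $2 = be_name, $4 = srv_name, $6 = srv_op_state
down_servers() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 2 { print $2 "/" $4 }'
}

# Abbreviated, made-up sample (real output has many more columns).
sample_state() {
    cat <<'EOF'
1
# be_id be_name srv_id srv_name srv_addr srv_op_state
3 mail_submission 1 k8s-node1 10.0.20.101 2
3 mail_submission 2 k8s-node2 10.0.20.102 0
EOF
}

sample_state | down_servers   # prints mail_submission/k8s-node2
```

To use it live, pipe the command above into the filter: `ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio" | down_servers`.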

View live HAProxy stats (WebUI)

https://pfsense.viktorbarzin.me → Services → HAProxy → Stats.

Reload after config.xml edit

ssh admin@10.0.20.1 'pfSsh.php playback svc restart haproxy'

Rollback (flip NAT back to MetalLB, post-Phase-6 only partial)

There is no Phase-6 rollback one-liner. Phase 6 removed the MetalLB LoadBalancer 10.0.20.202 entirely, so un-flipping NAT now would send traffic to a dead alias. To revert:

1. Re-add `metallb.io/loadBalancerIPs = "10.0.20.202"`, `type = "LoadBalancer"` and `external_traffic_policy = "Local"` to kubernetes_service.mailserver, then apply.
2. Re-add the mailserver host alias in pfSense pointing at 10.0.20.202 (Firewall → Aliases → Hosts).
3. Run infra/scripts/pfsense-nat-mailserver-haproxy-unflip.php on pfSense.

To roll back only the NAT flip (Phase 4) without touching the Service, the third step alone is enough, but that was only meaningful before Phase 6.

Restore from backup

pfSense config backup is a plain XML file:

/mnt/backup/pfsense/config-YYYYMMDD.xml        # sda host copy (1.1TB RAID1)
/volume1/Backup/Viki/pve-backup/pfsense/...    # Synology offsite

Full restore: pfSense WebUI → Diagnostics → Backup & Restore → Upload that config.xml. The <installedpackages><haproxy> section is included.
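
Before uploading a backup it helps to pick the newest file and confirm it still carries the HAProxy section. A minimal sketch, assuming the `config-YYYYMMDD.xml` naming shown above; both function names are hypothetical, and the demo uses a stub XML file in a temporary directory rather than a real backup.

```shell
# Print the newest config-YYYYMMDD.xml in a directory. The date is embedded in
# the name, so a plain lexical sort is also a chronological sort.
latest_pfsense_backup() {
    ls "$1"/config-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9].xml 2>/dev/null \
        | sort | tail -n 1
}

# Succeed if the backup file contains a <haproxy> package section.
has_haproxy_section() {
    grep -q '<haproxy>' "$1"
}

# Demo against stub files (not real pfSense configs).
demo_dir=$(mktemp -d)
printf '<installedpackages><haproxy></haproxy></installedpackages>\n' \
    > "$demo_dir/config-20260504.xml"
cp "$demo_dir/config-20260504.xml" "$demo_dir/config-20260505.xml"
latest=$(latest_pfsense_backup "$demo_dir")
has_haproxy_section "$latest" && echo "restore candidate: ${latest##*/}"
rm -rf "$demo_dir"
```

On the backup host the call would be `latest_pfsense_backup /mnt/backup/pfsense`.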

Phase history (bd code-yiu)

| Phase | Status | Description |
|-------|--------|-------------|
| 1a | ✅ commit `ef75c02f` | k8s alt :2525 listener + NodePort Service |
| 2 | ✅ 2026-04-19 | pfSense HAProxy pkg installed (`pfSense-pkg-haproxy-devel-0.63_2`, HAProxy 2.9-dev6) |
| 3 | ✅ commit `ba697b02` | HAProxy config persisted in pfSense XML (bootstrap script + this runbook) |
| 4+5 | ✅ commit `9806d515` | 4-port alt listeners + HAProxy frontends for 25/465/587/993 + NAT flip |
| 6 | ✅ this commit | Mailserver Service downgraded LoadBalancer → ClusterIP; `10.0.20.202` released back to MetalLB pool; orphan `mailserver` pfSense alias removed; monitors retargeted |

Known warts

- ~~HAProxy TCP health-check with send-proxy-v2 generates getpeername: Transport endpoint not connected warnings on postscreen every check cycle.~~ Resolved 2026-05-05: dedicated non-PROXY healthcheck NodePorts (30145/30146/30147 → stock pod 25/465/587) were added and HAProxy now checks those. This eliminates both the getpeername postscreen warnings and the smtpd_peer_hostaddr_to_sockaddr: ... Servname not supported fatals that were throttling smtpd respawns and causing ~50% client timeouts on the public 587 path. inter was lowered from 120000 ms to 5000 ms (fast failover, no log-spam concern). option smtpchk was tried but flapped against postscreen (multi-line greet + DNSBL silence + anti-pre-greet detection trip HAProxy's parser → L7RSP). A plain TCP check on the no-PROXY ports is sufficient.
- Frontend binds on all pfSense interfaces (bind :25 instead of 10.0.20.1:25). <extaddr> is set in XML but pfSense templates it port-only. Low concern in practice because WAN firewall rules plus the NAT rdr gate external access; internal VLAN clients SHOULD be able to reach HAProxy on any pfSense-local IP.
- k8s-node5 doesn't exist: the cluster has a master + 4 workers, so the backend pool is capped at 4 servers.
- Postscreen still logs improper command pipelining for legitimate clients that send EHLO\r\nQUIT\r\n as a single TCP write. This is unchanged by the migration; it is postscreen's anti-bot heuristic.
