mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP

Two coordinated fixes for the same root cause: Postfix listeners with
smtpd_upstream_proxy_protocol enabled exit with a fatal error on every HAProxy health
probe (`smtpd_peer_hostaddr_to_sockaddr: ... Servname not supported for ai_socktype`).
Postfix master throttles the resulting daemon respawns, and real client connections
that land mid-respawn time out. We saw this as a ~50% timeout rate on public 587 from
inside the cluster.

Layer 1 (book-search) — stacks/ebooks/main.tf:
  SMTP_HOST mail.viktorbarzin.me → mailserver.mailserver.svc.cluster.local
  Internal services should use ClusterIP, not hairpin through pfSense+HAProxy.
  12/12 OK in <28ms vs ~6/12 timeouts on the public path.
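
  A minimal sketch of what the Layer 1 change could look like in Terraform. Only
  the SMTP_HOST variable and its new value come from this commit; the resource
  name, namespace, labels, and image are hypothetical placeholders, not the actual
  contents of stacks/ebooks/main.tf.

```hcl
# Illustrative only: everything except SMTP_HOST and its value is assumed.
locals {
  # In-cluster DNS name of the mailserver Service (resolves to its ClusterIP).
  smtp_host = "mailserver.mailserver.svc.cluster.local"
}

resource "kubernetes_deployment" "book_search" {
  metadata {
    name      = "book-search" # hypothetical name
    namespace = "ebooks"      # hypothetical namespace
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "book-search"
      }
    }

    template {
      metadata {
        labels = {
          app = "book-search"
        }
      }

      spec {
        container {
          name  = "book-search"
          image = "example.invalid/book-search:latest" # placeholder image

          # Previously mail.viktorbarzin.me, which hairpinned through
          # pfSense + HAProxy; the Service name keeps traffic in-cluster.
          env {
            name  = "SMTP_HOST"
            value = local.smtp_host
          }
        }
      }
    }
  }
}
```

  Keeping the hostname in a local makes it easy to reuse if other in-cluster
  clients need the same setting.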

Layer 2 (pfSense HAProxy) — stacks/mailserver + scripts/pfsense-haproxy-bootstrap.php:
  Add 3 non-PROXY healthcheck NodePorts to mailserver-proxy svc:
    30145 → pod 25  (stock postscreen)
    30146 → pod 465 (stock smtps)
    30147 → pod 587 (stock submission)
  HAProxy uses `port <healthcheck-nodeport>` (per-server in advanced field) to
  redirect L4 health probes to those ports while real client traffic keeps
  going to 30125-30128 with PROXY v2.
  Result: 0 fatals/min (was 96), 30/30 probes OK on 587, e2e roundtrip 20.4s.
  Check interval (`inter`) dropped 120000 → 5000 ms since the log-spam concern is gone.

`option smtpchk EHLO` was tried first but flapped against postscreen (multi-line
greet + DNSBL silence + anti-pre-greet detection trip HAProxy's parser → L7RSP).
Plain TCP accept-on-port check is sufficient for both submission and postscreen.

Updated docs/runbooks/mailserver-pfsense-haproxy.md to reflect the new healthcheck
path and mark the "Known warts" entry as resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-05 19:45:33 +00:00
parent c4c5057edc
commit 4c8d12229f
4 changed files with 131 additions and 25 deletions


@@ -733,6 +733,35 @@ resource "kubernetes_service" "mailserver_proxy" {
      target_port = 10993
      node_port   = 30128
    }
    # Dedicated non-PROXY healthcheck NodePorts. HAProxy on pfSense runs plain
    # TCP checks against these stock pod ports (25/465/587, no PROXY) so health
    # probes don't hit the smtpd_peer_hostaddr_to_sockaddr fatal that fires on
    # PROXY-v2 LOCAL/AF_UNSPEC frames sent during checks. The data path
    # (30125-30128 → 2525/4465/5587/10993) still gets PROXY v2 for real client
    # IP visibility; only the healthcheck path is split off.
    # See infra/scripts/pfsense-haproxy-bootstrap.php (`check port` directive)
    # and docs/runbooks/mailserver-pfsense-haproxy.md.
    port {
      name        = "smtp-check"
      protocol    = "TCP"
      port        = 2500
      target_port = 25
      node_port   = 30145
    }
    port {
      name        = "smtps-check"
      protocol    = "TCP"
      port        = 4650
      target_port = 465
      node_port   = 30146
    }
    port {
      name        = "sub-check"
      protocol    = "TCP"
      port        = 5870
      target_port = 587
      node_port   = 30147
    }
  }
}