mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP

Two coordinated fixes for the same root cause. Postfix listeners running with
smtpd_upstream_proxy_protocol fatal on every HAProxy health probe with
`smtpd_peer_hostaddr_to_sockaddr: ... Servname not supported for ai_socktype`.
Postfix master throttles the resulting daemon respawns, so real client
connections that land mid-respawn time out. We saw this as a ~50% timeout
rate on public 587 from inside the cluster.

Layer 1 (book-search), stacks/ebooks/main.tf:
  SMTP_HOST mail.viktorbarzin.me → mailserver.mailserver.svc.cluster.local
  Internal services should use the ClusterIP Service rather than hairpinning
  through pfSense + HAProxy.
  12/12 OK in <28ms vs ~6/12 timeouts on the public path.

Layer 2 (pfSense HAProxy), stacks/mailserver + scripts/pfsense-haproxy-bootstrap.php:
  Add 3 non-PROXY healthcheck NodePorts to the mailserver-proxy Service:
    30145 → pod 25 (stock postscreen)
    30146 → pod 465 (stock smtps)
    30147 → pod 587 (stock submission)
  Each HAProxy server gets `port <healthcheck-nodeport>` appended via its
  advanced field, which redirects L4 health probes to those ports while real
  client traffic keeps going to 30125-30128 with PROXY v2.
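
For illustration, a server line rendered from these settings might look roughly like the sketch below. The backend name and keyword order are assumptions (the pfSense HAProxy package generates the real config); the address, ports, weight, and interval are taken from this commit. Per the HAProxy manual's `check-send-proxy` entry, health checks stop sending the PROXY header once `port` or `addr` is set on a server, unless `check-send-proxy` is added, which is why the probes reach the stock listeners without PROXY framing.

```haproxy
# Hypothetical rendering of one pool; the generated backend name may differ.
backend mailserver_nodes_sub
    mode tcp
    balance roundrobin
    # Data path: 30127 with PROXY v2. Health probes: plain TCP connect to 30147.
    server k8s-node4 10.0.20.104:30127 check inter 5000 weight 10 send-proxy-v2 port 30147
```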

Result: 0 fatals/min (was 96), 30/30 probes OK on 587, e2e roundtrip 20.4s.
`inter` dropped 120000 → 5000 (ms) now that the log-spam concern is gone.

`option smtpchk EHLO` was tried first but flapped against postscreen: its
multi-line greeting, DNSBL silence, and anti-pre-greet detection trip HAProxy's
parser (L7RSP). A plain TCP accept-on-port check is sufficient for both
submission and postscreen.

Updated docs/runbooks/mailserver-pfsense-haproxy.md to reflect the new healthcheck
path and mark the "Known warts" entry as resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent c4c5057edc
commit 4c8d12229f
4 changed files with 131 additions and 25 deletions

@@ -12,7 +12,11 @@ so pfSense runs a small HAProxy that:
 1. Listens on the pfSense VLAN20 IP (`10.0.20.1`) on all 4 mail ports,
 2. Forwards each connection to a k8s node's NodePort with `send-proxy-v2`,
 3. Injects PROXY v2 framing so Postfix/Dovecot see the original client IP,
-4. TCP health-checks every k8s worker — any node can serve (ETP:Cluster).
+4. TCP-checks every k8s worker via dedicated **non-PROXY healthcheck NodePorts**
+   (30145/30146/30147 → pod stock 25/465/587 listeners, no PROXY required).
+   This split path avoids the `smtpd_peer_hostaddr_to_sockaddr` fatal that
+   used to fire on every PROXY-aware health probe and throttled real client
+   connections.
 
 Corresponding k8s-side setup (`stacks/mailserver/modules/mailserver/`):
 
@@ -23,14 +27,20 @@ Corresponding k8s-side setup (`stacks/mailserver/modules/mailserver/`):
 - `:5587` smtpd (alt :587 submission) with `smtpd_upstream_proxy_protocol=haproxy`
 - ConfigMap `mailserver.config` adds Dovecot `inet_listener imaps_proxy` on
   port 10993 with `haproxy = yes` and `haproxy_trusted_networks = 10.0.20.0/24`.
-- Service `mailserver-proxy` (NodePort, ETP:Cluster) with 4 NodePorts:
-  - `port 25 → targetPort 2525 → nodePort 30125`
-  - `port 465 → targetPort 4465 → nodePort 30126`
-  - `port 587 → targetPort 5587 → nodePort 30127`
-  - `port 993 → targetPort 10993 → nodePort 30128`
+- Service `mailserver-proxy` (NodePort, ETP:Cluster) — 4 PROXY data ports +
+  3 non-PROXY healthcheck ports:
+  - Data (PROXY v2):
+    - `port 25 → targetPort 2525 → nodePort 30125`
+    - `port 465 → targetPort 4465 → nodePort 30126`
+    - `port 587 → targetPort 5587 → nodePort 30127`
+    - `port 993 → targetPort 10993 → nodePort 30128`
+  - Healthcheck (no PROXY, stock SMTP/SMTPS/Submission listeners):
+    - `port 2500 → targetPort 25 → nodePort 30145` (smtp-check)
+    - `port 4650 → targetPort 465 → nodePort 30146` (smtps-check)
+    - `port 5870 → targetPort 587 → nodePort 30147` (sub-check)
 - Service `mailserver` (ClusterIP) — unchanged stock ports 25/465/587/993
   for intra-cluster clients (Roundcube pod, `email-roundtrip-monitor`
-  CronJob). These listeners are PROXY-free.
+  CronJob, book-search). These listeners are PROXY-free.
 
 bd: `code-yiu`.
 
@@ -46,7 +56,9 @@ External mail (WAN) path — PROXY v2
 │ │ NAT rdr → 10.0.20.1:{same} │
 │ ▼ │
 │ pfSense HAProxy (mode tcp, 4 frontends, 4 backend pools) │
-│ │ send-proxy-v2 + tcp-check inter 120000 │
+│ │ data: send-proxy-v2 → :{30125..30128} (PROXY-aware pod) │
+│ │ health: TCP-check → :{30145..30147} (no-PROXY pod) │
+│ │ inter 5000 │
 │ ▼ │
 │ k8s-node<1-4>:{30125..30128} ← any node (ETP:Cluster) │
 │ │ kube-proxy SNAT (source IP lost on the wire) │
@@ -186,11 +198,18 @@ Full restore: pfSense WebUI → Diagnostics → Backup & Restore → Upload that
 
 ## Known warts
 
-- HAProxy TCP health-check with `send-proxy-v2` generates `getpeername:
-  Transport endpoint not connected` warnings on postscreen every check cycle.
-  Mitigated with `inter 120000` (2 min). To reduce further, switch to
-  `option smtpchk` — but that requires a separate non-PROXY health-check
-  port on the pod (not done yet).
+- ~~HAProxy TCP health-check with `send-proxy-v2` generates `getpeername:
+  Transport endpoint not connected` warnings on postscreen every check cycle.~~
+  **Resolved 2026-05-05**: dedicated non-PROXY healthcheck NodePorts
+  (30145/30146/30147 → stock pod 25/465/587) added; HAProxy now checks
+  those, eliminating both the `getpeername` postscreen warnings and the
+  `smtpd_peer_hostaddr_to_sockaddr: ... Servname not supported` fatals
+  that were throttling smtpd respawns and causing ~50% client timeouts on
+  the public 587 path. `inter` dropped 120000 → 5000 (fast failover, no
+  log-spam concern). `option smtpchk` was tried but flapped against
+  postscreen (multi-line greet + DNSBL silence + anti-pre-greet detection
+  trip HAProxy's parser → L7RSP). Plain TCP check on the no-PROXY ports
+  is sufficient.
 - Frontend binds on all pfSense interfaces (`bind :25` instead of
   `10.0.20.1:25`). `<extaddr>` is set in XML but pfSense templates it
   port-only. Low concern in practice because WAN firewall rules plus the

@@ -68,7 +68,35 @@ $NODES = [
     ['k8s-node4', '10.0.20.104'],
 ];
 
-function build_pool(string $name, string $nodeport, array $nodes): array {
+// Build a pool with optional split healthcheck path.
+//
+// $check_port: if non-null, HAProxy sends health probes to that NodePort
+// (which Service `mailserver-proxy` maps to the pod's stock no-PROXY
+// listener — see infra/stacks/mailserver/.../mailserver_proxy ports
+// 30145/30146/30147). Real client traffic still goes to $nodeport with
+// PROXY v2 framing.
+// $check_type: 'TCP' for plain accept-on-port checks, 'ESMTP' for
+// `option smtpchk EHLO <monitor_domain>` (real SMTP banner+EHLO+250).
+//
+// Why split: smtpd-proxy587/4465 fatal on every PROXY-v2-aware health
+// probe with `smtpd_peer_hostaddr_to_sockaddr: ... Servname not supported`
+// — the daemon respawns get throttled by Postfix master and real clients
+// land mid-respawn → 6s TCP timeout. Routing health probes to the stock
+// no-PROXY port sidesteps the bug entirely while data path still gets
+// PROXY v2 for CrowdSec/Postfix client-IP visibility. The HAProxy package
+// has no `checkport` field, so `port N` is appended via the server's
+// `advanced` string (HAProxy parses server keywords in any order).
+function build_pool(
+    string $name,
+    string $nodeport,
+    array $nodes,
+    string $check_type = 'TCP',
+    ?string $check_port = null,
+    string $monitor_domain = ''
+): array {
+    $advanced_check = $check_port !== null
+        ? "send-proxy-v2 port {$check_port}"
+        : 'send-proxy-v2';
     $servers = [];
     foreach ($nodes as $n) {
         $servers[] = [
@@ -77,18 +105,19 @@ function build_pool(string $name, string $nodeport, array $nodes): array {
             'port' => $nodeport,
             'weight' => '10',
             'ssl' => '',
-            // check every 2 min — send-proxy-v2 check + close generates
-            // noise on postscreen, not worth doing more often.
-            'checkinter' => '120000',
-            'advanced' => 'send-proxy-v2',
+            // 5s = sub-block-window failover when a NodePort goes sour.
+            // Safe to be aggressive once health probes don't fatal smtpd.
+            'checkinter' => '5000',
+            'advanced' => $advanced_check,
             'status' => 'active',
         ];
     }
     return [
         'name' => $name,
         'balance' => 'roundrobin',
-        'check_type' => 'TCP',
-        'checkinter' => '120000',
+        'check_type' => $check_type,
+        'monitor_domain' => $monitor_domain,
+        'checkinter' => '5000',
         'retries' => '3',
         'ha_servers' => ['item' => $servers],
         'advanced_bind' => '',
@@ -132,9 +161,28 @@ $h['ha_pools']['item'] = array_values(array_filter(
 $h['ha_pools']['item'][] = build_pool('mailserver_nodes', '30125', $NODES);
 
 // Production pools — one per mail port.
-$h['ha_pools']['item'][] = build_pool('mailserver_nodes_smtp', '30125', $NODES);
-$h['ha_pools']['item'][] = build_pool('mailserver_nodes_smtps', '30126', $NODES);
-$h['ha_pools']['item'][] = build_pool('mailserver_nodes_sub', '30127', $NODES);
+//
+// All SMTP/SMTPS/Submission backends use plain TCP checks against
+// dedicated non-PROXY healthcheck NodePorts (30145/30146/30147 → pod
+// stock 25/465/587) so probes hit the no-PROXY listeners and avoid
+// the smtpd_peer_hostaddr_to_sockaddr fatal that fires on PROXY-v2
+// LOCAL frames. Real client traffic still goes to 30125-30128 with
+// PROXY v2 for client-IP visibility.
+//
+// We tried `option smtpchk EHLO` initially — it works on the plain
+// `submission` daemon (587) but flaps the `postscreen` listener on
+// port 25 (multi-line greet + DNSBL silence + anti-pre-greet
+// detection makes HAProxy's simple smtpchk parser hit L7RSP). A
+// plain TCP accept-on-port check is enough for both: HAProxy still
+// gets fast failover when the listener actually goes away, and we
+// stop triggering the Postfix fatal entirely.
+//
+// IMAPS stays on its existing TCP-check-with-PROXY-frame for now —
+// Dovecot's PROXY parser doesn't show the same fatal pattern; adding
+// a separate IMAP healthcheck path would require another svc port.
+$h['ha_pools']['item'][] = build_pool('mailserver_nodes_smtp', '30125', $NODES, 'TCP', '30145');
+$h['ha_pools']['item'][] = build_pool('mailserver_nodes_smtps', '30126', $NODES, 'TCP', '30146');
+$h['ha_pools']['item'][] = build_pool('mailserver_nodes_sub', '30127', $NODES, 'TCP', '30147');
 $h['ha_pools']['item'][] = build_pool('mailserver_nodes_imaps', '30128', $NODES);
 
 // ── Frontends ───────────────────────────────────────────────────────────

@@ -785,8 +785,18 @@ resource "kubernetes_deployment" "book_search" {
   }
 }
 env {
   name = "SMTP_HOST"
-  value = "mail.viktorbarzin.me"
+  # Use intra-cluster ClusterIP path — bypasses pfSense HAProxy +
+  # PROXY v2 (the public path hairpins through HAProxy:587 →
+  # NodePort → pod :5587 where Postfix's smtpd-proxy587 daemon
+  # crashes ~50% of HAProxy healthchecks with
+  # `smtpd_peer_hostaddr_to_sockaddr: ... Servname not supported`,
+  # producing intermittent 6s TCP timeouts for clients that land
+  # mid-respawn). The ClusterIP service points to pod port 587
+  # (stock submission daemon, no PROXY) and is rock-solid (12/12
+  # in <31ms vs 6/12 timeouts on the public path).
+  # See docs/runbooks/mailserver-pfsense-haproxy.md.
+  value = "mailserver.mailserver.svc.cluster.local"
 }
 env {
   name = "SMTP_PORT"

@@ -733,6 +733,35 @@ resource "kubernetes_service" "mailserver_proxy" {
       target_port = 10993
       node_port = 30128
     }
+    # Dedicated non-PROXY healthcheck NodePorts. HAProxy on pfSense uses
+    # `option smtpchk` against these stock pod ports (25/465/587, no PROXY)
+    # so health probes don't hit the smtpd_peer_hostaddr_to_sockaddr fatal
+    # that fires on PROXY-v2 LOCAL/AF_UNSPEC frames sent during checks. The
+    # data path (30125-30128 → 2525/4465/5587/10993) still gets PROXY v2 for
+    # real client IP visibility — only the healthcheck path is split off.
+    # See infra/scripts/pfsense-haproxy-bootstrap.php (`check port` directive)
+    # and docs/runbooks/mailserver-pfsense-haproxy.md.
+    port {
+      name = "smtp-check"
+      protocol = "TCP"
+      port = 2500
+      target_port = 25
+      node_port = 30145
+    }
+    port {
+      name = "smtps-check"
+      protocol = "TCP"
+      port = 4650
+      target_port = 465
+      node_port = 30146
+    }
+    port {
+      name = "sub-check"
+      protocol = "TCP"
+      port = 5870
+      target_port = 587
+      node_port = 30147
+    }
   }
 }
 