[mailserver] Phase 6 — decommission MetalLB LB path [ci skip]
## Context (bd code-yiu)

With Phase 4+5 proven (external mail flows through pfSense HAProxy + PROXY v2 to the alt PROXY-speaking container listeners), the MetalLB LoadBalancer Service + `10.0.20.202` external IP + ETP:Local policy are obsolete. Phase 6 decommissions them and documents the steady-state architecture.

## This change

### Terraform (stacks/mailserver/modules/mailserver/main.tf)

- `kubernetes_service.mailserver` downgraded: `LoadBalancer` → `ClusterIP`.
- Removed `metallb.io/loadBalancerIPs = "10.0.20.202"` annotation.
- Removed `external_traffic_policy = "Local"` (irrelevant for ClusterIP).
- Port set unchanged — the Service still exposes 25/465/587/993 for intra-cluster clients (Roundcube pod, `email-roundtrip-monitor` CronJob) that hit the stock PROXY-free container listeners.
- Inline comment documents the downgrade rationale + companion `mailserver-proxy` NodePort Service that now carries external traffic.

### pfSense (ops, not in git)

- `mailserver` host alias (pointing at `10.0.20.202`) deleted. No NAT rule references it post-Phase-4; keeping it would be misleading dead metadata. Reversible via WebUI + `php /tmp/delete-mailserver-alias.php` companion script (ad-hoc, not checked in — alias is just a Firewall → Aliases → Hosts entry).

### Uptime Kuma (ops)

- Monitors `282` and `283` (PORT checks) retargeted from `10.0.20.202` → `10.0.20.1`. Renamed to `Mailserver HAProxy SMTP (pfSense :25)` / `... IMAPS (pfSense :993)` to reflect their new purpose (HAProxy layer liveness). History retained (edit, not delete-recreate).

### Docs

- `docs/runbooks/mailserver-pfsense-haproxy.md` — fully rewritten "Current state" section; now reflects steady-state architecture with two-path diagram (external via HAProxy / intra-cluster via ClusterIP). Phase history table marks Phase 6 ✅. Rollback section updated (no one-liner post-Phase-6; need Service-type re-upgrade + alias re-add).
- `docs/architecture/mailserver.md` — Overview, Mermaid diagram, Inbound flow, CrowdSec section, Uptime Kuma monitors list, Decisions section (dedicated MetalLB IP → "Client-IP Preservation via HAProxy + PROXY v2"), Troubleshooting all updated.
- `.claude/CLAUDE.md` — mailserver monitoring + architecture paragraph updated with new external path description; references the new runbook.

## What is NOT in this change

- Removal of `10.0.20.202` from `cloudflare_proxied_names` or any reserved-IP tracking — wasn't there to begin with. The `metallb-system default` IPAddressPool (10.0.20.200-220) shows 2 of 19 available after this, confirming `.202` went back to the pool.
- Phase 4 NAT-flip rollback scripts — kept on-disk, still valid if someone re-introduces the MetalLB LB (see runbook "Rollback").

## Test Plan

### Automated (verified pre-commit 2026-04-19)

```
# Service is ClusterIP with no EXTERNAL-IP
$ kubectl get svc -n mailserver mailserver
mailserver ClusterIP 10.103.108.217 <none> 25/TCP,465/TCP,587/TCP,993/TCP

# 10.0.20.202 no longer answers ARP (ping from pfSense)
$ ssh admin@10.0.20.1 'ping -c 2 -t 2 10.0.20.202'
2 packets transmitted, 0 packets received, 100.0% packet loss

# MetalLB pool released the IP
$ kubectl get ipaddresspool default -n metallb-system \
    -o jsonpath='{.status.assignedIPv4} of {.status.availableIPv4}'
2 of 19 available

# E2E probe — external Brevo → WAN:25 → pfSense HAProxy → pod — STILL SUCCEEDS
$ kubectl create job --from=cronjob/email-roundtrip-monitor probe-phase6 -n mailserver
... Round-trip SUCCESS in 20.3s ...
$ kubectl delete job probe-phase6 -n mailserver

# pfSense mailserver alias removed
$ ssh admin@10.0.20.1 'php -r "..." | grep mailserver'
(no output)
```

### Manual Verification

1. Visit `https://uptime.viktorbarzin.me` — monitors 282/283 green on new hostname `10.0.20.1`.
2. Roundcube login works (`https://mail.viktorbarzin.me/`).
3. Send test email to `smoke-test@viktorbarzin.me` from Gmail — observe `postfix/smtpd-proxy25/postscreen: CONNECT from [<Gmail-IP>]` in mailserver logs within ~10s.
4. CrowdSec should still see real client IPs in postfix/dovecot parsers (verify with `cscli alerts list` on next auth-fail event).

## Phase history (bd code-yiu)

| Phase | Status | Description |
|---|---|---|
| 1a | ✅ `ef75c02f` | k8s alt :2525 listener + NodePort Service |
| 2 | ✅ 2026-04-19 | pfSense HAProxy pkg installed |
| 3 | ✅ `ba697b02` | HAProxy config persisted in pfSense XML |
| 4+5 | ✅ `9806d515` | 4-port alt listeners + HAProxy frontends + NAT flip |
| 6 | ✅ **this commit** | MetalLB LB retired; 10.0.20.202 released; docs updated |

Closes: code-yiu
parent 9806d515dd
commit 43fe11fffc
4 changed files with 160 additions and 98 deletions

@@ -1,6 +1,6 @@
# pfSense HAProxy for Mailserver — Runbook

Last updated: 2026-04-19
Last updated: 2026-04-19 (Phase 6 complete)

## What & why

@@ -9,92 +9,126 @@ CrowdSec + Postfix rate-limiting. MetalLB cannot inject PROXY-protocol
headers (see [`mailserver-proxy-protocol.md`](./mailserver-proxy-protocol.md)),
so pfSense runs a small HAProxy that:

1. Listens on the pfSense VIP,
2. Forwards each connection to a k8s node's NodePort,
3. Injects PROXY-v2 framing so Postfix/Dovecot see the original client IP,
4. TCP health-checks every worker — any node can serve.
1. Listens on the pfSense VLAN20 IP (`10.0.20.1`) on all 4 mail ports,
2. Forwards each connection to a k8s node's NodePort with `send-proxy-v2`,
3. Injects PROXY v2 framing so Postfix/Dovecot see the original client IP,
4. TCP health-checks every k8s worker — any node can serve (ETP:Cluster).

Corresponding k8s-side setup lives in `stacks/mailserver/modules/mailserver/`:
- ConfigMap `mailserver-user-patches` → `user-patches.sh` appends alt
  `master.cf` service on port 2525 with
  `postscreen_upstream_proxy_protocol=haproxy`.
- Service `mailserver-proxy` → NodePort 30125 → targetPort 2525 →
  `externalTrafficPolicy: Cluster`.
Corresponding k8s-side setup (`stacks/mailserver/modules/mailserver/`):

- ConfigMap `mailserver-user-patches` → `user-patches.sh` appends 3 alt
  `master.cf` services to Postfix:
  - `:2525` postscreen (alt :25) with `postscreen_upstream_proxy_protocol=haproxy`
  - `:4465` smtpd (alt :465 SMTPS) with `smtpd_upstream_proxy_protocol=haproxy`
  - `:5587` smtpd (alt :587 submission) with `smtpd_upstream_proxy_protocol=haproxy`
- ConfigMap `mailserver.config` adds Dovecot `inet_listener imaps_proxy` on
  port 10993 with `haproxy = yes` and `haproxy_trusted_networks = 10.0.20.0/24`.
- Service `mailserver-proxy` (NodePort, ETP:Cluster) with 4 NodePorts:
  - `port 25 → targetPort 2525 → nodePort 30125`
  - `port 465 → targetPort 4465 → nodePort 30126`
  - `port 587 → targetPort 5587 → nodePort 30127`
  - `port 993 → targetPort 10993 → nodePort 30128`
- Service `mailserver` (ClusterIP) — unchanged stock ports 25/465/587/993
  for intra-cluster clients (Roundcube pod, `email-roundtrip-monitor`
  CronJob). These listeners are PROXY-free.

bd: `code-yiu`.

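
The port mapping of the `mailserver-proxy` Service listed above can be captured as a small lookup, which may be handy when scripting checks against a node. This is only a sketch: the function name `nodeport_for` is made up for illustration and is not part of the repo; the numbers are copied from the list above.

```sh
#!/bin/sh
# Hypothetical helper (not in the repo): map a public mail port to the
# NodePort exposed by the mailserver-proxy Service, per the list above.
nodeport_for() {
  case "$1" in
    25)  echo 30125 ;;   # SMTP       -> container :2525
    465) echo 30126 ;;   # SMTPS      -> container :4465
    587) echo 30127 ;;   # submission -> container :5587
    993) echo 30128 ;;   # IMAPS      -> container :10993
    *)   echo "unknown mail port: $1" >&2; return 1 ;;
  esac
}

# Print the whole mapping, one "port -> nodePort" pair per line.
for port in 25 465 587 993; do
  echo "$port -> $(nodeport_for "$port")"
done
```

Keeping the mapping in one function means a NodePort change only has to be mirrored in a single place in any ad-hoc check script.
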
## Current state (Phase 3 — TEST PATH)
## Steady-state architecture

```
INTERNET
│
(unchanged — still MetalLB path)
↓
WAN:25/465/587/993 ─┐
│ TEST PATH ↓
│ (pfSense HAProxy, port 2525 only)
pfSense NAT rdr │
↓ ↓
<mailserver> alias (= 10.0.20.202) HAProxy on pfSense (10.0.20.1:2525)
↓ ↓
MetalLB VIP 10.0.20.202 k8s-node:30125 (NodePort, ETP: Cluster)
(ETP:Local, kube-proxy DNAT) ↓
↓ kube-proxy (SNAT — IP lost here, recovered by PROXY-v2)
mailserver pod ↓
stock :25/:465/:587/:993 mailserver pod :2525 postscreen (PROXY-v2)
External mail (WAN) path — PROXY v2
┌─────────────────────────────────────────────────────────────────────┐
│ Client (real IP) │
│ │ SMTP/SMTPS/Sub/IMAPS │
│ ▼ │
│ pfSense WAN:{25,465,587,993} │
│ │ NAT rdr → 10.0.20.1:{same} │
│ ▼ │
│ pfSense HAProxy (mode tcp, 4 frontends, 4 backend pools) │
│ │ send-proxy-v2 + tcp-check inter 120000 │
│ ▼ │
│ k8s-node<1-4>:{30125..30128} ← any node (ETP:Cluster) │
│ │ kube-proxy SNAT (source IP lost on the wire) │
│ ▼ │
│ mailserver pod :{2525,4465,5587,10993} │
│ │ postscreen / smtpd / Dovecot parse PROXY v2 header │
│ │ → real client IP recovered despite kube-proxy SNAT │
│ ▼ │
│ CrowdSec + Postfix / Dovecot see the true source IP ✓ │
└─────────────────────────────────────────────────────────────────────┘

Intra-cluster path — no PROXY
┌─────────────────────────────────────────────────────────────────────┐
│ Roundcube pod / email-roundtrip-monitor CronJob │
│ │ SMTP/IMAP │
│ ▼ │
│ mailserver.mailserver.svc.cluster.local:{25,465,587,993} │
│ │ ClusterIP — bypasses LoadBalancer/NodePort layer entirely │
│ ▼ │
│ mailserver pod stock :{25,465,587,993} (PROXY-free) │
└─────────────────────────────────────────────────────────────────────┘
```

Nothing production flips to HAProxy yet; all real traffic still uses the
MetalLB LB IP path. To validate the HAProxy path:
## Validation

```sh
# From any k8s VLAN host:
python3 -c "
import socket; s=socket.socket(); s.connect(('10.0.20.1', 2525))
print(s.recv(200).decode())
s.send(b'EHLO testclient\r\n')
print(s.recv(500).decode())
s.send(b'QUIT\r\n'); s.close()"
# All HAProxy frontends listening
ssh admin@10.0.20.1 'sockstat -l | grep haproxy'
# Expect: *:25, *:465, *:587, *:993, *:2525 (test port)

# Then check mailserver logs for CONNECT from [YOUR-IP]:
kubectl logs -c docker-mailserver deployment/mailserver -n mailserver --tail=20 | grep smtpd-proxy
# All backend pools healthy
ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio" \
  | awk 'NR>1 {print $3, $4, $6}'
# srv_op_state 2 = UP, 0 = DOWN

# Container listens on all 8 ports
kubectl exec -n mailserver -c docker-mailserver deployment/mailserver -- \
  ss -ltn | grep -E ':(25|2525|465|4465|587|5587|993|10993)\b'

# pf rdr points at pfSense (10.0.20.1), not <mailserver> alias
ssh admin@10.0.20.1 'pfctl -sn' | grep -E 'port = (25|submission|imaps|smtps)'

# E2E probe — Brevo → external MX :25 → IMAP fetch
kubectl create job --from=cronjob/email-roundtrip-monitor probe-test -n mailserver
kubectl wait --for=condition=complete --timeout=90s job/probe-test -n mailserver
kubectl logs job/probe-test -n mailserver | grep SUCCESS
kubectl delete job probe-test -n mailserver

# Real client IP in maillog post-delivery
kubectl logs -c docker-mailserver deployment/mailserver -n mailserver \
  | grep 'smtpd-proxy25.*CONNECT from' | tail -5
# Expect external source IPs (e.g., Brevo 77.32.148.x), NOT 10.0.20.x
```

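
The last check above (external source IPs, not `10.0.20.x`) could be turned into a filter that prints only the suspicious entries. A rough sketch, assuming postscreen's usual `CONNECT from [ip]:port to [ip]:port` log shape; the function name `internal_connects` and the sample log lines are invented for illustration.

```sh
#!/bin/sh
# Hypothetical helper: read mailserver log lines on stdin and print the source
# address of every smtpd-proxy25 CONNECT whose source is inside 10.0.20.0/24.
# Any output suggests a VLAN address was logged instead of the real client IP.
internal_connects() {
  sed -n 's/.*smtpd-proxy25.*CONNECT from \[\(10\.0\.20\.[0-9]\{1,3\}\)\].*/\1/p'
}

# Invented sample lines; in practice, pipe in the `kubectl logs` output instead.
printf '%s\n' \
  'postfix/smtpd-proxy25/postscreen[101]: CONNECT from [77.32.148.10]:41000 to [10.10.0.5]:2525' \
  'postfix/smtpd-proxy25/postscreen[102]: CONNECT from [10.0.20.101]:52000 to [10.10.0.5]:2525' |
  internal_connects
```

With the two sample lines, only `10.0.20.101` is printed. An empty result on real logs is the healthy case.
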
## Bootstrap / restore from scratch

Config lives in pfSense `/cf/conf/config.xml` under
`<installedpackages><haproxy>`. Backed up nightly to
pfSense HAProxy config lives in `/cf/conf/config.xml` under
`<installedpackages><haproxy>`. That file is scp'd nightly to
`/mnt/backup/pfsense/config-YYYYMMDD.xml` by `scripts/daily-backup.sh`, then
Synology. To rebuild from source of truth (git):
synced to Synology. To rebuild from source of truth (git):

```sh
scp infra/scripts/pfsense-haproxy-bootstrap.php admin@10.0.20.1:/tmp/
ssh admin@10.0.20.1 'php /tmp/pfsense-haproxy-bootstrap.php'
```

The script is idempotent — re-runs reset the mailserver frontend + backend to
the declared state.
The script is idempotent — re-runs reset the mailserver frontends + backends
to the declared state.

Expected output:
```
haproxy_check_and_run rc=OK
messages: ...
```

Verify:
```sh
ssh admin@10.0.20.1 "pgrep -lf haproxy; sockstat -l | grep ':2525'"
# 64009 /usr/local/sbin/haproxy -f /var/etc/haproxy/haproxy.cfg ...
# www haproxy 64009 5 tcp4 *:2525 *:*
```

## Operations

### Change backend k8s node IPs
### Change backend k8s node IPs / NodePorts

Edit `infra/scripts/pfsense-haproxy-bootstrap.php` → `foreach` array of
`[name, address]`, re-run via the bootstrap command above. Don't hand-edit
`/var/etc/haproxy/haproxy.cfg` — it is regenerated from XML on every apply.
Edit `infra/scripts/pfsense-haproxy-bootstrap.php` — `$NODES` array + the
`build_pool()` port arguments. Re-run the bootstrap command above. Don't
hand-edit `/var/etc/haproxy/haproxy.cfg` — it is regenerated from XML on
every apply.

### Check health of backends

@@ -105,7 +139,7 @@ ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio

### View live HAProxy stats (WebUI)

`https://pfsense.viktorbarzin.me` → Services → HAProxy → Stats
`https://pfsense.viktorbarzin.me` → Services → HAProxy → Stats.

### Reload after config.xml edit

@@ -113,6 +147,22 @@ ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio
ssh admin@10.0.20.1 'pfSsh.php playback svc restart haproxy'
```

### Rollback (flip NAT back to MetalLB, post-Phase-6 only partial)

There is no Phase-6 rollback one-liner. Phase 6 removed the MetalLB
LoadBalancer 10.0.20.202 entirely, so un-flipping NAT now would send
traffic to a dead alias. To regress:

1. Re-add `metallb.io/loadBalancerIPs = "10.0.20.202"` + `type = "LoadBalancer"`
   + `external_traffic_policy = "Local"` to `kubernetes_service.mailserver`,
   apply.
2. Re-add the `mailserver` host alias in pfSense pointing at 10.0.20.202
   (Firewall → Aliases → Hosts).
3. Run `infra/scripts/pfsense-nat-mailserver-haproxy-unflip.php` on pfSense.

For rollback of just the NAT (Phase 4) without touching the Service, only
the third step is needed — but only meaningful BEFORE Phase 6.

### Restore from backup

pfSense config backup is a plain XML file:
@@ -124,27 +174,30 @@ pfSense config backup is a plain XML file:
Full restore: pfSense WebUI → Diagnostics → Backup & Restore → Upload that
`config.xml`. The `<installedpackages><haproxy>` section is included.

## Phase roadmap (bd code-yiu)
## Phase history (bd code-yiu)

| Phase | Status | Description |
|---|---|---|
| 1a | ✅ done (commit `ef75c02f`) | k8s alt listener `:2525` + `mailserver-proxy` NodePort |
| 2 | ✅ done (2026-04-19) | pfSense HAProxy installed + test config on `:2525` |
| 3 | ✅ done (2026-04-19) | HAProxy config persisted to pfSense `config.xml` (this runbook + `pfsense-haproxy-bootstrap.php`) |
| 4 | not yet | Flip pfSense NAT rdr for `:25` from `<mailserver>` alias → HAProxy VIP. Requires atomic cutover. |
| 5 | not yet | Extend to ports 465/587/993: add alt container listeners (4465/5587/10993), add Dovecot `haproxy = yes` on extra inet_listener, expand HAProxy frontends, flip NAT. |
| 6 | not yet | Observe 48h, decommission MetalLB LB path (downgrade mailserver Service from LoadBalancer to ClusterIP, free `10.0.20.202`). |
| 1a | ✅ commit `ef75c02f` | k8s alt :2525 listener + NodePort Service |
| 2 | ✅ 2026-04-19 | pfSense HAProxy pkg installed (`pfSense-pkg-haproxy-devel-0.63_2`, HAProxy 2.9-dev6) |
| 3 | ✅ commit `ba697b02` | HAProxy config persisted in pfSense XML (bootstrap script + this runbook) |
| 4+5| ✅ commit `9806d515` | 4-port alt listeners + HAProxy frontends for 25/465/587/993 + NAT flip |
| 6 | ✅ this commit | Mailserver Service downgraded LoadBalancer → ClusterIP; `10.0.20.202` released back to MetalLB pool; orphan `mailserver` pfSense alias removed; monitors retargeted |

## Known warts

- HAProxy TCP health-check with `send-proxy-v2` + short `inter` floods
  postscreen with `getpeername: Transport endpoint not connected` warnings
  every check cycle. Mitigated with `inter 120000` (2 min). To reduce
  further, switch to `option smtpchk` — but that requires a separate
  non-PROXY health-check port on the pod (not done yet).
- Frontend binds on all pfSense interfaces (`bind :2525`) rather than just
  `10.0.20.1:2525`. `<extaddr>` is set in XML but pfSense templates it as
  port-only. Low concern while port 2525 is a test port; tighten once
  promoted to real ports (25/465/587/993).
- HAProxy TCP health-check with `send-proxy-v2` generates `getpeername:
  Transport endpoint not connected` warnings on postscreen every check cycle.
  Mitigated with `inter 120000` (2 min). To reduce further, switch to
  `option smtpchk` — but that requires a separate non-PROXY health-check
  port on the pod (not done yet).
- Frontend binds on all pfSense interfaces (`bind :25` instead of
  `10.0.20.1:25`). `<extaddr>` is set in XML but pfSense templates it
  port-only. Low concern in practice because WAN firewall rules plus the
  NAT rdr gate external access; internal VLAN clients SHOULD be able to
  reach HAProxy on any pfSense-local IP.
- k8s-node5 doesn't exist — cluster has master + 4 workers. Backend pool
  capped at 4 servers.
- Postscreen still logs `improper command pipelining` for legitimate
  clients that send `EHLO\r\nQUIT\r\n` as a single TCP write. This is
  unchanged pre/post-migration — postscreen's anti-bot heuristic.
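
The backend health check used in this runbook (`show servers state`, where `srv_op_state` 2 = UP and 0 = DOWN) could be reduced to a filter that prints only the servers that are not UP. This is a sketch under assumptions: the function name `not_up_servers` is made up, and the backend and server names in the sample input are placeholders rather than the real pool names.

```sh
#!/bin/sh
# Hypothetical helper: read `show servers state` output on stdin and print
# "backend/server state" for every server whose srv_op_state (field 6) is not 2.
# The version line (a single field) and the '#' header line are skipped.
not_up_servers() {
  awk '!/^#/ && NF >= 6 && $6 != 2 { print $2 "/" $4, $6 }'
}

# Invented sample input; in practice, pipe in the socat output from pfSense.
not_up_servers <<'EOF'
1
# be_id be_name srv_id srv_name srv_addr srv_op_state
3 mail_smtp 1 k8s-node1 10.0.20.101 2
3 mail_smtp 2 k8s-node2 10.0.20.102 0
EOF
```

For the sample input this prints `mail_smtp/k8s-node2 0`. No output means every listed server reported state 2.
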