## Context (bd code-yiu)

Phase 4+5 is proven: external mail flows through pfSense HAProxy + PROXY v2 to the alt PROXY-speaking container listeners. That makes the MetalLB LoadBalancer Service, the `10.0.20.202` external IP and the ETP:Local policy obsolete. Phase 6 decommissions them and documents the steady-state architecture.

## This change

### Terraform (stacks/mailserver/modules/mailserver/main.tf)

- `kubernetes_service.mailserver` downgraded: `LoadBalancer` → `ClusterIP`.
- Removed the `metallb.io/loadBalancerIPs = "10.0.20.202"` annotation.
- Removed `external_traffic_policy = "Local"` (irrelevant for ClusterIP).
- Port set unchanged. The Service still exposes 25/465/587/993 for intra-cluster clients (Roundcube pod, `email-roundtrip-monitor` CronJob), which hit the stock PROXY-free container listeners.
- An inline comment documents the downgrade rationale. It also documents the companion `mailserver-proxy` NodePort Service that now carries external traffic.

### pfSense (ops, not in git)

- `mailserver` host alias (pointing at `10.0.20.202`) deleted. No NAT rule references it post-Phase-4, so keeping it would be misleading dead metadata. The deletion used an ad-hoc companion script, `php /tmp/delete-mailserver-alias.php` (not checked in). It is reversible via the WebUI, because the alias is just a Firewall → Aliases → Hosts entry.

### Uptime Kuma (ops)

- Monitors `282` and `283` (PORT checks) retargeted from `10.0.20.202` → `10.0.20.1`. They are renamed to `Mailserver HAProxy SMTP (pfSense :25)` / `... IMAPS (pfSense :993)` to reflect their new purpose (HAProxy-layer liveness). History is retained (edited in place, not deleted and recreated).

### Docs

- `docs/runbooks/mailserver-pfsense-haproxy.md`: the "Current state" section is fully rewritten. It now reflects the steady-state architecture with a two-path diagram (external via HAProxy / intra-cluster via ClusterIP). The phase history table marks Phase 6 ✅. The rollback section is updated: there is no one-liner post-Phase-6, and rollback needs the Service type re-upgraded plus the alias re-added.
- `docs/architecture/mailserver.md`: all of the following are updated:
  - Overview
  - Mermaid diagram
  - Inbound flow
  - CrowdSec section
  - Uptime Kuma monitors list
  - Decisions section (dedicated MetalLB IP → "Client-IP Preservation via HAProxy + PROXY v2")
  - Troubleshooting
- `.claude/CLAUDE.md`: the mailserver monitoring + architecture paragraph now describes the new external path and references the new runbook.

## What is NOT in this change

- Removal of `10.0.20.202` from `cloudflare_proxied_names` or any reserved-IP tracking. It wasn't there to begin with. After this change, the `metallb-system` `default` IPAddressPool (10.0.20.200-220, 21 addresses) reports 2 assigned / 19 available. That confirms `.202` went back to the pool.
- Phase 4 NAT-flip rollback scripts. They are kept on disk and still valid if someone re-introduces the MetalLB LB (see runbook "Rollback").

## Test Plan

### Automated (verified pre-commit 2026-04-19)

```
# Service is ClusterIP with no EXTERNAL-IP
$ kubectl get svc -n mailserver mailserver
mailserver ClusterIP 10.103.108.217 <none> 25/TCP,465/TCP,587/TCP,993/TCP

# 10.0.20.202 no longer answers ARP (ping from pfSense)
$ ssh admin@10.0.20.1 'ping -c 2 -t 2 10.0.20.202'
2 packets transmitted, 0 packets received, 100.0% packet loss

# MetalLB pool released the IP (template prints "<assigned> of <available>")
$ kubectl get ipaddresspool default -n metallb-system \
    -o jsonpath='{.status.assignedIPv4} of {.status.availableIPv4}'
2 of 19 available

# E2E probe (external Brevo → WAN:25 → pfSense HAProxy → pod): STILL SUCCEEDS
$ kubectl create job --from=cronjob/email-roundtrip-monitor probe-phase6 -n mailserver
...
Round-trip SUCCESS in 20.3s
...
$ kubectl delete job probe-phase6 -n mailserver

# pfSense mailserver alias removed
$ ssh admin@10.0.20.1 'php -r "..." | grep mailserver'
(no output)
```

### Manual Verification

1. Visit `https://uptime.viktorbarzin.me`: monitors 282/283 are green on the new target `10.0.20.1`.
2. Roundcube login works (`https://mail.viktorbarzin.me/`).
3. Send a test email to `smoke-test@viktorbarzin.me` from Gmail. Within ~10s, observe `postfix/smtpd-proxy25/postscreen: CONNECT from [<Gmail-IP>]` in the mailserver logs.
4. CrowdSec should still see real client IPs in the postfix/dovecot parsers. Verify with `cscli alerts list` on the next auth-fail event.

## Phase history (bd code-yiu)

| Phase | Status | Description |
|---|---|---|
| 1a | ✅ `ef75c02f` | k8s alt :2525 listener + NodePort Service |
| 2 | ✅ 2026-04-19 | pfSense HAProxy pkg installed |
| 3 | ✅ `ba697b02` | HAProxy config persisted in pfSense XML |
| 4+5 | ✅ `9806d515` | 4-port alt listeners + HAProxy frontends + NAT flip |
| 6 | ✅ **this commit** | MetalLB LB retired; 10.0.20.202 released; docs updated |

Closes: code-yiu
# Mail Server Architecture
Last updated: 2026-04-19 (code-yiu Phase 6: MetalLB LB retired; traffic now enters via pfSense HAProxy with PROXY v2)
## Overview

Self-hosted email for viktorbarzin.me, using docker-mailserver 15.0.0 on Kubernetes.

- **Inbound:** mail arrives directly via the MX record at the home IP on port 25.
- **Outbound:** mail relays through Brevo EU (`smtp-relay.brevo.com:587`). It migrated from Mailgun on 2026-04-12; the SPF record was cut over on 2026-04-18.
- **Webmail:** Roundcubemail provides webmail access.
- **Brute-force protection:** CrowdSec protects SMTP/IMAP using real client IPs. pfSense HAProxy injects a PROXY v2 header on each backend connection, so the mailserver pod sees the true source IP despite kube-proxy SNAT.

See `runbooks/mailserver-pfsense-haproxy.md` for ops details.
## Architecture Diagram

```mermaid
graph TB
    subgraph "Inbound Mail"
        SENDER[Sending MTA] -->|MX lookup| MX[mail.viktorbarzin.me:25]
        MX -->|176.12.22.76:25| PF[pfSense NAT rdr]
        PF -->|10.0.20.1:25| HAP[pfSense HAProxy<br/>send-proxy-v2]
        HAP -->|k8s-node:30125| KP[kube-proxy<br/>ETP: Cluster]
        KP -->|pod:2525 PROXY v2| POSTFIX[Postfix MTA<br/>postscreen]
    end
    subgraph "Mail Processing"
        POSTFIX --> RSPAMD[Rspamd<br/>Spam/DKIM/DMARC]
        RSPAMD --> DOVECOT[Dovecot IMAP]
        DOVECOT --> MAILBOX[(Mailboxes<br/>proxmox-lvm PVC)]
    end
    subgraph "Outbound Mail"
        POSTFIX_OUT[Postfix] -->|SASL + TLS| BREVO[Brevo EU Relay<br/>smtp-relay.brevo.com:587]
        BREVO --> RECIPIENT[Recipient]
    end
    subgraph "Webmail"
        USER[User] -->|HTTPS| TRAEFIK[Traefik Ingress]
        TRAEFIK --> RC[Roundcubemail]
        RC -->|IMAP 993| DOVECOT
        RC -->|SMTP 587| POSTFIX_OUT
    end
    subgraph "Security"
        POSTFIX -->|Real client IPs<br/>from PROXY v2 header| CS_AGENT[CrowdSec Agent<br/>postfix + dovecot parsers]
        CS_AGENT --> CS_LAPI[CrowdSec LAPI]
    end
    subgraph "Monitoring"
        PROBE[E2E Roundtrip Probe<br/>CronJob every 20m] -->|Mailgun API| SENDER
        PROBE -->|IMAP check| DOVECOT
        PROBE --> PUSH[Pushgateway + Uptime Kuma]
        DEXP[Dovecot Exporter<br/>:9166] --> PROM[Prometheus]
    end
```
## Components

| Component | Version | Location | Purpose |
|---|---|---|---|
| docker-mailserver | 15.0.0 | mailserver namespace | Postfix MTA + Dovecot IMAP + Rspamd |
| Roundcubemail | 1.6.13-apache | mailserver namespace | Webmail UI (MySQL-backed) |
| Dovecot Exporter | latest | Sidecar in mailserver pod | Prometheus metrics (port 9166) |
| Rspamd | Built into docker-mailserver | n/a | Spam filtering, DKIM signing, DMARC verification |
| Brevo EU (ex-Sendinblue) | SaaS | n/a | Outbound SMTP relay (300/day free) |
## Mail Flow

### Inbound

```
Internet → MX: mail.viktorbarzin.me (priority 1)
  → A record: 176.12.22.76 (non-proxied Cloudflare DNS-only)
  → pfSense NAT rdr: WAN:{25,465,587,993} → 10.0.20.1:{same}
  → pfSense HAProxy (TCP mode, send-proxy-v2 on backend)
  → k8s-node:{30125..30128} NodePort (mailserver-proxy, ETP: Cluster)
  → kube-proxy → pod alt listener (2525/4465/5587/10993)
  → Postfix postscreen / smtpd / Dovecot parses PROXY v2 header
  → Rspamd (spam + DKIM + DMARC) → Dovecot → mailbox
```
No backup MX. If the server is down, sender MTAs queue and retry for 4-5 days per SMTP standards (RFC 5321).
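The port translation in the inbound chain can be sketched as a small Python lookup table. This is a minimal illustration, not code from the repo, and it makes one assumption. The diagram states only the 25 → 30125 → 2525 leg explicitly. The sketch assumes NodePorts 30125-30128 pair with public ports 25/465/587/993 in the order they are listed. `InboundPath`, `PORT_MAP`, `resolve_inbound` and `requires_proxy_header` are hypothetical names.

```python
from typing import NamedTuple


class InboundPath(NamedTuple):
    wan_port: int        # public port on 176.12.22.76 (NAT rdr to 10.0.20.1, same port)
    node_port: int       # mailserver-proxy NodePort (ETP: Cluster)
    container_port: int  # alt listener in the pod; expects a PROXY v2 header


# ASSUMPTION: NodePorts 30125-30128 pair with 25/465/587/993 in listing order.
PORT_MAP = {
    25: InboundPath(25, 30125, 2525),     # SMTP (postscreen)
    465: InboundPath(465, 30126, 4465),   # SMTPS (implicit TLS)
    587: InboundPath(587, 30127, 5587),   # submission (STARTTLS)
    993: InboundPath(993, 30128, 10993),  # IMAPS (Dovecot)
}

# Alt listeners require the PROXY v2 header; stock 25/465/587/993 stay PROXY-free.
ALT_PORTS = frozenset(path.container_port for path in PORT_MAP.values())


def resolve_inbound(wan_port: int) -> InboundPath:
    """Return the hop-by-hop ports an external connection traverses."""
    if wan_port not in PORT_MAP:
        raise ValueError(f"port {wan_port} is not forwarded to the mailserver")
    return PORT_MAP[wan_port]


def requires_proxy_header(container_port: int) -> bool:
    """True for the alt listeners, which only accept PROXY v2 connections."""
    return container_port in ALT_PORTS
```

For example, `resolve_inbound(587)` returns `InboundPath(wan_port=587, node_port=30127, container_port=5587)`. The same container port number (587) without the leading digit is the stock, PROXY-free listener that Roundcube uses.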
### Outbound

```
Postfix → relayhost [smtp-relay.brevo.com]:587 (SASL auth + TLS required)
  → Brevo handles IP reputation, deliverability, bounce processing
  → 300 emails/day free tier (migrated from Mailgun 100/day on 2026-04-12)
```
### Webmail

```
https://mail.viktorbarzin.me → Traefik → Roundcubemail
  IMAP: ssl://mailserver:993 (internal K8s service)
  SMTP: tls://mailserver:587 (internal K8s service)
  DB:   MySQL (mysql.dbaas.svc.cluster.local)
```
## DNS Records

All managed in Terraform at `stacks/cloudflared/modules/cloudflared/cloudflare.tf`.

| Type | Name | Value | Purpose |
|---|---|---|---|
| MX | viktorbarzin.me | mail.viktorbarzin.me (pri 1) | Inbound mail routing |
| A | mail.viktorbarzin.me | 176.12.22.76 (non-proxied) | Mail server IP |
| AAAA | mail.viktorbarzin.me | 2001:470:6e:43d::2 | IPv6 (HE tunnel) |
| TXT (SPF) | viktorbarzin.me | `v=spf1 include:spf.brevo.com ~all` | Authorize Brevo for outbound. Soft-fail during cutover; was `include:mailgun.org -all` until the 2026-04-18 Brevo migration |
| TXT (DKIM) | s1._domainkey | RSA 1024-bit key | Mailgun DKIM. Roundtrip probe only; inbound testing still uses the Mailgun API |
| TXT (DKIM) | mail._domainkey | RSA 2048-bit key | Rspamd self-hosted DKIM signing |
| CNAME (DKIM) | brevo1._domainkey | b1.viktorbarzin-me.dkim.brevo.com | Brevo outbound DKIM (delegated) |
| CNAME (DKIM) | brevo2._domainkey | b2.viktorbarzin-me.dkim.brevo.com | Brevo outbound DKIM (delegated) |
| TXT | viktorbarzin.me | brevo-code:a6ef1dd9... | Brevo domain verification |
| TXT (DMARC) | _dmarc | `p=quarantine; pct=100; rua=mailto:dmarc@viktorbarzin.me` | DMARC enforcement. Target state: aggregate reports land in-domain at dmarc@viktorbarzin.me (tracked under code-569). The live record still points `rua` at e21c0ff8@dmarc.mailgun.org, pending cutover |
| TXT (MTA-STS) | _mta-sts | `v=STSv1; id=20260412` | TLS enforcement for inbound |
| TXT (TLSRPT) | _smtp._tls | `v=TLSRPTv1; rua=mailto:postmaster@...` | TLS failure reporting |
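Several of the TXT records above (DMARC, MTA-STS, TLSRPT) share the same `tag=value; tag=value` syntax. The following is a minimal Python sketch for parsing them. A check script could use it to confirm that the live DMARC record carries the intended `rua=` once the code-569 cutover lands. Fetching the record is left out (e.g. `dig TXT _dmarc.viktorbarzin.me +short`). The sketch assumes a short record that `dig` prints as a single quoted string. `parse_tag_list` and `dmarc_report_targets` are hypothetical helpers.

```python
def parse_tag_list(record: str) -> dict[str, str]:
    """Parse a 'tag=value; tag=value' TXT record (DMARC, MTA-STS and TLSRPT style)."""
    record = record.strip().strip('"')  # dig +short wraps TXT data in quotes
    tags: dict[str, str] = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';'
        tag, sep, value = part.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[tag.strip()] = value.strip()
    return tags


def dmarc_report_targets(record: str) -> list[str]:
    """Return the addresses in the rua= tag (a comma-separated list of URIs)."""
    rua = parse_tag_list(record).get("rua", "")
    return [uri.strip().removeprefix("mailto:") for uri in rua.split(",") if uri.strip()]
```

After the cutover, `dmarc_report_targets(live_record)` should return `["dmarc@viktorbarzin.me"]`. Until then, it returns the Mailgun address.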
### Known Limitation: PTR Mismatch
Reverse DNS for 176.12.22.76 returns 176-12-22-76.pon.spectrumnet.bg. (ISP-assigned) instead of mail.viktorbarzin.me. This is ISP-controlled and cannot be changed on a residential connection. Most modern providers (Gmail, Outlook) rely on SPF/DKIM/DMARC rather than PTR, so impact is minimal.
## Security

### CrowdSec Integration

- **Collections:** `crowdsecurity/postfix` + `crowdsecurity/dovecot` (installed).
- **Log acquisition:** CrowdSec agents parse mailserver pod logs for brute-force patterns.
- **Real client IPs:** pfSense HAProxy injects a PROXY v2 header on each backend connection. Postfix and Dovecot parse it on the alt listeners to recover the true source IP despite kube-proxy SNAT:
  - Postfix: `postscreen_upstream_proxy_protocol=haproxy` / `smtpd_upstream_proxy_protocol=haproxy` on the alt ports.
  - Dovecot: `haproxy = yes` on the alt IMAPS listener.

  This replaces the pre-2026-04-19 MetalLB `10.0.20.202` ETP:Local scheme (see code-yiu).
- **Decisions:** CrowdSec bans/challenges attackers via firewall bouncer rules.
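The following Python sketch shows what the injected header looks like on the wire. It builds and parses a PROXY protocol v2 binary header for a TCP-over-IPv4 connection, following the HAProxy PROXY protocol specification. `build_proxy_v2` roughly mirrors what HAProxy prepends with `send-proxy-v2`. `parse_proxy_v2` roughly mirrors what Postfix/Dovecot decode on the alt listeners. It deliberately ignores TLVs, IPv6 and the LOCAL command. Both function names are hypothetical, and the addresses in the example are illustrative.

```python
import ipaddress
import struct

SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"  # fixed 12-byte v2 signature
VER_CMD_PROXY = 0x21  # high nibble: version 2; low nibble: PROXY command
FAM_TCP4 = 0x11       # high nibble: AF_INET; low nibble: STREAM (TCP)


def build_proxy_v2(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Header a proxy sends before any application bytes (TCP over IPv4)."""
    addresses = struct.pack(
        "!4s4sHH",
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
    )
    return SIGNATURE + struct.pack("!BBH", VER_CMD_PROXY, FAM_TCP4, len(addresses)) + addresses


def parse_proxy_v2(data: bytes):
    """Return ((src_ip, src_port), (dst_ip, dst_port), remaining_bytes)."""
    if len(data) < 16 or data[:12] != SIGNATURE:
        # A PROXY-required listener refuses connections that start like this.
        raise ValueError("no PROXY v2 header")
    ver_cmd, family, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd != VER_CMD_PROXY or family != FAM_TCP4 or length < 12:
        raise ValueError("header variant not handled by this sketch")
    if len(data) < 16 + length:
        raise ValueError("truncated PROXY v2 header")
    src, dst, src_port, dst_port = struct.unpack("!4s4sHH", data[16:28])
    rest = data[16 + length:]  # skip the address block plus any TLVs
    return (
        (str(ipaddress.IPv4Address(src)), src_port),
        (str(ipaddress.IPv4Address(dst)), dst_port),
        rest,
    )
```

For example, `build_proxy_v2("203.0.113.7", 51234, "10.0.20.1", 25)` yields a 28-byte header, and parsing it gives back `("203.0.113.7", 51234)` as the source. This also shows why the stock ports must stay PROXY-free. Roundcube and the probe connect straight to the ClusterIP Service and never send this header.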
### Fail2ban Disabled (CrowdSec is the Policy)

docker-mailserver ships Fail2ban, but it is explicitly disabled here: `ENABLE_FAIL2BAN = "0"` at `stacks/mailserver/modules/mailserver/main.tf:68`.

CrowdSec is the cluster-wide bouncer for SSH, HTTP, and SMTP/IMAP brute-force defence. It already parses the postfix and dovecot log streams via the collections listed above and applies decisions at the LB/firewall layer. Enabling Fail2ban in-pod would cause three problems:

- It would create a duplicate response path: two systems racing to ban the same IP from different enforcement points.
- It would add iptables churn inside the container.
- It would fragment the audit trail across two decision stores.

Decision (2026-04-18): keep it disabled; CrowdSec owns this policy.
### Rspamd

- Spam filtering with phishing detection and Oletools
- DKIM signing (selector `mail`, 2048-bit RSA)
- DMARC verification on inbound mail
- Auto-learns from Junk folder movements (`RSPAMD_LEARN=1`)
- SRS (Sender Rewriting Scheme) enabled for forwarded mail
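### Postfix Rate Limiting

Postfix `main.cf` has no trailing comments. Only lines whose first non-whitespace character is `#` are comments, so the annotations go on their own lines:

```
# Max connection attempts per client per anvil_rate_time_unit (i.e. per minute)
smtpd_client_connection_rate_limit = 10
# Max message delivery requests per client per anvil_rate_time_unit (i.e. per minute)
smtpd_client_message_rate_limit = 30
anvil_rate_time_unit = 60s
```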
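The following is a simplified Python model of these limits. It assumes per-client counters that reset once per `anvil_rate_time_unit`. It is not Postfix's anvil(8) implementation, and `FixedWindowLimiter` is a hypothetical name. The point it illustrates is that each budget is keyed by client address. That is why smtpd needs the real source IP from the PROXY v2 header. Without it, external senders would appear as a handful of 10.0.20.x addresses and share their budgets.

```python
class FixedWindowLimiter:
    """Simplified model: each client gets `limit` events per `window` seconds,
    counted from that client's first event in the current window."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._state: dict[str, tuple[float, int]] = {}  # client -> (window_start, count)

    def allow(self, client: str, now: float) -> bool:
        start, count = self._state.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # a new time unit begins for this client
        count += 1
        self._state[client] = (start, count)
        return count <= self.limit
```

- `FixedWindowLimiter(limit=10)` models `smtpd_client_connection_rate_limit`.
- `FixedWindowLimiter(limit=30)` models `smtpd_client_message_rate_limit`.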
### TLS

- Wildcard Let's Encrypt cert (`*.viktorbarzin.me`) for SMTP STARTTLS and IMAPS
- Renewed via Woodpecker CI cron pipeline (DNS-01 challenge via Cloudflare)
- MTA-STS enforces TLS for inbound delivery
## Monitoring

### E2E Roundtrip Probe

CronJob `email-roundtrip-monitor` (every 20 min):

- Sends a test email via the Mailgun HTTP API to `smoke-test@viktorbarzin.me`
- The email hits MX → Postfix, and the catch-all delivers it to the `spam@` mailbox
- Verifies delivery via IMAP (searches by UUID marker)
- Deletes the test email and pushes metrics to Pushgateway + Uptime Kuma
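The following is a minimal sketch of the IMAP half of the probe, using Python's standard `imaplib`. The real probe's search criteria, folder and credential handling are not documented here. This sketch assumes the UUID marker is placed in the Subject header and that the catch-all delivers to the `spam@` INBOX. `new_marker` and `find_and_delete_probe` are hypothetical names.

```python
import imaplib
import uuid


def new_marker() -> str:
    """Unique token to embed in the probe message's Subject (hypothetical format)."""
    return f"e2e-probe-{uuid.uuid4()}"


def find_and_delete_probe(imap: imaplib.IMAP4, marker: str, mailbox: str = "INBOX") -> bool:
    """Search `mailbox` for messages whose Subject contains `marker`; delete any hits.

    `imap` must already be connected and logged in (e.g. imaplib.IMAP4_SSL + login()).
    Returns True if the probe message was found.
    """
    imap.select(mailbox)
    typ, data = imap.search(None, "SUBJECT", f'"{marker}"')
    if typ != "OK" or not data or not data[0]:
        return False  # not delivered (yet)
    for num in data[0].split():
        imap.store(num, "+FLAGS", "\\Deleted")  # mark every matching message
    imap.expunge()  # permanently remove the marked messages
    return True
```

Deleting on success keeps the `spam@` mailbox from filling up with probe messages.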
### Prometheus Alerts
| Alert | Threshold | Severity |
|---|---|---|
| MailServerDown | No replicas for 5m | warning |
| EmailRoundtripFailing | Probe failing for 30m | warning |
| EmailRoundtripStale | No success in >40m | warning |
| EmailRoundtripNeverRun | Metric absent for 40m | warning |
### Uptime Kuma Monitors

- TCP SMTP on `176.12.22.76:25`: full external path (DNS → WAN → pfSense HAProxy → mailserver)
- TCP `mailserver.svc:{587,993}`: intra-cluster ClusterIP path
- TCP `10.0.20.1:{25,993}`: pfSense HAProxy health (post code-yiu Phase 6)
- E2E Push monitor (receives a push from the `email-roundtrip-monitor` probe)
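A PORT monitor essentially asks whether a TCP connection opens within a timeout. The following minimal Python equivalent is handy for checking the same layers by hand; `tcp_port_open` is a hypothetical helper, not part of Uptime Kuma. Note that a successful connect to `10.0.20.1` only proves that HAProxy's frontend is listening, because HAProxy completes the handshake before it picks a backend. That is why those monitors measure HAProxy-layer liveness, and why the E2E probe is still needed.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP handshake to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False
```

Example calls:

- `tcp_port_open("10.0.20.1", 993)` from a LAN host checks the HAProxy layer.
- `tcp_port_open("mailserver.mailserver.svc", 587)` from a pod checks the ClusterIP path. The in-cluster name is an assumption based on the `mailserver` namespace.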
### Dovecot Exporter
- Sidecar container in mailserver pod, port 9166
- Scraped by Prometheus for IMAP connection metrics
## Terraform

| Stack | Path | Resources |
|---|---|---|
| Mailserver | `stacks/mailserver/` | Namespace, deployment, Services (ClusterIP `mailserver` + NodePort `mailserver-proxy`), CronJob, PVCs |
| DNS | `stacks/cloudflared/modules/cloudflared/cloudflare.tf` | MX, SPF, DKIM, DMARC, MTA-STS, TLSRPT records |
| Monitoring | `stacks/monitoring/` | Prometheus alert rules |
| CrowdSec | `stacks/crowdsec/` | Collections, log acquisition (already configured) |
## Secrets (Vault)

| Path | Key | Purpose |
|---|---|---|
| `secret/platform` | `mailserver_accounts` | User credentials (JSON) |
| `secret/platform` | `mailserver_aliases` | Postfix virtual aliases |
| `secret/platform` | `mailserver_opendkim_key` | DKIM private key |
| `secret/platform` | `mailserver_sasl_passwd` | Brevo relay credentials (`[smtp-relay.brevo.com]:587 <login>:<key>`) |
| `secret/viktor` | `mailgun_api_key` | Mailgun API for the E2E roundtrip probe (retained for inbound delivery testing only; not used for user mail) |
| `secret/viktor` | `brevo_api_key` | Brevo API key (stored for reference) |
## Storage

| PVC | Size | Storage Class | Purpose |
|---|---|---|---|
| `mailserver-data-proxmox` | 2Gi (auto-resize 5Gi) | proxmox-lvm | Mail data, state, logs |
| `roundcubemail-html-proxmox` | 1Gi | proxmox-lvm | Roundcube web files |
| `roundcubemail-enigma-proxmox` | 1Gi | proxmox-lvm | Roundcube encryption |
## Decisions & Rationale

### No Backup MX
- Alternatives considered: ForwardEmail (free relay), Cloudflare Email Routing, Dynu Store/Forward
- Decision: Direct MX only. ForwardEmail relay was evaluated (2026-04-12) and abandoned — its anti-spoofing enforcement rejects legitimate forwarded mail regardless of SPF configuration. Cloudflare Email Routing can't store-and-forward (pass-through proxy only). Dynu ($9.99/yr) is a viable future option.
- Tradeoff: If server is down, mail delivery relies on sender MTA retry queues (4-5 days standard). No immediate forwarding to a backup address.
### Brevo for Outbound (migrated from Mailgun 2026-04-12)

- Decision: All outbound mail relays through Brevo EU (ex-Sendinblue). The free tier allows 300 emails/day (3x Mailgun's 100/day).
- Why migrated: Mailgun's 100/day limit was too tight. The E2E probe uses ~72/day, leaving only 28 for real mail.
- DKIM: Brevo uses delegated DKIM via CNAME (`brevo1._domainkey`, `brevo2._domainkey`). Mailgun's `s1._domainkey` is retained for the roundtrip probe, which still uses the Mailgun API for inbound testing.
- Tradeoff: Dependency on Brevo SaaS for outbound.
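The quota arithmetic behind the migration fits in a small Python check. It assumes the 20-minute probe cadence shown in the architecture diagram: 24 × 60 / 20 = 72 probe sends a day. The check also shows that a 10-minute cadence (144/day) would exceed Mailgun's free 100/day on its own, before any real mail. `probe_sends_per_day` and `headroom` are illustrative names.

```python
MINUTES_PER_DAY = 24 * 60


def probe_sends_per_day(interval_minutes: int) -> int:
    """Messages the probe sends per day at a fixed CronJob interval."""
    return MINUTES_PER_DAY // interval_minutes


def headroom(daily_quota: int, interval_minutes: int) -> int:
    """Sends left for real mail when the probe shares the same daily quota."""
    return daily_quota - probe_sends_per_day(interval_minutes)


print(probe_sends_per_day(20))  # 72
print(headroom(100, 20))        # 28: Mailgun free tier, pre-migration
print(probe_sends_per_day(10))  # 144: would exceed Mailgun's 100/day by itself
```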
### Rspamd over SpamAssassin/OpenDKIM
- Decision: Rspamd replaces both SpamAssassin and OpenDKIM in a single component
- Tradeoff: Higher memory usage (~150-200MB) but simpler stack
### Client-IP Preservation (pfSense HAProxy + PROXY v2)

- Current (2026-04-19, bd code-yiu):
  - pfSense HAProxy listens on `10.0.20.1:{25,465,587,993}`. It forwards to k8s NodePorts 30125-30128 with `send-proxy-v2` on each backend connection.
  - The mailserver pod exposes parallel listeners (2525/4465/5587/10993) that REQUIRE the PROXY v2 header.
  - The stock ports 25/465/587/993 stay PROXY-free for intra-cluster traffic (Roundcube, probe).
  - The mailserver Service is ClusterIP-only, so ETP is no longer a concern for external traffic.
- Historical (2026-04-12 → 2026-04-19): a dedicated MetalLB IP `10.0.20.202` with `externalTrafficPolicy: Local`. This required pod/speaker colocation. kube-proxy preserved the client IP only when the pod ran on the same node as the advertising speaker.
- Why switched: with ETP:Local, the mailserver's single replica silently dropped inbound mail during pod reschedules (30-60s GARP flip). HAProxy with `send-proxy-v2` lets the pod reschedule to any node and still recover the client IP from the header.
- Tradeoff: pfSense now runs HAProxy, adding one more service to the firewall's responsibilities. The alt container ports plus the extra Service add ~80 lines of Terraform. The win is HA without compromising client-IP preservation.
- Runbook: `runbooks/mailserver-pfsense-haproxy.md`.
## Troubleshooting

### Inbound mail not arriving

- Check MX: `dig MX viktorbarzin.me +short` → should show `mail.viktorbarzin.me`
- Check port 25: `nc -zw5 mail.viktorbarzin.me 25`
- Check pfSense NAT rule: port 25 → `10.0.20.1:25` (pfSense HAProxy VIP, post code-yiu Phase 4)
- Check Postfix logs: `kubectl logs -n mailserver deploy/mailserver -c docker-mailserver | grep -E 'from=|reject'`
- Check if CrowdSec is blocking the sender: `kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list`
### Outbound mail failing

- Check Brevo relay: `kubectl logs -n mailserver deploy/mailserver -c docker-mailserver | grep relay` → should show `relay=smtp-relay.brevo.com`
- Check SASL credentials: `vault kv get -field=mailserver_sasl_passwd secret/platform` → should show `[smtp-relay.brevo.com]:587`
- Check the Brevo dashboard for delivery status
- SASL auth failure → verify the SMTP key (`xsmtpsib-...`) and login (`a7e778001@smtp-brevo.com`)
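The Vault value uses Postfix's SASL password map format: `nexthop login:password`. The nexthop must be written exactly like `relayhost`, brackets and port included, or the lookup misses. The following minimal Python sketch checks a value has that shape without ever printing the key. `check_sasl_passwd` is a hypothetical helper, and `RELAYHOST` mirrors the relayhost documented above.

```python
RELAYHOST = "[smtp-relay.brevo.com]:587"  # must match Postfix relayhost exactly, brackets included


def check_sasl_passwd(value: str, relayhost: str = RELAYHOST) -> str:
    """Validate a one-line sasl_passwd entry and return the login (never the key)."""
    fields = value.split()
    if len(fields) != 2:
        raise ValueError("expected '<nexthop> <login>:<key>'")
    nexthop, credentials = fields
    if nexthop != relayhost:
        raise ValueError(f"nexthop {nexthop!r} does not match relayhost {relayhost!r}")
    login, sep, key = credentials.partition(":")  # the login itself cannot contain ':'
    if not (sep and login and key):
        raise ValueError("credentials must look like '<login>:<key>'")
    return login
```

Feed it the output of the `vault kv get -field=mailserver_sasl_passwd secret/platform` command above. If the value is malformed, the `ValueError` message names the part that is off.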
### E2E roundtrip probe failing

- Check CronJob: `kubectl get cronjob -n mailserver email-roundtrip-monitor`
- Check job logs: `kubectl logs -n mailserver -l job-name --tail=20`
- Check Mailgun rate limit (HTTP 429 errors mean too many API calls)
- Check IMAP login: verify the `spam@viktorbarzin.me` password in Vault (`secret/platform` → `mailserver_accounts`)
### Spam/brute-force attacks

- Check CrowdSec decisions: `kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list`
- Check Postfix logs for auth failures: `kubectl logs -n mailserver deploy/mailserver -c docker-mailserver | grep 'authentication failed'`
- Verify real client IPs in logs (not 10.0.20.x node IPs)
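The last check can be automated with a minimal Python sketch. It pulls client IPs out of Postfix `connect from` / postscreen `CONNECT from` lines and Dovecot `rip=` fields. It then flags any address inside `10.0.20.0/24`, which would suggest the PROXY v2 header was not applied. The regexes are an assumption based on typical Postfix and Dovecot log formats. The sample lines in the test are illustrative, not captured logs.

```python
import ipaddress
import re
from typing import Iterable, Iterator, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# smtpd: "connect from host[1.2.3.4]"; postscreen: "CONNECT from [1.2.3.4]:port to ..."
POSTFIX_CONNECT = re.compile(r"\bconnect from [^\[\s]*\[([0-9A-Fa-f:.]+)\]", re.IGNORECASE)
# Dovecot login lines carry the remote IP as "rip=1.2.3.4"
DOVECOT_RIP = re.compile(r"\brip=([0-9A-Fa-f:.]+)")
NODE_NET = ipaddress.ip_network("10.0.20.0/24")  # pfSense + k8s node addresses


def client_ips(lines: Iterable[str]) -> Iterator[IPAddress]:
    """Yield the client address from each connect/login line, in log order."""
    for line in lines:
        for pattern in (POSTFIX_CONNECT, DOVECOT_RIP):
            match = pattern.search(line)
            if match:
                yield ipaddress.ip_address(match.group(1))


def unproxied_ips(lines: Iterable[str]) -> list[IPAddress]:
    """Client IPs inside 10.0.20.0/24, i.e. connections the PROXY header did not rewrite."""
    return sorted({ip for ip in client_ips(lines) if ip in NODE_NET})
```

Feed it the output of `kubectl logs -n mailserver deploy/mailserver -c docker-mailserver`. An empty result is the healthy case.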
## Related
- Monitoring Architecture — alert definitions, Uptime Kuma
- Networking Architecture — MetalLB, pfSense NAT, Cloudflare DNS
- Security Architecture — CrowdSec deployment
- Secrets Management — Vault paths for mail credentials
- Mailserver Hardening Plan — historical