infra/stacks/mailserver/modules/mailserver/variables.tf

# Appended to, and merged with, docker-mailserver's default Postfix main.cf.
# Defaults: https://github.com/docker-mailserver/docker-mailserver/blob/master/target/postfix/main.cf
variable "postfix_cf" {
  type    = string
  default = <<EOT
relayhost = [smtp-relay.brevo.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl/passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_tls_security_options = noanonymous
smtp_tls_security_level = encrypt
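smtpd_tls_cert_file = /tmp/ssl/tls.crt
smtpd_tls_key_file = /tmp/ssl/tls.key
# Legacy spelling of "smtpd_tls_security_level = may" (opportunistic TLS).
# Postfix ignores it whenever smtpd_tls_security_level is set to a non-empty value.
smtpd_use_tls = yes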
# Require STARTTLS before any AUTH command on the SMTPD listener.
# Without this, a misconfigured client that skips STARTTLS would send
# PLAIN/LOGIN creds in the clear. docker-mailserver's default does NOT
# enforce this at the main.cf level for submission (587).
# Note: smtpd_sasl_auth_only (sometimes cited) is NOT a real Postfix
# parameter — only smtpd_tls_auth_only is. Addresses code-vnw.
smtpd_tls_auth_only = yes
header_size_limit = 4096000

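# TLS debug logging: level 1 logs a one-line summary per completed handshake.
smtpd_tls_loglevel = 1
# Disabled cipher overrides, kept for reference. Note that smtpd_tls_ciphers
# expects a grade (e.g. medium or high), not an OpenSSL cipher string; the
# string form belongs in tls_medium_cipherlist / tls_high_cipherlist.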
#smtpd_tls_ciphers = TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:!aNULL:!SEED:!CAMELLIA:!RSA+AES:!SHA1
#tls_medium_cipherlist = ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:DHE-RSA-AES128-SHA256:DHE-RSA-AES256-SHA256:!aNULL:!SEED:!CAMELLIA:!RSA+AES:!SHA1

# Rate limiting (brute-force protection): per-client limits, counted per anvil_rate_time_unit.
smtpd_client_connection_rate_limit = 10
smtpd_client_message_rate_limit = 30
anvil_rate_time_unit = 60s

# Disable the postscreen decision cache. The default (btree) driver
# requires an exclusive file lock for every access, and with postscreen
# re-spawning per connection (master.cf: maxproc=1) that produces thousands
# of 'unable to get exclusive lock' fatals per day — stalling SMTP
# acceptance and starving inbound delivery. lmdb would avoid the lock but
# isn't compiled into docker-mailserver 15.0.0's Postfix build
# (postconf -m → no lmdb). proxy:btree is unsafe because postscreen does
# its own locking. An empty value disables the cache entirely — legitimate
# clients pay the greet/bare-newline re-check on every new TCP session,
# which is trivial at our volume (~100 deliveries/day).
postscreen_cache_map =
EOT
}
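
# Sketch only, under stated assumptions: one way var.postfix_cf could reach
# the container. The resource name and ConfigMap name below are illustrative
# placeholders, not copied from this module's main.tf; the namespace and the
# "postfix-main.cf" key follow how this value is described as being consumed.
# docker-mailserver applies a user-supplied postfix-main.cf on top of its
# default main.cf at startup, so a key set here overrides the default value.
# Left commented out so it cannot clash with the module's real resource.
#
# resource "kubernetes_config_map" "mailserver_config_example" {
#   metadata {
#     name      = "mailserver-config-example"
#     namespace = "mailserver"
#   }
#
#   data = {
#     # Key name docker-mailserver looks for in its config directory.
#     "postfix-main.cf" = var.postfix_cf
#   }
# }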