infra/stacks
Viktor Barzin e2516b07a3 [mailserver] Disable postscreen btree cache to stop SMTP lock-contention stalls
## Context

Postfix inside docker-mailserver was spamming fatal errors at roughly
1 per minute — 5,464 of them in a 24h window — all of the same shape:

```
postfix/postscreen[NNN]: fatal: btree:/var/lib/postfix/postscreen_cache:
unable to get exclusive lock: Resource temporarily unavailable
```

Every time one of these fires, the postscreen process dies mid-connection
and the inbound SMTP session is dropped. Legitimate mail (including Brevo
deliveries for our e2e email-roundtrip probe) gets re-queued by the sender
and arrives late — frequently past the probe's 180s IMAP polling window,
producing a 35%/7d probe success rate and the EmailRoundtripStale alert
noise that was originally flagged as "probably nothing."

## Root cause

`master.cf` declares postscreen with `maxproc=1`, but postscreen still
re-spawns per incoming connection (or for short-lived reopens), and each
instance opens the shared btree cache with an exclusive file lock. Under
any concurrency (two TCP SYNs arriving close together, or a retry during
teardown), the second process hits EWOULDBLOCK on fcntl and Postfix
treats that as fatal.

Three options were considered:

  | Option | Verdict |
  |--------|---------|
  | (a) Disable cache (postscreen_cache_map = )  | ✓ chosen |
  | (b) Switch btree → lmdb                       | ✗ lmdb not compiled into docker-mailserver 15.0.0's postfix (`postconf -m` has no lmdb) |
  | (c) proxy:btree via proxymap                  | ✗ unsafe — Postfix docs: "postscreen does its own locking, not safe via proxymap" |
  | (d) Memcached sidecar                         | ✗ new moving part; deferred |

Option (a) is a small trade-off: legitimate clients re-run the
greet-action / bare-newline-action checks on every fresh TCP session
instead of hitting the 7-day whitelist cache. At our volume (~100
deliveries/day, ~72 of which are the probe itself) that's negligible CPU.
DNSBL re-evaluation is also avoided only partially, but this mailserver
already has `postscreen_dnsbl_action = ignore` so the cache's DNSBL role
was doing nothing anyway.

## This change

Appends a stanza to the user-merged postfix main.cf stored in
`variable.postfix_cf` that sets `postscreen_cache_map =` (empty value).
Postfix treats an empty cache_map as "no persistent cache" — per-session
decisions are still enforced, they just aren't cached across sessions.

Before:

```
smtpd ──► postscreen (maxproc=1, btree cache with exclusive lock)
                ├─ concurrent access → fcntl EWOULDBLOCK → fatal
                └─ connection dropped, sender retries, mail arrives late
```

After:

```
smtpd ──► postscreen (no cache, per-session checks only)
                └─ no shared file, no lock → no fatal, no dropped session
```

No change to master.cf (postscreen still the front-end), no change to
DNSBL / greet / bare-newline policy.

## What is NOT in this change

- Dovecot userdb dedup (shipped in the previous commit).
- Email-roundtrip probe widening (next commit).
- Rebuilding docker-mailserver image with lmdb support (deferred —
  disabling the cache is simpler and sufficient at our volume).

## Test Plan

### Automated

`postconf -m` in the running container to confirm lmdb is genuinely absent
(ruling out option (b) before we commit to (a)):

```
btree  cidr  environ  fail  hash  inline  internal  ldap  memcache
nis  pcre  pipemap  proxy  randmap  regexp  socketmap  static  tcp
texthash  unionmap  unix
```

No lmdb entry — confirmed.

`scripts/tg plan -target=module.mailserver.kubernetes_config_map.mailserver_config`:

```
  ~ "postfix-main.cf" = <<-EOT
      + postscreen_cache_map =
```

`scripts/tg apply`:

```
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
```

Reloader triggers pod rollout — baseline error count before apply was 34
`unable to get exclusive lock` lines per `--tail=500` log window.

### Manual Verification

Post-rollout, when the new pod is Ready:

1. `kubectl -n mailserver exec <pod> -c docker-mailserver -- postconf postscreen_cache_map`
   Expect: empty (no value)
2. Watch for 15 min: `kubectl -n mailserver logs -l app=mailserver -c docker-mailserver --tail=1000 | grep -c "unable to get exclusive lock"`
   Expect: 0 new occurrences (any hits are from before the rollout).
3. Trigger a probe run manually:
   `kubectl -n mailserver create job --from=cronjob/email-roundtrip-monitor probe-verify-$(date +%s)`
   then `kubectl -n mailserver logs job/probe-verify-...`
   Expect: `Round-trip SUCCESS` with duration < 120s.

## Reproduce locally

1. `kubectl -n mailserver exec <pod> -c docker-mailserver -- postconf postscreen_cache_map`
2. Expect: `postscreen_cache_map =` (empty value)
3. `kubectl -n mailserver logs -l app=mailserver -c docker-mailserver --since=15m | grep -c "unable to get exclusive lock"`
4. Expect: 0

Closes: code-1dc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:32:48 +00:00
..
_template [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
actualbudget [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
affine [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
authentik [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
beads-server [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
blog [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
broker-sync [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
changedetection [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
city-guesser [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
claude-agent-service [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
claude-memory [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
cloudflared [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
cnpg [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
coturn [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
crowdsec [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
cyberchef [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
dashy [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
dawarich [dawarich] Re-enable Sidekiq worker with resource limits + probes 2026-04-18 21:13:05 +00:00
dbaas [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
descheduler [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
diun [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
ebook2audiobook [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
ebooks [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
echo [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
excalidraw [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
external-secrets [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
f1-stream [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
foolery [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
forgejo [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
freedify [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
freshrss [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
frigate [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
grampsweb [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
hackmd [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
headscale [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
health [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
hermes-agent [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
homepage [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
immich [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
infra [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
infra-maintenance [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
insta2spotify [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
isponsorblocktv [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
jsoncrack [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
k8s-dashboard [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
k8s-portal [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
kms [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
kyverno [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
linkwarden [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
mailserver [mailserver] Disable postscreen btree cache to stop SMTP lock-contention stalls 2026-04-18 21:32:48 +00:00
matrix [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
meshcentral [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
metallb [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
metrics-server [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
monitoring [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
n8n [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
navidrome [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
netbox [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
networking-toolbox [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
nextcloud [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
nfs-csi [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
novelapp [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
ntfy [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
nvidia [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
onlyoffice [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
openclaw [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
osm_routing [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
owntracks [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
paperless-ngx [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
payslip-ingest [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
phpipam [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
platform [infra] Add Cloudflare provider to all stack lock files and generated providers 2026-04-16 16:31:36 +00:00
plotting-book [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
poison-fountain [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
priority-pass [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
privatebin [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
proxmox-csi [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
pvc-autoresizer [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
rbac [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
real-estate-crawler [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
redis [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
reloader [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
resume [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
reverse-proxy [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
rybbit [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
sealed-secrets [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
send [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
servarr [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
shadowsocks [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
speedtest [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
status-page [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
stirling-pdf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
tandoor [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
technitium [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
terminal [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
tor-proxy [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
trading-bot [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
traefik [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
travel_blog [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
tuya-bridge [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
uptime-kuma [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
url [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
vault [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
vaultwarden [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
vpa [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
wealthfolio [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
webhook_handler [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
whisper [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
wireguard [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
woodpecker [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
xray [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
ytdlp [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00