## Context
Postfix inside docker-mailserver was logging fatal errors at roughly
4 per minute (5,464 in a 24h window), all of the same shape:
```
postfix/postscreen[NNN]: fatal: btree:/var/lib/postfix/postscreen_cache:
unable to get exclusive lock: Resource temporarily unavailable
```
Every time one of these fires, the postscreen process dies mid-connection
and the inbound SMTP session is dropped. Legitimate mail (including Brevo
deliveries for our e2e email-roundtrip probe) gets re-queued by the sender
and arrives late, frequently past the probe's 180s IMAP polling window.
The result is a 35% probe success rate over 7 days and the
EmailRoundtripStale alert noise that was originally flagged as
"probably nothing."
## Root cause
`master.cf` declares postscreen with `maxproc=1`, but postscreen still
re-spawns per incoming connection (or for short-lived reopens), and each
instance opens the shared btree cache with an exclusive file lock. Under
any concurrency (two TCP SYNs arriving close together, or a retry during
teardown), the second process hits EWOULDBLOCK on fcntl and Postfix
treats that as fatal.
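The failure mode can be imitated outside Postfix. The sketch below is only an analogy: it uses util-linux `flock(1)` on a scratch file rather than Postfix's own locking code, and the variable names are illustrative. It shows a second non-blocking request for an exclusive lock being refused while the first holder is still active, which is the condition the kernel reports as `Resource temporarily unavailable`.

```shell
# Scratch file standing in for the shared cache (not a real Postfix map).
lockfile=$(mktemp)

# First holder: open the file on fd 9 and take a non-blocking exclusive lock.
exec 9>"$lockfile"
flock -n -x 9 && first=acquired

# Second contender: flock(1) opens the file again and asks for the same
# lock without waiting. It is refused while fd 9 still holds the lock.
if flock -n -x "$lockfile" true; then second=acquired; else second=refused; fi
echo "first=$first second=$second"

# Closing fd 9 releases the lock, so a later attempt succeeds.
exec 9>&-
if flock -n -x "$lockfile" true; then third=acquired; else third=refused; fi
echo "after release: $third"

rm -f "$lockfile"
```

A caller that treats the refused attempt as fatal instead of retrying would produce the behavior described above.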
Four options were considered:
| Option | Verdict |
|--------|---------|
| (a) Disable cache (`postscreen_cache_map =`) | ✓ chosen |
| (b) Switch btree → lmdb | ✗ lmdb not compiled into docker-mailserver 15.0.0's postfix (`postconf -m` has no lmdb) |
| (c) proxy:btree via proxymap | ✗ unsafe — Postfix docs: "postscreen does its own locking, not safe via proxymap" |
| (d) Memcached sidecar | ✗ new moving part; deferred |
Option (a) is a small trade-off: legitimate clients re-run the
greet-action / bare-newline-action checks on every fresh TCP session
instead of hitting the 7-day whitelist cache. At our volume (~100
deliveries/day, ~72 of which are the probe itself) that's negligible CPU.
Dropping the cache also means DNSBL results are re-evaluated on every
session, but this mailserver already has `postscreen_dnsbl_action = ignore`,
so the cache's DNSBL role was doing nothing anyway.
## This change
Appends a stanza to the user-merged postfix main.cf stored in
`variable.postfix_cf` that sets `postscreen_cache_map =` (empty value).
Postfix treats an empty cache_map as "no persistent cache" — per-session
decisions are still enforced, they just aren't cached across sessions.
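Because the stanza is appended to an already-merged main.cf, it relies on Postfix keeping only the last assignment when a parameter is defined more than once. A minimal sketch of a check against a rendered file follows. The helper name `effective_value` is hypothetical, the sample file is made up for illustration, and the parser is simplified: it skips comments and does not join continuation lines.

```shell
# Print the last value assigned to parameter $1 in the main.cf-style file $2.
# Exits non-zero if the parameter is never assigned.
effective_value() {
  awk -v name="$1" '
    /^[ \t]*#/ { next }                     # skip comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                     # not an assignment
      key = substr($0, 1, eq - 1)
      sub(/[ \t]+$/, "", key)               # trim only trailing blanks, so
                                            # indented continuation lines
                                            # never match a parameter name
      if (key == name) {
        val = substr($0, eq + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", val)
        found = 1                           # later assignments overwrite
      }
    }
    END { if (found) print val; else exit 1 }
  ' "$2"
}

# Illustrative input: an earlier default followed by the appended stanza.
cfg=$(mktemp)
cat >"$cfg" <<'EOF'
postscreen_cache_map = btree:$data_directory/postscreen_cache
postscreen_dnsbl_action = ignore
postscreen_cache_map =
EOF

if value=$(effective_value postscreen_cache_map "$cfg") && [ -z "$value" ]; then
  echo "cache disabled"
else
  echo "cache still configured: $value"
fi
rm -f "$cfg"
```

Running `postconf postscreen_cache_map` inside the container remains the authoritative check; this only inspects the text of the file.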
Before:
```
SMTP client ──► postscreen (maxproc=1, btree cache with exclusive lock)
                 ├─ concurrent access → fcntl EWOULDBLOCK → fatal
                 └─ connection dropped, sender retries, mail arrives late
```
After:
```
SMTP client ──► postscreen (no cache, per-session checks only)
                 └─ no shared file, no lock → no fatal, no dropped session
```
No change to master.cf (postscreen still the front-end), no change to
DNSBL / greet / bare-newline policy.
## What is NOT in this change
- Dovecot userdb dedup (shipped in the previous commit).
- Email-roundtrip probe widening (next commit).
- Rebuilding docker-mailserver image with lmdb support (deferred —
disabling the cache is simpler and sufficient at our volume).
## Test Plan
### Automated
`postconf -m` in the running container to confirm lmdb is genuinely absent
(ruling out option (b) before we commit to (a)):
```
btree cidr environ fail hash inline internal ldap memcache
nis pcre pipemap proxy randmap regexp socketmap static tcp
texthash unionmap unix
```
No lmdb entry — confirmed.
`scripts/tg plan -target=module.mailserver.kubernetes_config_map.mailserver_config`:
```
~ "postfix-main.cf" = <<-EOT
+ postscreen_cache_map =
```
`scripts/tg apply`:
```
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
```
Reloader triggers the pod rollout. Baseline before apply: 34
`unable to get exclusive lock` lines in a `--tail=500` log window.
### Manual Verification
Post-rollout, when the new pod is Ready:
1. `kubectl -n mailserver exec <pod> -c docker-mailserver -- postconf postscreen_cache_map`
Expect: `postscreen_cache_map =` (empty value)
2. Watch for 15 min: `kubectl -n mailserver logs -l app=mailserver -c docker-mailserver --tail=1000 | grep -c "unable to get exclusive lock"`
Expect: 0 new occurrences (any hits are from before the rollout).
3. Trigger a probe run manually:
`kubectl -n mailserver create job --from=cronjob/email-roundtrip-monitor probe-verify-$(date +%s)`
then `kubectl -n mailserver logs job/probe-verify-...`
Expect: `Round-trip SUCCESS` with duration < 120s.
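The log check in step 2 can be wrapped so that it fails loudly instead of relying on someone reading a count. This is a sketch only. The function name `assert_no_lock_errors` is hypothetical, and it reads log text from stdin so it can be tried without a cluster.

```shell
# Read log text on stdin, print how many lock-failure lines it contains,
# and return non-zero if there is at least one.
assert_no_lock_errors() {
  # grep -c exits 1 when the count is 0, so neutralise that exit status.
  hits=$(grep -c 'unable to get exclusive lock' || true)
  printf '%s\n' "$hits"
  [ "$hits" -eq 0 ]
}

# Made-up sample lines, not real log output.
printf '%s\n' \
  'postfix/postscreen[1]: CONNECT from [192.0.2.1]:25' \
  'postfix/postscreen[1]: PASS NEW [192.0.2.1]:25' \
  | assert_no_lock_errors && echo "clean"
```

Against the live pod it could be fed from the command already used in this plan, for example `kubectl -n mailserver logs -l app=mailserver -c docker-mailserver --since=15m | assert_no_lock_errors`.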
## Reproduce locally
1. `kubectl -n mailserver exec <pod> -c docker-mailserver -- postconf postscreen_cache_map`
Expect: `postscreen_cache_map =` (empty value)
2. `kubectl -n mailserver logs -l app=mailserver -c docker-mailserver --since=15m | grep -c "unable to get exclusive lock"`
Expect: 0
Closes: code-1dc
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>