From c1ee6863b317c38f63e38474d1842973b8c2f82e Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Fri, 3 Jul 2026 14:39:13 +0000
Subject: [PATCH] mailserver docs: troubleshooting entry for the postsrsd 100%-CPU spin

Hit during the docs@ rollout: after a pod restart postsrsd came up
spinning without binding its TCP ports, so postfix cleanup tempfailed
every message with 451 queue file write error. Document the signature
and the supervisorctl-restart / pod-recreate fix.

Co-Authored-By: Claude Fable 5
---
 docs/architecture/mailserver.md | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/docs/architecture/mailserver.md b/docs/architecture/mailserver.md
index 8f8c56ea..a7925849 100644
--- a/docs/architecture/mailserver.md
+++ b/docs/architecture/mailserver.md
@@ -311,6 +311,21 @@ Push secrets (`BREVO_API_KEY`, `EMAIL_MONITOR_IMAP_PASSWORD`) come from External
 
 ## Troubleshooting
 
+### All mail tempfailing with `451 4.3.0 queue file write error` (postsrsd spin)
+
+Seen 2026-07-03 right after a pod restart. Signature in `/var/log/mail/mail.log`:
+`postfix/cleanup: warning: tcp:localhost:10001 lookup error` +
+`sender_canonical_maps map lookup problem ... message not accepted, try again later`.
+Cause: **postsrsd** (SRS daemon, `sender_canonical_maps = tcp:localhost:10001`)
+came up spinning at 100% CPU without binding 10001/10002 — supervisor shows it
+`RUNNING` but `ss -ltn | grep 1000` is empty and its log is empty. Postfix then
+tempfails every message (inbound AND submission); senders retry so nothing is
+lost, and the roundtrip probe alerts within the hour.
+Fix: `supervisorctl restart postsrsd` inside the container; if the fresh
+process spins again (it did once), `kubectl -n mailserver delete pod` for a
+full re-init — that healed it. Root cause not pinned down (one-off bad init;
+postsrsd 1.10).
+
 ### Inbound mail not arriving
 1. **DNS/MX**: `dig MX viktorbarzin.me +short` → should show `mail.viktorbarzin.me`
 2. **WAN reachability**: `nc -zw5 mail.viktorbarzin.me 25` from outside