docs: design + ADR-0019 — free backup MX via Roller Network secondary MX
All checks were successful
ci/woodpecker/push/default Pipeline was successful

Viktor wants inbound email to survive homelab outages without loss;
delayed delivery is acceptable and the budget is zero, which rules out
the previously doc-flagged Dynu option. Design adopts Roller Network's
free Secondary MX (3-week store-and-forward queue, no forced filtering,
catch-all-compatible) with our-side postscreen/rspamd whitelisting,
five validation gates before any DNS change, and a live failover test.
Also records the dangling-MTA-STS finding (TXT published, policy host
absent) as a follow-up. Implementation starts only after Viktor reviews
these docs; account will use rollernet@viktorbarzin.me.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-07-04 09:59:16 +00:00
parent 1a63fee4e4
commit c91fa881e6
2 changed files with 244 additions and 0 deletions

View file

@ -0,0 +1,65 @@
# Inbound mail gets a free store-and-forward backup MX (Roller Network)
`viktorbarzin.me` has run a single direct MX to the home IP since the 2026-04-12
inbound overhaul, with sender-MTA retry (15 days, sender-dependent) as the only
outage protection — a documented "No Backup MX" decision made after ForwardEmail's
forced anti-spoofing rejected legitimate forwarded mail and Cloudflare Email
Routing proved pass-through-only (no queue). Viktor now wants inbound mail to
survive homelab outages **without loss** (2026-07-04): delayed delivery is fine,
mid-outage reading is not required, and the budget is **$0** — which rules out
the doc-flagged Dynu fallback ($9.99/yr).
We adopt **Roller Network's free-tier Secondary MX** (`mail.rollernet.us` +
`mail2.rollernet.us` at equal MX preference 20, primary untouched): a
purpose-built store-and-forward relay with a **3-week queue** (sliding retries,
15 min doubling to 1-week max), **no forced spam filtering** on the secondary
path, and a valid-user table with a default-*allow-any* mode that preserves our
catch-all's infinite ad-hoc aliases. Our side whitelists their relay CIDRs in
postscreen (skip DNSBL/pregreet for queue drains) and exempts them from
SPF/DMARC *scoring* in rspamd — the ForwardEmail lesson applied at the right
layer; DKIM verification and content/AV scanning stay fully active. Go-live is
gated: confirm the free tier still includes Secondary MX, confirm the 10 MB/day
overage lock answers 4xx (defer) rather than 5xx (bounce), capture their
authoritative relay CIDRs, apply the whitelist **before** the MX records, and
finish with a live failover test (mailserver scaled to 0, probes from Gmail +
Brevo, verified queue-and-drain). Design:
[`plans/2026-07-04-backup-mx-rollernet-design.md`](../plans/2026-07-04-backup-mx-rollernet-design.md).
## Considered options
- **Dynu Email Backup ($9.99/yr)** — the previously doc-flagged option; simple,
but queue lifetime is undocumented (FAQ hints at 1224 h retry ceilings),
filtering behaviour is a black box, and it costs money the free requirement
excludes.
- **Self-hosted VPS relay** (Hetzner ~€50/yr, or Oracle Always-Free at $0) —
full control (30-day queue, own TLS/MTA-STS story), but a second
internet-facing pet to patch and monitor; Oracle hard-blocks egress port 25,
forcing delivery to the primary on a custom port, and idles risk free-tier
reclamation.
- **Cloudflare Email Routing / mailflare** — no store-and-forward (pass-through
only) / a terminal inbox on Cloudflare respectively; both previously
evaluated and rejected (2026-04-12; 2026-07-04, memory #7148).
- **Harden-only** (guard hard-5xx misconfig modes, add paging) — cheaper but
does not address multi-day outages or short-retry senders; deferred as a
complementary track, not an alternative.
## Consequences
- Outage mail queues **in plaintext at a third party** for up to 3 weeks —
accepted; same trust class as Brevo holding our outbound relay traffic.
- The backup path bypasses postscreen DNSBL and SPF/DMARC scoring for
Rollernet's CIDRs; content/AV/Bayes and DKIM verification still apply. A
slight spam uptick during outages is possible (catch-all absorbs to `spam@`).
- The free tier's **10 MB/day cap** locks the domain until midnight Pacific
when exceeded; the G2 gate decides whether that lock defers (harmless) or
bounces (revisit: paid tier or accept). Overage never affects the primary
path — only mail arriving via the backup while locked.
- Two more records in a Cloudflare zone already near the Free-plan 200-record
cap (headroom must be verified at apply time).
- **MTA-STS was found dangling** during design: the `_mta-sts` TXT is published
but no policy host exists, so MTA-STS is inert today. Any future fix must
list the Rollernet MX hosts in the policy or enforcing senders will skip the
backup path.
- `architecture/mailserver.md` §"No Backup MX" is superseded at implementation
time; a new runbook covers ACC queue inspection, post-outage drain checks,
and Accept-and-Hold for planned maintenance.