infra/docs/runbooks
Viktor Barzin e6e5fc5f17 [docs] Mailserver architecture — richer diagrams + steady-state accuracy [ci skip]
## Context

After code-yiu Phases 1a–6 landed, `docs/architecture/mailserver.md` still
carried the pre-HAProxy Mermaid diagram, a retired Dovecot-exporter
component row, stale PVC names (`-proxmox` suffixes that were renamed
`-encrypted` during the LUKS migration), a wrong probe schedule
(claimed 10 min, actually 20 min), and a Mailgun-API claim for the
probe (it's been on Brevo since code-n5l). The two-path architecture
(external-via-HAProxy + intra-cluster-via-ClusterIP) that defines the
current design wasn't visualised at all.

## This change

Rewrote the Architecture Diagram section to show **both ingress paths
in one Mermaid flowchart**, colour-coded:

- External (orange): Sender → pfSense NAT → HAProxy → NodePort →
  **alt PROXY listeners** (2525/4465/5587/10993).
- Intra-cluster (blue): Roundcube / probe → ClusterIP Service →
  **stock listeners** (25/465/587/993), no PROXY.
- The pod subgraph shows both listener sets feeding the same Postfix /
  Rspamd / Dovecot / Maildir pipeline.
- Security dotted edges: Postfix log stream → CrowdSec agent →
  LAPI → pfSense bouncer decisions.
- Monitoring dotted edges: probe → Brevo HTTP → MX → pod → IMAP →
  Pushgateway/Uptime Kuma.

Added a **sequenceDiagram** for the external SMTP roundtrip — walks
through the wire-level handshake from external MTA → pfSense NAT →
HAProxy TCP connect → PROXY v2 header write → kube-proxy SNAT → pod
postscreen parse → smtpd banner. Makes the "how does the pod see the
real IP despite SNAT?" question self-answering.

Added a **Port mapping table** listing all 8 container listeners (4
stock + 4 alt) with their Service, NodePort, PROXY-required flag, and
who uses each path. Replaces the ambiguous prose about "alt ports".

Fixed stale bits:
- Removed Dovecot Exporter row from Components (retired in code-1ik).
- Added pfSense HAProxy row.
- Probe schedule: every 10 min → **every 20 min** (`*/20 * * * *`).
- Probe API: Mailgun → **Brevo HTTP**.
- PVC names: `-proxmox` → **`-encrypted`** (all three); storage class
  `proxmox-lvm` → **`proxmox-lvm-encrypted`**.
- Added `mailserver-backup-host` + `roundcube-backup-host` RWX NFS
  PVCs to the Storage table with backup flow pointer.
- Expanded Troubleshooting → Inbound to include HAProxy health check
  + container-listener verification steps.
- Secrets table: `brevo_api_key` now marked as used by both relay +
  probe; `mailgun_api_key` marked historical.

Added a prominent **UPDATE 2026-04-19** header to
`docs/runbooks/mailserver-proxy-protocol.md` pointing future readers
at the implemented state in `mailserver-pfsense-haproxy.md`. Research
doc preserved as a decision record — it's the canonical "why not just
pin the pod?" reference.

## What is NOT in this change

- No Terraform changes; this is docs-only.
- No changes to the runbook (`mailserver-pfsense-haproxy.md`) — it was
  already rewritten during Phase 6.

## Test Plan

### Automated
```
$ awk '/^```mermaid/ {c++} END{print c}' docs/architecture/mailserver.md
2
$ grep -c '\-encrypted' docs/architecture/mailserver.md
5  # PVC references normalised
$ grep -c '\-proxmox' docs/architecture/mailserver.md
0  # no stale names left
```

### Manual Verification
Render `docs/architecture/mailserver.md` on GitHub or any Mermaid-
capable viewer:
1. Top Architecture Diagram should show two labelled paths into the
   pod, colour-coded (orange = external, blue = intra-cluster).
2. Sequence diagram should show 10 numbered steps ending at Rspamd +
   Dovecot delivery.
3. Port Mapping table should make it obvious that the 4 alt container
   ports are only reachable via `mailserver-proxy` NodePort and require
   PROXY v2.
2026-04-19 12:40:53 +00:00
..
beads-auto-dispatch.md [beads-server] Auto-dispatch agent beads via CronJobs 2026-04-18 22:35:46 +00:00
mailserver-pfsense-haproxy.md [mailserver] Phase 6 — decommission MetalLB LB path [ci skip] 2026-04-19 12:36:11 +00:00
mailserver-proxy-protocol.md [docs] Mailserver architecture — richer diagrams + steady-state accuracy [ci skip] 2026-04-19 12:40:53 +00:00
nfs-prerequisites.md [docs] Add NFS prerequisite runbook for nfs_volume module [ci skip] 2026-04-19 10:40:55 +00:00
r730-ram-upgrade-272gb.md docs: update hardware inventory for R730 RAM upgrade to 272GB 2026-04-02 00:48:13 +03:00
restore-etcd.md backup & DR: add alerting, fix rotation, secure MySQL password, add runbooks 2026-03-19 20:34:33 +00:00
restore-full-cluster.md rename weekly-backup → daily-backup across scripts, timers, services, and docs [ci skip] 2026-04-13 18:37:04 +00:00
restore-lvm-snapshot.md update backup/DR docs and runbooks for 3-2-1 architecture 2026-04-06 15:06:01 +03:00
restore-mysql.md feat: add per-database backups for PostgreSQL and MySQL 2026-04-14 22:39:33 +00:00
restore-postgresql.md feat: add per-database backups for PostgreSQL and MySQL 2026-04-14 22:39:33 +00:00
restore-pvc-from-backup.md rename weekly-backup → daily-backup across scripts, timers, services, and docs [ci skip] 2026-04-13 18:37:04 +00:00
restore-vault.md update backup/DR docs and runbooks for 3-2-1 architecture 2026-04-06 15:06:01 +03:00
restore-vaultwarden.md update backup/DR docs and runbooks for 3-2-1 architecture 2026-04-06 15:06:01 +03:00