Commit graph

2 commits

Author SHA1 Message Date
Viktor Barzin
e6e5fc5f17 [docs] Mailserver architecture — richer diagrams + steady-state accuracy [ci skip]
## Context

After code-yiu Phases 1a–6 landed, `docs/architecture/mailserver.md` still
carried the pre-HAProxy Mermaid diagram, a retired Dovecot-exporter
component row, stale PVC names (`-proxmox` suffixes that were renamed
`-encrypted` during the LUKS migration), a wrong probe schedule
(claimed 10 min, actually 20 min), and a Mailgun-API claim for the
probe (it's been on Brevo since code-n5l). The two-path architecture
(external-via-HAProxy + intra-cluster-via-ClusterIP) that defines the
current design wasn't visualised at all.

## This change

Rewrote the Architecture Diagram section to show **both ingress paths
in one Mermaid flowchart**, colour-coded:

- External (orange): Sender → pfSense NAT → HAProxy → NodePort →
  **alt PROXY listeners** (2525/4465/5587/10993).
- Intra-cluster (blue): Roundcube / probe → ClusterIP Service →
  **stock listeners** (25/465/587/993), no PROXY.
- The pod subgraph shows both listener sets feeding the same Postfix /
  Rspamd / Dovecot / Maildir pipeline.
- Security dotted edges: Postfix log stream → CrowdSec agent →
  LAPI → pfSense bouncer decisions.
- Monitoring dotted edges: probe → Brevo HTTP → MX → pod → IMAP →
  Pushgateway/Uptime Kuma.

Added a **sequenceDiagram** for the external SMTP roundtrip — walks
through the wire-level handshake from external MTA → pfSense NAT →
HAProxy TCP connect → PROXY v2 header write → kube-proxy SNAT → pod
postscreen parse → smtpd banner. Makes the "how does the pod see the
real IP despite SNAT?" question self-answering.

Added a **Port mapping table** listing all 8 container listeners (4
stock + 4 alt) with their Service, NodePort, PROXY-required flag, and
who uses each path. Replaces the ambiguous prose about "alt ports".

Fixed stale bits:
- Removed Dovecot Exporter row from Components (retired in code-1ik).
- Added pfSense HAProxy row.
- Probe schedule: every 10 min → **every 20 min** (`*/20 * * * *`).
- Probe API: Mailgun → **Brevo HTTP**.
- PVC names: `-proxmox` → **`-encrypted`** (all three); storage class
  `proxmox-lvm` → **`proxmox-lvm-encrypted`**.
- Added `mailserver-backup-host` + `roundcube-backup-host` RWX NFS
  PVCs to the Storage table with backup flow pointer.
- Expanded Troubleshooting → Inbound to include HAProxy health check
  + container-listener verification steps.
- Secrets table: `brevo_api_key` now marked as used by both relay +
  probe; `mailgun_api_key` marked historical.

Added a prominent **UPDATE 2026-04-19** header to
`docs/runbooks/mailserver-proxy-protocol.md` pointing future readers
at the implemented state in `mailserver-pfsense-haproxy.md`. Research
doc preserved as a decision record — it's the canonical "why not just
pin the pod?" reference.

## What is NOT in this change

- No Terraform changes; this is docs-only.
- No changes to the runbook (`mailserver-pfsense-haproxy.md`) — it was
  already rewritten during Phase 6.

## Test Plan

### Automated
```
$ awk '/^```mermaid/ {c++} END{print c}' docs/architecture/mailserver.md
2
$ grep -c '\-encrypted' docs/architecture/mailserver.md
5  # PVC references normalised
$ grep -c '\-proxmox' docs/architecture/mailserver.md
0  # no stale names left
```

### Manual Verification
Render `docs/architecture/mailserver.md` on GitHub or any Mermaid-
capable viewer:
1. Top Architecture Diagram should show two labelled paths into the
   pod, colour-coded (orange = external, blue = intra-cluster).
2. Sequence diagram should show 10 numbered steps ending at Rspamd +
   Dovecot delivery.
3. Port Mapping table should make it obvious that the 4 alt container
   ports are only reachable via `mailserver-proxy` NodePort and require
   PROXY v2.
2026-04-19 12:40:53 +00:00
Viktor Barzin
09c1105648 [mailserver] Delete postfix_cf_reference_DO_NOT_USE dead code [ci skip]
## Context

`infra/stacks/mailserver/modules/mailserver/variables.tf` carried a
130-line historical scaffolding variable
`postfix_cf_reference_DO_NOT_USE` containing a reference copy of an
older Postfix `main.cf` layout. The variable name itself signalled
dead-code intent ("DO_NOT_USE"), and a repo-wide
`grep -rn postfix_cf_reference infra/` confirmed zero consumers — no
module, no stack, no script, no doc ever referenced it. Carrying dead
Terraform variables costs nothing at runtime but actively wastes
reviewer attention on every `git blame`, drives up `variables.tf` read
time, and lets drift calcify.

Trade-offs considered:
- Keep it "just in case" → rejected; the file it mirrored
  (`/usr/share/postfix/main.cf.dist`) is already canonical upstream and
  reproducible inside any docker-mailserver container.
- Move it to a comment block → rejected; same noise cost, no value
  over deletion (authoritative source is in the image).

## This change

Drops the entire `variable "postfix_cf_reference_DO_NOT_USE" { ... }`
block (136 lines incl. trailing blank). No other variable touched, no
resource touched, no comment elsewhere touched. `variables.tf` now
contains only the single live variable `postfix_cf` that is actually
consumed by the module.

## What is NOT in this change

- No Terraform state modification — variable was never read, so state
  has no record of it.
- No Postfix runtime behaviour change — `postfix_cf` (the live one) is
  untouched.
- No fix for the pre-existing `kubernetes_deployment.mailserver` /
  `kubernetes_service.mailserver` drift that `terragrunt plan` surfaces
  independently. Those 2 in-place updates are known and tracked
  separately; this commit explicitly avoids conflating cleanup with
  drift resolution.
- No apply needed — pure source hygiene.

## Test Plan

### Automated

Reference check before edit:
```
$ grep -rn postfix_cf_reference /home/wizard/code/infra/
infra/stacks/mailserver/modules/mailserver/variables.tf:41:variable "postfix_cf_reference_DO_NOT_USE" {
```
(single match — the declaration itself)

Reference check after edit:
```
$ grep -rn postfix_cf_reference /home/wizard/code/infra/
(no matches)
```

`terragrunt validate` (from `infra/stacks/mailserver/`):
```
Success! The configuration is valid, but there were some
validation warnings as shown above.
```
(warnings are pre-existing `kubernetes_namespace` → `_v1` deprecation
notices, unrelated)

`terragrunt plan` (from `infra/stacks/mailserver/`):
```
  # module.mailserver.kubernetes_deployment.mailserver will be updated in-place
  # module.mailserver.kubernetes_service.mailserver will be updated in-place
Plan: 0 to add, 2 to change, 0 to destroy.
```
Both in-place updates are the known pre-existing drift
(volume_mount ordering + stale `metallb.io/ip-allocated-from-pool`
annotation). No change is attributable to this commit — the dead
variable was never referenced, so removing it leaves state untouched.

### Manual Verification

1. `cd infra/stacks/mailserver/modules/mailserver/`
2. `grep -c postfix_cf_reference variables.tf` → expected `0`
3. `wc -l variables.tf` → expected `39` (was `175`; 136 lines removed
   including the trailing blank after the EOT)
4. Open `variables.tf` → expected: only `variable "postfix_cf"` remains
5. `cd ../..` (stack root) → `terragrunt validate` → expected:
   `Success! The configuration is valid`
6. `terragrunt plan` → expected: `Plan: 0 to add, 2 to change, 0 to
   destroy.` (the 2 are the pre-existing drift, not from this commit).

Closes: code-o3q

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 00:05:44 +00:00