docs: correct cloudflared-502 post-mortem + fix stale .200 Traefik ref [ci skip]
The real root cause of the 2026-06-01 full-site 502 was not a missed reference but an out-of-band fix that Terraform reverted: the 2026-05-30 Traefik .200->.203 migration repointed the Cloudflare tunnel to the Traefik service DNS via the CF Global API Key, but never landed that change in cloudflare.tf (left at .200). A terragrunt apply on 2026-06-01 reconciled the live config back to the stale .200, breaking all external ingress.

Rewrite the post-mortem around the "codify out-of-band fixes or TF reverts them" lesson (a Terraform-Only-rule violation).

Also fix docs/runbooks/kms-public-exposure.md, which still claimed Traefik served on 10.0.20.200:443 (now .203), the same migration fallout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent f807050eb5
commit 9fb3e6e851
2 changed files with 40 additions and 35 deletions
@@ -216,7 +216,7 @@ If the activation surface needs to come down (abuse, legal, audit):
 
 The k8s service stays reachable on the LAN
 (`10.0.20.202:1688` directly, and the website at `kms.viktorbarzin.lan`
-via Traefik on `10.0.20.200:443`) — only the WAN port-forward is removed.
+via Traefik on `10.0.20.203:443`) — only the WAN port-forward is removed.
 
 To put it back, recreate the NAT rule (target alias `k8s_kms_lb`,
 port `1688`) and the filter rule with the same per-source caps. The alias