cloudflared: fix tunnel origin .200 -> Traefik svc DNS (full-site 502 outage) [ci skip]

The Cloudflare tunnel routed *.viktorbarzin.me and the apex to
https://10.0.20.200:443, but on 2026-05-30 (commit 0c01adac) Traefik
moved off the shared MetalLB IP .200 onto its dedicated 10.0.20.203.
Nothing serves HTTPS on .200:443 anymore, so cloudflared could not
reach its origin (no route to host / i/o timeout) and Cloudflare
returned 502 for every externally proxied service. Internal/LAN access
(split-horizon DNS -> .203) was unaffected, which masked the outage.

Repoint both ingress rules at the in-cluster Traefik Service DNS name
(https://traefik.traefik.svc.cluster.local:443). This is the design the
docs already described but the code never implemented. The tunnel is now
decoupled from the Traefik LB IP, so a future IP move cannot cause this
outage again.
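A possible follow-up, sketched under assumptions and not part of this commit: define the origin once and add a guard that flags a raw IP if one is ever reintroduced. The names `traefik_origin` and `tunnel_origin_not_raw_ip` are hypothetical. The `check` block requires Terraform 1.5 or later (or an OpenTofu release that supports it).

```hcl
# Hypothetical: a single source of truth for the tunnel origin. Both
# ingress_rule blocks would then use: service = local.traefik_origin
locals {
  traefik_origin = "https://traefik.traefik.svc.cluster.local:443"
}

# Hypothetical guard: warns during plan/apply if the origin is a raw
# IPv4 URL (e.g. https://10.0.20.200:443) instead of a DNS name.
check "tunnel_origin_not_raw_ip" {
  assert {
    condition     = !can(regex("^https?://[0-9]+(\\.[0-9]+){3}(:[0-9]+)?(/.*)?$", local.traefik_origin))
    error_message = "Tunnel origin is a raw IP; use the Traefik Service DNS name instead."
  }
}
```

A failed `check` assertion only produces a warning. To make the plan fail outright, the same condition could go in a `lifecycle { precondition { ... } }` block on the tunnel config resource.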

Applied live via a targeted apply on the tunnel config resource only.
[ci skip] because the live state already matches, and a full stack apply
would churn unrelated pre-existing drift (Keel annotations, DKIM re-chunk).

Post-mortem: docs/post-mortems/2026-06-01-cloudflared-stale-traefik-origin.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-01 21:21:40 +00:00
parent 30a644d3cd
commit f807050eb5
2 changed files with 81 additions and 4 deletions

@@ -74,18 +74,22 @@ resource "cloudflare_zero_trust_tunnel_cloudflared_config" "sof" {
     warp_routing {
       enabled = true
     }
-    # Wildcard rule routes all subdomains through tunnel to Traefik.
-    # Traefik handles host-based routing via K8s Ingress resources.
+    # Wildcard rule routes all subdomains through the tunnel to Traefik,
+    # which handles host-based routing via K8s Ingress resources.
+    # Origin = in-cluster Traefik Service DNS (NOT a MetalLB LB IP) so the
+    # tunnel is decoupled from LB-IP changes. A raw IP here caused a full-site
+    # 502 on 2026-06-01 when Traefik moved 10.0.20.200 -> .203; see
+    # docs/post-mortems/2026-06-01-cloudflared-stale-traefik-origin.md.
     ingress_rule {
       hostname = "*.viktorbarzin.me"
-      service  = "https://10.0.20.200:443"
+      service  = "https://traefik.traefik.svc.cluster.local:443"
       origin_request {
         no_tls_verify = true
       }
     }
     ingress_rule {
       hostname = "viktorbarzin.me"
-      service  = "https://10.0.20.200:443"
+      service  = "https://traefik.traefik.svc.cluster.local:443"
       origin_request {
         no_tls_verify = true
       }