Viktor Barzin 09a0c1fad4 docs/plans: Traefik dedicated IP + ETP=Local migration (design + plan)

Move Traefik off shared MetalLB IP 10.0.20.200 to a dedicated 10.0.20.203
with externalTrafficPolicy=Local, to (1) restore real client IPs for CrowdSec
on the 24 non-proxied apps (currently SNAT'd to a node IP) and (2) enable QUIC.
Forced off the shared IP because MetalLB forbids mixed ETP on a shared IP
(10.0.20.200 also carries the Terraform state DB). In-place cutover selected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-30 00:27:04 +00:00


Plan: Migrate Traefik to dedicated IP 10.0.20.203 + ETP=Local

Date: 2026-05-30 · Pairs with: 2026-05-30-traefik-dedicated-ip-etp-local-design.md · Status: Draft, review required before executing. Nothing applied yet.

Goal: real client IPs to CrowdSec + working QUIC on the 24 direct apps, by moving Traefik off the shared 10.0.20.200 onto its own 10.0.20.203 with externalTrafficPolicy: Local. Shared IP .200 (incl. the TF state DB) is left untouched until the final cleanup step.

Recommended cutover: in-place (simplest, most maintainable) inside a short planned window. Additive/zero-downtime variant noted at the end.

Phase 0 — Pre-flight (read-only, ~10 min)

Snapshot current state (already captured in chat; re-confirm at execution):
- Traefik svc: IP 10.0.20.200, allow-shared-ip=shared, ETP=Cluster.
- .200 shared by 10 services incl. dbaas/postgresql-lb:5432 (TF state).
- DNS apex viktorbarzin.me A = 10.0.20.200 (Technitium primary, split-horizon).
- pfSense rdr: WAN 443 tcp+udp → alias <nginx> (=10.0.20.200); admin@10.0.20.1.
- Traefik 3 replicas (node4, node5, +1), PDB minAvailable=2.
Confirm 10.0.20.203 still free in pool 10.0.20.200-220.
Lower DNS TTL on the apex record to 60s (Technitium) ~30 min ahead of cutover to shrink the window. (Restore to normal afterward.)
Baseline checks to compare against (run now, save output):
- curl -sI https://immich.viktorbarzin.me (direct app) → 200/redirect
- curl -sI https://<a-proxied-app> → 200 (proxied path)
- PG state reachable: nc -vz 10.0.20.200 5432 (or a terragrunt plan no-op)
- Traefik access log shows 10.0.20.103 for a direct app (the bug we're fixing)
- http3check.net for immich → QUIC FAILS (baseline)

Phase 1 — Terraform: dedicated IP + ETP=Local (reversible)

Edit stacks/traefik/modules/traefik/main.tf, Helm service block (~L165-173):

service = {
  type = "LoadBalancer"
  annotations = {
    "metallb.io/loadBalancerIPs" = "10.0.20.203"   # was 10.0.20.200
    # allow-shared-ip REMOVED — Traefik no longer shares an IP
  }
  spec = {
    externalTrafficPolicy = "Local"                 # was Cluster
  }
}

scripts/tg plan in stacks/traefik — review: only the Traefik Service changes (new IP, ETP, annotation removed). No change to other stacks.
scripts/tg apply.
Immediately verify (ingress is briefly broken until DNS+pfSense move):
- kubectl get svc traefik -n traefik → IP 10.0.20.203, ETP=Local.
- kubectl get svc -A | grep 10.0.20.200 → the other 9 services still hold .200.
- nc -vz 10.0.20.200 5432 → TF state DB still reachable (critical).
- curl -sI --resolve <app>:443:10.0.20.203 https://<direct-app> → 200 (proves .203 serves before DNS moves).

Rollback (Phase 1): revert the three changes (IP back to 10.0.20.200, allow-shared-ip=shared restored, ETP back to Cluster) → scripts/tg apply. Traefik is back on .200.
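For reference, a sketch of what the reverted Helm service block would look like, reconstructed from the pre-migration state recorded in Phase 0 (IP 10.0.20.200, allow-shared-ip=shared, ETP=Cluster). The `metallb.io/allow-shared-ip` key name is an assumption chosen to match the `metallb.io/` prefix used in Phase 1; prefer the exact lines from the git history of main.tf over this sketch.

```hcl
service = {
  type = "LoadBalancer"
  annotations = {
    "metallb.io/loadBalancerIPs" = "10.0.20.200" # back on the shared IP
    # Assumed key name; the sharing value "shared" is taken from the Phase 0 snapshot.
    "metallb.io/allow-shared-ip" = "shared"
  }
  spec = {
    # Cluster again, because MetalLB forbids mixed ETP on a shared IP.
    externalTrafficPolicy = "Cluster"
  }
}
```

After applying, the Phase 0 baseline checks (direct app, proxied app, PG state on 5432) should match their saved output.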

Phase 2 — Internal DNS cutover (Technitium)

Update split-horizon apex: viktorbarzin.me A → 10.0.20.203 (primary; AXFR replicates to secondary/tertiary, or kick technitium-zone-sync).
Verify internal resolution: dig +short immich.viktorbarzin.me → 10.0.20.203 from a cluster/LAN client; curl -sI https://immich.viktorbarzin.me → 200.

Rollback (Phase 2): apex A → 10.0.20.200.

Phase 3 — pfSense (live firewall — operator-driven, alias not literal)

Per the "create a VIP/alias, don't hardcode" requirement:

Create a pfSense Firewall Alias (Firewall ▸ Aliases), type Host: name traefik_lb, value 10.0.20.203. (This is the correct pfSense object for a NAT-forward target — same kind as the existing <nginx> alias. If a CARP/IP-Alias Virtual IP is intended instead, confirm at review; a routed K8s LB IP normally uses an Alias, not a VIP.)
Repoint the 443 forward (Firewall ▸ NAT ▸ Port Forward): change the target of the existing WAN https rule (TCP and UDP) from the <nginx> alias to traefik_lb. Leave the auto firewall rule linked. Do not touch the http-alt/7443 rules (those are xray on <k8s_shared_lb>).
Apply pfSense changes.
Verify externally:
- http3check.net for immich → QUIC OK (h3 established).
- External curl to a few direct apps → 200.
- Traefik access log now shows real client IPs for direct apps (not 10.0.20.103).

Rollback (Phase 3): point the 443 rule's target back to nginx.

Phase 4 — Verify CrowdSec + the fleet (the real prize)

Traefik logs: real public IPs on direct apps (sample several).
CrowdSec: confirm it now ingests real IPs (a test decision / metrics); confirm the source-IP allowlist (10.0.20.0/22, 192.168.1.0/24, tailnet) is active so family/LAN aren't banned.
Proxied apps unaffected (spot-check 2-3 — still real IPs via Cloudflare).
All other .200 services healthy (PG state, headscale, wireguard, coturn, xray, etc.).
Restore DNS TTL to normal.

Phase 5 — Cleanup / docs

Confirm Traefik no longer answers on .200 (it shouldn't after Phase 1).
Update docs (design doc "Affected docs" list): .claude/CLAUDE.md, docs/architecture/networking.md, service-catalog, memory ids 3241-3246.
Commit TF + docs.

Rollback (full)

Reverse order: pfSense 443 target → nginx; apex A → .200; revert the Traefik Service TF (IP .200, allow-shared-ip=shared, ETP=Cluster) → apply. kubectl/Helm reach the API server directly (not via Traefik), so control is retained even if ingress is down mid-cutover.

Additive (zero-downtime) variant — if the window is unacceptable

Instead of editing the Helm Service in place: add a second raw kubernetes_service (type LoadBalancer, IP .203, ETP=Local, ports web/80→8000, websecure/443→8443 TCP, websecure-http3/443→8443 UDP, selector = Traefik pod labels). Both .200 (old) and .203 (new) serve Traefik. Cut DNS+pfSense to .203, verify, then convert the Helm Service to ClusterIP (drops .200). More config to carry long-term (a hand-maintained Service duplicating Helm) — weigh against the brief in-place window.
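A minimal sketch of that second Service, assuming the Terraform Kubernetes provider's `kubernetes_service` resource. The resource name, the Service name, and the selector label are hypothetical placeholders. In particular, the selector must be replaced with the labels the Helm chart actually sets on the Traefik pods. The IP, ETP, namespace, and port mappings come from the plan above.

```hcl
# Hypothetical resource name; sits alongside the Helm release in stacks/traefik.
resource "kubernetes_service" "traefik_dedicated" {
  metadata {
    name      = "traefik-dedicated" # hypothetical Service name
    namespace = "traefik"
    annotations = {
      "metallb.io/loadBalancerIPs" = "10.0.20.203"
      # No allow-shared-ip: this IP is dedicated to Traefik.
    }
  }

  spec {
    type                    = "LoadBalancer"
    external_traffic_policy = "Local"

    # Assumption: placeholder label. Use the real Traefik pod labels here.
    selector = {
      "app.kubernetes.io/name" = "traefik"
    }

    port {
      name        = "web"
      protocol    = "TCP"
      port        = 80
      target_port = 8000
    }

    port {
      name        = "websecure"
      protocol    = "TCP"
      port        = 443
      target_port = 8443
    }

    # HTTP/3 (QUIC) shares port 443 with websecure, over UDP.
    port {
      name        = "websecure-http3"
      protocol    = "UDP"
      port        = 443
      target_port = 8443
    }
  }
}
```

Before applying, `kubectl get pods -n traefik --show-labels` shows the labels to copy into `selector`. A selector that matches no pods leaves .203 allocated but serving nothing.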
