6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Plan: Migrate Traefik to dedicated IP 10.0.20.203 + ETP=Local
Date: 2026-05-30 · Pairs with: 2026-05-30-traefik-dedicated-ip-etp-local-design.md
Status: Draft — review required before executing. Nothing applied yet.
Goal: real client IPs to CrowdSec + working QUIC on the 24 direct apps, by
moving Traefik off the shared 10.0.20.200 onto its own 10.0.20.203 with
externalTrafficPolicy: Local. Shared IP .200 (incl. the TF state DB) is
left untouched until the final cleanup step.
Recommended cutover: in-place (simplest, most maintainable) inside a short planned window. Additive/zero-downtime variant noted at the end.
Phase 0 — Pre-flight (read-only, ~10 min)
- Snapshot current state (already captured in chat; re-confirm at execution):
  - Traefik svc: IP 10.0.20.200, allow-shared-ip=shared, ETP=Cluster. .200 shared by 10 services incl. dbaas/postgresql-lb:5432 (TF state).
  - DNS apex viktorbarzin.me A = 10.0.20.200 (Technitium primary, split-horizon).
  - pfSense rdr: WAN 443 tcp+udp → alias <nginx> (=10.0.20.200); admin@10.0.20.1.
  - Traefik 3 replicas (node4, node5, +1), PDB minAvailable=2.
- Confirm 10.0.20.203 still free in pool 10.0.20.200-220.
- Lower DNS TTL on the apex record to 60s (Technitium) ~30 min ahead of cutover to shrink the window. (Restore to normal afterward.)
- Baseline checks to compare against (run now, save output):
  - curl -sI https://immich.viktorbarzin.me (direct app) → 200/redirect
  - curl -sI https://<a-proxied-app> → 200 (proxied path)
  - PG state reachable: nc -vz 10.0.20.200 5432 (or a terragrunt plan no-op)
  - Traefik access log shows 10.0.20.103 for a direct app (the bug we're fixing)
  - http3check.net for immich → QUIC FAILS (baseline)
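The baseline checks above could be recorded in one pass so the post-cutover run can be diffed against them. This is a minimal sketch, assuming bash, curl and awk are available; the function names `status_of` and `baseline` and the output file name are illustrative and do not exist in the repo.

```shell
# status_of: read `curl -sI` response headers on stdin and print the HTTP status code.
status_of() {
  # Status lines look like "HTTP/2 200" or "HTTP/1.1 301 Moved Permanently".
  awk 'toupper($1) ~ /^HTTP\// { code = $2 } END { print code }' | tr -d '\r'
}

# baseline (hypothetical helper): append "<url> <status>" to a file for each URL given.
baseline() {
  local out=$1
  shift
  local url
  for url in "$@"; do
    printf '%s %s\n' "$url" "$(curl -sI --max-time 10 "$url" | status_of)" >>"$out"
  done
}
```

Possible usage: `baseline baseline-before.txt https://immich.viktorbarzin.me` before the cutover, the same call with `baseline-after.txt` afterwards, then `diff` the two files.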
Phase 1 — Terraform: dedicated IP + ETP=Local (reversible)
Edit stacks/traefik/modules/traefik/main.tf, Helm service block (~L165-173):
    service = {
      type = "LoadBalancer"
      annotations = {
        "metallb.io/loadBalancerIPs" = "10.0.20.203" # was 10.0.20.200
        # allow-shared-ip REMOVED: Traefik no longer shares an IP
      }
      spec = {
        externalTrafficPolicy = "Local" # was Cluster
      }
    }
- scripts/tg plan in stacks/traefik, then review: only the Traefik Service changes (new IP, ETP, annotation removed). No change to other stacks.
- scripts/tg apply.
- Immediately verify (ingress is briefly broken until DNS+pfSense move):
  - kubectl get svc traefik -n traefik → IP 10.0.20.203, ETP=Local.
  - kubectl get svc -A | grep 10.0.20.200 → the other 9 services still hold .200.
  - nc -vz 10.0.20.200 5432 → TF state DB still reachable (critical).
  - curl -sI --resolve <app>:443:10.0.20.203 https://<direct-app> → 200 (proves .203 serves before DNS moves).
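The "who still holds .200" check can be made stricter than a plain grep by matching only the EXTERNAL-IP column. A small sketch, assuming the default `kubectl get svc -A` column order (NAMESPACE, NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, ...); the helper name `holders_of` is made up for illustration.

```shell
# holders_of IP: read `kubectl get svc -A --no-headers` output on stdin and print
# "namespace/name" for every Service whose EXTERNAL-IP column equals the given IP.
holders_of() {
  awk -v ip="$1" '$5 == ip { print $1 "/" $2 }'
}
```

Possible usage: `kubectl get svc -A --no-headers | holders_of 10.0.20.200` should list the nine remaining services and no longer include traefik/traefik; the same pipeline with `10.0.20.203` should print only the Traefik Service.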
Rollback (Phase 1): revert the three lines → scripts/tg apply. Back to .200.
Phase 2 — Internal DNS cutover (Technitium)
- Update split-horizon apex: viktorbarzin.me A → 10.0.20.203 (primary; AXFR replicates to secondary/tertiary, or kick technitium-zone-sync).
- Verify internal resolution: dig +short immich.viktorbarzin.me → 10.0.20.203 from a cluster/LAN client; curl -sI https://immich.viktorbarzin.me → 200.
Rollback (Phase 2): apex A → 10.0.20.200.
Phase 3 — pfSense (live firewall — operator-driven, alias not literal)
Per the "create a VIP/alias, don't hardcode" requirement:
- Create a pfSense Firewall Alias (Firewall ▸ Aliases), type Host: name traefik_lb, value 10.0.20.203. (This is the correct pfSense object for a NAT-forward target, the same kind as the existing <nginx> alias. If a CARP/IP-Alias Virtual IP is intended instead, confirm at review; a routed K8s LB IP normally uses an Alias, not a VIP.)
- Repoint the 443 forward (Firewall ▸ NAT ▸ Port Forward): change the existing WAN https (TCP and UDP) rule's target from nginx → traefik_lb. Leave the auto firewall rule linked. Do not touch the http-alt/7443 rules (those are xray on <k8s_shared_lb>).
- Apply pfSense changes.
- Verify externally:
  - http3check.net for immich → QUIC OK (h3 established).
  - External curl to a few direct apps → 200.
  - Traefik access log now shows real client IPs for direct apps (not 10.0.20.103).
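Alongside the external checks, the HTTP/3 advertisement can be confirmed from response headers. The sketch below assumes Traefik announces h3 through an Alt-Svc header of the form `alt-svc: h3=":443"`; the helper name `advertises_h3` is illustrative.

```shell
# advertises_h3: read response headers on stdin; succeed if an Alt-Svc header offers h3.
advertises_h3() {
  grep -qiE '^alt-svc:.*h3='
}
```

Possible usage: `curl -sI https://immich.viktorbarzin.me | advertises_h3 && echo "h3 advertised"`. The header only shows that the listener is configured; whether UDP 443 actually reaches Traefik still has to be confirmed from a real QUIC client.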
Rollback (Phase 3): point the 443 rule's target back to nginx.
Phase 4 — Verify CrowdSec + the fleet (the real prize)
- Traefik logs: real public IPs on direct apps (sample several).
- CrowdSec: confirm it now ingests real IPs (a test decision / metrics); confirm the source-IP allowlist (10.0.20.0/22, 192.168.1.0/24, tailnet) is active so family/LAN aren't banned.
- Proxied apps unaffected (spot-check 2-3; still real IPs via Cloudflare).
- All other .200 services healthy (PG state, headscale, wireguard, coturn, xray, etc.).
- Restore DNS TTL to normal.
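Two of the checks above lend themselves to small helpers: counting how often the node IP still appears as the client in the access log, and testing whether an address falls inside an allowlisted range. This is a sketch under assumptions: the access log is taken to be in the common log format (client address in the first field) and bash arithmetic is assumed. All function names are illustrative.

```shell
# node_ip_hits IP: read access-log lines on stdin, print how many have IP as the client.
node_ip_hits() {
  awk -v ip="$1" '$1 == ip { n++ } END { print n + 0 }'
}

# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr IP NET/BITS: succeed if IP lies inside the IPv4 network.
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

Possible usage: pipe a sample of Traefik access-log lines into `node_ip_hits 10.0.20.103` and expect 0 after the cutover; `ip_in_cidr 10.0.20.103 10.0.20.0/22` succeeds, while a public address such as 34.107.119.124 does not.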
Phase 5 — Cleanup / docs
- Confirm Traefik no longer answers on .200 (it shouldn't after Phase 1).
- Update docs (design doc "Affected docs" list): .claude/CLAUDE.md, docs/architecture/networking.md, service-catalog, memory ids 3241-3246.
- Commit TF + docs.
Rollback (full)
Reverse order: pfSense 443 target → nginx; apex A → .200; revert the
Traefik Service TF (IP .200, allow-shared-ip=shared, ETP=Cluster) → apply.
kubectl/Helm reach the API server directly (not via Traefik), so control is
retained even if ingress is down mid-cutover.
Additive (zero-downtime) variant — if the window is unacceptable
Instead of editing the Helm Service in place: add a second raw
kubernetes_service (type LoadBalancer, IP .203, ETP=Local, ports
web/80→8000, websecure/443→8443 TCP, websecure-http3/443→8443 UDP, selector =
Traefik pod labels). Both .200 (old) and .203 (new) serve Traefik. Cut
DNS+pfSense to .203, verify, then convert the Helm Service to ClusterIP
(drops .200). More config to carry long-term (a hand-maintained Service
duplicating Helm) — weigh against the brief in-place window.
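For review purposes, the second Service described above might look like the manifest printed by this sketch. The plan itself calls for a Terraform kubernetes_service; this is only the equivalent Kubernetes object. The Service name `traefik-dedicated` and the selector label are assumptions and should be replaced with the labels on the running Traefik pods.

```shell
# additive_service_manifest: print a second LoadBalancer Service for Traefik.
# ASSUMPTIONS: the name "traefik-dedicated" and the selector label are illustrative;
# copy the real labels from the Traefik pods before using this.
additive_service_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: traefik-dedicated
  namespace: traefik
  annotations:
    metallb.io/loadBalancerIPs: "10.0.20.203"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app.kubernetes.io/name: traefik
  ports:
    - name: web
      protocol: TCP
      port: 80
      targetPort: 8000
    - name: websecure
      protocol: TCP
      port: 443
      targetPort: 8443
    - name: websecure-http3
      protocol: UDP
      port: 443
      targetPort: 8443
EOF
}
```

Possible usage: `additive_service_manifest | kubectl apply --dry-run=client -f -` to validate the shape without creating anything. TCP and UDP on the same port in one LoadBalancer Service requires a cluster version that supports mixed-protocol LoadBalancer Services.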
Attempt 1 — 2026-05-30 — ROLLED BACK (post-mortem)
First execution was rolled back to the .200 baseline; all services restored,
TF state reconciled (No changes). The cutover achieved its primary goal
mid-flight (real external client IPs reached CrowdSec — confirmed real IPs
like 34.107.119.124 in Traefik logs instead of node 10.0.20.103), but a
missed dependency took proxied apps down, forcing rollback. Fix the plan
before retrying:
- BLOCKER: cloudflared targets the LB IP. The cloudflared tunnel is token-based / Cloudflare-dashboard-managed (args: [tunnel] + TUNNEL_TOKEN; no local config.yaml). Its ingress sends *.viktorbarzin.me to the Traefik LB IP 10.0.20.200. Moving Traefik to .203 left cloudflared pointing at a dead IP → every proxied app (vault, home, …) went down. The retry MUST also repoint the tunnel ingress .200 → .203 in Cloudflare (API/dashboard) as part of the same cutover; ideally point cloudflared at the Traefik ClusterIP/service so it's IP-independent.
- Vault-ingress circular dependency. Fetching the Technitium password from Vault during the window failed (Vault's ingress was down). Fix used: pre-fetch all creds before touching Traefik (worked). The DNS step then restored Vault.
- SIGPIPE → stuck PG state locks. Piping scripts/tg through head/grep (early pipe close) SIGPIPE-killed terragrunt before it released the PG advisory lock, leaving an idle terraform_state connection holding the lock (force-unlock can't release another session's advisory lock). Always run tg to a file, never pipe through early-closing filters. Clear a stuck one by terminating the idle backend: pg_terminate_backend(<pid>) for the idle conn holding pg_locks.objid of the workspace.
- ETP=Local + hairpin. Internal hosts that resolve *.viktorbarzin.me via public DNS and hairpin (e.g. the devvm) become flaky under ETP=Local. True external clients and internal-direct (.203) clients work. Ensure such hosts resolve internally (Technitium split-horizon).
- QUIC verification. http3check.net was unreliable here (failed on TCP while real clients got 200s); don't rely on it, confirm from a real device on cellular.
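The "run tg to a file" lesson can be wrapped in a small helper so the command never writes into a pipe that may close early. A minimal sketch; the name `run_to_file` and the log path are hypothetical.

```shell
# run_to_file LOG CMD [ARGS...]: run CMD with stdout+stderr redirected to LOG,
# show the last lines afterwards, and return CMD's own exit status.
run_to_file() {
  local log_file=$1
  shift
  "$@" >"$log_file" 2>&1
  local status=$?
  # The command has already exited, so filtering the file cannot SIGPIPE it.
  tail -n 5 "$log_file"
  return "$status"
}
```

Possible usage: `run_to_file /tmp/tg-plan.log scripts/tg plan`, then inspect `/tmp/tg-plan.log` with head or grep as needed. Because the filter reads a finished file rather than a live pipe, terragrunt gets to exit normally and release its state lock.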
Left in place for retry: pfSense alias traefik_lb (=10.0.20.203, NAT
reverted to nginx); pfSense config.xml backups config.xml.bak-traefik-*.
Attempt 2 — 2026-05-30 — SUCCESS
Live and verified, no proxied/Vault outage this time. Key change vs attempt 1: decouple cloudflared from the LB IP FIRST, so moving Traefik no longer touches the proxied path or Vault's ingress.
Executed order (all lessons applied — tg always run to a file, creds
pre-fetched while Vault up):
- Cloudflare tunnel ingress repointed https://10.0.20.200:443 → https://traefik.traefik.svc.cluster.local:443 (both *.viktorbarzin.me and apex rules; noTLSVerify kept; catch-all 404 kept). Done via the Cloudflare Global API Key (secret/platform → cloudflare_api_key, email vbarzin@gmail.com, X-Auth-Email + X-Auth-Key headers; NOT the tunnel token, which is not an API credential). Tunnel: account 02e035473cfc4834fb10c5d35470d8b4, id 75182cd7-bb91-4310-b961-5d8967da8b41. → proxied apps now IP-independent.
- Traefik Service → 10.0.20.203 + ETP=Local (single service; tg apply). Proxied apps + Vault stayed up (cloudflared → ClusterIP).
- Technitium apex viktorbarzin.me A → 10.0.20.203 (ttl 60).
- pfSense 443 (tcp+udp) NAT nginx → traefik_lb (.203); /etc/rc.filter_configure.
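The tunnel repoint in the first step could be scripted roughly as follows. This is a sketch, not the exact commands that were run: the rule list is inferred from the description (wildcard, apex, catch-all 404), the environment variable names CF_EMAIL, CF_API_KEY, CF_ACCOUNT and CF_TUNNEL are hypothetical, and the endpoint is assumed to be Cloudflare's tunnel configuration API. The PUT replaces the whole remote config, so any other existing rules would need to be included.

```shell
# tunnel_ingress_payload ORIGIN: print the JSON body for the tunnel configuration update.
tunnel_ingress_payload() {
  local origin=$1
  cat <<EOF
{"config": {"ingress": [
  {"hostname": "*.viktorbarzin.me", "service": "$origin", "originRequest": {"noTLSVerify": true}},
  {"hostname": "viktorbarzin.me", "service": "$origin", "originRequest": {"noTLSVerify": true}},
  {"service": "http_status:404"}
]}}
EOF
}

# repoint_tunnel ORIGIN (hypothetical wrapper): send the payload with Global API Key auth.
# Assumes CF_EMAIL, CF_API_KEY, CF_ACCOUNT and CF_TUNNEL are exported beforehand.
repoint_tunnel() {
  curl -sS -X PUT \
    "https://api.cloudflare.com/client/v4/accounts/$CF_ACCOUNT/cfd_tunnel/$CF_TUNNEL/configurations" \
    -H "X-Auth-Email: $CF_EMAIL" \
    -H "X-Auth-Key: $CF_API_KEY" \
    -H "Content-Type: application/json" \
    --data "$(tunnel_ingress_payload "$1")"
}
```

Possible usage: review the output of `tunnel_ingress_payload https://traefik.traefik.svc.cluster.local:443` first, fetch the current remote config to compare, and only then call `repoint_tunnel` with the same origin.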
Verified: proxied 307/200 throughout; direct apps 200; real external
client IPs now reach Traefik/CrowdSec (216.73.217.51, 54.x, 52.x — not
node 10.0.20.103); PG state DB OK; TF state reconciled (tg apply exit 0).
Notes / follow-ups:
- Out-of-band (not in TF): the cloudflared tunnel ingress (remote/dashboard config) and the pfSense traefik_lb alias + NAT. Codify the tunnel config in TF (cloudflare_zero_trust_tunnel_cloudflared_config) so → ClusterIP is declarative. This is a pre-existing gap (tunnel was already remote-managed).
- QUIC: infra correct (ETP=Local + UDP 443 → .203 + Traefik h3 listener). http3check.net is unreliable here: it hits the IPv6 AAAA (2001:470:6e:43d::2, separate HE-tunnel path, unchanged) and fails before reaching Traefik. Confirm QUIC from a real device (Chrome → Protocol h3).
- pfSense nginx alias (=.200) is now unused; traefik_lb (=.203) is live.
IPv6 follow-up — 2026-05-30 — DONE (HAProxy bridge, real client IPs)
The ETP=Local cutover fixed real client IPs + QUIC on the IPv4 direct path
only. The IPv6 path (HE 6in4 tunnel 2001:470:6e:43d::2 → pfSense) still ran
socat, which (a) masked every IPv6 client as 10.0.20.1, and (b) broke
outright once Traefik's proxyProtocol.trustedIPs started requiring PROXY-v2
from 10.0.20.1. Replaced socat with a standalone HAProxy bridge on pfSense
using send-proxy-v2 so real IPv6 client IPs reach Traefik/CrowdSec.
Executed:
- Traefik (TF, stacks/traefik/.../main.tf): added proxyProtocol = { trustedIPs = ["10.0.20.1"] } to the web + websecure entrypoints. Bounded risk: only connections from 10.0.20.1 (the bridge) are PROXY-parsed; real IPv4 clients (ETP=Local, own source IP) are untouched. Applied; IPv4 + proxied verified 200 immediately after.
- pfSense HAProxy (/usr/local/etc/ipv6-haproxy.cfg): 6 frontends on [::2]:{443,80,25,465,587,993} → Traefik .203:{443,80} and mail NodePorts {30125,30126,30127,30128} (.101-103), all send-proxy-v2, no check (a plain check would false-DOWN the PROXY-expecting listeners).
- Persistence: rewrote rc.d/ipv6proxy → manages HAProxy (service ipv6proxy {start,stop,status}, graceful -sf); rewrote ipv6_proxy.sh (config.xml <shellcmd> boot entrypoint) to keep the nginx-off-[::] patch then service ipv6proxy onestart. socat backups kept as *.socat-bak-*.
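One frontend/backend pair of the bridge might be generated as sketched below. This is an illustration of the pattern (TCP mode, send-proxy-v2, no health check), not the contents of the real ipv6-haproxy.cfg; the helper name `bridge_stanza`, the section names and the server names are made up, and a real config also needs global/defaults sections with timeouts.

```shell
# bridge_stanza NAME BIND_ADDR PORT TARGET...: print one TCP frontend/backend pair
# that forwards [BIND_ADDR]:PORT to each TARGET (host:port) with a PROXY v2 header.
bridge_stanza() {
  local name=$1 bind_addr=$2 port=$3
  shift 3
  printf 'frontend %s_in\n    mode tcp\n    bind [%s]:%s\n    default_backend %s_out\n\n' \
    "$name" "$bind_addr" "$port" "$name"
  printf 'backend %s_out\n    mode tcp\n' "$name"
  local i=1 target
  for target in "$@"; do
    # send-proxy-v2 carries the real client address; the health-check keyword is
    # left out on purpose so PROXY-expecting listeners are not marked DOWN.
    printf '    server s%s %s send-proxy-v2\n' "$i" "$target"
    i=$((i + 1))
  done
}
```

Possible usage: `bridge_stanza web443 2001:470:6e:43d::2 443 10.0.20.203:443` prints the HTTPS pair; the mail frontends would follow the same shape with the NodePort targets on the three nodes.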
Verified: web over ::2 = 200; Traefik logs show real public IPv6 clients
(e.g. 2620:10d:c092:500::6:1eda), zero 10.0.20.1 artifacts; mail-over-IPv6
220 banners on ::2:25/587 + IMAPS connect on ::2:993; IPv4 direct/proxied
+ QUIC (alt-svc: h3=":443") unaffected. No QUIC over IPv6 (bridge is TCP/h2).
Authoritative as-built: docs/architecture/networking.md → "IPv6 Ingress".