Attempt rolled back to .200 baseline. Root blocker: cloudflared is a token/dashboard-managed tunnel whose ingress targets the Traefik LB IP (10.0.20.200), so moving Traefik to .203 took down all proxied apps. Retry must also repoint the tunnel ingress (Cloudflare API). Also documents the vault-ingress circular dep, SIGPIPE->stuck PG state-lock gotcha, and the ETP=Local hairpin caveat. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
163 lines
8.5 KiB
Markdown
163 lines
8.5 KiB
Markdown
# Plan: Migrate Traefik to dedicated IP 10.0.20.203 + ETP=Local
|
|
|
|
**Date:** 2026-05-30 · **Pairs with:** `2026-05-30-traefik-dedicated-ip-etp-local-design.md`
|
|
**Status:** Draft — review required before executing. Nothing applied yet.
|
|
|
|
Goal: real client IPs to CrowdSec + working QUIC on the 24 direct apps, by
|
|
moving Traefik off the shared `10.0.20.200` onto its own `10.0.20.203` with
|
|
`externalTrafficPolicy: Local`. Shared IP `.200` (incl. the TF state DB) is
|
|
left untouched until the final cleanup step.
|
|
|
|
> Recommended cutover: **in-place** (simplest, most maintainable) inside a short
|
|
> planned window. Additive/zero-downtime variant noted at the end.
|
|
|
|
## Phase 0 — Pre-flight (read-only, ~10 min)
|
|
|
|
- [ ] Snapshot current state (already captured in chat; re-confirm at execution):
|
|
- Traefik svc: IP `10.0.20.200`, `allow-shared-ip=shared`, ETP=Cluster.
|
|
- `.200` shared by 10 services incl. `dbaas/postgresql-lb:5432` (TF state).
|
|
- DNS apex `viktorbarzin.me A = 10.0.20.200` (Technitium primary, split-horizon).
|
|
- pfSense rdr: WAN 443 tcp+udp → alias `<nginx>` (=10.0.20.200); `admin@10.0.20.1`.
|
|
- Traefik 3 replicas (node4, node5, +1), PDB minAvailable=2.
|
|
- [ ] Confirm `10.0.20.203` still free in pool `10.0.20.200-220`.
|
|
- [ ] **Lower DNS TTL** on the apex record to 60s (Technitium) ~30 min ahead of
|
|
cutover to shrink the window. (Restore to normal afterward.)
|
|
- [ ] Baseline checks to compare against (run now, save output):
|
|
- `curl -sI https://immich.viktorbarzin.me` (direct app) → 200/redirect
|
|
- `curl -sI https://<a-proxied-app>` → 200 (proxied path)
|
|
- PG state reachable: `nc -vz 10.0.20.200 5432` (or a `terragrunt plan` no-op)
|
|
- Traefik access log shows `10.0.20.103` for a direct app (the bug we're fixing)
|
|
- `http3check.net` for immich → QUIC FAILS (baseline)
|
|
|
|
## Phase 1 — Terraform: dedicated IP + ETP=Local (reversible)
|
|
|
|
Edit `stacks/traefik/modules/traefik/main.tf`, Helm `service` block (~L165-173):
|
|
|
|
```hcl
|
|
service = {
|
|
type = "LoadBalancer"
|
|
annotations = {
|
|
"metallb.io/loadBalancerIPs" = "10.0.20.203" # was 10.0.20.200
|
|
# allow-shared-ip REMOVED — Traefik no longer shares an IP
|
|
}
|
|
spec = {
|
|
externalTrafficPolicy = "Local" # was Cluster
|
|
}
|
|
}
|
|
```
|
|
|
|
- [ ] `scripts/tg plan` in `stacks/traefik` — review: only the Traefik Service
|
|
changes (new IP, ETP, annotation removed). No change to other stacks.
|
|
- [ ] `scripts/tg apply`.
|
|
- [ ] **Immediately verify** (ingress is briefly broken until DNS+pfSense move):
|
|
- `kubectl get svc traefik -n traefik` → IP `10.0.20.203`, ETP=Local.
|
|
- `kubectl get svc -A | grep 10.0.20.200` → the other 9 services still hold `.200`.
|
|
- **`nc -vz 10.0.20.200 5432`** → TF state DB still reachable (critical).
|
|
- `curl -sI --resolve <app>:443:10.0.20.203 https://<direct-app>` → 200
|
|
(proves `.203` serves before DNS moves).
|
|
|
|
**Rollback (Phase 1):** revert the three lines → `scripts/tg apply`. Back to `.200`.
|
|
|
|
## Phase 2 — Internal DNS cutover (Technitium)
|
|
|
|
- [ ] Update split-horizon apex: `viktorbarzin.me A → 10.0.20.203` (primary;
|
|
AXFR replicates to secondary/tertiary, or kick `technitium-zone-sync`).
|
|
- [ ] Verify internal resolution: `dig +short immich.viktorbarzin.me` → `10.0.20.203`
|
|
from a cluster/LAN client; `curl -sI https://immich.viktorbarzin.me` → 200.
|
|
|
|
**Rollback (Phase 2):** apex A → `10.0.20.200`.
|
|
|
|
## Phase 3 — pfSense (live firewall — operator-driven, alias not literal)
|
|
|
|
Per the "create a VIP/alias, don't hardcode" requirement:
|
|
|
|
- [ ] **Create a pfSense Firewall Alias** (Firewall ▸ Aliases), type Host:
|
|
name `traefik_lb`, value `10.0.20.203`. *(This is the correct pfSense
|
|
object for a NAT-forward target — same kind as the existing `<nginx>`
|
|
alias. If a CARP/IP-Alias Virtual IP is intended instead, confirm at
|
|
review; a routed K8s LB IP normally uses an Alias, not a VIP.)*
|
|
- [ ] **Repoint the 443 forward** (Firewall ▸ NAT ▸ Port Forward): change the
|
|
existing WAN `https` (TCP **and** UDP) rule's target from `nginx` →
|
|
`traefik_lb`. Leave the auto firewall rule linked. Do **not** touch the
|
|
`http-alt`/`7443` rules (those are xray on `<k8s_shared_lb>`).
|
|
- [ ] Apply pfSense changes.
|
|
- [ ] Verify externally:
|
|
- `http3check.net` for immich → **QUIC OK** (h3 established).
|
|
- External `curl` to a few direct apps → 200.
|
|
- Traefik access log now shows **real client IPs** for direct apps (not `10.0.20.103`).
|
|
|
|
**Rollback (Phase 3):** point the 443 rule's target back to `nginx`.
|
|
|
|
## Phase 4 — Verify CrowdSec + the fleet (the real prize)
|
|
|
|
- [ ] Traefik logs: real public IPs on direct apps (sample several).
|
|
- [ ] CrowdSec: confirm it now ingests real IPs (a test decision / metrics);
|
|
**confirm the source-IP allowlist** (`10.0.20.0/22`, `192.168.1.0/24`,
|
|
tailnet) is active so family/LAN aren't banned.
|
|
- [ ] Proxied apps unaffected (spot-check 2-3 — still real IPs via Cloudflare).
|
|
- [ ] All other `.200` services healthy (PG state, headscale, wireguard, coturn,
|
|
xray, etc.).
|
|
- [ ] Restore DNS TTL to normal.
|
|
|
|
## Phase 5 — Cleanup / docs
|
|
|
|
- [ ] Confirm Traefik no longer answers on `.200` (it shouldn't after Phase 1).
|
|
- [ ] Update docs (design doc "Affected docs" list): `.claude/CLAUDE.md`,
|
|
`docs/architecture/networking.md`, service-catalog, memory ids 3241-3246.
|
|
- [ ] Commit TF + docs.
|
|
|
|
## Rollback (full)
|
|
|
|
Reverse order: pfSense 443 target → `nginx`; apex A → `.200`; revert the
|
|
Traefik Service TF (IP `.200`, `allow-shared-ip=shared`, ETP=Cluster) → apply.
|
|
kubectl/Helm reach the API server directly (not via Traefik), so control is
|
|
retained even if ingress is down mid-cutover.
|
|
|
|
## Additive (zero-downtime) variant — if the window is unacceptable
|
|
|
|
Instead of editing the Helm Service in place: add a second raw
|
|
`kubernetes_service` (type LoadBalancer, IP `.203`, ETP=Local, ports
|
|
web/80→8000, websecure/443→8443 TCP, websecure-http3/443→8443 UDP, selector =
|
|
Traefik pod labels). Both `.200` (old) and `.203` (new) serve Traefik. Cut
|
|
DNS+pfSense to `.203`, verify, then convert the Helm Service to ClusterIP
|
|
(drops `.200`). More config to carry long-term (a hand-maintained Service
|
|
duplicating Helm) — weigh against the brief in-place window.
|
|
|
|
## Attempt 1 — 2026-05-30 — ROLLED BACK (post-mortem)
|
|
|
|
First execution was rolled back to the `.200` baseline; all service restored,
|
|
TF state reconciled (`No changes`). The cutover **achieved its primary goal
|
|
mid-flight** (real external client IPs reached CrowdSec — confirmed real IPs
|
|
like `34.107.119.124` in Traefik logs instead of node `10.0.20.103`), but a
|
|
**missed dependency took proxied apps down**, forcing rollback. Fix the plan
|
|
before retrying:
|
|
|
|
1. **BLOCKER — cloudflared targets the LB IP.** The `cloudflared` tunnel is
|
|
**token-based / Cloudflare-dashboard-managed** (`args: [tunnel]` +
|
|
`TUNNEL_TOKEN`; no local `config.yaml`). Its ingress sends `*.viktorbarzin.me`
|
|
to the **Traefik LB IP `10.0.20.200`**. Moving Traefik to `.203` left
|
|
cloudflared pointing at a dead IP → **every proxied app (vault, home, …)
|
|
went down**. **The retry MUST also repoint the tunnel ingress `.200 → .203`
|
|
in Cloudflare (API/dashboard)** as part of the same cutover — ideally point
|
|
cloudflared at the Traefik *ClusterIP/service* so it's IP-independent.
|
|
2. **Vault-ingress circular dependency.** Fetching the Technitium password from
|
|
Vault *during* the window failed (Vault's ingress was down). Fix used:
|
|
pre-fetch all creds before touching Traefik (worked). The DNS step then
|
|
restored Vault.
|
|
3. **SIGPIPE → stuck PG state locks.** Piping `scripts/tg` through `head`/`grep`
|
|
(early pipe close) SIGPIPE-killed terragrunt before it released the PG
|
|
advisory lock, leaving an idle `terraform_state` connection holding the lock
|
|
(`force-unlock` can't release another session's advisory lock). **Always run
|
|
`tg` to a file, never pipe through early-closing filters.** Clear a stuck
|
|
one by terminating the idle backend: `pg_terminate_backend(<pid>)` for the
|
|
idle conn holding `pg_locks.objid` of the workspace.
|
|
4. **ETP=Local + hairpin.** Internal hosts that resolve `*.viktorbarzin.me` via
|
|
*public* DNS and hairpin (e.g. the devvm) become flaky under ETP=Local.
|
|
True external clients and internal-direct (`.203`) clients work. Ensure such
|
|
hosts resolve internally (Technitium split-horizon).
|
|
5. **QUIC verification.** `http3check.net` was unreliable here (failed on TCP
|
|
while real clients got 200s) — don't rely on it; confirm from a real device
|
|
on cellular.
|
|
|
|
**Left in place for retry:** pfSense alias `traefik_lb` (=`10.0.20.203`, NAT
|
|
reverted to `nginx`); pfSense `config.xml` backups `config.xml.bak-traefik-*`.
|