6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
5.7 KiB
Design: Dedicated MetalLB IP for Traefik with externalTrafficPolicy=Local
Date: 2026-05-30 Status: Draft — for review (no changes applied yet) Author: Viktor + Claude
Problem
Two issues share one root cause on the Traefik ingress LoadBalancer:
-
CrowdSec is blind to real client IPs on the 24 non-proxied/direct apps. Traefik logs
10.0.20.103(k8s-node3's IP) as the client for the overwhelming majority of direct-app requests (measured: 2522 hits vs 3 real external IPs). Cause: the Traefik LB isexternalTrafficPolicy: Cluster, so kube-proxy SNATs every external client to the MetalLB-elected node's IP before Traefik sees it. CrowdSec therefore makes ban decisions against an internal node IP it would never block → no effective IP-based protection on the direct apps (immich, forgejo, send, ytdlp, servarr, ebooks, novelapp, freedify, affine, health, f1-stream, kms, k8s-portal, etc. — 24 total). Proxied apps are unaffected — they arrive via the cloudflared tunnel and get real IPs through Cloudflare'sX-Forwarded-For. -
HTTP/3 / QUIC does not complete for the direct apps. An external probe (
http3check.net) confirms "QUIC connection could not be established" despiteAlt-Svc: h3being advertised and UDP 443 reaching Traefik (verified: pfSense NATs UDP 443 → Traefik LB; Traefik binds UDP 8443). Same root cause:ETP=Cluster+ 3 replicas means kube-proxy SNATs and can spread the UDP flow across pods, which breaks the QUIC handshake.
Both are fixed by externalTrafficPolicy: Local on the Traefik LB (no SNAT →
real client IPs preserved → QUIC stays pinned to one pod).
Why we can't just flip ETP on the current IP
Traefik currently shares MetalLB IP 10.0.20.200 with 9 other services
via metallb.io/allow-shared-ip:
dbaas/postgresql-lb (Terraform state backend), headscale/headscale-server,
wireguard/wireguard, coturn/coturn, xray/xray-reality,
shadowsocks/shadowsocks, beads-server/dolt, servarr/qbittorrent-torrenting,
tor-proxy/torrserver-bt.
Per MetalLB docs, services sharing an IP must all use Cluster (or point
to identical pods). Mixing Local and Cluster on a shared IP is not
allowed and would break the IP allocation — taking down all ingress and
the Terraform state DB (locking out terragrunt itself), plus VPN/DNS path.
→ Traefik must move to its own IP.
Target state
- New dedicated MetalLB IP
10.0.20.203(free; pool is10.0.20.200-220), not shared,externalTrafficPolicy: Local, for the Traefik LB. 10.0.20.200keeps the other 9 services unchanged (still allCluster).- Internal split-horizon DNS apex
viktorbarzin.me A→10.0.20.203(currently10.0.20.200). All*.viktorbarzin.meCNAME → apex, so this one record moves every internal ingress hostname. - pfSense: the WAN 443 (TCP and UDP) port-forward target moves from the
<nginx>alias to a new pfSense alias for10.0.20.203(per request: define a VIP/alias, do not hardcode the IP in rules — matches the existing<nginx>/<k8s_shared_lb>alias pattern).
Key decisions
- Dedicated IP, not shared — forced by the MetalLB mixed-ETP rule above.
10.0.20.203— first free IP after technitium (.201) and kms (.202).- pfSense reference by alias, not literal IP (user requirement) — create
alias e.g.
traefik_lb=10.0.20.203, reference it in the rdr + firewall pass rule. One place to change later. - Cutover style — two options, decided at review (see plan):
- In-place (recommended for maintainability): change the Helm Service to the new IP + ETP=Local in one edit; brief cutover window (mitigated by pre-lowering DNS TTL + staging the pfSense change).
- Additive (zero-downtime): stand up a second LB Service on
.203(ETP=Local) alongside the existing.200one, cut DNS/pfSense over, then retire Traefik from.200. More moving parts to maintain.
Risks & watch-items
- Terraform state backend lives on
.200— every phase must verifydbaas/postgresql-lb:5432stays reachable. We never touch.200's config, only remove Traefik from it at the end; low risk but explicitly checked. - Live-firewall edit (pfSense rdr + alias) — done via the pfSense UI
(persisted in config.xml); CLI
pfctledits don't persist. Per the network-device rule, this step is operator-driven/confirmed, not automated. - CrowdSec behavior change — once it sees real public IPs on direct
apps, it will start making real ban decisions there. Confirm the security
allowlist (source-IP allowlist
10.0.20.0/22,192.168.1.0/24, tailnet; identityme@viktorbarzin.me) is correct so family/legit IPs aren't banned. - MetalLB ETP=Local node election —
.203is announced only from a node running a ready Traefik pod. Traefik has 3 replicas (node4, node5, +1) and PDB minAvailable=2, so ≥2 eligible nodes always exist; re-elects on failure. - Cloudflare-proxied apps route via the cloudflared tunnel → Traefik ClusterIP, not the LB IP, so they are unaffected — verified in plan.
- Cutover window for the in-place option — keep it short; have rollback staged.
Out of scope
- No change to the 9 services on
10.0.20.200. - No change to Cloudflare-proxied apps' path.
- No re-architecture of the pfSense↔K8s ingress beyond the 443 target move.
Affected docs (update on apply)
.claude/CLAUDE.md(Networking & Resilience / Service-Specific notes)docs/architecture/networking.md(or equivalent — Traefik LB IP, ETP)docs/runbooks/— add a short "Traefik LB IP / ETP" runbook entry.claude/reference/service-catalog.mdif it records LB IPs- memory: update the QUIC/ingress entries (ids 3241-3246)