coredns: pods get internal split-horizon answers for viktorbarzin.me [ci skip]

Forward the viktorbarzin.me:53 pod block to the Technitium ClusterIP (10.96.0.53, same as the .lan block) instead of 8.8.8.8/1.1.1.1. Pods become ordinary internal clients (CNAME -> apex -> live Traefik LB; mail -> 10.0.20.1), fixing the 27 non-proxied [External] uptime-kuma monitors that rode the TP-Link NAT loopback (hard-down since 06-09; loopback refuses flows whose source equals the reflection target, which all pfSense-SNAT'd cluster traffic does). Enabled by re-testing a stale premise: on k8s 1.34 pods DO reach the ETP=Local Traefik LB IP (kube-proxy short-circuits in-cluster traffic to LB IPs; verified from pods on three non-Traefik nodes) — re-verify after major k8s upgrades; canary = [External] fleet going red. The NAT-layer alternatives (pfSense rdr, SNAT-drop) were rejected: both fight return-path asymmetry and deepen TP-Link dependency. Verified in-pod: immich -> .203 + HTTPS 200, mail -> 10.0.20.1, forgejo -> Traefik ClusterIP (pin kept for Technitium-outage resilience). Proxied [External] monitors now test the internal path — true edge fidelity moves to the external vantage (ha-london, next fix). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 16:21:34 +00:00 · 2026-06-10 16:21:34 +00:00 · 59a531b8e0
commit 59a531b8e0
parent 35c89fa90c
5 changed files with 36 additions and 15 deletions
--- a/docs/architecture/dns.md
+++ b/docs/architecture/dns.md
@ -271,7 +271,7 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons
 - **Not affected**: Cloudflare-proxied domains (resolve to Cloudflare edge IPs, no translation needed)
 - **10.0.x.x clients (k8s nodes, devvm, other VMs)** — handled at the resolver since 2026-06-10: **pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium** (`10.0.20.201`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary). Every client of pfSense Unbound — all VLANs, k8s nodes included — therefore gets internal answers with **zero per-host configuration** (no `/etc/hosts` pins, no resolved drop-ins; both earlier same-day approaches were removed, nodes are stock). Names not behind Traefik keep distinct records in the zone (e.g. `mail.viktorbarzin.me → 10.0.20.1`, verified working on :993/:25). See `docs/runbooks/pfsense-unbound.md` for the override config + rollback, and `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md` for the incident that motivated this (kubelet forgejo pulls riding the broken hairpin; the containerd hosts.toml mirror cannot fix it — Traefik 404s bare-IP requests and the registry auth realm is an absolute public URL).
  - **devvm**: also covered by a `~viktorbarzin.me → 10.0.20.201` resolved routing domain (predates the pfSense override, provisioned by `setup-devvm.sh`) — redundant-but-harmless belt-and-suspenders.
-  - **in-cluster PODS are deliberately carved out**: the Traefik LB IP is `externalTrafficPolicy=Local` and unreachable from pods, so the `.203` answers pfSense now returns must NOT reach them. CoreDNS has a dedicated `viktorbarzin.me:53` block (in `stacks/technitium`, TF-managed): forgejo is pinned to Traefik's **ClusterIP** (interpolated from the live Service at plan time) and all other `.me` names forward to `8.8.8.8/1.1.1.1` — preserving pods' pre-existing public-IP behavior (beads code-yh33).
+  - **in-cluster PODS are ordinary internal clients too** (since 2026-06-10 evening): CoreDNS's dedicated `viktorbarzin.me:53` block (in `stacks/technitium`, TF-managed) forwards to the Technitium ClusterIP (`10.96.0.53`, same as the `.lan` block), so pods get the same split-horizon answers as everyone else. This works because on k8s 1.34 **pods CAN reach the ETP=Local Traefik LB IP** — kube-proxy short-circuits in-cluster traffic to LB IPs via the cluster path (verified from pods on three non-Traefik nodes; re-verify after major k8s upgrades — the canary is the uptime-kuma `[External]` fleet going red). forgejo stays pinned to Traefik's **ClusterIP** in the same block so CI pushes survive a Technitium outage. History: the block briefly forwarded to `8.8.8.8/1.1.1.1` (morning of 2026-06-10), which kept pods on public IPs and the broken TP-Link NAT loopback — 27 non-proxied `[External]` uptime-kuma monitors dark (beads code-yh33). Note: in-cluster `[External]` monitors now test DNS+Traefik+service via the internal path for ALL names, including Cloudflare-proxied ones — genuine edge-path fidelity is the job of a true external vantage (ha-london), not in-cluster probes.
  - **Trade-off**: `viktorbarzin.me` resolution via pfSense now depends on in-cluster Technitium (3 replicas). During a full cluster outage the zone SERVFAILs LAN-wide — acceptable, the services behind it are down anyway; node bootstrap images pull via the IP-addressed `10.0.20.10` mirrors, so cold-start self-unwinds.
  - **Residual nondeterminism**: nodes keep `94.140.14.14` as a secondary resolver (netplan/qm `--nameserver`). If systemd-resolved fails over to it during a pfSense DNS blip, `.me` answers are public again until it switches back — a rare, self-healing window, accepted.

--- a/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
+++ b/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
@ -108,6 +108,19 @@ replaced them:
   `.me` names to `8.8.8.8/1.1.1.1`, preserving pods' pre-existing
   public-IP behavior. Replaces the old forgejo rewrite in `.:53`.

+   **Addendum (same day, evening):** the "pods cannot reach the
+   ETP=Local LB IP" premise was re-tested and is FALSE on k8s 1.34
+   (kube-proxy short-circuits in-cluster traffic to LB IPs via the
+   cluster path; verified from pods on three non-Traefik nodes). The
+   public-answer carve-out had meanwhile left pods as the only client
+   class still riding the TP-Link NAT loopback, which hard-died
+   2026-06-09 — 27 non-proxied `[External]` uptime-kuma monitors dark.
+   Fix: the block now forwards to the Technitium ClusterIP
+   (`10.96.0.53`) — pods are ordinary internal clients; forgejo pin
+   kept for Technitium-outage resilience. In-cluster `[External]`
+   monitors now test the internal path for all names; genuine
+   edge-path fidelity belongs to a true external vantage (ha-london).
+
 node5/6 were also re-pointed from link-DNS=Technitium to
 `10.0.20.1 94.140.14.14` (netplan + `qm set --nameserver` on PVE VMs
 205/206) for fleet parity, and their `global-dns.conf` was deleted.