forgejo pulls: route *.viktorbarzin.me to Technitium, drop /etc/hosts pins [ci skip]

Supersedes this morning's per-node /etc/hosts pin (no hardcoded service IPs on nodes, per Viktor).

Technitium's split-horizon zone already resolves forgejo.viktorbarzin.me -> CNAME apex -> live Traefik LB IP (ingress-dns-sync auto-CNAMEs every ingress host; apex drift probe alerts) -- the nodes just never queried it. Rolled the devvm's systemd-resolved routing-domain pattern (~viktorbarzin.me -> 10.0.20.201) to all 7 nodes, removed the pins, verified getent + crictl pull via pure DNS.

Also demoted node5/6's cloud-init global-dns.conf (DNS=8.8.8.8 1.1.1.1) to FallbackDNS-only: public servers in the global set race the routing domain. Its justification ("Technitium NXDOMAINs forgejo") was obsolete -- exactly the stale comment that pointed new nodes at the hairpin.

hosts.toml mirror kept but documented as vestigial (Traefik 404s bare-IP requests; registry auth realm is an absolute URL).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 07:56:31 +00:00 · 1ee1bf0817
commit 1ee1bf0817
parent b6976ce014
7 changed files with 135 additions and 66 deletions
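
The "public servers race the routing domain" point from the commit message can be illustrated with a small sketch. It assumes a simplified model of systemd-resolved drop-in merging: files apply in filename order, the `DNS=`, `FallbackDNS=` and `Domains=` lists accumulate across files, and an empty assignment resets a list. The function names and the `BEFORE`/`AFTER` samples are hypothetical; only the file names and settings come from this commit. It also assumes plain IP addresses in `DNS=` (no port or `#name` suffixes).

```python
import configparser
import ipaddress

LIST_KEYS = ("DNS", "FallbackDNS", "Domains")


def merged_resolve_settings(dropins):
    """Merge [Resolve] list settings from {filename: text}, in filename order.

    Simplified model (an assumption, not resolved's full parser): lists
    accumulate across files and an empty assignment resets the list.
    """
    merged = {key: [] for key in LIST_KEYS}
    for name in sorted(dropins):
        parser = configparser.ConfigParser(strict=False, interpolation=None)
        parser.optionxform = str  # keep key case (DNS, Domains, ...)
        parser.read_string(dropins[name])
        if not parser.has_section("Resolve"):
            continue
        for key in LIST_KEYS:
            if parser.has_option("Resolve", key):
                values = parser.get("Resolve", key).split()
                if values:
                    merged[key].extend(values)
                else:
                    merged[key] = []
    return merged


def public_servers_in_global_set(merged):
    """Return global DNS= servers that are publicly routable addresses."""
    return [s for s in merged["DNS"] if ipaddress.ip_address(s).is_global]


# Hypothetical reconstruction of the node5/6 drop-ins before and after.
ROUTING = "[Resolve]\nDNS=10.0.20.201\nDomains=~viktorbarzin.me\n"
BEFORE = {
    "global-dns.conf": "[Resolve]\nDNS=8.8.8.8 1.1.1.1\n",
    "viktorbarzin.conf": ROUTING,
}
AFTER = {
    "global-dns.conf": "[Resolve]\nFallbackDNS=8.8.8.8 1.1.1.1\n",
    "viktorbarzin.conf": ROUTING,
}

if __name__ == "__main__":
    for label, dropins in (("before", BEFORE), ("after", AFTER)):
        merged = merged_resolve_settings(dropins)
        print(label, merged["DNS"], public_servers_in_global_set(merged))
```

In this model the "before" set ends up with the public servers sitting next to `10.0.20.201` in the same global list, which is the condition the commit removes; a non-empty result from `public_servers_in_global_set` could serve as a lint failure for new drop-ins.
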
--- a/docs/architecture/dns.md
+++ b/docs/architecture/dns.md
@@ -269,10 +269,11 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons

 - **Affected**: Non-proxied domains (ha-sofia, immich, headscale, calibre, vaultwarden, etc.) for 192.168.1.x clients
 - **Not affected**: Cloudflare-proxied domains (resolve to Cloudflare edge IPs, no translation needed)
- **Not affected**: 10.0.x.x and K8s clients — these resolve non-proxied domains to the public IP and rely on pfSense NAT reflection, which is **intermittently broken** (observed i/o timeouts to `176.12.22.76:443` from k8s nodes and the devvm, 2026-06-04 → 2026-06-10). Hairpin-sensitive paths on this network get explicit per-leg fixes instead:
-  - **kubelet image pulls of `forgejo.viktorbarzin.me`**: `/etc/hosts` pin `10.0.20.203 forgejo.viktorbarzin.me` on every k8s node (marker `forgejo-internal-pin`; deployed via `modules/create-template-vm/k8s-node-containerd-setup.sh` for new nodes, `scripts/setup-forgejo-containerd-mirror.sh` rollout for existing ones). The containerd hosts.toml mirror alone is insufficient — Traefik 404s its bare-IP requests (no Host/SNI match) and the registry's Bearer auth realm is an absolute public URL fetched outside the mirror. Root cause of the 2026-06-10 tuya-bridge outage (`docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`).
-  - **in-cluster pods → forgejo**: CoreDNS `rewrite name exact forgejo.viktorbarzin.me traefik.traefik.svc.cluster.local` (2026-06-04, beads code-yh33).
-  - **devvm git → forgejo**: still exposed to the hairpin (manual `/etc/hosts` pin workaround when it flares).
+- **Not affected**: 10.0.x.x and K8s clients — these resolve non-proxied domains to the public IP and rely on pfSense NAT reflection, which is **intermittently broken** (observed i/o timeouts to `176.12.22.76:443` from k8s nodes and the devvm, 2026-06-04 → 2026-06-10). Hairpin-sensitive paths on this network route `*.viktorbarzin.me` to Technitium instead, via a systemd-resolved **routing domain** (`/etc/systemd/resolved.conf.d/viktorbarzin.conf`: `DNS=10.0.20.201`, `Domains=~viktorbarzin.me`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary) — no hardcoded service IPs on clients:
+  - **k8s nodes (kubelet image pulls of `forgejo.viktorbarzin.me`)**: routing-domain drop-in on all 7 nodes (2026-06-10, replacing a same-day `/etc/hosts` pin; deployed via `modules/create-template-vm/cloud_init.yaml` for new nodes, `scripts/setup-forgejo-containerd-mirror.sh` rollout for existing ones). The containerd hosts.toml mirror alone is insufficient — Traefik 404s its bare-IP requests (no Host/SNI match) and the registry's Bearer auth realm is an absolute public URL fetched outside the mirror. Caution: public servers must NOT sit in the nodes' global resolved `DNS=` set — they merge with and race the routing domain (the old node5/6 `global-dns.conf` did exactly this; now `FallbackDNS=` only). Root cause analysis: `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`.
+  - **devvm**: same `viktorbarzin.conf` drop-in (predates the node rollout; provisioned by `setup-devvm.sh`).
+  - **in-cluster pods → forgejo**: CoreDNS `rewrite name exact forgejo.viktorbarzin.me traefik.traefik.svc.cluster.local` (2026-06-04, beads code-yh33) — pods bypass node resolved entirely.
+  - **Trade-off**: `*.viktorbarzin.me` resolution from nodes/devvm now depends on in-cluster Technitium (3 replicas). During a full cluster outage these names SERVFAIL — acceptable, the services behind them are down anyway; bootstrap images pull via the IP-addressed `10.0.20.10` mirrors, so cold-start self-unwinds.

 Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).

--- a/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
+++ b/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
@@ -59,28 +59,55 @@ CoreDNS forgejo rewrite (2026-06-04) covers pods only, not kubelet.

 ## Fix

-`/etc/hosts` pin on every k8s node (hot, no drain, no containerd restart):
+**Initial mitigation (same morning):** `/etc/hosts` pin
+`10.0.20.203 forgejo.viktorbarzin.me` on every node — restored service
+immediately (resolve + token + blob legs all internal with correct SNI).
+
+**Superseded same day (Viktor: "no hardcoded IPs in nodes") by a DNS-based
+fix.** Discovery: Technitium's split-horizon zone *already* resolves
+`forgejo.viktorbarzin.me → CNAME viktorbarzin.me → A <live Traefik IP>` —
+the `technitium-ingress-dns-sync` CronJob auto-CNAMEs every ingress host
+hourly, the apex A record tracks the live Traefik LB IP, and the
+`viktorbarzin-apex-probe` canary alerts on drift. The nodes simply never
+queried Technitium (resolver chain: pfSense + public AdGuard fallback). The
+devvm already solved this with a systemd-resolved **routing domain**
+drop-in; the same was rolled to all 7 nodes:

 ```
-10.0.20.203 forgejo.viktorbarzin.me # forgejo-internal-pin (managed: setup-forgejo-containerd-mirror.sh)
+# /etc/systemd/resolved.conf.d/viktorbarzin.conf
+[Resolve]
+DNS=10.0.20.201
+Domains=~viktorbarzin.me
 ```

-Go's resolver (containerd) consults `/etc/hosts` first, so resolve + token
-+ blob legs all go to internal Traefik with correct SNI and a valid
-wildcard cert (no `skip_verify` needed on this path). Applied live to all
-7 nodes; persisted in `modules/create-template-vm/k8s-node-containerd-setup.sh`
-(new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing-node
-rollout). hosts.toml mirror left in place (harmless, uniform config).
+The `/etc/hosts` pins were then removed (verified `getent` still returns
+the Traefik IP via DNS, and `crictl pull` succeeds). On node5/6 the
+cloud-init `global-dns.conf` (`DNS=8.8.8.8 1.1.1.1`) was demoted to
+`FallbackDNS=` only — public servers in the global set merge with and
+race the routing domain. That file's original justification ("Technitium
+NXDOMAINs forgejo.viktorbarzin.me") was obsolete: the ingress-dns-sync
+has since added forgejo to the zone — a stale comment that actively
+pointed new nodes at the hairpin.

-**Renumber hazard:** the pin hardcodes Traefik's LB IP, same as the
-hosts.toml mirror and the 5 literals broken by the 2026-05-30 `.200→.203`
-move. Any future Traefik LB renumber must update both (grep nodes for
-`forgejo-internal-pin`).
+Persisted in `modules/create-template-vm/cloud_init.yaml` (new nodes; DNS
+drop-ins) and `scripts/setup-forgejo-containerd-mirror.sh` (existing-node
+rollout). hosts.toml mirror left in place but documented as vestigial.
+
+**Renumber hazard: resolved.** A future Traefik LB renumber propagates
+via the apex A record automatically (drift probe alerts if it doesn't);
+only the vestigial hosts.toml literal goes stale. **New trade-off:**
+`*.viktorbarzin.me` resolution from nodes now depends on in-cluster
+Technitium (3 replicas); in a full cluster outage these names SERVFAIL —
+acceptable, the services are down anyway, and bootstrap images pull via
+the IP-addressed `10.0.20.10` mirrors.

 ## Verification

- `getent hosts forgejo.viktorbarzin.me` → `10.0.20.203` on all 7 nodes;
-  `curl https://forgejo.viktorbarzin.me/v2/` → 401 (internal route, valid TLS).
+- `getent hosts forgejo.viktorbarzin.me` → `10.0.20.203` on all 7 nodes
+  **with no `/etc/hosts` entry** (pure DNS via the routing domain);
+  `resolvectl status` shows `~viktorbarzin.me` routed to `10.0.20.201`;
+  general resolution (`getent hosts google.com`) intact on every node;
+  `crictl pull` of the tuya_bridge image succeeds via the DNS path.
 - tuya-bridge pod Running; `/health` `ok=true`; 27/27 devices
  `success=true`; 7/7 `*_tuya_cloud_up` gauges = 1; no tuya-related alerts.

@@ -90,10 +117,16 @@ move. Any future Traefik LB renumber must update both (grep nodes for
  latency bomb with the blast delayed until the cache misses.
 - Registry token realms are absolute URLs: any "redirect the registry"
  scheme must also redirect the *name*, not just the endpoint.
- The remaining hairpin-exposed leg is **devvm git** (manual `/etc/hosts`
-  workaround documented in memory); a durable LAN-wide fix would need
-  pfSense Unbound host overrides (live network device — deliberate,
-  separate change).
+- Before inventing a redirect mechanism, check what the DNS authority
+  already serves: the Technitium split-horizon zone had the correct,
+  auto-maintained answer all along — the clients just weren't asking it.
+- Stale config comments are load-bearing: the obsolete "Technitium
+  NXDOMAINs forgejo" comment in cloud-init steered new nodes onto public
+  DNS, recreating the hairpin exposure on every node added after it.
+- All `10.0.x` legs are now DNS-routed (nodes + devvm via routing domain,
+  pods via CoreDNS rewrite). pfSense Unbound host overrides remain an
+  option for other LAN segments if a non-Technitium client ever needs
+  internal answers (live network device — deliberate, separate change).

 ## Related

--- a/docs/runbooks/forgejo-registry-setup.md
+++ b/docs/runbooks/forgejo-registry-setup.md
@@ -119,9 +119,9 @@ cd infra/stacks/kyverno && scripts/tg apply
 cd infra/stacks/monitoring && scripts/tg apply
 cd infra/stacks/forgejo && scripts/tg apply

-# Containerd hosts.toml + /etc/hosts pin on each existing k8s node — VM
-# cloud-init only fires on first boot. The /etc/hosts pin
-# (10.0.20.203 forgejo.viktorbarzin.me) is what makes pulls hairpin-proof:
+# Resolved routing domain (+ vestigial containerd hosts.toml) on each
+# existing k8s node — VM cloud-init only fires on first boot. The routing
+# domain (~viktorbarzin.me -> Technitium) is what makes pulls hairpin-proof:
 # the hosts.toml mirror alone falls back to public DNS (Traefik 404s its
 # bare-IP requests, and the registry auth realm is an absolute public URL).
 infra/scripts/setup-forgejo-containerd-mirror.sh
@@ -138,9 +138,11 @@ docker pull alpine:3.20
 docker tag alpine:3.20 forgejo.viktorbarzin.me/viktor/smoketest:1
 docker push forgejo.viktorbarzin.me/viktor/smoketest:1

-# Per-node pull path: pin present + name resolves internally + pull works.
-ssh wizard@<node> 'grep forgejo-internal-pin /etc/hosts && getent hosts forgejo.viktorbarzin.me'
-# Expect: 10.0.20.203  forgejo.viktorbarzin.me
+# Per-node pull path: routing domain active + name resolves to the live
+# Traefik LB (via Technitium split-horizon zone) + pull works.
+ssh wizard@<node> 'resolvectl status | grep -A2 "~viktorbarzin.me"; getent hosts forgejo.viktorbarzin.me'
+# Expect: DNS Domain ~viktorbarzin.me on server 10.0.20.201, and
+#         getent -> the current Traefik LB IP (10.0.20.203 today)
 ssh wizard@<node> sudo crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1

 # Confirm the cluster-wide Secret was synced into a fresh namespace.
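
The per-node check in the runbook hunk above confirms the routing domain is active; a complementary check is that no leftover `/etc/hosts` pin is masking the DNS answer. A minimal sketch of that check follows, assuming the standard hosts-file layout (address, then names, with `#` starting a comment). The function name `pinned_addresses` is hypothetical and not part of any script in this repo.

```python
def pinned_addresses(hosts_text, hostname):
    """Return the addresses that hosts-file text maps to hostname."""
    wanted = hostname.lower()
    pins = []
    for raw in hosts_text.splitlines():
        # Drop trailing comments such as the old "# forgejo-internal-pin" marker.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        if wanted in (name.lower() for name in names):
            pins.append(address)
    return pins


if __name__ == "__main__":
    with open("/etc/hosts", encoding="utf-8") as handle:
        leftover = pinned_addresses(handle.read(), "forgejo.viktorbarzin.me")
    # An empty list means getent's answer comes from DNS, not a pin.
    print(leftover)
```

Run on a node, an empty list is consistent with the "pure DNS via the routing domain" verification in the post-mortem; any address in the list would mean the old pin is still in place.
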