forgejo pulls: pin registry name to internal Traefik in node /etc/hosts [ci skip]

tuya-bridge was down 7.5h (ImagePullBackOff on k8s-node3): fresh kubelet
pulls of forgejo.viktorbarzin.me images depended on the intermittently
broken public-IP hairpin. The containerd hosts.toml mirror cannot keep
pulls internal on its own: Traefik 404s its bare-IP requests (no
Host/SNI match) and the registry Bearer realm is an absolute public URL
fetched outside the mirror. Third incident of this class (buildkit
06-04, tripit/devvm 06-09).

Fix: /etc/hosts pin 10.0.20.203 forgejo.viktorbarzin.me on every node;
this covers resolve + token + blob legs with correct SNI and valid cert.
Applied live to all 7 nodes; persisted in the cloud-init bootstrap and
the existing-node rollout script.

Docs updated (registry bullet, dns.md hairpin scope + stale .200
literals, runbook) + post-mortem.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 07:15:24 +00:00 · 2026-06-10 07:15:24 +00:00 · b6976ce014
commit b6976ce014
parent eb8695743b
6 changed files with 150 additions and 11 deletions
--- a/docs/architecture/dns.md
+++ b/docs/architecture/dns.md
@@ -258,16 +258,21 @@ The TP-Link AP (dumb AP on 192.168.1.x) does not support hairpin NAT. LAN client
 Technitium's **Split Horizon AddressTranslation** app post-processes DNS responses for 192.168.1.0/24 clients, translating the public IP to the internal Traefik LB IP:

 ```
-176.12.22.76 → 10.0.20.200
+176.12.22.76 → 10.0.20.203
 ```

+(Was `10.0.20.200` until Traefik's 2026-05-30 move to its dedicated `.203` LB IP.)
+
 **DNS Rebinding Protection** has `viktorbarzin.me` in `privateDomains` to allow the translated private IP without being stripped as a rebinding attack.

 ### Scope

 - **Affected**: Non-proxied domains (ha-sofia, immich, headscale, calibre, vaultwarden, etc.) for 192.168.1.x clients
 - **Not affected**: Cloudflare-proxied domains (resolve to Cloudflare edge IPs, no translation needed)
-- **Not affected**: 10.0.x.x and K8s clients (reach public IP via pfSense outbound NAT normally)
+- **Not affected**: 10.0.x.x and K8s clients — these resolve non-proxied domains to the public IP and rely on pfSense NAT reflection, which is **intermittently broken** (observed i/o timeouts to `176.12.22.76:443` from k8s nodes and the devvm, 2026-06-04 → 2026-06-10). Hairpin-sensitive paths on this network get explicit per-leg fixes instead:
+  - **kubelet image pulls of `forgejo.viktorbarzin.me`**: `/etc/hosts` pin `10.0.20.203 forgejo.viktorbarzin.me` on every k8s node (marker `forgejo-internal-pin`; deployed via `modules/create-template-vm/k8s-node-containerd-setup.sh` for new nodes, `scripts/setup-forgejo-containerd-mirror.sh` rollout for existing ones). The containerd hosts.toml mirror alone is insufficient — Traefik 404s its bare-IP requests (no Host/SNI match) and the registry's Bearer auth realm is an absolute public URL fetched outside the mirror. Root cause of the 2026-06-10 tuya-bridge outage (`docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`).
+  - **in-cluster pods → forgejo**: CoreDNS `rewrite name exact forgejo.viktorbarzin.me traefik.traefik.svc.cluster.local` (2026-06-04, beads code-yh33).
+  - **devvm git → forgejo**: still exposed to the hairpin (manual `/etc/hosts` pin workaround when it flares).

 Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).

@@ -462,7 +467,7 @@ post-processing does NOT run for 192.168.1.x clients anymore. Non-proxied
 services break hairpin on LAN clients again. Options:

 1. **Switch service to proxied Cloudflare** (preferred) — set `dns_type = "proxied"` in the `ingress_factory` module call; DNS now resolves to Cloudflare edge, hairpin-independent.
-2. **Add a local-data override on pfSense Unbound** — under `Services → DNS Resolver → Host Overrides`, set `<service>.viktorbarzin.me → 10.0.20.200` (Traefik LB IP). This is equivalent to what Split Horizon did, applied at the resolver.
+2. **Add a local-data override on pfSense Unbound** — under `Services → DNS Resolver → Host Overrides`, set `<service>.viktorbarzin.me → 10.0.20.203` (Traefik LB IP). This is equivalent to what Split Horizon did, applied at the resolver.
 3. **Revert to prior NAT rdr + Technitium Split Horizon** — documented in `docs/runbooks/pfsense-unbound.md` rollback section.

 K8s-side Split Horizon is still configured and applies when `*.viktorbarzin.me` queries DO reach Technitium (e.g., from pods that query via CoreDNS → Technitium forwarding for `.viktorbarzin.me` via pfSense). Verify Technitium split-horizon app:
@@ -470,7 +475,7 @@ K8s-side Split Horizon is still configured and applies when `*.viktorbarzin.me`
 1. Verify Split Horizon app is installed on all instances
 2. Check CronJob status: `kubectl get cronjob -n technitium technitium-split-horizon-sync`
 3. Run the job manually: `kubectl create job --from=cronjob/technitium-split-horizon-sync test-sh -n technitium`
-4. Test: `dig @10.0.20.201 immich.viktorbarzin.me` — should return 10.0.20.200 for 192.168.1.x source
+4. Test: `dig @10.0.20.201 immich.viktorbarzin.me` — should return 10.0.20.203 for 192.168.1.x source

 ### Zone Not Replicating to Secondary/Tertiary

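The Scope bullets in the dns.md hunk above say that a kubelet pull has several legs, and that the token leg follows the registry's absolute Bearer realm rather than the mirror. A minimal sketch of tracing that leg on one host is shown below. The function names are hypothetical and are not part of this commit; the two IPs are the ones quoted in the diff.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; the IPs come from the diff above.
TRAEFIK_LB_IP=10.0.20.203   # internal Traefik LB IP
PUBLIC_IP=176.12.22.76      # public IP, reached via pfSense NAT reflection

# Extract the host from the realm="..." URL of a WWW-Authenticate header value.
realm_host() {
    printf '%s\n' "$1" | sed -n 's|.*realm="https\{0,1\}://\([^/":]*\).*|\1|p'
}

# Label an address by the path a connection to it would take on this network.
classify_addr() {
    case "$1" in
        "$TRAEFIK_LB_IP") echo internal ;;
        "$PUBLIC_IP")     echo hairpin ;;
        "")               echo unresolved ;;
        *)                echo other ;;
    esac
}

# Resolve a name through NSS (so /etc/hosts is honoured) and label the result.
check_name() {
    addr=$(getent hosts "$1" | awk 'NR==1 {print $1}')
    printf '%s %s %s\n' "$1" "${addr:-none}" "$(classify_addr "$addr")"
}

# Example use on a node (performs a network request, so left commented out):
#   challenge=$(curl -s -o /dev/null -D - https://forgejo.viktorbarzin.me/v2/ \
#       | grep -i '^www-authenticate:')
#   check_name "$(realm_host "$challenge")"
```

If the realm host is labelled `hairpin`, the token leg still depends on NAT reflection on that host, whatever the hosts.toml mirror says.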
--- a/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
+++ b/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
@@ -0,0 +1,105 @@
+# 2026-06-10 — tuya-bridge down 7.5h: forgejo image pulls ride the public-IP hairpin
+
+## Impact
+
+- `tuya-bridge` (Flask/tinytuya bridge feeding HA-Sofia's ATS, fuse-main,
+  fuse-garage and 4 thermostat REST sensors) unavailable ~02:15–09:50 EEST.
+  HA REST sensors 503'd; the official-tuya integration devices were
+  unaffected (hybrid architecture limited the blast radius to the 3 power
+  devices' advanced telemetry + thermostat extras).
+- Third incident from the same root cause class:
+  Woodpecker buildkit pushes (2026-06-04, code-yh33), tripit
+  ImagePullBackOff on node2/node3 + devvm git timeouts (2026-06-09),
+  tuya-bridge (this one).
+
+## Timeline (EEST)
+
+- **02:15** — tuya-bridge pod rescheduled onto `k8s-node3` (its previous
+  node5/6-era home was rebuilt 14d ago; the forgejo-path image was never
+  cached on node3 — only stale `docker.io/*` copies). Kubelet must pull
+  `forgejo.viktorbarzin.me/viktor/tuya_bridge:3216c87a`.
+- **02:15→09:30** — 51 consecutive pull failures:
+  `dial tcp 176.12.22.76:443: i/o timeout` → ImagePullBackOff. HA shows
+  503s (emo observed at 02:20).
+- **09:40** — investigation: forgejo healthy via internal Traefik
+  (`10.0.20.203`), manifest exists; node3's hosts.toml mirror present and
+  correct; bare-IP request to the mirror returns **404 from Traefik**;
+  registry auth realm is the **absolute** public URL.
+- **09:48** — `/etc/hosts` pin `10.0.20.203 forgejo.viktorbarzin.me` added
+  on node3; `crictl pull` succeeds immediately; pod replaced → Running;
+  `/health` ok; all 27 device `getstatus()` calls succeed; all 7
+  `*_tuya_cloud_up` Prometheus gauges = 1.
+- **10:05** — pin rolled to all 7 nodes; provisioning scripts + docs updated.
+
+## Root cause
+
+Fresh kubelet pulls of `forgejo.viktorbarzin.me` images depend on pfSense
+NAT reflection of the public IP `176.12.22.76`, which is intermittently
+broken from the `10.0.20.0/24` network. The containerd
+`certs.d/.../hosts.toml` mirror that was *believed* to keep pulls internal
+cannot do so, for two independent reasons:
+
+1. **Traefik routes by Host/SNI.** The mirror entry
+   `[host."https://10.0.20.203"]` makes containerd dial the bare IP (no
+   SNI, `Host: 10.0.20.203`), so no Traefik router matches → **404** →
+   containerd treats the mirror as a miss and falls back to
+   `server = "https://forgejo.viktorbarzin.me"` → public DNS → hairpin.
+2. **The Bearer auth realm is absolute.** `/v2/` challenges with
+   `realm="https://forgejo.viktorbarzin.me/v2/token"`; containerd fetches
+   that URL verbatim — this leg never goes through the mirror at all.
+
+So every fresh pull silently depended on hairpin luck. Cached images masked
+the problem; it only fired when a pod landed on a node without the image
+(node rebuilds, new nodes, evictions, new tags).
+
+Why DNS-side fixes don't reach this path: nodes resolve via systemd-resolved
+→ pfSense (10.0.20.1) + public fallback (94.140.14.14), so Technitium
+split-horizon (scoped to `192.168.1.0/24` clients) never applies; the
+CoreDNS forgejo rewrite (2026-06-04) covers pods only, not kubelet.
+
+## Fix
+
+`/etc/hosts` pin on every k8s node (hot, no drain, no containerd restart):
+
+```
+10.0.20.203 forgejo.viktorbarzin.me # forgejo-internal-pin (managed: setup-forgejo-containerd-mirror.sh)
+```
+
+Go's resolver (containerd) consults `/etc/hosts` first, so the resolve, token
+and blob legs all go to internal Traefik with correct SNI and a valid
+wildcard cert (no `skip_verify` needed on this path). Applied live to all
+7 nodes; persisted in `modules/create-template-vm/k8s-node-containerd-setup.sh`
+(new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing-node
+rollout). hosts.toml mirror left in place (harmless, uniform config).
+
+**Renumber hazard:** the pin hardcodes Traefik's LB IP, same as the
+hosts.toml mirror and the 5 literals broken by the 2026-05-30 `.200→.203`
+move. Any future Traefik LB renumber must update both (grep nodes for
+`forgejo-internal-pin`).
+
+## Verification
+
+- `getent hosts forgejo.viktorbarzin.me` → `10.0.20.203` on all 7 nodes;
+  `curl https://forgejo.viktorbarzin.me/v2/` → 401 (internal route, valid TLS).
+- tuya-bridge pod Running; `/health` `ok=true`; 27/27 devices
+  `success=true`; 7/7 `*_tuya_cloud_up` gauges = 1; no tuya-related alerts.
+
+## Lessons
+
+- A mirror that *can* fall back to a broken path is not a fix — it's a
+  latency bomb with the blast delayed until the cache misses.
+- Registry token realms are absolute URLs: any "redirect the registry"
+  scheme must also redirect the *name*, not just the endpoint.
+- The remaining hairpin-exposed leg is **devvm git** (manual `/etc/hosts`
+  workaround documented in memory); a durable LAN-wide fix would need
+  pfSense Unbound host overrides (live network device — deliberate,
+  separate change).
+
+## Related
+
+- Beads `code-2or8` (Tuya Cloud subscription) — verified resolved during
+  this incident: subscription is active again, all gauges green; closed.
+- 2026-06-09 tripit ImagePullBackOff — same cause, self-recovered when the
+  hairpin flapped back; the two `ScrapeTargetDown[tripit]` alerts firing
+  during this investigation were scrapes of *Completed* cronjob pod
+  endpoints (separate monitoring wart, not this outage).
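The post-mortem's Fix and Renumber hazard sections imply that the pin has to be safe to re-apply and easy to move to a new IP. The sketch below shows one way to manage such a marker-tagged line idempotently. It is an illustration under assumptions, not the contents of `setup-forgejo-containerd-mirror.sh`: `ensure_pin` is a hypothetical name, and the trailing comment is shortened to the marker alone.

```shell
#!/bin/sh
# Hypothetical helper; not the actual rollout script from this commit.
# Usage: ensure_pin HOSTS_FILE IP NAME
ensure_pin() {
    hosts_file=$1
    ip=$2
    name=$3
    marker='forgejo-internal-pin'
    tmp=$(mktemp) || return 1
    # Drop any previously managed line; every other entry is kept untouched.
    grep -v "# ${marker}" "$hosts_file" > "$tmp"
    # Append exactly one managed line carrying the marker.
    printf '%s %s # %s\n' "$ip" "$name" "$marker" >> "$tmp"
    # Overwrite in place so the file keeps its owner and mode.
    cat "$tmp" > "$hosts_file"
    rm -f "$tmp"
}

# Example use on a node (needs root, so left commented out):
#   ensure_pin /etc/hosts 10.0.20.203 forgejo.viktorbarzin.me
```

Because the old managed line is removed before the new one is appended, running the helper again with a different IP replaces the pin instead of stacking a second entry, which is the behaviour a future Traefik LB renumber would need.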
--- a/docs/runbooks/forgejo-registry-setup.md
+++ b/docs/runbooks/forgejo-registry-setup.md
@@ -119,8 +119,11 @@ cd infra/stacks/kyverno && scripts/tg apply
 cd infra/stacks/monitoring && scripts/tg apply
 cd infra/stacks/forgejo && scripts/tg apply

-# Containerd hosts.toml on each existing k8s node — VM cloud-init
-# only fires on first boot.
+# Containerd hosts.toml + /etc/hosts pin on each existing k8s node — VM
+# cloud-init only fires on first boot. The /etc/hosts pin
+# (10.0.20.203 forgejo.viktorbarzin.me) is what makes pulls hairpin-proof:
+# the hosts.toml mirror alone falls back to public DNS (Traefik 404s its
+# bare-IP requests, and the registry auth realm is an absolute public URL).
 infra/scripts/setup-forgejo-containerd-mirror.sh
 ```

@@ -135,7 +138,9 @@ docker pull alpine:3.20
 docker tag alpine:3.20 forgejo.viktorbarzin.me/viktor/smoketest:1
 docker push forgejo.viktorbarzin.me/viktor/smoketest:1

-# Pull from a k8s node.
+# Per-node pull path: pin present + name resolves internally + pull works.
+ssh wizard@<node> 'grep forgejo-internal-pin /etc/hosts && getent hosts forgejo.viktorbarzin.me'
+# Expect: 10.0.20.203  forgejo.viktorbarzin.me
 ssh wizard@<node> sudo crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1

 # Confirm the cluster-wide Secret was synced into a fresh namespace.