forgejo pulls: route *.viktorbarzin.me to Technitium, drop /etc/hosts pins [ci skip]

Supersedes this morning's per-node /etc/hosts pin (no hardcoded service IPs on nodes, per Viktor). Technitium's split-horizon zone already resolves forgejo.viktorbarzin.me -> CNAME apex -> live Traefik LB IP (ingress-dns-sync auto-CNAMEs every ingress host; apex drift probe alerts) -- the nodes just never queried it.

Rolled the devvm's systemd-resolved routing-domain pattern (~viktorbarzin.me -> 10.0.20.201) to all 7 nodes, removed the pins, verified getent + crictl pull via pure DNS.

Also demoted node5/6's cloud-init global-dns.conf (DNS=8.8.8.8 1.1.1.1) to FallbackDNS-only: public servers in the global set race the routing domain. Its justification ("Technitium NXDOMAINs forgejo") was obsolete -- exactly the stale comment that pointed new nodes at the hairpin.

hosts.toml mirror kept but documented as vestigial (Traefik 404s bare-IP requests; registry auth realm is an absolute URL).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 07:56:31 +00:00 · 1ee1bf0817
commit 1ee1bf0817
parent b6976ce014
7 changed files with 135 additions and 66 deletions
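
The routing domain at the center of this commit (`Domains=~viktorbarzin.me`) works by suffix match: names at or below the domain go to the servers listed alongside it, and everything else follows the normal path. The sketch below is a simplified model of that match in POSIX shell. The function name `matches_routing_domain` is hypothetical, and systemd-resolved's real server selection also weighs per-link domains and longest-suffix matches, which this model ignores.

```shell
# Simplified model (assumption): does a query name fall under a routing domain?
# $1 = query name, $2 = routing domain as written in Domains= (e.g. "~viktorbarzin.me")
matches_routing_domain() {
  # Strip one trailing dot from the name and the leading "~" from the domain.
  mrd_name=${1%.}
  mrd_domain=${2#"~"}
  case "$mrd_name" in
    # Exact match on the domain itself, or any name ending in ".<domain>".
    "$mrd_domain" | *."$mrd_domain") return 0 ;;
    *) return 1 ;;
  esac
}
```

Under this model, `matches_routing_domain forgejo.viktorbarzin.me '~viktorbarzin.me'` succeeds, while `google.com` does not match and would use the link DNS.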
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -38,7 +38,7 @@ Violations cause state drift, which causes future applies to break or silently r
  - **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`. Smoke-test target: `echo.viktorbarzin.me` (auth=public, header-reflecting backend).
 - **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://<backend>.<ns>.svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct, blog has `/net-diag.sh` direct), declare a second `ingress_factory` with `ingress_path = ["/<path>"]` pointing at the bare backend service. Active on: blog (except `/net-diag.sh`), www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.
 - **Docker images**: Always build for `linux/amd64`. SHA-tag rule is being phased out — see `docs/plans/2026-05-16-auto-upgrade-apps-{design,plan}.md`. New model: CI pushes `:latest` (optionally also `:<8-char-sha>` for traceability), Keel polls and triggers rollouts. Cache-staleness concern from the old rule is resolved at the nginx layer (URL-split — manifests pass through, blobs cached). Until Phase 1 of the migration completes (per the plan), follow the SHA-tag rule for new services to match existing pattern.
-- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. **Kubelet pulls** are kept off the hairpin by an `/etc/hosts` pin `10.0.20.203 forgejo.viktorbarzin.me` on every node (marker comment `forgejo-internal-pin`) — the older containerd `hosts.toml` mirror (`[host."https://10.0.20.203"]`, `skip_verify = true`) still exists but is NOT sufficient on its own: Traefik routes by Host/SNI and 404s the mirror's bare-IP requests, and the registry's Bearer auth realm is the absolute `https://forgejo.viktorbarzin.me/v2/token` URL which containerd fetches outside the mirror — so without the pin every fresh pull degrades to public DNS → hairpin → intermittent `dial tcp 176.12.22.76:443: i/o timeout` ImagePullBackOff (tuya-bridge 7.5h outage 2026-06-10, tripit 2026-06-09; see `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`). In-cluster pods (notably Woodpecker buildkit build pods pushing images) resolve `forgejo.viktorbarzin.me` via a CoreDNS `rewrite name exact ... traefik.traefik.svc.cluster.local` (Corefile in `stacks/technitium/modules/technitium/main.tf`), since they use neither the node pin nor the containerd mirror; without it, buildkit pushes intermittently timed out on the public-IP hairpin (added 2026-06-04, beads code-yh33). **Was `.200` until 2026-06-01** — Traefik's 2026-05-30 move to its dedicated `.203` left the mirror pointing at the now-dead `.200:443`, silently breaking every *fresh* forgejo pull (cached images kept running, so it stayed hidden until a new image tag was pulled); if the Traefik LB IP ever moves again, update BOTH the hosts.toml mirror and the `/etc/hosts` pin. Pin + mirror source lives in `modules/create-template-vm/k8s-node-containerd-setup.sh` (new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing nodes). Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest` + any buildkit `*cache*` tag (so `--cache-from`/`--cache-to` refs survive retention — added 2026-06-09); **went live (DRY_RUN=false) 2026-06-09** after verifying 0 running images on the delete set — the registry PVC is at its 50Gi autoresize ceiling on the HDD (we did NOT move it to SSD, see beads code-oflt), so live retention is what keeps it from filling. Integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
+- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. **Kubelet pulls** are kept off the hairpin by a systemd-resolved routing domain on every node (`/etc/systemd/resolved.conf.d/viktorbarzin.conf`: `DNS=10.0.20.201` + `Domains=~viktorbarzin.me`) — `*.viktorbarzin.me` lookups go to Technitium, whose split-horizon zone CNAMEs every ingress host (auto-synced hourly by `technitium-ingress-dns-sync`) to the zone apex whose A record tracks the **live** Traefik LB IP (canary: `viktorbarzin-apex-probe`, alerts ViktorBarzinApexDrift). No hardcoded service IPs on nodes; the devvm uses the same drop-in. The containerd `hosts.toml` mirror (`[host."https://10.0.20.203"]`, `skip_verify = true`) still exists but is **vestigial** — it can NOT keep pulls internal on its own: Traefik routes by Host/SNI and 404s the mirror's bare-IP requests, and the registry's Bearer auth realm is the absolute `https://forgejo.viktorbarzin.me/v2/token` URL fetched outside the mirror — without internal DNS every fresh pull degrades to public DNS → hairpin → intermittent `dial tcp 176.12.22.76:443: i/o timeout` ImagePullBackOff (tuya-bridge 7.5h outage 2026-06-10, tripit 2026-06-09; see `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`). **Do not put public servers in the nodes' global resolved `DNS=` set** — they merge with the routing-domain set and race it (this was the node5/6 cloud-init `global-dns.conf` bug; now demoted to `FallbackDNS=`). In-cluster pods (notably Woodpecker buildkit build pods pushing images) resolve `forgejo.viktorbarzin.me` via a CoreDNS `rewrite name exact ... traefik.traefik.svc.cluster.local` (Corefile in `stacks/technitium/modules/technitium/main.tf`), since they use neither the node routing domain nor the containerd mirror; without it, buildkit pushes intermittently timed out on the public-IP hairpin (added 2026-06-04, beads code-yh33). **Was `.200` until 2026-06-01** — Traefik's 2026-05-30 move to its dedicated `.203` left the mirror pointing at the now-dead `.200:443`, silently breaking every *fresh* forgejo pull; a future LB renumber is now handled by DNS (apex record + drift probe) — only the vestigial hosts.toml literal would go stale. DNS drop-in + mirror source lives in `modules/create-template-vm/{cloud_init.yaml,k8s-node-containerd-setup.sh}` (new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing nodes). Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest` + any buildkit `*cache*` tag (so `--cache-from`/`--cache-to` refs survive retention — added 2026-06-09); **went live (DRY_RUN=false) 2026-06-09** after verifying 0 running images on the delete set — the registry PVC is at its 50Gi autoresize ceiling on the HDD (we did NOT move it to SSD, see beads code-oflt), so live retention is what keeps it from filling. Integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
 - **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts.
 - **Node memory changes**: When changing VM memory on any k8s node, update kubelet `systemReserved`, `kubeReserved`, and eviction thresholds accordingly. Config: `/var/lib/kubelet/config.yaml`. Template: `stacks/infra/main.tf`. Current values: systemReserved=512Mi, kubeReserved=512Mi, evictionHard=500Mi, evictionSoft=1Gi.
 - **Node OS disk tuning** (in `stacks/infra/main.tf`): kubelet `imageGCHighThresholdPercent=70` (was 85), `imageGCLowThresholdPercent=60` (was 80), ext4 `commit=60` in fstab (was default 5s), journald `SystemMaxUse=200M` + `MaxRetentionSec=3day`.
--- a/docs/architecture/dns.md
+++ b/docs/architecture/dns.md
@@ -269,10 +269,11 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons

 - **Affected**: Non-proxied domains (ha-sofia, immich, headscale, calibre, vaultwarden, etc.) for 192.168.1.x clients
 - **Not affected**: Cloudflare-proxied domains (resolve to Cloudflare edge IPs, no translation needed)
-- **Not affected**: 10.0.x.x and K8s clients — these resolve non-proxied domains to the public IP and rely on pfSense NAT reflection, which is **intermittently broken** (observed i/o timeouts to `176.12.22.76:443` from k8s nodes and the devvm, 2026-06-04 → 2026-06-10). Hairpin-sensitive paths on this network get explicit per-leg fixes instead:
-  - **kubelet image pulls of `forgejo.viktorbarzin.me`**: `/etc/hosts` pin `10.0.20.203 forgejo.viktorbarzin.me` on every k8s node (marker `forgejo-internal-pin`; deployed via `modules/create-template-vm/k8s-node-containerd-setup.sh` for new nodes, `scripts/setup-forgejo-containerd-mirror.sh` rollout for existing ones). The containerd hosts.toml mirror alone is insufficient — Traefik 404s its bare-IP requests (no Host/SNI match) and the registry's Bearer auth realm is an absolute public URL fetched outside the mirror. Root cause of the 2026-06-10 tuya-bridge outage (`docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`).
-  - **in-cluster pods → forgejo**: CoreDNS `rewrite name exact forgejo.viktorbarzin.me traefik.traefik.svc.cluster.local` (2026-06-04, beads code-yh33).
-  - **devvm git → forgejo**: still exposed to the hairpin (manual `/etc/hosts` pin workaround when it flares).
+- **Not affected**: 10.0.x.x and K8s clients — these resolve non-proxied domains to the public IP and rely on pfSense NAT reflection, which is **intermittently broken** (observed i/o timeouts to `176.12.22.76:443` from k8s nodes and the devvm, 2026-06-04 → 2026-06-10). Hairpin-sensitive paths on this network route `*.viktorbarzin.me` to Technitium instead, via a systemd-resolved **routing domain** (`/etc/systemd/resolved.conf.d/viktorbarzin.conf`: `DNS=10.0.20.201`, `Domains=~viktorbarzin.me`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary) — no hardcoded service IPs on clients:
+  - **k8s nodes (kubelet image pulls of `forgejo.viktorbarzin.me`)**: routing-domain drop-in on all 7 nodes (2026-06-10, replacing a same-day `/etc/hosts` pin; deployed via `modules/create-template-vm/cloud_init.yaml` for new nodes, `scripts/setup-forgejo-containerd-mirror.sh` rollout for existing ones). The containerd hosts.toml mirror alone is insufficient — Traefik 404s its bare-IP requests (no Host/SNI match) and the registry's Bearer auth realm is an absolute public URL fetched outside the mirror. Caution: public servers must NOT sit in the nodes' global resolved `DNS=` set — they merge with and race the routing domain (the old node5/6 `global-dns.conf` did exactly this; now `FallbackDNS=` only). Root cause analysis: `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`.
+  - **devvm**: same `viktorbarzin.conf` drop-in (predates the node rollout; provisioned by `setup-devvm.sh`).
+  - **in-cluster pods → forgejo**: CoreDNS `rewrite name exact forgejo.viktorbarzin.me traefik.traefik.svc.cluster.local` (2026-06-04, beads code-yh33) — pods bypass node resolved entirely.
+  - **Trade-off**: `*.viktorbarzin.me` resolution from nodes/devvm now depends on in-cluster Technitium (3 replicas). During a full cluster outage these names SERVFAIL — acceptable, the services behind them are down anyway; bootstrap images pull via the IP-addressed `10.0.20.10` mirrors, so cold-start self-unwinds.

 Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).

--- a/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
+++ b/docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md
@@ -59,28 +59,55 @@ CoreDNS forgejo rewrite (2026-06-04) covers pods only, not kubelet.

 ## Fix

-`/etc/hosts` pin on every k8s node (hot, no drain, no containerd restart):
+**Initial mitigation (same morning):** `/etc/hosts` pin
+`10.0.20.203 forgejo.viktorbarzin.me` on every node — restored service
+immediately (resolve + token + blob legs all internal with correct SNI).
+
+**Superseded same day (Viktor: "no hardcoded IPs in nodes") by a DNS-based
+fix.** Discovery: Technitium's split-horizon zone *already* resolves
+`forgejo.viktorbarzin.me → CNAME viktorbarzin.me → A <live Traefik IP>` —
+the `technitium-ingress-dns-sync` CronJob auto-CNAMEs every ingress host
+hourly, the apex A record tracks the live Traefik LB IP, and the
+`viktorbarzin-apex-probe` canary alerts on drift. The nodes simply never
+queried Technitium (resolv chain: pfSense + public AdGuard fallback). The
+devvm already solved this with a systemd-resolved **routing domain**
+drop-in; the same was rolled to all 7 nodes:

 ```
-10.0.20.203 forgejo.viktorbarzin.me # forgejo-internal-pin (managed: setup-forgejo-containerd-mirror.sh)
+# /etc/systemd/resolved.conf.d/viktorbarzin.conf
+[Resolve]
+DNS=10.0.20.201
+Domains=~viktorbarzin.me
 ```

-Go's resolver (containerd) consults `/etc/hosts` first, so resolve + token
-+ blob legs all go to internal Traefik with correct SNI and a valid
-wildcard cert (no `skip_verify` needed on this path). Applied live to all
-7 nodes; persisted in `modules/create-template-vm/k8s-node-containerd-setup.sh`
-(new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing-node
-rollout). hosts.toml mirror left in place (harmless, uniform config).
+The `/etc/hosts` pins were then removed (verified `getent` still returns
+the Traefik IP via DNS, and `crictl pull` succeeds). On node5/6 the
+cloud-init `global-dns.conf` (`DNS=8.8.8.8 1.1.1.1`) was demoted to
+`FallbackDNS=` only — public servers in the global set merge with and
+race the routing domain. That file's original justification ("Technitium
+NXDOMAINs forgejo.viktorbarzin.me") was obsolete: the ingress-dns-sync
+has since added forgejo to the zone — a stale comment that actively
+pointed new nodes at the hairpin.

-**Renumber hazard:** the pin hardcodes Traefik's LB IP, same as the
-hosts.toml mirror and the 5 literals broken by the 2026-05-30 `.200→.203`
-move. Any future Traefik LB renumber must update both (grep nodes for
-`forgejo-internal-pin`).
+Persisted in `modules/create-template-vm/cloud_init.yaml` (new nodes; DNS
+drop-ins) and `scripts/setup-forgejo-containerd-mirror.sh` (existing-node
+rollout). hosts.toml mirror left in place but documented as vestigial.
+
+**Renumber hazard: resolved.** A future Traefik LB renumber propagates
+via the apex A record automatically (drift probe alerts if it doesn't);
+only the vestigial hosts.toml literal goes stale. **New trade-off:**
+`*.viktorbarzin.me` resolution from nodes now depends on in-cluster
+Technitium (3 replicas); in a full cluster outage these names SERVFAIL —
+acceptable, the services are down anyway, and bootstrap images pull via
+the IP-addressed `10.0.20.10` mirrors.

 ## Verification

-- `getent hosts forgejo.viktorbarzin.me` → `10.0.20.203` on all 7 nodes;
-  `curl https://forgejo.viktorbarzin.me/v2/` → 401 (internal route, valid TLS).
+- `getent hosts forgejo.viktorbarzin.me` → `10.0.20.203` on all 7 nodes
+  **with no `/etc/hosts` entry** (pure DNS via the routing domain);
+  `resolvectl status` shows `~viktorbarzin.me` routed to `10.0.20.201`;
+  general resolution (`getent hosts google.com`) intact on every node;
+  `crictl pull` of the tuya_bridge image succeeds via the DNS path.
 - tuya-bridge pod Running; `/health` `ok=true`; 27/27 devices
  `success=true`; 7/7 `*_tuya_cloud_up` gauges = 1; no tuya-related alerts.

@@ -90,10 +117,16 @@ move. Any future Traefik LB renumber must update both (grep nodes for
  latency bomb with the blast delayed until the cache misses.
 - Registry token realms are absolute URLs: any "redirect the registry"
  scheme must also redirect the *name*, not just the endpoint.
-- The remaining hairpin-exposed leg is **devvm git** (manual `/etc/hosts`
-  workaround documented in memory); a durable LAN-wide fix would need
-  pfSense Unbound host overrides (live network device — deliberate,
-  separate change).
+- Before inventing a redirect mechanism, check what the DNS authority
+  already serves: the Technitium split-horizon zone had the correct,
+  auto-maintained answer all along — the clients just weren't asking it.
+- Stale config comments are load-bearing: the obsolete "Technitium
+  NXDOMAINs forgejo" comment in cloud-init steered new nodes onto public
+  DNS, recreating the hairpin exposure on every node added after it.
+- All `10.0.x` legs are now DNS-routed (nodes + devvm via routing domain,
+  pods via CoreDNS rewrite). pfSense Unbound host overrides remain an
+  option for other LAN segments if a non-Technitium client ever needs
+  internal answers (live network device — deliberate, separate change).

 ## Related

--- a/docs/runbooks/forgejo-registry-setup.md
+++ b/docs/runbooks/forgejo-registry-setup.md
@@ -119,9 +119,9 @@ cd infra/stacks/kyverno && scripts/tg apply
 cd infra/stacks/monitoring && scripts/tg apply
 cd infra/stacks/forgejo && scripts/tg apply

-# Containerd hosts.toml + /etc/hosts pin on each existing k8s node — VM
-# cloud-init only fires on first boot. The /etc/hosts pin
-# (10.0.20.203 forgejo.viktorbarzin.me) is what makes pulls hairpin-proof:
+# Resolved routing domain (+ vestigial containerd hosts.toml) on each
+# existing k8s node — VM cloud-init only fires on first boot. The routing
+# domain (~viktorbarzin.me -> Technitium) is what makes pulls hairpin-proof:
 # the hosts.toml mirror alone falls back to public DNS (Traefik 404s its
 # bare-IP requests, and the registry auth realm is an absolute public URL).
 infra/scripts/setup-forgejo-containerd-mirror.sh
@@ -138,9 +138,11 @@ docker pull alpine:3.20
 docker tag alpine:3.20 forgejo.viktorbarzin.me/viktor/smoketest:1
 docker push forgejo.viktorbarzin.me/viktor/smoketest:1

-# Per-node pull path: pin present + name resolves internally + pull works.
-ssh wizard@<node> 'grep forgejo-internal-pin /etc/hosts && getent hosts forgejo.viktorbarzin.me'
-# Expect: 10.0.20.203  forgejo.viktorbarzin.me
+# Per-node pull path: routing domain active + name resolves to the live
+# Traefik LB (via Technitium split-horizon zone) + pull works.
+ssh wizard@<node> 'resolvectl status | grep -A2 "~viktorbarzin.me"; getent hosts forgejo.viktorbarzin.me'
+# Expect: DNS Domain ~viktorbarzin.me on server 10.0.20.201, and
+#         getent -> the current Traefik LB IP (10.0.20.203 today)
 ssh wizard@<node> sudo crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1

 # Confirm the cluster-wide Secret was synced into a fresh namespace.
--- a/modules/create-template-vm/cloud_init.yaml
+++ b/modules/create-template-vm/cloud_init.yaml
@@ -90,18 +90,35 @@ runcmd:
  - sed -i 's/#Compress=yes/Compress=yes/' /etc/systemd/journald.conf
  - systemctl restart systemd-journald
  %{if is_k8s_template}
-  # systemd-resolved global DNS fallback. Without this, only the
-  # link-level DNS from Proxmox's `qm set --nameserver` (Technitium,
-  # 10.0.20.201) is consulted — and Technitium returns NXDOMAIN for
-  # forgejo.viktorbarzin.me, so kubelet image pulls from the Forgejo
-  # registry break. Public DNS upstream + Technitium fallback matches
-  # the pre-existing manual setup on k8s-node1..4.
+  # systemd-resolved split DNS, two drop-ins (2026-06-10, replaces the
+  # public-first global DNS that was here before):
+  #
+  # viktorbarzin.conf — routing domain ~viktorbarzin.me -> Technitium
+  # (10.0.20.201). The technitium-ingress-dns-sync CronJob keeps a CNAME
+  # for every ingress host (incl. forgejo.viktorbarzin.me) chained to the
+  # zone apex, whose A record auto-tracks the live Traefik LB IP (canary:
+  # viktorbarzin-apex-probe). Keeps kubelet pulls of forgejo images off
+  # the flaky public NAT-hairpin with no hardcoded service IPs. (The old
+  # comment claiming Technitium NXDOMAINs forgejo.viktorbarzin.me is
+  # obsolete — ingress-dns-sync added it to the split-horizon zone. See
+  # docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md.)
+  #
+  # global-dns.conf — emergency fallback only. Public servers must NOT
+  # sit in the global DNS= set: they merge with viktorbarzin.conf's set
+  # and race the ~viktorbarzin.me routing domain, intermittently
+  # returning the public IP again. General resolution uses the
+  # link-level DNS from Proxmox's `qm set --nameserver`.
  - mkdir -p /etc/systemd/resolved.conf.d
+  - |
+    cat > /etc/systemd/resolved.conf.d/viktorbarzin.conf <<'EOF'
+    [Resolve]
+    DNS=10.0.20.201
+    Domains=~viktorbarzin.me
+    EOF
  - |
    cat > /etc/systemd/resolved.conf.d/global-dns.conf <<'EOF'
    [Resolve]
-    DNS=8.8.8.8 1.1.1.1
-    FallbackDNS=10.0.20.201
+    FallbackDNS=8.8.8.8 1.1.1.1
    EOF
  - systemctl restart systemd-resolved
  # Re-enabled 2026-05-10: unattended-upgrades is back on, but with a tight
--- a/modules/create-template-vm/k8s-node-containerd-setup.sh
+++ b/modules/create-template-vm/k8s-node-containerd-setup.sh
@@ -49,10 +49,16 @@ server = "https://ghcr.io"
  capabilities = ["pull", "resolve"]
 GHCR

-# Forgejo OCI registry: prefer in-cluster Traefik LB (10.0.20.203) to
-# avoid hairpin NAT. Traefik serves the *.viktorbarzin.me wildcard so
-# SNI verification succeeds. If the mirror is unreachable, fall back to
-# public DNS resolution (needs the global DNS fallback set up below).
+# Forgejo OCI registry. NOTE: this hosts.toml mirror is VESTIGIAL — it
+# cannot keep pulls off the public hairpin on its own (Traefik routes by
+# Host/SNI and 404s the mirror's bare-IP requests, and the registry's
+# Bearer auth realm is the absolute https://forgejo.viktorbarzin.me/v2/token
+# URL fetched outside the mirror). What actually keeps forgejo pulls
+# internal is the systemd-resolved routing domain ~viktorbarzin.me ->
+# Technitium (viktorbarzin.conf, written by cloud_init.yaml), which
+# resolves forgejo to the live Traefik LB via the split-horizon zone.
+# Kept for config uniformity; harmless. See
+# docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md.
 mkdir -p /etc/containerd/certs.d/forgejo.viktorbarzin.me
 cat > /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml <<'FORGEJO'
 server = "https://forgejo.viktorbarzin.me"
@@ -62,20 +68,6 @@ server = "https://forgejo.viktorbarzin.me"
  skip_verify = true
 FORGEJO

-# /etc/hosts pin — REQUIRED in addition to the hosts.toml mirror. The
-# mirror alone cannot make forgejo pulls hairpin-proof for two reasons
-# (2026-06-10 tuya-bridge outage, third incident of this class):
-#   a) Traefik routes by Host/SNI and 404s the mirror's bare-IP requests,
-#      so containerd always falls back to `server` (public DNS → hairpin).
-#   b) The registry's Bearer auth realm is the absolute URL
-#      https://forgejo.viktorbarzin.me/v2/token, which containerd fetches
-#      verbatim — that leg never goes through the mirror at all.
-# Pinning the name to Traefik's LB fixes resolve + token + blob legs with
-# correct SNI and a valid cert. If Traefik's LB IP ever changes, update
-# this pin together with the hosts.toml IP above.
-grep -q forgejo-internal-pin /etc/hosts || \
-  echo '10.0.20.203 forgejo.viktorbarzin.me # forgejo-internal-pin (managed: setup-forgejo-containerd-mirror.sh)' >> /etc/hosts
-
 # quay.io + registry.k8s.io: include mirror configs that match node4's
 # layout (no real pull-through cache today, server line is the direct
 # upstream). Keeping these present makes the per-node config uniform and
--- a/scripts/setup-forgejo-containerd-mirror.sh
+++ b/scripts/setup-forgejo-containerd-mirror.sh
@@ -1,19 +1,24 @@
 #!/usr/bin/env bash
-# One-shot deployment of the forgejo.viktorbarzin.me containerd hosts.toml
-# entry + /etc/hosts pin across every k8s node. Cloud-init only fires on VM
+# One-shot deployment of the forgejo pull path across every k8s node:
+# systemd-resolved routing domain ~viktorbarzin.me -> Technitium, plus the
+# (vestigial) containerd hosts.toml entry. Cloud-init only fires on VM
 # provision, so existing nodes need this manual rollout.
 #
-# The /etc/hosts pin (forgejo.viktorbarzin.me -> Traefik LB) is what actually
-# makes pulls hairpin-proof: Traefik 404s the mirror's bare-IP requests (no
-# Host/SNI match) and the registry's Bearer auth realm is the absolute public
-# URL, so the hosts.toml mirror alone always degrades to the flaky public-IP
-# hairpin (2026-06-10 tuya-bridge outage; see
+# The routing domain is what actually makes pulls hairpin-proof: Technitium's
+# split-horizon zone resolves forgejo.viktorbarzin.me (CNAME, auto-synced from
+# ingresses) to the zone apex whose A record tracks the live Traefik LB IP —
+# no hardcoded service IPs on nodes. The hosts.toml mirror alone CANNOT do
+# this: Traefik 404s its bare-IP requests (no Host/SNI match) and the registry
+# Bearer auth realm is the absolute public URL fetched outside the mirror
+# (2026-06-10 tuya-bridge outage; see
 # docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md).
 #
 # What it does, per node:
 #   1. drain (ignore-daemonsets, delete-emptydir-data)
-#   2. ssh in: mkdir + write /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
-#      + append the forgejo /etc/hosts pin
+#   2. ssh in: write /etc/systemd/resolved.conf.d/viktorbarzin.conf (routing
+#      domain), neuter any public global-dns.conf to FallbackDNS-only, drop
+#      legacy forgejo-internal-pin /etc/hosts lines, restart systemd-resolved,
+#      write /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
 #   3. systemctl restart containerd
 #   4. uncordon
 #
@@ -46,12 +51,31 @@ for n in $NODES; do

  ssh -o StrictHostKeyChecking=accept-new "wizard@$n" sudo bash <<EOF
 set -euo pipefail
+mkdir -p /etc/systemd/resolved.conf.d
+cat > /etc/systemd/resolved.conf.d/viktorbarzin.conf <<'CONF'
+# Route *.viktorbarzin.me to Technitium (split-horizon zone -> live Traefik LB),
+# so kubelet image pulls of forgejo.viktorbarzin.me never traverse the public
+# NAT-hairpin. Everything else uses the link DNS.
+# Managed: setup-forgejo-containerd-mirror.sh / cloud_init.yaml
+[Resolve]
+DNS=10.0.20.201
+Domains=~viktorbarzin.me
+CONF
+# Public servers in the global DNS= set would race the routing domain —
+# demote any legacy global-dns.conf to emergency fallback only.
+if [ -f /etc/systemd/resolved.conf.d/global-dns.conf ]; then
+  cat > /etc/systemd/resolved.conf.d/global-dns.conf <<'CONF'
+# Emergency fallback only (used when no link DNS is configured at all).
+[Resolve]
+FallbackDNS=8.8.8.8 1.1.1.1
+CONF
+fi
+sed -i '/forgejo-internal-pin/d' /etc/hosts
+systemctl restart systemd-resolved
 mkdir -p "$CERTS_DIR"
 cat > "$CERTS_DIR/hosts.toml" <<'TOML'
 $HOSTS_TOML
 TOML
-grep -q forgejo-internal-pin /etc/hosts || \
-  echo '10.0.20.203 forgejo.viktorbarzin.me # forgejo-internal-pin (managed: setup-forgejo-containerd-mirror.sh)' >> /etc/hosts
 systemctl restart containerd
 EOF
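
The caution repeated throughout this change, that public servers must not sit in the global resolved `DNS=` set, lends itself to a small audit. The sketch below is one possible check, not part of the commit: `is_private_ipv4` and `find_public_global_dns` are hypothetical helper names, "public" is approximated as "not an RFC 1918 IPv4 address" (so IPv6 servers are flagged too), and only `*.conf` drop-ins in the given directory are scanned, not the main `resolved.conf` or per-link DNS.

```shell
# Assumption: a server counts as "internal" if it is an RFC 1918 IPv4 address.
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print "<file>: <server>" for every non-private server on a DNS= line.
# FallbackDNS= lines are deliberately ignored: that is where public servers belong.
# $1 = drop-in directory, e.g. /etc/systemd/resolved.conf.d
find_public_global_dns() (
  dir=$1
  for f in "$dir"/*.conf; do
    [ -f "$f" ] || continue
    grep -E '^DNS=' "$f" | while IFS= read -r line; do
      # Unquoted on purpose: DNS= takes a space-separated server list.
      for server in ${line#DNS=}; do
        is_private_ipv4 "$server" || printf '%s: %s\n' "$f" "$server"
      done
    done
  done
)
```

Run as `find_public_global_dns /etc/systemd/resolved.conf.d` on a node. Empty output means no drop-in places a public server in the `DNS=` set; any line printed names a file that would race the `~viktorbarzin.me` routing domain.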