dns: pfSense forward-zone for viktorbarzin.me, nodes fully stock [ci skip]
Round 3 of the forgejo-pull hairpin fix (per Viktor: no per-node customization — split-brain lives in the DNS infra): - pfSense Unbound domain override viktorbarzin.me -> Technitium 10.0.20.201 (applied via php write_config, backup on-box). Every Unbound client on every VLAN now gets the internal split-horizon answers (live Traefik IP via apex CNAME) with zero per-host config. - CoreDNS carve-out (TF, applied): dedicated viktorbarzin.me:53 block — forgejo pinned to Traefik ClusterIP via data source (pods cannot reach the ETP=Local LB IP pfSense now returns), all other .me names kept on public resolvers (pods' pre-existing behavior). Replaces the .:53 forgejo rewrite. - Removed the same-day resolved routing-domain drop-ins from all 7 nodes; node5/6 link DNS repointed Technitium -> pfSense (netplan + qm 205/206) for fleet parity; cloud-init no longer writes any DNS drop-ins. - Docs: dns.md, pfsense-unbound runbook (override + rollback), registry bullet, post-mortem final-architecture addendum. Verified: nodes resolve forgejo -> .203 via pfSense, crictl pull OK, pods resolve forgejo -> ClusterIP / others -> public, mail record works, .lan zone unaffected. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
parent
1ee1bf0817
commit
2b8c0def30
8 changed files with 182 additions and 101 deletions
|
|
@ -89,25 +89,50 @@ NXDOMAINs forgejo.viktorbarzin.me") was obsolete: the ingress-dns-sync
|
|||
has since added forgejo to the zone — a stale comment that actively
|
||||
pointed new nodes at the hairpin.
|
||||
|
||||
Persisted in `modules/create-template-vm/cloud_init.yaml` (new nodes; DNS
|
||||
drop-ins) and `scripts/setup-forgejo-containerd-mirror.sh` (existing-node
|
||||
rollout). hosts.toml mirror left in place but documented as vestigial.
|
||||
**Final architecture (same day, round 3 — Viktor: "no customization,
|
||||
everything handled by the DNS infra"):** the routing-domain drop-ins were
|
||||
ALSO removed; nodes are now completely stock. Two resolver-side changes
|
||||
replaced them:
|
||||
|
||||
1. **pfSense Unbound domain override** `viktorbarzin.me → 10.0.20.201`
|
||||
(forward-zone to Technitium). Every Unbound client on every VLAN gets
|
||||
the internal split-horizon answers with zero per-host config. No
|
||||
DNSSEC complications (zone unsigned), private-IP answers pass, mail's
|
||||
non-Traefik record (`→ 10.0.20.1`) verified working. Runbook:
|
||||
`docs/runbooks/pfsense-unbound.md`; on-box backup
|
||||
`config.xml.bak-2026-06-10-pre-me-forward`.
|
||||
2. **CoreDNS pod carve-out** (TF, `stacks/technitium`): a dedicated
|
||||
`viktorbarzin.me:53` server block pins forgejo to Traefik's
|
||||
**ClusterIP** (interpolated from the live Service — pods cannot reach
|
||||
the ETP=Local LB IP that pfSense now returns) and forwards all other
|
||||
`.me` names to `8.8.8.8/1.1.1.1`, preserving pods' pre-existing
|
||||
public-IP behavior. Replaces the old forgejo rewrite in `.:53`.
|
||||
|
||||
node5/6 were also re-pointed from link-DNS=Technitium to
|
||||
`10.0.20.1 94.140.14.14` (netplan + `qm set --nameserver` on PVE VMs
|
||||
205/206) for fleet parity, and their `global-dns.conf` was deleted.
|
||||
|
||||
**Renumber hazard: resolved.** A future Traefik LB renumber propagates
|
||||
via the apex A record automatically (drift probe alerts if it doesn't);
|
||||
only the vestigial hosts.toml literal goes stale. **New trade-off:**
|
||||
`*.viktorbarzin.me` resolution from nodes now depends on in-cluster
|
||||
Technitium (3 replicas); in a full cluster outage these names SERVFAIL —
|
||||
acceptable, the services are down anyway, and bootstrap images pull via
|
||||
the IP-addressed `10.0.20.10` mirrors.
|
||||
only the vestigial hosts.toml literal goes stale. **Trade-offs:**
|
||||
`viktorbarzin.me` resolution via pfSense depends on in-cluster Technitium
|
||||
(3 replicas) — SERVFAIL during a full cluster outage (services down
|
||||
anyway; bootstrap images pull via the IP-addressed `10.0.20.10` mirrors).
|
||||
Nodes keep `94.140.14.14` as secondary DNS: a resolved failover during a
|
||||
pfSense blip briefly re-exposes public answers — rare, self-healing,
|
||||
accepted.
|
||||
|
||||
## Verification
|
||||
## Verification (final architecture)
|
||||
|
||||
- `getent hosts forgejo.viktorbarzin.me` → `10.0.20.203` on all 7 nodes
|
||||
**with no `/etc/hosts` entry** (pure DNS via the routing domain);
|
||||
`resolvectl status` shows `~viktorbarzin.me` routed to `10.0.20.201`;
|
||||
general resolution (`getent hosts google.com`) intact on every node;
|
||||
`crictl pull` of the tuya_bridge image succeeds via the DNS path.
|
||||
- All 7 nodes stock (no pins, no drop-ins); `getent hosts
|
||||
forgejo.viktorbarzin.me` → `10.0.20.203` via pfSense → Technitium;
|
||||
general resolution intact; `crictl pull` succeeds end-to-end.
|
||||
- pfSense: forgejo/immich/vault → apex CNAME → `.203`; mail →
|
||||
`10.0.20.1` (`:993` verified); `google.com` public; `.lan` auth-zone
|
||||
unaffected.
|
||||
- Pods: forgejo → `10.111.111.95` (Traefik ClusterIP),
|
||||
immich → `176.12.22.76` (public, status quo) — verified in-pod after
|
||||
CoreDNS reload.
|
||||
- tuya-bridge pod Running; `/health` `ok=true`; 27/27 devices
|
||||
`success=true`; 7/7 `*_tuya_cloud_up` gauges = 1; no tuya-related alerts.
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue