dns: pfSense forward-zone for viktorbarzin.me, nodes fully stock [ci skip]
Round 3 of the forgejo-pull hairpin fix (per Viktor: no per-node customization; split-brain lives in the DNS infra):

- pfSense Unbound domain override viktorbarzin.me -> Technitium 10.0.20.201 (applied via php write_config, backup on-box). Every Unbound client on every VLAN now gets the internal split-horizon answers (live Traefik IP via apex CNAME) with zero per-host config.
- CoreDNS carve-out (TF, applied): dedicated viktorbarzin.me:53 block, with forgejo pinned to Traefik ClusterIP via data source (pods cannot reach the ETP=Local LB IP pfSense now returns) and all other .me names kept on public resolvers (pods' pre-existing behavior). Replaces the .:53 forgejo rewrite.
- Removed the same-day resolved routing-domain drop-ins from all 7 nodes; node5/6 link DNS repointed Technitium -> pfSense (netplan + qm 205/206) for fleet parity; cloud-init no longer writes any DNS drop-ins.
- Docs: dns.md, pfsense-unbound runbook (override + rollback), registry bullet, post-mortem final-architecture addendum.

Verified: nodes resolve forgejo -> .203 via pfSense, crictl pull OK, pods resolve forgejo -> ClusterIP / others -> public, mail record works, .lan zone unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent
1ee1bf0817
commit
2b8c0def30
8 changed files with 182 additions and 101 deletions
|
|
@ -269,11 +269,11 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons
|
|||
|
||||
- **Affected**: Non-proxied domains (ha-sofia, immich, headscale, calibre, vaultwarden, etc.) for 192.168.1.x clients
|
||||
- **Not affected**: Cloudflare-proxied domains (resolve to Cloudflare edge IPs, no translation needed)
|
||||
- **Not affected**: 10.0.x.x and K8s clients — these resolve non-proxied domains to the public IP and rely on pfSense NAT reflection, which is **intermittently broken** (observed i/o timeouts to `176.12.22.76:443` from k8s nodes and the devvm, 2026-06-04 → 2026-06-10). Hairpin-sensitive paths on this network route `*.viktorbarzin.me` to Technitium instead, via a systemd-resolved **routing domain** (`/etc/systemd/resolved.conf.d/viktorbarzin.conf`: `DNS=10.0.20.201`, `Domains=~viktorbarzin.me`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary) — no hardcoded service IPs on clients:
|
||||
- **k8s nodes (kubelet image pulls of `forgejo.viktorbarzin.me`)**: routing-domain drop-in on all 7 nodes (2026-06-10, replacing a same-day `/etc/hosts` pin; deployed via `modules/create-template-vm/cloud_init.yaml` for new nodes, `scripts/setup-forgejo-containerd-mirror.sh` rollout for existing ones). The containerd hosts.toml mirror alone is insufficient — Traefik 404s its bare-IP requests (no Host/SNI match) and the registry's Bearer auth realm is an absolute public URL fetched outside the mirror. Caution: public servers must NOT sit in the nodes' global resolved `DNS=` set — they merge with and race the routing domain (the old node5/6 `global-dns.conf` did exactly this; now `FallbackDNS=` only). Root cause analysis: `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`.
|
||||
- **devvm**: same `viktorbarzin.conf` drop-in (predates the node rollout; provisioned by `setup-devvm.sh`).
|
||||
- **in-cluster pods → forgejo**: CoreDNS `rewrite name exact forgejo.viktorbarzin.me traefik.traefik.svc.cluster.local` (2026-06-04, beads code-yh33) — pods bypass node resolved entirely.
|
||||
- **Trade-off**: `*.viktorbarzin.me` resolution from nodes/devvm now depends on in-cluster Technitium (3 replicas). During a full cluster outage these names SERVFAIL — acceptable, the services behind them are down anyway; bootstrap images pull via the IP-addressed `10.0.20.10` mirrors, so cold-start self-unwinds.
|
||||
- **10.0.x.x clients (k8s nodes, devvm, other VMs)** — handled at the resolver since 2026-06-10: **pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium** (`10.0.20.201`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary). Every client of pfSense Unbound — all VLANs, k8s nodes included — therefore gets internal answers with **zero per-host configuration** (no `/etc/hosts` pins, no resolved drop-ins; both earlier same-day approaches were removed, nodes are stock). Names not behind Traefik keep distinct records in the zone (e.g. `mail.viktorbarzin.me → 10.0.20.1`, verified working on :993/:25). See `docs/runbooks/pfsense-unbound.md` for the override config + rollback, and `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md` for the incident that motivated this (kubelet forgejo pulls riding the broken hairpin; the containerd hosts.toml mirror cannot fix it — Traefik 404s bare-IP requests and the registry auth realm is an absolute public URL).
|
||||
- **devvm**: also covered by a `~viktorbarzin.me → 10.0.20.201` resolved routing domain (predates the pfSense override, provisioned by `setup-devvm.sh`) — redundant-but-harmless belt-and-suspenders.
|
||||
- **in-cluster PODS are deliberately carved out**: the Traefik LB IP is `externalTrafficPolicy=Local` and unreachable from pods, so the `.203` answers pfSense now returns must NOT reach them. CoreDNS has a dedicated `viktorbarzin.me:53` block (in `stacks/technitium`, TF-managed): forgejo is pinned to Traefik's **ClusterIP** (interpolated from the live Service at plan time) and all other `.me` names forward to `8.8.8.8/1.1.1.1` — preserving pods' pre-existing public-IP behavior (beads code-yh33).
|
||||
- **Trade-off**: `viktorbarzin.me` resolution via pfSense now depends on in-cluster Technitium (3 replicas). During a full cluster outage the zone SERVFAILs LAN-wide — acceptable, the services behind it are down anyway; node bootstrap images pull via the IP-addressed `10.0.20.10` mirrors, so cold-start self-unwinds.
|
||||
- **Residual nondeterminism**: nodes keep `94.140.14.14` as a secondary resolver (netplan/qm `--nameserver`). If systemd-resolved fails over to it during a pfSense DNS blip, `.me` answers are public again until it switches back — a rare, self-healing window, accepted.
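
The residual window described in the last bullet could be spotted by classifying the answer a node actually receives. The following is a minimal sketch, not part of this repo: `answer_scope` is a hypothetical helper name, and it assumes "internal" simply means an RFC 1918 address.

```bash
# answer_scope IP: print "internal" for RFC 1918 addresses, "public" for
# anything else, and "none" for an empty answer.
# Hypothetical helper for illustration; not part of this repo.
answer_scope() {
  case "$1" in
    '')                                     echo none ;;
    10.*|192.168.*)                         echo internal ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*)  echo internal ;;
    *)                                      echo public ;;
  esac
}

# Example (needs access to the node's resolver):
#   answer_scope "$(getent hosts forgejo.viktorbarzin.me | awk '{print $1}')"
```

A node that reports `public` for a `.me` name is most likely inside the failover window to the secondary resolver; per the bullet above, that is expected to clear on its own.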
|
||||
|
||||
Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).
|
||||
|
||||
|
|
@ -462,10 +462,18 @@ The zone-sync CronJob (runs every 30min) pushes the following to the Prometheus
|
|||
|
||||
### Hairpin NAT Not Working (LAN → *.viktorbarzin.me Fails)
|
||||
|
||||
Since 2026-04-19 (Workstream D), pfSense Unbound answers LAN DNS queries
|
||||
directly instead of forwarding to Technitium, so the Technitium Split Horizon
|
||||
post-processing does NOT run for 192.168.1.x clients anymore. Non-proxied
|
||||
services break hairpin on LAN clients again. Options:
|
||||
**Since 2026-06-10 this is largely solved at the resolver**: pfSense Unbound
|
||||
carries a domain override forwarding the entire `viktorbarzin.me` zone to
|
||||
Technitium, so ANY client that queries pfSense (all VLANs + 192.168.1.x
|
||||
clients pointed at `192.168.1.2`) gets the internal Traefik answer. If
|
||||
hairpin still fails for a client, first check which resolver it actually
|
||||
uses — clients on the TP-Link's own DHCP DNS (router/ISP) bypass pfSense
|
||||
entirely. Options for those:
|
||||
|
||||
(Historical context: 2026-04-19 Workstream D made Unbound answer LAN
|
||||
queries directly, which had removed the Technitium Split Horizon
|
||||
post-processing from the LAN path until the 2026-06-10 domain override
|
||||
restored internal answers at the zone level.)
|
||||
|
||||
1. **Switch service to proxied Cloudflare** (preferred) — set `dns_type = "proxied"` in the `ingress_factory` module call; DNS now resolves to Cloudflare edge, hairpin-independent.
|
||||
2. **Add a local-data override on pfSense Unbound** — under `Services → DNS Resolver → Host Overrides`, set `<service>.viktorbarzin.me → 10.0.20.203` (Traefik LB IP). This is equivalent to what Split Horizon did, applied at the resolver.
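
For the "check which resolver it actually uses" step, something like the sketch below may help on Linux clients. It is an illustration under stated assumptions: `uses_pfsense` is a hypothetical helper, and the two pfSense addresses are the ones quoted elsewhere in this document (`10.0.20.1` and `192.168.1.2`).

```bash
# pfSense resolver addresses as quoted in this document; adjust if they change.
PFSENSE_RESOLVERS="10.0.20.1 192.168.1.2"

# uses_pfsense [FILE]: succeed if any "nameserver" line in a
# resolv.conf-style file points at pfSense.
# Hypothetical helper for illustration; not part of this repo.
uses_pfsense() {
  file="${1:-/etc/resolv.conf}"
  for ns in $(awk '$1 == "nameserver" {print $2}' "$file"); do
    for pf in $PFSENSE_RESOLVERS; do
      if [ "$ns" = "$pf" ]; then
        return 0
      fi
    done
  done
  return 1
}

# Example:
#   uses_pfsense /run/systemd/resolve/resolv.conf && echo "client queries pfSense"
```

On hosts running systemd-resolved, `/etc/resolv.conf` usually lists only the local stub (`127.0.0.53`), so the upstream list in `/run/systemd/resolve/resolv.conf` is the more useful input.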
|
||||
|
|
|
|||
|
|
@ -89,25 +89,50 @@ NXDOMAINs forgejo.viktorbarzin.me") was obsolete: the ingress-dns-sync
|
|||
has since added forgejo to the zone — a stale comment that actively
|
||||
pointed new nodes at the hairpin.
|
||||
|
||||
Persisted in `modules/create-template-vm/cloud_init.yaml` (new nodes; DNS
|
||||
drop-ins) and `scripts/setup-forgejo-containerd-mirror.sh` (existing-node
|
||||
rollout). hosts.toml mirror left in place but documented as vestigial.
|
||||
**Final architecture (same day, round 3 — Viktor: "no customization,
|
||||
everything handled by the DNS infra"):** the routing-domain drop-ins were
|
||||
ALSO removed; nodes are now completely stock. Two resolver-side changes
|
||||
replaced them:
|
||||
|
||||
1. **pfSense Unbound domain override** `viktorbarzin.me → 10.0.20.201`
|
||||
(forward-zone to Technitium). Every Unbound client on every VLAN gets
|
||||
the internal split-horizon answers with zero per-host config. No
|
||||
DNSSEC complications (zone unsigned), private-IP answers pass, mail's
|
||||
non-Traefik record (`→ 10.0.20.1`) verified working. Runbook:
|
||||
`docs/runbooks/pfsense-unbound.md`; on-box backup
|
||||
`config.xml.bak-2026-06-10-pre-me-forward`.
|
||||
2. **CoreDNS pod carve-out** (TF, `stacks/technitium`): a dedicated
|
||||
`viktorbarzin.me:53` server block pins forgejo to Traefik's
|
||||
**ClusterIP** (interpolated from the live Service — pods cannot reach
|
||||
the ETP=Local LB IP that pfSense now returns) and forwards all other
|
||||
`.me` names to `8.8.8.8/1.1.1.1`, preserving pods' pre-existing
|
||||
public-IP behavior. Replaces the old forgejo rewrite in `.:53`.
|
||||
|
||||
node5/6 were also re-pointed from link-DNS=Technitium to
|
||||
`10.0.20.1 94.140.14.14` (netplan + `qm set --nameserver` on PVE VMs
|
||||
205/206) for fleet parity, and their `global-dns.conf` was deleted.
|
||||
|
||||
**Renumber hazard: resolved.** A future Traefik LB renumber propagates
|
||||
via the apex A record automatically (drift probe alerts if it doesn't);
|
||||
only the vestigial hosts.toml literal goes stale. **New trade-off:**
|
||||
`*.viktorbarzin.me` resolution from nodes now depends on in-cluster
|
||||
Technitium (3 replicas); in a full cluster outage these names SERVFAIL —
|
||||
acceptable, the services are down anyway, and bootstrap images pull via
|
||||
the IP-addressed `10.0.20.10` mirrors.
|
||||
only the vestigial hosts.toml literal goes stale. **Trade-offs:**
|
||||
`viktorbarzin.me` resolution via pfSense depends on in-cluster Technitium
|
||||
(3 replicas) — SERVFAIL during a full cluster outage (services down
|
||||
anyway; bootstrap images pull via the IP-addressed `10.0.20.10` mirrors).
|
||||
Nodes keep `94.140.14.14` as secondary DNS: a resolved failover during a
|
||||
pfSense blip briefly re-exposes public answers — rare, self-healing,
|
||||
accepted.
|
||||
|
||||
## Verification
|
||||
## Verification (final architecture)
|
||||
|
||||
- `getent hosts forgejo.viktorbarzin.me` → `10.0.20.203` on all 7 nodes
|
||||
**with no `/etc/hosts` entry** (pure DNS via the routing domain);
|
||||
`resolvectl status` shows `~viktorbarzin.me` routed to `10.0.20.201`;
|
||||
general resolution (`getent hosts google.com`) intact on every node;
|
||||
`crictl pull` of the tuya_bridge image succeeds via the DNS path.
|
||||
- All 7 nodes stock (no pins, no drop-ins); `getent hosts
|
||||
forgejo.viktorbarzin.me` → `10.0.20.203` via pfSense → Technitium;
|
||||
general resolution intact; `crictl pull` succeeds end-to-end.
|
||||
- pfSense: forgejo/immich/vault → apex CNAME → `.203`; mail →
|
||||
`10.0.20.1` (`:993` verified); `google.com` public; `.lan` auth-zone
|
||||
unaffected.
|
||||
- Pods: forgejo → `10.111.111.95` (Traefik ClusterIP),
|
||||
immich → `176.12.22.76` (public, status quo) — verified in-pod after
|
||||
CoreDNS reload.
|
||||
- tuya-bridge pod Running; `/health` `ok=true`; 27/27 devices
|
||||
`success=true`; 7/7 `*_tuya_cloud_up` gauges = 1; no tuya-related alerts.
|
||||
|
||||
|
|
|
|||
|
|
@ -93,6 +93,55 @@ Verify via the Technitium API:
|
|||
curl -sk "http://127.0.0.1:5380/api/zones/options/get?token=$TOK&zone=viktorbarzin.lan" | jq .response.zoneTransfer
|
||||
```
|
||||
|
||||
## Domain Override: viktorbarzin.me → Technitium (2026-06-10)
|
||||
|
||||
`$config['unbound']['domainoverrides']` carries one entry forwarding the
|
||||
whole `viktorbarzin.me` zone to Technitium `10.0.20.201` (forward-zone, not
|
||||
AXFR). Every Unbound client — all VLANs + 192.168.1.x via the WAN listener —
|
||||
gets Technitium's internal split-horizon answers: ingress hosts CNAME to the
|
||||
zone apex whose A record auto-tracks the live Traefik LB IP
|
||||
(`technitium-ingress-dns-sync` + `viktorbarzin-apex-probe` canary). This is
|
||||
what keeps kubelet forgejo image pulls (and everything else on 10.0.x) off
|
||||
the broken public NAT-hairpin with zero per-host DNS config — see
|
||||
`docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`.
|
||||
|
||||
Notes:
|
||||
|
||||
- The domain is NOT DNSSEC-signed (no DS records), so no `domain-insecure`
|
||||
needed; private-IP answers pass without `private-domain` custom options
|
||||
(verified empirically — pfSense handles domain overrides correctly).
|
||||
- **Cluster-outage behavior**: the zone SERVFAILs while Technitium is down
|
||||
(forward-zone, no local copy). Deliberate — the services are down anyway.
|
||||
Contrast with `viktorbarzin.lan`, which is AXFR-slaved to survive outages.
|
||||
- **In-cluster pods must NOT see these answers** (Traefik LB is ETP=Local,
|
||||
unreachable from pods). CoreDNS has a dedicated `viktorbarzin.me:53`
|
||||
carve-out (stacks/technitium) — do not remove it while this override exists.
|
||||
- Added with the standard SSH + PHP pattern (see "host override" memories /
|
||||
this file's style):
|
||||
|
||||
```php
|
||||
require_once("config.inc"); require_once("unbound.inc");
|
||||
global $config;
|
||||
$config["unbound"]["domainoverrides"][] = [
|
||||
"domain" => "viktorbarzin.me", "ip" => "10.0.20.201",
|
||||
"descr" => "...", "tls_hostname" => "",
|
||||
];
|
||||
write_config("add viktorbarzin.me domain override -> Technitium");
|
||||
services_unbound_configure();
|
||||
```
|
||||
|
||||
Rollback: remove the entry from the array (match on `domain`), then
|
||||
`write_config()` + `services_unbound_configure()`. Pre-change backup:
|
||||
`/cf/conf/config.xml.bak-2026-06-10-pre-me-forward` (on-box).
|
||||
|
||||
Verify:
|
||||
|
||||
```
|
||||
dig +short @10.0.20.1 forgejo.viktorbarzin.me # apex CNAME + live Traefik IP
|
||||
dig +short @10.0.20.1 mail.viktorbarzin.me # 10.0.20.1 (non-Traefik record)
|
||||
dig +short @10.0.20.1 google.com # public, unaffected
|
||||
```
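
The manual checks above could be wrapped in a small pass/fail helper. This is a sketch only: `check_dns` is a hypothetical name, and the expected addresses in the example are the values documented at the time of this change, so they would need updating after a Traefik LB renumber.

```bash
# check_dns RESOLVER NAME EXPECTED: compare the last line of `dig +short`
# (the A record at the end of any CNAME chain) with an expected address.
# Hypothetical helper for illustration; not part of this repo.
check_dns() {
  resolver="$1"; name="$2"; expected="$3"
  actual="$(dig +short "@$resolver" "$name" | tail -n 1)"
  if [ "$actual" = "$expected" ]; then
    echo "OK $name -> $actual"
  else
    echo "FAIL $name -> ${actual:-no answer} (expected $expected)"
    return 1
  fi
}

# Example run against pfSense (requires dig and network access):
#   check_dns 10.0.20.1 forgejo.viktorbarzin.me 10.0.20.203
#   check_dns 10.0.20.1 mail.viktorbarzin.me    10.0.20.1
```

Comparing only the last line keeps the check independent of the intermediate apex CNAME that `dig +short` prints first.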
|
||||
|
||||
## Operational Checks
|
||||
|
||||
```bash
|
||||
|
|
|
|||