[dns] pfSense: Unbound replaces dnsmasq (WS D)

Replace pfSense dnsmasq (DNS Forwarder) with Unbound (DNS Resolver) so
LAN-side .viktorbarzin.lan resolution survives a full Kubernetes outage.

Out-of-band pfSense changes (not in Terraform; pfSense config.xml is
VM-managed). Backup at /cf/conf/config.xml.2026-04-19-pre-unbound on-box,
plus nightly copies to /mnt/backup/pfsense/.

- <unbound> enabled; listens on lan, opt1, wan, lo0
- <forwarding> on + <forward_tls_upstream> → DoT to Cloudflare
  (1.1.1.1 / 1.0.0.1 port 853, SNI cloudflare-dns.com)
- <dnssec>, <prefetch>, <prefetchkey>, <dnsrecordcache> (serve-expired)
- msgcachesize=256MB, cache_max_ttl=7d, cache_min_ttl=60s
- custom_options: auth-zone viktorbarzin.lan master=10.0.20.201
  fallback-enabled=yes for-upstream=yes + serve-expired-ttl=259200
- <dnsmasq><enable> removed; dnsmasq stopped
- NAT rdr WAN UDP 53 → 10.0.20.201 removed (Unbound listens on WAN now)
- Technitium zone viktorbarzin.lan: zoneTransferNetworkACL set to
  10.0.20.1, 10.0.10.1, 192.168.1.2 (pfSense source IPs)

Verified:
- unbound-control list_auth_zones: viktorbarzin.lan serial 49367
- dig @127.0.0.1 idrac.viktorbarzin.lan returns 192.168.1.4 with aa flag
  (served from auth-zone, not forwarded)
- dig @127.0.0.1 example.com +dnssec returns ad flag (DoT + validated)
- /var/unbound/viktorbarzin.lan.zone has ~114 records
- K8s outage drill passed: scale technitium=0 → dig still returns via
  WAN/LAN/OPT1 interfaces → scale restored
- LAN/management/K8s VLAN clients all resolve via pfSense 192.168.1.2 /
  10.0.10.1 / 10.0.20.1 respectively

Trade-off: Technitium Split Horizon hairpin for 192.168.1.x →
*.viktorbarzin.me (non-proxied) no longer runs via pfSense (Unbound
answers locally). Fix if it bites: switch service to proxied or add
Unbound Host Override. Documented in docs/runbooks/pfsense-unbound.md.

Closes: code-k0d
Author: Viktor Barzin, 2026-04-19 15:52:41 +00:00
Parent: bc866d53fa
Commit: 33d934c32f
2 changed files with 253 additions and 28 deletions


@@ -1,6 +1,6 @@
# DNS Architecture
Last updated: 2026-04-19 (NodeLocal DNSCache deployed — Workstream C)
Last updated: 2026-04-19 (WS C — NodeLocal DNSCache deployed; WS D — pfSense Unbound replaces dnsmasq)
## Overview
@@ -22,10 +22,9 @@ graph TB
end
subgraph "pfSense (10.0.20.1)"
pf_dnsmasq[dnsmasq<br/>Forwarder]
pf_unbound[Unbound<br/>Resolver<br/>auth-zone AXFR]
pf_kea[Kea DHCP4<br/>3 subnets, 53 reservations]
pf_ddns[Kea DHCP-DDNS<br/>RFC 2136]
pf_nat[NAT rdr<br/>UDP 53 → Technitium]
end
subgraph "Kubernetes Cluster"
@@ -53,18 +52,17 @@ graph TB
Internet -->|DNS query| CF
CF -->|CNAME to tunnel| CFTunnel
LAN -->|DNS query UDP 53| pf_nat
pf_nat -->|forward| LB_DNS
LAN -->|DNS query UDP 53| pf_unbound
pf_kea -->|lease event| pf_ddns
pf_ddns -->|A + PTR| LB_DNS
pf_dnsmasq -->|.viktorbarzin.lan| LB_DNS
pf_dnsmasq -->|public queries| CF
pf_unbound -->|AXFR viktorbarzin.lan| LB_DNS
pf_unbound -->|public queries DoT :853| CF
NodeLocalDNS -->|cache miss| KubeDNSUpstream
KubeDNSUpstream --> CoreDNS
CoreDNS -->|.viktorbarzin.lan| ClusterIP
CoreDNS -->|public queries| pf_dnsmasq
CoreDNS -->|public queries| pf_unbound
LB_DNS --> Primary
LB_DNS --> Secondary
@@ -86,7 +84,7 @@ graph TB
| CoreDNS | K8s `kube-system` | Cluster default | K8s service discovery + forwarding to Technitium |
| NodeLocal DNSCache | K8s `kube-system` (DaemonSet) | `k8s-dns-node-cache:1.23.1` | Per-node DNS cache, transparent interception on 10.96.0.10 + 169.254.20.10. Insulates pods from CoreDNS/Technitium/pfSense disruption. |
| Cloudflare DNS | SaaS | N/A | Public domain management (~50 domains) |
| pfSense dnsmasq | 10.0.20.1 | pfSense 2.7.x | DNS forwarder for management VLAN |
| pfSense Unbound | 10.0.20.1 | pfSense 2.7.2 (Unbound 1.19) | DNS resolver on LAN/OPT1/WAN; AXFR-slaves `viktorbarzin.lan` from Technitium; DoT upstream to Cloudflare |
| Kea DHCP-DDNS | 10.0.20.1 | pfSense 2.7.x | Automatic DNS registration on DHCP lease |
| phpIPAM | K8s namespace `phpipam` | v1.7.0 | IPAM ↔ DNS bidirectional sync |
@@ -98,7 +96,7 @@ graph TB
| NodeLocal DNSCache | `stacks/nodelocal-dns/` | DaemonSet (5 pods), ConfigMap, kube-dns-upstream Service, headless metrics Service |
| Cloudflared | `stacks/cloudflared/` | Cloudflare DNS records (A, AAAA, CNAME, MX, TXT), tunnel config |
| phpIPAM | `stacks/phpipam/` | dns-sync CronJob, pfsense-import CronJob |
| pfSense | `stacks/pfsense/` | VM config (DNS config is via pfSense web UI) |
| pfSense | `stacks/pfsense/` | VM config only (Unbound config is managed out-of-band via pfSense web UI / direct config.xml edits; see `docs/runbooks/pfsense-unbound.md`) |
## DNS Resolution Paths
@@ -122,39 +120,50 @@ Pod → NodeLocal DNSCache (intercepts on kube-dns:10.96.0.10)
→ cache hit: serve locally
→ cache miss: forward to kube-dns-upstream (selects CoreDNS pods directly)
→ CoreDNS: forward to pfSense (10.0.20.1), fallback 8.8.8.8, 1.1.1.1
→ pfSense dnsmasq → Cloudflare (1.1.1.1)
→ pfSense Unbound:
- .viktorbarzin.lan → local auth-zone (AXFR-cached from Technitium)
- public → DoT to Cloudflare (1.1.1.1 / 1.0.0.1 port 853)
```
### LAN Client (192.168.1.x) → Any Domain
```
Client gets DNS=192.168.1.2 (pfSense WAN) from DHCP
→ pfSense NAT rdr on WAN interface → Technitium LB (10.0.20.201)
→ Technitium resolves:
- .viktorbarzin.lan → local zone
- .viktorbarzin.me (non-proxied) → recursive, then Split Horizon translates
176.12.22.76 → 10.0.20.200 for 192.168.1.0/24 clients
- other → recursive to Cloudflare DoH (1.1.1.1)
→ pfSense Unbound listens on 192.168.1.2:53 directly (no NAT rdr)
- .viktorbarzin.lan → auth-zone (AXFR-cached from Technitium 10.0.20.201)
Survives full Technitium/K8s outage — auth-zone keeps serving from
/var/unbound/viktorbarzin.lan.zone with `fallback-enabled: yes`.
- .viktorbarzin.me (non-proxied) and other public → DoT to Cloudflare
(1.1.1.1 / 1.0.0.1 on port 853, SNI cloudflare-dns.com)
```
Client source IPs are preserved (no SNAT on 192.168.1.x → 10.0.20.x path) — Technitium logs show real per-device IPs.
**Trade-off vs. prior NAT rdr**: Split Horizon hairpin translation
(`176.12.22.76 → 10.0.20.200` for 192.168.1.x clients) was only applied
when queries reached Technitium via the NAT rdr. With Unbound answering
on 192.168.1.2:53 directly, non-proxied `*.viktorbarzin.me` queries on the
192.168.1.x LAN return the public IP, which the TP-Link AP can't hairpin.
If hairpin is broken on LAN for a given non-proxied service, the fix is
either (a) switch the service to proxied (via `dns_type = "proxied"`)
or (b) add a local-data override on pfSense Unbound. The pre-Unbound
state is documented in the `docs/runbooks/pfsense-unbound.md` rollback
section.
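Option (b) above can be sketched as a `custom_options` fragment. This is a hypothetical example, not deployed config: `myservice` is a placeholder name, and `10.0.20.200` is the Traefik LB IP already named in this doc.

```
server:
  # Hypothetical host override: answer LAN queries for one non-proxied
  # service with the internal Traefik LB IP instead of the public IP.
  local-zone: "myservice.viktorbarzin.me." redirect
  local-data: "myservice.viktorbarzin.me. 300 IN A 10.0.20.200"
```

The same effect can be achieved through `Services → DNS Resolver → Host Overrides` in the web UI, which survives config regeneration; raw `custom_options` edits should go through config.xml, not `/var/unbound/unbound.conf` (see the runbook's gotchas).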
### Management VLAN (10.0.10.x) → Any Domain
```
Client gets DNS from Kea DHCP → pfSense (10.0.10.1)
→ pfSense dnsmasq:
- .viktorbarzin.lan → forward to Technitium (10.0.20.201)
- other → forward to Cloudflare (1.1.1.1)
→ pfSense Unbound:
- .viktorbarzin.lan → auth-zone (local)
- other → DoT to Cloudflare (1.1.1.1 / 1.0.0.1 port 853)
```
### K8s VLAN (10.0.20.x) → Any Domain
```
Client gets DNS from Kea DHCP → pfSense (10.0.20.1)
→ pfSense dnsmasq:
- .viktorbarzin.lan → forward to Technitium (10.0.20.201)
- other → forward to Cloudflare (1.1.1.1)
→ pfSense Unbound:
- .viktorbarzin.lan → auth-zone (local)
- other → DoT to Cloudflare (1.1.1.1 / 1.0.0.1 port 853)
```
## Technitium DNS — Internal DNS Server
@@ -438,13 +447,26 @@ The zone-sync CronJob (runs every 30min) pushes the following to the Prometheus
### LAN Clients Can't Resolve
1. Verify pfSense NAT rule redirects UDP 53 on WAN to 10.0.20.201
2. Check Technitium LB service: `kubectl get svc -n technitium technitium-dns`
3. Test from LAN: `dig @192.168.1.2 idrac.viktorbarzin.lan`
4. Check `externalTrafficPolicy: Local` — if no Technitium pod runs on the node receiving traffic, it drops
1. Verify pfSense Unbound is running: `ssh admin@10.0.20.1 "sockstat -l -4 -p 53 | grep unbound"` — expect listeners on `192.168.1.2:53`, `10.0.10.1:53`, `10.0.20.1:53`, `127.0.0.1:53`
2. Verify the auth-zone is loaded: `ssh admin@10.0.20.1 "unbound-control -c /var/unbound/unbound.conf list_auth_zones"` — expect `viktorbarzin.lan. serial N`
3. Test from LAN: `dig @192.168.1.2 idrac.viktorbarzin.lan` (should return with `aa` flag)
4. Test public upstream: `dig @192.168.1.2 example.com +dnssec` (should have `ad` flag — DoT via Cloudflare working)
5. If auth-zone can't AXFR: check Technitium `viktorbarzin.lan` zone options → `zoneTransferNetworkACL` contains `10.0.20.1, 10.0.10.1, 192.168.1.2`
6. See `docs/runbooks/pfsense-unbound.md` for full Unbound runbook and rollback instructions
### Hairpin NAT Not Working (LAN → *.viktorbarzin.me Fails)
Since 2026-04-19 (Workstream D), pfSense Unbound answers LAN DNS queries
directly instead of forwarding to Technitium, so Technitium's Split Horizon
post-processing no longer runs for 192.168.1.x clients. Hairpin resolution
for non-proxied services is therefore broken again on LAN. Options:
1. **Switch service to proxied Cloudflare** (preferred) — set `dns_type = "proxied"` in the `ingress_factory` module call; DNS now resolves to Cloudflare edge, hairpin-independent.
2. **Add a local-data override on pfSense Unbound** — under `Services → DNS Resolver → Host Overrides`, set `<service>.viktorbarzin.me → 10.0.20.200` (Traefik LB IP). This is equivalent to what Split Horizon did, applied at the resolver.
3. **Revert to prior NAT rdr + Technitium Split Horizon** — documented in `docs/runbooks/pfsense-unbound.md` rollback section.
K8s-side Split Horizon is still configured and applies whenever `*.viktorbarzin.me` queries do reach Technitium (e.g., pod queries that CoreDNS forwards onward and that ultimately land on Technitium). Verify the Technitium split-horizon app:
1. Verify Split Horizon app is installed on all instances
2. Check CronJob status: `kubectl get cronjob -n technitium technitium-split-horizon-sync`
3. Run the job manually: `kubectl create job --from=cronjob/technitium-split-horizon-sync test-sh -n technitium`
@@ -479,6 +501,7 @@ For external `.viktorbarzin.me` records:
## Incident History
- **2026-04-14 (SEV1)**: NFS `fsid=0` caused Technitium primary data loss on restart. Fixed by migrating all 3 instances to `proxmox-lvm-encrypted`, adding zone-sync CronJob (30min AXFR). See [post-mortem](../post-mortems/2026-04-14-nfs-fsid0-dns-vault-outage.md).
- **2026-04-19 (hardening, not outage)**: Workstream D — pfSense Unbound replaces dnsmasq as the pfSense DNS service. Unbound AXFR-slaves `viktorbarzin.lan` from Technitium so LAN-side resolution survives a full K8s outage. WAN NAT rdr `192.168.1.2:53 → 10.0.20.201` removed (Unbound listens on WAN directly). DoT upstream via Cloudflare. See `docs/runbooks/pfsense-unbound.md` and bd `code-k0d`.
## Related


@@ -0,0 +1,202 @@
# pfSense Unbound DNS Resolver
Last updated: 2026-04-19
## Overview
pfSense runs **Unbound** (DNS Resolver) as its sole DNS service, replacing
dnsmasq (DNS Forwarder) as of 2026-04-19 (DNS hardening Workstream D,
bd `code-k0d`).
Unbound AXFR-slaves the `viktorbarzin.lan` zone from the Technitium primary
via the `10.0.20.201` LoadBalancer, so LAN-side `.lan` resolution survives
a full Kubernetes outage. Public queries go to Cloudflare via DNS-over-TLS
(`1.1.1.1` + `1.0.0.1` on port 853, SNI `cloudflare-dns.com`).
## Listeners
Unbound binds on:
| Interface | IP | Purpose |
|-----------|-----|---------|
| WAN | `192.168.1.2:53` | LAN (192.168.1.0/24) clients querying via pfSense WAN |
| LAN | `10.0.10.1:53` | Management VLAN clients |
| OPT1 | `10.0.20.1:53` | K8s VLAN clients (CoreDNS upstream) |
| lo0 | `127.0.0.1:53` | pfSense itself |
The prior WAN NAT `rdr` rule (`192.168.1.2:53 → 10.0.20.201`) was removed in
the same change — Unbound now answers directly on WAN.
## Config Summary
Relevant `<unbound>` keys in `/cf/conf/config.xml`:
| Key | Value | Meaning |
|-----|-------|---------|
| `enable` | flag | Enable Unbound |
| `dnssec` | flag | DNSSEC validation on |
| `forwarding` | flag | Forwarding mode (send recursive queries to upstream) |
| `forward_tls_upstream` | flag | Use DoT for upstream forwarders |
| `prefetch` | flag | Prefetch records near expiry |
| `prefetchkey` | flag | Prefetch DNSKEY records |
| `dnsrecordcache` | flag | `serve-expired: yes` |
| `active_interface` | `lan,opt1,wan,lo0` | Listen interfaces |
| `msgcachesize` | `256` (MB) | Message cache (rrset-cache auto-doubles to 512MB) |
| `cache_max_ttl` | `604800` | 7 days |
| `cache_min_ttl` | `60` | 60 seconds |
| `custom_options` | base64 | Contains `serve-expired-ttl: 259200` + `auth-zone:` block |
Upstream DoT forwarders live in `<system>`:
- `dnsserver[0] = 1.1.1.1`
- `dnsserver[1] = 1.0.0.1`
- `dns1host = cloudflare-dns.com`
- `dns2host = cloudflare-dns.com`
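From those `<system>` keys, pfSense generates a `forward-zone` block along these lines (a sketch of the expected shape, not a verbatim dump from the box):

```
forward-zone:
  name: "."
  forward-tls-upstream: yes
  # The "#hostname" suffix supplies the SNI / auth name for certificate
  # verification -- see gotcha 6 below for what happens without it.
  forward-addr: 1.1.1.1@853#cloudflare-dns.com
  forward-addr: 1.0.0.1@853#cloudflare-dns.com
```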
## Auth-Zone for viktorbarzin.lan
The custom_options block declares:
```
server:
serve-expired-ttl: 259200
auth-zone:
name: "viktorbarzin.lan"
master: 10.0.20.201
fallback-enabled: yes
for-downstream: yes
for-upstream: yes
zonefile: "viktorbarzin.lan.zone"
allow-notify: 10.0.20.201
```
- `master: 10.0.20.201` — AXFR source (Technitium LoadBalancer)
- `fallback-enabled: yes` — if the zone can't refresh from master, fall back to normal recursion for this name (prevents hard-fail if AXFR breaks)
- `for-downstream: yes` — answer queries for this zone with AA flag
- `for-upstream: yes` — Unbound's internal iterator also uses this zone
- `zonefile` is relative to the chroot (`/var/unbound/viktorbarzin.lan.zone`)
- `allow-notify: 10.0.20.201` — accept NOTIFY from Technitium
## Technitium-side ACL
Zone `viktorbarzin.lan` on Technitium has `zoneTransfer = UseSpecifiedNetworkACL`
with ACL entries:
- `10.0.20.1` (pfSense OPT1)
- `10.0.10.1` (pfSense LAN)
- `192.168.1.2` (pfSense WAN)
Verify via the Technitium API:
```
curl -sk "http://127.0.0.1:5380/api/zones/options/get?token=$TOK&zone=viktorbarzin.lan" | jq .response.zoneTransfer
```
## Operational Checks
```bash
# Is Unbound listening?
ssh admin@10.0.20.1 "sockstat -l -4 -p 53"
# Auth-zone loaded?
ssh admin@10.0.20.1 "unbound-control -c /var/unbound/unbound.conf list_auth_zones"
# Expected: viktorbarzin.lan. serial NNNNN
# LAN record via auth-zone? (aa flag = authoritative / from auth-zone)
dig @192.168.1.2 idrac.viktorbarzin.lan +norec
# Public record via DoT? (ad flag = DNSSEC validated, via 1.1.1.1/1.0.0.1)
dig @192.168.1.2 example.com +dnssec
# Zonefile has all records?
ssh admin@10.0.20.1 "wc -l /var/unbound/viktorbarzin.lan.zone"
```
## K8s Outage Drill
Tests that `.lan` resolution survives a full Technitium outage:
```bash
# Scale Technitium primary to 0
kubectl -n technitium scale deploy/technitium --replicas=0
# Wait ~5 seconds, then test from a LAN client
ssh devvm.viktorbarzin.lan "dig @192.168.1.2 idrac.viktorbarzin.lan +short"
# Expected: 192.168.1.4 (served from Unbound's cached auth-zone)
# Restore immediately
kubectl -n technitium scale deploy/technitium --replicas=1
```
Completed successfully during the initial 2026-04-19 deployment.
Note: secondary/tertiary Technitium pods remain up and continue to serve
queries via the `10.0.20.201` LoadBalancer even when the primary is down —
so the strongest proof that Unbound's auth-zone serves locally is to also
scale those down (optional, not part of the routine drill).
## Backup & Rollback
### Backups
- **On-box**: `/cf/conf/config.xml.2026-04-19-pre-unbound` (created before this
workstream ran — keep for 30 days, then delete)
- **Daily**: PVE `daily-backup` script copies `/cf/conf/config.xml` and a full
pfSense config tar to `/mnt/backup/pfsense/` on the Proxmox host at 05:00
- **Offsite**: Synology `pve-backup/pfsense/` (synced daily by
`offsite-sync-backup`)
### Rollback to dnsmasq
If Unbound misbehaves, revert to dnsmasq + NAT rdr:
```bash
# On pfSense
cp /cf/conf/config.xml.2026-04-19-pre-unbound /cf/conf/config.xml
# Tell pfSense to re-read config and reload services
php -r 'require_once("config.inc"); require_once("config.lib.inc"); disable_path_cache();'
/etc/rc.restart_webgui # reloads PHP config caches
# Restart services
php -r 'require_once("config.inc"); require_once("services.inc"); services_dnsmasq_configure(); services_unbound_configure(); filter_configure();'
/etc/rc.filter_configure # re-applies NAT rules (brings back rdr)
```
Verify:
```bash
sockstat -l -4 -p 53 | grep dnsmasq # expect dnsmasq on 10.0.10.1 and 10.0.20.1
pfctl -sn | grep '53' # expect rdr on wan UDP 53 → 10.0.20.201
```
### Rollback without wiping new changes
If you only want to stop Unbound without restoring the whole config, edit
config.xml, remove `<enable/>` from `<unbound>`, and add it back to
`<dnsmasq>`, then re-run `services_unbound_configure()` and
`services_dnsmasq_configure()`. You also need to re-add the WAN NAT rdr in
`<nat><rule>` (see the backup XML for the exact shape; tracker `1775670025`).
## Known Gotchas
1. **pfSense regenerates `/var/unbound/unbound.conf`** on every service reload
from `<unbound>` in `config.xml`. Edits to unbound.conf are NOT durable.
2. **`unbound-control` default config path is wrong**. Always use
`unbound-control -c /var/unbound/unbound.conf <cmd>`.
3. **`custom_options` is base64-encoded** in config.xml. Use `base64 -d` to
decode in a shell, or `base64_decode()` in PHP.
4. **`interface-automatic: yes` is NOT used** when `active_interface` is
explicitly set to a list — pfSense emits explicit `interface: <ip>` lines.
5. **`auth-zone`'s `zonefile` path is relative to the Unbound chroot**
(`/var/unbound`), NOT absolute. Using an absolute path silently fails.
6. **DoT forwarders need `forward_tls_upstream`** flag AND `dns1host` /
`dns2host` set in `<system>` for SNI — without the hostname, pfSense emits
`forward-addr: 1.1.1.1@853` (no `#`) which Cloudflare rejects with
certificate hostname mismatch.
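Gotcha 3 can be handled without PHP. The sketch below builds an illustrative (not real) `config.xml` fragment, then extracts and decodes `custom_options` with the Python standard library; the same two lines of parsing work against the real file.

```python
# Decode <custom_options> (base64) from a pfSense-style config.xml.
# The sample XML here is illustrative, not a real pfSense config dump.
import base64
import xml.etree.ElementTree as ET

sample = """<pfsense>
  <unbound>
    <enable/>
    <custom_options>{b64}</custom_options>
  </unbound>
</pfsense>""".format(
    b64=base64.b64encode(
        b'server:\nserve-expired-ttl: 259200\n'
        b'auth-zone:\nname: "viktorbarzin.lan"\n'
    ).decode()
)

root = ET.fromstring(sample)
encoded = root.findtext("./unbound/custom_options")
decoded = base64.b64decode(encoded).decode()
print(decoded)
```

Against the live box, replace `ET.fromstring(sample)` with `ET.parse("/cf/conf/config.xml").getroot()`.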
## Related Docs
- `docs/architecture/dns.md` — overall DNS architecture (K8s side, Technitium, CoreDNS)
- `docs/architecture/networking.md` — VLAN layout, pfSense interface mapping
- `.claude/skills/pfsense/skill.md` — SSH / CLI patterns for pfSense management