Commit graph

2 commits

Author SHA1 Message Date
Viktor Barzin
f6685a23a9 [dns] Kea: multi-IP DHCP option 6 (10.0.10, 10.0.20) + TSIG-signed DDNS (WS E)
Workstream E of the DNS hardening push. Two independent pfSense-side
changes to eliminate single-point DNS failures and the unauthenticated
RFC 2136 update vector.

Part 1 — Multi-IP DHCP option 6
- Before: clients on 10.0.10/24 got only 10.0.10.1; clients on 10.0.20/24
  got only 10.0.20.1. Internal resolver outage == cluster-wide DNS dark.
- After:
  - 10.0.10/24 -> [10.0.10.1, 94.140.14.14]
  - 10.0.20/24 -> [10.0.20.1, 94.140.14.14]
- 192.168.1/24 deliberately untouched (served by TP-Link AP, not pfSense
  Kea — pfSense WAN DHCP is disabled); already ships [192.168.1.2,
  94.140.14.14] so the end state is consistent across all three subnets.
- Applied via PHP: set $cfg['dhcpd']['lan']['dnsserver'] and
  $cfg['dhcpd']['opt1']['dnsserver'] as arrays. pfSense's
  services_kea4_configure() implodes the array into "data: a, b" on the
  "domain-name-servers" option-data entry (services.inc L1214).
- Verified:
  - DevVM (10.0.10.10) resolv.conf shows "nameserver 10.0.10.1" +
    "nameserver 94.140.14.14" after networkd renew.
  - k8s-node1 (10.0.20.101) same after networkctl reload + systemd-resolved
    restart.
  - Fallback drill on k8s-node1: `ip route add blackhole 10.0.20.1/32`;
    dig @10.0.20.1 google.com -> "no servers could be reached"; dig
    @94.140.14.14 google.com -> 216.58.204.110; system resolver
    (getent hosts) succeeds via the fallback IP. Blackhole route removed.

Part 2 — TSIG-signed Kea DHCP-DDNS
- Before: /usr/local/etc/kea/kea-dhcp-ddns.conf had `tsig-keys: []` and
  Technitium's viktorbarzin.lan zone had update=Deny. Unauthenticated
  update vector was latent (DDNS wiring in Kea DHCP4 is actually off
  today — "DDNS: disabled" in dhcpd.log) but would activate as soon as
  anyone turned on ddnsupdate on LAN/OPT1.
- Generated HMAC-SHA256 secret, base64-encoded 32 random bytes.
- Stored in Vault: secret/viktor/kea_ddns_tsig_secret (version 27).
- Created TSIG key "kea-ddns" on primary/secondary/tertiary Technitium
  instances via /api/settings/set (tsigKeys[]).
- Updated kea-dhcp-ddns.conf on pfSense with
  tsig-keys[]={name: "kea-ddns", algorithm: "HMAC-SHA256", secret: …}
  and key-name: kea-ddns on each forward-ddns / reverse-ddns domain.
  Pre-change backup at /usr/local/etc/kea/kea-dhcp-ddns.conf.2026-04-19-pre-tsig.
- Configured viktorbarzin.lan + 10.0.10.in-addr.arpa +
  20.0.10.in-addr.arpa + 1.168.192.in-addr.arpa on Technitium primary:
  - update = UseSpecifiedNetworkACL
  - updateNetworkACL = [10.0.20.1, 10.0.10.1, 192.168.1.2]
  - updateSecurityPolicies = [{tsigKeyName: kea-ddns,
                               domain: "*.<zone>", allowedTypes: [ANY]}]
  Technitium requires BOTH a source-IP match AND a valid TSIG signature.
- Verified TSIG end-to-end:
  - Signed A-record update from pfSense -> "successfully processed",
    dig returns 10.99.99.99 (log: "TSIG KeyName: kea-ddns; TSIG Algo:
    hmac-sha256; TSIG Error: NoError; RCODE: NoError").
  - Signed PTR update same zone pattern -> dig -x returns tsig-test
    FQDN.
  - Unsigned update from pfSense IP (in ACL) -> "update failed:
    REFUSED" (log: "refused a zone UPDATE request [...] due to Dynamic
    Updates Security Policy").
  - Test records cleaned up via signed nsupdate.

Safety
- pfSense config backup: /cf/conf/config.xml.2026-04-19-pre-kea-multi-ip
  (145898 bytes, pre-change snapshot — keep 30d).
- DDNS config backup: /usr/local/etc/kea/kea-dhcp-ddns.conf.2026-04-19-pre-tsig.
- TSIG secret lives only in Vault + in config.xml/kea-dhcp-ddns.conf on
  pfSense; not committed to git.

Docs
- architecture/dns.md: zone dynamic-updates section records the TSIG
  policy; Incident History gets a WS E entry.
- architecture/networking.md: DHCP Coverage table now shows the DNS
  option 6 values per subnet; pfSense block notes the TSIG-signed DDNS
  and config backup path.
- runbooks/pfsense-unbound.md: new "Kea DHCP-DDNS TSIG" section covers
  key rotation, emergency bypass, and enforcement-verification.

Closes: code-o6j

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:12:23 +00:00
Viktor Barzin
33d934c32f [dns] pfSense: Unbound replaces dnsmasq (WS D)
Replace pfSense dnsmasq (DNS Forwarder) with Unbound (DNS Resolver) so
LAN-side .viktorbarzin.lan resolution survives a full Kubernetes outage.

Out-of-band pfSense changes (not in Terraform; pfSense config.xml is
VM-managed). Backup at /cf/conf/config.xml.2026-04-19-pre-unbound on-box
+ /mnt/backup/pfsense/ nightly.

- <unbound> enabled; listens on lan, opt1, wan, lo0
- <forwarding> on + <forward_tls_upstream> → DoT to Cloudflare
  (1.1.1.1 / 1.0.0.1 port 853, SNI cloudflare-dns.com)
- <dnssec>, <prefetch>, <prefetchkey>, <dnsrecordcache> (serve-expired)
- msgcachesize=256MB, cache_max_ttl=7d, cache_min_ttl=60s
- custom_options: auth-zone viktorbarzin.lan master=10.0.20.201
  fallback-enabled=yes for-upstream=yes + serve-expired-ttl=259200
- <dnsmasq><enable> removed; dnsmasq stopped
- NAT rdr WAN UDP 53 → 10.0.20.201 removed (Unbound listens on WAN now)
- Technitium zone viktorbarzin.lan: zoneTransferNetworkACL set to
  10.0.20.1, 10.0.10.1, 192.168.1.2 (pfSense source IPs)

Verified:
- unbound-control list_auth_zones: viktorbarzin.lan serial 49367
- dig @127.0.0.1 idrac.viktorbarzin.lan returns 192.168.1.4 with aa flag
  (served from auth-zone, not forwarded)
- dig @127.0.0.1 example.com +dnssec returns ad flag (DoT + validated)
- /var/unbound/viktorbarzin.lan.zone has ~114 records
- K8s outage drill passed: scale technitium=0 → dig still returns via
  WAN/LAN/OPT1 interfaces → scale restored
- LAN/management/K8s VLAN clients all resolve via pfSense 192.168.1.2 /
  10.0.10.1 / 10.0.20.1 respectively

Trade-off: Technitium Split Horizon hairpin for 192.168.1.x →
*.viktorbarzin.me (non-proxied) no longer runs via pfSense (Unbound
answers locally). Fix if it bites: switch service to proxied or add
Unbound Host Override. Documented in docs/runbooks/pfsense-unbound.md.

Closes: code-k0d
2026-04-19 15:52:41 +00:00