infra/docs
Viktor Barzin f6685a23a9 [dns] Kea: multi-IP DHCP option 6 (10.0.10, 10.0.20) + TSIG-signed DDNS (WS E)
Workstream E of the DNS hardening push. Two independent pfSense-side
changes to eliminate single-point DNS failures and the unauthenticated
RFC 2136 update vector.

Part 1 — Multi-IP DHCP option 6
- Before: clients on 10.0.10/24 got only 10.0.10.1; clients on 10.0.20/24
  got only 10.0.20.1, so an internal resolver outage meant a cluster-wide
  DNS outage.
- After:
  - 10.0.10/24 -> [10.0.10.1, 94.140.14.14]
  - 10.0.20/24 -> [10.0.20.1, 94.140.14.14]
- 192.168.1/24 deliberately untouched (served by TP-Link AP, not pfSense
  Kea — pfSense WAN DHCP is disabled); already ships [192.168.1.2,
  94.140.14.14] so the end state is consistent across all three subnets.
- Applied via PHP: set $cfg['dhcpd']['lan']['dnsserver'] and
  $cfg['dhcpd']['opt1']['dnsserver'] as arrays. pfSense's
  services_kea4_configure() implodes the array into "data: a, b" on the
  "domain-name-servers" option-data entry (services.inc L1214).
- Verified:
  - DevVM (10.0.10.10) resolv.conf shows "nameserver 10.0.10.1" +
    "nameserver 94.140.14.14" after networkd renew.
  - k8s-node1 (10.0.20.101) same after networkctl reload + systemd-resolved
    restart.
  - Fallback drill on k8s-node1: `ip route add blackhole 10.0.20.1/32`;
    dig @10.0.20.1 google.com -> "no servers could be reached"; dig
    @94.140.14.14 google.com -> 216.58.204.110; system resolver
    (getent hosts) succeeds via the fallback IP. Blackhole route removed.
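
A minimal sketch of the array-to-option-data step described above, written in
Python for illustration only. It is not pfSense's PHP code: the function name
dns_option_data and the DNS_SERVERS mapping are hypothetical, and the
assignment of 'lan' to 10.0.10/24 and 'opt1' to 10.0.20/24 is an assumption,
since the commit names both keys without saying which subnet each one serves.

```python
import ipaddress

# Resolver lists per pfSense DHCP interface, taken from the "After" values.
# ASSUMPTION: 'lan' serves 10.0.10/24 and 'opt1' serves 10.0.20/24.
DNS_SERVERS = {
    "lan": ["10.0.10.1", "94.140.14.14"],
    "opt1": ["10.0.20.1", "94.140.14.14"],
}


def dns_option_data(servers):
    """Build a Kea option-data entry for DHCP option 6 from a list of IPs."""
    if not servers:
        raise ValueError("at least one DNS server is required")
    # ip_address() raises ValueError on malformed input, so a typo fails here
    # instead of ending up in the generated config.
    normalized = [str(ipaddress.ip_address(s)) for s in servers]
    # Kea takes the address list as one comma-separated string in "data".
    return {"name": "domain-name-servers", "data": ", ".join(normalized)}


if __name__ == "__main__":
    for interface, servers in DNS_SERVERS.items():
        print(interface, dns_option_data(servers))
```

Order matters here: the internal resolver comes first and the public fallback
second, matching the order observed in resolv.conf during verification.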

Part 2 — TSIG-signed Kea DHCP-DDNS
- Before: /usr/local/etc/kea/kea-dhcp-ddns.conf had `tsig-keys: []` and
  Technitium's viktorbarzin.lan zone had update=Deny. The unauthenticated
  update vector was latent: DDNS wiring in Kea DHCP4 is off today
  ("DDNS: disabled" in dhcpd.log), but it would activate as soon as
  anyone turned on ddnsupdate on LAN/OPT1.
- Generated an HMAC-SHA256 secret: 32 random bytes, base64-encoded.
- Stored in Vault: secret/viktor/kea_ddns_tsig_secret (version 27).
- Created TSIG key "kea-ddns" on primary/secondary/tertiary Technitium
  instances via /api/settings/set (tsigKeys[]).
- Updated kea-dhcp-ddns.conf on pfSense with
  tsig-keys[]={name: "kea-ddns", algorithm: "HMAC-SHA256", secret: …}
  and key-name: kea-ddns on each forward-ddns / reverse-ddns domain.
  Pre-change backup at /usr/local/etc/kea/kea-dhcp-ddns.conf.2026-04-19-pre-tsig.
- Configured viktorbarzin.lan + 10.0.10.in-addr.arpa +
  20.0.10.in-addr.arpa + 1.168.192.in-addr.arpa on Technitium primary:
  - update = UseSpecifiedNetworkACL
  - updateNetworkACL = [10.0.20.1, 10.0.10.1, 192.168.1.2]
  - updateSecurityPolicies = [{tsigKeyName: kea-ddns,
                               domain: "*.<zone>", allowedTypes: [ANY]}]
  Technitium requires BOTH a source-IP match AND a valid TSIG signature.
- Verified TSIG end-to-end:
  - Signed A-record update from pfSense -> "successfully processed",
    dig returns 10.99.99.99 (log: "TSIG KeyName: kea-ddns; TSIG Algo:
    hmac-sha256; TSIG Error: NoError; RCODE: NoError").
  - Signed PTR update (same zone pattern) -> dig -x returns the
    tsig-test FQDN.
  - Unsigned update from pfSense IP (in ACL) -> "update failed:
    REFUSED" (log: "refused a zone UPDATE request [...] due to Dynamic
    Updates Security Policy").
  - Test records cleaned up via signed nsupdate.
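
A rough sketch of the Part 2 steps, in Python for illustration. It shows how
a 32-byte base64 secret can be generated, what the TSIG-related portion of
kea-dhcp-ddns.conf looks like as a data structure, and a toy model of the
"source IP in ACL AND valid TSIG" rule. The function names are hypothetical,
the zone list passed in is an assumption (the commit does not list the Kea
ddns-domains), the per-domain "dns-servers" entries are left out because the
commit does not give them, and update_allowed only checks the key name
without verifying any signature.

```python
import base64
import ipaddress
import secrets


def generate_tsig_secret(num_bytes=32):
    """Return num_bytes of CSPRNG output, base64-encoded as an ASCII string."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def ddns_fragment(key_name, secret, forward_zones, reverse_zones):
    """Build the TSIG-related part of a kea-dhcp-ddns.conf as a dict.

    Incomplete on purpose: each domain would also need a "dns-servers" list.
    """
    def domains(zones):
        # Every domain references the shared key through "key-name".
        return [{"name": zone, "key-name": key_name} for zone in zones]

    return {
        "tsig-keys": [
            {"name": key_name, "algorithm": "HMAC-SHA256", "secret": secret}
        ],
        "forward-ddns": {"ddns-domains": domains(forward_zones)},
        "reverse-ddns": {"ddns-domains": domains(reverse_zones)},
    }


def update_allowed(source_ip, tsig_key_name, acl, policy_keys):
    """Toy model of the zone policy: both conditions must hold.

    tsig_key_name is None for an unsigned update.
    """
    allowed_sources = {ipaddress.ip_address(a) for a in acl}
    in_acl = ipaddress.ip_address(source_ip) in allowed_sources
    return in_acl and tsig_key_name in policy_keys


if __name__ == "__main__":
    acl = ["10.0.20.1", "10.0.10.1", "192.168.1.2"]
    keys = {"kea-ddns"}
    print(update_allowed("10.0.10.1", "kea-ddns", acl, keys))  # signed, in ACL
    print(update_allowed("10.0.10.1", None, acl, keys))        # unsigned, in ACL
```

The two calls at the bottom mirror the signed and unsigned verification cases
listed above: only the first satisfies both conditions.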

Safety
- pfSense config backup: /cf/conf/config.xml.2026-04-19-pre-kea-multi-ip
  (145898 bytes, pre-change snapshot — keep 30d).
- DDNS config backup: /usr/local/etc/kea/kea-dhcp-ddns.conf.2026-04-19-pre-tsig.
- TSIG secret lives only in Vault + in config.xml/kea-dhcp-ddns.conf on
  pfSense; not committed to git.

Docs
- architecture/dns.md: zone dynamic-updates section records the TSIG
  policy; Incident History gets a WS E entry.
- architecture/networking.md: DHCP Coverage table now shows the DNS
  option 6 values per subnet; pfSense block notes the TSIG-signed DDNS
  and config backup path.
- runbooks/pfsense-unbound.md: new "Kea DHCP-DDNS TSIG" section covers
  key rotation, emergency bypass, and enforcement-verification.

Closes: code-o6j

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:12:23 +00:00
architecture [dns] Kea: multi-IP DHCP option 6 (10.0.10, 10.0.20) + TSIG-signed DDNS (WS E) 2026-04-19 16:12:23 +00:00
plans [docs] Update anti-AI and rybbit docs after rewrite-body removal 2026-04-17 21:43:13 +00:00
post-mortems [docs] post-mortem: clarify the sizeLimit vs container memory limit gotcha 2026-04-18 13:23:14 +00:00
runbooks [dns] Kea: multi-IP DHCP option 6 (10.0.10, 10.0.20) + TSIG-signed DDNS (WS E) 2026-04-19 16:12:23 +00:00
README.md add architecture documentation for all infrastructure subsystems [ci skip] 2026-03-24 00:55:25 +02:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network
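
As a quick illustration of how these ranges partition the address space, the
following Python sketch maps an address to the segment that contains it. The
NETWORKS table copies the three ranges above; the function name segment_for
and the sample addresses are hypothetical and used only for the example.

```python
import ipaddress

# The three ranges listed above, keyed by their names in this README.
NETWORKS = {
    "Physical Network": ipaddress.ip_network("192.168.1.0/24"),
    "Management VLAN 10": ipaddress.ip_network("10.0.10.0/24"),
    "Kubernetes VLAN 20": ipaddress.ip_network("10.0.20.0/24"),
}


def segment_for(address):
    """Return the name of the range containing address, or None if no match."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORKS.items():
        if ip in network:
            return name
    return None


if __name__ == "__main__":
    for sample in ("192.168.1.2", "10.0.10.10", "10.0.20.101", "8.8.8.8"):
        print(sample, "->", segment_for(sample))
```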

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document            Description
Overview            Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking          Network topology, VLANs, routing, and firewall rules
VPN                 Headscale mesh VPN and Cloudflare Tunnel configuration
Storage             TrueNAS NFS, democratic-csi, and persistent volume management
Authentication      Authentik SSO, OIDC flows, and service integration
Security            CrowdSec IPS, Kyverno policies, and security controls
Monitoring          Prometheus, Grafana, Loki, and observability stack
Secrets Management  HashiCorp Vault integration and secret rotation
CI/CD               Woodpecker CI pipeline and deployment automation
Backup & DR         Backup strategy, disaster recovery, and restore procedures
Compute             Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases           PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy       Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest