diff --git a/docs/architecture/dns.md b/docs/architecture/dns.md
index d0491ca0..acf5abee 100644
--- a/docs/architecture/dns.md
+++ b/docs/architecture/dns.md
@@ -1,10 +1,10 @@
# DNS Architecture
-Last updated: 2026-04-19
+Last updated: 2026-04-19 (NodeLocal DNSCache deployed — Workstream C)
## Overview
-DNS is served by a split architecture: **Technitium DNS** handles internal resolution (`.viktorbarzin.lan`) and recursive lookups, while **Cloudflare DNS** manages all public domains (`.viktorbarzin.me`). Kubernetes pods use **CoreDNS** which forwards to Technitium for internal zones. All three Technitium instances run on encrypted block storage with zone replication via AXFR every 30 minutes.
+DNS is served by a split architecture: **Technitium DNS** handles internal resolution (`.viktorbarzin.lan`) and recursive lookups, while **Cloudflare DNS** manages all public domains (`.viktorbarzin.me`). Kubernetes pods use **CoreDNS** which forwards to Technitium for internal zones. All three Technitium instances run on encrypted block storage with zone replication via AXFR every 30 minutes. A **NodeLocal DNSCache** DaemonSet runs on every node and transparently intercepts pod DNS traffic, caching responses locally so recently resolved names keep working even during CoreDNS, Technitium, or pfSense disruptions.
## Architecture Diagram
@@ -29,7 +29,9 @@ graph TB
end
subgraph "Kubernetes Cluster"
+ NodeLocalDNS[NodeLocal DNSCache
DaemonSet, 5 nodes
169.254.20.10 + 10.96.0.10]
CoreDNS[CoreDNS
kube-system
.:53 + viktorbarzin.lan:53]
+ KubeDNSUpstream[kube-dns-upstream
ClusterIP, selects CoreDNS pods]
subgraph "Technitium HA (namespace: technitium)"
Primary[Primary
technitium]
@@ -59,6 +61,8 @@ graph TB
pf_dnsmasq -->|.viktorbarzin.lan| LB_DNS
pf_dnsmasq -->|public queries| CF
+ NodeLocalDNS -->|cache miss| KubeDNSUpstream
+ KubeDNSUpstream --> CoreDNS
CoreDNS -->|.viktorbarzin.lan| ClusterIP
CoreDNS -->|public queries| pf_dnsmasq
@@ -80,6 +84,7 @@ graph TB
|-----------|----------|---------|---------|
| Technitium DNS | K8s namespace `technitium` | 14.3.0 | Primary internal DNS + recursive resolver |
| CoreDNS | K8s `kube-system` | Cluster default | K8s service discovery + forwarding to Technitium |
+| NodeLocal DNSCache | K8s `kube-system` (DaemonSet) | `k8s-dns-node-cache:1.23.1` | Per-node DNS cache, transparent interception on 10.96.0.10 + 169.254.20.10. Insulates pods from CoreDNS/Technitium/pfSense disruption. |
| Cloudflare DNS | SaaS | N/A | Public domain management (~50 domains) |
| pfSense dnsmasq | 10.0.20.1 | pfSense 2.7.x | DNS forwarder for management VLAN |
| Kea DHCP-DDNS | 10.0.20.1 | pfSense 2.7.x | Automatic DNS registration on DHCP lease |
@@ -90,6 +95,7 @@ graph TB
| Stack | Path | DNS Resources |
|-------|------|---------------|
| Technitium | `stacks/technitium/` | 3 deployments, services, PVCs, 4 CronJobs, CoreDNS ConfigMap |
+| NodeLocal DNSCache | `stacks/nodelocal-dns/` | DaemonSet (5 pods), ConfigMap, kube-dns-upstream Service, headless metrics Service |
| Cloudflared | `stacks/cloudflared/` | Cloudflare DNS records (A, AAAA, CNAME, MX, TXT), tunnel config |
| phpIPAM | `stacks/phpipam/` | dns-sync CronJob, pfsense-import CronJob |
| pfSense | `stacks/pfsense/` | VM config (DNS config is via pfSense web UI) |
@@ -99,10 +105,12 @@ graph TB
### K8s Pod → Internal Domain (.viktorbarzin.lan)
```
-Pod → CoreDNS (kube-dns:53)
- → template: if 2+ labels before .viktorbarzin.lan → NXDOMAIN (ndots:5 junk filter)
- → forward to Technitium ClusterIP (10.96.0.53)
- → Technitium resolves from viktorbarzin.lan zone
+Pod → NodeLocal DNSCache (intercepts on kube-dns:10.96.0.10)
+ → cache hit: serve locally (TTL 30s / stale up to 86400s via CoreDNS upstream)
+ → cache miss: forward to kube-dns-upstream (selects CoreDNS pods directly)
+ → CoreDNS: template matches 2+ labels before .viktorbarzin.lan → NXDOMAIN
+ → CoreDNS: forward to Technitium ClusterIP (10.96.0.53)
+ → Technitium resolves from viktorbarzin.lan zone
```
The ndots:5 template in CoreDNS short-circuits queries like `www.cloudflare.com.viktorbarzin.lan` (caused by K8s search domain expansion) by returning NXDOMAIN for any query with 2+ labels before `.viktorbarzin.lan`. Only single-label queries (e.g., `idrac.viktorbarzin.lan`) reach Technitium.
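+
+A quick way to check the junk-filter behaviour from a pod (the junk name is the doc's own example; `<pod>` is a placeholder for any pod with `dig` installed):
+
+```
+# Search-domain-expanded junk name (2+ labels before .viktorbarzin.lan): comes back NXDOMAIN.
+kubectl exec <pod> -- dig www.cloudflare.com.viktorbarzin.lan | grep status    # status: NXDOMAIN
+# Single-label internal name: resolves normally via Technitium.
+kubectl exec <pod> -- dig idrac.viktorbarzin.lan | grep -A1 'ANSWER SECTION'
+```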
@@ -110,9 +118,11 @@ The ndots:5 template in CoreDNS short-circuits queries like `www.cloudflare.com.
### K8s Pod → Public Domain
```
-Pod → CoreDNS (kube-dns:53)
- → forward to pfSense (10.0.20.1), fallback 8.8.8.8, 1.1.1.1
- → pfSense dnsmasq → Cloudflare (1.1.1.1)
+Pod → NodeLocal DNSCache (intercepts on kube-dns:10.96.0.10)
+ → cache hit: serve locally
+ → cache miss: forward to kube-dns-upstream (selects CoreDNS pods directly)
+ → CoreDNS: forward to pfSense (10.0.20.1), fallback 8.8.8.8, 1.1.1.1
+ → pfSense dnsmasq → Cloudflare (1.1.1.1)
```
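+
+From the pod's perspective nothing changes with the cache in place: `dig` still reports the kube-dns ClusterIP as the server, the answer is simply served on-node (exact output varies by dig version):
+
+```
+kubectl exec <pod> -- dig github.com | grep SERVER
+# ;; SERVER: 10.96.0.10#53(10.96.0.10) (UDP)
+```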
### LAN Client (192.168.1.x) → Any Domain
@@ -252,6 +262,23 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons
Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).
+## NodeLocal DNSCache
+
+A DaemonSet in `kube-system` (`node-local-dns`, image `registry.k8s.io/dns/k8s-dns-node-cache:1.23.1`) runs on every node including the control plane. Each pod uses `hostNetwork: true` + `NET_ADMIN` and installs iptables NOTRACK rules so it transparently serves DNS on both:
+
+- **169.254.20.10** — the canonical link-local IP from the upstream docs
+- **10.96.0.10** — the `kube-dns` ClusterIP, so existing pods (which already use this as their nameserver) hit the on-node cache with no kubelet change
+
+Cache misses go to a separate `kube-dns-upstream` ClusterIP service (not `kube-dns`, to avoid looping back to ourselves) that selects the CoreDNS pods directly via `k8s-app=kube-dns`.
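+
+The split is visible in the two Services: same selector, different ClusterIPs (a minimal sketch of the expected output; the upstream Service's IP is cluster-specific):
+
+```
+kubectl -n kube-system get svc kube-dns kube-dns-upstream
+# NAME                TYPE        CLUSTER-IP    PORT(S)
+# kube-dns            ClusterIP   10.96.0.10    53/UDP,53/TCP,9153/TCP
+# kube-dns-upstream   ClusterIP   10.96.x.x     53/UDP,53/TCP
+```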
+
+Priority class is `system-node-critical`; tolerations are permissive (`operator: Exists`) so the DaemonSet also runs on the tainted control-plane node and any other reserved nodes. Kyverno mutates pod `dns_config` (`ndots=2`) at admission, so that field is covered by `ignore_changes` on the DaemonSet to avoid spurious plan drift.
+
+**Caching**: `cluster.local:53` caches up to 9984 success and 9984 denial entries with 30s/5s TTLs; other zones cache entries for 30s. If CoreDNS is killed, nodes keep answering cached names — verified on 2026-04-19 by deleting all three CoreDNS pods and running `dig @169.254.20.10 idrac.viktorbarzin.lan` + `dig @169.254.20.10 github.com` from a pod (both returned answers).
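+
+That check boils down to the following, run from a pod on any node (the names must already be warm in the cache; the per-zone TTLs above bound how long answers survive without CoreDNS):
+
+```
+# Take CoreDNS away, then confirm the per-node cache still answers.
+kubectl -n kube-system delete pod -l k8s-app=kube-dns
+kubectl exec <pod> -- dig @169.254.20.10 idrac.viktorbarzin.lan
+kubectl exec <pod> -- dig @169.254.20.10 github.com
+```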
+
+**Kubelet clusterDNS**: **Unchanged** — still `10.96.0.10`. NodeLocal DNSCache co-listens on that IP so traffic interception is transparent; switching kubelet to `169.254.20.10` would require a rolling reconfigure of every node and provides no additional cache benefit over transparent mode.
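+
+A spot check of transparent mode (the interface and rule names follow the upstream node-cache implementation; exact iptables output differs by version):
+
+```
+# Pods are untouched: their resolver is still the kube-dns ClusterIP.
+kubectl exec <pod> -- cat /etc/resolv.conf          # nameserver 10.96.0.10
+# On any node: node-cache owns both IPs on a dummy interface plus NOTRACK rules in the raw table.
+ip addr show nodelocaldns
+iptables -t raw -S | grep -E '169\.254\.20\.10|10\.96\.0\.10'
+```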
+
+**Metrics**: A headless Service `node-local-dns` (ClusterIP `None`) exposes each pod on port `9253` for Prometheus scraping (annotated `prometheus.io/scrape=true`).
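+
+To eyeball what a scrape returns, query a node's cache instance directly (metric names come from the CoreDNS `prometheus` and `cache` plugins embedded in the node-cache binary):
+
+```
+curl -s http://169.254.20.10:9253/metrics | grep '^coredns_cache' | head
+```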
+
## CoreDNS Configuration
CoreDNS is managed via Terraform in `stacks/technitium/modules/technitium/` — the Corefile ConfigMap lives in `main.tf`, and scaling/PDB are in `coredns.tf` (a `kubernetes_deployment_v1_patch` against the kubeadm-managed Deployment).
@@ -401,11 +428,13 @@ The zone-sync CronJob (runs every 30min) pushes the following to the Prometheus
### DNS Not Resolving Internal Domains
-1. Check Technitium pods: `kubectl get pod -n technitium`
-2. Check all 3 are healthy: `kubectl get pod -n technitium -l dns-server=true`
-3. Test from a pod: `kubectl exec -it -- nslookup idrac.viktorbarzin.lan 10.96.0.53`
-4. Check CoreDNS logs: `kubectl logs -n kube-system -l k8s-app=kube-dns`
-5. Verify ClusterIP service: `kubectl get svc -n technitium technitium-dns-internal`
+1. Check NodeLocal DNSCache pods first — pod queries go through these: `kubectl -n kube-system get pod -l k8s-app=node-local-dns -o wide`
+2. Check Technitium pods: `kubectl get pod -n technitium`
+3. Check all 3 are healthy: `kubectl get pod -n technitium -l dns-server=true`
+4. Test via NodeLocal DNSCache from a pod: `kubectl exec -it <pod> -- dig @169.254.20.10 idrac.viktorbarzin.lan`
+5. Bypass NodeLocal DNSCache (test CoreDNS directly): find the `kube-dns-upstream` ClusterIP with `kubectl get svc -n kube-system kube-dns-upstream`, then `kubectl exec -it <pod> -- dig @<that-ClusterIP> idrac.viktorbarzin.lan`
+6. Check CoreDNS logs: `kubectl logs -n kube-system -l k8s-app=kube-dns`
+7. Verify ClusterIP service: `kubectl get svc -n technitium technitium-dns-internal`
### LAN Clients Can't Resolve
diff --git a/stacks/nodelocal-dns/main.tf b/stacks/nodelocal-dns/main.tf
new file mode 100644
index 00000000..93626347
--- /dev/null
+++ b/stacks/nodelocal-dns/main.tf
@@ -0,0 +1,16 @@
+module "nodelocal_dns" {
+ source = "./modules/nodelocal-dns"
+
+ # Canonical link-local IP from upstream NodeLocal DNSCache docs.
+ link_local_ip = "169.254.20.10"
+
+ # kube-dns ClusterIP — co-listened so transparent interception works
+ # without mutating kubelet clusterDNS on every node.
+ kube_dns_ip = "10.96.0.10"
+
+ # Technitium ClusterIP — upstream for .viktorbarzin.lan.
+ technitium_ip = "10.96.0.53"
+
+ image = "registry.k8s.io/dns/k8s-dns-node-cache:1.23.1"
+ tier = local.tiers.core
+}
diff --git a/stacks/nodelocal-dns/modules/nodelocal-dns/main.tf b/stacks/nodelocal-dns/modules/nodelocal-dns/main.tf
new file mode 100644
index 00000000..3dce76bc
--- /dev/null
+++ b/stacks/nodelocal-dns/modules/nodelocal-dns/main.tf
@@ -0,0 +1,359 @@
+// NodeLocal DNSCache — per-node DNS cache as a DaemonSet.
+//
+// Why: insulates pods from transient CoreDNS / pfSense issues. Each node
+// runs a CoreDNS-based cache listening on the link-local IP (169.254.20.10)
+// AND on the kube-dns ClusterIP (10.96.0.10) via hostNetwork + NET_ADMIN
+// iptables NOTRACK rules. Pods already use 10.96.0.10 as their resolver
+// (verified in /etc/resolv.conf), so traffic is transparently intercepted
+// on the node and served from the local cache — no kubelet clusterDNS
+// change required.
+//
+// Upstream CoreDNS is reached via a separate ClusterIP service
+// `kube-dns-upstream` that selects the CoreDNS pods directly (a distinct
+// ClusterIP from kube-dns so we can forward without looping back to
+// ourselves).
+//
+// Sources:
+// https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/
+// https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
+
+variable "link_local_ip" {
+ type = string
+ default = "169.254.20.10"
+}
+
+variable "kube_dns_ip" {
+ type = string
+ default = "10.96.0.10"
+}
+
+variable "technitium_ip" {
+ type = string
+ default = "10.96.0.53"
+}
+
+variable "image" {
+ type = string
+ default = "registry.k8s.io/dns/k8s-dns-node-cache:1.23.1"
+}
+
+variable "tier" {
+ type = string
+ default = "0-core"
+}
+
+locals {
+ namespace = "kube-system"
+ app_label = "node-local-dns"
+}
+
+// ---------------------------------------------------------------------------
+// ServiceAccount
+// ---------------------------------------------------------------------------
+
+resource "kubernetes_service_account" "node_local_dns" {
+ metadata {
+ name = "node-local-dns"
+ namespace = local.namespace
+ labels = {
+ "k8s-app" = local.app_label
+ }
+ }
+}
+
+// ---------------------------------------------------------------------------
+// Upstream service — routes cache misses to CoreDNS pods (not the kube-dns
+// ClusterIP, because we're co-listening on that IP ourselves).
+// ---------------------------------------------------------------------------
+
+resource "kubernetes_service" "kube_dns_upstream" {
+ metadata {
+ name = "kube-dns-upstream"
+ namespace = local.namespace
+ labels = {
+ "k8s-app" = "kube-dns"
+ "kubernetes.io/cluster-service" = "true"
+ "kubernetes.io/name" = "KubeDNSUpstream"
+ }
+ }
+ spec {
+ selector = {
+ "k8s-app" = "kube-dns"
+ }
+ port {
+ name = "dns"
+ port = 53
+ protocol = "UDP"
+ target_port = "53"
+ }
+ port {
+ name = "dns-tcp"
+ port = 53
+ protocol = "TCP"
+ target_port = "53"
+ }
+ }
+}
+
+// ---------------------------------------------------------------------------
+// Headless service — Prometheus metrics scrape target (one endpoint per node).
+// ---------------------------------------------------------------------------
+
+resource "kubernetes_service" "node_local_dns" {
+ metadata {
+ name = "node-local-dns"
+ namespace = local.namespace
+ labels = {
+ "k8s-app" = local.app_label
+ "kubernetes.io/cluster-service" = "true"
+ }
+ annotations = {
+ "prometheus.io/port" = "9253"
+ "prometheus.io/scrape" = "true"
+ }
+ }
+ spec {
+ cluster_ip = "None"
+ selector = {
+ "k8s-app" = local.app_label
+ }
+ port {
+ name = "metrics"
+ port = 9253
+ target_port = "9253"
+ }
+ }
+}
+
+// ---------------------------------------------------------------------------
+// Corefile — inline here so changes are reviewable via Terraform plan.
+// The node-cache binary does string replacement for __PILLAR__ tokens at
+// startup; we pre-fill LOCAL/DNS_SERVER with our real IPs and leave
+// __PILLAR__CLUSTER__DNS__ for the runtime substitution from
+// kube-dns-upstream endpoints.
+// ---------------------------------------------------------------------------
+
+resource "kubernetes_config_map" "node_local_dns" {
+ metadata {
+ name = "node-local-dns"
+ namespace = local.namespace
+ labels = {
+ "k8s-app" = local.app_label
+ }
+ }
+ data = {
+ "Corefile" = <<-EOF
+ cluster.local:53 {
+ errors
+ cache {
+ success 9984 30
+ denial 9984 5
+ }
+ reload
+ loop
+ bind ${var.link_local_ip} ${var.kube_dns_ip}
+ forward . __PILLAR__CLUSTER__DNS__ {
+ force_tcp
+ }
+ prometheus :9253
+ health ${var.link_local_ip}:8080
+ }
+ in-addr.arpa:53 {
+ errors
+ cache 30
+ reload
+ loop
+ bind ${var.link_local_ip} ${var.kube_dns_ip}
+ forward . __PILLAR__CLUSTER__DNS__ {
+ force_tcp
+ }
+ prometheus :9253
+ }
+ ip6.arpa:53 {
+ errors
+ cache 30
+ reload
+ loop
+ bind ${var.link_local_ip} ${var.kube_dns_ip}
+ forward . __PILLAR__CLUSTER__DNS__ {
+ force_tcp
+ }
+ prometheus :9253
+ }
+ viktorbarzin.lan:53 {
+ errors
+ cache 30
+ reload
+ loop
+ bind ${var.link_local_ip} ${var.kube_dns_ip}
+ forward . ${var.technitium_ip}
+ prometheus :9253
+ }
+ .:53 {
+ errors
+ cache 30
+ reload
+ loop
+ bind ${var.link_local_ip} ${var.kube_dns_ip}
+ forward . __PILLAR__CLUSTER__DNS__
+ prometheus :9253
+ }
+ EOF
+ }
+}
+
+// ---------------------------------------------------------------------------
+// DaemonSet
+// ---------------------------------------------------------------------------
+
+resource "kubernetes_daemon_set_v1" "node_local_dns" {
+ metadata {
+ name = "node-local-dns"
+ namespace = local.namespace
+ labels = {
+ "k8s-app" = local.app_label
+ tier = var.tier
+ }
+ }
+ spec {
+ selector {
+ match_labels = {
+ "k8s-app" = local.app_label
+ }
+ }
+ strategy {
+ type = "RollingUpdate"
+ rolling_update {
+ max_unavailable = "10%"
+ }
+ }
+ template {
+ metadata {
+ labels = {
+ "k8s-app" = local.app_label
+ }
+ annotations = {
+ # Ensure pods pick up Corefile changes without waiting for a
+ # reload (CoreDNS reload plugin picks up changes within 30s,
+ # but a hash annotation forces an immediate rollout).
+ "node-local-dns/corefile-hash" = sha256(kubernetes_config_map.node_local_dns.data["Corefile"])
+ }
+ }
+ spec {
+ priority_class_name = "system-node-critical"
+ service_account_name = kubernetes_service_account.node_local_dns.metadata[0].name
+ host_network = true
+ dns_policy = "Default"
+ termination_grace_period_seconds = 0
+
+ toleration {
+ operator = "Exists"
+ }
+
+ container {
+ name = "node-cache"
+ image = var.image
+ image_pull_policy = "IfNotPresent"
+
+ resources {
+ # Per the cluster CPU-limits-removed policy: CPU request only, no CPU limit;
+ # a memory limit is kept to bound the cache's footprint.
+ requests = {
+ cpu = "25m"
+ memory = "32Mi"
+ }
+ limits = {
+ memory = "128Mi"
+ }
+ }
+
+ args = [
+ "-localip",
+ "${var.link_local_ip},${var.kube_dns_ip}",
+ "-conf",
+ "/etc/Corefile",
+ "-upstreamsvc",
+ kubernetes_service.kube_dns_upstream.metadata[0].name,
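+ # Leave the NOTRACK rules and dummy interface in place on shutdown so
+ # rolling restarts and image upgrades don't interrupt DNS on the node.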
+ "-skipteardown=true",
+ ]
+
+ security_context {
+ capabilities {
+ add = ["NET_ADMIN"]
+ }
+ }
+
+ port {
+ name = "dns"
+ container_port = 53
+ protocol = "UDP"
+ }
+ port {
+ name = "dns-tcp"
+ container_port = 53
+ protocol = "TCP"
+ }
+ port {
+ name = "metrics"
+ container_port = 9253
+ protocol = "TCP"
+ }
+
+ liveness_probe {
+ http_get {
+ host = var.link_local_ip
+ path = "/health"
+ port = "8080"
+ }
+ initial_delay_seconds = 60
+ timeout_seconds = 5
+ }
+
+ volume_mount {
+ name = "xtables-lock"
+ mount_path = "/run/xtables.lock"
+ read_only = false
+ }
+ volume_mount {
+ name = "config-volume"
+ mount_path = "/etc/coredns"
+ }
+ volume_mount {
+ name = "kube-dns-config"
+ mount_path = "/etc/kube-dns"
+ }
+ }
+
+ volume {
+ name = "xtables-lock"
+ host_path {
+ path = "/run/xtables.lock"
+ type = "FileOrCreate"
+ }
+ }
+ volume {
+ name = "kube-dns-config"
+ config_map {
+ name = "kube-dns"
+ optional = true
+ }
+ }
+ volume {
+ name = "config-volume"
+ config_map {
+ name = kubernetes_config_map.node_local_dns.metadata[0].name
+ items {
+ key = "Corefile"
+ path = "Corefile.base"
+ }
+ }
+ }
+ }
+ }
+ }
+
+ lifecycle {
+ # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with
+ # ndots=2 on every pod; ignoring avoids spurious plan drift.
+ ignore_changes = [spec[0].template[0].spec[0].dns_config]
+ }
+}
diff --git a/stacks/nodelocal-dns/terragrunt.hcl b/stacks/nodelocal-dns/terragrunt.hcl
new file mode 100644
index 00000000..ac8dda28
--- /dev/null
+++ b/stacks/nodelocal-dns/terragrunt.hcl
@@ -0,0 +1,11 @@
+include "root" {
+ path = find_in_parent_folders()
+}
+
+# CoreDNS ConfigMap + kube-dns Service live in the technitium stack.
+# NodeLocal DNSCache co-listens on the kube-dns ClusterIP (10.96.0.10)
+# via hostNetwork + iptables NOTRACK — no kubelet clusterDNS change needed.
+dependency "technitium" {
+ config_path = "../technitium"
+ skip_outputs = true
+}