DNS is served by a split architecture: **Technitium DNS** handles internal resolution (`.viktorbarzin.lan`) and recursive lookups, while **Cloudflare DNS** manages all public domains (`.viktorbarzin.me`). Kubernetes pods use **CoreDNS** which forwards to Technitium for internal zones. All three Technitium instances run on encrypted block storage with zone replication via AXFR every 30 minutes.
| pfSense | `stacks/pfsense/` | VM config (DNS config is via pfSense web UI) |
## DNS Resolution Paths
### K8s Pod → Internal Domain (.viktorbarzin.lan)
```
Pod → CoreDNS (kube-dns:53)
→ template: if 2+ labels before .viktorbarzin.lan → NXDOMAIN (ndots:5 junk filter)
→ forward to Technitium ClusterIP (10.96.0.53)
→ Technitium resolves from viktorbarzin.lan zone
```
The ndots:5 template in CoreDNS short-circuits queries like `www.cloudflare.com.viktorbarzin.lan` (caused by K8s search domain expansion) by returning NXDOMAIN for any query with 2+ labels before `.viktorbarzin.lan`. Only single-label queries (e.g., `idrac.viktorbarzin.lan`) reach Technitium.
### K8s Pod → Public Domain
```
Pod → CoreDNS (kube-dns:53)
→ forward to pfSense (10.0.20.1), fallback 8.8.8.8, 1.1.1.1
→ pfSense dnsmasq → Cloudflare (1.1.1.1)
```
### LAN Client (192.168.1.x) → Any Domain
```
Client gets DNS=192.168.1.2 (pfSense WAN) from DHCP
→ pfSense NAT rdr on WAN interface → Technitium LB (10.0.20.201)
→ Technitium resolves:
- .viktorbarzin.lan → local zone
- .viktorbarzin.me (non-proxied) → recursive, then Split Horizon translates
176.12.22.76 → 10.0.20.200 for 192.168.1.0/24 clients
- other → recursive to Cloudflare DoH (1.1.1.1)
```
Client source IPs are preserved (no SNAT on 192.168.1.x → 10.0.20.x path) — Technitium logs show real per-device IPs.
### Management VLAN (10.0.10.x) → Any Domain
```
Client gets DNS from Kea DHCP → pfSense (10.0.10.1)
→ pfSense dnsmasq:
- .viktorbarzin.lan → forward to Technitium (10.0.20.201)
- other → forward to Cloudflare (1.1.1.1)
```
### K8s VLAN (10.0.20.x) → Any Domain
```
Client gets DNS from Kea DHCP → pfSense (10.0.20.1)
→ pfSense dnsmasq:
- .viktorbarzin.lan → forward to Technitium (10.0.20.201)
- other → forward to Cloudflare (1.1.1.1)
```
## Technitium DNS — Internal DNS Server
### Deployment Topology
Three independent Technitium instances, each with its own encrypted block storage PVC (`proxmox-lvm-encrypted`, 2Gi each):
| Instance | Deployment | PVC | Web Service | Role |
All three pods share the `dns-server=true` label, so the DNS LoadBalancer (10.0.20.201) and ClusterIP (10.96.0.53) route queries to any healthy instance.
### High Availability
- **Pod anti-affinity**: `required` on `kubernetes.io/hostname` — all 3 pods run on different nodes
- **PodDisruptionBudget**: `minAvailable=2` — at least 2 DNS pods survive voluntary disruptions
- **Recreate strategy**: Each deployment uses `Recreate` (RWO block storage)
- **Zone sync CronJob** (`technitium-zone-sync`, every 30min): Replicates all primary zones to secondary/tertiary via AXFR. Idempotent — skips existing zones, creates missing ones as Secondary type.
| Cache min TTL | 60s | Reduces re-queries for short-TTL domains (e.g., headscale: 18s) |
| Cache max TTL | 7 days | Long cache for stable records |
| Serve stale | Enabled (3 days) | Resilience during upstream failures |
### Ad Blocking
Technitium runs built-in DNS blocking with:
- **OISD Big List** (~486K domains)
- **StevenBlack hosts list**
Blocking is enabled on all three instances (`DNS_SERVER_ENABLE_BLOCKING=true` on secondary/tertiary).
### Query Logging
| Backend | Status | Retention | Purpose |
|---------|--------|-----------|---------|
| MySQL (`technitium` DB) | Disabled | — | Legacy, disabled by password-sync CronJob |
| PostgreSQL (`technitium` DB on CNPG) | Enabled | 90 days | Primary query log store |
Grafana dashboard (`grafana-technitium-dashboard` ConfigMap) visualizes query logs from the MySQL datasource. A Grafana datasource is auto-provisioned via sidecar.
### Web UI & Ingress
- **Web UI**: `technitium.viktorbarzin.me` (Authentik-protected via `ingress_factory`)
- **DNS-over-HTTPS**: `dns.viktorbarzin.me` (separate ingress, port 80)
The TP-Link AP (dumb AP on 192.168.1.x) does not support hairpin NAT. LAN clients resolving non-proxied `*.viktorbarzin.me` domains get the public IP `176.12.22.76`, but can't reach it because the TP-Link won't route back to the local network.
### Solution
Technitium's **Split Horizon AddressTranslation** app post-processes DNS responses for 192.168.1.0/24 clients, translating the public IP to the internal Traefik LB IP:
```
176.12.22.76 → 10.0.20.200
```
**DNS Rebinding Protection** has `viktorbarzin.me` in `privateDomains` to allow the translated private IP without being stripped as a rebinding attack.
**Kyverno ndots injection**: A Kyverno policy injects `ndots:2` on all pods cluster-wide to reduce search domain expansion noise. The template regex is a second layer of defense for any queries that still get expanded.
All public domains are under the `viktorbarzin.me` zone. DNS records are **auto-created per service** via the `ingress_factory` module's `dns_type` parameter. A small number of records (Helm-managed ingresses, special cases) remain centrally managed in `config.tfvars`.
### How DNS Records Are Created
```
stacks/<service>/main.tf
module "ingress" {
source = ingress_factory
dns_type = "proxied" # ← auto-creates Cloudflare DNS record
- **Proxied (orange cloud)**: Traffic routes through Cloudflare CDN → Cloudflared tunnel → Traefik. Benefits: DDoS protection, caching, no public IP exposure.
- **Non-proxied (grey cloud)**: DNS resolves directly to public IP. Required for services needing direct connections (mail, VPN, WebSocket-heavy apps).
### Zone Settings
- **HTTP/3 (QUIC)**: Enabled globally via `cloudflare_zone_settings_override`
## DHCP → DNS Auto-Registration
Devices get automatic DNS registration without manual intervention. See [networking.md § IPAM & DNS Auto-Registration](networking.md#ipam--dns-auto-registration) for the full data flow diagram.
Summary:
1.**Kea DHCP** on pfSense assigns IP (53 reservations across 3 subnets)
2.**Kea DDNS** sends RFC 2136 dynamic update to Technitium (A + PTR records) — immediate
- **2026-04-14 (SEV1)**: NFS `fsid=0` caused Technitium primary data loss on restart. Fixed by migrating all 3 instances to `proxmox-lvm-encrypted`, adding zone-sync CronJob (30min AXFR). See [post-mortem](../post-mortems/2026-04-14-nfs-fsid0-dns-vault-outage.md).