infra/docs
Viktor Barzin 0f6321ce86 [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C)
Adds a per-node DNS cache that transparently intercepts pod queries to
10.96.0.10 (kube-dns ClusterIP) AND 169.254.20.10 (link-local) using
hostNetwork plus NET_ADMIN-installed iptables NOTRACK rules. Pods keep
their existing /etc/resolv.conf (nameserver 10.96.0.10) unchanged, so
transparent mode needs no kubelet rollout.
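The interception depends on raw-table NOTRACK rules. The sketch below shows the general shape of such rules. It is an assumption and may not match the exact set that node-cache installs, and the `notrack_rules` helper is hypothetical. It renders one rule per listen IP, chain, and protocol as strings and never touches the host:

```python
"""Render raw-table NOTRACK rules for transparent DNS interception (illustrative only)."""
from typing import List, Sequence

# Listen IPs from the commit: link-local address + kube-dns ClusterIP
LISTEN_IPS = ["169.254.20.10", "10.96.0.10"]
DNS_PORT = 53


def notrack_rules(listen_ips: Sequence[str], port: int = DNS_PORT) -> List[str]:
    """Return iptables argument strings that exempt DNS traffic to listen_ips from conntrack."""
    rules = []
    for ip in listen_ips:
        # PREROUTING covers traffic arriving from pods; OUTPUT covers traffic from the node itself
        for chain in ("PREROUTING", "OUTPUT"):
            for proto in ("udp", "tcp"):
                # Untracked packets skip the nat table, so kube-proxy's DNAT for
                # 10.96.0.10 never applies and the node-local listener answers instead.
                rules.append(
                    f"-t raw -A {chain} -d {ip}/32 -p {proto} --dport {port} -j NOTRACK"
                )
    return rules


if __name__ == "__main__":
    for rule in notrack_rules(LISTEN_IPS):
        print("iptables", rule)
```

Because conntrack never sees these packets, no kube-proxy NAT applies to them. This is why pods can keep 10.96.0.10 in resolv.conf.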

Layout mirrors existing stacks (technitium, descheduler, kured):
  stacks/nodelocal-dns/
    main.tf                                 # module wiring + IP params
    modules/nodelocal-dns/main.tf           # SA, Services, ConfigMap, DS

Key decisions:
  - Image: registry.k8s.io/dns/k8s-dns-node-cache:1.23.1
  - Co-listens on 169.254.20.10 + 10.96.0.10 (transparent interception)
  - Upstream path: kube-dns-upstream (new headless svc) → CoreDNS pods
    (separate ClusterIP avoids cache looping back through itself)
  - viktorbarzin.lan zone forwards directly to Technitium ClusterIP
    (10.96.0.53), bypassing CoreDNS for internal names
  - priorityClassName: system-node-critical
  - tolerations: operator=Exists (runs on master + all tainted nodes)
  - No CPU limit (cluster-wide policy); memory request=32Mi, limit=128Mi
  - Kyverno dns_config drift suppressed on the DaemonSet
  - Kubelet clusterDNS NOT changed: transparent mode is sufficient, and
    rolling all 5 nodes just to switch to 169.254.20.10 would add no
    benefit while expanding the blast radius.
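The zone routing in the decisions above could translate into a Corefile along these lines. This is a hypothetical sketch. The zone list, `cache 30`, the plugin selection, routing `.` through CoreDNS, and the use of node-cache's `__PILLAR__CLUSTER__DNS__` placeholder for the kube-dns-upstream address are all assumptions. Only the bind IPs and the Technitium target (10.96.0.53) come from the commit:

```python
"""Render a NodeLocal DNSCache Corefile from a zone -> forward-target map (hypothetical)."""
from typing import Dict, Sequence

BIND_IPS = ("169.254.20.10", "10.96.0.10")  # from the commit: link-local + kube-dns ClusterIP
UPSTREAM = "__PILLAR__CLUSTER__DNS__"  # assumption: node-cache swaps in kube-dns-upstream's IP
TECHNITIUM = "10.96.0.53"  # Technitium ClusterIP, from the commit

# zone -> forward target; dicts keep insertion order, so blocks render in this order
ZONES: Dict[str, str] = {
    "cluster.local": UPSTREAM,
    "viktorbarzin.lan": TECHNITIUM,  # internal names bypass CoreDNS
    ".": UPSTREAM,  # assumption: everything else also goes via CoreDNS
}


def server_block(zone: str, target: str, bind_ips: Sequence[str] = BIND_IPS, ttl: int = 30) -> str:
    """Render one CoreDNS server block that listens on bind_ips and forwards `zone` to target."""
    lines = [
        f"{zone}:53 {{",
        "    errors",
        f"    cache {ttl}",
        "    reload",
        "    loop",
        "    bind " + " ".join(bind_ips),
        f"    forward . {target}",
        "}",
    ]
    return "\n".join(lines)


def render_corefile(zones: Dict[str, str]) -> str:
    """Join one server block per zone, separated by blank lines."""
    return "\n\n".join(server_block(zone, target) for zone, target in zones.items())


if __name__ == "__main__":
    print(render_corefile(ZONES))
```

Because viktorbarzin.lan has its own server block, internal names can skip CoreDNS entirely. Every other query shares the upstream path.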

Verified:
  - DaemonSet 5/5 Ready across k8s-master + 4 workers
  - dig @169.254.20.10 idrac.viktorbarzin.lan -> 192.168.1.4
  - dig @169.254.20.10 github.com -> 140.82.121.3
  - Deleted all 3 CoreDNS pods; cached queries still resolved via
    NodeLocal DNSCache (resilience confirmed)
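A small script could repeat the dig checks above after each rollout. This sketch assumes `dig` is on PATH wherever it runs, and `dig_short` and `check` are hypothetical helpers. The check only requires github.com to return some answer, because its public addresses rotate:

```python
"""Repeat the commit's dig verification against NodeLocal DNSCache (sketch)."""
import subprocess
from typing import Callable, Dict, List, Optional

NODE_LOCAL = "169.254.20.10"  # NodeLocal DNSCache link-local listener

# name -> IP that must appear in the answer; None means "any non-empty answer"
EXPECTED: Dict[str, Optional[str]] = {
    "idrac.viktorbarzin.lan": "192.168.1.4",
    "github.com": None,  # public IPs rotate, so only require a result
}


def dig_short(server: str, name: str) -> List[str]:
    """Query `name` against `server` with `dig +short` and return non-empty answer lines."""
    result = subprocess.run(
        ["dig", "+short", f"@{server}", name],
        capture_output=True,
        text=True,
        timeout=10,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def check(
    resolve: Callable[[str, str], List[str]],
    server: str,
    expected: Dict[str, Optional[str]],
) -> Dict[str, List[str]]:
    """Return {name: answers} for each name whose lookup did not meet expectations."""
    failures: Dict[str, List[str]] = {}
    for name, want in expected.items():
        answers = resolve(server, name)
        ok = bool(answers) if want is None else want in answers
        if not ok:
            failures[name] = answers
    return failures
```

On a node, a return value of `{}` from `check(dig_short, NODE_LOCAL, EXPECTED)` means both lookups passed. Passing a fake resolver instead exercises the comparison logic without network access.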

Docs: architecture/dns.md — adds NodeLocal DNSCache to Components table,
graph diagram, stacks table; rewrites pod DNS resolution paths to show
the cache layer; adds troubleshooting entry.

Closes: code-2k6
2026-04-19 15:46:41 +00:00
architecture [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C) 2026-04-19 15:46:41 +00:00
plans [docs] Update anti-AI and rybbit docs after rewrite-body removal 2026-04-17 21:43:13 +00:00
post-mortems [docs] post-mortem: clarify the sizeLimit vs container memory limit gotcha 2026-04-18 13:23:14 +00:00
runbooks [dns] static-client DNS — Proxmox host, registry VM dual-resolver setup (WS F) 2026-04-19 15:43:49 +00:00
README.md add architecture documentation for all infrastructure subsystems [ci skip] 2026-03-24 00:55:25 +02:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network
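The following stdlib-only sketch uses Python's `ipaddress` module to report which of these ranges an address falls in. The `classify` helper is hypothetical. Addresses outside all three ranges return None, for example the 10.96.0.10 kube-dns ClusterIP:

```python
"""Classify an IP address into the Quick Reference network ranges."""
import ipaddress
from typing import Optional

# Labels and CIDRs copied from the Network Ranges list
RANGES = {
    "Physical Network": ipaddress.ip_network("192.168.1.0/24"),
    "Management VLAN 10": ipaddress.ip_network("10.0.10.0/24"),
    "Kubernetes VLAN 20": ipaddress.ip_network("10.0.20.0/24"),
}


def classify(addr: str) -> Optional[str]:
    """Return the label of the range containing addr, or None if no range matches."""
    ip = ipaddress.ip_address(addr)
    for label, net in RANGES.items():
        if ip in net:
            return label
    return None


if __name__ == "__main__":
    for a in ("192.168.1.4", "10.0.10.1", "10.0.20.15", "10.96.0.10"):
        print(a, "->", classify(a))
```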

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document            Description
Overview            Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking          Network topology, VLANs, routing, and firewall rules
VPN                 Headscale mesh VPN and Cloudflare Tunnel configuration
Storage             TrueNAS NFS, democratic-csi, and persistent volume management
Authentication      Authentik SSO, OIDC flows, and service integration
Security            CrowdSec IPS, Kyverno policies, and security controls
Monitoring          Prometheus, Grafana, Loki, and observability stack
Secrets Management  HashiCorp Vault integration and secret rotation
CI/CD               Woodpecker CI pipeline and deployment automation
Backup & DR         Backup strategy, disaster recovery, and restore procedures
Compute             Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases           PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy       Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest