infra/docs
Viktor Barzin af6574a006 [dns] Fix CoreDNS serve_stale syntax — 24h TTL, no refresh-mode arg
CoreDNS refused to load the new Corefile with `serve_stale 3600s 86400s`:

  plugin/cache: invalid value for serve_stale refresh mode: 86400s

serve_stale takes one DURATION and an optional refresh_mode keyword
("immediate" or "verify"), not two durations. Simplified to
`serve_stale 86400s` (serve cached entries for up to 24h when upstream
is unreachable). The new CoreDNS pods were CrashLoopBackOff; the two
old pods kept serving traffic so there was no outage, but the partial
apply left the cluster wedged with the bad ConfigMap.

Also collapses the inline viktorbarzin.lan cache block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 15:18:43 +00:00
..
architecture [dns] Fix CoreDNS serve_stale syntax — 24h TTL, no refresh-mode arg 2026-04-19 15:18:43 +00:00
plans [docs] Update anti-AI and rybbit docs after rewrite-body removal 2026-04-17 21:43:13 +00:00
post-mortems [docs] post-mortem: clarify the sizeLimit vs container memory limit gotcha 2026-04-18 13:23:14 +00:00
runbooks [dns] DNS reliability & hardening — Technitium + CoreDNS + alerts + readiness gate 2026-04-19 14:53:41 +00:00
README.md add architecture documentation for all infrastructure subsystems [ci skip] 2026-03-24 00:55:25 +02:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document Description
Overview Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking Network topology, VLANs, routing, and firewall rules
VPN Headscale mesh VPN and Cloudflare Tunnel configuration
Storage TrueNAS NFS, democratic-csi, and persistent volume management
Authentication Authentik SSO, OIDC flows, and service integration
Security CrowdSec IPS, Kyverno policies, and security controls
Monitoring Prometheus, Grafana, Loki, and observability stack
Secrets Management HashiCorp Vault integration and secret rotation
CI/CD Woodpecker CI pipeline and deployment automation
Backup & DR Backup strategy, disaster recovery, and restore procedures
Compute Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest