infra/docs
Viktor Barzin 50d0f1affa kyverno: strip orphaned keel.sh/match-tag fleet-wide (image-swap fix)
The 2026-05-26 migration flipped the keel default force->patch and dropped
match-tag from the inject-keel-annotations patch, but Kyverno's add-only
mutate can't remove an annotation that's no longer listed -- 194 workloads
kept a stale keel.sh/match-tag=true. Under it Keel cross-assigned images in
multi-image pods: the blog's nginx<->nginx-exporter images were swapped and
the site was down 2026-05-26 -> 06-01 (nginx received the exporter's
-nginx.scrape-uri arg and CrashLoopBackOff'd); changedetection was silently
swapped (app lost its /datastore PVC + env, ran ephemeral for days).

- policy now sets keel.sh/match-tag=null (strips on admission, never re-added)
- swept the annotation off all 194 existing workloads (kubectl, no pod restart)
- AGENTS.md: documents the strip; post-mortem added

blog + changedetection un-swapped via kubectl set image (TF-ignored images);
both 2/2 and serving 200. Policy already applied via scripts/tg (Tier-1 PG
state authoritative). [ci skip]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 19:50:41 +00:00
..
architecture backup: stop offsite-copying regenerable data; shrink nextcloud backup; pin nextcloud image 2026-06-01 15:15:26 +00:00
benchmarks infra/llama-cpp: benchmark report + -fa flag fix 2026-05-10 15:03:16 +00:00
plans traefik+pfsense: real IPv6 client IPs via HAProxy PROXY-v2 bridge 2026-05-30 09:51:23 +00:00
post-mortems kyverno: strip orphaned keel.sh/match-tag fleet-wide (image-swap fix) 2026-06-01 19:50:41 +00:00
runbooks docs(kms): document the consequence-gated edition switch (changepk + ODT) 2026-06-01 15:15:27 +00:00
known-issues.md docs: known-issues entry for the Ubuntu 26.04 / NVIDIA driver gap 2026-05-17 11:15:26 +00:00
README.md [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document Description
Overview Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking Network topology, VLANs, routing, and firewall rules
VPN Headscale mesh VPN and Cloudflare Tunnel configuration
Storage Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management
Authentication Authentik SSO, OIDC flows, and service integration
Security CrowdSec IPS, Kyverno policies, and security controls
Monitoring Prometheus, Grafana, Loki, and observability stack
Secrets Management HashiCorp Vault integration and secret rotation
CI/CD Woodpecker CI pipeline and deployment automation
Backup & DR Backup strategy, disaster recovery, and restore procedures
Compute Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest