infra/docs
Viktor Barzin 077ac97df5
Some checks failed
ci/woodpecker/push/default Pipeline failed
k8s-version-upgrade: auto-restore apiserver OIDC after control-plane bumps
kubeadm upgrade apply regenerates the apiserver static-pod manifest and drops
the --authentication-config flag, silently breaking SSO (kubectl/kubelogin + the
k8s dashboard) until someone manually re-applied the rbac stack. That manual step
ran after every control-plane upgrade — the one thing keeping autonomous patch
upgrades from being truly hands-off (it bit us this cycle: an earlier master bump
left SSO broken until we noticed).

Automate it: the rbac stack now publishes its existing OIDC restore script (the
same one its null_resource runs) to a kube-system/apiserver-oidc-restore
ConfigMap, and the upgrade chain's phase_master re-runs it on master right after
the kubeadm upgrade — while tigera-operator is still quiesced so the flag-add
apiserver restart can't crashloop it. The script is idempotent and health-gates
/livez with auto-rollback; the step is non-fatal (a failure only lags SSO until
the next rbac apply, it won't abort the upgrade). phase_master already self-skips
when master is at target, so this only fires when master was actually upgraded.

The chain SA gets a name-scoped get on that one ConfigMap. Runbook updated: the
manual restore is now a documented fallback (command corrected — it needs
-replace, since the null_resource trigger hash never changes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-19 06:04:30 +00:00
..
adr homelab: add memory verb-group (v0.3.0) — direct claude-memory HTTP client 2026-06-19 05:56:25 +00:00
architecture k8s-version-upgrade: move detection to nightly 23:00 UTC (overnight upgrades) 2026-06-17 18:16:32 +00:00
benchmarks fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
plans Merge remote-tracking branch 'origin/master' into wizard/reconcile-mirror 2026-06-16 22:32:43 +00:00
post-mortems Merge remote-tracking branch 'origin/master' into wizard/reconcile-mirror 2026-06-16 22:32:43 +00:00
runbooks k8s-version-upgrade: auto-restore apiserver OIDC after control-plane bumps 2026-06-19 06:04:30 +00:00
known-issues.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
README.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document Description
Overview Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking Network topology, VLANs, routing, and firewall rules
VPN Headscale mesh VPN and Cloudflare Tunnel configuration
Storage Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management
Authentication Authentik SSO, OIDC flows, and service integration
Security CrowdSec IPS, Kyverno policies, and security controls
Monitoring Prometheus, Grafana, Loki, and observability stack
Secrets Management HashiCorp Vault integration and secret rotation
CI/CD Woodpecker CI pipeline and deployment automation
Backup & DR Backup strategy, disaster recovery, and restore procedures
Compute Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest