viktor/infra

Viktor Barzin e55c549c9a [redis] Phase 7 step 2: remove Bitnami helm_release + orphan PVCs	2026-04-19 16:32:14 +00:00

Bringing the 2026-04-19 rework to its end-state. Cutover soaked for ~1h with 0 alerts firing and 127 ops/sec on the v2 master; skipped the nominal 24h rollback window per user direction.

- Removed `helm_release.redis` (Bitnami chart v25.3.2) from TF. Helm destroy cleaned up the StatefulSet redis-node (already scaled to 0), ConfigMaps, ServiceAccount, RBAC, and the deprecated `redis` + `redis-headless` ClusterIP services that the chart owned.
- Removed `null_resource.patch_redis_service`, the kubectl-patch hack that worked around the Bitnami chart's broken service selector. No Helm chart, no patch needed.
- Removed the dead `depends_on = [helm_release.redis]` from the HAProxy deployment.
- `kubectl delete pvc -n redis redis-data-redis-node-{0,1}` for the two orphan PVCs the StatefulSet template left behind (K8s doesn't cascade-delete).
- Simplified the top-of-file comment and the redis-v2 architecture comment, which talked about the parallel-cluster migration state that no longer exists. Folded in the sentinel hostname gotcha, the redis 8.x image requirement, and the BGSAVE+AOF-rewrite memory reasoning so the rationale survives in the code rather than only in beads.
- `RedisDown` alert no longer matches `redis-node\|redis-v2`, just `redis-v2`, since that's the only StatefulSet now. Kept the `or on() vector(0)` so the alert fires when kube_state_metrics has no sample (e.g. after accidental delete).
- `docs/architecture/databases.md` trimmed: no more "pending TF removal" or "cold rollback for 24h" language.

Verification after apply:

- kubectl get all -n redis: redis-v2-{0,1,2} (3/3 Running) + redis-haproxy-* (3 pods, PDB minAvailable=2). Services: redis-master + redis-v2-headless only.
- PVCs: data-redis-v2-{0,1,2} only (redis-data-redis-node-* deleted).
- Sentinel: all 3 agree mymaster = redis-v2-0 hostname.
- HAProxy: PING PONG, DBSIZE 92, 127 ops/sec on master.
- Prometheus: 0 firing redis alerts.

Closes: code-v2b
Closes: code-2mw
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
architecture	[redis] Phase 7 step 2: remove Bitnami helm_release + orphan PVCs	2026-04-19 16:32:14 +00:00
plans	[docs] Update anti-AI and rybbit docs after rewrite-body removal	2026-04-17 21:43:13 +00:00
post-mortems	[docs] post-mortem: clarify the sizeLimit vs container memory limit gotcha	2026-04-18 13:23:14 +00:00
runbooks	[dns] Kea: multi-IP DHCP option 6 (10.0.10, 10.0.20) + TSIG-signed DDNS (WS E)	2026-04-19 16:12:23 +00:00
README.md	add architecture documentation for all infrastructure subsystems [ci skip]	2026-03-24 00:55:25 +02:00

README.md

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

Physical Network: 192.168.1.0/24 - Physical devices and host network
Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network
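
As a quick way to check which of the ranges above an address belongs to, the following is a minimal sketch using Python's standard `ipaddress` module. The CIDRs are copied from the list; the `NETWORK_RANGES` dictionary, the `classify` helper, and the sample addresses are hypothetical names and values introduced for illustration only and are not part of this repository.

```python
import ipaddress
from typing import Optional

# CIDRs copied from the "Network Ranges" list; the labels are just dictionary
# keys chosen for this sketch.
NETWORK_RANGES = {
    "Physical Network": ipaddress.ip_network("192.168.1.0/24"),
    "Management VLAN 10": ipaddress.ip_network("10.0.10.0/24"),
    "Kubernetes VLAN 20": ipaddress.ip_network("10.0.20.0/24"),
}


def classify(address: str) -> Optional[str]:
    """Return the label of the range containing `address`, or None if no range matches."""
    ip = ipaddress.ip_address(address)
    for label, network in NETWORK_RANGES.items():
        if ip in network:
            return label
    return None


if __name__ == "__main__":
    # Sample addresses are arbitrary and only demonstrate the lookup.
    for sample in ("10.0.20.15", "10.0.10.1", "8.8.8.8"):
        print(sample, "->", classify(sample))
```

An address outside all three ranges yields `None`, which makes it easy to spot hosts that sit on an unexpected network.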

Key URLs

Public: viktorbarzin.me
Internal: viktorbarzin.lan

Architecture Documentation

Document	Description
Overview	Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking	Network topology, VLANs, routing, and firewall rules
VPN	Headscale mesh VPN and Cloudflare Tunnel configuration
Storage	TrueNAS NFS, democratic-csi, and persistent volume management
Authentication	Authentik SSO, OIDC flows, and service integration
Security	CrowdSec IPS, Kyverno policies, and security controls
Monitoring	Prometheus, Grafana, Loki, and observability stack
Secrets Management	HashiCorp Vault integration and secret rotation
CI/CD	Woodpecker CI pipeline and deployment automation
Backup & DR	Backup strategy, disaster recovery, and restore procedures
Compute	Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases	PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy	Namespace isolation, tier system, and resource quotas

Operations

Runbooks - Step-by-step operational procedures
Plans - Infrastructure change plans and rollout strategies

Getting Started

Review the Overview for a high-level understanding
Read the Networking doc to understand connectivity
Check Compute for resource management patterns
Explore individual architecture docs based on your area of interest