Bringing the 2026-04-19 rework to its end-state. Cutover soaked for ~1h
with 0 alerts firing and 127 ops/sec on the v2 master — skipped the
nominal 24h rollback window per user direction.
- Removed `helm_release.redis` (Bitnami chart v25.3.2) from TF. Helm
destroy cleaned up the StatefulSet redis-node (already scaled to 0),
ConfigMaps, ServiceAccount, RBAC, and the deprecated `redis` + `redis-headless`
ClusterIP services that the chart owned.
- Removed `null_resource.patch_redis_service` — the kubectl-patch hack
that worked around the Bitnami chart's broken service selector. No
Helm chart, no patch needed.
- Removed the dead `depends_on = [helm_release.redis]` from the HAProxy
deployment.
- `kubectl delete pvc -n redis redis-data-redis-node-{0,1}` for the two
orphan PVCs the StatefulSet template left behind (K8s doesn't cascade-delete).
- Simplified the top-of-file comment and the redis-v2 architecture
comment — they talked about the parallel-cluster migration state that
no longer exists. Folded in the sentinel hostname gotcha, the redis
8.x image requirement, and the BGSAVE+AOF-rewrite memory reasoning
so the rationale survives in the code rather than only in beads.
- `RedisDown` alert no longer matches `redis-node|redis-v2` — just
`redis-v2` since that's the only StatefulSet now. Kept the `or on()
vector(0)` so the alert fires when kube_state_metrics has no sample
(e.g. after accidental delete).
- `docs/architecture/databases.md` trimmed: no more "pending TF removal"
or "cold rollback for 24h" language.
Verification after apply:
- kubectl get all -n redis: redis-v2-{0,1,2} (3/3 Running) + redis-haproxy-*
(3 pods, PDB minAvailable=2). Services: redis-master + redis-v2-headless only.
- PVCs: data-redis-v2-{0,1,2} only (redis-data-redis-node-* deleted).
- Sentinel: all 3 agree mymaster = redis-v2-0 hostname.
- HAProxy: PING PONG, DBSIZE 92, 127 ops/sec on master.
- Prometheus: 0 firing redis alerts.
Closes: code-v2b
Closes: code-2mw
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| architecture | ||
| plans | ||
| post-mortems | ||
| runbooks | ||
| README.md | ||
Infrastructure Documentation
This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.
Quick Reference
Network Ranges
- Physical Network:
192.168.1.0/24- Physical devices and host network - Management VLAN 10:
10.0.10.0/24- Infrastructure VMs and management - Kubernetes VLAN 20:
10.0.20.0/24- Kubernetes cluster network
Key URLs
- Public:
viktorbarzin.me - Internal:
viktorbarzin.lan
Architecture Documentation
| Document | Description |
|---|---|
| Overview | Infrastructure overview, hardware specs, VM inventory, and service catalog |
| Networking | Network topology, VLANs, routing, and firewall rules |
| VPN | Headscale mesh VPN and Cloudflare Tunnel configuration |
| Storage | TrueNAS NFS, democratic-csi, and persistent volume management |
| Authentication | Authentik SSO, OIDC flows, and service integration |
| Security | CrowdSec IPS, Kyverno policies, and security controls |
| Monitoring | Prometheus, Grafana, Loki, and observability stack |
| Secrets Management | HashiCorp Vault integration and secret rotation |
| CI/CD | Woodpecker CI pipeline and deployment automation |
| Backup & DR | Backup strategy, disaster recovery, and restore procedures |
| Compute | Proxmox VMs, GPU passthrough, K8s resource management, and VPA |
| Databases | PostgreSQL, MySQL, Redis, and database operators |
| Multi-tenancy | Namespace isolation, tier system, and resource quotas |
Operations
- Runbooks - Step-by-step operational procedures
- Plans - Infrastructure change plans and rollout strategies
Getting Started
- Review the Overview for a high-level understanding
- Read the Networking doc to understand connectivity
- Check Compute for resource management patterns
- Explore individual architecture docs based on your area of interest