Stand up the infra for Viktor's break-glass: when the devvm is wedged (cluster healthy), open breakglass.viktorbarzin.me, have Claude SSH in to diagnose/fix, and power-cycle VM 102 via the Proxmox host if needed. App half landed in the claude-agent-service repo. New stack stacks/claude-breakglass/ — own namespace + SA, NO Vault role (ESO syncs only its key, so the pod has zero direct Vault access). Hardened to survive the pressure it exists to fix: priorityClassName tier-0-core, broad node-pressure tolerations, anti-affinity off node1, imagePullPolicy Always. auth="required" ingress so it rides the Authentik resilience proxy and stays reachable via the basic-auth fallback during an auth-stack outage. Runs the shared claude-agent-service image with the breakglass entrypoint. files/breakglass-pve is the PVE forced-command (status|forensics|reset|stop| start|cycle on VM 102, forensics-first). Isolation: the shared claude-agent pod's terraform-state Vault policy is explicitly DENIED secret/claude-breakglass/* (stacks/vault/main.tf) so a prompt-injected agent on that pod can't read the root-on-devvm key. traefik: add a checksum/auth-proxy-htpasswd annotation so the auth-proxy rolls when the emergency basic-auth password rotates (it's a subPath mount that doesn't auto-update) — regenerated this session so Viktor has a known emergency credential, which the auth-stack-outage failure domain requires. Docs: docs/runbooks/breakglass-ui.md (full incident + bootstrap procedure, incl. the per-host from= NAT quirks) and a security.md note recording the two new privileged footholds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| adr | ||
| architecture | ||
| benchmarks | ||
| plans | ||
| post-mortems | ||
| runbooks | ||
| known-issues.md | ||
| README.md | ||
Infrastructure Documentation
This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.
Quick Reference
Network Ranges
- Physical Network:
192.168.1.0/24- Physical devices and host network - Management VLAN 10:
10.0.10.0/24- Infrastructure VMs and management - Kubernetes VLAN 20:
10.0.20.0/24- Kubernetes cluster network
Key URLs
- Public:
viktorbarzin.me - Internal:
viktorbarzin.lan
Architecture Documentation
| Document | Description |
|---|---|
| Overview | Infrastructure overview, hardware specs, VM inventory, and service catalog |
| Networking | Network topology, VLANs, routing, and firewall rules |
| VPN | Headscale mesh VPN and Cloudflare Tunnel configuration |
| Storage | Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management |
| Authentication | Authentik SSO, OIDC flows, and service integration |
| Security | CrowdSec IPS, Kyverno policies, and security controls |
| Monitoring | Prometheus, Grafana, Loki, and observability stack |
| Secrets Management | HashiCorp Vault integration and secret rotation |
| CI/CD | Woodpecker CI pipeline and deployment automation |
| Backup & DR | Backup strategy, disaster recovery, and restore procedures |
| Compute | Proxmox VMs, GPU passthrough, K8s resource management, and VPA |
| Databases | PostgreSQL, MySQL, Redis, and database operators |
| Multi-tenancy | Namespace isolation, tier system, and resource quotas |
Operations
- Runbooks - Step-by-step operational procedures
- Plans - Infrastructure change plans and rollout strategies
Getting Started
- Review the Overview for a high-level understanding
- Read the Networking doc to understand connectivity
- Check Compute for resource management patterns
- Explore individual architecture docs based on your area of interest