infra/docs
Viktor Barzin 5ea238c707 t3: pin t3@0.0.24 + stop nightly auto-update (auth-outage fix) [ci skip]
The t3-autoupdate timer (re-enabled by the provisioner's step 5b with
`--now`, which fires the missed daily job immediately on a Persistent
timer) pulled t3@nightly 0.0.25 mid-day. That build ran forward schema
migrations on every ~/.t3 state.sqlite (auth_pairing_links/auth_sessions
role->scopes, +proof_key_thumbprint) AND changed the bootstrap API,
breaking t3-mint/pairing for ALL devvm users (pair prompt, no session).

- t3-autoupdate.sh: now a pinned-version ENFORCER (T3_PIN=0.0.24), not a
  nightly tracker -- re-asserts the pin (a no-op when correct).
- t3-provision-users.sh step 5b: drop `--now` (it triggered the
  immediate missed-job run that pulled the bad build).
- setup-devvm.sh: install pinned t3@0.0.24 at machine setup.
- unit Descriptions + service-catalog reflect the pin.
- post-mortem: 2026-06-09-t3-nightly-autoupdate-auth-outage.md.

Host already reconciled out-of-band: rolled back to 0.0.24, re-enabled
the (now-pinned) enforcer, reset the 2 new users' disposable DBs,
surgically reverted wizard's auth tables to level-30 (96 threads + live
session preserved). All users verified 302 + t3_session.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:21:39 +00:00
..
architecture fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
benchmarks fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
plans t3-dispatch: re-pair on present-but-invalid t3_session cookie 2026-06-09 21:21:39 +00:00
post-mortems t3: pin t3@0.0.24 + stop nightly auto-update (auth-outage fix) [ci skip] 2026-06-09 21:21:39 +00:00
runbooks fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
known-issues.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
README.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document Description
Overview Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking Network topology, VLANs, routing, and firewall rules
VPN Headscale mesh VPN and Cloudflare Tunnel configuration
Storage Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management
Authentication Authentik SSO, OIDC flows, and service integration
Security CrowdSec IPS, Kyverno policies, and security controls
Monitoring Prometheus, Grafana, Loki, and observability stack
Secrets Management HashiCorp Vault integration and secret rotation
CI/CD Woodpecker CI pipeline and deployment automation
Backup & DR Backup strategy, disaster recovery, and restore procedures
Compute Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest