viktor/infra

Viktor Barzin c9289192c7 security(wave1): Vault audit-tail sidecar (live) + doc reality-check

## Vault audit-tail sidecar (APPLIED + VERIFIED)
- Added `audit-tail` extraContainer to vault helm chart values: busybox:1.37 with `tail -F /vault/audit/vault-audit.log`. Reads the audit PVC (`audit` volume from the chart's auditStorage), emits JSON audit events to stdout. kubelet captures the stdout; once Loki+Alloy are deployed (blocked on code-146x), these logs flow automatically to Loki with `container="audit-tail"`.
- Resources: 5m CPU / 16Mi mem request, 32Mi limit. PVC mount is readOnly.
- Applied via `tg apply -target=helm_release.vault`. All 3 vault pods rolled cleanly (OnDelete strategy, manual one-at-a-time, auto-unseal each ~10s).
- Verified: `kubectl logs -n vault vault-2 -c audit-tail` shows live JSON audit lines from ESO token issuance, KV reads, etc.

## Doc reality-check
While verifying logs reached Loki, discovered Loki is NOT actually deployed. `stacks/monitoring/modules/monitoring/loki.tf` defines `helm_release.loki` but has a self-referencing `depends_on = [helm_release.loki]` that prevented apply. No `loki` Helm release in the cluster, no Loki pods, no Loki Service. The monitoring.md "Loki: deployed" claim was aspirational.
- security.md W1.2 row: PENDING → PARTIAL (sidecar live, shipping blocked on code-146x)
- security.md W1.3 row: gated on code-146x added
- monitoring.md Loki row: marked NOT DEPLOYED with cross-ref to code-146x

## New beads task
- code-146x P1: Loki + log shipper missing. Lists the helm_release self-depends_on bug, investigation paths, and revised wave 1 sequencing (Loki/Alloy is prereq 0).

## Wave 1 status update
- W1.2: Vault audit device + XFF + audit-tail sidecar all LIVE; Loki shipping blocked on code-146x
- W1.1, W1.3, W1.6, W1.7: still not started (W1.6 also blocked on code-3ad Calico Installation CR)
- W1.4, W1.5: code committed, blocked on code-e2dp (Kyverno provider crash)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:57 +00:00
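Until a log shipper exists, the sidecar's stdout can still be inspected by hand. The following is a minimal sketch, not part of the repository, of summarizing the JSON lines that `kubectl logs -n vault vault-2 -c audit-tail` prints. The field names (`type`, `request.operation`, `request.path`) are assumed from Vault's JSON audit log format, and the sample events and function name are invented for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_audit_lines(lines: Iterable[str]) -> Counter:
    """Count (operation, path) pairs for audit *request* events.

    Lines that are not valid JSON objects (for example notices printed by
    `tail` itself) are skipped rather than treated as errors.
    """
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # non-JSON noise on stdout
        if not isinstance(event, dict) or event.get("type") != "request":
            continue  # ignore response events and anything unexpected
        request = event.get("request") or {}
        key = (request.get("operation", "?"), request.get("path", "?"))
        counts[key] += 1
    return counts


# Hypothetical sample input; real audit events carry many more fields.
SAMPLE_LINES = [
    '{"type": "request", "request": {"operation": "read", "path": "secret/data/app"}}',
    '{"type": "response", "request": {"operation": "read", "path": "secret/data/app"}}',
    '{"type": "request", "request": {"operation": "update", "path": "auth/kubernetes/login"}}',
    "tail: /vault/audit/vault-audit.log has been replaced; following new file",
    '{"type": "request", "request": {"operation": "read", "path": "secret/data/app"}}',
]

if __name__ == "__main__":
    for (operation, path), n in summarize_audit_lines(SAMPLE_LINES).most_common():
        print(f"{n:4d}  {operation:8s} {path}")
```

In practice the same function could be fed the lines of a saved `kubectl logs` dump instead of the sample list.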
architecture	security(wave1): Vault audit-tail sidecar (live) + doc reality-check	2026-05-22 14:16:57 +00:00
benchmarks	infra/llama-cpp: benchmark report + -fa flag fix	2026-05-22 14:16:41 +00:00
plans	docs/plans: add agent presence implementation plan (2026-05-17)	2026-05-22 14:16:56 +00:00
post-mortems	nvidia: pin chart to v25.10.1 after v26.3.1 upgrade revealed missing ubuntu26.04 driver images	2026-05-22 14:16:56 +00:00
runbooks	docs(security): wave 1 plan — Kyverno enforce, NetworkPolicy egress, audit logging, source-IP anomaly	2026-05-22 14:16:57 +00:00
known-issues.md	docs: known-issues entry for the Ubuntu 26.04 / NVIDIA driver gap	2026-05-22 14:16:56 +00:00
README.md	[docs] TrueNAS decommission cleanup — remove references from active docs	2026-04-19 16:55:43 +00:00

README.md

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

Physical Network: 192.168.1.0/24 - Physical devices and host network
Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network
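As a quick illustration of how the three ranges partition addresses, here is a small sketch using Python's standard `ipaddress` module. The CIDRs and labels are taken from the list above; the `classify` helper and the example addresses are hypothetical and not part of the repository.

```python
import ipaddress
from typing import Optional

# CIDR ranges as listed in the Quick Reference.
NETWORKS = {
    "Physical Network": ipaddress.ip_network("192.168.1.0/24"),
    "Management VLAN 10": ipaddress.ip_network("10.0.10.0/24"),
    "Kubernetes VLAN 20": ipaddress.ip_network("10.0.20.0/24"),
}


def classify(address: str) -> Optional[str]:
    """Return the name of the range containing `address`, or None if no range matches."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORKS.items():
        if ip in network:
            return name
    return None


if __name__ == "__main__":
    # Example addresses chosen arbitrarily for illustration.
    for addr in ("192.168.1.10", "10.0.10.5", "10.0.20.200", "10.0.30.1"):
        print(addr, "->", classify(addr))
```

Because the three ranges do not overlap, the order of the lookup does not matter; an address outside all of them simply yields `None`.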

Key URLs

Public: viktorbarzin.me
Internal: viktorbarzin.lan

Architecture Documentation

Document	Description
Overview	Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking	Network topology, VLANs, routing, and firewall rules
VPN	Headscale mesh VPN and Cloudflare Tunnel configuration
Storage	Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management
Authentication	Authentik SSO, OIDC flows, and service integration
Security	CrowdSec IPS, Kyverno policies, and security controls
Monitoring	Prometheus, Grafana, Loki (defined but not yet deployed, see code-146x), and the rest of the observability stack
Secrets Management	HashiCorp Vault integration and secret rotation
CI/CD	Woodpecker CI pipeline and deployment automation
Backup & DR	Backup strategy, disaster recovery, and restore procedures
Compute	Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases	PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy	Namespace isolation, tier system, and resource quotas

Operations

Runbooks - Step-by-step operational procedures
Plans - Infrastructure change plans and rollout strategies

Getting Started

1. Review the Overview for a high-level understanding
2. Read the Networking doc to understand connectivity
3. Check Compute for resource management patterns
4. Explore individual architecture docs based on your area of interest