infra/docs
Viktor Barzin aac807fb3a
pve-host: ship journal to Loki (snoopy command audit + sshd-pve) for emo's root SSH
Emo's Claude agent was given root SSH to the Proxmox host (`ssh pve`, dedicated
shared-root key emo-pve-agent@devvm) so he can manage the host — e.g. the R730
fan daemon — through his agent. To keep an audit trail of what that agent does,
and to feed the long-pending Wave-1 S1 security rule, the PVE host now ships its
systemd journal to cluster Loki:

- snoopy logs every execve() to journald (identifier=snoopy), enabled via
  /etc/ld.so.preload; config scripts/pve-snoopy.ini.
- promtail v3.5.1 (amd64) ships /var/log/journal to Loki as {job="pve-journal"}
  (full host journal; filter identifier="snoopy" for the command audit), and
  relabels sshd auth to {job="sshd-pve"} — which ACTIVATES S1 (it was PENDING
  only for lack of this shipper). Config/unit: scripts/pve-promtail.{yaml,service}.
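
To make the two streams above concrete, here is a minimal sketch of the LogQL selectors one might use to query them. It assumes the journal identifier is exposed as a Loki label literally named `identifier` (the promtail relabel config is not shown here, so the real label name may differ); the helper `logql_selector` is hypothetical and only builds strings.

```python
def logql_selector(labels: dict) -> str:
    """Build a LogQL stream selector such as {job="sshd-pve"} from label pairs."""
    parts = []
    for key in sorted(labels):
        # Escape backslashes and double quotes so the value stays a valid LogQL string.
        value = str(labels[key]).replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'{key}="{value}"')
    return "{" + ", ".join(parts) + "}"


# Full host journal, as shipped by promtail.
full_journal = logql_selector({"job": "pve-journal"})

# Command audit only. ASSUMPTION: the label is named "identifier".
command_audit = logql_selector({"job": "pve-journal", "identifier": "snoopy"})

# sshd auth events that feed the S1 rule.
sshd_auth = logql_selector({"job": "sshd-pve"})

print(full_journal)
print(command_audit)
print(sshd_auth)
```

The resulting strings can be pasted into Grafana Explore or passed as the `query` parameter of a Loki query.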

S1 won't false-fire on legitimate access: the devvm SNATs through pfSense to
192.168.1.2, which is already in the S1 source-IP allowlist.

Loki is reached via an /etc/hosts pin (10.0.20.203 loki.viktorbarzin.lan);
follow-up noted to register a Technitium CNAME so it auto-tracks LB renumbers.
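
A pin like this can be sanity-checked by parsing the hosts file. The sketch below is illustrative only: `lookup_pinned` is a hypothetical helper, and the sample text is inlined rather than read from the real /etc/hosts.

```python
from typing import Optional


def lookup_pinned(hosts_text: str, hostname: str) -> Optional[str]:
    """Return the first address that hosts-file text maps to hostname, or None."""
    for raw in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return fields[0]
    return None


# Sample content mirroring the pin described above.
sample_hosts = """\
127.0.0.1 localhost
10.0.20.203 loki.viktorbarzin.lan  # pinned until a DNS record exists
"""

print(lookup_pinned(sample_hosts, "loki.viktorbarzin.lan"))
```

If the load balancer is renumbered, a check like this returns the stale address, which is the failure mode the DNS follow-up is meant to remove.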

Host pieces are hand-managed (not Terraform), like fan-control and the rpi-sofia
promtail — these files are the source of truth. Docs updated: security.md
(S1 LIVE) and monitoring.md ("External host: pve").

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 19:31:45 +00:00
architecture pve-host: ship journal to Loki (snoopy command audit + sshd-pve) for emo's root SSH 2026-06-10 19:31:45 +00:00
benchmarks fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
plans workstation: emo direct master push — allow-then-audit [ci skip] 2026-06-10 14:53:43 +00:00
post-mortems coredns: pods get internal split-horizon answers for viktorbarzin.me [ci skip] 2026-06-10 16:21:34 +00:00
runbooks Merge forgejo/master: reconcile diverged lineages [ci skip] 2026-06-10 15:21:50 +00:00
known-issues.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
README.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network
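
As a quick illustration of how these ranges partition addresses, the following sketch maps an IP to the range it belongs to. The `classify` helper and the dictionary name are hypothetical; only the three CIDR ranges come from the list above.

```python
import ipaddress
from typing import Optional

# Ranges copied from the "Network Ranges" list.
NETWORK_RANGES = {
    "Physical Network": ipaddress.ip_network("192.168.1.0/24"),
    "Management VLAN 10": ipaddress.ip_network("10.0.10.0/24"),
    "Kubernetes VLAN 20": ipaddress.ip_network("10.0.20.0/24"),
}


def classify(address: str) -> Optional[str]:
    """Return the name of the range containing address, or None if no range matches."""
    ip = ipaddress.ip_address(address)
    for name, network in NETWORK_RANGES.items():
        if ip in network:
            return name
    return None


# Example addresses chosen for illustration.
for addr in ("192.168.1.2", "10.0.10.15", "10.0.20.203", "172.16.0.1"):
    print(addr, "->", classify(addr))
```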

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document            Description
Overview            Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking          Network topology, VLANs, routing, and firewall rules
VPN                 Headscale mesh VPN and Cloudflare Tunnel configuration
Storage             Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management
Authentication      Authentik SSO, OIDC flows, and service integration
Security            CrowdSec IPS, Kyverno policies, and security controls
Monitoring          Prometheus, Grafana, Loki, and observability stack
Secrets Management  HashiCorp Vault integration and secret rotation
CI/CD               Woodpecker CI pipeline and deployment automation
Backup & DR         Backup strategy, disaster recovery, and restore procedures
Compute             Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases           PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy       Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest