infra/docs
Viktor Barzin c6784f87b5 [docs] Add NFS prerequisite runbook for nfs_volume module [ci skip]
## Context

`modules/kubernetes/nfs_volume` creates the K8s PV but NOT the underlying
directory on the Proxmox NFS host (`192.168.1.127:/srv/nfs/<subdir>`).
The first time a new consumer is added, the mount fails with
`mount.nfs: … No such file or directory` and the pod hangs in
ContainerCreating.

This bit us twice during the Wave 1/2 rollout — once for the mailserver
backup (code-z26) and again for the Roundcube backup (code-1f6). Both
times the fix was `ssh root@192.168.1.127 'mkdir -p /srv/nfs/<subdir>'`.
Rather than automate the SSH dependency into the module (which would
break hermeticity and fail for operators without host SSH), this runbook
documents the manual bootstrap step and the rationale.

Addresses bd code-yo4.

## This change

New file: `docs/runbooks/nfs-prerequisites.md`. Lists known consumers,
gives the copy-paste SSH command, and explains why auto-creation was
rejected (two options, neither worth the churn).

## What is NOT in this change

- Any automation of the bootstrap — runbook only
- Migration to `nfs-subdir-external-provisioner` — explicitly out of scope

## Test Plan

### Automated
```
$ cat docs/runbooks/nfs-prerequisites.md | head -5
# NFS Prerequisites for `modules/kubernetes/nfs_volume`

The `nfs_volume` Terraform module creates a `PersistentVolume` pointing at a
path on the Proxmox NFS server (`192.168.1.127`). It does **not** create the
underlying directory on the server.
```

### Manual Verification
Before the next stack adds a new `nfs_volume` consumer, read the runbook
and run the `ssh root@192.168.1.127 'mkdir -p ...'` step. First pod
reaches Ready within a minute of the PV creation.

Closes: code-yo4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 10:40:55 +00:00
..
architecture [docs] Document Fail2ban-disabled rationale (CrowdSec is policy) [ci skip] 2026-04-18 23:47:59 +00:00
plans [docs] Update anti-AI and rybbit docs after rewrite-body removal 2026-04-17 21:43:13 +00:00
post-mortems [docs] post-mortem: clarify the sizeLimit vs container memory limit gotcha 2026-04-18 13:23:14 +00:00
runbooks [docs] Add NFS prerequisite runbook for nfs_volume module [ci skip] 2026-04-19 10:40:55 +00:00
README.md add architecture documentation for all infrastructure subsystems [ci skip] 2026-03-24 00:55:25 +02:00

Infrastructure Documentation

This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.

Quick Reference

Network Ranges

  • Physical Network: 192.168.1.0/24 - Physical devices and host network
  • Management VLAN 10: 10.0.10.0/24 - Infrastructure VMs and management
  • Kubernetes VLAN 20: 10.0.20.0/24 - Kubernetes cluster network

Key URLs

  • Public: viktorbarzin.me
  • Internal: viktorbarzin.lan

Architecture Documentation

Document Description
Overview Infrastructure overview, hardware specs, VM inventory, and service catalog
Networking Network topology, VLANs, routing, and firewall rules
VPN Headscale mesh VPN and Cloudflare Tunnel configuration
Storage TrueNAS NFS, democratic-csi, and persistent volume management
Authentication Authentik SSO, OIDC flows, and service integration
Security CrowdSec IPS, Kyverno policies, and security controls
Monitoring Prometheus, Grafana, Loki, and observability stack
Secrets Management HashiCorp Vault integration and secret rotation
CI/CD Woodpecker CI pipeline and deployment automation
Backup & DR Backup strategy, disaster recovery, and restore procedures
Compute Proxmox VMs, GPU passthrough, K8s resource management, and VPA
Databases PostgreSQL, MySQL, Redis, and database operators
Multi-tenancy Namespace isolation, tier system, and resource quotas

Operations

  • Runbooks - Step-by-step operational procedures
  • Plans - Infrastructure change plans and rollout strategies

Getting Started

  1. Review the Overview for a high-level understanding
  2. Read the Networking doc to understand connectivity
  3. Check Compute for resource management patterns
  4. Explore individual architecture docs based on your area of interest