Infrastructure Repository Knowledge
Instructions for Claude
- When the user says "remember" something: Always update this file (.claude/CLAUDE.md) with the information so it persists across sessions
- When discovering new patterns or versions: Add them to the appropriate section below
- Use the /update-knowledge command: Or edit this file directly to add learnings
Execution Environment (CRITICAL)
- File operations (Read, Edit, Write, Glob, Grep): Run locally at /Volumes/wizard/code/infra
- Git commands: Run locally (git status, git log, git diff, etc.), except git push, which goes through the remote executor (see Git Operations)
- ALL other commands: Use the remote executor relay (kubectl, terraform, helm, python, etc.)
Remote Command Execution (ALWAYS USE THIS)
For any command that is not file editing or git, use the file-based relay:
To execute a remote command:
# 1. Write command
echo "your-command-here" > /Volumes/wizard/code/infra/.claude/cmd_input.txt
# 2. Wait and check status
sleep 1 && cat /Volumes/wizard/code/infra/.claude/cmd_status.txt
# 3. Read output (when status is "done:*")
cat /Volumes/wizard/code/infra/.claude/cmd_output.txt
Status values: ready | running | done:N (N = exit code)
Requires user to start executor in another terminal:
.claude/remote-executor.sh wizard@10.0.10.10 /home/wizard/code/infra
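The three relay steps above can be wrapped in one shell function. This is a minimal sketch, not part of the repository: the names relay_run, RELAY_DIR and RELAY_TIMEOUT are hypothetical, and it assumes the executor updates cmd_status.txt within the first second after a command is written (the same assumption behind the `sleep 1` in step 2).

```shell
# Hypothetical helper; relay_run, RELAY_DIR and RELAY_TIMEOUT are illustrative names.
RELAY_DIR="${RELAY_DIR:-/Volumes/wizard/code/infra/.claude}"

relay_run() {
  local timeout="${RELAY_TIMEOUT:-300}" waited=0 status=""
  # 1. Write the command for the executor to pick up.
  printf '%s\n' "$*" > "$RELAY_DIR/cmd_input.txt"
  # 2. Poll the status file once per second until it reports done:N.
  while [ "$waited" -lt "$timeout" ]; do
    sleep 1
    waited=$((waited + 1))
    status="$(cat "$RELAY_DIR/cmd_status.txt" 2>/dev/null || true)"
    case "$status" in
      done:*)
        # 3. Print the captured output and return the remote exit code N.
        cat "$RELAY_DIR/cmd_output.txt"
        return "${status#done:}"
        ;;
    esac
  done
  echo "relay_run: no done:N status after ${timeout}s (last: ${status:-none})" >&2
  return 124
}
```

Usage would look like `relay_run kubectl get pods -A`. The function returns the remote exit code, so it can be chained with `&&` like a local command.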
Overview
Terraform-based infrastructure repository managing a home Kubernetes cluster on Proxmox VMs. Uses git-crypt for secrets encryption.
Directory Structure
- main.tf - Main Terraform entry point, imports all modules
- modules/kubernetes/ - Kubernetes service deployments (one folder per service)
- modules/create-vm/ - Proxmox VM creation module
- secrets/ - Encrypted secrets (TLS certs, keys) via git-crypt
- cli/ - Go CLI tool for infrastructure management
- scripts/ - Helper scripts (cluster management, node updates)
- playbooks/ - Ansible playbooks for node configuration
- diagram/ - Infrastructure diagrams (Python-based)
Key Patterns
- Each service in modules/kubernetes/<service>/main.tf defines its own namespace, deployments, services, and ingress
- NFS storage from 10.0.10.15 for persistent data
- TLS secrets managed via the setup_tls_secret module
- Ingress uses nginx-ingress with annotations for customization
- GPU workloads use node_selector = { "gpu": "true" }
- Services are exposed on *.viktorbarzin.me domains
NFS Volume Pattern
Prefer inline NFS volumes over separate PV/PVC resources. Use the nfs {} block directly in pod/deployment/cronjob specs:
volume {
  name = "data"
  nfs {
    server = "10.0.10.15"
    path   = "/mnt/main/<service>"
  }
}
Only use PV/PVC when the Helm chart requires existingClaim (like the Nextcloud Helm chart).
Factory Pattern (for multi-user services)
Used when a service needs one instance per user. Structure:
modules/kubernetes/<service>/
├── main.tf          # Namespace, TLS secret, user module calls
└── factory/
    └── main.tf      # Deployment, service, ingress templates with ${var.name}
Examples: actualbudget, freedify
To add a new user:
- Export NFS share at /mnt/main/<service>/<username> in TrueNAS
- Add Cloudflare route in tfvars
- Add module block in main.tf calling factory
Common Variables
- tls_secret_name - TLS certificate secret name
- tier - Deployment tier label
- Service-specific passwords passed as variables
Service Versions (as of 2025-01)
- Immich: v2.4.1
- Freedify: latest (music streaming, factory pattern)
Useful Commands
# ALWAYS use -target for terraform apply (speeds up execution)
terraform apply -target=module.kubernetes_cluster.module.<service_name>
terraform plan -target=module.kubernetes_cluster.module.<service_name>
terraform fmt -recursive
kubectl get pods -A
Terraform target examples:
- terraform apply -target=module.kubernetes_cluster.module.monitoring - Apply monitoring
- terraform apply -target=module.kubernetes_cluster.module.immich - Apply immich
- terraform apply -target=module.docker-registry-vm - Apply docker registry VM
- Only skip -target when explicitly told to apply everything
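The target addresses above follow two shapes: top-level modules are addressed as module.<name>, and services inside the cluster as module.kubernetes_cluster.module.<name>. As a rough sketch, a small function can build the right command from a bare name. The function name tf_target_cmd is hypothetical and not part of the repo; the top-level names are taken from the Module Structure section of this file.

```shell
# Hypothetical helper: print the targeted terraform command for a module or service.
# Usage: tf_target_cmd <plan|apply> <name>
tf_target_cmd() {
  local action="$1" name="$2"
  case "$name" in
    # Top-level modules in main.tf are addressed directly.
    k8s-node-template|non-k8s-node-template|docker-registry-template|docker-registry-vm|kubernetes_cluster)
      echo "terraform $action -target=module.$name"
      ;;
    # Everything else is assumed to be a service under module.kubernetes_cluster.
    *)
      echo "terraform $action -target=module.kubernetes_cluster.module.$name"
      ;;
  esac
}
```

The function only prints the command, so the result can be reviewed before it is written to the relay's cmd_input.txt.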
Module Structure
Top-level modules in main.tf:
- module.k8s-node-template - K8s node VM template
- module.non-k8s-node-template - Non-k8s VM template
- module.docker-registry-template - Docker registry template
- module.docker-registry-vm - Docker registry VM
- module.kubernetes_cluster - Main K8s cluster (contains all services)
Kubernetes Services (under module.kubernetes_cluster.module.*)
Core (tier 0-1):
metallb, dbaas, technitium, nginx-ingress, crowdsec, cloudflared,
redis, metrics-server, authentik, nvidia, vaultwarden, reverse-proxy,
wireguard, headscale, xray, monitoring
GPU (tier 2):
immich, frigate, ollama, ebook2audiobook
Edge/Aux (tier 3-4):
blog, drone, hackmd, mailserver, privatebin, shadowsocks,
city-guesser, echo, url, webhook_handler, excalidraw, travel_blog,
dashy, send, ytdlp, uptime-kuma, calibre, audiobookshelf,
paperless-ngx, jsoncrack, servarr, ntfy, cyberchef, diun,
meshcentral, nextcloud, homepage, matrix, linkwarden, actualbudget,
owntracks, dawarich, changedetection, tandoor, n8n, real-estate-crawler,
tor-proxy, onlyoffice, forgejo, freshrss, navidrome, networking-toolbox,
tuya-bridge, stirling-pdf, isponsorblocktv, rybbit, wealthfolio,
kyverno, speedtest, freedify, netbox, f1-stream, kms, k8s-dashboard,
descheduler, reloader, infra-maintenance
CI/CD
- Drone CI (.drone.yml) for automated deployments
- Auto-updates TLS certificates
- ALWAYS add [ci skip] to commit messages when you've already run terraform apply to avoid triggering CI redundantly
- After committing, run git push origin master to sync changes
Infrastructure
- Proxmox hypervisor for VMs
- Kubernetes cluster with GPU node (5 nodes: k8s-master + k8s-node1-4, running v1.34.2)
- NFS server at 10.0.10.15 for storage
- Redis shared service at redis.redis.svc.cluster.local
Git Operations (IMPORTANT)
- Git is slow on this repo due to many files - commands can take 30+ seconds
- Use GIT_OPTIONAL_LOCKS=0 prefix if git hangs
- Local SSH is blocked - use remote executor to push: echo "git push origin master" > .claude/cmd_input.txt
- Always commit only specific files you changed, not everything
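The commit rules from this section and from CI/CD can be combined into one short flow. The sketch below is illustrative only: ci_skip_msg is a hypothetical helper, and the file path and commit message in the comments are placeholders.

```shell
# Hypothetical helper (not part of the repo): append [ci skip] unless already present.
ci_skip_msg() {
  case "$1" in
    *"[ci skip]"*) printf '%s\n' "$1" ;;
    *)             printf '%s [ci skip]\n' "$1" ;;
  esac
}

# Example flow after a manual terraform apply (path and message are placeholders):
#   GIT_OPTIONAL_LOCKS=0 git add modules/kubernetes/<service>/main.tf
#   GIT_OPTIONAL_LOCKS=0 git commit -m "$(ci_skip_msg "Update <service>")"
#   echo "git push origin master" > .claude/cmd_input.txt
```

Staging files by explicit path keeps unrelated changes out of the commit, and the push goes through the relay because local SSH is blocked.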
Prometheus Alerts
- Alert rules are in modules/kubernetes/monitoring/prometheus_chart_values.tpl
- Under serverFiles.alerting_rules.yml.groups
- Groups: "R730 Host", "Nvidia Tesla T4 GPU", "Power", "Cluster"
- kube-state-metrics provides: kube_deployment_*, kube_statefulset_*, kube_daemonset_*
DEFCON Levels
Services are organized by criticality in modules/kubernetes/main.tf:
- Level 1 (Critical): wireguard, technitium, headscale, nginx-ingress, xray, authentik, cloudflare, monitoring
- Level 2 (Storage): vaultwarden, redis, immich, nvidia, metrics-server, uptime-kuma, crowdsec, kyverno
- Level 3 (Admin): k8s-dashboard, reverse-proxy
- Level 4 (Active): mailserver, shadowsocks, dawarich, nextcloud, calibre, actualbudget, etc.
- Level 5 (Optional): blog, drone, hackmd, ollama, servarr, paperless-ngx, etc.