From 6ba4878f3af301e6206c19d1705a6df22f97a847 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sat, 11 Apr 2026 17:00:10 +0100
Subject: [PATCH] docs: update storage architecture for NFS migration to Proxmox host [ci skip]

---
 .claude/CLAUDE.md                      | 24 +++++++++++++++---------
 .claude/reference/proxmox-inventory.md | 25 +++++++++++++++++++++++--
 AGENTS.md                              | 16 ++++++++++------
 3 files changed, 48 insertions(+), 17 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index f37527a5..43284b09 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -139,16 +139,22 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle

 Choose storage class based on workload type:

-| Use **proxmox-lvm** when | Use **NFS** (`nfs_volume` module) when |
-|--------------------------|----------------------------------------|
-| Database files (SQLite, embedded DBs) | Shared data across multiple pods (RWX) |
-| Write-heavy / fsync-heavy workloads | Media libraries (music, ebooks, photos) |
-| Single-pod app state (RWO is fine) | Backup destinations (cloud sync picks up from NFS) |
-| Latency-sensitive data | Large datasets (>10Gi) where snapshots matter |
-| Any new service by default | Data you want to browse/inspect from outside k8s |
+| Use **proxmox-lvm** when | Use **NFS** (`nfs_volume` module) when | Use **nfs-proxmox** SC when |
+|--------------------------|----------------------------------------|-----------------------------|
+| Database files (SQLite, embedded DBs) | Shared data across multiple pods (RWX) | Dynamic provisioning on Proxmox host NFS |
+| Write-heavy / fsync-heavy workloads | Media libraries (music, ebooks, photos) | Vault (dynamic PVC creation) |
+| Single-pod app state (RWO is fine) | Backup destinations (cloud sync picks up from NFS) | |
+| Latency-sensitive data | Large datasets (>10Gi) where snapshots matter | |
+| Any new service by default | Data you want to browse/inspect from outside k8s | |

 **Default is proxmox-lvm.** Only use NFS when you need RWX, backup pipeline integration, or it's a large shared media library.

+**NFS servers:**
+- **Proxmox host** (192.168.1.127): Primary NFS for all workloads. HDD at `/srv/nfs` (ext4 thin LV `pve/nfs-data`, 1TB). SSD at `/srv/nfs-ssd` (ext4 LV `ssd/nfs-ssd-data`, 100GB). Exports use `insecure` option (required — pfSense NATs source ports >1024 between VLANs).
+- **TrueNAS** (10.0.10.15): **Immich only** (8 PVCs). `nfs-truenas` StorageClass retained exclusively for Immich.
+
+**Migration note**: CSI PV `volumeAttributes` are immutable — cannot update NFS server in place. New PV/PVC pairs required (convention: append `-host` to PV name).
+
 **proxmox-lvm PVC template** (Terraform):
 ```hcl
 resource "kubernetes_persistent_volume_claim" "data_proxmox" {
@@ -188,9 +194,9 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" {

 **Offsite sync (two paths)**:
 - `Synology/Backup/Viki/pve-backup/` — structured data from PVE host (PVC files, DB dumps, pfsense, PVE config)
-- `Synology/Backup/Viki/truenas/` — NFS media from TrueNAS Cloud Sync (Immich, audiobookshelf, servarr — narrowed, excludes backup dirs)
+- `Synology/Backup/Viki/truenas/` — NFS media from TrueNAS Cloud Sync (Immich only — audiobookshelf and servarr migrated to Proxmox host NFS)

-**App-level CronJobs** (write to TrueNAS NFS, mirrored to sda weekly):
+**App-level CronJobs** (write to Proxmox host NFS, mirrored to sda weekly):
 - MySQL (daily), PostgreSQL (daily), Vault (weekly), Vaultwarden (6h + integrity), Redis (weekly), etcd (weekly)
 - **Convention**: New proxmox-lvm apps MUST add a backup CronJob writing to `/mnt/main/-backup/`

diff --git a/.claude/reference/proxmox-inventory.md b/.claude/reference/proxmox-inventory.md
index 8eeb56ea..4ad08e8f 100644
--- a/.claude/reference/proxmox-inventory.md
+++ b/.claude/reference/proxmox-inventory.md
@@ -11,6 +11,27 @@
 - **Disks**: 1.1TB RAID1 SAS (unused) + 931GB Samsung SSD + 10.7TB RAID1 HDD
 - **Proxmox access**: `ssh root@192.168.1.127`

+## NFS Exports (Proxmox Host)
+
+The Proxmox host serves NFS for all workloads except Immich (which remains on TrueNAS).
+
+### HDD NFS
+- **LV**: `pve/nfs-data` (thin LV, 1TB)
+- **Filesystem**: ext4 (chosen over btrfs — btrfs CoW on LVM thin = double-CoW problem)
+- **Mount**: `/srv/nfs` with `noatime,commit=30`
+- **Export**: `/srv/nfs *(rw,no_subtree_check,no_root_squash,insecure,fsid=0)`
+
+### SSD NFS
+- **LV**: `ssd/nfs-ssd-data` (100GB)
+- **Filesystem**: ext4
+- **Mount**: `/srv/nfs-ssd` with `noatime,commit=30`
+- **Export**: `/srv/nfs-ssd *(rw,no_subtree_check,no_root_squash,insecure,fsid=1)`
+- **Current users**: Ollama (migrated from TrueNAS SSD `/mnt/ssd/ollama`)
+
+### Notes
+- `insecure` option required: pfSense NATs source ports >1024 when routing between VLANs
+- 21 stacks migrated from TrueNAS, only Immich (8 PVCs) remains on TrueNAS
+
 ## Memory Layout (updated 2026-04-01)

 ### Physical DIMM Slot Map
@@ -79,7 +100,7 @@ Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB

 10.0.10.0/24 - Management: Wizard (10.0.10.10), TrueNAS NFS (10.0.10.15)
 10.0.20.0/24 - Kubernetes: pfSense GW (10.0.20.1), Registry (10.0.20.10), k8s-master (10.0.20.100), DNS (10.0.20.101), MetalLB (10.0.20.102-200)
-192.168.1.0/24 - Physical: Proxmox (192.168.1.127)
+192.168.1.0/24 - Physical: Proxmox (192.168.1.127, NFS server for k8s)
 ```

 ## Network Bridges
@@ -101,7 +122,7 @@ Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB
 | 204 | k8s-node4 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker |
 | 220 | docker-registry | running | 4 | 4GB | vmbr1:vlan20 | 64G | MAC DE:AD:BE:EF:22:22 (10.0.20.10) |
 | 300 | Windows10 | running | 16 | 8GB | vmbr0 | 100G | Windows VM |
-| 9000 | truenas | running | 16 | 8GB | vmbr1:vlan10 | 32G+7x256G+1T | NFS (10.0.10.15) |
+| 9000 | truenas | running | 16 | 8GB | vmbr1:vlan10 | 32G+7x256G+1T | NFS (10.0.10.15) — Immich only |

 **Total VM RAM allocated**: 180 GB of 272 GB (66%) — 92 GB free for future VMs

diff --git a/AGENTS.md b/AGENTS.md
index 427cf562..6e025293 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -3,7 +3,7 @@
 ## Critical Rules (MUST FOLLOW)
 - **ALL changes through Terraform/Terragrunt** — NEVER `kubectl apply/edit/patch/delete` for persistent changes. Read-only kubectl is fine.
 - **NEVER put secrets in plaintext** — use `secrets.sops.json` (SOPS-encrypted) or `terraform.tfvars` (git-crypt, legacy)
-- **NEVER restart NFS on TrueNAS** — causes cluster-wide mount failures across all pods
+- **NEVER restart NFS on Proxmox host or TrueNAS** — causes cluster-wide mount failures across all pods
 - **NEVER commit secrets** — triple-check before every commit
 - **`[ci skip]` in commit messages** when changes were already applied locally
 - **Ask before `git push`** — always confirm with the user first
@@ -59,15 +59,19 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
 - `scripts/cluster_healthcheck.sh` — 25-check cluster health script

 ## Storage
-- **NFS** (`nfs-truenas` StorageClass): For app data. Use the `nfs_volume` module, never inline `nfs {}` blocks.
+- **NFS — Proxmox host** (primary): HDD pool at `/srv/nfs` (ext4 thin LV `pve/nfs-data`, 1TB) and SSD pool at `/srv/nfs-ssd` (ext4 LV `ssd/nfs-ssd-data`, 100GB) on 192.168.1.127. Serves all workloads except Immich.
+  - `nfs-proxmox` StorageClass: Dynamic provisioning (Vault uses this).
+  - `nfs_volume` module: Static PVs for most services. Use this, never inline `nfs {}` blocks.
+- **NFS — TrueNAS** (10.0.10.15): Now **only serves Immich** (8 PVCs). `nfs-truenas` StorageClass retained for Immich only.
 - **proxmox-lvm** (`proxmox-lvm` StorageClass): For databases (PostgreSQL, MySQL). TopoLVM driver.
-- **TrueNAS**: 10.0.10.15. NFS exports managed via `secrets/nfs_exports.sh`.
 - **SQLite on NFS is unreliable** (fsync issues) — always use proxmox-lvm or local disk for databases.
 - **NFS mount options**: Always `soft,timeo=30,retrans=3` to prevent uninterruptible sleep (D state).
-- **NFS export directory must exist** on TrueNAS before Terraform can create the PV.
+- **NFS export directory must exist** on the NFS server before Terraform can create the PV.
+- **NFS `insecure` option**: Required on Proxmox host exports — pfSense NATs source ports >1024 when routing between VLANs, which NFS `secure` mode rejects.
+- **CSI PV volumeAttributes are immutable**: Can't update NFS server in place. Must create new PV/PVC pairs (pattern: append `-host` to PV name).

 ## Shared Variables (never hardcode)
-`var.nfs_server` (10.0.10.15), `var.redis_host`, `var.postgresql_host`, `var.mysql_host`, `var.ollama_host`, `var.mail_host`
+`var.nfs_server` (192.168.1.127 — Proxmox host), `var.nfs_server_truenas` (10.0.10.15 — Immich only), `var.redis_host`, `var.postgresql_host`, `var.mysql_host`, `var.ollama_host`, `var.mail_host`

 ## Tier System
 `0-core` | `1-cluster` | `2-gpu` | `3-edge` | `4-aux` — Kyverno auto-generates LimitRange + ResourceQuota per namespace based on tier label.
@@ -96,7 +100,7 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
 - **Fix crashed pods**: Run healthcheck first. Safe to delete evicted/failed pods and CrashLoopBackOff pods with >10 restarts.
 - **OOMKilled**: Check `kubectl describe limitrange tier-defaults -n `. Increase `resources.limits.memory` in the stack's main.tf.
 - **Add a secret**: `sops set secrets.sops.json '["key"]' '"value"'` then commit.
-- **NFS exports**: Create dir on TrueNAS first, add to `secrets/nfs_directories.txt`, run `secrets/nfs_exports.sh`.
+- **NFS exports**: Create dir on Proxmox host (`/srv/nfs/` or `/srv/nfs-ssd/`). For Immich (TrueNAS only): add to `secrets/nfs_directories.txt`, run `secrets/nfs_exports.sh`.

 ## Detailed Reference
 See `.claude/reference/patterns.md` for: NFS volume code examples, iSCSI details, Kyverno governance tables, anti-AI scraping layers, Terragrunt architecture, node rebuild procedure, archived troubleshooting runbooks index.
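
Reviewer note, not part of the patch: both export lines documented above depend on the `insecure` option, so a small check along the following lines could flag an export that lacks it. This is only a sketch under stated assumptions. The function names `parse_export` and `missing_options` are hypothetical, and the parser handles only the single-client `path client(options)` form used in the two export lines in this patch, not the full `/etc/exports` syntax.

```python
import re

# Matches one export of the form: <path> <client>(<opt>,<opt>=<value>,...)
# Assumption: exactly one client spec per line, as in the two lines in this patch.
EXPORT_RE = re.compile(r"^(?P<path>\S+)\s+(?P<client>[^\s(]+)\((?P<opts>[^)]*)\)$")


def parse_export(line):
    """Split an export line into (path, client, options dict)."""
    match = EXPORT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unsupported export line: {line!r}")
    options = {}
    for item in match["opts"].split(","):
        if not item:
            continue  # tolerate an empty option list
        key, _, value = item.partition("=")
        options[key] = value or None  # flags such as "rw" carry no value
    return match["path"], match["client"], options


def missing_options(line, required=("insecure",)):
    """Return the required options that the export line does not set."""
    _, _, options = parse_export(line)
    return [name for name in required if name not in options]


if __name__ == "__main__":
    # The two export lines recorded in proxmox-inventory.md by this patch.
    exports = [
        "/srv/nfs *(rw,no_subtree_check,no_root_squash,insecure,fsid=0)",
        "/srv/nfs-ssd *(rw,no_subtree_check,no_root_squash,insecure,fsid=1)",
    ]
    for export in exports:
        print(export.split()[0], "missing:", missing_options(export))
```

Both documented exports should report nothing missing. A check like this would have to run against a copy of the exports file and should never trigger an NFS restart, in line with the critical rule in `AGENTS.md`.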