docs: update storage architecture for NFS migration to Proxmox host [ci skip]

Viktor Barzin 2026-04-11 17:00:10 +01:00
parent 65551e4602
commit 6ba4878f3a
3 changed files with 48 additions and 17 deletions

@@ -139,16 +139,22 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle
 Choose storage class based on workload type:
-| Use **proxmox-lvm** when | Use **NFS** (`nfs_volume` module) when |
-|--------------------------|----------------------------------------|
-| Database files (SQLite, embedded DBs) | Shared data across multiple pods (RWX) |
-| Write-heavy / fsync-heavy workloads | Media libraries (music, ebooks, photos) |
-| Single-pod app state (RWO is fine) | Backup destinations (cloud sync picks up from NFS) |
-| Latency-sensitive data | Large datasets (>10Gi) where snapshots matter |
-| Any new service by default | Data you want to browse/inspect from outside k8s |
+| Use **proxmox-lvm** when | Use **NFS** (`nfs_volume` module) when | Use **nfs-proxmox** SC when |
+|--------------------------|----------------------------------------|-----------------------------|
+| Database files (SQLite, embedded DBs) | Shared data across multiple pods (RWX) | Dynamic provisioning on Proxmox host NFS |
+| Write-heavy / fsync-heavy workloads | Media libraries (music, ebooks, photos) | Vault (dynamic PVC creation) |
+| Single-pod app state (RWO is fine) | Backup destinations (cloud sync picks up from NFS) | |
+| Latency-sensitive data | Large datasets (>10Gi) where snapshots matter | |
+| Any new service by default | Data you want to browse/inspect from outside k8s | |
**Default is proxmox-lvm.** Only use NFS when you need RWX, backup pipeline integration, or it's a large shared media library.
+**NFS servers:**
+- **Proxmox host** (192.168.1.127): Primary NFS for all workloads. HDD at `/srv/nfs` (ext4 thin LV `pve/nfs-data`, 1TB). SSD at `/srv/nfs-ssd` (ext4 LV `ssd/nfs-ssd-data`, 100GB). Exports use `insecure` option (required — pfSense NATs source ports >1024 between VLANs).
+- **TrueNAS** (10.0.10.15): **Immich only** (8 PVCs). `nfs-truenas` StorageClass retained exclusively for Immich.
+**Migration note**: CSI PV `volumeAttributes` are immutable — cannot update NFS server in place. New PV/PVC pairs required (convention: append `-host` to PV name).
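This is a minimal sketch of a claim on the `nfs-proxmox` StorageClass, which is the dynamic-provisioning column in the table above. It assumes the hashicorp/kubernetes provider and uses the same `kubernetes_persistent_volume_claim` resource type as the proxmox-lvm template. The claim name, namespace and size are hypothetical placeholders. Unlike the static `nfs_volume` PVs, no PV is declared here. The provisioner behind `nfs-proxmox` is expected to create one on the Proxmox host export.

```hcl
# Sketch: dynamically provisioned RWX claim on the nfs-proxmox StorageClass.
# Name, namespace and size are hypothetical placeholders.
resource "kubernetes_persistent_volume_claim" "shared_data" {
  metadata {
    name      = "example-shared-data" # hypothetical
    namespace = "example"             # hypothetical
  }

  spec {
    # The provisioner behind this class creates the PV on the Proxmox host NFS.
    storage_class_name = "nfs-proxmox"

    # NFS supports ReadWriteMany, so several pods can mount the volume at once.
    access_modes = ["ReadWriteMany"]

    resources {
      requests = {
        storage = "10Gi" # hypothetical size
      }
    }
  }
}
```

Single-pod, fsync-heavy state should still go on proxmox-lvm. This claim type only fits the RWX and dynamic-provisioning cases in the right-hand column.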
**proxmox-lvm PVC template** (Terraform):
```hcl
resource "kubernetes_persistent_volume_claim" "data_proxmox" {
@@ -188,9 +194,9 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" {
 **Offsite sync (two paths)**:
 - `Synology/Backup/Viki/pve-backup/` — structured data from PVE host (PVC files, DB dumps, pfsense, PVE config)
-- `Synology/Backup/Viki/truenas/` — NFS media from TrueNAS Cloud Sync (Immich, audiobookshelf, servarr — narrowed, excludes backup dirs)
+- `Synology/Backup/Viki/truenas/` — NFS media from TrueNAS Cloud Sync (Immich only — audiobookshelf and servarr migrated to Proxmox host NFS)
-**App-level CronJobs** (write to TrueNAS NFS, mirrored to sda weekly):
+**App-level CronJobs** (write to Proxmox host NFS, mirrored to sda weekly):
 - MySQL (daily), PostgreSQL (daily), Vault (weekly), Vaultwarden (6h + integrity), Redis (weekly), etcd (weekly)
 - **Convention**: New proxmox-lvm apps MUST add a backup CronJob writing to `/mnt/main/<app>-backup/`

@@ -11,6 +11,27 @@
 - **Disks**: 1.1TB RAID1 SAS (unused) + 931GB Samsung SSD + 10.7TB RAID1 HDD
 - **Proxmox access**: `ssh root@192.168.1.127`
+## NFS Exports (Proxmox Host)
+The Proxmox host serves NFS for all workloads except Immich (which remains on TrueNAS).
+### HDD NFS
+- **LV**: `pve/nfs-data` (thin LV, 1TB)
+- **Filesystem**: ext4 (chosen over btrfs — btrfs CoW on LVM thin = double-CoW problem)
+- **Mount**: `/srv/nfs` with `noatime,commit=30`
+- **Export**: `/srv/nfs *(rw,no_subtree_check,no_root_squash,insecure,fsid=0)`
+### SSD NFS
+- **LV**: `ssd/nfs-ssd-data` (100GB)
+- **Filesystem**: ext4
+- **Mount**: `/srv/nfs-ssd` with `noatime,commit=30`
+- **Export**: `/srv/nfs-ssd *(rw,no_subtree_check,no_root_squash,insecure,fsid=1)`
+- **Current users**: Ollama (migrated from TrueNAS SSD `/mnt/ssd/ollama`)
+### Notes
+- `insecure` option required: pfSense NATs source ports >1024 when routing between VLANs
+- 21 stacks migrated from TrueNAS, only Immich (8 PVCs) remains on TrueNAS
## Memory Layout (updated 2026-04-01)
### Physical DIMM Slot Map
@@ -79,7 +100,7 @@ Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB
 10.0.10.0/24 - Management: Wizard (10.0.10.10), TrueNAS NFS (10.0.10.15)
 10.0.20.0/24 - Kubernetes: pfSense GW (10.0.20.1), Registry (10.0.20.10),
 k8s-master (10.0.20.100), DNS (10.0.20.101), MetalLB (10.0.20.102-200)
-192.168.1.0/24 - Physical: Proxmox (192.168.1.127)
+192.168.1.0/24 - Physical: Proxmox (192.168.1.127, NFS server for k8s)
```
## Network Bridges
@@ -101,7 +122,7 @@ Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB
 | 204 | k8s-node4 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker |
 | 220 | docker-registry | running | 4 | 4GB | vmbr1:vlan20 | 64G | MAC DE:AD:BE:EF:22:22 (10.0.20.10) |
 | 300 | Windows10 | running | 16 | 8GB | vmbr0 | 100G | Windows VM |
-| 9000 | truenas | running | 16 | 8GB | vmbr1:vlan10 | 32G+7x256G+1T | NFS (10.0.10.15) |
+| 9000 | truenas | running | 16 | 8GB | vmbr1:vlan10 | 32G+7x256G+1T | NFS (10.0.10.15) — Immich only |
**Total VM RAM allocated**: 180 GB of 272 GB (66%) — 92 GB free for future VMs

@@ -3,7 +3,7 @@
 ## Critical Rules (MUST FOLLOW)
 - **ALL changes through Terraform/Terragrunt** — NEVER `kubectl apply/edit/patch/delete` for persistent changes. Read-only kubectl is fine.
 - **NEVER put secrets in plaintext** — use `secrets.sops.json` (SOPS-encrypted) or `terraform.tfvars` (git-crypt, legacy)
-- **NEVER restart NFS on TrueNAS** — causes cluster-wide mount failures across all pods
+- **NEVER restart NFS on Proxmox host or TrueNAS** — causes cluster-wide mount failures across all pods
 - **NEVER commit secrets** — triple-check before every commit
 - **`[ci skip]` in commit messages** when changes were already applied locally
 - **Ask before `git push`** — always confirm with the user first
@@ -59,15 +59,19 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
 - `scripts/cluster_healthcheck.sh` — 25-check cluster health script
 ## Storage
-- **NFS** (`nfs-truenas` StorageClass): For app data. Use the `nfs_volume` module, never inline `nfs {}` blocks.
+- **NFS — Proxmox host** (primary): HDD pool at `/srv/nfs` (ext4 thin LV `pve/nfs-data`, 1TB) and SSD pool at `/srv/nfs-ssd` (ext4 LV `ssd/nfs-ssd-data`, 100GB) on 192.168.1.127. Serves all workloads except Immich.
+  - `nfs-proxmox` StorageClass: Dynamic provisioning (Vault uses this).
+  - `nfs_volume` module: Static PVs for most services. Use this, never inline `nfs {}` blocks.
+- **NFS — TrueNAS** (10.0.10.15): Now **only serves Immich** (8 PVCs). `nfs-truenas` StorageClass retained for Immich only.
 - **proxmox-lvm** (`proxmox-lvm` StorageClass): For databases (PostgreSQL, MySQL). TopoLVM driver.
-- **TrueNAS**: 10.0.10.15. NFS exports managed via `secrets/nfs_exports.sh`.
 - **SQLite on NFS is unreliable** (fsync issues) — always use proxmox-lvm or local disk for databases.
 - **NFS mount options**: Always `soft,timeo=30,retrans=3` to prevent uninterruptible sleep (D state).
-- **NFS export directory must exist** on TrueNAS before Terraform can create the PV.
+- **NFS export directory must exist** on the NFS server before Terraform can create the PV.
+- **NFS `insecure` option**: Required on Proxmox host exports — pfSense NATs source ports >1024 when routing between VLANs, which NFS `secure` mode rejects.
+- **CSI PV volumeAttributes are immutable**: Can't update NFS server in place. Must create new PV/PVC pairs (pattern: append `-host` to PV name).
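The CSI `volume_attributes` cannot be edited in place, so moving a volume means declaring a second PV/PVC pair. The fragment below is a minimal sketch of such a pair. It assumes the hashicorp/kubernetes provider and the upstream csi-driver-nfs driver (`nfs.csi.k8s.io`, which reads `server` and `share` attributes). This document does not name the CSI driver, so treat both as assumptions. The app name, export path, size and the static class name `nfs-static` are hypothetical. In real stacks this pair should come from the `nfs_volume` module, whose inputs are not shown here.

```hcl
# Sketch only: replacement PV/PVC pair after moving a volume to the Proxmox host.
# Driver, app name, path, size and class name are assumptions.
resource "kubernetes_persistent_volume" "example_app_data_host" {
  metadata {
    # Old PV was "example-app-data"; the "-host" suffix follows the migration convention.
    name = "example-app-data-host"
  }

  spec {
    capacity = {
      storage = "10Gi" # hypothetical size
    }
    access_modes                     = ["ReadWriteMany"]
    persistent_volume_reclaim_policy = "Retain"
    # Hypothetical static class; PV and PVC must match so no default class is injected.
    storage_class_name = "nfs-static"
    # Soft-mount options from the storage rules, to avoid D-state hangs.
    mount_options = ["soft", "timeo=30", "retrans=3"]

    persistent_volume_source {
      csi {
        driver        = "nfs.csi.k8s.io"        # assumption: csi-driver-nfs
        volume_handle = "example-app-data-host" # must be unique across the cluster
        volume_attributes = {
          server = var.nfs_server           # Proxmox host per the shared variables
          share  = "/srv/nfs/example-app"   # hypothetical; directory must already exist
        }
      }
    }
  }
}

resource "kubernetes_persistent_volume_claim" "example_app_data_host" {
  metadata {
    name      = "example-app-data-host" # hypothetical
    namespace = "example-app"           # hypothetical
  }

  spec {
    storage_class_name = "nfs-static"
    access_modes       = ["ReadWriteMany"]
    # Bind directly to the new PV instead of letting a provisioner create one.
    volume_name = kubernetes_persistent_volume.example_app_data_host.metadata[0].name

    resources {
      requests = {
        storage = "10Gi"
      }
    }
  }
}
```

After the new claim binds, the workload's volume reference is switched to the `-host` claim. The old TrueNAS-backed pair can then be removed through Terraform.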
## Shared Variables (never hardcode)
-`var.nfs_server` (10.0.10.15), `var.redis_host`, `var.postgresql_host`, `var.mysql_host`, `var.ollama_host`, `var.mail_host`
+`var.nfs_server` (192.168.1.127 — Proxmox host), `var.nfs_server_truenas` (10.0.10.15 — Immich only), `var.redis_host`, `var.postgresql_host`, `var.mysql_host`, `var.ollama_host`, `var.mail_host`
## Tier System
`0-core` | `1-cluster` | `2-gpu` | `3-edge` | `4-aux` — Kyverno auto-generates LimitRange + ResourceQuota per namespace based on tier label.
@@ -96,7 +100,7 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
 - **Fix crashed pods**: Run healthcheck first. Safe to delete evicted/failed pods and CrashLoopBackOff pods with >10 restarts.
 - **OOMKilled**: Check `kubectl describe limitrange tier-defaults -n <ns>`. Increase `resources.limits.memory` in the stack's main.tf.
 - **Add a secret**: `sops set secrets.sops.json '["key"]' '"value"'` then commit.
-- **NFS exports**: Create dir on TrueNAS first, add to `secrets/nfs_directories.txt`, run `secrets/nfs_exports.sh`.
+- **NFS exports**: Create dir on Proxmox host (`/srv/nfs/<dir>` or `/srv/nfs-ssd/<dir>`). For Immich (TrueNAS only): add to `secrets/nfs_directories.txt`, run `secrets/nfs_exports.sh`.
## Detailed Reference
See `.claude/reference/patterns.md` for: NFS volume code examples, iSCSI details, Kyverno governance tables, anti-AI scraping layers, Terragrunt architecture, node rebuild procedure, archived troubleshooting runbooks index.