# Storage Architecture
Last updated: 2026-04-13
## Overview
The cluster uses two storage backends: Proxmox CSI for database block storage and Proxmox NFS for application data.
**Block storage (Proxmox CSI):** 65 PVCs for databases and stateful apps (CNPG PostgreSQL, MySQL InnoDB, Redis, Vaultwarden, Prometheus, Nextcloud, Calibre-Web, Forgejo, FreshRSS, ActualBudget, NovelApp, Headscale, Uptime Kuma, etc.) use StorageClass `proxmox-lvm`, which provisions thin LVs directly from the Proxmox host's `local-lvm` storage (sdc, 10.7TB RAID1 HDD thin pool). This eliminates the previous double-CoW (ZFS + LVM-thin) path that caused 56 ZFS checksum errors.
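For orientation, the shape of such a StorageClass is sketched below. This is a minimal sketch, not the deployed manifest: the provisioner name and parameter keys are assumed from the upstream Proxmox CSI plugin defaults, and only the class name, backing storage, and binding mode come from this document.

```yaml
# Sketch of the proxmox-lvm StorageClass (assumed provisioner/parameters,
# not copied from the stacks/proxmox-csi/ Terraform stack).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: proxmox-lvm
provisioner: csi.proxmox.sinextra.dev     # upstream Proxmox CSI plugin driver name (assumption)
parameters:
  storage: local-lvm                      # Proxmox storage ID backing the thin LVs (sdc thin pool)
  csi.storage.k8s.io/fstype: ext4         # filesystem created on each hotplugged disk
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer   # provision/attach only after the pod is scheduled
```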
**NFS storage (Proxmox host):** ~100 NFS shares for media libraries (Immich, audiobookshelf, servarr, navidrome), backup targets (`*-backup/` directories), and app data are served directly from the Proxmox host at 192.168.1.127. Two NFS export roots exist:

- **HDD NFS:** `/srv/nfs` on ext4 LV `pve/nfs-data` (2TB) — bulk media and backup targets
- **SSD NFS:** `/srv/nfs-ssd` on ext4 LV `ssd/nfs-ssd-data` (100GB) — high-performance data (Immich ML)

Both StorageClass `nfs-truenas` (name kept for compatibility) and StorageClass `nfs-proxmox` (identical) point to the Proxmox host. Migrated from TrueNAS (10.0.10.15), which has been fully decommissioned.
**Backup storage (sda):** 1.1TB RAID1 SAS disk, VG `backup`, LV `data` (ext4), mounted at `/mnt/backup` on the PVE host. Dedicated backup disk for weekly PVC file backups, automatic SQLite backups, pfSense backups, and PVE config. NFS data syncs directly to Synology via inotify change tracking (not stored on sda). Independent of live storage (sdc).
**Migration (2026-04-02):** All iSCSI block volumes were migrated from democratic-csi (TrueNAS iSCSI → ZFS → LVM-thin) to Proxmox CSI (direct LVM-thin hotplug). The democratic-csi iSCSI driver has been removed.

**Migration (2026-04):** TrueNAS (10.0.10.15) fully decommissioned. All NFS storage migrated to the Proxmox host (192.168.1.127). ZFS datasets under `/mnt/main/` and `/mnt/ssd/` moved to ext4 LVs at `/srv/nfs/` and `/srv/nfs-ssd/`. Legacy PVs referencing `/mnt/main/` paths still work (bind-mounted or symlinked on the Proxmox host); new PVs use `/srv/nfs/` and `/srv/nfs-ssd/`.
## Architecture Diagram

```mermaid
graph TB
    subgraph Proxmox["Proxmox Host (192.168.1.127)"]
        sdc["sdc: 10.7TB RAID1 HDD<br/>VG pve, LV data (thin pool)<br/>65 proxmox-lvm PVCs"]
        sda["sda: 1.1TB RAID1 SAS<br/>VG backup, LV data (ext4)<br/>/mnt/backup"]
        NFS_HDD["LV pve/nfs-data (2TB ext4)<br/>/srv/nfs<br/>~100 NFS shares<br/>Media + backup targets"]
        NFS_SSD["LV ssd/nfs-ssd-data (100GB ext4)<br/>/srv/nfs-ssd<br/>High-performance data<br/>(Immich ML)"]
        NFS_Exports["NFS Exports<br/>managed by /etc/exports"]
        NFS_HDD --> NFS_Exports
        NFS_SSD --> NFS_Exports
    end

    subgraph K8s["Kubernetes Cluster"]
        CSI_NFS["nfs-csi driver<br/>StorageClass: nfs-truenas / nfs-proxmox<br/>soft,timeo=30,retrans=3"]
        CSI_PVE["Proxmox CSI plugin<br/>StorageClass: proxmox-lvm"]
        NFS_PV["NFS PersistentVolumes<br/>RWX, ~100 volumes"]
        Block_PV["Block PersistentVolumes<br/>RWO, 65 PVCs"]
        Pods["Application Pods"]
        DBPods["Database Pods<br/>PostgreSQL CNPG<br/>MySQL InnoDB"]
    end

    NFS_Exports -->|NFS mount| CSI_NFS
    sdc -->|LVM-thin hotplug| CSI_PVE
    CSI_NFS --> NFS_PV
    CSI_PVE --> Block_PV
    NFS_PV --> Pods
    Block_PV --> DBPods

    style Proxmox fill:#e1f5ff
    style K8s fill:#fff4e1
    style NFS_HDD fill:#c8e6c9
    style NFS_SSD fill:#ffe0b2
```
## Components

| Component | Version/Config | Location | Purpose |
|---|---|---|---|
| Proxmox CSI plugin | Helm chart | Namespace: `proxmox-csi` | Block storage via LVM-thin hotplug |
| StorageClass `proxmox-lvm` | RWO, WaitForFirstConsumer | Cluster-wide | Databases and stateful apps |
| Proxmox NFS (HDD) | LV `pve/nfs-data`, 2TB ext4 | `192.168.1.127:/srv/nfs` | Bulk NFS data for all services |
| Proxmox NFS (SSD) | LV `ssd/nfs-ssd-data`, 100GB ext4 | `192.168.1.127:/srv/nfs-ssd` | High-performance data (Immich ML) |
| nfs-csi | Helm chart | Namespace: `nfs-csi` | NFS CSI driver |
| StorageClass `nfs-truenas` | RWX, soft mount | Cluster-wide | NFS storage (name kept for compatibility, points to Proxmox) |
| StorageClass `nfs-proxmox` | RWX, soft mount | Cluster-wide | NFS storage (identical to `nfs-truenas`) |
| TF module `nfs_volume` | `modules/kubernetes/nfs_volume/` | Infra repo | Static NFS PV/PVC factory |
| TrueNAS (DECOMMISSIONED) | Was VMID 9000 at 10.0.10.15 | — | Replaced by Proxmox NFS (2026-04) |
| democratic-csi (REMOVED) | Was namespace: `iscsi-csi` | — | Replaced by Proxmox CSI (2026-04-02) |
| StorageClass `iscsi-truenas` (REMOVED) | — | Was cluster-wide | Replaced by `proxmox-lvm` |
## How It Works

### NFS Storage Flow
- **Directory creation:** NFS share directories are created under `/srv/nfs/<service>` (HDD) or `/srv/nfs-ssd/<service>` (SSD) on the Proxmox host
- **Export configuration:** `/etc/exports` on the Proxmox host lists per-directory NFS exports
- **Terraform module:** Stacks use `modules/kubernetes/nfs_volume/` to declaratively create static PV + PVC pairs:

  ```hcl
  module "nfs_data" {
    source     = "../../modules/kubernetes/nfs_volume"
    name       = "immich-data"
    namespace  = kubernetes_namespace.immich.metadata[0].name
    nfs_server = var.nfs_server # 192.168.1.127
    nfs_path   = "/srv/nfs/immich"
  }
  ```

- **Pod mount:** Applications reference PVCs in their deployment specs
- **Mount options:** All NFS mounts use `soft,timeo=30,retrans=3` (set in the StorageClass) to prevent indefinite hangs
**Note:** Some legacy PVs still reference `/mnt/main/<service>` paths. These work via compatibility symlinks/bind-mounts on the Proxmox host. New PVs should use `/srv/nfs/<service>` or `/srv/nfs-ssd/<service>`.
**CRITICAL:** Never use inline `nfs {}` blocks in pod specs — they default to `hard,timeo=600`, which causes 10-minute hangs on network issues. Always use the `nfs-truenas` or `nfs-proxmox` StorageClass via PVCs.
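The intended pattern is a PVC bound to one of those classes, which inherits the soft mount options. A minimal sketch (claim name, namespace, and size are hypothetical):

```yaml
# Illustrative PVC using the nfs-truenas StorageClass; names and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media-data
  namespace: media
spec:
  accessModes:
    - ReadWriteMany            # NFS volumes are shared RWX
  storageClassName: nfs-truenas
  resources:
    requests:
      storage: 50Gi
```

In practice most NFS volumes here are static PV + PVC pairs created by the `nfs_volume` Terraform module; the claim above only shows the shape of a PVC that references the class.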
### Block Storage Flow (Proxmox CSI) — NEW
- **PVC creation:** Pod requests a PVC with `storageClass: proxmox-lvm` (see the example PVC after this list)
- **CSI provisioning:** The Proxmox CSI plugin calls the Proxmox API to create a thin LV in the `local-lvm` storage
- **SCSI hotplug:** The thin LV is hotplugged as a VirtIO-SCSI disk directly into the K8s node VM
- **Filesystem:** CSI formats the disk as ext4 and mounts it into the pod
- **Exclusive access:** RWO only — the disk is attached to one VM at a time
- **Topology:** Nodes are labeled with `topology.kubernetes.io/region=pve` and `topology.kubernetes.io/zone=pve` for scheduling
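A minimal PVC that triggers this flow might look like the following. Vaultwarden is one of the services listed as using `proxmox-lvm`; the namespace and size are illustrative assumptions.

```yaml
# Hypothetical PVC for a SQLite/database volume on proxmox-lvm.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vaultwarden-data
  namespace: vaultwarden     # assumed namespace
spec:
  accessModes:
    - ReadWriteOnce          # block volume, attached to a single node VM at a time
  storageClassName: proxmox-lvm
  resources:
    requests:
      storage: 5Gi           # illustrative size
```

With `WaitForFirstConsumer`, the LV is only created and hotplugged once the consuming pod has been scheduled to a node.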
**Key advantage:** Single CoW layer (LVM-thin only). No ZFS, no iSCSI network hop, no double-CoW corruption.
**Proxmox API token:** `csi@pve!csi-token` with the CSI role (VM.Audit, VM.Config.Disk, Datastore.Allocate, Datastore.AllocateSpace, Datastore.Audit). Stored in Vault at `secret/viktor`.
### iSCSI Storage Flow (DEPRECATED — replaced 2026-04-02)
This section is historical. All iSCSI PVCs have been migrated to Proxmox CSI (`proxmox-lvm`), and the democratic-csi iSCSI driver has been removed.
- **Zvol creation:** democratic-csi created ZFS zvols under `main/iscsi/<pvc-name>` via SSH commands
- **Target setup:** The TrueNAS iSCSI service exposed zvols as iSCSI LUNs
- **Initiator connection:** K8s nodes connected via open-iscsi
### SQLite on NFS — Why It Fails
SQLite uses fsync() to guarantee durability. NFS's soft mount + async semantics break this:
- Soft mount returns success even if data is still in client cache
- Network blips during fsync → incomplete writes → corruption
- WAL mode helps but doesn't eliminate the race
**Solution:** Use Proxmox CSI (`proxmox-lvm`) for any SQLite database (Vaultwarden, plotting-book) or local disk (ephemeral).
### Democratic-CSI Sidecar Resources (HISTORICAL — democratic-csi removed)

Democratic-csi has been removed along with the TrueNAS decommissioning (2026-04). This section is kept for historical reference only.
## Configuration

### Key Files
| Path | Purpose |
|---|---|
| `/etc/exports` (on Proxmox host) | NFS export configuration for all service shares |
| `stacks/proxmox-csi/` | Terraform stack for the Proxmox CSI plugin + StorageClass |
| `stacks/nfs-csi/` | NFS CSI driver + StorageClasses (`nfs-truenas`, `nfs-proxmox`) |
| `modules/kubernetes/nfs_volume/` | Reusable module for static NFS PV/PVC creation |
| `config.tfvars` | Variable `nfs_server = "192.168.1.127"` shared by all stacks |
### Vault Paths

| Path | Contents |
|---|---|
| `secret/viktor/truenas_ssh_key` | LEGACY — was the SSH key for the democratic-csi SSH driver (TrueNAS decommissioned) |
| `secret/viktor/truenas_root_password` | LEGACY — was the TrueNAS root password (TrueNAS decommissioned) |
### Terraform Stacks

- `stacks/proxmox-csi/`: Deploys the Proxmox CSI plugin + `proxmox-lvm` StorageClass + node topology labels
- `stacks/nfs-csi/`: Deploys the NFS CSI driver + StorageClasses for Proxmox NFS
- All application stacks reference NFS volumes via `module "nfs_<name>"` calls
- Database PVCs use `storageClass: proxmox-lvm` (CNPG, MySQL Helm VCT, Redis Helm, standalone PVCs) — see the illustrative CNPG example after this list
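For operator-managed databases the class is usually set in the operator CR rather than a hand-written PVC. A rough CNPG sketch (cluster name, namespace, instance count, and size are hypothetical) that puts the Postgres data volumes on `proxmox-lvm`:

```yaml
# Illustrative CNPG Cluster whose data volumes land on proxmox-lvm.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pg          # hypothetical cluster name
  namespace: databases      # hypothetical namespace
spec:
  instances: 2
  storage:
    storageClass: proxmox-lvm
    size: 10Gi
```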
### NFS Export Management

NFS exports are NOT managed by Terraform. To add a new service:

1. SSH to the Proxmox host: `ssh root@192.168.1.127`
2. Create the directory: `mkdir -p /srv/nfs/<service> && chmod 777 /srv/nfs/<service>`
3. Edit `/etc/exports` — add the export entry
4. Reload exports: `exportfs -ra`
5. Verify: `showmount -e 192.168.1.127`
## Decisions & Rationale

### Why NFS for Most Workloads?
- Simplicity: No volume provisioning delays, instant mounts
- RWX support: Multiple pods can share one volume (Nextcloud, Immich)
- Good enough: For SQLite on NFS specifically, we accept the risk for low-value data (logs, caches) but mandate proxmox-lvm for critical DBs
### Why Proxmox CSI for Databases? (formerly iSCSI)
- ACID guarantees: Block device + local filesystem = real fsync
- Performance: No NFS protocol overhead for random I/O, no network hop (LVM-thin hotplug direct to VM)
- Tested: PostgreSQL CNPG and MySQL InnoDB Cluster both run on proxmox-lvm, zero corruption
- Single CoW layer: LVM-thin only, no ZFS double-CoW issues
### Why Soft Mount for NFS?
Hard mounts with default timeo=600 (10 minutes) cause:
- 10-minute pod startup delays if the NFS server is unreachable
- `kubectl delete pod` hangs for 10 minutes
- Kernel task hangs blocking node operations
Soft mount (soft,timeo=30,retrans=3) trades availability for responsiveness:
- Max 90s hang (30s × 3 retries)
- Operations return EIO after timeout → app can handle error
- Acceptable for non-critical data paths
Critical paths: Databases use proxmox-lvm (not NFS), so soft mount never affects data integrity.
## Troubleshooting

### NFS Mount Hangs

**Symptom:** Pod stuck in `ContainerCreating`, `df -h` hangs on NFS mount
**Diagnosis:**

```bash
# On K8s node
mount | grep nfs
showmount -e 192.168.1.127

# Check NFS server (Proxmox host)
ssh root@192.168.1.127
ls -la /srv/nfs/<service>
cat /etc/exports | grep <service>
```
**Fix:**

- Verify the directory exists: `ls /srv/nfs/<service>` (or `/srv/nfs-ssd/<service>`)
- Verify the export: `grep <service> /etc/exports`
- If missing: add it to `/etc/exports` and run `exportfs -ra`
- Restart the NFS server: `systemctl restart nfs-server`
### iSCSI Session Drops (HISTORICAL — iSCSI removed)
iSCSI was replaced by Proxmox CSI (2026-04-02) and TrueNAS has been decommissioned. This section is kept for historical reference only.
### SQLite Corruption on NFS

**Symptom:** `database disk image is malformed`, checksum errors
**Diagnosis:**

```bash
# In pod
sqlite3 /data/db.sqlite "PRAGMA integrity_check;"
```
**Fix:** Migrate to `proxmox-lvm`:
- Create proxmox-lvm PVC in Terraform stack
- Restore from backup to new volume
- Update deployment to use new PVC
- Delete old NFS PVC
### Slow NFS Performance

**Symptom:** High latency on file operations, `iostat` shows NFS wait times
**Diagnosis:**

```bash
# On Proxmox host
ssh root@192.168.1.127
iostat -x 5
lvs --reportformat json pve/nfs-data ssd/nfs-ssd-data

# On K8s node
nfsiostat 5
```
**Optimization:**

- Move hot data to SSD NFS: relocate from `/srv/nfs/<service>` to `/srv/nfs-ssd/<service>` and update the PV path
- Tune the NFS mount: add `rsize=1048576,wsize=1048576` to the StorageClass `mountOptions` (see the sketch below)
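Because the options live on the StorageClass, that is where the tuning lands. A rough sketch of a tuned class follows; the provisioner name and `server`/`share` parameters are assumed from upstream csi-driver-nfs conventions, and only the class name, server, export root, and mount options come from this document.

```yaml
# Sketch of a tuned NFS StorageClass (assumed csi-driver-nfs provisioner/parameters).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-proxmox
provisioner: nfs.csi.k8s.io   # csi-driver-nfs driver name (assumption)
parameters:
  server: 192.168.1.127
  share: /srv/nfs
mountOptions:
  - soft                      # return EIO instead of hanging indefinitely
  - timeo=30
  - retrans=3
  - rsize=1048576             # larger read/write sizes for throughput
  - wsize=1048576
```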
## Related

- Runbooks:
  - `docs/runbooks/restore-postgresql.md`
  - `docs/runbooks/restore-mysql.md`
  - `docs/runbooks/recover-nfs-mount.md`
- Architecture: `docs/architecture/backup-dr.md` (backup strategy using LVM snapshots and Proxmox host scripts)
- Reference: `.claude/reference/service-catalog.md` (which services use NFS vs proxmox-lvm)