Viktor Barzin d808694af4

docs(storage): record harden-half shipped (orphan cleanup + ghost-reconcile)

2a orphan cleanup (67 Released PVs + 475 LVs removed, VG pve 997->~410) + 2b
csi-ghost-reconcile CronJob done — ghost-disk doom loop closed by construction,
beads code-dfjn retireable. Cap kept at 28 (lowering would reverse the
2026-05-25 eviction-cascade post-mortem fix). Phase-1: insta2spotify migrated
(noted its 3.26GB image re-pull blip on node reschedule).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-05 21:39:36 +00:00

Storage Architecture

Last updated: 2026-06-05

Overview

The cluster uses two storage backends: Proxmox CSI for database block storage and Proxmox NFS for application data.

Block storage (Proxmox CSI): ~69 PVCs for databases and stateful apps use two StorageClasses provisioned from the same local-lvm thin pool (sdc, 10.7TB RAID1 HDD):

- proxmox-lvm: Unencrypted block storage for non-sensitive workloads (~26 PVCs)
- proxmox-lvm-encrypted: LUKS2-encrypted block storage for all sensitive data (~43 PVCs): databases, auth, email, password managers, git repos, health data, etc. Uses Argon2id key derivation with the passphrase from Vault KV.

Both StorageClasses use reclaimPolicy: Retain. Deleting a PVC detaches the volume and frees its SCSI-LUN slot, but the underlying LV is retained for data safety: the PV goes Released, and the LV (plus its daily lvm-pvc-snapshot snapshots) lingers on the thin pool. Orphans therefore consume thin-pool space, not LUN slots. The 2026-06-05 orphan cleanup (see History below) removed 67 such Released PVs together with their LVs and snapshots; the reclaim was tracked in beads code-dfjn.
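Because both classes use Retain, every deleted PVC leaves a Released PV behind. The following is a minimal sketch, not taken from the repository, of a filter that picks those PVs out of kubectl output. The function name released_proxmox_pvs is hypothetical, and it assumes the three-column layout produced by the custom-columns invocation shown in the comment.

```shell
# Print the names of Released PVs that belong to a proxmox-lvm* StorageClass.
# Expects three whitespace-separated columns on stdin: NAME PHASE STORAGECLASS,
# for example from:
#   kubectl get pv --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,SC:.spec.storageClassName
released_proxmox_pvs() {
  awk '$2 == "Released" && $3 ~ /^proxmox-lvm/ { print $1 }'
}
```

The filter only lists candidates; whether the backing LV can be deleted still has to be decided per volume.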

All services storing sensitive data were migrated to proxmox-lvm-encrypted on 2026-04-15. This eliminates the previous double-CoW (ZFS + LVM-thin) path and ensures data-at-rest encryption.

NFS storage (Proxmox host): ~100 NFS shares for media libraries (Immich, audiobookshelf, servarr, navidrome), backup targets (*-backup/ directories), and app data are served directly from the Proxmox host at 192.168.1.127. Two NFS export roots exist:

HDD NFS: /srv/nfs on ext4 LV pve/nfs-data (4TB) — bulk media and backup targets
SSD NFS: /srv/nfs-ssd on ext4 LV ssd/nfs-ssd-data (100GB) — high-performance data (Immich ML)

The nfs-truenas and nfs-proxmox StorageClasses both point to the Proxmox host and are functionally identical. The nfs-truenas name is historical: it was retained because the StorageClass name on a bound PV is immutable (48 PVs reference it), so renaming would force mass PV churn across the cluster.

Backup storage (sda): 1.1TB RAID1 SAS disk, VG backup, LV data (ext4), mounted at /mnt/backup on PVE host. Dedicated backup disk for weekly PVC file backups, auto SQLite backups, pfSense backups, and PVE config. NFS data syncs directly to Synology via inotify change tracking (not stored on sda). Independent of live storage (sdc).

History (2026-04-02): iSCSI block volumes migrated from democratic-csi (TrueNAS iSCSI → ZFS → LVM-thin) to Proxmox CSI (direct LVM-thin hotplug). democratic-csi iSCSI driver removed.

History (2026-04-13): TrueNAS (VM 9000, 10.0.10.15) fully decommissioned. NFS storage migrated to the Proxmox host (192.168.1.127). ZFS datasets under /mnt/main/ and /mnt/ssd/ moved to ext4 LVs at /srv/nfs/ and /srv/nfs-ssd/. Legacy PVs referencing /mnt/main/ paths still work (bind-mounted or symlinked on the Proxmox host); new PVs use /srv/nfs/ and /srv/nfs-ssd/. TrueNAS VM still exists in stopped state on PVE pending user decision on deletion.

History (2026-06-05) — Wave 2 NFS migration + strategy decision: Decided to keep proxmox-csi and harden it (option ① — keeps PVC mobility, £0, no new hardware) rather than re-architect to TopoLVM (pins PVCs to a node) or Longhorn (2× write-amplification on the single shared sdc HDD). See docs/plans/2026-06-05-block-storage-harden-nfs-design.md. Migrated 5 non-DB, embedded-DB-free workloads off block to NFS to relieve the per-VM LUN cap: tandoor (media, PG-backed), speedtest (config, MySQL), hackmd (image uploads, MySQL — dropped LUKS for low-sensitivity images), changedetection (JSON datastore), send (upload blobs, Redis). Freed 5 SCSI-LUN slots (4 on the then-hot node6, 21→16). Each followed the scale-0 → busybox mover (cp -a) → swap claim_name → delete block PVC pattern. (Phase-1 follow-on 2026-06-05: insta2spotify also migrated — note its reschedule re-pulled a 3.26 GB image, a ~6 min blip; large-image services incur a pull-delay when a migration moves the pod to a fresh node.)
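The mover step in that pattern is a plain recursive copy between the two mounted volumes. A minimal sketch of what such a mover could run, assuming the old block volume and the new NFS volume are mounted at the two directories passed as arguments. The function name copy_volume and the entry-count check are illustrative and not taken from the repository.

```shell
# Copy everything under $1 into $2, preserving ownership, modes and timestamps,
# then compare entry counts as a coarse sanity check before the old PVC is deleted.
copy_volume() {
  src=$1
  dst=$2
  # "src/." copies the directory contents, including dotfiles, into dst
  cp -a "$src/." "$dst/" || return 1
  src_n=$(cd "$src" && find . | wc -l)
  dst_n=$(cd "$dst" && find . | wc -l)
  [ "$src_n" -eq "$dst_n" ]
}
```

The count comparison only catches gross failures such as an interrupted copy. It assumes the destination starts empty and the workload is already scaled to 0, so nothing writes to the source during the copy.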

The "harden" half is now SHIPPED (2026-06-05):

Orphan cleanup — removed 67 Released proxmox PVs + 475 orphan LVs/snapshots (VG pve 997 → ~410 LVs; thin pool freed). 1 LV left (f127a41c, stuck-open stale qemu fd — harmless, clears on node reboot; do not force dmsetup remove).
Ghost-loop prevention — csi-ghost-reconcile CronJob (stacks/proxmox-csi/ghost-reconcile.tf, every 15 min) compares each worker VM's real scsi disks (Proxmox API, scoped CSI token) against k8s VolumeAttachments and safely detaches ghosts (PUT .../config delete=scsiN); detection mirrors check #47, with a 60 s re-confirm + per-run cap-5. Verified live (66 VAs, 0 ghosts). This closes the doom loop by construction — beads code-dfjn can be retired.
Cap deliberately kept at 28 (NOT lowered to 24): the labeler value (stacks/proxmox-csi/.../main.tf node_labels) was raised 24→28 per the 2026-05-25 eviction-cascade post-mortem; lowering it would reverse that fix. With auto-reconcile keeping drift at 0, the 28 cap is safe.
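The core of the ghost detection is a set difference: disks attached to a worker VM that no VolumeAttachment accounts for. A minimal sketch of that comparison follows, under stated assumptions. The real CronJob is not reproduced here, the function name find_ghosts and the two input files are hypothetical, and the first file is assumed to list only CSI-provisioned volumes (the scsi0 OS disk already excluded). The 60 s re-confirm is not modeled; only the per-run cap is.

```shell
# Print volumes attached to a VM that have no matching VolumeAttachment.
# $1: file with volume names found on the VM's scsi slots, one per line
# $2: file with volume names referenced by VolumeAttachments, one per line
# $3: maximum number of ghosts to report per run (defaults to 5)
find_ghosts() {
  attached=$1
  wanted=$2
  cap=${3:-5}
  tmp=$(mktemp) || return 1
  sort -u "$wanted" > "$tmp"
  # comm -23 keeps lines unique to the first input: attached but not wanted
  sort -u "$attached" | comm -23 - "$tmp" | head -n "$cap"
  rm -f "$tmp"
}
```

Capping the output limits how much a single run can detach if one of the inputs is wrong, for example an empty VolumeAttachment list caused by an API error.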

Architecture Diagram

graph TB
    subgraph Proxmox["Proxmox Host (192.168.1.127)"]
        sdc["sdc: 10.7TB RAID1 HDD<br/>VG pve, LV data (thin pool)<br/>~26 proxmox-lvm PVCs<br/>~43 proxmox-lvm-encrypted PVCs"]
        sda["sda: 1.1TB RAID1 SAS<br/>VG backup, LV data (ext4)<br/>/mnt/backup"]
        NFS_HDD["LV pve/nfs-data (4TB ext4)<br/>/srv/nfs<br/>~100 NFS shares<br/>Media + backup targets"]
        NFS_SSD["LV ssd/nfs-ssd-data (100GB ext4)<br/>/srv/nfs-ssd<br/>High-performance data<br/>(Immich ML)"]
        NFS_Exports["NFS Exports<br/>managed by /etc/exports"]
        NFS_HDD --> NFS_Exports
        NFS_SSD --> NFS_Exports
    end

    subgraph K8s["Kubernetes Cluster"]
        CSI_NFS["nfs-csi driver<br/>StorageClass: nfs-proxmox (+ legacy nfs-truenas)<br/>soft,timeo=30,retrans=3"]
        CSI_PVE["Proxmox CSI plugin<br/>StorageClass: proxmox-lvm<br/>StorageClass: proxmox-lvm-encrypted"]

        NFS_PV["NFS PersistentVolumes<br/>RWX, ~100 volumes"]
        Block_PV["Block PersistentVolumes<br/>RWO, ~26 PVCs (unencrypted)"]
        Enc_PV["Encrypted Block PVs<br/>RWO, ~43 PVCs (LUKS2)"]

        Pods["Application Pods"]
        DBPods["Database Pods<br/>PostgreSQL CNPG<br/>MySQL InnoDB"]
    end

    NFS_Exports -->|NFS mount| CSI_NFS
    sdc -->|LVM-thin hotplug| CSI_PVE

    CSI_NFS --> NFS_PV
    CSI_PVE --> Block_PV
    CSI_PVE --> Enc_PV

    NFS_PV --> Pods
    Block_PV --> Pods
    Enc_PV --> DBPods

    style Proxmox fill:#e1f5ff
    style K8s fill:#fff4e1
    style NFS_HDD fill:#c8e6c9
    style NFS_SSD fill:#ffe0b2

Components

| Component | Version/Config | Location | Purpose |
| --- | --- | --- | --- |
| Proxmox CSI plugin | Helm chart | Namespace: proxmox-csi | Block storage via LVM-thin hotplug |
| StorageClass `proxmox-lvm` | RWO, WaitForFirstConsumer | Cluster-wide | Non-sensitive stateful apps |
| StorageClass `proxmox-lvm-encrypted` | RWO, WaitForFirstConsumer, LUKS2 | Cluster-wide | All sensitive data (databases, auth, email, passwords, git) |
| Proxmox NFS (HDD) | LV `pve/nfs-data`, 4TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services |
| Proxmox NFS (SSD) | LV `ssd/nfs-ssd-data`, 100GB ext4 | 192.168.1.127:/srv/nfs-ssd | High-performance data (Immich ML) |
| nfs-csi | Helm chart | Namespace: nfs-csi | NFS CSI driver |
| StorageClass `nfs-proxmox` | RWX, soft mount | Cluster-wide | NFS storage, points to Proxmox host |
| StorageClass `nfs-truenas` | RWX, soft mount | Cluster-wide | Historical name; functionally identical to `nfs-proxmox`, points to the Proxmox host. Kept because SC names are immutable on 48 bound PVs. |
| TF module `nfs_volume` | `modules/kubernetes/nfs_volume/` | Infra repo | Static NFS PV/PVC factory |
| ~~TrueNAS VM~~ | DECOMMISSIONED 2026-04-13 | Was VM 9000 at 10.0.10.15 | Replaced by Proxmox NFS. VM still in stopped state pending deletion. |
| ~~democratic-csi-iscsi~~ | REMOVED | Was namespace: iscsi-csi | Replaced by Proxmox CSI (2026-04-02) |
| ~~StorageClass `iscsi-truenas`~~ | REMOVED | Was cluster-wide | Replaced by `proxmox-lvm` |

How It Works

NFS Storage Flow

Directory creation: NFS share directories are created under /srv/nfs/<service> (HDD) or /srv/nfs-ssd/<service> (SSD) on the Proxmox host
Export configuration: /etc/exports on the Proxmox host lists per-directory NFS exports

Terraform module: Stacks use modules/kubernetes/nfs_volume/ to declaratively create static PV + PVC pairs:

module "nfs_data" {
  source     = "../../modules/kubernetes/nfs_volume"
  name       = "immich-data"
  namespace  = kubernetes_namespace.immich.metadata[0].name
  nfs_server = var.nfs_server  # 192.168.1.127
  nfs_path   = "/srv/nfs/immich"
}

Pod mount: Applications reference PVCs in their deployment specs
Mount options: All NFS mounts use soft,timeo=30,retrans=3 (set in StorageClass) to prevent indefinite hangs

Note: Some legacy PVs still reference /mnt/main/<service> paths. These work via compatibility symlinks/bind-mounts on the Proxmox host. New PVs should use /srv/nfs/<service> or /srv/nfs-ssd/<service>.

CRITICAL: Never use inline nfs {} blocks in pod specs. They default to hard,timeo=600; timeo is in tenths of a second, so each attempt waits 60 s and a hard mount retries indefinitely, which causes multi-minute hangs on network issues. Always use the nfs-proxmox StorageClass (or the legacy nfs-truenas for existing PVs) via PVCs.
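One way to catch a mount that slipped past this rule is to look at the effective options on a node. The sketch below is illustrative only: the function name hard_nfs_mounts is hypothetical, and it assumes input in the /proc/mounts column layout (device, mount point, filesystem type, options).

```shell
# Print the mount points of NFS mounts whose option list lacks "soft".
# Reads /proc/mounts-style lines on stdin, for example:
#   hard_nfs_mounts < /proc/mounts
hard_nfs_mounts() {
  # Wrap the option list in commas so "soft" only matches as a whole option
  awk '$3 ~ /^nfs4?$/ && ("," $4 ",") !~ /,soft,/ { print $2 }'
}
```

Any path it prints is worth tracing back to the pod spec that created the mount.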

Block Storage Flow (Proxmox CSI) — NEW

PVC creation: Pod requests a PVC with storageClass: proxmox-lvm
CSI provisioning: Proxmox CSI plugin calls the Proxmox API to create a thin LV in the local-lvm storage
SCSI hotplug: The thin LV is hotplugged as a VirtIO-SCSI disk directly into the K8s node VM
Filesystem: CSI formats the disk as ext4 and mounts it into the pod
Exclusive access: RWO only — disk is attached to one VM at a time
Topology: Nodes are labeled with topology.kubernetes.io/region=pve and zone=pve for scheduling

Key advantage: Single CoW layer (LVM-thin only). No ZFS, no iSCSI network hop, no double-CoW corruption.

Proxmox API token: csi@pve!csi-token with CSI role (VM.Audit VM.Config.Disk Datastore.Allocate Datastore.AllocateSpace Datastore.Audit). Stored in Vault at secret/viktor.

Encrypted Block Storage Flow (proxmox-lvm-encrypted) — 2026-04-15

PVC creation: Pod requests a PVC with storageClass: proxmox-lvm-encrypted
CSI provisioning: Same as proxmox-lvm — thin LV created in local-lvm
LUKS encryption: CSI node plugin reads the encryption passphrase from K8s Secret proxmox-csi-encryption (namespace kube-system), formats the disk with LUKS2 (Argon2id key derivation), then creates ext4 on top
Transparent mounting: Application sees a normal ext4 filesystem — encryption/decryption is handled by dm-crypt in the kernel
Passphrase management: ExternalSecret syncs passphrase from Vault KV (secret/viktor/proxmox_csi_encryption_passphrase) → K8s Secret. Backup key at /root/.luks-backup-key on PVE host.
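To confirm that a volume really carries the header described above, the output of cryptsetup luksDump can be checked for the LUKS version and key-derivation function. This is a minimal sketch, assuming the usual luksDump layout for LUKS2 headers (a "Version:" line and a "PBKDF:" line per keyslot). The function name check_luks2_argon2id is hypothetical.

```shell
# Succeed only if the luksDump text on stdin reports LUKS version 2 and at
# least one keyslot using argon2id. Example (device path is a placeholder):
#   cryptsetup luksDump /dev/<disk> | check_luks2_argon2id
check_luks2_argon2id() {
  awk '
    $1 == "Version:" && $2 == "2"        { v2 = 1 }
    $1 == "PBKDF:"   && $2 == "argon2id" { kdf = 1 }
    END { exit !(v2 && kdf) }
  '
}
```

The check reads only header metadata, so it needs neither the passphrase nor an unlocked volume.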

Services on encrypted storage (2026-04-15 migration): vaultwarden, dbaas (mysql+pg+pgadmin), mailserver, nextcloud, forgejo, matrix, n8n, affine, health, hackmd, redis, headscale, frigate, meshcentral, technitium, actualbudget, grampsweb, owntracks, wealthfolio, monitoring (alertmanager)

Services migrated later (post-audit catch-up): paperless-ngx (2026-04-25 — sensitive document scans had been left on plain proxmox-lvm by an abandoned attempt; rsync swap cleaned up the orphan and re-did via Terraform). Vault raft cluster (2026-04-25 — all 3 voters migrated from nfs-proxmox to proxmox-lvm-encrypted after the 2026-04-22 raft-leader-deadlock post-mortem found NFS fsync semantics incompatible with raft consensus log; rolled non-leader-first with force-finalize on the pvc-protection finalizer to avoid pod-recreating on the old PVCs).

CSI node plugin memory: Requires 1280Mi limit for LUKS2 Argon2id key derivation (~1GiB). Set via node.plugin.resources in Helm values (not node.resources).

Terraform stack: stacks/proxmox-csi/ manages both StorageClasses, the ExternalSecret, and CSI plugin resources.

iSCSI Storage Flow (DEPRECATED — replaced 2026-04-02)

This section is historical. All iSCSI PVCs have been migrated to Proxmox CSI (proxmox-lvm), and the democratic-csi iSCSI driver has been removed.

~~Zvol creation: democratic-csi creates ZFS zvols under main/iscsi/<pvc-name> via SSH commands~~
~~Target setup: TrueNAS iSCSI service exposes zvols as iSCSI LUNs~~
~~Initiator connection: K8s nodes connect via open-iscsi~~

SQLite on NFS — Why It Fails

SQLite uses fsync() to guarantee durability. NFS's soft mount + async semantics break this:

Soft mount returns success even if data is still in client cache
Network blips during fsync → incomplete writes → corruption
WAL mode helps but doesn't eliminate the race

Solution: Put any SQLite database (Vaultwarden, plotting-book) on Proxmox CSI block storage (proxmox-lvm, or proxmox-lvm-encrypted for sensitive data), or on ephemeral local disk if the data is disposable.

Democratic-CSI Sidecar Resources (HISTORICAL — democratic-csi removed)

Democratic-csi has been removed along with TrueNAS decommissioning (2026-04). This section is kept for historical reference only.

Per-VM SCSI-LUN cap (29 block PVCs per K8s node)

The proxmox-csi-plugin hardcodes a per-VM LUN ceiling at 29. The plugin scans scsi1..scsi29 for a free slot when attaching a PVC (pkg/csi/utils.go:394: for lun = 1; lun < 30; lun++); when the loop exits without a hit, ControllerPublishVolume returns Internal desc = no free lun found. CSINode.allocatable.count is advertised as 28 for every worker — derived from this plugin limit, NOT from Proxmox or QEMU constraints.

What this means in practice:

Each K8s node VM can hold at most 29 block PVCs simultaneously (scsi0 is the OS disk).
Switching scsihw from virtio-scsi-pci to virtio-scsi-single gains per-disk iothread isolation but zero additional capacity — the cap lives in the CSI plugin, not the QEMU device topology. Proxmox itself allows scsi0..scsi30 (31 slots, $MAX_SCSI_DISKS = 31 in /usr/share/perl5/PVE/QemuServer/Drive.pm).
NFS PVCs (nfs.csi.k8s.io) are kernel NFS mounts and do not count against the SCSI cap. Moving non-DB workloads (config-only, static content, regenerable cache, pure upload buckets) to NFS is the simplest relief.
Symptom when the cap is hit: pods stuck ContainerCreating with FailedAttachVolume … no free lun found event, and the proxmox-csi controller hot-loops ControllerPublishVolume against the saturated VM.
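How close a worker is to the cap can be read from its VM config on the Proxmox host. A minimal sketch follows, assuming the "scsiN: ..." line format that qm config prints. The function name free_luns is hypothetical. It counts only scsi1..scsi29, the range the plugin scans.

```shell
# Print the number of free CSI LUN slots (scsi1..scsi29) for one VM.
# Reads `qm config <vmid>` output on stdin, for example:
#   ssh root@192.168.1.127 qm config <vmid> | free_luns
free_luns() {
  awk -F: '
    /^scsi[0-9]+:/ {
      n = substr($1, 5) + 0          # slot number after the "scsi" prefix
      if (n >= 1 && n <= 29) used++  # scsi0 is the OS disk; scsi30 is outside the plugin scan
    }
    END { print 29 - used }
  '
}
```

The scsihw line is ignored because it does not match the scsiN pattern. A result of 0 corresponds to the no free lun found condition described above.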

Levers (in order of leverage-per-effort):

Migrate non-DB workloads off block to NFS. Pre-flight every candidate for embedded DBs (SQLite/LevelDB/RocksDB/H2/BoltDB); they corrupt on NFS due to lock semantics. Wave 1 (2026-05-26) moved 5 services (excalidraw, resume, whisper, onlyoffice, f1-stream). Wave 2 (2026-06-05) moved 5 more (tandoor, speedtest, hackmd, changedetection, send; see History "2026-06-05"). Pre-flighted and rejected (stay on block): plotting-book (SQLite+WAL), stirling-pdf (H2), navidrome/ntfy/uptime-kuma/vaultwarden/freshrss/actualbudget/openclaw (SQLite), rybbit (ClickHouse). This is the chosen long-term strategy (option ①): keep proxmox-csi's mobility, shrink the block footprint, and prevent the ghost loop (code-dfjn) rather than re-architect to TopoLVM or Longhorn.
Add another K8s worker VM — each new worker brings up to 29 fresh slots; the most durable answer if PVC count keeps growing.
Patch+fork sergelogvinov/proxmox-csi-plugin to bump the loop bound from < 30 to < 31 (matches Proxmox MAX_SCSI_DISKS). +1 slot per VM. File upstream PR. Self-maintained image until merged.

Configuration

Key Files

| Path | Purpose |
| --- | --- |
| `/etc/exports` (on Proxmox host) | NFS export configuration for all service shares |
| `stacks/proxmox-csi/` | Terraform stack for Proxmox CSI plugin + StorageClass |
| `stacks/nfs-csi/` | NFS CSI driver + StorageClasses (`nfs-proxmox` + legacy `nfs-truenas`) |
| `modules/kubernetes/nfs_volume/` | Reusable module for static NFS PV/PVC creation |
| `config.tfvars` | Variable `nfs_server = "192.168.1.127"` shared by all stacks |

Vault Paths

| Path | Contents |
| --- | --- |
| `secret/viktor/proxmox_csi_encryption_passphrase` | LUKS2 encryption passphrase for `proxmox-lvm-encrypted` StorageClass |
| ~~`secret/viktor/truenas_ssh_key`~~ | REMOVED: was SSH key for democratic-csi SSH driver (TrueNAS decommissioned 2026-04-13) |
| ~~`secret/viktor/truenas_root_password`~~ | REMOVED: was TrueNAS root password (TrueNAS decommissioned 2026-04-13) |
| ~~`secret/viktor/truenas_api_key`~~ | REMOVED: was TrueNAS API key (TrueNAS decommissioned 2026-04-13) |
| ~~`secret/viktor/truenas_ssh_private_key`~~ | REMOVED: was TrueNAS SSH private key (TrueNAS decommissioned 2026-04-13) |

Terraform Stacks

stacks/proxmox-csi/: Deploys Proxmox CSI plugin + proxmox-lvm and proxmox-lvm-encrypted StorageClasses + ExternalSecret for encryption passphrase + node topology labels
stacks/nfs-csi/: Deploys NFS CSI driver + StorageClasses for Proxmox NFS
All application stacks reference NFS volumes via module "nfs_<name>" calls
Database PVCs (CNPG, MySQL Helm VCT, Redis Helm, standalone PVCs) use the proxmox-lvm-encrypted StorageClass; non-sensitive stateful apps use proxmox-lvm

NFS Export Management

NFS exports are NOT managed by Terraform. To add a new service:

SSH to Proxmox host: ssh root@192.168.1.127
Create the directory: mkdir -p /srv/nfs/<service> && chmod 777 /srv/nfs/<service>
Edit /etc/exports — add the export entry
Reload exports: exportfs -ra
Verify: showmount -e 192.168.1.127
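Since the exports file is edited by hand, an idempotent helper avoids duplicate entries when the steps are repeated. This is a minimal sketch, not part of the repository. The function name ensure_export is hypothetical, and the client spec and export options in the example comment are placeholders, because the source does not state which options the real entries use.

```shell
# Append an export line for directory $2 to exports file $1 unless an entry
# for exactly that directory already exists.
# $3 is the client spec with options. Example with placeholder options:
#   ensure_export /etc/exports /srv/nfs/<service> '<client>(rw,sync,no_subtree_check)'
#   exportfs -ra
ensure_export() {
  file=$1
  dir=$2
  spec=$3
  # Exact match on the first field, so /srv/nfs/app does not match /srv/nfs/app2
  if awk -v d="$dir" '$1 == d { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$dir" "$spec" >> "$file"
}
```

An existing entry is left untouched even if its options differ, so changing options remains a deliberate manual edit.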

Decisions & Rationale

Why NFS for Most Workloads?

Simplicity: No volume provisioning delays, instant mounts
RWX support: Multiple pods can share one volume (Nextcloud, Immich)
Good enough: For SQLite on NFS specifically, we accept the risk for low-value data (logs, caches) but mandate proxmox-lvm for critical DBs

Why Proxmox CSI for Databases? (formerly iSCSI)

ACID guarantees: Block device + local filesystem = real fsync
Performance: No NFS protocol overhead for random I/O, no network hop (LVM-thin hotplug direct to VM)
Tested: PostgreSQL CNPG and MySQL InnoDB Cluster both run on proxmox-lvm, zero corruption
Single CoW layer: LVM-thin only, no ZFS double-CoW issues

Why Soft Mount for NFS?

Hard mounts with the default timeo=600 (tenths of a second, so 60 s per attempt, retried indefinitely) cause:

10-minute pod startup delays if NFS server is unreachable
kubectl delete pod hangs for 10 minutes
Kernel task hangs blocking node operations

Soft mount (soft,timeo=30,retrans=3) trades availability for responsiveness:

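Bounded hang: timeo is in tenths of a second, so timeo=30 is a 3 s initial timeout, and after retrans=3 retransmissions the request fails instead of retrying forever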
Operations return EIO after timeout → app can handle error
Acceptable for non-critical data paths

Critical paths: Databases use proxmox-lvm (not NFS), so soft mount never affects data integrity.

Troubleshooting

NFS Mount Hangs

Symptom: Pod stuck in ContainerCreating, df -h hangs on NFS mount

Diagnosis:

# On K8s node
mount | grep nfs
showmount -e 192.168.1.127

# Check NFS server (Proxmox host)
ssh root@192.168.1.127
ls -la /srv/nfs/<service>
grep <service> /etc/exports

Fix:

Verify directory exists: ls /srv/nfs/<service> (or /srv/nfs-ssd/<service>)
Verify export: grep <service> /etc/exports
If missing: add to /etc/exports and run exportfs -ra
Restart NFS server: systemctl restart nfs-server

iSCSI Session Drops (HISTORICAL — iSCSI removed)

iSCSI was replaced by Proxmox CSI (2026-04-02) and TrueNAS has been decommissioned. This section is kept for historical reference only.

SQLite Corruption on NFS

Symptom: database disk image is malformed, checksum errors

Diagnosis:

# In pod
sqlite3 /data/db.sqlite "PRAGMA integrity_check;"

Fix: Migrate to proxmox-lvm

Create proxmox-lvm PVC in Terraform stack
Restore from backup to new volume
Update deployment to use new PVC
Delete old NFS PVC

Slow NFS Performance

Symptom: High latency on file operations, iostat shows NFS wait times

Diagnosis:

# On Proxmox host
ssh root@192.168.1.127
iostat -x 5
lvs --reportformat json pve/nfs-data ssd/nfs-ssd-data

# On K8s node
nfsiostat 5

Optimization:

Move hot data to SSD NFS: relocate from /srv/nfs/<service> to /srv/nfs-ssd/<service> and update PV path
Tune NFS mount: add rsize=1048576,wsize=1048576 to StorageClass mountOptions

Nextcloud as PVE-NFS browser

Both NFS export roots are mounted into the Nextcloud server pod — /srv/nfs at /mnt/pve-nfs and /srv/nfs-ssd at /mnt/pve-nfs-ssd — via standard NFS PVs (nfs_volume module). No host-level Unix user/group setup; Nextcloud is the sole household-facing surface.

ACL model — two patterns:

Root browser mounts (PVE NFS Pool, PVE NFS-SSD Pool): scoped to NC group admin. Used by Viktor for ad-hoc browsing of any cluster NFS state. Other users never see these mounts.
Per-archive mounts (e.g. /anca-elements → /mnt/pve-nfs/anca-elements): one NC External mount per archive, applicable_users set to the archive owners. Users see only the mounts assigned to them. Write/delete access is implicit at the OS level (NC pod writes via no_root_squash); deny semantics come from mount visibility — if the mount is not in your list, you cannot reach the path.

Why mount-level ACL, not Files Access Control: NC 30/31's workflow engine check classes are FileName (basename), FileMimeType, FileSize, FileSystemTags, and UserGroupMembership. There is no FilePath and no UserId check class. Per-(directory, user) rules are not expressible via FAC. Mount-level ACL via occ files_external:applicable is the supported primitive and maps cleanly onto the model.

Manifest: kubernetes_config_map_v1.nextcloud_external_storage_manifest in stacks/nextcloud/external_storage.tf. Mount entries reference NC usernames (admin, anca, emo — not display names; admin is Viktor). JSON shape:

{
  "rootMounts": [
    { "mountPoint": "/PVE NFS Pool",     "dataDir": "/mnt/pve-nfs",     "applicableGroup": "admin", "enableSharing": true },
    { "mountPoint": "/PVE NFS-SSD Pool", "dataDir": "/mnt/pve-nfs-ssd", "applicableGroup": "admin", "enableSharing": true }
  ],
  "archiveMounts": [
    { "mountPoint": "/anca-elements", "dataDir": "/mnt/pve-nfs/anca-elements", "applicableUsers": ["anca", "admin"], "applicableGroups": [], "enableSharing": false }
  ]
}

A one-shot K8s bootstrap Job applies the manifest idempotently on every tg apply via occ files_external:*, occ files_external:applicable, and occ files_external:option. enableSharing: true lets admin re-share a subfolder of the mount with another NC user/group/public link; default is false (NC's local-backend default).

Adding a new archive: drop the directory under /srv/nfs/<name>/ on PVE, append an archiveMounts entry to the manifest, then scripts/tg apply the nextcloud stack. See docs/runbooks/nextcloud-add-archive.md for the full step-by-step.

Trade-off: a compromised NC admin account has destructive reach over the cluster NFS roots (admin sees the root browser mounts). Accepted — Viktor's account is the single high-value target either way. No lateral movement to databases or block PVCs via this path (those are not NFS).

Backup: Synology retains a frozen copy of each archive (3-2-1 coverage); the existing offsite-sync-backup pipeline provides nightly delta sync from /srv/nfs/<archive> → Synology nfs/.

Related

Runbooks:
- docs/runbooks/restore-postgresql.md
- docs/runbooks/restore-mysql.md
- docs/runbooks/recover-nfs-mount.md
- docs/runbooks/nextcloud-add-archive.md
Architecture: docs/architecture/backup-dr.md (backup strategy using LVM snapshots and Proxmox host scripts)
Reference: .claude/reference/service-catalog.md (which services use NFS vs proxmox-lvm)
