docs(storage): record Wave-2 NFS migration + harden-proxmox-csi decision (option 1)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful

Document the 2026-06-05 decision to keep proxmox-csi and harden it (keep PVC
mobility, no hardware) over TopoLVM (pins to node) / Longhorn (2x writes on
single shared HDD). Wave-2 moved 5 non-DB workloads off block to NFS
(tandoor, speedtest, hackmd, changedetection, send), freeing 5 LUN slots.

- storage.md: live PVC counts, Retain-policy/orphan-LV note, Wave-2 history,
  updated cap-relief levers
- topolvm-evaluation.md: stamped NOT ADOPTED with rationale + pointer to the
  decision doc

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-05 20:15:21 +00:00
parent a0b34750ee
commit 63182730f9
2 changed files with 15 additions and 7 deletions

View file

@ -6,9 +6,10 @@ Last updated: 2026-05-24
The cluster uses two storage backends: **Proxmox CSI** for database block storage and **Proxmox NFS** for application data.
**Block storage (Proxmox CSI)**: ~95 PVCs for databases and stateful apps use two StorageClasses provisioned from the same `local-lvm` thin pool (sdc, 10.7TB RAID1 HDD):
- **`proxmox-lvm`**: Unencrypted block storage for non-sensitive workloads (~67 PVCs)
- **`proxmox-lvm-encrypted`**: LUKS2-encrypted block storage for all sensitive data (~28 PVCs) — databases, auth, email, password managers, git repos, health data, etc. Uses Argon2id key derivation with passphrase from Vault KV.
**Block storage (Proxmox CSI)**: ~69 PVCs for databases and stateful apps use two StorageClasses provisioned from the same `local-lvm` thin pool (sdc, 10.7TB RAID1 HDD):
- **`proxmox-lvm`**: Unencrypted block storage for non-sensitive workloads (~26 PVCs)
- **`proxmox-lvm-encrypted`**: LUKS2-encrypted block storage for all sensitive data (~43 PVCs) — databases, auth, email, password managers, git repos, health data, etc. Uses Argon2id key derivation with passphrase from Vault KV.
- **Both StorageClasses use `reclaimPolicy: Retain`.** Deleting a PVC frees the SCSI-LUN slot (the volume is detached) but **retains the underlying LV** for data safety — the PV goes `Released` and the LV (plus its daily `lvm-pvc-snapshot` snapshots) lingers on the thin pool. ~63 such orphan Released PVs exist as of 2026-06-05; batch orphan-LV reclaim is tracked in beads `code-dfjn`. The slot is freed regardless — orphans consume thin-pool space, not LUN slots.
All services storing sensitive data were migrated to `proxmox-lvm-encrypted` on 2026-04-15. This eliminates the previous double-CoW (ZFS + LVM-thin) path and ensures data-at-rest encryption.
@ -24,6 +25,8 @@ Both `StorageClass: nfs-truenas` and `StorageClass: nfs-proxmox` point to the Pr
**History (2026-04-13)**: TrueNAS (VM 9000, 10.0.10.15) fully decommissioned. NFS storage migrated to the Proxmox host (192.168.1.127). ZFS datasets under `/mnt/main/` and `/mnt/ssd/` moved to ext4 LVs at `/srv/nfs/` and `/srv/nfs-ssd/`. Legacy PVs referencing `/mnt/main/` paths still work (bind-mounted or symlinked on the Proxmox host); new PVs use `/srv/nfs/` and `/srv/nfs-ssd/`. TrueNAS VM still exists in stopped state on PVE pending user decision on deletion.
**History (2026-06-05) — Wave 2 NFS migration + strategy decision**: Decided to **keep proxmox-csi and harden it** (option ① — keeps PVC mobility, £0, no new hardware) rather than re-architect to TopoLVM (pins PVCs to a node) or Longhorn (2× write-amplification on the single shared sdc HDD). See `docs/plans/2026-06-05-block-storage-harden-nfs-design.md`. Migrated 5 non-DB, embedded-DB-free workloads off block to NFS to relieve the per-VM LUN cap: **tandoor** (media, PG-backed), **speedtest** (config, MySQL), **hackmd** (image uploads, MySQL — dropped LUKS for low-sensitivity images), **changedetection** (JSON datastore), **send** (upload blobs, Redis). Freed 5 SCSI-LUN slots (4 on the then-hot node6, 21→16). Each followed the scale-0 → busybox mover (`cp -a`) → swap `claim_name` → delete block PVC pattern. **Remaining structural work (the "harden" half) is open in beads `code-dfjn`**: ghost-disk-loop *prevention* (cap disks/node, raise QMP timeout, auto-reconcile) + orphan Released-PV/LV cleanup.
## Architecture Diagram
```mermaid
@ -187,8 +190,13 @@ Levers (in order of leverage-per-effort):
1. **Migrate non-DB workloads off block** to NFS. Pre-flight every candidate
for embedded DBs (SQLite/LevelDB/RocksDB/H2/BoltDB) — they corrupt on NFS
due to lock semantics. Wave 1 (2026-05-26) moved 5 services
(excalidraw, resume, whisper, onlyoffice, f1-stream) and pre-flighted
two more out of scope (plotting-book → SQLite + WAL, stirling-pdf → H2).
(excalidraw, resume, whisper, onlyoffice, f1-stream). Wave 2 (2026-06-05)
moved 5 more (tandoor, speedtest, hackmd, changedetection, send — see
History "2026-06-05"). Pre-flighted-and-rejected (stay on block): plotting-book
(SQLite+WAL), stirling-pdf (H2), navidrome/ntfy/uptime-kuma/vaultwarden/
freshrss/actualbudget/openclaw (SQLite), rybbit (ClickHouse). **This is the
chosen long-term strategy (option ①)** — keep proxmox-csi's mobility, shrink
the block footprint, prevent the ghost loop (`code-dfjn`); not TopoLVM/Longhorn.
2. **Add another K8s worker VM** — each new worker brings up to 29 fresh
slots; the most durable answer if PVC count keeps growing.
3. **Patch+fork `sergelogvinov/proxmox-csi-plugin`** to bump the loop bound

View file

@ -1,8 +1,8 @@
# TopoLVM Migration Evaluation
**Date**: 2026-06-01
**Status**: Evaluation — not yet decided
**Decision**: Pending. Used to understand whether to commit to the migration.
**Status**: ❌ NOT ADOPTED — superseded 2026-06-05.
**Decision**: **Rejected in favour of option ① (harden proxmox-csi + NFS)** — TopoLVM pins PVCs to a node, which loses the cross-node pod mobility Viktor requires (a node going down must let pods reschedule elsewhere), and Option C's hardware spend was declined. Longhorn was also rejected (replication is 2× write-amplification on the single shared sdc HDD, with no DR benefit on a single host). See `2026-06-05-block-storage-harden-nfs-design.md` for the chosen path and full rationale. This doc is retained for its analysis — the LUN-cap mechanics, the three disk-layout options, and the effort estimate remain accurate reference if a second physical host is ever added (which would revive the Longhorn/replication option).
## Problem statement