Document the 2026-06-05 decision to keep proxmox-csi and harden it (keep PVC
mobility, no hardware) over TopoLVM (pins to node) / Longhorn (2x writes on
single shared HDD). Wave-2 moved 5 non-DB workloads off block to NFS
(tandoor, speedtest, hackmd, changedetection, send), freeing 5 LUN slots.
- storage.md: live PVC counts, Retain-policy/orphan-LV note, Wave-2 history,
updated cap-relief levers
- topolvm-evaluation.md: stamped NOT ADOPTED with rationale + pointer to the
decision doc
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Decision-support doc, NOT a commitment. Evaluates whether replacing
proxmox-csi with TopoLVM would lift the per-VM 29-PVC ceiling
permanently and at what cost.
Key trade-off documented: TopoLVM PVCs are pinned to the node where
the LV lives (topology.topolvm.cybozu.com/node). proxmox-csi PVCs
migrate between VMs when pods reschedule. The data-locality penalty
matters most for single-replica stateful services (MySQL standalone,
Nextcloud, Vaultwarden, mailserver, claude-memory, ~30 SQLite-backed
apps); replicated services (CNPG PG cluster, Redis-v2, Vault Raft)
absorb it.
Three disk-layout options:
A. Carve per-VM data disks from sdc — simple, no hardware,
IO contention unchanged
B. Hybrid SSD/HDD — SSD-constrained at 675 GiB free
C. Add a dedicated NVMe — also closes beads code-oflt (IO
contention), ~£200 hardware investment
Effort estimate: 2.5-3 weeks of focused work for the full migration;
covers TopoLVM install, lvmd config, per-VM disk provisioning,
LUKS plumbing, 5 migration waves (regenerable → huge PVCs),
backup-pipeline rewrite, deprecation.
Recommended next step before committing: small pilot on
k8s-node5/6 with one non-critical PVC to validate the operational
pattern end-to-end.
Related: docs/architecture/storage.md § Per-VM SCSI-LUN cap,
docs/runbooks/scale-k8s-cluster.md (Path 1+2 alternative),
beads code-oflt (IO isolation).