diff --git a/docs/architecture/storage.md b/docs/architecture/storage.md index 19e82fec..486246a6 100644 --- a/docs/architecture/storage.md +++ b/docs/architecture/storage.md @@ -25,7 +25,12 @@ Both `StorageClass: nfs-truenas` and `StorageClass: nfs-proxmox` point to the Pr **History (2026-04-13)**: TrueNAS (VM 9000, 10.0.10.15) fully decommissioned. NFS storage migrated to the Proxmox host (192.168.1.127). ZFS datasets under `/mnt/main/` and `/mnt/ssd/` moved to ext4 LVs at `/srv/nfs/` and `/srv/nfs-ssd/`. Legacy PVs referencing `/mnt/main/` paths still work (bind-mounted or symlinked on the Proxmox host); new PVs use `/srv/nfs/` and `/srv/nfs-ssd/`. TrueNAS VM still exists in stopped state on PVE pending user decision on deletion. -**History (2026-06-05) — Wave 2 NFS migration + strategy decision**: Decided to **keep proxmox-csi and harden it** (option ① — keeps PVC mobility, £0, no new hardware) rather than re-architect to TopoLVM (pins PVCs to a node) or Longhorn (2× write-amplification on the single shared sdc HDD). See `docs/plans/2026-06-05-block-storage-harden-nfs-design.md`. Migrated 5 non-DB, embedded-DB-free workloads off block to NFS to relieve the per-VM LUN cap: **tandoor** (media, PG-backed), **speedtest** (config, MySQL), **hackmd** (image uploads, MySQL — dropped LUKS for low-sensitivity images), **changedetection** (JSON datastore), **send** (upload blobs, Redis). Freed 5 SCSI-LUN slots (4 on the then-hot node6, 21→16). Each followed the scale-0 → busybox mover (`cp -a`) → swap `claim_name` → delete block PVC pattern. **Remaining structural work (the "harden" half) is open in beads `code-dfjn`**: ghost-disk-loop *prevention* (cap disks/node, raise QMP timeout, auto-reconcile) + orphan Released-PV/LV cleanup. +**History (2026-06-05) — Wave 2 NFS migration + strategy decision**: Decided to **keep proxmox-csi and harden it** (option ① — keeps PVC mobility, £0, no new hardware) rather than re-architect to TopoLVM (pins PVCs to a node) or Longhorn (2× write-amplification on the single shared sdc HDD). See `docs/plans/2026-06-05-block-storage-harden-nfs-design.md`. Migrated 5 non-DB, embedded-DB-free workloads off block to NFS to relieve the per-VM LUN cap: **tandoor** (media, PG-backed), **speedtest** (config, MySQL), **hackmd** (image uploads, MySQL — dropped LUKS for low-sensitivity images), **changedetection** (JSON datastore), **send** (upload blobs, Redis). Freed 5 SCSI-LUN slots (4 on the then-hot node6, 21→16). Each followed the scale-0 → busybox mover (`cp -a`) → swap `claim_name` → delete block PVC pattern. (Phase-1 follow-on 2026-06-05: insta2spotify also migrated — note its reschedule re-pulled a 3.26 GB image, a ~6 min blip; large-image services incur a pull-delay when a migration moves the pod to a fresh node.) + +**The "harden" half is now SHIPPED (2026-06-05):** +- **Orphan cleanup** — removed 67 `Released` proxmox PVs + 475 orphan LVs/snapshots (VG `pve` 997 → ~410 LVs; thin pool freed). 1 LV left (`f127a41c`, stuck-open stale qemu fd — harmless, clears on node reboot; do not force `dmsetup remove`). +- **Ghost-loop prevention** — `csi-ghost-reconcile` CronJob (`stacks/proxmox-csi/ghost-reconcile.tf`, every 15 min) compares each worker VM's real scsi disks (Proxmox API, scoped CSI token) against k8s VolumeAttachments and safely detaches ghosts (`PUT .../config delete=scsiN`); detection mirrors check #47, with a 60 s re-confirm + per-run cap-5. Verified live (66 VAs, 0 ghosts). This closes the doom loop by construction — **beads `code-dfjn` can be retired.** +- **Cap deliberately kept at 28** (NOT lowered to 24): the labeler value (`stacks/proxmox-csi/.../main.tf` `node_labels`) was raised 24→28 per the 2026-05-25 eviction-cascade post-mortem; lowering it would reverse that fix. With auto-reconcile keeping drift at 0, the 28 cap is safe. ## Architecture Diagram