Commit graph

4 commits

Author SHA1 Message Date
Viktor Barzin
fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:45:33 +00:00
Viktor Barzin
6d224861c4 stem95su: scheduled Drive->site sync CronJob (every 10m)
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.

Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:42:26 +00:00
Viktor Barzin
63182730f9 docs(storage): record Wave-2 NFS migration + harden-proxmox-csi decision (option 1)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Document the 2026-06-05 decision to keep proxmox-csi and harden it (keep PVC
mobility, no hardware) over TopoLVM (pins to node) / Longhorn (2x writes on
single shared HDD). Wave-2 moved 5 non-DB workloads off block to NFS
(tandoor, speedtest, hackmd, changedetection, send), freeing 5 LUN slots.

- storage.md: live PVC counts, Retain-policy/orphan-LV note, Wave-2 history,
  updated cap-relief levers
- topolvm-evaluation.md: stamped NOT ADOPTED with rationale + pointer to the
  decision doc

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 20:15:21 +00:00
Viktor Barzin
82855848d1 plans: TopoLVM migration evaluation (Path 3 for LUN-cap relief)
Decision-support doc, NOT a commitment. Evaluates whether replacing
proxmox-csi with TopoLVM would lift the per-VM 29-PVC ceiling
permanently and at what cost.

Key trade-off documented: TopoLVM PVCs are pinned to the node where
the LV lives (topology.topolvm.cybozu.com/node). proxmox-csi PVCs
migrate between VMs when pods reschedule. The data-locality penalty
matters most for single-replica stateful services (MySQL standalone,
Nextcloud, Vaultwarden, mailserver, claude-memory, ~30 SQLite-backed
apps); replicated services (CNPG PG cluster, Redis-v2, Vault Raft)
absorb it.

Three disk-layout options:
  A. Carve per-VM data disks from sdc — simple, no hardware,
     IO contention unchanged
  B. Hybrid SSD/HDD — SSD-constrained at 675 GiB free
  C. Add a dedicated NVMe — also closes beads code-oflt (IO
     contention), ~£200 hardware investment

Effort estimate: 2.5-3 weeks of focused work for the full migration;
covers TopoLVM install, lvmd config, per-VM disk provisioning,
LUKS plumbing, 5 migration waves (regenerable → huge PVCs),
backup-pipeline rewrite, deprecation.

Recommended next step before committing: small pilot on
k8s-node5/6 with one non-critical PVC to validate the operational
pattern end-to-end.

Related: docs/architecture/storage.md § Per-VM SCSI-LUN cap,
docs/runbooks/scale-k8s-cluster.md (Path 1+2 alternative),
beads code-oflt (IO isolation).
2026-06-01 21:22:05 +00:00