redis: tolerate up to 1KB of AOF tail corruption on load

After the 2026-05-26 unclean node2 reboot, redis-v2-2's incremental AOF was left truncated at offset 84799139. With aof-load-corrupt-tail-max-size at its default of 0, redis refuses to load an AOF with any tail corruption and crashloops. Setting it to 1024 lets redis truncate the corrupted tail and continue loading, which is the right call for a non-source-of-truth cache fronted by sentinel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
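A rough way to picture the threshold described in the commit message is sketched below. This is not Redis's loader; it is a toy model that assumes the setting simply bounds how many trailing bytes may be discarded. The function name, the enum, and the 300-byte tail in the example are all hypothetical.

```python
from enum import Enum


class LoadDecision(Enum):
    LOAD_AS_IS = "load"
    TRUNCATE_AND_LOAD = "truncate"
    REFUSE = "refuse"


def decide_aof_load(file_size: int, last_valid_offset: int, max_tail: int) -> LoadDecision:
    """Toy model: compare the unreadable tail length against the allowed maximum."""
    if not 0 <= last_valid_offset <= file_size:
        raise ValueError("last_valid_offset must lie within the file")
    corrupt_tail = file_size - last_valid_offset
    if corrupt_tail == 0:
        return LoadDecision.LOAD_AS_IS          # nothing to discard
    if corrupt_tail <= max_tail:
        return LoadDecision.TRUNCATE_AND_LOAD   # small enough to drop
    return LoadDecision.REFUSE                  # too much data would be lost


if __name__ == "__main__":
    last_good = 84_799_139        # offset quoted in the commit message
    size = last_good + 300        # hypothetical: 300 unreadable bytes after it
    print(decide_aof_load(size, last_good, 0))
    print(decide_aof_load(size, last_good, 1024))
```

Under this model a limit of 0 refuses any damaged tail, which matches the crashloop described above, while 1024 tolerates a short one.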
cluster-health: emergency-stop Keel + roll back image downgrades + quota raises
2026-05-26 18:48:58 +00:00 · 2026-05-26 18:48:50 +00:00 · 2026-05-26 18:22:01 +00:00 · 2026-05-26 13:26:36 +00:00 · 2026-05-26 11:52:00 +00:00 · 2026-05-26 11:48:55 +00:00
84 changed files with 4011 additions and 2871 deletions
--- a/docs/architecture/backup-dr.md
+++ b/docs/architecture/backup-dr.md
@@ -1,11 +1,28 @@
 # Backup & Disaster Recovery Architecture

-Last updated: 2026-05-24
+Last updated: 2026-05-26

-> **2026-05-24 session — what changed today** (deeper structural review pending — see the open backup-pipeline simplification audit):
+> **2026-05-26 — bypass list pruned to a single path** (follow-up to the
+> 2026-05-24 changes below):
+> - `nfs-mirror` now copies ollama, audiblez, ebook2audiobook, and every
+>   `*-backup` CronJob output onto sda. Previously these went sdc → Synology
+>   DIRECT via Step 2; now they ride leg 1 like everything else.
+> - **Bypass list (leg 2)** is now just `/srv/nfs/immich/` — too big for sda
+>   (1.5 T), no other choice.
+> - **frigate and temp**: dropped from BOTH legs — intentionally not backed up.
+>   frigate is a 14-day camera ring, temp is scratch space. User explicit ask
+>   2026-05-26.
+> - **prometheus, loki, alertmanager**: live-orphan dirs that no longer
+>   exist on `/srv/nfs`. Dropped from the exclude/include lists as no-ops.
+> - `/mnt/backup/anca-elements` (423 G) deleted — canonical copy lives in
+>   Immich since the 2026-05-24 ingest.
+> - Aftermath: sda 87% → 46% used; Synology `/Viki/nfs/` shrinks to
+>   immich-only on next monthly `--delete` pass (or manual cleanup —
+>   see runbook).
+>
+> **2026-05-24 session — what changed**:
 > - **anca-elements archive direction inverted** — Synology `/Backup/Anca/Elements` (770G) deleted; PVE `/srv/nfs/anca-elements` is now source of truth. `anca-elements-sync.sh` retired.
 > - **`anca-elements-mirror.{sh,service,timer}` retired**, subsumed into the new **`nfs-mirror`** weekly job covering all critical NFS subtrees (anca-elements + ~80 services) → sda.
-> - **`offsite-sync-backup` Step 2 filter inverted**: NFS-direct-to-Synology now only carries the sda-bypass paths (immich + frigate + prometheus + `*-backup` + …). Two-leg invariant: `nfs-mirror.sh EXCLUDES` ≡ `offsite-sync-backup Step 2 INCLUDES`. Cross-referenced in both scripts.
 > - **Synology `/Backup/Viki/nfs/<svc>/` orphan cleanup** — 84 dirs renamed in-place (btrfs metadata-only) to `/Backup/Viki/pve-backup/<svc>/` so daily-incremental Step 1 sees them as pre-existing and only ships deltas. No re-transfer.
 > - **Synology snapshot retention 7d → 3d**, all 8 backlog snapshots deleted via `sudo synosharesnapshot delete Backup ...`. Reclaimed ~800G btrfs (98% → 83% used). DSM API was blocked by 2FA; `sudo` over the existing `Administrator` SSH key worked with the Vault-stored password.
 > - **Manifest mechanism extended**: `nfs-mirror` now appends its transferred file list to `/mnt/backup/.changed-files` so daily Step 1 incremental picks it up (was previously only fed by `daily-backup`).
@@ -16,19 +33,19 @@ The homelab runs a 3-2-1 strategy with a **two-leg** path to Synology so every N

 ```
 sdc /srv/nfs/<svc>/   ──nfs-mirror weekly──→  sda /mnt/backup/<svc>/   ──offsite-sync Step 1──→  Synology /Backup/Viki/pve-backup/<svc>/      [leg 1]
-sdc /srv/nfs/<bypass>/  ──inotify (nfs-change-tracker)──→  offsite-sync Step 2  ──→  Synology /Backup/Viki/nfs/<bypass>/                       [leg 2]
+sdc /srv/nfs/immich/  ──inotify (nfs-change-tracker)──→  offsite-sync Step 2  ──→  Synology /Backup/Viki/nfs/immich/                          [leg 2]
 sdc PVCs (LVM thin)   ──daily-backup~snapshot~rsync──→  sda /mnt/backup/{pvc-data,sqlite-backup,pfsense,pve-config}/  ──Step 1──→  Synology /Backup/Viki/pve-backup/
 ```

-The **bypass list** (paths that take leg 2 — too big for sda, transient, or already-a-backup): `immich`, `frigate`, `prometheus`, `loki`, `temp`, `alertmanager`, `ollama`, `audiblez`, `ebook2audiobook`, `*-backup`. Anything NOT in this list rides leg 1 via `nfs-mirror`.
+The **bypass list** (leg 2) is just `/srv/nfs/immich/` — too big for sda (1.5 T). **Not backed up at all**: `/srv/nfs/frigate/` (camera ring buffer), `/srv/nfs/temp/` (scratch). Everything else rides leg 1 via `nfs-mirror`.

 **3-2-1 Breakdown**:
 - **Copy 1** (live): all PVC data + VM disks on Proxmox sdc thin pool (10.7TB RAID1 HDD); all NFS data at `/srv/nfs[-ssd]/`
- **Copy 2** (local backup): sda `/mnt/backup` (1.1TB RAID1 SAS) — at **~90% used** post-2026-05-24 (was ~10% in April)
- **Copy 3** (offsite): Synology NAS at 192.168.1.13 — at **~83% used / 934G free** post-2026-05-24 (was 98% / 121G before today's cleanup)
-  - `Synology/Backup/Viki/pve-backup/` — sda contents (PVC backups + nfs-mirror output: ~90 service dirs)
-  - `Synology/Backup/Viki/nfs/` — bypass-list NFS (immich, frigate, etc.)
-  - `Synology/Backup/Viki/nfs-ssd/` — bypass-list SSD NFS (immich-ML, ollama, llamacpp)
+- **Copy 2** (local backup): sda `/mnt/backup` (1.1TB RAID1 SAS) — **46% used** post-2026-05-26 (was 87% before anca-elements cleanup; bypass-list pruning added ~260 G of *-backup + ollama + audiblez + ebook2audiobook)
+- **Copy 3** (offsite): Synology NAS at 192.168.1.13
+  - `Synology/Backup/Viki/pve-backup/` — sda contents (PVC backups + nfs-mirror output: ~90 service dirs, now also includes ollama/audiblez/ebook2audiobook/*-backup)
+  - `Synology/Backup/Viki/nfs/` — immich only (post-2026-05-26)
+  - `Synology/Backup/Viki/nfs-ssd/` — full SSD NFS (immich-ML, ollama, llamacpp); SSD has no sda-mirror leg, so all three go direct

 ## Architecture Diagram

@@ -346,35 +363,33 @@ Two-step offsite sync:

 #### Step 2: sda-bypass NFS to Synology nfs/ + nfs-ssd/ (inotify change-tracked, FILTERED)

-**Role**: Only carries paths that **bypass sda** — i.e., paths the nfs-mirror script explicitly skips (immich, frigate, prometheus, *-backup, …). Paths that ARE on sda reach Synology via Step 1 and are explicitly excluded from Step 2 to prevent double-syncing. The Step 2 INCLUDE list MUST stay in sync with nfs-mirror's `EXCLUDES` — they are complementary.
+**Role**: Carries the single path that bypasses sda — `/srv/nfs/immich/` (1.5 T, doesn't fit on sda). Plus the full `/srv/nfs-ssd/` (immich-ML + ollama + llamacpp; the SSD has no sda-mirror leg). Everything else under `/srv/nfs/` rides leg 1.

-**Method**: `rsync --files-from /mnt/backup/.nfs-changes.log` with regex filter `^/srv/nfs/(immich|frigate|prometheus|loki|temp|alertmanager|ollama|audiblez|ebook2audiobook|[^/]+-backup)/`. The monthly full sync uses `--include='/<bypass-path>/***' … --exclude='*'` to limit to the same set. `nfs-ssd/` (all of immich-ML / ollama / llamacpp) is entirely bypass-list, so a plain `--delete` still applies.
+**Method**: `rsync --files-from /mnt/backup/.nfs-changes.log` with regex filter `^/srv/nfs/immich/`. The monthly full sync uses `--include='/immich/***' --exclude='*'` for the HDD leg, and a plain `--delete` for the SSD leg.

 **Change tracking**: `nfs-change-tracker.service` (systemd, inotifywait) on PVE host watches `/srv/nfs` and `/srv/nfs-ssd` continuously. Changed file paths are logged to `/mnt/backup/.nfs-changes.log`. Step 2 reads this log and transfers only changed files matching the bypass regex. Incremental syncs complete in seconds.

-**Monthly full sync**: On 1st Sunday of month, runs `rsync --delete` with the bypass-only include list for cleanup.
+**Monthly full sync**: On 1st Sunday of month, runs `rsync --delete` with the immich-only include list. The `--delete` pass also reaps any stale Synology `/Viki/nfs/<dir>/` from the broader pre-2026-05-26 bypass list (ollama, audiblez, ebook2audiobook, *-backup, frigate, prometheus, loki, temp, alertmanager).

 **`/srv/nfs/anca-elements/` history**: had its own dedicated Synology exclusion line earlier in 2026-05-24 because the original Synology source (`/volume1/Backup/Anca/Elements`) was being preserved while we moved canonical to PVE. After the original was deleted (same day), anca-elements joined the broader "NOT bypassing sda" category and is covered by Step 1 via `nfs-mirror`.

-**Layer 3a: NFS local mirror on sda (3-2-1 second copy)**: `/usr/local/bin/nfs-mirror` rsyncs the *critical* subset of `/srv/nfs/` → `/mnt/backup/<service>/` weekly (Mon 04:00). Single rsync invocation, single destination. The skip-list (in `nfs-mirror.sh` `EXCLUDES`) drops paths that don't justify a second local copy:
+**Layer 3a: NFS local mirror on sda (3-2-1 second copy)**: `/usr/local/bin/nfs-mirror` rsyncs `/srv/nfs/` → `/mnt/backup/<service>/` weekly (Mon 04:00). Single rsync invocation, single destination. As of 2026-05-26 the skip-list (in `nfs-mirror.sh` `EXCLUDES`) is intentionally minimal:

- **immich** (1.2T) — too big for sda; Synology offsite is the only 2nd copy by design
- **frigate** (camera recordings, 14d auto-rotate)
- **prometheus**, **loki** (TSDB + logs — rebuildable / policy-driven retention)
- **ollama**, **llamacpp**, **audiblez**, **ebook2audiobook** (re-downloadable / regenerable)
- **temp**, **alertmanager** (transient state)
- **`*-backup`** (CronJob outputs — these ARE backups; backing up the backup is meta)
- **/srv/nfs-ssd** entirely (after the SSD skips above, residual is ~0)
+- **immich** (1.5 T) — too big for sda; ships sdc → Synology direct (leg 2)
+- **frigate** (camera ring buffer) — intentionally NOT backed up
+- **temp** (scratch) — intentionally NOT backed up
+- **anca-elements** (legacy) — now in Immich; `/mnt/backup/anca-elements` deleted 2026-05-26
+- **/srv/nfs-ssd** entirely — its three dirs (immich-ML, ollama, llamacpp) all ship direct to Synology nfs-ssd/

-Everything else under `/srv/nfs/` (anca-elements + ~30 critical service NFS subtrees: mysql, postgresql, nextcloud, health, real-estate-crawler, audiobookshelf, servarr, technitium, openclaw, ...) lands at `/mnt/backup/<svc>/`. Total mirror size ≈ 900 GB (mostly anca-elements at 770G).
+Everything else under `/srv/nfs/` — mysql, postgresql, nextcloud, health, real-estate-crawler, audiobookshelf, servarr, technitium, openclaw, ollama (HDD), audiblez, ebook2audiobook, every `*-backup` CronJob output, … — lands at `/mnt/backup/<svc>/`. Mirror size ≈ 400 GB post-2026-05-26 (was ~900 GB with anca-elements).

 Pushes `nfs_mirror_last_run_timestamp` + `nfs_mirror_last_status` + `nfs_mirror_bytes` to Pushgateway. Alerts: `NfsMirrorStale` (>16d), `NfsMirrorFailing` (status != 0). `rsync -rlt --delete -H --no-perms --no-owner --no-group`; idempotent. Nice=10, IOSchedulingClass=idle (won't compete with foreground IO).

 > History: `anca-elements-mirror.{sh,service,timer}` was a precursor (2026-05-24 morning) dedicated to /srv/nfs/anca-elements only. Subsumed by `nfs-mirror` later the same day to consolidate ad-hoc copy scripts into one.

 **Destination**:
- `Synology/Backup/Viki/nfs/` — mirrors `/srv/nfs`
- `Synology/Backup/Viki/nfs-ssd/` — mirrors `/srv/nfs-ssd`
+- `Synology/Backup/Viki/nfs/` — immich only (post-2026-05-26)
+- `Synology/Backup/Viki/nfs-ssd/` — mirrors `/srv/nfs-ssd` (immich-ML, ollama, llamacpp)

 **Monitoring**: Pushes `offsite_backup_sync_last_success_timestamp` to Pushgateway. Alerts: `OffsiteBackupSyncStale` (>8d), `OffsiteBackupSyncFailing`.
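
The Step 2 selection in the backup-dr.md hunks above can be sketched as follows. It assumes the change log holds one absolute path per line, which the doc does not state. The de-duplication is a convenience of this sketch, and `step2_candidates` is a hypothetical name rather than a function from `offsite-sync-backup`.

```python
import re
from typing import Iterable, List

# Leg 2 filter quoted in the doc; all of /srv/nfs-ssd/ also ships direct.
BYPASS_RE = re.compile(r"^/srv/nfs/immich/")
SSD_PREFIX = "/srv/nfs-ssd/"


def step2_candidates(changed_paths: Iterable[str]) -> List[str]:
    """Return the de-duplicated changed paths that Step 2 would ship."""
    selected: List[str] = []
    seen = set()
    for raw in changed_paths:
        path = raw.strip()
        if not path or path in seen:
            continue
        seen.add(path)
        if BYPASS_RE.match(path) or path.startswith(SSD_PREFIX):
            selected.append(path)
    return selected


if __name__ == "__main__":
    log_lines = [                         # hypothetical change-log content
        "/srv/nfs/immich/upload/a.jpg",
        "/srv/nfs/ollama/models/blob",    # now rides leg 1 via nfs-mirror
        "/srv/nfs-ssd/llamacpp/model.gguf",
        "/srv/nfs/frigate/clip.mp4",      # intentionally not backed up
    ]
    for path in step2_candidates(log_lines):
        print(path)
```

Anything the filter rejects either reaches Synology through leg 1 or, for frigate and temp, is not backed up at all.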

--- a/docs/architecture/compute.md
+++ b/docs/architecture/compute.md
@@ -79,13 +79,33 @@ graph TB

 **Total Cluster Resources**: 48 vCPUs, ~176GB RAM (k8s-node1 48GB + 4 nodes x 32GB)

-> **node1 RAM (2026-05-10)**: bumped from 32 → 48 GiB out-of-band via
-> `qm set 201 --memory 49152` because VMID 201 is intentionally not
-> managed by Terraform yet (telmate/proxmox provider bug with iSCSI
-> PVCs — see `infra/stacks/infra/main.tf` line 442). Driver: GPU
-> multi-tenancy (frigate + ytdlp + llama-swap + immich-ml) was
-> hitting 94% memory-request saturation on the old size. Adopt this
-> VM into TF (`module "k8s-node1"`) once we've migrated to bpg/proxmox.
+> **All Linux VMs are hand-managed in Proxmox, NOT in Terraform**
+> (decided 2026-05-26, commit 44c3770a). The telmate/proxmox v3.0.2
+> provider rewrites every disk slot on update — even ones covered by
+> `lifecycle.ignore_changes` — and it doesn't refresh per-disk
+> `mbps_*_concurrent` fields back from live state. We hit both bugs
+> in production (id=539 iSCSI mangling 2026-04-02, and the 2026-05-26
+> import attempt that corrupted k8s-node2 + k8s-node3 .conf files;
+> recovered via `/mnt/backup/pve-config/etc-pve/nodes/pve/qemu-server/`
+> nightly backups). What stays in TF: the cloud-init templates
+> (`k8s-node-template`, `non-k8s-node-template`,
+> `docker-registry-template` in `stacks/infra/main.tf`) — a fresh VM
+> still clones the right template and runs the same bootstrap.
+>
+> Per-VM I/O caps (defense against sdc saturation by a single noisy
+> guest) are applied by `apply-mbps-caps.{sh,service,timer}` on the
+> PVE host (sources in `infra/scripts/`, install pattern per
+> `architecture/backup-dr.md`). Timer fires `OnBootSec=5min` +
+> `OnCalendar=hourly`, so any drift (config restore, manual `qm
+> set`, fresh clone) self-heals within the hour. Current caps:
+> 102 devvm 60/60, 103 home-assistant 40/40, 200 k8s-master 100/60,
+> 201 k8s-node1 150/120, 202 k8s-node2 150/120, 203 k8s-node3 150/120,
+> 204 k8s-node4 150/120, 220 docker-registry 40/40.
+>
+> Re-adoption into TF (via the `bpg/proxmox` provider, which models
+> dynamic disks correctly) is possible but not scheduled — the
+> cloud-init template above already captures the bootstrap-
+> reproducibility goal.

 ### GPU Passthrough

--- a/docs/architecture/storage.md
+++ b/docs/architecture/storage.md
@@ -158,6 +158,43 @@ SQLite uses `fsync()` to guarantee durability. NFS's soft mount + async semantic

 > Democratic-csi has been removed along with TrueNAS decommissioning (2026-04). This section is kept for historical reference only.

+### Per-VM SCSI-LUN cap (29 block PVCs per K8s node)
+
+**The proxmox-csi-plugin hardcodes a per-VM LUN ceiling at 29.** The plugin
+scans `scsi1..scsi29` for a free slot when attaching a PVC
+(`pkg/csi/utils.go:394`: `for lun = 1; lun < 30; lun++`); when the loop exits
+without a hit, ControllerPublishVolume returns
+`Internal desc = no free lun found`. `CSINode.allocatable.count` is advertised
+as `28` for every worker — derived from this plugin limit, NOT from Proxmox or
+QEMU constraints.
+
+What this means in practice:
+- Each K8s node VM can hold at most 29 block PVCs simultaneously (scsi0 is the
+  OS disk).
+- Switching `scsihw` from `virtio-scsi-pci` to `virtio-scsi-single` gains
+  per-disk iothread isolation but **zero additional capacity** — the cap lives
+  in the CSI plugin, not the QEMU device topology. Proxmox itself allows
+  `scsi0..scsi30` (31 slots, `$MAX_SCSI_DISKS = 31` in
+  `/usr/share/perl5/PVE/QemuServer/Drive.pm`).
+- NFS PVCs (`nfs.csi.k8s.io`) are kernel NFS mounts and do not count against
+  the SCSI cap. Moving non-DB workloads (config-only, static content,
+  regenerable cache, pure upload buckets) to NFS is the simplest relief.
+- Symptom when the cap is hit: pods stuck `ContainerCreating` with
+  `FailedAttachVolume … no free lun found` event, and the proxmox-csi
+  controller hot-loops `ControllerPublishVolume` against the saturated VM.
+
+Levers (in order of leverage-per-effort):
+1. **Migrate non-DB workloads off block** to NFS. Pre-flight every candidate
+   for embedded DBs (SQLite/LevelDB/RocksDB/H2/BoltDB) — they corrupt on NFS
+   due to lock semantics. Wave 1 (2026-05-26) moved 5 services
+   (excalidraw, resume, whisper, onlyoffice, f1-stream) and pre-flighted
+   two more out of scope (plotting-book → SQLite + WAL, stirling-pdf → H2).
+2. **Add another K8s worker VM** — each new worker brings up to 29 fresh
+   slots; the most durable answer if PVC count keeps growing.
+3. **Patch+fork `sergelogvinov/proxmox-csi-plugin`** to bump the loop bound
+   from `< 30` to `< 31` (matches Proxmox `MAX_SCSI_DISKS`). +1 slot per VM.
+   File upstream PR. Self-maintained image until merged.
+
 ## Configuration

 ### Key Files
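
The slot scan described in the storage.md hunk above (the per-VM SCSI-LUN cap section) can be modelled in a few lines. This is a Python paraphrase of the quoted Go loop `for lun = 1; lun < 30; lun++`, not the plugin's code; `find_free_lun` and its `upper_bound` parameter are hypothetical names.

```python
from typing import Optional, Set


def find_free_lun(used: Set[int], upper_bound: int = 30) -> Optional[int]:
    """Return the first free scsiN index in 1..upper_bound-1, or None if full."""
    for lun in range(1, upper_bound):
        if lun not in used:
            return lun
    return None  # corresponds to the "no free lun found" error


if __name__ == "__main__":
    saturated = set(range(1, 30))            # scsi1..scsi29 all attached
    print(find_free_lun(saturated))          # no slot left with the stock bound
    print(find_free_lun(saturated, 31))      # bound raised to match scsi0..scsi30
```

With the bound at 30 a VM holding 29 block PVCs has no slot left; raising it to 31 frees exactly one more, which is why lever 3 is listed last.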
--- a/docs/post-mortems/2026-05-17-gpu-driver-ubuntu2604-mismatch.md
+++ b/docs/post-mortems/2026-05-17-gpu-driver-ubuntu2604-mismatch.md
@@ -130,8 +130,13 @@ to-working state pending an upstream fix or kernel rollback.

 - [x] Pin gpu-operator chart to v25.10.1 in TF
 - [x] Document situation in this post-mortem
- [ ] Roll back k8s-node1 host kernel to 6.8.0-117-generic + apt-mark
-      hold (needs user authorization for node reboot)
+- [x] Roll back k8s-node1 host kernel to 6.8.0-117-generic (done by user;
+      kernel rollback succeeded and NFD now reports
+      `kernel-version.full=6.8.0-117-generic`, `os_release.VERSION_ID=24.04`)
+- [x] Extend driver daemonset startup probe `failureThreshold` from 120 to 300
+      (a 51 min window) in TF `values.yaml`, done 2026-05-25. On this hardware
+      the full install sequence (apt headers + gcc compilation + file copy) takes
+      ~21 min, which exactly exhausted the old 60 s + 120×10 s window.
 - [ ] Add Prometheus alert `GPUNodeNoGPUResource` — fires when a node
      labeled `nvidia.com/gpu.present=true` has `nvidia.com/gpu` capacity
      of 0 for >10m
@@ -143,6 +148,47 @@ to-working state pending an upstream fix or kernel rollback.
      `unattended-upgrades` — `do-release-upgrade` is a separate path
      that should be gated too

+## Follow-up Incident: Driver install hang (2026-05-25)
+
+**Date**: 2026-05-25  
+**Status**: Resolved  
+
+After the kernel rollback to 6.8.0-117-generic succeeded, the driver pod
+(`nvidia-driver-daemonset-529vg`) was still reported as "stuck at
+Installing Linux kernel headers..." with no progress for 15–20 min.
+
+**Actual root causes (two compounding issues)**:
+
+1. **Deadlock between k8s-driver-manager and operator-validator**: The
+   `k8s-driver-manager` init container waits for `nvidia-operator-validator`
+   to shut down before it can begin the install sequence. The validator's
+   `driver-validation` init container was in an infinite retry loop polling
+   `/run/nvidia/validations/.driver-ctr-ready` (which the driver creates when
+   ready). Since the driver never finished, the validator never exited. The
+   validator pod had `deletionTimestamp` set but kubelet on node1 couldn't GC
+   it — the container received SIGTERM but remained in "Terminating" state
+   indefinitely, blocking the new driver from starting.
+   **Fix**: Force-deleted the stuck validator pod
+   (`kubectl delete pod -n nvidia nvidia-operator-validator-sff98 --force --grace-period=0`).
+   This broke the deadlock immediately.
+
+2. **Startup probe timeout**: The full driver install sequence on this hardware
+   (6 vCPUs, 16Gi RAM) takes ~21 minutes:
+   - `apt-get install linux-headers-6.8.0-117-generic`: ~2 min
+   - `gcc/make -j16` kernel module build (nvidia, nvidia-uvm, nvidia-modeset,
+     nvidia-peermem): ~12 min  
+   - nvidia-installer file copy + archive integrity check: ~7 min
+   The default startup probe allows exactly `60 + (120 × 10) = 1260s = 21min`.
+   This caused a SIGKILL (exit 137) at 21 minutes even when the install was
+   progressing normally.
+   **Fix**: Patched `driver.startupProbe.failureThreshold` from 120 → 300
+   in `stacks/nvidia/modules/nvidia/values.yaml` (a 51 min window, ~30 min of headroom).
+
+**Key observation**: "Installing Linux kernel headers..." is NOT a hang — the
+apt install just takes 2+ min and produces no log output during execution. The
+log line appears before apt runs, so it looks frozen. Check `ps auxf` inside
+the container to confirm apt/dpkg are actively running.
+
 ## Lessons

 - **Operator-style charts that auto-detect host OS can silently break
@@ -158,3 +204,9 @@ to-working state pending an upstream fix or kernel rollback.
  24.04 image on a 26.04 host), edit the NFD label — but only as a last
  resort; the chart upgrade made clear the operator will eventually
  reconcile this.
+- **A k8s-driver-manager deadlock on a stuck Terminating validator pod looks
+  identical to an apt hang in the driver pod logs**; `ps auxf` inside the
+  container is the key diagnostic. Force-deleting a stuck Terminating pod with
+  no finalizers is safe and immediately resolves the deadlock.
+- **Driver startup probe must be sized for the full install wall-clock time**,
+  not just apt or just compilation. On slow hardware, 21 min is tight.
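
The `60 + (120 × 10)` arithmetic from the follow-up incident can be checked with a small helper. It assumes the 60 is an initial delay and the 10 a probe period, both in seconds; the post-mortem gives only the numbers, so those labels and the function name are this sketch's own.

```python
def startup_probe_window_s(initial_delay_s: int, period_s: int, failure_threshold: int) -> int:
    """Seconds a container gets before the startup probe gives up."""
    return initial_delay_s + period_s * failure_threshold


if __name__ == "__main__":
    install_s = 21 * 60  # observed full driver install time (~21 min)
    for threshold in (120, 300):
        window = startup_probe_window_s(60, 10, threshold)
        print(f"failureThreshold={threshold}: window={window}s "
              f"({window / 60:.0f} min), headroom={window - install_s}s")
```

Under these assumptions the old threshold leaves zero seconds of slack against a 21 minute install, and the new one leaves about 30 minutes.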
--- a/modules/create-template-vm/cloud_init.yaml
+++ b/modules/create-template-vm/cloud_init.yaml
@@ -1,5 +1,8 @@
-#cloud-config 
-hostname: terraform-vm
+#cloud-config
+# Hostname intentionally NOT set here — cloud-init reads it from
+# Proxmox's auto-generated meta-data (which uses `qm set --name <X>`),
+# so a single shared snippet works for every node.
+manage_etc_hosts: true
 users:
  - name: wizard
    sudo: ALL=(ALL) NOPASSWD:ALL
@@ -46,7 +49,7 @@ apt:
  sources:
  %{if is_k8s_template}
    kubernetes:
-      source: "deb https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /"
+      source: "deb https://pkgs.k8s.io/core:/stable:/v1.34/deb/ /"
      keyid: "DE15B14486CD377B9E876E1A234654DA9A296436"
      filename: kubernetes.list
  %{endif}
@@ -55,6 +58,26 @@ apt:
      keyid: "9DC858229FC7DD38854AE2D88D81803C0EBFCD88"
      filename: docker.list

+%{if is_k8s_template}
+# Setup script is base64-encoded by the module so YAML whitespace
+# handling never touches the heredoc bodies inside it. Replaces an
+# earlier `indent(6, …)` approach that put `[plugins.*]` TOML
+# sections at col 6 inside `cat >> /etc/containerd/config.toml`
+# heredocs — containerd refused to parse the result and the node5 v1
+# boot failed there (2026-05-26). Source: modules/create-template-vm/k8s-node-containerd-setup.sh
+write_files:
+  - path: /usr/local/bin/k8s-node-containerd-setup.sh
+    permissions: '0755'
+    owner: root:root
+    encoding: b64
+    content: ${k8s_node_setup_script_b64}
+  - path: /usr/local/bin/k8s-node-post-join-tune.sh
+    permissions: '0755'
+    owner: root:root
+    encoding: b64
+    content: ${k8s_node_post_join_script_b64}
+%{endif}
+
 runcmd:
  # Enable weekly TRIM/discard to reclaim freed blocks in LVM thin pool
  - systemctl enable --now fstrim.timer
@@ -67,6 +90,20 @@ runcmd:
  - sed -i 's/#Compress=yes/Compress=yes/' /etc/systemd/journald.conf
  - systemctl restart systemd-journald
  %{if is_k8s_template}
+  # systemd-resolved global DNS fallback. Without this, only the
+  # link-level DNS from Proxmox's `qm set --nameserver` (Technitium,
+  # 10.0.20.201) is consulted — and Technitium returns NXDOMAIN for
+  # forgejo.viktorbarzin.me, so kubelet image pulls from the Forgejo
+  # registry break. Public DNS upstream + Technitium fallback matches
+  # the pre-existing manual setup on k8s-node1..4.
+  - mkdir -p /etc/systemd/resolved.conf.d
+  - |
+    cat > /etc/systemd/resolved.conf.d/global-dns.conf <<'EOF'
+    [Resolve]
+    DNS=8.8.8.8 1.1.1.1
+    FallbackDNS=10.0.20.201
+    EOF
+  - systemctl restart systemd-resolved
  # Re-enabled 2026-05-10: unattended-upgrades is back on, but with a tight
  # Allowed-Origins list, a Package-Blacklist for k8s/containerd/runc/calico,
  # and Automatic-Reboot disabled (kured + sentinel-gate handles reboots in a
@@ -107,7 +144,12 @@ runcmd:
  - apt-mark hold containerd containerd.io runc 2>/dev/null || true
  - systemctl stop kubelet
  - containerd config default | sudo tee /etc/containerd/config.toml
-  - ${containerd_config_update_command}
+  # The containerd/kubelet setup is delivered as /usr/local/bin/k8s-node-containerd-setup.sh
+  # via the write_files: block at the top of this file. We run it as a single
+  # bash invocation here so cloud-init only sees a one-line runcmd item.
+  # (Previous inline `- $${containerd_config_update_command}` broke YAML parsing
+  # because the heredoc contains mixed-indent inner shell heredocs.)
+  - bash /usr/local/bin/k8s-node-containerd-setup.sh
  - systemctl restart containerd
  - systemctl enable --now iscsid
  # Harden iSCSI: increase recovery timeout (300s vs 120s default) and enable
@@ -124,17 +166,19 @@ runcmd:
  - systemctl restart iscsid
  # Create /sentinel directory for kured reboot gating (sentinel gate DaemonSet)
  - mkdir -p /sentinel
-  # Create 4Gi swap file for worker node memory pressure relief (NOT for master — etcd is latency-critical)
-  - fallocate -l 4G /swapfile
-  - chmod 600 /swapfile
-  - mkswap /swapfile
-  - swapon /swapfile
-  - echo '/swapfile none swap sw 0 0' >> /etc/fstab
-  - sysctl -w vm.swappiness=10
-  - echo 'vm.swappiness=10' >> /etc/sysctl.d/99-swap.conf
+  # Disable swap — kubelet defaults to failSwapOn=true and won't start otherwise.
+  # (Previously this snippet created a 4G swapfile for "memory pressure relief"
+  # but never set failSwapOn=false / memorySwap.swapBehavior together, so the
+  # join consistently bricked kubelet — observed on node6 boot v3 2026-05-26.)
+  - swapoff -a
+  - sed -i '/ swap / s/^/#/' /etc/fstab
  - ${k8s_join_command}
  - systemctl enable kubelet
  - systemctl start kubelet
+  # Kubelet tuning runs AFTER kubeadm join — that's when
+  # /var/lib/kubelet/config.yaml gets written. Restarts kubelet at the
+  # end to pick up the patched config.
+  - bash /usr/local/bin/k8s-node-post-join-tune.sh
  %{ endif }
  %{ for provision_cmd in provision_cmds ~}
  - ${provision_cmd}
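
Why the cloud_init.yaml change above ships the setup script base64-encoded can be illustrated with a short experiment. `textwrap.indent` is used here as a rough stand-in for the earlier `indent(6, …)` templating, and the `[plugins.example]` section is a made-up placeholder; the real script lives in `k8s-node-containerd-setup.sh`.

```python
import base64
import textwrap

# Hypothetical miniature of a script that writes TOML through a heredoc.
SCRIPT = """\
cat >> /etc/containerd/config.toml <<'EOF'
[plugins.example]
  key = "value"
EOF
"""


def embed_indented(script: str, columns: int) -> str:
    """Indent every line, as a template helper would to fit a YAML block."""
    return textwrap.indent(script, " " * columns)


def embed_b64(script: str) -> str:
    """Encode the script so the embedding layer only sees one opaque token."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


def decode_b64(payload: str) -> str:
    return base64.b64decode(payload).decode("utf-8")


if __name__ == "__main__":
    # The heredoc body picks up the embedding indentation.
    print(repr(embed_indented(SCRIPT, 6).splitlines()[1]))
    # The base64 round trip is byte-exact, so heredoc bodies keep their columns.
    print(decode_b64(embed_b64(SCRIPT)) == SCRIPT)
```

The encoded payload is a single line with no whitespace to re-indent, which is the property the `encoding: b64` entries rely on.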
--- a/modules/create-template-vm/k8s-node-containerd-setup.sh
+++ b/modules/create-template-vm/k8s-node-containerd-setup.sh
@@ -0,0 +1,146 @@
+#!/usr/bin/env bash
+#
+# K8s node containerd + kubelet bootstrap. Runs once via cloud-init runcmd.
+# Embedded into the cloud-init snippet base64-encoded by main.tf so YAML
+# whitespace handling never touches the heredoc bodies — TOML / Python
+# blocks below land in /etc/containerd/config.toml etc. with their leading
+# whitespace intact.
+#
+# Layout (step 4, kubelet tuning, lives in k8s-node-post-join-tune.sh):
+#   1. /etc/containerd/config.toml: config_path for the certs.d mirror dir
+#   2. /etc/containerd/certs.d/*/hosts.toml: per-registry mirror configs
+#   3. /etc/containerd/config.toml: parallel pulls + GC tuning
+#   5. /etc/systemd/logind.conf.d + kubelet.service.d: graceful shutdown
+#   6. (master-only) /etc/kubernetes/manifests: apiserver + controller flags
+set -euo pipefail
+
+# 1. config_path — match BOTH quote styles. containerd v1 writes `""`,
+# containerd v2.x writes `''`. Without the v2 match, hosts.toml mirror
+# config is silently ignored — observed 2026-05-26 on k8s-node4
+# (containerd v2.2.4) and reproduced on k8s-node5 v1 boot.
+sed -i "s|config_path = \"\"|config_path = \"/etc/containerd/certs.d\"|g" /etc/containerd/config.toml
+sed -i "s|config_path = ''|config_path = \"/etc/containerd/certs.d\"|g" /etc/containerd/config.toml
+
+# 2. Per-registry hosts.toml — pull-through caches on docker-registry VM
+# (10.0.20.10) for high-traffic registries, Traefik LB (10.0.20.200) for
+# forgejo. Low-traffic registries (registry.k8s.io, reg.kyverno.io) skip
+# the cache and pull direct because past pull-through cache attempts
+# truncated downloads and broke VPA certgen + Kyverno image pulls.
+
+mkdir -p /etc/containerd/certs.d/docker.io
+cat > /etc/containerd/certs.d/docker.io/hosts.toml <<'DOCKERIO'
+server = "https://registry-1.docker.io"
+
+[host."http://10.0.20.10:5000"]
+  capabilities = ["pull", "resolve"]
+
+[host."https://registry-1.docker.io"]
+  capabilities = ["pull", "resolve"]
+DOCKERIO
+
+mkdir -p /etc/containerd/certs.d/ghcr.io
+cat > /etc/containerd/certs.d/ghcr.io/hosts.toml <<'GHCR'
+server = "https://ghcr.io"
+
+[host."http://10.0.20.10:5010"]
+  capabilities = ["pull", "resolve"]
+
+[host."https://ghcr.io"]
+  capabilities = ["pull", "resolve"]
+GHCR
+
+# Forgejo OCI registry: prefer in-cluster Traefik LB (10.0.20.200) to
+# avoid hairpin NAT. Traefik serves the *.viktorbarzin.me wildcard so
+# SNI verification succeeds. If the mirror is unreachable, fall back to
+# public DNS resolution (needs the global DNS fallback from cloud_init.yaml).
+mkdir -p /etc/containerd/certs.d/forgejo.viktorbarzin.me
+cat > /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml <<'FORGEJO'
+server = "https://forgejo.viktorbarzin.me"
+
+[host."https://10.0.20.200"]
+  capabilities = ["pull", "resolve"]
+FORGEJO
+
+# quay.io + registry.k8s.io: include mirror configs that match node4's
+# layout (no real pull-through cache today, server line is the direct
+# upstream). Keeping these present makes the per-node config uniform and
+# lets us flip a cache on later by editing only the [host."..."] block.
+mkdir -p /etc/containerd/certs.d/quay.io
+cat > /etc/containerd/certs.d/quay.io/hosts.toml <<'QUAY'
+server = "https://quay.io"
+
+[host."http://10.0.20.10:5020"]
+  capabilities = ["pull", "resolve"]
+QUAY
+
+mkdir -p /etc/containerd/certs.d/registry.k8s.io
+cat > /etc/containerd/certs.d/registry.k8s.io/hosts.toml <<'K8SREG'
+server = "https://registry.k8s.io"
+
+[host."http://10.0.20.10:5030"]
+  capabilities = ["pull", "resolve"]
+K8SREG
+
+# 3. containerd tuning: parallel pulls + selective GC overrides.
+# containerd v2's `config default` ALREADY emits `[plugins.'io.containerd.gc.v1.scheduler']`,
+# `[plugins.'io.containerd.runtime.v2.task']`, and `[plugins.'io.containerd.metadata.v1.bolt']`
+# sections — declaring them again fails with `toml: table … already exists`
+# (observed on node6 boot 2026-05-26). Patch values in place instead.
+sed -i 's/.*max_concurrent_downloads = 3/max_concurrent_downloads = 20/g' /etc/containerd/config.toml
+# pause_threshold: 0.5 → 0.02 (run GC more aggressively when images dirty %)
+sed -i "s/^[[:space:]]*pause_threshold = .*/  pause_threshold = 0.02/" /etc/containerd/config.toml
+# schedule_delay: 0s/1ms → 30 min (longer cool-down between GC runs)
+sed -i "s/^[[:space:]]*schedule_delay = .*/  schedule_delay = '1800s'/" /etc/containerd/config.toml
+# exit_timeout: 0s → 5m (more aggressive container cleanup)
+sed -i "s/^[[:space:]]*exit_timeout = .*/  exit_timeout = '5m'/" /etc/containerd/config.toml
+
+# 4. (kubelet tuning intentionally NOT here — /var/lib/kubelet/config.yaml
+# only exists AFTER kubeadm join. That work runs in
+# k8s-node-post-join-tune.sh, invoked as a separate cloud-init runcmd
+# step after the join completes.)
+
+# 5. logind + kubelet systemd unit — total kubelet shutdown 310s, so
+# logind InhibitDelay > that and kubelet TimeoutStopSec > that.
+mkdir -p /etc/systemd/logind.conf.d
+cat > /etc/systemd/logind.conf.d/kubelet-shutdown.conf <<'LOGIND_CONF'
+[Login]
+InhibitDelayMaxSec=480
+LOGIND_CONF
+systemctl restart systemd-logind
+
+mkdir -p /etc/systemd/system/kubelet.service.d
+cat > /etc/systemd/system/kubelet.service.d/20-shutdown.conf <<'KUBELET_SHUTDOWN'
+[Service]
+TimeoutStopSec=420s
+KUBELET_SHUTDOWN
+systemctl daemon-reload
+
+# 6. (master-only) faster pod eviction + attach-detach reconcile.
+if [ -f /etc/kubernetes/manifests/kube-controller-manager.yaml ]; then
+    python3 - <<'CM_PATCH'
+import yaml
+with open('/etc/kubernetes/manifests/kube-controller-manager.yaml') as f:
+    m = yaml.safe_load(f)
+args = m['spec']['containers'][0]['command']
+for flag in ['--attach-detach-reconcile-sync-period=15s']:
+    key = flag.split('=')[0]
+    args = [a for a in args if not a.startswith(key)]
+    args.append(flag)
+m['spec']['containers'][0]['command'] = args
+with open('/etc/kubernetes/manifests/kube-controller-manager.yaml', 'w') as f:
+    yaml.dump(m, f, default_flow_style=False)
+CM_PATCH
+    python3 - <<'AS_PATCH'
+import yaml
+with open('/etc/kubernetes/manifests/kube-apiserver.yaml') as f:
+    m = yaml.safe_load(f)
+args = m['spec']['containers'][0]['command']
+for flag in ['--default-unreachable-toleration-seconds=60', '--default-not-ready-toleration-seconds=60']:
+    key = flag.split('=')[0]
+    args = [a for a in args if not a.startswith(key)]
+    args.append(flag)
+m['spec']['containers'][0]['command'] = args
+with open('/etc/kubernetes/manifests/kube-apiserver.yaml', 'w') as f:
+    yaml.dump(m, f, default_flow_style=False)
+AS_PATCH
+fi
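
The flag patching in step 6 can be sketched as one reusable helper. This is an illustration, not the script's code: `set_flag` is a hypothetical name, and it matches on `key=` rather than on the bare key prefix, which is a slightly stricter variant of the list comprehension used above.

```python
def set_flag(args, flag):
    """Return a copy of args with any existing '--key=...' entry replaced by flag."""
    key = flag.split('=', 1)[0]
    # Drop earlier occurrences of the same flag, keep everything else in order.
    kept = [a for a in args if a != key and not a.startswith(key + '=')]
    kept.append(flag)
    return kept


# Hypothetical starting command list, for illustration only.
command = ['kube-apiserver', '--default-not-ready-toleration-seconds=300']
command = set_flag(command, '--default-not-ready-toleration-seconds=60')
command = set_flag(command, '--default-unreachable-toleration-seconds=60')
print(command)
```

Applying the helper twice with the same flag leaves the list unchanged, which is what makes the patch safe to re-run from cloud-init.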
--- a/modules/create-template-vm/k8s-node-post-join-tune.sh
+++ b/modules/create-template-vm/k8s-node-post-join-tune.sh
@ -0,0 +1,78 @@
+#!/usr/bin/env bash
+#
+# Runs AFTER `kubeadm join` has written /var/lib/kubelet/config.yaml.
+# Patches kubelet config in place (parallel image pulls, eviction
+# thresholds, priority-based shutdown grace, container log rotation).
+# Master-only controller-manager / apiserver flags are NOT set here.
+#
+# Embedded into the cloud-init snippet base64-encoded by main.tf so
+# YAML whitespace doesn't touch the heredoc bodies inside.
+set -euo pipefail
+
+if [ ! -f /var/lib/kubelet/config.yaml ]; then
+    echo "post-join-tune: /var/lib/kubelet/config.yaml not found — was kubeadm join run?" >&2
+    exit 1
+fi
+
+# Parallel image pulls.
+sed -i '/serializeImagePulls:/d' /var/lib/kubelet/config.yaml
+sed -i '/maxParallelImagePulls:/d' /var/lib/kubelet/config.yaml
+printf 'serializeImagePulls: false\nmaxParallelImagePulls: 50\n' >> /var/lib/kubelet/config.yaml
+
+# Memory / disk eviction. Aggressive disk thresholds (15%/20%)
+# prevent the 2026-03-13 containerd image-store corruption that took
+# down k8s-node2.
+sed -i '/systemReserved:/d; /kubeReserved:/d; /evictionHard:/,/^[^ ]/{ /evictionHard:/d; /^  /d }; /evictionSoft:/,/^[^ ]/{ /evictionSoft:/d; /^  /d }; /evictionSoftGracePeriod:/,/^[^ ]/{ /evictionSoftGracePeriod:/d; /^  /d }' /var/lib/kubelet/config.yaml
+
+cat >> /var/lib/kubelet/config.yaml <<'KUBELET_PATCH'
+systemReserved:
+  memory: "512Mi"
+  cpu: "200m"
+kubeReserved:
+  memory: "512Mi"
+  cpu: "200m"
+evictionHard:
+  memory.available: "500Mi"
+  nodefs.available: "15%"
+  imagefs.available: "20%"
+evictionSoft:
+  memory.available: "1Gi"
+  nodefs.available: "20%"
+  imagefs.available: "25%"
+evictionSoftGracePeriod:
+  memory.available: "30s"
+  nodefs.available: "60s"
+  imagefs.available: "30s"
+memorySwap:
+  swapBehavior: "LimitedSwap"
+KUBELET_PATCH
+
+# Container log rotation + priority-based shutdown grace.
+sed -i '/^shutdownGracePeriod:/d; /^shutdownGracePeriodCriticalPods:/d' /var/lib/kubelet/config.yaml
+python3 - <<'KUBELET_FINAL'
+import yaml
+with open('/var/lib/kubelet/config.yaml') as f:
+    cfg = yaml.safe_load(f)
+cfg.pop('shutdownGracePeriod', None)
+cfg.pop('shutdownGracePeriodCriticalPods', None)
+cfg.pop('shutdownGracePeriodByPodPriority', None)
+cfg['containerLogMaxSize'] = '10Mi'
+cfg['containerLogMaxFiles'] = 3
+cfg['shutdownGracePeriodByPodPriority'] = [
+    {'priority': 0,          'shutdownGracePeriodSeconds': 20},
+    {'priority': 200000,     'shutdownGracePeriodSeconds': 20},
+    {'priority': 400000,     'shutdownGracePeriodSeconds': 30},
+    {'priority': 600000,     'shutdownGracePeriodSeconds': 30},
+    {'priority': 800000,     'shutdownGracePeriodSeconds': 90},
+    {'priority': 1000000,    'shutdownGracePeriodSeconds': 30},
+    {'priority': 1200000,    'shutdownGracePeriodSeconds': 30},
+    {'priority': 2000000000, 'shutdownGracePeriodSeconds': 30},
+    {'priority': 2000001000, 'shutdownGracePeriodSeconds': 30},
+]
+with open('/var/lib/kubelet/config.yaml', 'w') as f:
+    yaml.dump(cfg, f, default_flow_style=False)
+KUBELET_FINAL
+
+# Reload kubelet to pick up new config (it's already started by the
+# preceding cloud-init runcmd line — restart, not start).
+systemctl restart kubelet
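
The "total kubelet shutdown 310s" figure quoted in the setup script follows from the priority table above. A minimal check, with the values copied from the two scripts in this change (the variable names are illustrative only):

```python
# shutdownGracePeriodSeconds per priority band, in the order listed above.
grace_by_priority = [20, 20, 30, 30, 90, 30, 30, 30, 30]
kubelet_timeout_stop_sec = 420   # TimeoutStopSec in 20-shutdown.conf
logind_inhibit_delay_max = 480   # InhibitDelayMaxSec in kubelet-shutdown.conf

total_grace = sum(grace_by_priority)
# Both systemd limits must exceed the kubelet's total shutdown budget.
limits_ok = total_grace < min(kubelet_timeout_stop_sec, logind_inhibit_delay_max)
print(total_grace, limits_ok)
```

If a band is added or lengthened later, the same sum shows whether the two systemd limits still leave headroom.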
--- a/modules/create-template-vm/main.tf
+++ b/modules/create-template-vm/main.tf
@ -16,7 +16,7 @@ variable "k8s_join_command" {
 variable "containerd_config_update_command" {
  type        = string
  default     = ""
-  description = "Command to execute to update containerd config.toml; e.g add mirror"
+  description = "DEPRECATED: was inlined into write_files via indent(); the heredoc-TOML interaction broke containerd config parsing on node5 v1 boot 2026-05-26. The k8s setup script is now bundled inside the module at k8s-node-containerd-setup.sh — pass nothing here. Kept to avoid breaking stacks that still reference it; ignored when is_k8s_template=true."
 }
 variable "is_k8s_template" { type = bool }
 variable "ssh_private_key" {
@ -79,23 +79,26 @@ resource "null_resource" "upload_cloud_init" {
  provisioner "file" {
    destination = "/var/lib/vz/snippets/${var.snippet_name}"
    content = templatefile("${path.module}/cloud_init.yaml", {
-      is_k8s_template                  = var.is_k8s_template,
-      authorized_ssh_key               = var.ssh_public_key,
-      passwd                           = var.user_passwd,
-      provision_cmds                   = var.provision_cmds,
-      k8s_join_command                 = var.k8s_join_command,
-      containerd_config_update_command = var.containerd_config_update_command
+      is_k8s_template                = var.is_k8s_template,
+      authorized_ssh_key             = var.ssh_public_key,
+      passwd                         = var.user_passwd,
+      provision_cmds                 = var.provision_cmds,
+      k8s_join_command               = var.k8s_join_command,
+      k8s_node_setup_script_b64      = var.is_k8s_template ? base64encode(file("${path.module}/k8s-node-containerd-setup.sh")) : ""
+      k8s_node_post_join_script_b64  = var.is_k8s_template ? base64encode(file("${path.module}/k8s-node-post-join-tune.sh")) : ""
      }
    )
  }

  # Force recreate when the below changes
  triggers = {
-    file_hash                        = filesha256("${path.module}/cloud_init.yaml")
-    provision_cmds                   = join(", ", var.provision_cmds)
-    is_k8s_template                  = var.is_k8s_template,
-    passwd                           = var.user_passwd,
-    k8s_join_command                 = var.k8s_join_command,
-    containerd_config_update_command = var.containerd_config_update_command
+    file_hash                   = filesha256("${path.module}/cloud_init.yaml")
+    setup_script_hash           = var.is_k8s_template ? filesha256("${path.module}/k8s-node-containerd-setup.sh") : ""
+    post_join_script_hash       = var.is_k8s_template ? filesha256("${path.module}/k8s-node-post-join-tune.sh") : ""
+    provision_cmds              = join(", ", var.provision_cmds)
+    is_k8s_template             = var.is_k8s_template,
+    passwd                      = var.user_passwd,
+    k8s_join_command            = var.k8s_join_command,
+    ssh_public_key              = var.ssh_public_key,
  }
 }
--- a/modules/create-vm/main.tf
+++ b/modules/create-vm/main.tf
@ -135,6 +135,22 @@ variable "hostpci0" {
  default = "" # e.g., "0000:06:00.0" for Tesla T4 passthrough
 }

+# ---------------------------------------------------------------------------
+# Variables — Disk I/O throttling (MB/s; 0 = uncapped)
+# ---------------------------------------------------------------------------
+# Caps any single VM's share of the underlying disk so a runaway workload
+# (e.g. the 2026-05-23/26 alloy IO storm — memory id=2726) cannot wedge the
+# whole Proxmox host's sdc thin pool. Values inferred from PVE RRD p99/max
+# observed in /nodes/pve/qemu/<vmid>/rrddata.
+variable "mbps_rd" {
+  type    = number
+  default = 0
+}
+variable "mbps_wr" {
+  type    = number
+  default = 0
+}
+
 # ---------------------------------------------------------------------------
 # Resource
 # ---------------------------------------------------------------------------
@ -192,9 +208,11 @@ resource "proxmox_vm_qemu" "cloudinit-vm" {
        for_each = var.disk_slot == "scsi0" ? [1] : []
        content {
          disk {
-            storage = "local-lvm"
-            size    = var.vm_disk_size
-            discard = true # Enable TRIM passthrough to LVM thin pool — reduces CoW overhead
+            storage            = "local-lvm"
+            size               = var.vm_disk_size
+            discard            = true # Enable TRIM passthrough to LVM thin pool — reduces CoW overhead
+            mbps_r_concurrent  = var.mbps_rd
+            mbps_wr_concurrent = var.mbps_wr
          }
        }
      }
@ -202,9 +220,11 @@ resource "proxmox_vm_qemu" "cloudinit-vm" {
        for_each = var.disk_slot == "scsi1" ? [1] : []
        content {
          disk {
-            storage = "local-lvm"
-            size    = var.vm_disk_size
-            discard = true
+            storage            = "local-lvm"
+            size               = var.vm_disk_size
+            discard            = true
+            mbps_r_concurrent  = var.mbps_rd
+            mbps_wr_concurrent = var.mbps_wr
          }
        }
      }
@ -234,12 +254,39 @@ resource "proxmox_vm_qemu" "cloudinit-vm" {
  lifecycle {
    prevent_destroy = true
    ignore_changes = [
-      # democratic-csi dynamically attaches/detaches iSCSI disks
+      # proxmox-csi dynamically attaches/detaches PVC disks. K8s workers
+      # have up to ~30 slots in use simultaneously (k8s-node1: scsi1-29 +
+      # unused0-29). The k8s-master only uses scsi0 (boot) so most of
+      # these are no-ops for that VM but harmless.
      disks[0].scsi[0].scsi1,
      disks[0].scsi[0].scsi2,
      disks[0].scsi[0].scsi3,
      disks[0].scsi[0].scsi4,
      disks[0].scsi[0].scsi5,
+      disks[0].scsi[0].scsi6,
+      disks[0].scsi[0].scsi7,
+      disks[0].scsi[0].scsi8,
+      disks[0].scsi[0].scsi9,
+      disks[0].scsi[0].scsi10,
+      disks[0].scsi[0].scsi11,
+      disks[0].scsi[0].scsi12,
+      disks[0].scsi[0].scsi13,
+      disks[0].scsi[0].scsi14,
+      disks[0].scsi[0].scsi15,
+      disks[0].scsi[0].scsi16,
+      disks[0].scsi[0].scsi17,
+      disks[0].scsi[0].scsi18,
+      disks[0].scsi[0].scsi19,
+      disks[0].scsi[0].scsi20,
+      disks[0].scsi[0].scsi21,
+      disks[0].scsi[0].scsi22,
+      disks[0].scsi[0].scsi23,
+      disks[0].scsi[0].scsi24,
+      disks[0].scsi[0].scsi25,
+      disks[0].scsi[0].scsi26,
+      disks[0].scsi[0].scsi27,
+      disks[0].scsi[0].scsi28,
+      disks[0].scsi[0].scsi29,
      # cloud-init config may drift after first boot
      cicustom,
      ciupgrade,
@ -254,6 +301,13 @@ resource "proxmox_vm_qemu" "cloudinit-vm" {
      # Provider defaults that differ from imported state
      define_connection_info,
      full_clone,
+      # scsihw varies per VM (virtio-scsi-pci / virtio-scsi-single / lsi)
+      # and changing it on a running VM is risky — leave whatever's live.
+      scsihw,
+      # qemu_os is a hint to qemu about the guest OS; some live VMs have
+      # "other" (unset originally) and the module's "l26" default would
+      # otherwise force an unnecessary write on apply.
+      qemu_os,
    ]
  }
 }
--- a/scripts/apply-mbps-caps.service
+++ b/scripts/apply-mbps-caps.service
@ -0,0 +1,12 @@
+[Unit]
+Description=Apply per-VM I/O caps via qm set (idempotent)
+Documentation=https://github.com/ViktorBarzin/infra/blob/master/scripts/apply-mbps-caps.sh
+After=pve-cluster.service
+Wants=pve-cluster.service
+
+[Service]
+Type=oneshot
+ExecStart=/usr/local/bin/apply-mbps-caps.sh
+StandardOutput=journal
+StandardError=journal
+SyslogIdentifier=apply-mbps-caps
--- a/scripts/apply-mbps-caps.sh
+++ b/scripts/apply-mbps-caps.sh
@ -0,0 +1,74 @@
+#!/usr/bin/env bash
+# Apply per-VM I/O caps via `qm set` on the PVE host.
+#
+# - Reads each target VM's current boot-disk options.
+# - Appends/normalises `mbps_rd=<N>,mbps_wr=<N>`.
+# - Re-applies via `qm set` (live, no reboot needed).
+# - Idempotent: when a VM's caps already match, the `qm set` call is
+#   skipped entirely, so a re-run with no drift changes nothing.
+# - Continues on per-VM failures so one missing/stopped VM doesn't
+#   skip the rest — designed to be safe under the systemd timer.
+#
+# Backed by `apply-mbps-caps.{service,timer}` (hourly + 5min-after-boot).
+# Why these values: see beads code-9v2j + memory id=2726 (alloy IO storm)
+# + memory id=1575 (VMs intentionally out of TF).
+
+set -uo pipefail  # NOT -e — keep going if a single VM step fails.
+
+# vmid:disk_slot:mbps_rd:mbps_wr  (Linux VMs only — skipping 101 pfsense BSD, 300 Windows)
+TARGETS=(
+  "102:scsi0:60:60"      # devvm
+  "103:sata0:40:40"      # home-assistant
+  "200:scsi0:100:60"     # k8s-master (alloy storm origin — firmest clip)
+  "201:scsi1:150:120"    # k8s-node1 (GPU + many CSI disks; boots from scsi1)
+  "202:scsi0:150:120"    # k8s-node2
+  "203:scsi0:150:120"    # k8s-node3
+  "204:scsi0:150:120"    # k8s-node4
+  "220:scsi0:40:40"      # docker-registry
+)
+
+apply_one() {
+  local spec="$1"
+  local vmid slot rd wr
+  IFS=: read -r vmid slot rd wr <<<"$spec"
+
+  # Skip non-existent VMs cleanly (e.g. node decommissioned, never rebuilt).
+  if ! qm status "$vmid" >/dev/null 2>&1; then
+    echo "vmid $vmid: not present on this host — skipping"
+    return 0
+  fi
+
+  local current cleaned newvalue
+  current=$(qm config "$vmid" | awk -v s="$slot:" '$1==s {sub(/^[^ ]+ /, ""); print; exit}')
+  if [[ -z "$current" ]]; then
+    echo "vmid $vmid: no $slot line in config — skipping"
+    return 0
+  fi
+
+  cleaned=$(echo "$current" | sed -E 's/,mbps_rd=[0-9]+//g; s/,mbps_wr=[0-9]+//g')
+  newvalue="${cleaned},mbps_rd=${rd},mbps_wr=${wr}"
+
+  # Skip qm set when both caps already match (keeps hourly journal noise
+  # low). Compare per option: PVE may store options in a different order.
+  if [[ ",$current," == *",mbps_rd=${rd},"* && ",$current," == *",mbps_wr=${wr},"* ]]; then
+    echo "vmid $vmid: $slot already at mbps_rd=${rd},mbps_wr=${wr} — no-op"
+    return 0
+  fi
+
+  echo "vmid $vmid: updating $slot"
+  echo "  before: $current"
+  echo "  after:  $newvalue"
+  if qm set "$vmid" "--$slot" "$newvalue"; then
+    echo "  ok"
+  else
+    echo "  FAILED: qm set returned non-zero"
+    return 1
+  fi
+}
+
+rc=0
+for spec in "${TARGETS[@]}"; do
+  apply_one "$spec" || rc=1
+done
+
+exit "$rc"
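
The normalise-then-compare step in `apply_one` can be sketched as follows. This is a rough illustration under assumptions: `with_caps` and `caps_match` are hypothetical names, the sample option string is invented, and the point is only that checking each cap on its own does not depend on the order in which options appear in the config line.

```python
def with_caps(options, rd, wr):
    """Return the drive option string with mbps_rd/mbps_wr set to rd/wr."""
    parts = options.split(',')
    volume, rest = parts[0], parts[1:]
    # Strip any existing caps, then append the desired ones at the end.
    kept = [p for p in rest if not p.startswith(('mbps_rd=', 'mbps_wr='))]
    return ','.join([volume] + kept + [f'mbps_rd={rd}', f'mbps_wr={wr}'])


def caps_match(options, rd, wr):
    """True when both caps are already present, regardless of option order."""
    fields = set(options.split(',')[1:])
    return {f'mbps_rd={rd}', f'mbps_wr={wr}'} <= fields


# Invented sample line; real values come from `qm config <vmid>`.
current = 'local-lvm:vm-102-disk-0,discard=on,mbps_rd=60,mbps_wr=60,size=64G'
print(caps_match(current, 60, 60))
print(with_caps(current, 100, 60))
```

A whole-string comparison would report drift for this sample even though both caps are already set, because the caps sit in the middle of the line rather than at the end.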
--- a/scripts/apply-mbps-caps.timer
+++ b/scripts/apply-mbps-caps.timer
@ -0,0 +1,18 @@
+[Unit]
+Description=Re-apply per-VM I/O caps periodically + after PVE boot
+
+[Timer]
+# After every PVE host reboot — caps survive in /etc/pve/qemu-server/<vmid>.conf
+# normally, but a config restore from backup can drop them (see 2026-05-26
+# incident where we restored 202.conf + 203.conf from /mnt/backup/pve-config/).
+OnBootSec=5min
+
+# Hourly during normal operation — catches manual `qm set` drift or fresh
+# VM clones that haven't had caps applied yet.
+OnCalendar=hourly
+
+Persistent=true
+RandomizedDelaySec=2min
+
+[Install]
+WantedBy=timers.target
--- a/scripts/nfs-mirror.sh
+++ b/scripts/nfs-mirror.sh
@ -13,20 +13,21 @@
 # destination layout (anca-elements lives at /mnt/backup/anca-elements/),
 # but now covers every other critical NFS subtree in one pass.
 #
-# SKIP-LIST rationale (paths NOT mirrored — Synology offsite still covers them):
-#   immich       — 1.2T, doesn't fit on sda; Synology only by design
-#   frigate      — 14d camera ring, auto-rotates
-#   prometheus   — TSDB, rebuildable from cluster state
-#   loki         — log retention is a policy choice, not durable data
-#   temp         — scratch
-#   alertmanager — transient state
-#   ollama       — LLM model weights, re-downloadable
-#   audiblez     — re-fetchable from Audible
-#   ebook2audiobook — regenerable from book sources
-#   *-backup     — CronJob output (these ARE backups; backing them up is meta)
+# SKIP-LIST rationale (2026-05-26 simplification — see commit notes):
+#   immich  — 1.5T, doesn't fit on sda; offsite-sync ships it direct to Synology
+#   frigate — camera ring buffer; intentionally NOT backed up anywhere
+#   temp    — scratch; intentionally NOT backed up
 #
-# Note: /srv/nfs-ssd is intentionally NOT mirrored — after skipping immich
-# (47G), ollama (59G), and llamacpp (26G) there's effectively zero residual.
+# Everything else (ollama, audiblez, ebook2audiobook, *-backup, …) now
+# flows sdc → sda (this script) → Synology pve-backup/ via offsite-sync
+# Step 1. Previously they went sdc → Synology DIRECT via Step 2; the
+# bypass list got pruned to just `immich` so we have a single canonical
+# mirror at sda. Prometheus/loki/alertmanager were live-orphan entries
+# that no longer exist on /srv/nfs (cleaned 2026-05-26) — dropped from
+# the exclude list as a no-op.
+#
+# Note: /srv/nfs-ssd is intentionally NOT mirrored — its three dirs
+# (immich, ollama, llamacpp) all go direct to Synology nfs-ssd/.

 set -euo pipefail

@ -57,27 +58,15 @@ EXCLUDES=(
    --exclude='/.lv-pvc-mapping.json'
    --exclude='/.nfs-changes.log'

-    # ---- anca-elements: photos are being ingested into Immich (2026-05-24),
-    # so /srv/nfs/immich/library/ becomes the canonical copy and the separate
-    # anca-elements tree is redundant. Excluded from nfs-mirror going forward.
-    # The historical 771G at /mnt/backup/anca-elements/ stays put until manual
-    # cleanup once Immich ingest completes; offsite-sync Step 1 also excludes
-    # it from the Synology pve-backup/ upload so we don't ship the redundant copy.
+    # ---- anca-elements: now in Immich (canonical), /mnt/backup copy deleted
+    # 2026-05-26. Kept in excludes so nfs-mirror doesn't re-populate from sdc
+    # if /srv/nfs/anca-elements is ever re-attached.
    --exclude='/anca-elements/'

-    # ---- NFS paths: too big / transient / re-fetchable ----
-    --exclude='/immich/'
-    --exclude='/frigate/'
-    --exclude='/prometheus/'
-    --exclude='/loki/'
-    --exclude='/temp/'
-    --exclude='/alertmanager/'
-    --exclude='/ollama/'
-    --exclude='/audiblez/'
-    --exclude='/ebook2audiobook/'
-
-    # ---- *-backup CronJob outputs (don't back up backups) ----
-    --exclude='/*-backup/'
+    # ---- NFS paths intentionally NOT backed up ----
+    --exclude='/immich/'   # 1.5T — ships sdc → Synology direct (Step 2)
+    --exclude='/frigate/'  # ring buffer — no backup anywhere
+    --exclude='/temp/'     # scratch — no backup anywhere

    # ---- Synology / Windows / macOS cruft ----
    --exclude='/@eaDir/'
@ -130,7 +119,7 @@ mountpoint -q /mnt/backup || { log "FATAL: /mnt/backup not mounted"; push_metric
 [ -d "$SRC" ]              || { log "FATAL: source $SRC missing"; push_metrics 1 0; exit 1; }

 log "=== mirror starting: $SRC → $DST ==="
-log "skip: immich, frigate, prometheus, loki, ollama, audiblez, *-backup, temp"
+log "skip: immich (Synology direct), frigate (no backup), temp (no backup), anca-elements"

 # Marker file used to identify files written by this rsync run, so we can append
 # their paths to the offsite-sync manifest. Touch BEFORE rsync; `find -newer` AFTER.
@ -149,7 +138,13 @@ DST_BYTES=$(df -B1 --output=used /mnt/backup | tail -1)
 if [ "$RSYNC_RC" -eq 0 ]; then
    # Capture files that rsync created/modified and feed them to the offsite-sync
    # manifest so daily Step 1 incremental picks them up tomorrow morning.
-    NEW_COUNT=$(find /mnt/backup -newer "$STAMP" -type f \
+    # Use -cnewer (ctime), not -newer (mtime): rsync -t preserves SOURCE mtime
+    # on the dest, so freshly-written files with old source mtime look "older"
+    # than $STAMP and -newer misses them. ctime is set when the inode is written,
+    # regardless of -t, so it correctly identifies what this run created.
+    # (Bug hit 2026-05-26 full bypass-list mirror: 800k files copied, manifest
+    # captured only 2 entries → forced a .force-full-sync to recover.)
+    NEW_COUNT=$(find /mnt/backup -cnewer "$STAMP" -type f \
        ! -path '/mnt/backup/.changed-files' \
        ! -path '/mnt/backup/.changed-files.lock' \
        ! -path '/mnt/backup/.lv-pvc-mapping.json' \
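
The mtime-versus-ctime distinction behind the `-cnewer` change can be reproduced in a few lines. This sketch stands in for rsync with plain file operations (an assumption; no rsync is invoked): it writes a file, restores an old modification time the way `rsync -t` would, and compares both timestamps against a marker file.

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as d:
    stamp = os.path.join(d, 'stamp')
    copied = os.path.join(d, 'copied')

    # Marker file, backdated by 10 s so the comparison is unambiguous.
    open(stamp, 'w').close()
    now = time.time()
    os.utime(stamp, (now - 10, now - 10))

    # "Copy" a file, then restore an old source mtime as rsync -t would.
    with open(copied, 'w') as f:
        f.write('payload')
    year_ago = now - 365 * 24 * 3600
    os.utime(copied, (year_ago, year_ago))

    stamp_mtime = os.stat(stamp).st_mtime
    st = os.stat(copied)
    newer = st.st_mtime > stamp_mtime    # the test find -newer applies
    cnewer = st.st_ctime > stamp_mtime   # the test find -cnewer applies
    print(newer, cnewer)
```

On Linux the mtime test misses the freshly written file while the ctime test catches it, which matches the manifest under-count described in the comment.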
--- a/scripts/offsite-sync-backup.sh
+++ b/scripts/offsite-sync-backup.sh
@ -76,8 +76,8 @@ if [ "${DAY_OF_MONTH}" -le 7 ] || [ -n "${FORCE_FULL}" ]; then
 elif [ -s "${MANIFEST}" ]; then
    MANIFEST_LINES=$(wc -l < "${MANIFEST}")
    log "Incremental sync (${MANIFEST_LINES} files from manifest)..."
-    # /anca-elements is being ingested into Immich (Immich becomes canonical) —
-    # skip the redundant copy in /mnt/backup/anca-elements/ until manual cleanup.
+    # anca-elements: now in Immich (canonical); /mnt/backup copy deleted
+    # 2026-05-26. Exclude retained as a safety belt in case it re-appears.
    rsync -rlt --chmod=Du=rwx,Dgo=rx,Fu=rw,Fog=r --files-from="${MANIFEST}" \
        --exclude='anca-elements/' \
        "${BACKUP_ROOT}/" "${PVE_BACKUP_DEST}/" 2>&1 || STATUS=1
@ -89,64 +89,60 @@ fi
 # STEP 2: NFS → Synology nfs/ + nfs-ssd/ (inotify change-tracked, FILTERED)
 # ============================================================
 #
-# DESIGN: Step 2 only carries paths that BYPASS the sda mirror. Paths that ARE
-# mirrored to sda by nfs-mirror reach Synology via Step 1 (sda → Synology
-# pve-backup/) and must NOT also flow through Step 2 — that would duplicate
-# every byte and double Synology consumption.
+# DESIGN: Step 2 only carries paths that BYPASS the sda mirror. As of
+# 2026-05-26 that's just /srv/nfs/immich/ (1.5T, doesn't fit on sda).
+# Everything else under /srv/nfs/ now flows through sda via nfs-mirror,
+# reaching Synology via Step 1 (sda → pve-backup/). frigate and temp are
+# excluded from both legs — intentionally NOT backed up.
 #
-# The skip-list below MUST stay in sync with EXCLUDES in
-# /usr/local/bin/nfs-mirror (which defines what nfs-mirror does NOT copy to
-# sda). The two are complementary: nfs-mirror EXCLUDES = offsite-sync Step 2
-# INCLUDES. Failing to keep them aligned creates either gaps (data missing
-# from Synology) or duplication (data on Synology via both paths).
-log "--- Step 2: NFS → Synology (skip-list paths only — sda-bypass leg) ---"
+# nfs-ssd is handled separately below: its three dirs (immich, ollama,
+# llamacpp) all go direct to Synology since /srv/nfs-ssd is not mirrored
+# to sda. ollama+llamacpp are small enough (~85G total) that the direct
+# leg is fine and we don't need to extend nfs-mirror to cover the SSD.
+#
+# Keep this aligned with /usr/local/bin/nfs-mirror's EXCLUDES — the
+# excludes there are { immich (this leg), frigate (no backup), temp
+# (no backup), anca-elements (deleted), pvc-data and friends (owned by
+# daily-backup) }. Only the bypass-leg subset matters here: { immich }.
+log "--- Step 2: NFS → Synology (immich-only direct leg + nfs-ssd) ---"

 # Regex matching paths NOT on sda (must reach Synology directly).
-# Top-level dirs under /srv/nfs/ — anchored, no nesting allowed.
-NFS_SDA_BYPASS_RE='^/srv/nfs/(immich|frigate|prometheus|loki|temp|alertmanager|ollama|audiblez|ebook2audiobook|[^/]+-backup)/'
+NFS_SDA_BYPASS_RE='^/srv/nfs/immich/'

 # rsync include/exclude args for the monthly full sync (HDD).
-# Order matters: --include patterns first, --exclude '*' last.
 NFS_FULL_INCLUDES=(
-    --include='/immich/'        --include='/immich/***'
-    --include='/frigate/'       --include='/frigate/***'
-    --include='/prometheus/'    --include='/prometheus/***'
-    --include='/loki/'          --include='/loki/***'
-    --include='/temp/'          --include='/temp/***'
-    --include='/alertmanager/'  --include='/alertmanager/***'
-    --include='/ollama/'        --include='/ollama/***'
-    --include='/audiblez/'      --include='/audiblez/***'
-    --include='/ebook2audiobook/' --include='/ebook2audiobook/***'
-    --include='/*-backup/'      --include='/*-backup/***'
+    --include='/immich/'  --include='/immich/***'
    --exclude='*'
 )

 if [ "${DAY_OF_MONTH}" -le 7 ]; then
    # Monthly: full sync with --delete for cleanup, restricted to bypass-list.
-    log "Monthly full NFS sync (sda-bypass paths only)..."
+    # --delete alone will NOT reap legacy dirs on Synology (frigate, ollama,
+    # audiblez, ebook2audiobook, *-backup, prometheus, loki, temp, alertmanager):
+    # rsync protects excluded paths unless --delete-excluded is also passed.
+    log "Monthly full NFS sync (immich-only; legacy bypass dirs not reaped)..."
    rsync -rlt --delete "${NFS_FULL_INCLUDES[@]}" /srv/nfs/ "${NFS_DEST}/" 2>&1 \
-        && log "  OK: nfs/ full sync (bypass-list)" || { warn "nfs/ full sync failed"; STATUS=1; }
-    # nfs-ssd: every dir under it (immich/ollama/llamacpp) is in the bypass list,
-    # so a plain --delete still applies cleanly.
+        && log "  OK: nfs/ full sync (immich-only)" || { warn "nfs/ full sync failed"; STATUS=1; }
+    # nfs-ssd: full sync of all three dirs (immich, ollama, llamacpp).
    rsync -rlt --delete /srv/nfs-ssd/ "${NFS_SSD_DEST}/" 2>&1 \
        && log "  OK: nfs-ssd/ full sync" || { warn "nfs-ssd/ full sync failed"; STATUS=1; }
    > "${NFS_CHANGE_LOG}"
 elif [ -s "${NFS_CHANGE_LOG}" ]; then
-    # Incremental: only sync changed files in bypass-list paths.
+    # Incremental: only sync changed files matching the bypass leg (immich).
    sort -u "${NFS_CHANGE_LOG}" > /tmp/nfs-changes-deduped

-    # HDD NFS — include only sda-bypass paths.
+    # HDD NFS — include only /srv/nfs/immich/ paths.
    grep -E "${NFS_SDA_BYPASS_RE}" /tmp/nfs-changes-deduped | \
        while IFS= read -r f; do [ -f "$f" ] && echo "${f#/srv/nfs/}"; done \
        > /tmp/sync-nfs.list 2>/dev/null
    NFS_COUNT=$(wc -l < /tmp/sync-nfs.list 2>/dev/null || echo 0)
    if [ "${NFS_COUNT:-0}" -gt 0 ]; then
        rsync -rlt --files-from=/tmp/sync-nfs.list /srv/nfs/ "${NFS_DEST}/" 2>&1 \
-            && log "  OK: nfs/ (${NFS_COUNT} bypass files)" \
+            && log "  OK: nfs/ (${NFS_COUNT} immich files)" \
            || { warn "nfs/ incremental failed"; STATUS=1; }
    fi

-    # SSD NFS — every nfs-ssd path (immich/ollama/llamacpp) is in the bypass list.
+    # SSD NFS — every nfs-ssd path (immich/ollama/llamacpp) ships direct.
    grep '^/srv/nfs-ssd/' /tmp/nfs-changes-deduped | \
        while IFS= read -r f; do [ -f "$f" ] && echo "${f#/srv/nfs-ssd/}"; done \
        > /tmp/sync-nfs-ssd.list 2>/dev/null || true
@ -158,7 +154,7 @@ elif [ -s "${NFS_CHANGE_LOG}" ]; then
    fi

    TOTAL=$(wc -l < /tmp/nfs-changes-deduped)
-    log "  Processed ${TOTAL} change events (${NFS_COUNT} nfs + ${SSD_COUNT} nfs-ssd bypass-list files synced)"
+    log "  Processed ${TOTAL} change events (${NFS_COUNT} nfs/immich + ${SSD_COUNT} nfs-ssd files synced)"
    > "${NFS_CHANGE_LOG}"
    rm -f /tmp/nfs-changes-deduped /tmp/sync-nfs.list /tmp/sync-nfs-ssd.list
 else
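
The incremental bypass filter in Step 2 reduces to a regex match plus a prefix strip. A minimal sketch, assuming the change log holds one absolute path per line; `bypass_relative_paths` is a hypothetical name, the sample paths are invented, and the shell pipeline's `[ -f "$f" ]` existence check is deliberately left out.

```python
import re

# Same pattern as NFS_SDA_BYPASS_RE in the script.
BYPASS_RE = re.compile(r'^/srv/nfs/immich/')


def bypass_relative_paths(changed):
    """Dedupe, keep only immich paths, and strip the /srv/nfs/ prefix."""
    prefix = '/srv/nfs/'
    return [p[len(prefix):] for p in sorted(set(changed)) if BYPASS_RE.match(p)]


# Invented change-log entries for illustration.
events = [
    '/srv/nfs/immich/library/a.jpg',
    '/srv/nfs/ollama/model.bin',
    '/srv/nfs/immich/library/a.jpg',
    '/srv/nfs/immich-old/x',
]
print(bypass_relative_paths(events))
```

The trailing slash in the pattern is what keeps sibling directories such as a hypothetical `immich-old` out of the direct leg.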
--- a/scripts/provision-k8s-worker
+++ b/scripts/provision-k8s-worker
@ -0,0 +1,109 @@
+#!/usr/bin/env bash
+# provision-k8s-worker NAME VMID IP   (bare IPv4; the script appends /22)
+#
+# Clone PVE template 2000 (ubuntu-2404-cloudinit-k8s-template) into a new
+# VM, configure resources to match k8s-node3/4 (32G RAM, 8 vCPU, host CPU,
+# 256G disk, VLAN 20 on vmbr1), attach the shared cicustom snippet
+# (/var/lib/vz/snippets/k8s_cloud_init.yaml), and start it. Cloud-init
+# inside the VM installs containerd + kubelet, applies the bundled
+# setup script, and runs the kubeadm join. No manual steps after this.
+#
+# Hostname comes from the per-node meta-data snippet written in step 1
+# (local-hostname: $NAME). DO NOT hard-code it in the shared user-data snippet.
+#
+# Safe to re-run: aborts if VMID already exists or IP already answers ping.
+#
+# Usage:
+#   ssh root@192.168.1.127 bash -s -- k8s-node6 206 10.0.20.106 < provision-k8s-worker
+# or, if the script lives on the PVE host:
+#   provision-k8s-worker k8s-node6 206 10.0.20.106
+#
+# Run on the PVE host (needs qm + /var/lib/vz/snippets access).
+set -euo pipefail
+
+if [ $# -ne 3 ]; then
+    echo "usage: $0 NAME VMID IP" >&2
+    echo "  e.g. $0 k8s-node6 206 10.0.20.106" >&2
+    exit 2
+fi
+
+NAME=$1
+VMID=$2
+IP=$3
+CIDR_IP="${IP}/22"
+GW="10.0.20.1"
+DNS="10.0.20.201"
+SEARCH="viktorbarzin.lan"
+TEMPLATE_ID=2000
+STORAGE="local-lvm"
+USER_SNIPPET="local:snippets/k8s_cloud_init.yaml"
+# Per-node meta-data snippet — written below — supplies local-hostname.
+# Proxmox's auto-generated metadata DOESN'T include hostname when
+# cicustom user=… is set, so the shared user-data snippet alone leaves
+# nodes joining as "ubuntu" (image default). Per-node meta-data is the
+# clean fix.
+META_SNIPPET_FILE="/var/lib/vz/snippets/${NAME}-meta.yaml"
+META_SNIPPET="local:snippets/${NAME}-meta.yaml"
+BRIDGE="vmbr1"
+VLAN=20
+
+# Sanity: VMID must be free
+if qm status "$VMID" >/dev/null 2>&1; then
+    echo "ERROR: VM $VMID already exists. Refusing to clobber." >&2
+    qm status "$VMID" >&2
+    exit 1
+fi
+
+# Sanity: IP must not be pingable
+if ping -c 1 -W 1 "$IP" >/dev/null 2>&1; then
+    echo "ERROR: $IP is already responding to ping. Refusing to assign." >&2
+    exit 1
+fi
+
+# Sanity: snippet must exist
+if [ ! -f "/var/lib/vz/snippets/k8s_cloud_init.yaml" ]; then
+    echo "ERROR: /var/lib/vz/snippets/k8s_cloud_init.yaml missing." >&2
+    echo "  Run 'tg apply' in infra/stacks/infra/ to regenerate it." >&2
+    exit 1
+fi
+
+# Sanity: template must be a template
+if ! qm config "$TEMPLATE_ID" | grep -q '^template: 1'; then
+    echo "ERROR: VMID $TEMPLATE_ID is not a template." >&2
+    exit 1
+fi
+
+echo "[1/6] write per-node meta-data snippet ($META_SNIPPET_FILE)"
+cat > "$META_SNIPPET_FILE" <<META
+local-hostname: $NAME
+instance-id: $NAME-$(date +%s)
+META
+
+echo "[2/6] qm clone $TEMPLATE_ID -> $VMID ($NAME)"
+qm clone "$TEMPLATE_ID" "$VMID" --name "$NAME" --full true --storage "$STORAGE"
+
+echo "[3/6] qm set $VMID — VM resources + network + cicustom"
+qm set "$VMID" \
+    --agent 1 \
+    --balloon 32768 \
+    --cores 8 \
+    --cpu host \
+    --memory 32768 \
+    --net0 "virtio,bridge=$BRIDGE,tag=$VLAN" \
+    --ipconfig0 "ip=$CIDR_IP,gw=$GW" \
+    --nameserver "$DNS" \
+    --searchdomain "$SEARCH" \
+    --onboot 1 \
+    --startup 'order=5,up=45,down=420' \
+    --cicustom "user=$USER_SNIPPET,meta=$META_SNIPPET"
+
+echo "[4/6] qm resize $VMID scsi0 256G"
+qm resize "$VMID" scsi0 256G
+
+echo "[5/6] qm start $VMID"
+qm start "$VMID"
+
+echo "[6/6] Done. Cloud-init runs now; node should appear in 'kubectl get nodes' within ~6-10 min."
+echo "  Tail cloud-init: socat -u UNIX-CONNECT:/var/run/qemu-server/$VMID.serial0 STDOUT | strings"
+echo "  Final config:"
+qm config "$VMID" | grep -E '^(name|cores|memory|net0|ipconfig0|cicustom|scsi0|onboot):'
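
Because the script appends `/22` to whatever is passed as IP, an argument that already carries a prefix length would produce a malformed `ipconfig0` value. A possible pre-flight check is sketched below; `to_ipconfig` is a hypothetical helper and not part of the script, with the prefix length and gateway defaults copied from the variables above.

```python
import ipaddress


def to_ipconfig(ip, prefix_len=22, gateway='10.0.20.1'):
    """Build the ipconfig0 value from a bare IP, rejecting CIDR input."""
    addr = ipaddress.ip_address(ip)  # raises ValueError for '10.0.20.106/22'
    iface = ipaddress.ip_interface(f'{addr}/{prefix_len}')
    if ipaddress.ip_address(gateway) not in iface.network:
        raise ValueError(f'gateway {gateway} is outside {iface.network}')
    return f'ip={iface.with_prefixlen},gw={gateway}'


print(to_ipconfig('10.0.20.106'))
```

The same idea could be expressed in shell with a pattern test on `$IP`; Python is used here only because `ipaddress` also verifies that the gateway falls inside the resulting subnet.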
--- a/stacks/beads-server/main.tf
+++ b/stacks/beads-server/main.tf
@ -336,7 +336,11 @@ resource "kubernetes_deployment" "workbench" {
      spec {
        init_container {
          name  = "seed-config"
-          image = "dolthub/dolt-workbench:latest"
+          # Pinned 2026-05-26: Keel rolled :latest → :0.1.0 on 2026-05-17,
+          # which speaks an old GraphQL schema (missing `type` arg on
+          # addDatabaseConnection) → seed-config fails, UI can't add the
+          # connection. :0.3.73 was the last Keel-resolved good tag.
+          image = "dolthub/dolt-workbench:0.3.73"
          command = ["sh", "-c", <<-EOT
            # Seed connection store
            cp /config/store.json /store/store.json
@ -365,7 +369,11 @@ resource "kubernetes_deployment" "workbench" {

        container {
          name  = "workbench"
-          image = "dolthub/dolt-workbench:latest"
+          # Pinned 2026-05-26: Keel rolled :latest → :0.1.0 on 2026-05-17,
+          # which speaks an old GraphQL schema (missing `type` arg on
+          # addDatabaseConnection) → seed-config fails, UI can't add the
+          # connection. :0.3.73 was the last Keel-resolved good tag.
+          image = "dolthub/dolt-workbench:0.3.73"
          command = ["sh", "-c", <<-EOT
            # Patch GraphQL server to listen on 0.0.0.0 (IPv4) — Node 18+ defaults to IPv6
            sed -i 's|app.listen(9002)|app.listen(9002,"0.0.0.0")|g' /app/graphql-server/dist/main.js
--- a/stacks/dbaas/modules/dbaas/main.tf
+++ b/stacks/dbaas/modules/dbaas/main.tf
@ -1088,6 +1088,7 @@ resource "null_resource" "pg_cluster" {
    storage_class = "proxmox-lvm-encrypted"
    memory_limit  = "3Gi"
    pg_params     = "v3-shared1024-walcomp-workmem16-max200"
+    affinity      = "required-hostname-v1"
  }

  provisioner "local-exec" {
@ -1106,6 +1107,15 @@ resource "null_resource" "pg_cluster" {
        # — during a long WAL backlog the failover would stall the drain.
        # Bumped 2026-05-16 ahead of Monday's first post-fix kured cycle.
        instances: 3
+        # Hard anti-affinity: force one PG instance per node. Default is
+        # `preferred` which let all 3 pods collapse onto k8s-node1 during
+        # the 2026-05-26 node4 outage — losing node1 would have killed the
+        # whole cluster (no quorum). With 3 instances + 4 worker nodes,
+        # `required` is safe under 1-node drain.
+        affinity:
+          enablePodAntiAffinity: true
+          podAntiAffinityType: required
+          topologyKey: kubernetes.io/hostname
        imageName: ghcr.io/cloudnative-pg/postgis:16
        postgresql:
          parameters:
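
The comment in the hunk above argues that `required` anti-affinity stays schedulable with 3 instances on 4 worker nodes during a 1-node drain. A minimal sketch of that counting argument follows, assuming one instance per hostname and no other scheduling constraints (taints, resource pressure); the function name `can_place_all` is hypothetical and not part of the stack.

```python
def can_place_all(instances: int, worker_nodes: int, drained: int = 0) -> bool:
    """Return True if every instance can get its own node.

    With required hostname anti-affinity, no two instances may share a node,
    so the number of instances must not exceed the schedulable node count.
    """
    schedulable = worker_nodes - drained
    return instances <= schedulable


if __name__ == "__main__":
    # Figures taken from the diff comment: 3 instances, 4 worker nodes.
    print(can_place_all(3, 4, drained=1))  # True: 3 instances fit on 3 nodes
    print(can_place_all(3, 4, drained=2))  # False: one pod would stay Pending
```

The second call shows the trade-off of `required` over `preferred`: with two nodes out, one instance cannot be scheduled at all rather than doubling up on a surviving node.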
--- a/stacks/excalidraw/.terraform.lock.hcl
+++ b/stacks/excalidraw/.terraform.lock.hcl
@ -24,6 +24,22 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
+provider "registry.terraform.io/goauthentik/authentik" {
+  version     = "2024.12.1"
+  constraints = "~> 2024.10"
+  hashes = [
+    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
+  ]
+}
+
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
@ -71,3 +87,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/excalidraw/backend.tf
+++ b/stacks/excalidraw/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "excalidraw"
  }
 }
--- a/stacks/excalidraw/main.tf
+++ b/stacks/excalidraw/main.tf
@ -27,33 +27,14 @@ module "tls_secret" {
  tls_secret_name = var.tls_secret_name
 }

-resource "kubernetes_persistent_volume_claim" "data_proxmox" {
-  wait_until_bound = false
-  metadata {
-    name      = "excalidraw-data-proxmox"
-    namespace = kubernetes_namespace.excalidraw.metadata[0].name
-    annotations = {
-      "resize.topolvm.io/threshold"     = "10%"
-      "resize.topolvm.io/increase"      = "100%"
-      "resize.topolvm.io/storage_limit" = "5Gi"
-    }
-  }
-  spec {
-    access_modes       = ["ReadWriteOnce"]
-    storage_class_name = "proxmox-lvm"
-    resources {
-      requests = {
-        storage = "1Gi"
-      }
-    }
-  }
-  lifecycle {
-    # The autoresizer expands requests.storage up to storage_limit and
-    # PVCs can't shrink. Without this, every TF apply tries to revert
-    # to the spec value, K8s rejects the shrink, and the PVC ends up
-    # in Terminating-but-in-use limbo.
-    ignore_changes = [spec[0].resources[0].requests]
-  }
+module "nfs_data_host" {
+  source       = "../../modules/kubernetes/nfs_volume"
+  name         = "excalidraw-data-host"
+  namespace    = kubernetes_namespace.excalidraw.metadata[0].name
+  nfs_server   = var.nfs_server
+  nfs_path     = "/srv/nfs/excalidraw"
+  storage      = "1Gi"
+  access_modes = ["ReadWriteOnce"]
 }

 resource "kubernetes_deployment" "excalidraw" {
@ -118,7 +99,7 @@ resource "kubernetes_deployment" "excalidraw" {
        volume {
          name = "data"
          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
+            claim_name = module.nfs_data_host.claim_name
          }
        }
      }
--- a/stacks/excalidraw/providers.tf
+++ b/stacks/excalidraw/providers.tf
@ -9,6 +9,21 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
+    authentik = {
+      source  = "goauthentik/authentik"
+      version = "~> 2024.10"
+    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

@ -31,3 +46,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/f1-stream/.terraform.lock.hcl
+++ b/stacks/f1-stream/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
--- a/stacks/f1-stream/backend.tf
+++ b/stacks/f1-stream/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "f1-stream"
  }
 }
--- a/stacks/f1-stream/files/backend/playback_verifier.py
+++ b/stacks/f1-stream/files/backend/playback_verifier.py
@ -381,7 +381,15 @@ class PlaybackVerifier:
            return PlaybackVerdict(is_playable=False, error="playwright unavailable")

        is_m3u8 = stream_type == "m3u8"
-        if not is_m3u8:
+        if is_m3u8:
+            # Route m3u8 fetches through our own /proxy so the verifier gets a
+            # same-origin response with ACAO:* — matches what the frontend does
+            # (frontend `getProxyUrl` wraps every m3u8 via /proxy anyway). Without
+            # this, hosts like oe1.ossfeed.store that only return CORS headers
+            # for specific Origins (e.g. pushembdz.store) trigger an immediate
+            # `fatal_network_error` in hls.js and the stream is marked dead.
+            url = f"{PROXY_BASE}/proxy?url={_b64url(url)}"
+        else:
            url = f"{PROXY_BASE}/embed?url={_b64url(url)}"

        async with self._sem:
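
The routing rule introduced in the hunk above (m3u8 streams go through `/proxy`, everything else through `/embed`) can be sketched in isolation. This is an illustration only: the `PROXY_BASE` value and the body of `_b64url` are assumptions, since the diff shows neither, and `build_verifier_url` is a hypothetical helper name.

```python
import base64

# Assumed value for illustration; the real PROXY_BASE is defined elsewhere.
PROXY_BASE = "http://localhost:8000"


def _b64url(url: str) -> str:
    """Assumed implementation: URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")


def build_verifier_url(url: str, stream_type: str) -> str:
    """Pick the proxy endpoint the verifier should load for a stream."""
    # m3u8 playlists are fetched via /proxy so hls.js sees a same-origin
    # response; other stream types are wrapped by the /embed page.
    endpoint = "proxy" if stream_type == "m3u8" else "embed"
    return f"{PROXY_BASE}/{endpoint}?url={_b64url(url)}"


if __name__ == "__main__":
    print(build_verifier_url("https://example.com/live/index.m3u8", "m3u8"))
    print(build_verifier_url("https://example.com/player", "embed"))
```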
--- a/stacks/f1-stream/main.tf
+++ b/stacks/f1-stream/main.tf
@ -78,33 +78,14 @@ resource "kubernetes_manifest" "chrome_service_client_secret" {
  depends_on = [kubernetes_namespace.f1-stream]
 }

-resource "kubernetes_persistent_volume_claim" "data_proxmox" {
-  wait_until_bound = false
-  metadata {
-    name      = "f1-stream-data-proxmox"
-    namespace = kubernetes_namespace.f1-stream.metadata[0].name
-    annotations = {
-      "resize.topolvm.io/threshold"     = "10%"
-      "resize.topolvm.io/increase"      = "100%"
-      "resize.topolvm.io/storage_limit" = "5Gi"
-    }
-  }
-  spec {
-    access_modes       = ["ReadWriteOnce"]
-    storage_class_name = "proxmox-lvm"
-    resources {
-      requests = {
-        storage = "1Gi"
-      }
-    }
-  }
-  lifecycle {
-    # The autoresizer expands requests.storage up to storage_limit and
-    # PVCs can't shrink. Without this, every TF apply tries to revert
-    # to the spec value, K8s rejects the shrink, and the PVC ends up
-    # in Terminating-but-in-use limbo.
-    ignore_changes = [spec[0].resources[0].requests]
-  }
+module "nfs_data_host" {
+  source       = "../../modules/kubernetes/nfs_volume"
+  name         = "f1-stream-data-host"
+  namespace    = kubernetes_namespace.f1-stream.metadata[0].name
+  nfs_server   = var.nfs_server
+  nfs_path     = "/srv/nfs/f1-stream"
+  storage      = "1Gi"
+  access_modes = ["ReadWriteOnce"]
 }

 resource "kubernetes_deployment" "f1-stream" {
@ -196,7 +177,7 @@ resource "kubernetes_deployment" "f1-stream" {
        volume {
          name = "data"
          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
+            claim_name = module.nfs_data_host.claim_name
          }
        }
      }
--- a/stacks/f1-stream/providers.tf
+++ b/stacks/f1-stream/providers.tf
@ -13,6 +13,13 @@ terraform {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
  }
 }

@ -35,3 +42,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/forgejo/.terraform.lock.hcl
+++ b/stacks/forgejo/.terraform.lock.hcl
@ -52,6 +52,20 @@ provider "registry.terraform.io/goauthentik/authentik" {
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
+    "zh:090260dc7889ea822ec1d899344e1ee23eba5290461989c0796149c9511f2316",
+    "zh:13c2655ff824b0dc4b9bb832b5ca6d41dba97cb280330258c5fef4115e236209",
+    "zh:166a73c3a810c9c895d68a8ff968158f339f8a2c1c03e20ec9fc5ed99cc64e20",
+    "zh:203777eae1cdc711233315499643180604cff2324411b186b7cf07fdbe16f655",
+    "zh:3b2f18c9a8d28dac74dc6bbf168c946855ab9c68f053578d4630c50d5eaf30a0",
+    "zh:4822275985f6b74b6196c47112316a4252db22cf4ceaef7c9ab4c66d488abf2f",
+    "zh:53ea97562666c8a5a2f6d63d418a302a7f8ee4b7bb7da35dedaa89aa5708b7f0",
+    "zh:56b8a230901e3550c92a1d3f58ee9dafe9853f30fe4315af3ab28ae63262e15d",
+    "zh:6293ab7b1fd8206a0c853591f50186aca4a1eff117b2a773e10760a23a2c83e9",
+    "zh:9433970f79fb92d8aae3ee436db5630ab312c78b6dc9df9c1db3273a18f8aaa1",
+    "zh:95df406214f79b3b98222d7c7fe8fc319a3d90b7a9d53e1d5abbda5dfb8b9436",
+    "zh:a85880da0552a42c8f449390fbd7d8b03541d1a13e04bba9f1404fa658754260",
+    "zh:a95f6e9bd62c67e70eba1b1a14728856b9a6a28cd1e5e3be54a7718882c87e7f",
+    "zh:dd599b51c5beb34a4c6feece244fde07d2558d69929449ab1fd39a5ebe738781",
  ]
 }

@ -79,6 +93,18 @@ provider "registry.terraform.io/hashicorp/kubernetes" {
  version = "3.1.0"
  hashes = [
    "h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
+    "zh:0215c5c60be62028c09a2f22458e89cda3ef5830a632299f1d401eb3538874b0",
+    "zh:09ebb9f442431e278a310a9423f32caf467cb4b3cad3fe59573ca71fa7b14e20",
+    "zh:0c4e5912f83bb35846ae0a9ae54fc320706ee61894cd21cc6b4181b1c5a2fa5c",
+    "zh:1678c982853ad461e65ccb5e79d585e13ed109dd47dab2a66d3a7a304faeef65",
+    "zh:1c050a5c15e330457a9c18caacf61a923c59d663e13f2962e4b32f04fef523a0",
+    "zh:2c55bcec83be58ec132c7cb0a1ac644758b800d794fdc636d53a0eada0358a3a",
+    "zh:a062bb0aa316c08d8460c66a5d68da71da40de5d3bc3b31abcf3a1a9a19650f1",
+    "zh:a26fdea0afaa9b247c73c0b42843ca51ba7db0ac2571f9d3d50dcabd20ca1b98",
+    "zh:c872c9385a78d502bf5823d61cd3bb0f9a0585030e025eb12585c83451beeaa1",
+    "zh:f180879af931182beee4c8c0d9dab62b81d86f17ddcbe3786ef4c7cec9163a4e",
+    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
+    "zh:f70f5789264069e0eef06f9b5d5fde955ef7206f7d446d1ce51a4c37a3f3e02f",
  ]
 }

--- a/stacks/forgejo/backend.tf
+++ b/stacks/forgejo/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:ZCcWMOLCTqb0aV-XyTAZ@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "forgejo"
  }
 }
--- a/stacks/freedify/.terraform.lock.hcl
+++ b/stacks/freedify/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
@ -79,3 +87,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/freedify/backend.tf
+++ b/stacks/freedify/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "freedify"
  }
 }
--- a/stacks/freedify/providers.tf
+++ b/stacks/freedify/providers.tf
@ -13,6 +13,17 @@ terraform {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

@ -35,3 +46,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/immich/.terraform.lock.hcl
+++ b/stacks/immich/.terraform.lock.hcl
@ -29,21 +29,6 @@ provider "registry.terraform.io/gavinbunney/kubectl" {
  constraints = "~> 1.14"
  hashes = [
    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
-    "zh:1dec8766336ac5b00b3d8f62e3fff6390f5f60699c9299920fc9861a76f00c71",
-    "zh:43f101b56b58d7fead6a511728b4e09f7c41dc2e3963f59cf1c146c4767c6cb7",
-    "zh:4c4fbaa44f60e722f25cc05ee11dfaec282893c5c0ffa27bc88c382dbfbaa35c",
-    "zh:51dd23238b7b677b8a1abbfcc7deec53ffa5ec79e58e3b54d6be334d3d01bc0e",
-    "zh:5afc2ebc75b9d708730dbabdc8f94dd559d7f2fc5a31c5101358bd8d016916ba",
-    "zh:6be6e72d4663776390a82a37e34f7359f726d0120df622f4a2b46619338a168e",
-    "zh:72642d5fcf1e3febb6e5d4ae7b592bb9ff3cb220af041dbda893588e4bf30c0c",
-    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
-    "zh:a1da03e3239867b35812ee031a1060fed6e8d8e458e2eaca48b5dd51b35f56f7",
-    "zh:b98b6a6728fe277fcd133bdfa7237bd733eae233f09653523f14460f608f8ba2",
-    "zh:bb8b071d0437f4767695c6158a3cb70df9f52e377c67019971d888b99147511f",
-    "zh:dc89ce4b63bfef708ec29c17e85ad0232a1794336dc54dd88c3ba0b77e764f71",
-    "zh:dd7dd18f1f8218c6cd19592288fde32dccc743cde05b9feeb2883f37c2ff4b4e",
-    "zh:ec4bd5ab3872dedb39fe528319b4bba609306e12ee90971495f109e142d66310",
-    "zh:f610ead42f724c82f5463e0e71fa735a11ffb6101880665d93f48b4a67b9ad82",
  ]
 }

@ -52,39 +37,13 @@ provider "registry.terraform.io/goauthentik/authentik" {
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
-    "zh:090260dc7889ea822ec1d899344e1ee23eba5290461989c0796149c9511f2316",
-    "zh:13c2655ff824b0dc4b9bb832b5ca6d41dba97cb280330258c5fef4115e236209",
-    "zh:166a73c3a810c9c895d68a8ff968158f339f8a2c1c03e20ec9fc5ed99cc64e20",
-    "zh:203777eae1cdc711233315499643180604cff2324411b186b7cf07fdbe16f655",
-    "zh:3b2f18c9a8d28dac74dc6bbf168c946855ab9c68f053578d4630c50d5eaf30a0",
-    "zh:4822275985f6b74b6196c47112316a4252db22cf4ceaef7c9ab4c66d488abf2f",
-    "zh:53ea97562666c8a5a2f6d63d418a302a7f8ee4b7bb7da35dedaa89aa5708b7f0",
-    "zh:56b8a230901e3550c92a1d3f58ee9dafe9853f30fe4315af3ab28ae63262e15d",
-    "zh:6293ab7b1fd8206a0c853591f50186aca4a1eff117b2a773e10760a23a2c83e9",
-    "zh:9433970f79fb92d8aae3ee436db5630ab312c78b6dc9df9c1db3273a18f8aaa1",
-    "zh:95df406214f79b3b98222d7c7fe8fc319a3d90b7a9d53e1d5abbda5dfb8b9436",
-    "zh:a85880da0552a42c8f449390fbd7d8b03541d1a13e04bba9f1404fa658754260",
-    "zh:a95f6e9bd62c67e70eba1b1a14728856b9a6a28cd1e5e3be54a7718882c87e7f",
-    "zh:dd599b51c5beb34a4c6feece244fde07d2558d69929449ab1fd39a5ebe738781",
  ]
 }

 provider "registry.terraform.io/hashicorp/helm" {
-  version = "3.1.1"
+  version = "3.1.2"
  hashes = [
-    "h1:5b2ojWKT0noujHiweCds37ZreRFRQLNaErdJLusJN88=",
-    "zh:1a6d5ce931708aec29d1f3d9e360c2a0c35ba5a54d03eeaff0ce3ca597cd0275",
-    "zh:3411919ba2a5941801e677f0fea08bdd0ae22ba3c9ce3309f55554699e06524a",
-    "zh:81b36138b8f2320dc7f877b50f9e38f4bc614affe68de885d322629dd0d16a29",
-    "zh:95a2a0a497a6082ee06f95b38bd0f0d6924a65722892a856cfd914c0d117f104",
-    "zh:9d3e78c2d1bb46508b972210ad706dd8c8b106f8b206ecf096cd211c54f46990",
-    "zh:a79139abf687387a6efdbbb04289a0a8e7eaca2bd91cdc0ce68ea4f3286c2c34",
-    "zh:aaa8784be125fbd50c48d84d6e171d3fb6ef84a221dbc5165c067ce05faab4c8",
-    "zh:afecd301f469975c9d8f350cc482fe656e082b6ab0f677d1a816c3c615837cc1",
-    "zh:c54c22b18d48ff9053d899d178d9ffef7d9d19785d9bf310a07d648b7aac075b",
-    "zh:db2eefd55aea48e73384a555c72bac3f7d428e24147bedb64e1a039398e5b903",
-    "zh:ee61666a233533fd2be971091cecc01650561f1585783c381b6f6e8a390198a4",
-    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
+    "h1:lIuknMfM7+QTzPWs8VBocstZF0B3TpEMIj/bw+dLAOs=",
  ]
 }

@ -92,18 +51,6 @@ provider "registry.terraform.io/hashicorp/kubernetes" {
  version = "3.1.0"
  hashes = [
    "h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
-    "zh:0215c5c60be62028c09a2f22458e89cda3ef5830a632299f1d401eb3538874b0",
-    "zh:09ebb9f442431e278a310a9423f32caf467cb4b3cad3fe59573ca71fa7b14e20",
-    "zh:0c4e5912f83bb35846ae0a9ae54fc320706ee61894cd21cc6b4181b1c5a2fa5c",
-    "zh:1678c982853ad461e65ccb5e79d585e13ed109dd47dab2a66d3a7a304faeef65",
-    "zh:1c050a5c15e330457a9c18caacf61a923c59d663e13f2962e4b32f04fef523a0",
-    "zh:2c55bcec83be58ec132c7cb0a1ac644758b800d794fdc636d53a0eada0358a3a",
-    "zh:a062bb0aa316c08d8460c66a5d68da71da40de5d3bc3b31abcf3a1a9a19650f1",
-    "zh:a26fdea0afaa9b247c73c0b42843ca51ba7db0ac2571f9d3d50dcabd20ca1b98",
-    "zh:c872c9385a78d502bf5823d61cd3bb0f9a0585030e025eb12585c83451beeaa1",
-    "zh:f180879af931182beee4c8c0d9dab62b81d86f17ddcbe3786ef4c7cec9163a4e",
-    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
-    "zh:f70f5789264069e0eef06f9b5d5fde955ef7206f7d446d1ce51a4c37a3f3e02f",
  ]
 }

@ -126,3 +73,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/immich/main.tf
+++ b/stacks/immich/main.tf
@ -157,7 +157,8 @@ resource "kubernetes_namespace" "immich" {
 # Override the kyverno-generated tier-2-gpu quota (12Gi requests.memory).
 # Immich-server needs 8Gi to absorb face-detection burst spikes (OOM 2026-04-26)
-# without OOM. Plus immich-machine-learning (3.5Gi) + immich-postgresql (3Gi) +
-# backup CronJobs ≈ 15.5Gi. 20Gi gives ~4.5Gi headroom.
+# without OOM. Plus immich-machine-learning (3.5Gi) + immich-postgresql (5Gi) +
+# backup CronJobs ≈ 17.5Gi. 24Gi gives ~6.5Gi headroom (raised 2026-05-26; was at
+# 88% with VPA bumps creeping up on immich-server burst behaviour).
 resource "kubernetes_resource_quota" "immich" {
  metadata {
    name      = "tier-quota"
@ -166,8 +167,8 @@ resource "kubernetes_resource_quota" "immich" {
  spec {
    hard = {
      "requests.cpu"    = "8"
-      "requests.memory" = "20Gi"
-      "limits.memory"   = "32Gi"
+      "requests.memory" = "24Gi"
+      "limits.memory"   = "40Gi"
      pods              = "40"
    }
  }
@ -321,7 +322,12 @@ resource "kubernetes_deployment" "immich_server" {
            }
            period_seconds    = 10
            timeout_seconds   = 1
-            failure_threshold = 30
+            # Bumped 30 → 360 (5min → 1h): after a PG restart, immich-server
+            # reindexes the clip_index + face_index vector tables before binding
+            # the API port. Hundreds of thousands of rows take longer than 5min
+            # on a cold cache, so the old threshold trapped us in a startup
+            # crashloop after every PG restart (2026-05-24 incident).
+            failure_threshold = 360
            success_threshold = 1
          }

@ -526,10 +532,10 @@ resource "kubernetes_deployment" "immich-postgres" {
          resources {
            requests = {
              cpu    = "100m"
-              memory = "3Gi"
+              memory = "5Gi"
            }
            limits = {
-              memory = "3Gi"
+              memory = "5Gi"
            }
          }
        }
@ -906,7 +912,7 @@ resource "kubernetes_job_v1" "anca_elements_import" {
  wait_for_completion = false

  spec {
-    backoff_limit              = 2
+    backoff_limit              = 20
    ttl_seconds_after_finished = 604800
    template {
      metadata {
@ -948,7 +954,7 @@ resource "kubernetes_job_v1" "anca_elements_import" {
                --ban-file "csp/" --ban-file "KOREAN/" \
                --ban-file "System Volume Information/" \
                --pause-immich-jobs=false \
-                --concurrent-tasks 8 \
+                --concurrent-tasks 20 \
                --client-timeout 1h \
                --no-ui \
                --on-errors continue
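
The probe change earlier in this file's diff states "30 → 360 (5min → 1h)". A small sketch of the arithmetic behind that comment, assuming the budget is roughly `period_seconds × failure_threshold` and ignoring any initial delay; `probe_budget_seconds` is a hypothetical helper, not part of the stack.

```python
def probe_budget_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time a container has before the probe gives up."""
    return period_seconds * failure_threshold


if __name__ == "__main__":
    old = probe_budget_seconds(period_seconds=10, failure_threshold=30)
    new = probe_budget_seconds(period_seconds=10, failure_threshold=360)
    print(old, old // 60)  # 300 seconds, i.e. 5 minutes
    print(new, new // 60)  # 3600 seconds, i.e. 60 minutes
```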
--- a/stacks/immich/providers.tf
+++ b/stacks/immich/providers.tf
@ -20,6 +20,10 @@ terraform {
      source  = "gavinbunney/kubectl"
      version = "~> 1.14"
    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

--- a/stacks/infra/.terraform.lock.hcl
+++ b/stacks/infra/.terraform.lock.hcl
@ -1,10 +1,117 @@
 # This file is maintained automatically by "terraform init".
 # Manual edits may be lost in future updates.

+provider "registry.terraform.io/cloudflare/cloudflare" {
+  version     = "4.52.7"
+  constraints = "~> 4.0"
+  hashes = [
+    "h1:pPItIWii5oymR+geZB219ROSPuSODPLTlM4S/u8xLvM=",
+    "zh:0c904ce31a4c6c4a5b3bf7ff1560e77c0cc7e2450c8553ded8e8c90398e1418b",
+    "zh:36183d310c36373fe4cb936b83c595c6fd3b0a94bc7827f28e5789ccbf59752e",
+    "zh:556a568a6f0235e8f41647de9e4d3a1e7b1d6502df8b19b54ec441f1c653ea10",
+    "zh:633ebbd5b0245e75e500ef9be4d9e62288f97e8da3baaa51323892a786d90285",
+    "zh:6acfe60cf52a65ba8f044f748548d2119e7f4fd7f8ebcb14698960d87c68f529",
+    "zh:890df766e9b839623b1f0437355032a3c006226a6c200cd911e15ee1a9014e9f",
+    "zh:904acc31ebb9d6ef68c792074b30532ee61bf515f19e0a3c75b46f126cca1f13",
+    "zh:a1d0a81246afc8750286d3f6fe7a8fbe6460dd2662407b28dbfbabb612e5fa9d",
+    "zh:a41a36fe253fc365fe2b7ffc749624688b2693b4634862fda161179ab100029f",
+    "zh:a7ef269e77ffa8715c8945a2c14322c7ff159ea44c15f62505f3cbb2cae3b32d",
+    "zh:b01aa3bed30610633b762df64332b26f8844a68c3960cebcb30f04918efc67fe",
+    "zh:b069cc2cd18cae10757df3ae030508eac8d55de7e49eda7a5e3e11f2f7fe6455",
+    "zh:b2d2c6313729ebb7465dceece374049e2d08bda34473901be9ff46a8836d42b2",
+    "zh:db0e114edaf4bc2f3d4769958807c83022bfbc619a00bdf4c4bd17faa4ab2d8b",
+    "zh:ecc0aa8b9044f664fd2aaf8fa992d976578f78478980555b4b8f6148e8d1a5fe",
+  ]
+}
+
+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+    "zh:1dec8766336ac5b00b3d8f62e3fff6390f5f60699c9299920fc9861a76f00c71",
+    "zh:43f101b56b58d7fead6a511728b4e09f7c41dc2e3963f59cf1c146c4767c6cb7",
+    "zh:4c4fbaa44f60e722f25cc05ee11dfaec282893c5c0ffa27bc88c382dbfbaa35c",
+    "zh:51dd23238b7b677b8a1abbfcc7deec53ffa5ec79e58e3b54d6be334d3d01bc0e",
+    "zh:5afc2ebc75b9d708730dbabdc8f94dd559d7f2fc5a31c5101358bd8d016916ba",
+    "zh:6be6e72d4663776390a82a37e34f7359f726d0120df622f4a2b46619338a168e",
+    "zh:72642d5fcf1e3febb6e5d4ae7b592bb9ff3cb220af041dbda893588e4bf30c0c",
+    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
+    "zh:a1da03e3239867b35812ee031a1060fed6e8d8e458e2eaca48b5dd51b35f56f7",
+    "zh:b98b6a6728fe277fcd133bdfa7237bd733eae233f09653523f14460f608f8ba2",
+    "zh:bb8b071d0437f4767695c6158a3cb70df9f52e377c67019971d888b99147511f",
+    "zh:dc89ce4b63bfef708ec29c17e85ad0232a1794336dc54dd88c3ba0b77e764f71",
+    "zh:dd7dd18f1f8218c6cd19592288fde32dccc743cde05b9feeb2883f37c2ff4b4e",
+    "zh:ec4bd5ab3872dedb39fe528319b4bba609306e12ee90971495f109e142d66310",
+    "zh:f610ead42f724c82f5463e0e71fa735a11ffb6101880665d93f48b4a67b9ad82",
+  ]
+}
+
+provider "registry.terraform.io/goauthentik/authentik" {
+  version     = "2024.12.1"
+  constraints = "~> 2024.10"
+  hashes = [
+    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
+    "zh:090260dc7889ea822ec1d899344e1ee23eba5290461989c0796149c9511f2316",
+    "zh:13c2655ff824b0dc4b9bb832b5ca6d41dba97cb280330258c5fef4115e236209",
+    "zh:166a73c3a810c9c895d68a8ff968158f339f8a2c1c03e20ec9fc5ed99cc64e20",
+    "zh:203777eae1cdc711233315499643180604cff2324411b186b7cf07fdbe16f655",
+    "zh:3b2f18c9a8d28dac74dc6bbf168c946855ab9c68f053578d4630c50d5eaf30a0",
+    "zh:4822275985f6b74b6196c47112316a4252db22cf4ceaef7c9ab4c66d488abf2f",
+    "zh:53ea97562666c8a5a2f6d63d418a302a7f8ee4b7bb7da35dedaa89aa5708b7f0",
+    "zh:56b8a230901e3550c92a1d3f58ee9dafe9853f30fe4315af3ab28ae63262e15d",
+    "zh:6293ab7b1fd8206a0c853591f50186aca4a1eff117b2a773e10760a23a2c83e9",
+    "zh:9433970f79fb92d8aae3ee436db5630ab312c78b6dc9df9c1db3273a18f8aaa1",
+    "zh:95df406214f79b3b98222d7c7fe8fc319a3d90b7a9d53e1d5abbda5dfb8b9436",
+    "zh:a85880da0552a42c8f449390fbd7d8b03541d1a13e04bba9f1404fa658754260",
+    "zh:a95f6e9bd62c67e70eba1b1a14728856b9a6a28cd1e5e3be54a7718882c87e7f",
+    "zh:dd599b51c5beb34a4c6feece244fde07d2558d69929449ab1fd39a5ebe738781",
+  ]
+}
+
+provider "registry.terraform.io/hashicorp/helm" {
+  version = "3.1.2"
+  hashes = [
+    "h1:lIuknMfM7+QTzPWs8VBocstZF0B3TpEMIj/bw+dLAOs=",
+    "zh:1086b24b20d94afc331eb38c52b70848899fd0efaed46d9f4646180b96e9dffd",
+    "zh:28bebd04f8d0c44291dc961597c89de5be1e62153191b8b466dbbfb254c696aa",
+    "zh:49a7dd287c2c80621ba0c25834b1afac88c45d47ad3a24cd0aed634d78b1bbd4",
+    "zh:574e146b128be51cd4d9ee66cb8352eac82c7e3be2dbf53a51516ca701bb8b7c",
+    "zh:68285c8987affaa635c9590a0cefe238ba277e12532b64cb2d7ffec570ade064",
+    "zh:6ce12b5eb8f1d9aa61c4d336905e0186f9ea82c8767169533be5b206e4bd33f4",
+    "zh:83b7743951c989732f191cb429549296bca6faecffed492094bef92bec5c9dcb",
+    "zh:84fe2d11907b4e9d0c536d8b50bb63ad4056f60a73c4b734d5de7435784e53a7",
+    "zh:c8a25498bfbde4916f178d6880d9ee56ed9ceb88bef4842cd47360faadbb3dfb",
+    "zh:dfad553c09b36a7df68c3622c78b835669e69aaf954735802e85375a8df01dff",
+    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
+    "zh:fd6f36da732f442e421d2b90ed3925a1c9ad0992c380a61fe7681d90b34aa5f3",
+  ]
+}
+
+provider "registry.terraform.io/hashicorp/kubernetes" {
+  version = "3.1.0"
+  hashes = [
+    "h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
+    "zh:0215c5c60be62028c09a2f22458e89cda3ef5830a632299f1d401eb3538874b0",
+    "zh:09ebb9f442431e278a310a9423f32caf467cb4b3cad3fe59573ca71fa7b14e20",
+    "zh:0c4e5912f83bb35846ae0a9ae54fc320706ee61894cd21cc6b4181b1c5a2fa5c",
+    "zh:1678c982853ad461e65ccb5e79d585e13ed109dd47dab2a66d3a7a304faeef65",
+    "zh:1c050a5c15e330457a9c18caacf61a923c59d663e13f2962e4b32f04fef523a0",
+    "zh:2c55bcec83be58ec132c7cb0a1ac644758b800d794fdc636d53a0eada0358a3a",
+    "zh:a062bb0aa316c08d8460c66a5d68da71da40de5d3bc3b31abcf3a1a9a19650f1",
+    "zh:a26fdea0afaa9b247c73c0b42843ca51ba7db0ac2571f9d3d50dcabd20ca1b98",
+    "zh:c872c9385a78d502bf5823d61cd3bb0f9a0585030e025eb12585c83451beeaa1",
+    "zh:f180879af931182beee4c8c0d9dab62b81d86f17ddcbe3786ef4c7cec9163a4e",
+    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
+    "zh:f70f5789264069e0eef06f9b5d5fde955ef7206f7d446d1ce51a4c37a3f3e02f",
+  ]
+}
+
 provider "registry.terraform.io/hashicorp/null" {
  version = "3.2.4"
  hashes = [
    "h1:L5V05xwp/Gto1leRryuesxjMfgZwjb7oool4WS1UEFQ=",
+    "h1:hkf5w5B6q8e2A42ND2CjAvgvSN3puAosDmOJb3zCVQM=",
    "zh:59f6b52ab4ff35739647f9509ee6d93d7c032985d9f8c6237d1f8a59471bbbe2",
    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
    "zh:795c897119ff082133150121d39ff26cb5f89a730a2c8c26f3a9c1abf81a9c43",
@ -25,6 +132,7 @@ provider "registry.terraform.io/hashicorp/vault" {
  constraints = "~> 4.0"
  hashes = [
    "h1:GPfhH6dr1LY0foPBDYv9bEGifx7eSwYqFcEAOWOUxLk=",
+    "h1:aHqgWQhDBMeZO9iUKwJYMlh4q+xNMUlMIcjRbF4d02Y=",
    "zh:269ab13433f67684012ae7e15876532b0312f5d0d2002a9cf9febb1279ce5ea6",
    "zh:4babc95bf0c40eb85005db1dc2ca403c46be4a71dd3e409db3711a56f7a5ca0e",
    "zh:78d5eefdd9e494defcb3c68d282b8f96630502cac21d1ea161f53cfe9bb483b3",
@ -45,6 +153,7 @@ provider "registry.terraform.io/telmate/proxmox" {
  constraints = "3.0.2-rc07"
  hashes = [
    "h1:0UpRJ8PFsu9lhD3p2KUdUNVsDPbjZLPR46wYRpt1dxc=",
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
    "zh:2ee860cd0a368b3eaa53f4a9ea46f16dab8a97929e813ea6ef55183f8112c2ca",
    "zh:415965fd915bae2040d7f79e45f64d6e3ae61149c10114efeac1b34687d7296c",
    "zh:6584b2055df0e32062561c615e3b6b2c291ca8c959440adda09ef3ec1e1436bd",
--- a/stacks/infra/backend.tf
+++ b/stacks/infra/backend.tf
@ -1,6 +1,6 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "local" {
-    path = "/Users/viktorbarzin/code/infra/state/stacks/infra/terraform.tfstate"
+    path = "/home/wizard/code/infra/state/stacks/infra/terraform.tfstate"
  }
 }
--- a/stacks/infra/main.tf
+++ b/stacks/infra/main.tf
@ -10,8 +10,9 @@
 variable "proxmox_host" { type = string }

 variable "ssh_public_key" {
-  type    = string
-  default = ""
+  type        = string
+  default     = ""
+  description = "DEPRECATED: was a tfvars input. Now read from Vault secret/viktor.ssh_public_key directly (see locals.k8s_ssh_public_key) so no apply-time argument can leave the snippet's authorized_keys empty."
 }

 variable "k8s_join_command" { type = string }
@ -40,6 +41,12 @@ locals {
  non_k8s_cloud_init_image_path   = "/var/lib/vz/template/iso/noble-server-cloudimg-amd64-non-k8s.img"

  cloud_init_image_url = "https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img"
+
+  # Source of truth for the wizard user's SSH key on every cloud-init
+  # generated VM. Read from Vault so an apply can't pass an empty value
+  # (which silently locked the wizard account on the node5 v1 boot,
+  # 2026-05-26). try() falls back to var.ssh_public_key if the key is absent.
+  k8s_ssh_public_key = try(data.vault_kv_secret_v2.viktor.data["ssh_public_key"], var.ssh_public_key)
 }

 # ---------------------------------------------------------------------------
@ -52,7 +59,7 @@ module "k8s-node-template" {
  proxmox_user = "root" # SSH user on Proxmox host

  ssh_private_key = data.vault_kv_secret_v2.secrets.data["ssh_private_key"]
-  ssh_public_key  = var.ssh_public_key
+  ssh_public_key  = local.k8s_ssh_public_key

  cloud_image_url = local.cloud_init_image_url
  image_path      = local.k8s_cloud_init_image_path
@ -62,163 +69,10 @@ module "k8s-node-template" {

  is_k8s_template = true # provision cloud init file with k8s deps
  snippet_name    = local.k8s_cloud_init_snippet_name
-  # Add mirror registry
-  containerd_config_update_command = <<-EOF
-  # Set up config_path for per-registry mirror configuration
-  sed -i 's|config_path = ""|config_path = "/etc/containerd/certs.d"|' /etc/containerd/config.toml
-
-  # Create hosts.toml for docker.io (Docker Hub) — high traffic, rate-limited
-  mkdir -p /etc/containerd/certs.d/docker.io
-  printf 'server = "https://registry-1.docker.io"\n\n[host."http://10.0.20.10:5000"]\n  capabilities = ["pull", "resolve"]\n\n[host."https://registry-1.docker.io"]\n  capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/docker.io/hosts.toml
-
-  # Create hosts.toml for ghcr.io — medium traffic
-  mkdir -p /etc/containerd/certs.d/ghcr.io
-  printf 'server = "https://ghcr.io"\n\n[host."http://10.0.20.10:5010"]\n  capabilities = ["pull", "resolve"]\n\n[host."https://ghcr.io"]\n  capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/ghcr.io/hosts.toml
-
-  # Forgejo OCI registry: redirect to in-cluster Traefik LB (10.0.20.200) so
-  # pulls don't hairpin out through the WAN gateway. Traefik serves the
-  # *.viktorbarzin.me wildcard so SNI verification still passes.
-  # registry.viktorbarzin.me / 10.0.20.10:5050 entries removed in Phase 4 of
-  # the forgejo-registry-consolidation 2026-05-07 — registry-private is gone.
-  mkdir -p /etc/containerd/certs.d/forgejo.viktorbarzin.me
-  printf 'server = "https://forgejo.viktorbarzin.me"\n\n[host."https://10.0.20.200"]\n  capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
-
-  # Low-traffic registries (registry.k8s.io, quay.io, reg.kyverno.io) pull directly.
-  # Pull-through cache removed: caused corrupted images (truncated downloads)
-  # breaking VPA certgen and Kyverno image pulls.
-
-  sed -i 's/.*max_concurrent_downloads = 3/max_concurrent_downloads = 20/g' /etc/containerd/config.toml # Enable multiple concurrent downloads
-  
-  # Configure aggressive garbage collection to prevent disk space exhaustion (node2 incident prevention)
-  # Set up containerd GC for unused images and containers
-  cat >> /etc/containerd/config.toml << 'CONTAINERD_GC'
-
-[plugins."io.containerd.gc.v1.scheduler"]
-  # Run GC every 30 minutes instead of default 1 hour
-  pause_threshold = 0.02
-  deletion_threshold = 0
-  mutation_threshold = 100
-  schedule_delay = "1800s"  # 30 minutes
-
-[plugins."io.containerd.runtime.v2.task"]
-  # More aggressive container cleanup
-  exit_timeout = "5m"
-
-[plugins."io.containerd.metadata.v1.bolt"]
-  # Compact database more frequently 
-  compact_threshold = 5242880  # 5MB instead of default 100MB
-CONTAINERD_GC
-  sudo sed -i '/serializeImagePulls:/d' /var/lib/kubelet/config.yaml && \
-  sudo sed -i '/maxParallelImagePulls:/d' /var/lib/kubelet/config.yaml && \
-  echo -e 'serializeImagePulls: false\nmaxParallelImagePulls: 50' | sudo tee -a /var/lib/kubelet/config.yaml
-
-  # Memory and disk reservation and eviction — prevent node OOM/disk full
-  # Aggressive disk eviction settings added after node2 containerd corruption incident (2026-03-13)
-  # These settings prevent disk space exhaustion that can corrupt containerd image store
-  sudo sed -i '/systemReserved:/d; /kubeReserved:/d; /evictionHard:/,/^[^ ]/{ /evictionHard:/d; /^  /d }; /evictionSoft:/,/^[^ ]/{ /evictionSoft:/d; /^  /d }; /evictionSoftGracePeriod:/,/^[^ ]/{ /evictionSoftGracePeriod:/d; /^  /d }' /var/lib/kubelet/config.yaml
-  cat <<'KUBELET_PATCH' | sudo tee -a /var/lib/kubelet/config.yaml
-systemReserved:
-  memory: "512Mi"
-  cpu: "200m"
-kubeReserved:
-  memory: "512Mi"
-  cpu: "200m"
-evictionHard:
-  memory.available: "500Mi"
-  nodefs.available: "15%"  # More aggressive: evict at 15% free (was 10%) 
-  imagefs.available: "20%"  # Much more aggressive: evict at 20% free to prevent containerd corruption
-evictionSoft:
-  memory.available: "1Gi"
-  nodefs.available: "20%"  # Start warnings at 20% free
-  imagefs.available: "25%"  # Start warnings at 25% free for containerd safety
-evictionSoftGracePeriod:
-  memory.available: "30s"
-  nodefs.available: "60s"  # Grace period for disk space warnings
-  imagefs.available: "30s"  # Shorter grace for critical containerd space
-memorySwap:
-  swapBehavior: "LimitedSwap"
-KUBELET_PATCH
-
-  # Remove old 2-bucket shutdown config if present (replaced by priority-based)
-  sudo sed -i '/^shutdownGracePeriod:/d; /^shutdownGracePeriodCriticalPods:/d' /var/lib/kubelet/config.yaml
-  # Remove old shutdownGracePeriodByPodPriority block if present (idempotent re-apply)
-  sudo python3 -c "
-import yaml, sys
-with open('/var/lib/kubelet/config.yaml') as f:
-    cfg = yaml.safe_load(f)
-cfg.pop('shutdownGracePeriod', None)
-cfg.pop('shutdownGracePeriodCriticalPods', None)
-cfg.pop('shutdownGracePeriodByPodPriority', None)
-# Container log rotation limits — reduces root disk writes (~20-30 GB/day savings)
-cfg['containerLogMaxSize'] = '10Mi'
-cfg['containerLogMaxFiles'] = 3
-cfg['shutdownGracePeriodByPodPriority'] = [
-    {'priority': 0,          'shutdownGracePeriodSeconds': 20},
-    {'priority': 200000,     'shutdownGracePeriodSeconds': 20},
-    {'priority': 400000,     'shutdownGracePeriodSeconds': 30},
-    {'priority': 600000,     'shutdownGracePeriodSeconds': 30},
-    {'priority': 800000,     'shutdownGracePeriodSeconds': 90},
-    {'priority': 1000000,    'shutdownGracePeriodSeconds': 30},
-    {'priority': 1200000,    'shutdownGracePeriodSeconds': 30},
-    {'priority': 2000000000, 'shutdownGracePeriodSeconds': 30},
-    {'priority': 2000001000, 'shutdownGracePeriodSeconds': 30},
-]
-with open('/var/lib/kubelet/config.yaml', 'w') as f:
-    yaml.dump(cfg, f, default_flow_style=False)
-"
-
-  # Systemd: increase InhibitDelayMaxSec so logind doesn't force-kill before kubelet finishes graceful shutdown
-  # Total kubelet shutdown time: 310s. InhibitDelay must exceed this.
-  mkdir -p /etc/systemd/logind.conf.d
-  cat <<'LOGIND_CONF' | sudo tee /etc/systemd/logind.conf.d/kubelet-shutdown.conf
-[Login]
-InhibitDelayMaxSec=480
-LOGIND_CONF
-  sudo systemctl restart systemd-logind
-
-  # Systemd: increase kubelet stop timeout to match total shutdown grace period (310s + buffer)
-  mkdir -p /etc/systemd/system/kubelet.service.d
-  cat <<'KUBELET_SHUTDOWN' | sudo tee /etc/systemd/system/kubelet.service.d/20-shutdown.conf
-[Service]
-TimeoutStopSec=420s
-KUBELET_SHUTDOWN
-  sudo systemctl daemon-reload
-
-  # Tune controller-manager + apiserver for faster volume detach on node failure
-  # Only on master node (has static pod manifests)
-  if [ -f /etc/kubernetes/manifests/kube-controller-manager.yaml ]; then
-    sudo python3 -c "
-import yaml
-# Controller-manager: faster attach-detach reconciliation (15s vs 1m default)
-with open('/etc/kubernetes/manifests/kube-controller-manager.yaml') as f:
-    m = yaml.safe_load(f)
-args = m['spec']['containers'][0]['command']
-for flag in ['--attach-detach-reconcile-sync-period=15s']:
-    key = flag.split('=')[0]
-    args = [a for a in args if not a.startswith(key)]
-    args.append(flag)
-m['spec']['containers'][0]['command'] = args
-with open('/etc/kubernetes/manifests/kube-controller-manager.yaml', 'w') as f:
-    yaml.dump(m, f, default_flow_style=False)
-print('controller-manager: attach-detach-reconcile-sync-period=15s')
-"
-    sudo python3 -c "
-import yaml
-# API server: faster pod eviction from unreachable nodes (60s vs 300s default)
-with open('/etc/kubernetes/manifests/kube-apiserver.yaml') as f:
-    m = yaml.safe_load(f)
-args = m['spec']['containers'][0]['command']
-for flag in ['--default-unreachable-toleration-seconds=60', '--default-not-ready-toleration-seconds=60']:
-    key = flag.split('=')[0]
-    args = [a for a in args if not a.startswith(key)]
-    args.append(flag)
-m['spec']['containers'][0]['command'] = args
-with open('/etc/kubernetes/manifests/kube-apiserver.yaml', 'w') as f:
-    yaml.dump(m, f, default_flow_style=False)
-print('apiserver: unreachable+not-ready toleration=60s')
-"
-  fi
-  EOF
+  # containerd setup script now bundled in the module
+  # (k8s-node-containerd-setup.sh); the deprecated variable is
+  # ignored when is_k8s_template=true.
+  containerd_config_update_command = ""
  k8s_join_command                 = var.k8s_join_command
 }
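The removed inline script states a total kubelet shutdown time of 310s and sizes `InhibitDelayMaxSec` and `TimeoutStopSec` above it. The arithmetic can be checked with a short sketch. The values are copied from the removed lines; the helper names are hypothetical, and treating the total as the plain sum of the per-priority grace periods is the script's own assumption.

```python
# (priority, shutdownGracePeriodSeconds) pairs from the removed kubelet patch.
GRACE_BY_PRIORITY = [
    (0, 20),
    (200000, 20),
    (400000, 30),
    (600000, 30),
    (800000, 90),
    (1000000, 30),
    (1200000, 30),
    (2000000000, 30),
    (2000001000, 30),
]
INHIBIT_DELAY_MAX_SEC = 480      # logind.conf.d/kubelet-shutdown.conf
KUBELET_TIMEOUT_STOP_SEC = 420   # kubelet.service.d/20-shutdown.conf


def total_shutdown_seconds(entries):
    """Sum the per-priority grace periods."""
    return sum(seconds for _priority, seconds in entries)


def shutdown_margins(entries, inhibit_delay, stop_timeout):
    """Return how many seconds each systemd limit leaves beyond the kubelet total."""
    total = total_shutdown_seconds(entries)
    return {
        "total": total,
        "inhibit_margin": inhibit_delay - total,
        "stop_margin": stop_timeout - total,
    }


print(shutdown_margins(GRACE_BY_PRIORITY, INHIBIT_DELAY_MAX_SEC, KUBELET_TIMEOUT_STOP_SEC))
```

A negative margin would mean systemd could cut the kubelet off before the last priority bucket finishes.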

@ -395,95 +249,53 @@ UNIT
 }

 # ---------------------------------------------------------------------------
-# Docker registry VM
-# ---------------------------------------------------------------------------
-
-module "docker-registry-vm" {
-  source = "../../modules/create-vm"
-  vmid   = 220
-
-  vm_cpus      = 4
-  vm_mem_mb    = 4196
-  vm_disk_size = "64G"
-
-  template_name  = "docker-registry-template"
-  vm_name        = "docker-registry"
-  cisnippet_name = "docker-registry.yaml"
-  agent          = 1
-
-  # Boot order: after TrueNAS (order=2), before k8s nodes (order=4)
-  startup_order    = 3
-  startup_delay    = 60
-  shutdown_timeout = 120
-
-  vm_mac_address = "DE:AD:BE:EF:22:22" # mapped to 10.0.20.10 in dhcp
-  bridge         = "vmbr1"
-  vlan_tag       = "20"
-  ipconfig0      = "ip=10.0.20.10/24,gw=10.0.20.1"
-  # Active pull-through caches (docker.io + ghcr.io only):
-  # 5000 -> nginx -> registry-dockerhub (docker.io proxy)
-  # 5001 -> registry-dockerhub direct (Prometheus metrics)
-  # 5010 -> nginx -> registry-ghcr (ghcr.io proxy)
-  # Disabled caches (low-traffic, caused corrupted images):
-  # 5020 -> registry-quay (quay.io) — DISABLED
-  # 5030 -> registry-k8s (registry.k8s.io) — DISABLED, broke VPA certgen
-  # 5040 -> registry-kyverno (reg.kyverno.io) — DISABLED
-  # 5050 -> nginx -> registry-private (R/W registry for CI build cache)
-  # 8080 -> registry-ui (joxit/docker-registry-ui)
-}
-
-# ---------------------------------------------------------------------------
-# K8s node VMs (imported from existing Proxmox VMs)
-# ---------------------------------------------------------------------------
-
-# ---------------------------------------------------------------------------
-# K8s node VMs — imported from existing Proxmox VMs
+# Docker registry VM (220) — INTENTIONALLY NOT MANAGED BY TERRAFORM.
 #
-# NOTE: Nodes with iSCSI PVC disks (201, 203, 204) cannot be imported yet
-# due to telmate/proxmox provider bug: it constructs wrong volume references
-# for shared iSCSI disks on update, causing API 500 errors. These nodes will
-# be importable after migrating to the bpg/proxmox provider.
+# Same telmate/proxmox provider defect as the K8s VMs below: the
+# provider doesn't refresh `mbps_*_concurrent` fields back from live
+# state, so state perma-shows 0 even when live has 40. Every plan
+# then proposes to "fix" mbps from 0 → 40, and the apply errors with
+# "the QEMU guest needs to be rebooted" — even though the proxmox API
+# call ends up being a no-op (live values already match). Pulling
+# docker-registry out of TF for the same reason as the K8s VMs:
+# bootstrap is reproducible via the docker-registry-template above
+# + the cisnippet; VM lifecycle stays in the Proxmox UI.
+#
+# Pull-through cache port map (for reference; lives on the VM):
+#   5000 -> nginx -> registry-dockerhub (docker.io proxy)
+#   5001 -> registry-dockerhub direct (Prometheus metrics)
+#   5010 -> nginx -> registry-ghcr (ghcr.io proxy)
+#   5020 -> registry-quay (quay.io) — DISABLED (low traffic, corrupt images)
+#   5030 -> registry-k8s (registry.k8s.io) — DISABLED (broke VPA certgen)
+#   5040 -> registry-kyverno (reg.kyverno.io) — DISABLED
+#   5050 -> nginx -> registry-private (R/W cache) — decom 2026-05-07
+#   8080 -> registry-ui (joxit/docker-registry-ui)
 # ---------------------------------------------------------------------------

-module "k8s-master" {
-  source = "../../modules/create-vm"
-  vmid   = 200
-
-  vm_name        = "k8s-master"
-  vm_cpus        = 8
-  vm_mem_mb      = 32768
-  vm_disk_size   = "64G"
-  balloon        = 0
-  qemu_os        = "other"
-  use_cloud_init = false
-  boot           = "order=scsi0"
-  vm_mac_address = "00:50:56:b0:a1:39"
-  bridge         = "vmbr1"
-  vlan_tag       = "20"
-
-  startup_order    = 4
-  startup_delay    = 45
-  shutdown_timeout = 420
-}
-
-module "k8s-node2" {
-  source = "../../modules/create-vm"
-  vmid   = 202
-
-  vm_name        = "k8s-node2"
-  vm_cpus        = 8
-  vm_mem_mb      = 32768
-  vm_disk_size   = "256G"
-  balloon        = 0
-  qemu_os        = "other"
-  use_cloud_init = false
-  boot           = "c"
-  boot_disk      = "scsi0"
-  vm_mac_address = "00:50:56:b0:a1:36"
-  bridge         = "vmbr1"
-  vlan_tag       = "20"
-
-  startup_order    = 5
-  startup_delay    = 45
-  shutdown_timeout = 420
-}
+# ---------------------------------------------------------------------------
+# K8s node VMs — INTENTIONALLY NOT MANAGED BY TERRAFORM.
+#
+# The telmate/proxmox v3.0.2-rc07 provider's `disks{}` block cannot
+# represent dynamically-attached disks: on every update it rewrites
+# the entire disk list, and `lifecycle.ignore_changes` does NOT stop
+# it. We hit this twice: id=539 (iSCSI, 2026-04-02) and the 2026-05-26
+# import attempt where every `vm-9999-pvc-*` slot on k8s-node2 +
+# k8s-node3 got rewritten to point at the boot disk. Recovered via the
+# /mnt/backup/pve-config/etc-pve/nodes/pve/qemu-server/<vmid>.conf
+# nightly backup — no reboots, no data loss, K8s CSI reconciled.
+#
+# Decision (2026-05-26): k8s-master (200) and k8s-node1-4 (201-204)
+# stay out of TF indefinitely. Their cloud-init bootstrap IS in TF
+# (via k8s-node-template + non-k8s-node-template above), so a fresh
+# node still clones the template and runs the same bootstrap. The VM
+# lifecycle itself (create / shutdown / config tweak) stays in the
+# Proxmox UI. devvm (102), home-assistant (103), pfSense (101), and
+# Windows10 (300) are also hand-managed for the same reason / out of
+# scope (BSD, Windows).
+#
+# I/O caps for all 8 Linux VMs live in /tmp/apply-mbps-caps.sh on the
+# PVE host (idempotent qm-set script — beads code-9v2j). The bpg/
+# proxmox provider migration (beads code-75ds) would unblock full TF
+# adoption, but it's a multi-hour project and the cloud-init coverage
+# above already captures the bootstrap-reproducibility goal.
+# ---------------------------------------------------------------------------
--- a/stacks/infra/providers.tf
+++ b/stacks/infra/providers.tf
@ -5,6 +5,21 @@ terraform {
      source  = "hashicorp/vault"
      version = "~> 4.0"
    }
+    cloudflare = {
+      source  = "cloudflare/cloudflare"
+      version = "~> 4"
+    }
+    authentik = {
+      source  = "goauthentik/authentik"
+      version = "~> 2024.10"
+    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
    proxmox = {
      source  = "telmate/proxmox"
      version = "3.0.2-rc07"
@ -17,18 +32,22 @@ variable "kube_config_path" {
  default = "~/.kube/config"
 }

-variable "proxmox_pm_api_url" { type = string }
-variable "proxmox_pm_api_token_id" { type = string }
-variable "proxmox_pm_api_token_secret" { type = string }
+provider "kubernetes" {
+  config_path = var.kube_config_path
+}
+
+provider "helm" {
+  kubernetes = {
+    config_path = var.kube_config_path
+  }
+}

 provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }

-provider "proxmox" {
-  pm_api_url          = var.proxmox_pm_api_url
-  pm_api_token_id     = var.proxmox_pm_api_token_id
-  pm_api_token_secret = var.proxmox_pm_api_token_secret
-  pm_tls_insecure     = true
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
 }
--- a/stacks/infra/terragrunt.hcl
+++ b/stacks/infra/terragrunt.hcl
@ -3,42 +3,30 @@ include "root" {
  path = find_in_parent_folders()
 }

-# Override provider generation to include proxmox + vault (k8s providers not needed)
-generate "providers" {
-  path      = "providers.tf"
-  if_exists = "overwrite"
+# The root's `k8s_providers` generate block now declares `telmate/proxmox`
+# in required_providers for every stack (harmless for non-infra stacks —
+# they just don't instantiate a `provider "proxmox" {}` block).
+#
+# Here we add the per-stack provider config + the tfvar variable for the
+# API URL. Credentials come from Vault `secret/viktor` (same pattern as
+# cloudflare_provider.tf at the root). The output file name is distinct
+# from `providers.tf` to avoid the same-path conflict that the old
+# `generate "providers"` block silently triggered under Terragrunt v0.77.
+generate "proxmox_provider" {
+  path      = "proxmox_provider.tf"
+  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
-terraform {
-  required_providers {
-    vault = {
-      source  = "hashicorp/vault"
-      version = "~> 4.0"
-    }
-    proxmox = {
-      source  = "telmate/proxmox"
-      version = "3.0.2-rc07"
-    }
-  }
-}
-
-variable "kube_config_path" {
-  type    = string
-  default = "~/.kube/config"
-}
-
 variable "proxmox_pm_api_url" { type = string }
-variable "proxmox_pm_api_token_id" { type = string }
-variable "proxmox_pm_api_token_secret" { type = string }

-provider "vault" {
-  address          = "https://vault.viktorbarzin.me"
-  skip_child_token = true
+data "vault_kv_secret_v2" "proxmox_pm" {
+  mount = "secret"
+  name  = "viktor"
 }

 provider "proxmox" {
  pm_api_url          = var.proxmox_pm_api_url
-  pm_api_token_id     = var.proxmox_pm_api_token_id
-  pm_api_token_secret = var.proxmox_pm_api_token_secret
+  pm_api_token_id     = data.vault_kv_secret_v2.proxmox_pm.data["proxmox_pm_api_token_id"]
+  pm_api_token_secret = data.vault_kv_secret_v2.proxmox_pm.data["proxmox_pm_api_token_secret"]
  pm_tls_insecure     = true
 }
 EOF
--- a/stacks/keel/main.tf
+++ b/stacks/keel/main.tf
@ -46,6 +46,16 @@ resource "helm_release" "keel" {
  atomic = true

  values = [yamlencode({
+    # EMERGENCY STOP — scaled to 0 on 2026-05-26 16:42 UTC. Keel was actively
+    # rewriting tag strings (not just digests) despite the
+    # `keel.sh/match-tag=true` annotation injected by Kyverno that's supposed
+    # to constrain it to digest-only watches. Known casualties this round:
+    # uptime-kuma (2 → 1, 4h CrashLoopBackOff), n8n (1.80.5 → 0.1.2, silent
+    # degradation), beads-server/dolt-workbench (0.3.73 → 0.1.0), and ~10
+    # other deployments with downgrade-flavored change-cause annotations.
+    # Re-enable only after root-causing why match-tag isn't being enforced,
+    # OR after migrating each app to a content-addressed (SHA) tag pin.
+    replicaCount = 0
    # Prometheus pod-annotation scrape — picks up Keel-specific metrics
    # (pending_approvals, poll_trigger_tracked_images, registries_scanned_total{image,registry})
    # on container port 9300 /metrics. The cluster's `kubernetes-pods`
--- a/stacks/kyverno/modules/kyverno/resource-governance.tf
+++ b/stacks/kyverno/modules/kyverno/resource-governance.tf
@ -925,19 +925,24 @@ resource "kubectl_manifest" "mutate_gpu_priority" {
            ]
          }
          mutate = {
+            # `op=add` (not replace) — incoming pods often lack the
+            # `/spec/priorityClassName` key entirely; replace fails with
+            # "doc is missing key" and aborts the mutation chain BEFORE
+            # Layer 4 (tier injection) can fall back. add works whether
+            # the path exists or not. Verified 2026-05-26 on frigate.
            patchesJson6902 = yamlencode([
              {
-                op    = "replace"
+                op    = "add"
                path  = "/spec/priorityClassName"
                value = "gpu-workload"
              },
              {
-                op    = "replace"
+                op    = "add"
                path  = "/spec/priority"
                value = 1200000
              },
              {
-                op    = "replace"
+                op    = "add"
                path  = "/spec/preemptionPolicy"
                value = "PreemptLowerPriority"
              }
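The `replace` to `add` switch above relies on a JSON Patch (RFC 6902) rule: `replace` requires the target member to exist, while `add` on an object member creates or overwrites it. The sketch below is a deliberately tiny patch applier, not Kyverno's implementation. It handles only object-member paths (no arrays, no `~0`/`~1` escaping), and `apply_op` and `PatchError` are hypothetical names.

```python
import copy


class PatchError(Exception):
    """Raised when a patch operation cannot be applied."""


def apply_op(doc, op):
    """Apply one 'add' or 'replace' operation that targets an object member."""
    if op["op"] not in ("add", "replace"):
        raise PatchError(f"unsupported op {op['op']!r}")
    result = copy.deepcopy(doc)
    *parents, key = op["path"].lstrip("/").split("/")
    target = result
    for part in parents:
        if part not in target:
            raise PatchError(f"parent {part!r} does not exist")
        target = target[part]
    # replace needs an existing member; add works either way.
    if op["op"] == "replace" and key not in target:
        raise PatchError(f"replace: doc is missing key {key!r}")
    target[key] = op["value"]
    return result


pod_without_priority = {"spec": {"containers": []}}
patch = {"path": "/spec/priorityClassName", "value": "gpu-workload"}

print(apply_op(pod_without_priority, {"op": "add", **patch}))
try:
    apply_op(pod_without_priority, {"op": "replace", **patch})
except PatchError as exc:
    print("replace failed:", exc)
```

On a pod that already carries `priorityClassName`, both operations succeed in this sketch, so `add` covers both cases.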
--- a/stacks/llama-cpp/main.tf
+++ b/stacks/llama-cpp/main.tf
@ -280,7 +280,10 @@ resource "kubernetes_deployment" "llama_swap" {
  # for it to be reachable".
  wait_for_rollout = false
  spec {
-    replicas = 1
+    # TEMP-SCALEDOWN-2026-05-25-IO-STORM: scaled to 0 during cluster recovery.
+    # Restore to 1 when cluster is fully stable. See post-mortem
+    # docs/post-mortems/2026-05-25-immich-anca-elements-io-storm.md.
+    replicas = 0
    strategy { type = "Recreate" }

    selector {
--- a/stacks/monitoring/modules/monitoring/alloy.yaml
+++ b/stacks/monitoring/modules/monitoring/alloy.yaml
@ -1,4 +1,18 @@
 alloy:
+  # Resource limits for the alloy container itself.
+  # Must be under `alloy.resources` (NOT `controller.resources`) — the chart
+  # only maps THIS key onto the alloy container. Without it, the container gets
+  # `resources: {}` and inherits Kyverno LimitRange `tier-defaults` (256Mi),
+  # which is below Alloy's 400-450Mi steady state and caused page-cache
+  # thrashing → 185 MB/s sdc reads → host IO saturation (2026-05-26).
+  # Kept Burstable (request < limit) on purpose: workers sit at 97-99%
+  # memory-request saturation, so a 1Gi request would not schedule on node2/node3.
+  resources:
+    requests:
+      cpu: 50m
+      memory: 512Mi
+    limits:
+      memory: 1Gi
  configMap:
    content: |-
      // Write your Alloy config here:
@ -183,6 +197,14 @@ alloy:
        readOnly: true

 controller:
+  # Bump maxUnavailable above the chart default (1) so a 5-node DS finishes its
+  # rolling update inside the helm_release timeout. Log shipper tolerates the
+  # brief gap.
+  updateStrategy:
+    type: RollingUpdate
+    rollingUpdate:
+      maxUnavailable: 50%
+
  volumes:
    extra:
      - name: journal-run
@ -206,13 +228,3 @@ controller:
      operator: "Exists"
      effect: "NoSchedule"

-  # Resource limits for DaemonSet pods
-  # Alloy tails logs from all containers on the node via K8s API and batches
-  # them to Loki. Memory scales with number of active log streams (~30-50 per node).
-  # 128Mi was OOMKilled; steady-state usage is ~400-450Mi per pod.
-  resources:
-    requests:
-      cpu: 50m
-      memory: 512Mi
-    limits:
-      memory: 1Gi
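The `alloy.resources` block sets a memory request below the memory limit and no CPU limit, which places the pod in the Burstable QoS class. A simplified classifier, following the documented Kubernetes rules, is sketched below. `qos_class` is a hypothetical helper, and it compares quantity strings literally, so `1024Mi` and `1Gi` would count as different.

```python
def qos_class(containers):
    """Classify a pod from a list of {"requests": {...}, "limits": {...}} dicts."""
    anything_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            anything_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit, so only the limit must be present.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not anything_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


alloy = {"requests": {"cpu": "50m", "memory": "512Mi"}, "limits": {"memory": "1Gi"}}
print(qos_class([alloy]))
```

Only the alloy container is modelled here; any sidecars the chart adds would also feed into the pod-level class.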
--- a/stacks/monitoring/modules/monitoring/loki.tf
+++ b/stacks/monitoring/modules/monitoring/loki.tf
@ -28,8 +28,9 @@ resource "helm_release" "alloy" {
  repository = "https://grafana.github.io/helm-charts"
  chart      = "alloy"

-  values = [file("${path.module}/alloy.yaml")]
-  atomic = true
+  values  = [file("${path.module}/alloy.yaml")]
+  atomic  = true
+  timeout = 900 # 5-pod DS rolling update + occasional runc-stuck-Terminating on k8s-master needs >300s default

  depends_on = [helm_release.loki]
 }
--- a/stacks/monitoring/modules/monitoring/main.tf
+++ b/stacks/monitoring/modules/monitoring/main.tf
@ -568,6 +568,9 @@ resource "kubernetes_manifest" "yotovski_ingress_route" {

 # Custom ResourceQuota for monitoring — larger than the default 1-cluster tier quota
 # because monitoring runs 29+ pods (Prometheus, Grafana, Loki, Alloy, exporters, etc.)
+# Headroom: cluster grew from 5 → 7 nodes (k8s-node5/6 added 2026-05-26); per-node
+# DaemonSet pods (alloy 562Mi, node-exporter 100Mi, loki-canary 128Mi, sysctl-inotify 4Mi)
+# now consume ~+2Gi vs. pre-expansion. 20Gi gives ~3-4Gi safe headroom.
 resource "kubernetes_resource_quota" "monitoring" {
  metadata {
    name      = "monitoring-quota"
@ -576,7 +579,7 @@ resource "kubernetes_resource_quota" "monitoring" {
  spec {
    hard = {
      "requests.cpu"    = "16"
-      "requests.memory" = "16Gi"
+      "requests.memory" = "20Gi"
      "limits.memory"   = "64Gi"
      pods              = "100"
    }
--- a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
+++ b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
@ -528,7 +528,7 @@ serverFiles:
            action: drop
          # Whitelist: only keep essential kube-state-metrics, node-exporter, and coredns metrics
          - source_labels: [__name__]
-            regex: 'kube_cronjob_status_last_successful_time|kube_deployment_spec_replicas|kube_deployment_status_replicas_available|kube_deployment_status_replicas_unavailable|kube_job_status_failed|kube_job_status_start_time|kube_node_info|kube_node_status_allocatable|kube_node_status_capacity|kube_node_status_condition|kube_persistentvolumeclaim_status_phase|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_restarts_total|kube_pod_container_status_running|kube_pod_container_status_waiting_reason|kube_pod_info|kube_pod_status_phase|kube_pod_status_ready|kube_pod_status_reason|kube_pod_status_conditions|kube_resourcequota|kube_statefulset_replicas|kube_statefulset_status_replicas_ready|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_ready|kube_node_spec_unschedulable|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_disk_reads_completed_total|node_disk_writes_completed_total|node_filesystem_avail_bytes|node_filesystem_size_bytes|node_filesystem_device_error|node_filesystem_readonly|node_hwmon_chip_names|node_hwmon_temp_celsius|node_load1|node_load15|node_load5|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemFree_bytes|node_memory_SwapTotal_bytes|node_memory_SwapFree_bytes|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_nfs_requests_total|node_uname_info|node_vmstat_oom_kill|coredns_cache_entries|coredns_cache_hits_total|coredns_cache_misses_total|coredns_dns_requests_total|coredns_dns_responses_total|coredns_forward_requests_total|coredns_forward_responses_total|coredns_build_info|process_cpu_seconds_total|process_resident_memory_bytes|process_start_time_seconds|up|pve_.*'
+            regex: 'kube_cronjob_status_last_successful_time|kube_deployment_spec_replicas|kube_deployment_status_replicas_available|kube_deployment_status_replicas_unavailable|kube_job_status_failed|kube_job_status_start_time|kube_node_info|kube_node_status_allocatable|kube_node_status_capacity|kube_node_status_condition|kube_persistentvolumeclaim_status_phase|kube_volumeattachment_info|kube_pod_container_resource_limits|kube_pod_container_resource_requests|kube_pod_container_status_restarts_total|kube_pod_container_status_running|kube_pod_container_status_waiting_reason|kube_pod_info|kube_pod_status_phase|kube_pod_status_ready|kube_pod_status_reason|kube_pod_status_conditions|kube_resourcequota|kube_statefulset_replicas|kube_statefulset_status_replicas_ready|kube_daemonset_status_desired_number_scheduled|kube_daemonset_status_number_ready|kube_node_spec_unschedulable|node_cpu_seconds_total|node_disk_io_time_seconds_total|node_disk_read_bytes_total|node_disk_written_bytes_total|node_disk_reads_completed_total|node_disk_writes_completed_total|node_filesystem_avail_bytes|node_filesystem_size_bytes|node_filesystem_device_error|node_filesystem_readonly|node_hwmon_chip_names|node_hwmon_temp_celsius|node_load1|node_load15|node_load5|node_memory_MemAvailable_bytes|node_memory_MemTotal_bytes|node_memory_Buffers_bytes|node_memory_Cached_bytes|node_memory_MemFree_bytes|node_memory_SwapTotal_bytes|node_memory_SwapFree_bytes|node_network_receive_bytes_total|node_network_transmit_bytes_total|node_nfs_requests_total|node_uname_info|node_vmstat_oom_kill|coredns_cache_entries|coredns_cache_hits_total|coredns_cache_misses_total|coredns_dns_requests_total|coredns_dns_responses_total|coredns_forward_requests_total|coredns_forward_responses_total|coredns_build_info|process_cpu_seconds_total|process_resident_memory_bytes|process_start_time_seconds|up|pve_.*'
            action: keep
      - job_name: kubernetes-service-endpoints-slow
        honor_labels: true
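The whitelist change matters because a `keep` relabel rule drops every series whose `__name__` does not match the whole regex, so an alert on `kube_volumeattachment_info` needs that name in the whitelist before it has any data. A small sketch of that filter follows. It uses shortened stand-in patterns rather than the full whitelist, and relies on Prometheus anchoring relabel regexes at both ends; `survives_keep` is a hypothetical helper.

```python
import re

# Shortened stand-ins for the full whitelist; only a few alternatives are kept.
OLD_KEEP = r"kube_persistentvolumeclaim_status_phase|kube_pod_info|up|pve_.*"
NEW_KEEP = (
    r"kube_persistentvolumeclaim_status_phase|kube_volumeattachment_info"
    r"|kube_pod_info|up|pve_.*"
)


def survives_keep(metric_name, pattern):
    """Return True if a `keep` rule on __name__ would retain the series."""
    # Relabel regexes are fully anchored, which re.fullmatch mirrors.
    return re.fullmatch(pattern, metric_name) is not None


for name in ("kube_volumeattachment_info", "up", "upstream_total"):
    print(name, survives_keep(name, OLD_KEEP), survives_keep(name, NEW_KEEP))
```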
@ -1290,6 +1290,42 @@ serverFiles:
            annotations:
              summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container }}) cannot pull image"
              description: "Check the deployment's image reference — often a stale tag, a removed registry, or a credentials mismatch. `kubectl -n {{ $labels.namespace }} describe pod {{ $labels.pod }}` shows the pull error."
+          # N-1 capacity check: if any non-GPU worker (node2/3/4) died, would
+          # its memory requests fit on the remaining Ready workers (incl. node1
+          # GPU node — its taint is PreferNoSchedule, soft)? Fires when the
+          # most-loaded non-GPU worker holds more memory requests than the rest
+          # of the cluster has free.
+          - alert: ClusterCannotTolerateNonGpuNodeLoss
+            expr: |
+              max(
+                sum by (node) (
+                  kube_pod_container_resource_requests{resource="memory",unit="byte",node=~"k8s-node[234]"}
+                )
+              )
+              >
+              sum(
+                clamp_min(
+                  kube_node_status_allocatable{resource="memory",unit="byte",node=~"k8s-node[1234]"}
+                  - on(node) group_left() sum by (node) (
+                      kube_pod_container_resource_requests{resource="memory",unit="byte",node=~"k8s-node[1234]"}
+                    ),
+                  0
+                )
+                and on(node) (kube_node_status_condition{condition="Ready",status="true"} == 1)
+              )
+            for: 15m
+            labels:
+              severity: warning
+            annotations:
+              summary: "Cluster cannot tolerate losing any non-GPU worker — memory requests won't fit on the rest"
+              description: |
+                The most-loaded non-GPU worker (k8s-node2/3/4) has more memory
+                requests pinned to it than the rest of the workers (incl. node1
+                GPU node) currently have free. If that node went down, its
+                pods would not reschedule and stay Pending.
+                Remediation: right-size top reservers via Goldilocks (immich-server,
+                frigate, prometheus, pg-cluster, paperless) or bump VM RAM on
+                k8s-node2/k8s-node3 from 32GB → 48GB to match node1.
      - name: Infrastructure Health
        rules:
          - alert: HomeAssistantDown
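The N-1 check in `ClusterCannotTolerateNonGpuNodeLoss` can be worked through with plain numbers. The sketch below contains two hypothetical helpers. `fires_as_written` follows one reading of the PromQL, where free memory is summed over all Ready nodes in node1-4. `nodes_at_risk` follows the stricter reading in the comment, where the lost node's own headroom is excluded. Both ignore per-pod bin-packing, and the GiB figures are made up for illustration.

```python
def free_by_node(allocatable, requests, ready):
    """Free memory per Ready node, floored at zero (like clamp_min(..., 0))."""
    return {
        node: max(allocatable[node] - requests.get(node, 0), 0)
        for node in allocatable
        if ready.get(node, False)
    }


def fires_as_written(allocatable, requests, ready, candidates):
    """Largest candidate request total versus free memory summed over all Ready nodes."""
    free = free_by_node(allocatable, requests, ready)
    return max(requests.get(n, 0) for n in candidates) > sum(free.values())


def nodes_at_risk(allocatable, requests, ready, candidates):
    """Candidates whose requests exceed the free memory on the other Ready nodes."""
    free = free_by_node(allocatable, requests, ready)
    return [
        node
        for node in candidates
        if requests.get(node, 0) > sum(v for n, v in free.items() if n != node)
    ]


# Hypothetical figures in GiB.
allocatable = {"k8s-node1": 48, "k8s-node2": 32, "k8s-node3": 32, "k8s-node4": 32}
requests = {"k8s-node1": 40, "k8s-node2": 20, "k8s-node3": 28, "k8s-node4": 28}
ready = {node: True for node in allocatable}
non_gpu = ["k8s-node2", "k8s-node3", "k8s-node4"]

print(fires_as_written(allocatable, requests, ready, non_gpu))
print(nodes_at_risk(allocatable, requests, ready, non_gpu))
```

With these figures the two readings disagree: the aggregate comparison stays quiet, while the per-node version flags every non-GPU worker.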
@ -2336,6 +2372,35 @@ serverFiles:
              severity: warning
            annotations:
              summary: "Node {{ $labels.instance }}: NFS RPC retransmission rate {{ $value | printf \"%.1f\" }}/s — NFS server (192.168.1.127) may be degraded or unreachable"
+          # Proxmox CSI per-node LUN saturation. The plugin enforces
+          # csi.proxmox.sinextra.dev/max-volume-attachments=28 (set on every k8s-node*
+          # by stacks/proxmox-csi). QEMU's virtio-scsi-pci hard cap is 30 LUNs.
+          # When K8s-side VolumeAttachments approach the cap, new PVCs fail to
+          # attach with "no free lun found" — vaultwarden + 18 pods stuck 2026-05-26.
+          - alert: ProxmoxCSILunUsageHigh
+            expr: count by (node) (kube_volumeattachment_info{node=~"k8s-node.*"}) >= 24
+            for: 10m
+            labels:
+              severity: warning
+            annotations:
+              summary: "{{ $labels.node }}: {{ $value }}/28 CSI volumes attached (>= 85% of cap)"
+              description: "Approaching the proxmox-csi-plugin per-node cap of 28 attachments. Workloads scheduled to this node with new PVCs may fail to attach. Consider rebalancing or migrating PVCs to other nodes."
+          - alert: ProxmoxCSILunUsageCritical
+            expr: count by (node) (kube_volumeattachment_info{node=~"k8s-node.*"}) >= 27
+            for: 3m
+            labels:
+              severity: critical
+            annotations:
+              summary: "{{ $labels.node }}: {{ $value }}/28 CSI volumes attached — 1 slot left"
+              description: "Only 1 LUN slot remains before the proxmox-csi cap. Next PVC attach to this node will fail with 'no free lun found'."
+          - alert: ProxmoxCSILunCapReached
+            expr: count by (node) (kube_volumeattachment_info{node=~"k8s-node.*"}) >= 28
+            for: 1m
+            labels:
+              severity: critical
+            annotations:
+              summary: "{{ $labels.node }}: at proxmox-csi LUN cap (28/28) — attaches WILL fail"
+              description: "Pods needing new PVC attachments on {{ $labels.node }} will fail with 'no free lun found'. Detach unused volumes from this node's Proxmox VM config, or migrate PVCs to a less-loaded node."
      - name: "Application Health"
        rules:
          - alert: MailServerDown
--- a/stacks/n8n/.terraform.lock.hcl
+++ b/stacks/n8n/.terraform.lock.hcl
@ -111,3 +111,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/n8n/providers.tf
+++ b/stacks/n8n/providers.tf
@ -20,6 +20,10 @@ terraform {
      source  = "gavinbunney/kubectl"
      version = "~> 1.14"
    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

--- a/stacks/nvidia/modules/nvidia/values.yaml
+++ b/stacks/nvidia/modules/nvidia/values.yaml
@ -41,6 +41,16 @@ driver:
    limits:
      memory: "2Gi"

+  # 2026-05-25: extended startup probe from 120 to 300 failures.
+  # On k8s-node1 (6 vCPUs, 16Gi RAM, Ubuntu 24.04 + 6.8.0-117-generic),
+  # the full driver install sequence — apt install linux-headers (~2min) +
+  # gcc make -j16 kernel module compilation (~12min) + nvidia-installer
+  # file copy (~7min) = ~21min total, which just overran the default
+  # 120×10s=20min window (exit 137 = SIGKILL from startup probe).
+  # 300×10s = 50min is 2.5× the old window (~2.4× the observed install time).
+  startupProbe:
+    failureThreshold: 300
+
  devicePlugin:
    config:
      name: time-slicing-config
--- a/stacks/onlyoffice/.terraform.lock.hcl
+++ b/stacks/onlyoffice/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
@ -33,22 +41,9 @@ provider "registry.terraform.io/goauthentik/authentik" {
 }

 provider "registry.terraform.io/hashicorp/helm" {
-  version = "3.1.1"
+  version = "3.1.2"
  hashes = [
-    "h1:47CqNwkxctJtL/N/JuEj+8QMg8mRNI/NWeKO5/ydfZU=",
-    "h1:5b2ojWKT0noujHiweCds37ZreRFRQLNaErdJLusJN88=",
-    "zh:1a6d5ce931708aec29d1f3d9e360c2a0c35ba5a54d03eeaff0ce3ca597cd0275",
-    "zh:3411919ba2a5941801e677f0fea08bdd0ae22ba3c9ce3309f55554699e06524a",
-    "zh:81b36138b8f2320dc7f877b50f9e38f4bc614affe68de885d322629dd0d16a29",
-    "zh:95a2a0a497a6082ee06f95b38bd0f0d6924a65722892a856cfd914c0d117f104",
-    "zh:9d3e78c2d1bb46508b972210ad706dd8c8b106f8b206ecf096cd211c54f46990",
-    "zh:a79139abf687387a6efdbbb04289a0a8e7eaca2bd91cdc0ce68ea4f3286c2c34",
-    "zh:aaa8784be125fbd50c48d84d6e171d3fb6ef84a221dbc5165c067ce05faab4c8",
-    "zh:afecd301f469975c9d8f350cc482fe656e082b6ab0f677d1a816c3c615837cc1",
-    "zh:c54c22b18d48ff9053d899d178d9ffef7d9d19785d9bf310a07d648b7aac075b",
-    "zh:db2eefd55aea48e73384a555c72bac3f7d428e24147bedb64e1a039398e5b903",
-    "zh:ee61666a233533fd2be971091cecc01650561f1585783c381b6f6e8a390198a4",
-    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
+    "h1:lIuknMfM7+QTzPWs8VBocstZF0B3TpEMIj/bw+dLAOs=",
  ]
 }

@ -79,3 +74,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/onlyoffice/backend.tf
+++ b/stacks/onlyoffice/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "onlyoffice"
  }
 }
--- a/stacks/onlyoffice/main.tf
+++ b/stacks/onlyoffice/main.tf
@ -93,33 +93,14 @@ module "tls_secret" {
  tls_secret_name = var.tls_secret_name
 }

-resource "kubernetes_persistent_volume_claim" "data_proxmox" {
-  wait_until_bound = false
-  metadata {
-    name      = "onlyoffice-data-proxmox"
-    namespace = kubernetes_namespace.onlyoffice.metadata[0].name
-    annotations = {
-      "resize.topolvm.io/threshold"     = "10%"
-      "resize.topolvm.io/increase"      = "100%"
-      "resize.topolvm.io/storage_limit" = "5Gi"
-    }
-  }
-  spec {
-    access_modes       = ["ReadWriteOnce"]
-    storage_class_name = "proxmox-lvm"
-    resources {
-      requests = {
-        storage = "1Gi"
-      }
-    }
-  }
-  lifecycle {
-    # The autoresizer expands requests.storage up to storage_limit and
-    # PVCs can't shrink. Without this, every TF apply tries to revert
-    # to the spec value, K8s rejects the shrink, and the PVC ends up
-    # in Terminating-but-in-use limbo.
-    ignore_changes = [spec[0].resources[0].requests]
-  }
+module "nfs_data_host" {
+  source       = "../../modules/kubernetes/nfs_volume"
+  name         = "onlyoffice-data-host"
+  namespace    = kubernetes_namespace.onlyoffice.metadata[0].name
+  nfs_server   = var.nfs_server
+  nfs_path     = "/srv/nfs/onlyoffice"
+  storage      = "1Gi"
+  access_modes = ["ReadWriteOnce"]
 }

 resource "kubernetes_deployment" "onlyoffice-document-server" {
@ -226,7 +207,7 @@ resource "kubernetes_deployment" "onlyoffice-document-server" {
        volume {
          name = "data"
          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
+            claim_name = module.nfs_data_host.claim_name
          }
        }
      }
--- a/stacks/onlyoffice/providers.tf
+++ b/stacks/onlyoffice/providers.tf
@ -13,6 +13,17 @@ terraform {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

@ -35,3 +46,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/proxmox-csi/modules/proxmox-csi/main.tf
+++ b/stacks/proxmox-csi/modules/proxmox-csi/main.tf
@ -107,25 +107,35 @@ locals {
    "k8s-node2"  = { vmid = 202, proxmox_node = "pve" }
    "k8s-node3"  = { vmid = 203, proxmox_node = "pve" }
    "k8s-node4"  = { vmid = 204, proxmox_node = "pve" }
+    "k8s-node5"  = { vmid = 205, proxmox_node = "pve" }
+    "k8s-node6"  = { vmid = 206, proxmox_node = "pve" }
  }
 }

 resource "null_resource" "node_labels" {
  for_each = local.k8s_nodes

+  # max-volume-attachments: capped at 28 (2 below plugin's hard ceiling of 30,
+  # see VolumesPerNodeHardLimit in sergelogvinov/proxmox-csi-plugin pkg/csi/node.go).
+  # Default is 24; bumping to 28 gives ~4-PVC headroom per node while keeping
+  # 2 slots for recovery (boot disk + transient attach during reschedule).
+  # Without this label the plugin reports 24 and node1 cascades through that
+  # ceiling during evictions — see post-mortem 2026-05-25.
  provisioner "local-exec" {
    command = <<-EOT
      kubectl --kubeconfig=${var.kube_config_path} label node ${each.key} \
        topology.kubernetes.io/region=${var.proxmox_cluster_name} \
        topology.kubernetes.io/zone=${each.value.proxmox_node} \
        node.csi.proxmox.sinextra.dev/name=${each.key} \
+        csi.proxmox.sinextra.dev/max-volume-attachments=28 \
        --overwrite
    EOT
  }

  triggers = {
-    region = var.proxmox_cluster_name
-    zone   = each.value.proxmox_node
+    region      = var.proxmox_cluster_name
+    zone        = each.value.proxmox_node
+    max_volumes = "28"
  }
 }

--- a/stacks/real-estate-crawler/.terraform.lock.hcl
+++ b/stacks/real-estate-crawler/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
@ -79,3 +87,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/real-estate-crawler/backend.tf
+++ b/stacks/real-estate-crawler/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "real-estate-crawler"
  }
 }
--- a/stacks/real-estate-crawler/providers.tf
+++ b/stacks/real-estate-crawler/providers.tf
@ -13,6 +13,17 @@ terraform {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

@ -35,3 +46,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/redis/modules/redis/main.tf
+++ b/stacks/redis/modules/redis/main.tf
@ -268,6 +268,12 @@ resource "kubernetes_config_map" "redis_v2_conf" {
      auto-aof-rewrite-min-size 128mb
      aof-load-truncated yes
      aof-use-rdb-preamble yes
+      # Allow loading an AOF with up to 1KB of garbage at the tail (post-2026-05-26
+      # node2 unclean reboot corrupted redis-v2-2's incremental AOF at offset
+      # 84799139; without this, redis-v2-2 crashlooped). Redis truncates the
+      # corrupted tail and continues. Default is 0 (refuse to load any corruption).
+      aof-load-corrupt-tail-max-size 1024
+

      replica-read-only yes
      replica-serve-stale-data yes
--- a/stacks/resume/.terraform.lock.hcl
+++ b/stacks/resume/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
--- a/stacks/resume/backend.tf
+++ b/stacks/resume/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "resume"
  }
 }
--- a/stacks/resume/main.tf
+++ b/stacks/resume/main.tf
@ -170,33 +170,14 @@ resource "kubernetes_service" "printer" {
  }
 }

-resource "kubernetes_persistent_volume_claim" "data_proxmox" {
-  wait_until_bound = false
-  metadata {
-    name      = "resume-data-proxmox"
-    namespace = kubernetes_namespace.resume.metadata[0].name
-    annotations = {
-      "resize.topolvm.io/threshold"     = "10%"
-      "resize.topolvm.io/increase"      = "100%"
-      "resize.topolvm.io/storage_limit" = "5Gi"
-    }
-  }
-  spec {
-    access_modes       = ["ReadWriteOnce"]
-    storage_class_name = "proxmox-lvm"
-    resources {
-      requests = {
-        storage = "1Gi"
-      }
-    }
-  }
-  lifecycle {
-    # The autoresizer expands requests.storage up to storage_limit and
-    # PVCs can't shrink. Without this, every TF apply tries to revert
-    # to the spec value, K8s rejects the shrink, and the PVC ends up
-    # in Terminating-but-in-use limbo.
-    ignore_changes = [spec[0].resources[0].requests]
-  }
+module "nfs_data_host" {
+  source       = "../../modules/kubernetes/nfs_volume"
+  name         = "resume-data-host"
+  namespace    = kubernetes_namespace.resume.metadata[0].name
+  nfs_server   = var.nfs_server
+  nfs_path     = "/srv/nfs/resume"
+  storage      = "1Gi"
+  access_modes = ["ReadWriteOnce"]
 }

 # Reactive Resume app
@ -339,7 +320,7 @@ resource "kubernetes_deployment" "resume" {
        volume {
          name = "data"
          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
+            claim_name = module.nfs_data_host.claim_name
          }
        }
      }
--- a/stacks/resume/providers.tf
+++ b/stacks/resume/providers.tf
@ -13,6 +13,13 @@ terraform {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
  }
 }

@ -35,3 +42,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/servarr/.terraform.lock.hcl
+++ b/stacks/servarr/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
@ -122,3 +130,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/servarr/backend.tf
+++ b/stacks/servarr/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:ZCcWMOLCTqb0aV-XyTAZ@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "servarr"
  }
 }
--- a/stacks/servarr/providers.tf
+++ b/stacks/servarr/providers.tf
@ -13,6 +13,17 @@ terraform {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

@ -35,3 +46,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/stirling-pdf/.terraform.lock.hcl
+++ b/stacks/stirling-pdf/.terraform.lock.hcl
@ -24,6 +24,22 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
+provider "registry.terraform.io/goauthentik/authentik" {
+  version     = "2024.12.1"
+  constraints = "~> 2024.10"
+  hashes = [
+    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
+  ]
+}
+
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/stirling-pdf/backend.tf
+++ b/stacks/stirling-pdf/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "stirling-pdf"
  }
 }
--- a/stacks/stirling-pdf/providers.tf
+++ b/stacks/stirling-pdf/providers.tf
@ -9,6 +9,17 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
+    authentik = {
+      source  = "goauthentik/authentik"
+      version = "~> 2024.10"
+    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
  }
 }

@ -31,3 +42,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/uptime-kuma/modules/uptime-kuma/main.tf
+++ b/stacks/uptime-kuma/modules/uptime-kuma/main.tf
@ -81,9 +81,23 @@ resource "kubernetes_deployment" "uptime-kuma" {
    labels = {
      app  = "uptime-kuma"
      tier = var.tier
+      # Opt out of Kyverno's inject-keel-annotations ClusterPolicy. The Kyverno
+      # rule excludes any workload with this LABEL (see
+      # stacks/kyverno/modules/kyverno/keel-annotations.tf, exclude.any
+      # matchLabels keel.sh/policy=never). Without the label, Kyverno would
+      # silently re-add `keel.sh/policy=force` after every reconcile, undoing
+      # the annotation below.
+      "keel.sh/policy" = "never"
    }
    annotations = {
      "reloader.stakater.com/search" = "true"
+      # Stop Keel polling for this workload. Even with match-tag=true,
+      # Keel auto-downgraded :2 → :1 on 2026-05-26 12:14 UTC; v1 booted
+      # into SQLite mode and couldn't read the existing MariaDB store
+      # (db-config.json) → 4h CrashLoopBackOff. Pinning the image string
+      # alone isn't enough because Keel kept fighting the apply. Combined
+      # with the matching LABEL above, this fully bypasses Keel.
+      "keel.sh/policy" = "never"
    }
  }
  spec {
@ -108,7 +122,14 @@ resource "kubernetes_deployment" "uptime-kuma" {
      }
      spec {
        container {
-          image = "louislam/uptime-kuma:2"
+          # Pinned to 2.3.2 because Keel auto-downgraded :2 → :1 on 2026-05-26
+          # 12:14 UTC despite the Kyverno-injected `keel.sh/match-tag=true` +
+          # `keel.sh/policy=force` annotation pair (which is supposed to gate
+          # digest changes only). The v1 image opens kuma.db (SQLite) at boot
+          # and can't read the v2 db-config.json → 4h CrashLoopBackOff while
+          # the MariaDB store sat intact. Until the keel-match-tag regression
+          # is root-caused, pin full patch versions explicitly.
+          image = "louislam/uptime-kuma:2.3.2"
          name  = "uptime-kuma"

          resources {
@ -167,9 +188,12 @@ resource "kubernetes_deployment" "uptime-kuma" {
  lifecycle {
    ignore_changes = [
      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
-      metadata[0].annotations["keel.sh/policy"],
+      # `keel.sh/policy` is intentionally NOT ignored — we want TF to own it
+      # as `never` so a Kyverno reconcile (or manual kubectl) can't flip it
+      # back to `force` and re-enable auto-updates.
      metadata[0].annotations["keel.sh/trigger"],
-      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
+      metadata[0].annotations["keel.sh/pollSchedule"],   # KYVERNO_LIFECYCLE_V2
+      metadata[0].annotations["keel.sh/match-tag"],      # injected by Kyverno
    ]
  }
 }
--- a/stacks/url/.terraform.lock.hcl
+++ b/stacks/url/.terraform.lock.hcl
@ -29,6 +29,21 @@ provider "registry.terraform.io/gavinbunney/kubectl" {
  constraints = "~> 1.14"
  hashes = [
    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+    "zh:1dec8766336ac5b00b3d8f62e3fff6390f5f60699c9299920fc9861a76f00c71",
+    "zh:43f101b56b58d7fead6a511728b4e09f7c41dc2e3963f59cf1c146c4767c6cb7",
+    "zh:4c4fbaa44f60e722f25cc05ee11dfaec282893c5c0ffa27bc88c382dbfbaa35c",
+    "zh:51dd23238b7b677b8a1abbfcc7deec53ffa5ec79e58e3b54d6be334d3d01bc0e",
+    "zh:5afc2ebc75b9d708730dbabdc8f94dd559d7f2fc5a31c5101358bd8d016916ba",
+    "zh:6be6e72d4663776390a82a37e34f7359f726d0120df622f4a2b46619338a168e",
+    "zh:72642d5fcf1e3febb6e5d4ae7b592bb9ff3cb220af041dbda893588e4bf30c0c",
+    "zh:9b12af85486a96aedd8d7984b0ff811a4b42e3d88dad1a3fb4c0b580d04fa425",
+    "zh:a1da03e3239867b35812ee031a1060fed6e8d8e458e2eaca48b5dd51b35f56f7",
+    "zh:b98b6a6728fe277fcd133bdfa7237bd733eae233f09653523f14460f608f8ba2",
+    "zh:bb8b071d0437f4767695c6158a3cb70df9f52e377c67019971d888b99147511f",
+    "zh:dc89ce4b63bfef708ec29c17e85ad0232a1794336dc54dd88c3ba0b77e764f71",
+    "zh:dd7dd18f1f8218c6cd19592288fde32dccc743cde05b9feeb2883f37c2ff4b4e",
+    "zh:ec4bd5ab3872dedb39fe528319b4bba609306e12ee90971495f109e142d66310",
+    "zh:f610ead42f724c82f5463e0e71fa735a11ffb6101880665d93f48b4a67b9ad82",
  ]
 }

@ -37,6 +52,20 @@ provider "registry.terraform.io/goauthentik/authentik" {
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
+    "zh:090260dc7889ea822ec1d899344e1ee23eba5290461989c0796149c9511f2316",
+    "zh:13c2655ff824b0dc4b9bb832b5ca6d41dba97cb280330258c5fef4115e236209",
+    "zh:166a73c3a810c9c895d68a8ff968158f339f8a2c1c03e20ec9fc5ed99cc64e20",
+    "zh:203777eae1cdc711233315499643180604cff2324411b186b7cf07fdbe16f655",
+    "zh:3b2f18c9a8d28dac74dc6bbf168c946855ab9c68f053578d4630c50d5eaf30a0",
+    "zh:4822275985f6b74b6196c47112316a4252db22cf4ceaef7c9ab4c66d488abf2f",
+    "zh:53ea97562666c8a5a2f6d63d418a302a7f8ee4b7bb7da35dedaa89aa5708b7f0",
+    "zh:56b8a230901e3550c92a1d3f58ee9dafe9853f30fe4315af3ab28ae63262e15d",
+    "zh:6293ab7b1fd8206a0c853591f50186aca4a1eff117b2a773e10760a23a2c83e9",
+    "zh:9433970f79fb92d8aae3ee436db5630ab312c78b6dc9df9c1db3273a18f8aaa1",
+    "zh:95df406214f79b3b98222d7c7fe8fc319a3d90b7a9d53e1d5abbda5dfb8b9436",
+    "zh:a85880da0552a42c8f449390fbd7d8b03541d1a13e04bba9f1404fa658754260",
+    "zh:a95f6e9bd62c67e70eba1b1a14728856b9a6a28cd1e5e3be54a7718882c87e7f",
+    "zh:dd599b51c5beb34a4c6feece244fde07d2558d69929449ab1fd39a5ebe738781",
  ]
 }

@ -64,6 +93,18 @@ provider "registry.terraform.io/hashicorp/kubernetes" {
  version = "3.1.0"
  hashes = [
    "h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
+    "zh:0215c5c60be62028c09a2f22458e89cda3ef5830a632299f1d401eb3538874b0",
+    "zh:09ebb9f442431e278a310a9423f32caf467cb4b3cad3fe59573ca71fa7b14e20",
+    "zh:0c4e5912f83bb35846ae0a9ae54fc320706ee61894cd21cc6b4181b1c5a2fa5c",
+    "zh:1678c982853ad461e65ccb5e79d585e13ed109dd47dab2a66d3a7a304faeef65",
+    "zh:1c050a5c15e330457a9c18caacf61a923c59d663e13f2962e4b32f04fef523a0",
+    "zh:2c55bcec83be58ec132c7cb0a1ac644758b800d794fdc636d53a0eada0358a3a",
+    "zh:a062bb0aa316c08d8460c66a5d68da71da40de5d3bc3b31abcf3a1a9a19650f1",
+    "zh:a26fdea0afaa9b247c73c0b42843ca51ba7db0ac2571f9d3d50dcabd20ca1b98",
+    "zh:c872c9385a78d502bf5823d61cd3bb0f9a0585030e025eb12585c83451beeaa1",
+    "zh:f180879af931182beee4c8c0d9dab62b81d86f17ddcbe3786ef4c7cec9163a4e",
+    "zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
+    "zh:f70f5789264069e0eef06f9b5d5fde955ef7206f7d446d1ce51a4c37a3f3e02f",
  ]
 }

@ -87,3 +128,25 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+    "zh:2ee860cd0a368b3eaa53f4a9ea46f16dab8a97929e813ea6ef55183f8112c2ca",
+    "zh:415965fd915bae2040d7f79e45f64d6e3ae61149c10114efeac1b34687d7296c",
+    "zh:6584b2055df0e32062561c615e3b6b2c291ca8c959440adda09ef3ec1e1436bd",
+    "zh:65dcfad71928e0a8dd9befc22524ed686be5020b0024dc5cca5184c7420eeb6b",
+    "zh:7253dc29bd265d33f2791ac4f779c5413f16720bb717de8e6c5fcb2c858648ea",
+    "zh:7ec8993da10a47606670f9f67cfd10719a7580641d11c7aa761121c4a2bd66fb",
+    "zh:999a3f7a9dcf517967fc537e6ec930a8172203642fb01b8e1f78f908373db210",
+    "zh:a50e6df7280eb6584a5fd2456e3f5b6df13b2ec8a7fa4605511e438e1863be42",
+    "zh:b25b329a1e42681c509d027fee0365414f0cc5062b65690cfc3386aab16132ae",
+    "zh:c028877fdb438ece48f7bc02b65bbae9ca7b7befbd260e519ccab6c0cbb39f26",
+    "zh:cf0eaa3ea9fcc6d62793637947f1b8d7c885b6ad74695ab47e134e4ff132190f",
+    "zh:d5ade3fae031cc629b7c512a7b60e46570f4c41665e88a595d7efd943dde5ab2",
+    "zh:f388c15ad1ecfc09e7361e3b98bae9b627a3a85f7b908c9f40650969c949901c",
+    "zh:f415cc6f735a3971faae6ac24034afdb9ee83373ef8de19a9631c187d5adc7db",
+  ]
+}
--- a/stacks/url/main.tf
+++ b/stacks/url/main.tf
@ -377,13 +377,16 @@ resource "kubernetes_deployment" "shlink-web" {
              memory = "64Mi"
            }
          }
+          # shlinkio/shlink-web-client >=0.1.0 listens on port 80 (nginx default);
+          # prior :latest builds listened on 8080. Keep both probes + service
+          # target_port aligned with the image.
          port {
-            container_port = 8080
+            container_port = 80
          }
          liveness_probe {
            http_get {
              path = "/"
-              port = 8080
+              port = 80
            }
            initial_delay_seconds = 15
            period_seconds        = 30
@ -393,7 +396,7 @@ resource "kubernetes_deployment" "shlink-web" {
          readiness_probe {
            http_get {
              path = "/"
-              port = 8080
+              port = 80
            }
            initial_delay_seconds = 5
            period_seconds        = 30
@ -436,7 +439,7 @@ resource "kubernetes_service" "shlink-web" {
    port {
      name        = "http"
      port        = 80
-      target_port = 8080
+      target_port = 80
    }
  }
 }
--- a/stacks/url/providers.tf
+++ b/stacks/url/providers.tf
@ -20,6 +20,10 @@ terraform {
      source  = "gavinbunney/kubectl"
      version = "~> 1.14"
    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

--- a/stacks/vaultwarden/modules/vaultwarden/main.tf
+++ b/stacks/vaultwarden/modules/vaultwarden/main.tf
@ -87,8 +87,9 @@ resource "kubernetes_deployment" "vaultwarden" {
      }
      spec {
        container {
-          image = "vaultwarden/server:1.35.7"
-          name  = "vaultwarden"
+          image             = "vaultwarden/server:latest"
+          image_pull_policy = "Always"
+          name              = "vaultwarden"

          resources {
            requests = {
@ -181,7 +182,9 @@ resource "kubernetes_deployment" "vaultwarden" {
      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
      metadata[0].annotations["keel.sh/policy"],
      metadata[0].annotations["keel.sh/trigger"],
-      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
+      metadata[0].annotations["keel.sh/pollSchedule"],         # KYVERNO_LIFECYCLE_V2
+      metadata[0].annotations["keel.sh/match-tag"],             # KYVERNO_LIFECYCLE_V2
+      metadata[0].annotations["kubernetes.io/change-cause"],    # Keel rewrites this on every rollout
    ]
  }
 }
--- a/stacks/wealthfolio/.terraform.lock.hcl
+++ b/stacks/wealthfolio/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
@ -99,3 +107,11 @@ provider "registry.terraform.io/hashicorp/vault" {
    "zh:ff35fb1ab6add288f0f368981e56f780b50405accd1937131cba1137999c8d83",
  ]
 }
+
+provider "registry.terraform.io/telmate/proxmox" {
+  version     = "3.0.2-rc07"
+  constraints = "3.0.2-rc07"
+  hashes = [
+    "h1:zp5hpQJQ4t4zROSLqdltVpBO+Riy9VugtfFbpyTw1aM=",
+  ]
+}
--- a/stacks/wealthfolio/backend.tf
+++ b/stacks/wealthfolio/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:ZCcWMOLCTqb0aV-XyTAZ@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "wealthfolio"
  }
 }
--- a/stacks/wealthfolio/main.tf
+++ b/stacks/wealthfolio/main.tf
@ -146,7 +146,10 @@ resource "kubernetes_deployment" "wealthfolio" {
      }
      spec {
        container {
-          image = "afadil/wealthfolio:3.2"
+          # Pinned 2026-05-26: prior live was :3.2.1, Keel rolled it to :2.0
+          # on 2026-05-26 03:13, then truncated to :3.2 at 06:46 (Keel string
+          # match dropped the patch suffix). Restore the patch version.
+          image = "afadil/wealthfolio:3.2.1"
          name  = "wealthfolio"
          port {
            container_port = 8080
--- a/stacks/wealthfolio/providers.tf
+++ b/stacks/wealthfolio/providers.tf
@ -13,6 +13,17 @@ terraform {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }

@ -35,3 +46,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/stacks/whisper/.terraform.lock.hcl
+++ b/stacks/whisper/.terraform.lock.hcl
@ -24,6 +24,22 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/gavinbunney/kubectl" {
+  version     = "1.19.0"
+  constraints = "~> 1.14"
+  hashes = [
+    "h1:9QkxPjp0x5FZFfJbE+B7hBOoads9gmdfj9aYu5N4Sfc=",
+  ]
+}
+
+provider "registry.terraform.io/goauthentik/authentik" {
+  version     = "2024.12.1"
+  constraints = "~> 2024.10"
+  hashes = [
+    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
+  ]
+}
+
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/whisper/backend.tf
+++ b/stacks/whisper/backend.tf
@@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:LicuZK1nVl4ILE5HF-A9@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "whisper"
  }
 }
--- a/stacks/whisper/main.tf
+++ b/stacks/whisper/main.tf
@@ -25,33 +25,14 @@ module "tls_secret" {
  tls_secret_name = var.tls_secret_name
 }

-resource "kubernetes_persistent_volume_claim" "data_proxmox" {
-  wait_until_bound = false
-  metadata {
-    name      = "whisper-data-proxmox"
-    namespace = kubernetes_namespace.whisper.metadata[0].name
-    annotations = {
-      "resize.topolvm.io/threshold"     = "10%"
-      "resize.topolvm.io/increase"      = "100%"
-      "resize.topolvm.io/storage_limit" = "5Gi"
-    }
-  }
-  spec {
-    access_modes       = ["ReadWriteOnce"]
-    storage_class_name = "proxmox-lvm"
-    resources {
-      requests = {
-        storage = "1Gi"
-      }
-    }
-  }
-  lifecycle {
-    # The autoresizer expands requests.storage up to storage_limit and
-    # PVCs can't shrink. Without this, every TF apply tries to revert
-    # to the spec value, K8s rejects the shrink, and the PVC ends up
-    # in Terminating-but-in-use limbo.
-    ignore_changes = [spec[0].resources[0].requests]
-  }
+module "nfs_data_host" {
+  source       = "../../modules/kubernetes/nfs_volume"
+  name         = "whisper-data-host"
+  namespace    = kubernetes_namespace.whisper.metadata[0].name
+  nfs_server   = var.nfs_server
+  nfs_path     = "/srv/nfs/whisper"
+  storage      = "1Gi"
+  access_modes = ["ReadWriteMany"]
 }

 resource "kubernetes_deployment" "whisper" {
@@ -118,7 +99,7 @@ resource "kubernetes_deployment" "whisper" {
        volume {
          name = "data"
          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
+            claim_name = module.nfs_data_host.claim_name
          }
        }
      }
@@ -244,7 +225,7 @@ resource "kubernetes_deployment" "piper" {
        volume {
          name = "data"
          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
+            claim_name = module.nfs_data_host.claim_name
          }
        }
      }
--- a/stacks/whisper/providers.tf
+++ b/stacks/whisper/providers.tf
@@ -9,6 +9,17 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
+    authentik = {
+      source  = "goauthentik/authentik"
+      version = "~> 2024.10"
+    }
+    # kubectl (gavinbunney) — workaround for hashicorp/kubernetes
+    # `kubernetes_manifest` panics on Kyverno CRDs. See beads code-e2dp.
+    # Declared for all stacks but only used where opted-in.
+    kubectl = {
+      source  = "gavinbunney/kubectl"
+      version = "~> 1.14"
+    }
  }
 }

@@ -31,3 +42,8 @@ provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
 }
+
+provider "kubectl" {
+  config_path      = var.kube_config_path
+  load_config_file = true
+}
--- a/state/stacks/dbaas/terraform.tfstate.enc
+++ b/state/stacks/dbaas/terraform.tfstate.enc
--- a/state/stacks/infra/terraform.tfstate.enc
+++ b/state/stacks/infra/terraform.tfstate.enc
--- a/terragrunt.hcl
+++ b/terragrunt.hcl
@@ -46,8 +46,14 @@ terraform {
  }
 }

-# Generate kubernetes + helm + cloudflare providers for all stacks.
-# The infra stack overrides this to add the proxmox provider.
+# Generate kubernetes + helm + cloudflare + proxmox providers for all stacks.
+# (Stacks that don't use proxmox simply omit any `provider "proxmox" {}` block;
+# the required_providers entry is harmless. The pre-2026-05-26 trick of the
+# infra stack overriding this block to add proxmox stopped working under
+# Terragrunt v0.77 — same-name generate blocks are now forbidden — so proxmox
+# is declared globally instead. The `provider "proxmox" {}` config lives in
+# stacks/infra/terragrunt.hcl, generated under a different filename so it
+# doesn't collide with this providers.tf.)
 generate "k8s_providers" {
  path      = "providers.tf"
  if_exists = "overwrite_terragrunt"
@@ -73,6 +79,10 @@ terraform {
      source  = "gavinbunney/kubectl"
      version = "~> 1.14"
    }
+    proxmox = {
+      source  = "telmate/proxmox"
+      version = "3.0.2-rc07"
+    }
  }
 }