infra/stacks
Viktor Barzin 445feb118f infra: per-VM I/O caps + terragrunt v0.77 plumbing + state recovery
WHAT LANDED:
- terragrunt.hcl (root): added telmate/proxmox to k8s_providers
  required_providers. Stacks that never instantiate a proxmox provider
  block are unaffected. This replaces the same-name generate override
  the infra stack used to rely on, which stopped working under
  Terragrunt v0.77 ("Detected generate blocks with the same name").
- stacks/infra/terragrunt.hcl: new generate "proxmox_provider" block
  writes proxmox_provider.tf with the provider config; credentials
  read from Vault secret/viktor at plan/apply time (no env vars).
- modules/create-vm: new mbps_rd / mbps_wr number variables (default 0
  = uncapped), wired into scsi0/scsi1 disk{} blocks as
  mbps_r_concurrent / mbps_wr_concurrent. lifecycle.ignore_changes
  extended to scsi6..scsi29 (K8s nodes have many CSI-managed slots),
  plus scsihw and qemu_os (vary per-VM; non-trivial live changes).
- stacks/infra/main.tf: docker-registry-vm gains mbps_rd=40,
  mbps_wr=40 in HCL — already applied live via qm set on 2026-05-26.
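
A minimal sketch of how the modules/create-vm change described above might
look. Only mbps_rd, mbps_wr, mbps_r_concurrent, mbps_wr_concurrent, scsihw
and qemu_os come from this commit message; the resource label, the other
variables, and the single-disk layout are assumptions made for illustration,
not the module's actual contents.

```hcl
# modules/create-vm (illustrative sketch, not the real module)

variable "mbps_rd" {
  type        = number
  default     = 0 # 0 = uncapped
  description = "Sustained read cap in MB/s applied to the VM's disks."
}

variable "mbps_wr" {
  type        = number
  default     = 0 # 0 = uncapped
  description = "Sustained write cap in MB/s applied to the VM's disks."
}

# Hypothetical inputs, present only so the sketch is self-contained.
variable "vm_name" {
  type = string
}

variable "storage" {
  type = string
}

variable "disk_size" {
  type = string
}

resource "proxmox_vm_qemu" "vm" {
  name        = var.vm_name
  target_node = "pve"

  disks {
    scsi {
      scsi0 {
        disk {
          storage = var.storage
          size    = var.disk_size

          # The new module variables map onto the provider's
          # per-disk sustained throughput limits.
          mbps_r_concurrent  = var.mbps_rd
          mbps_wr_concurrent = var.mbps_wr
        }
      }
    }
  }

  lifecycle {
    # The per-slot scsi6..scsi29 entries are omitted here because their
    # exact attribute paths are not given in the commit message.
    ignore_changes = [scsihw, qemu_os]
  }
}
```

Under these assumptions a caller such as docker-registry-vm would pass
mbps_rd = 40 and mbps_wr = 40, and any VM that omits both keeps the
uncapped default of 0.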

WHAT FAILED AND WAS ROLLED BACK:
- Attempted import of 7 VMs (102 devvm, 103 home-assistant, 200
  k8s-master, 201 k8s-node1, 202 k8s-node2, 203 k8s-node3, 204
  k8s-node4) via import {} blocks. The telmate/proxmox v3.0.2-rc07
  provider mangled proxmox-csi PVC slots on apply for vmid 202 and
  203: every scsi slot got rewritten from `vm-9999-pvc-<uuid>` to
  the boot disk `vm-<vmid>-disk-0`. Restored both .conf files from
  the 2026-05-24 nightly PVE config backup at /mnt/backup/pve-config/
  etc-pve/nodes/pve/qemu-server/{202,203}.conf — no reboots, no data
  loss, K8s CSI reconciled PVC attachments within minutes. Removed
  the 7 imports from state via `terraform state rm` and re-encrypted.
  Tracked in beads code-xzbl: blocked on bpg/proxmox provider
  migration (telmate has the same dynamic-disk defect that bit us on
  iSCSI on 2026-04-02; see memory id=539).
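
For reference, an import {} block of the kind that was attempted and then
rolled back could look roughly like the sketch below. The resource address
is hypothetical (the real addresses are not given here), and the ID assumes
the node/type/vmid form that the telmate provider documents for imports,
with the node name "pve" taken from the backup path above. It illustrates
what was reverted and is not something to re-apply while code-xzbl is open.

```hcl
# Illustrative only: this is the shape of the change that was rolled back.
# "proxmox_vm_qemu.k8s_node2" is a hypothetical address; a matching
# resource block must already exist in configuration for the import to plan.
import {
  to = proxmox_vm_qemu.k8s_node2
  id = "pve/qemu/202" # <node>/<type>/<vmid>, assumed telmate import ID format
}
```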

LIVE CAPS STILL IN PLACE (qm set, 2026-05-26 ~03:13 UTC):
  102 devvm 60/60   103 home-assistant 40/40   200 k8s-master 100/60
  201 k8s-node1 150/120   202 k8s-node2 150/120   203 k8s-node3 150/120
  204 k8s-node4 150/120   220 docker-registry 40/40
  (pfSense 101 BSD + Windows10 300 intentionally out of scope.)

PRE-EXISTING DRIFT EXPOSED (NOT NEW):
- HCL declares k8s-master (200) and k8s-node2 (202) but neither was
  ever imported into TF state — confirmed against the SOPS-encrypted
  state in git (lineage e1cc5bb5, serial 42, last touched 2026-04-06).
  This commit leaves both declarations in place but does NOT import
  them; that's part of the code-xzbl follow-up.

Closes: code-s9xr
2026-05-26 06:46:47 +00:00
_template ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-10 18:53:49 +00:00
actualbudget recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
affine recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
authentik authentik: worker replicas 3 -> 2 2026-05-21 09:14:35 +00:00
beads-server beads-server: codify Keel annotations on Dolt deployment (drift cleanup) 2026-05-17 22:22:40 +00:00
blog nfs-mirror: append transferred files to offsite-sync manifest 2026-05-24 15:32:22 +00:00
broker-sync broker-sync(imap): fix command name + add fsGroup for sync.db writes 2026-05-22 14:41:54 +00:00
calico security(wave1): W1.6 expand observation from recruiter-responder pilot → tier 3+4 (82 namespaces) 2026-05-19 22:14:16 +00:00
changedetection enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
chrome-service recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
city-guesser enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
claude-agent-service claude-agent-service: cut memory request 2Gi → 1Gi (limit 4Gi → 2Gi) 2026-05-23 10:03:42 +00:00
claude-memory recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
cloudflared mailserver: decommission SendGrid 2026-05-22 20:08:38 +00:00
cnpg cnpg: bump webhook-cert renewal threshold 7d -> 30d 2026-05-22 15:00:41 +00:00
coturn enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
crowdsec crowdsec: pin image to v1.7.8 + remove ENROLL_KEY, CAPI restored 2026-05-24 11:11:29 +00:00
cyberchef final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
dashy enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
dawarich enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
dbaas dbaas: opt MySQL out of Keel + add do-not-bump warning 2026-05-19 13:21:03 +00:00
descheduler keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
diun enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
ebook2audiobook enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
ebooks enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
echo enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
excalidraw excalidraw: migrate PVC from proxmox-lvm to NFS 2026-05-26 02:33:41 +00:00
external-secrets recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
f1-stream f1-stream: migrate PVC from proxmox-lvm to NFS 2026-05-26 02:49:43 +00:00
fire-planner fire-planner: COL refresh CronJob + Grafana Cost-of-Living dashboard 2026-05-22 14:15:38 +00:00
foolery recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
forgejo Woodpecker CI deploy [CI SKIP] 2026-05-24 22:07:58 +00:00
freedify recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
freshrss infra: add kubectl + authentik providers across 6 stacks 2026-05-21 08:07:22 +00:00
frigate ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
grampsweb ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
hackmd ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
headscale keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
health ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
hermes-agent ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
homepage final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
immich immich: harden against bulk-import load (memory + probe + Job retries) 2026-05-24 22:14:05 +00:00
infra infra: per-VM I/O caps + terragrunt v0.77 plumbing + state recovery 2026-05-26 06:46:47 +00:00
infra-maintenance [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
insta2spotify ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
instagram-poster Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
isponsorblocktv ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
job-hunter ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
jsoncrack final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
k8s-dashboard final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
k8s-portal Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
k8s-version-upgrade k8s-version-upgrade: ignore IngressTTFBCritical in halt-on-alert check 2026-05-24 01:10:44 +00:00
keel upgrade-state: skill + script + Keel scrape for periodic three-pipeline audit 2026-05-18 10:50:43 +00:00
kms final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
kured ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
kyverno kyverno: allowlist woodpeckerci/* for CI step pods 2026-05-23 08:52:48 +00:00
linkwarden infra: add kubectl + authentik providers across 6 stacks 2026-05-21 08:07:22 +00:00
llama-cpp llama-cpp: ignore_changes for keel/k8s-managed annotations 2026-05-24 09:01:17 +00:00
local-path final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
mailserver mailserver: decommission SendGrid 2026-05-22 20:08:38 +00:00
matrix ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
meshcentral ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
metallb keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
metrics-server keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
monitoring monitoring: alerts for proxmox-csi LUN saturation per node 2026-05-26 02:45:13 +00:00
n8n nfs-mirror: append transferred files to offsite-sync manifest 2026-05-24 15:32:22 +00:00
navidrome infra: add kubectl + authentik providers across 6 stacks 2026-05-21 08:07:22 +00:00
netbox ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
networking-toolbox ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
nextcloud nextcloud(external_storage): add per-mount enableSharing option 2026-05-24 11:39:16 +00:00
nfs-csi keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
nodelocal-dns [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C) 2026-04-19 15:46:41 +00:00
novelapp Woodpecker CI deploy [CI SKIP] 2026-05-16 23:17:44 +00:00
ntfy ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
nvidia nvidia: fix driver install deadlock + extend startup probe 2026-05-25 11:53:44 +00:00
onlyoffice onlyoffice: restore replicas 0 → 1 post IO-storm recovery 2026-05-26 03:08:17 +00:00
openclaw nfs-mirror: append transferred files to offsite-sync manifest 2026-05-24 15:32:22 +00:00
osm_routing final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
owntracks ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
paperless-mcp paperless-mcp: deploy MCP for AI document search 2026-05-17 11:14:35 +00:00
paperless-ngx ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
payslip-ingest ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
phpipam ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
platform [infra] Add Cloudflare provider to all stack lock files and generated providers 2026-04-16 16:31:36 +00:00
plotting-book Woodpecker CI deploy [CI SKIP] 2026-05-16 23:17:44 +00:00
poison-fountain ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
postiz postiz: bump memory request 512Mi → 2Gi, limit 4Gi → 3Gi (right-size for next deploy) 2026-05-24 01:11:25 +00:00
priority-pass ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
privatebin ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
proxmox-csi proxmox-csi/node: bump memory request 64Mi → 1Gi (LUKS unlock reservation) 2026-05-24 01:10:44 +00:00
pvc-autoresizer [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
rbac [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
real-estate-crawler realestate-crawler: dockerhub pull-secret + lift image-pin on ui/api 2026-05-18 19:11:43 +00:00
recruiter-responder nfs-mirror: append transferred files to offsite-sync manifest 2026-05-24 15:32:22 +00:00
redis keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
reloader keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
resume resume: migrate PVC from proxmox-lvm to NFS 2026-05-26 02:36:20 +00:00
reverse-proxy keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
rybbit ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
sealed-secrets keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
send ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
servarr keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
shadowsocks ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
speedtest ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
status-page [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
stirling-pdf ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
tandoor infra: add kubectl + authentik providers across 6 stacks 2026-05-21 08:07:22 +00:00
technitium technitium: cut memory — primary 2Gi → 1Gi, secondary+tertiary 2Gi → 512Mi 2026-05-23 10:03:51 +00:00
terminal terminal: probe + alerts after Traefik replica routing-table skew 2026-05-17 10:04:26 +00:00
tor-proxy ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
trading-bot trading-bot: add kevin_signal_bridge container (kill-switch OFF for Phase 1) 2026-05-24 01:22:53 +00:00
traefik traefik: bump auth-proxy nginx header buffers to handle Authentik cookie pile 2026-05-23 08:34:33 +00:00
travel_blog final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
tuya-bridge ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
uptime-kuma Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
url nfs-mirror: append transferred files to offsite-sync manifest 2026-05-24 15:32:22 +00:00
vault trading-bot: revive K8s stack + add meet-kevin-watcher 2026-05-22 11:23:30 +00:00
vaultwarden Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
vpa keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
wealthfolio Woodpecker CI deploy [CI SKIP] 2026-05-16 13:45:45 +00:00
webhook_handler final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
whisper whisper: migrate PVC from proxmox-lvm to NFS 2026-05-26 02:38:34 +00:00
wireguard keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
woodpecker ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
xray xray: drop dead vless ingress + pin Service target_port 2026-05-24 01:13:54 +00:00
ytdlp ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00