infra/stacks/platform/modules
Viktor Barzin 43b49f7f6c cluster recovery: fix resource limits and node1 memory
- nvidia quota: requests.memory 8Gi → 12Gi (unblock cuda-validator)
- calibre: startup probe initial_delay 60→120s, timeout 1→5s,
  wait_for_rollout=false (DOCKER_MODS install takes 10+ min)
- immich ML: memory 2Gi → 4Gi (OOMKilled loading CLIP models)

Also done outside TF (not in this commit):
- node1 VM: 16 GiB → 24 GiB RAM (Proxmox)
- tigera-operator: kubectl patch 128→256Mi
- nvidia-driver-daemonset: kubectl patch 1→4Gi memory
- kyverno reports-controller: kubectl patch 128→256Mi
- CNPG operator: kubectl rollout restart
2026-03-15 01:44:28 +00:00
authentik equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
cloudflared equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
cnpg equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
crowdsec equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
dbaas add vaultwarden daily backup CronJob to NFS 2026-03-15 00:03:59 +00:00
headscale equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
infra-maintenance [ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup 2026-03-06 19:54:21 +00:00
iscsi-csi add vaultwarden daily backup CronJob to NFS 2026-03-15 00:03:59 +00:00
k8s-portal equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
kyverno add vaultwarden daily backup CronJob to NFS 2026-03-15 00:03:59 +00:00
mailserver equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
metallb [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
metrics-server equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
monitoring equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
nfs-csi equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
nvidia cluster recovery: fix resource limits and node1 memory 2026-03-15 01:44:28 +00:00
rbac add vaultwarden daily backup CronJob to NFS 2026-03-15 00:03:59 +00:00
redis equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
reverse_proxy Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-14 08:51:45 +00:00
sealed-secrets equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
technitium equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
traefik equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
uptime-kuma Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-14 08:51:45 +00:00
vaultwarden fix vaultwarden backup image: use docker.io/library/alpine for Kyverno 2026-03-15 00:16:31 +00:00
vpa equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
wireguard equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00
xray equalize memory req=lim across 70+ containers using Prometheus 7d max data 2026-03-14 21:46:49 +00:00