infra/stacks/platform/modules
Viktor Barzin 240feda408 Reduce downtime during platform stack applies
CrowdSec fixes:
- Increase ResourceQuota requests.cpu 1→4 (was at 302%, blocking upgrades)
- Add LAPI startupProbe: 30 attempts × 10s = 5min startup window
  (LAPI pods were failing default startup probe during rolling upgrades)
- Reduce Helm timeout 3600s→900s with wait=true, wait_for_jobs=true
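
The startup window quoted above is the probe's failure threshold multiplied by its period. A minimal sketch of that arithmetic, assuming the 30 attempts correspond to the Kubernetes `failureThreshold` field and the 10s to `periodSeconds`, and ignoring any `initialDelaySeconds`. The commit does not show the actual probe spec, and `startup_window_seconds` is a hypothetical helper:

```python
def startup_window_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Upper bound on how long a container may take to pass its startup probe.

    Assumption: the probe is retried every `period_seconds` and the container
    is restarted after `failure_threshold` consecutive failures.
    """
    return failure_threshold * period_seconds


# Values taken from the commit message: 30 attempts, 10s apart.
window = startup_window_seconds(failure_threshold=30, period_seconds=10)
print(f"{window}s = {window // 60}min")  # prints "300s = 5min"
```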

Prometheus startup guard on 8 rate-based alerts:
- PodCrashLooping, ContainerOOMKilled, CoreDNSErrors,
  HighServiceErrorRate, HighService4xxRate, HighServiceLatency,
  SSDHighWriteRate, HDDHighWriteRate
- Suppresses false positives for 15m after Prometheus restart
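
The guard's effect can be modelled as a condition on Prometheus uptime: a rate-based alert may fire only once the server has been up longer than the guard window. The sketch below is illustrative Python, not the rule expression from the commit (which is not shown in this listing). `alert_may_fire` and its parameters are hypothetical names:

```python
STARTUP_GUARD_SECONDS = 15 * 60  # 15m guard window, from the commit message


def alert_may_fire(rate_condition: bool, now: float, prometheus_start: float,
                   guard_seconds: int = STARTUP_GUARD_SECONDS) -> bool:
    """Return True only if the alert condition holds AND the guard has elapsed.

    Assumption: `now` and `prometheus_start` are Unix timestamps in seconds.
    """
    uptime = now - prometheus_start
    return rate_condition and uptime > guard_seconds


# 10 minutes after a restart the alert stays suppressed, even if the rate is high.
print(alert_may_fire(True, now=600.0, prometheus_start=0.0))
# 20 minutes after a restart the same condition is allowed to fire.
print(alert_may_fire(True, now=1200.0, prometheus_start=0.0))
```

In PromQL, a comparable condition is often written by comparing `time()` with the `process_start_time_seconds` metric. Whether this commit uses that form is not stated here.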
2026-03-18 08:03:59 +00:00
authentik Right-size CPU requests cluster-wide and remove missed CPU limits 2026-03-18 08:03:58 +00:00
cloudflared Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
cnpg Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
crowdsec Reduce downtime during platform stack applies 2026-03-18 08:03:59 +00:00
dbaas Right-size CPU requests cluster-wide and remove missed CPU limits 2026-03-18 08:03:58 +00:00
headscale Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
infra-maintenance [ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup 2026-03-06 19:54:21 +00:00
iscsi-csi Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
k8s-portal Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
kyverno Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
mailserver Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
metallb [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
metrics-server [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
monitoring Reduce downtime during platform stack applies 2026-03-18 08:03:59 +00:00
nfs-csi Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
nvidia Right-size CPU requests cluster-wide and remove missed CPU limits 2026-03-18 08:03:58 +00:00
rbac Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
redis Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
reverse_proxy Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
sealed-secrets Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
technitium Right-size CPU requests cluster-wide and remove missed CPU limits 2026-03-18 08:03:58 +00:00
traefik Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
uptime-kuma Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
vaultwarden Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
vpa Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
wireguard Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
xray Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00