Reduce downtime during platform stack applies
CrowdSec fixes:
- Increase ResourceQuota requests.cpu 1→4 (was at 302%, blocking upgrades)
- Add LAPI startupProbe: 30 attempts × 10s = 5min startup window
(LAPI pods were failing default startup probe during rolling upgrades)
- Reduce Helm timeout 3600s→900s with wait=true, wait_for_jobs=true
Prometheus startup guard on 8 rate-based alerts:
- PodCrashLooping, ContainerOOMKilled, CoreDNSErrors,
HighServiceErrorRate, HighService4xxRate, HighServiceLatency,
SSDHighWriteRate, HDDHighWriteRate
- Suppresses false positives for 15m after Prometheus restart