infra/stacks/platform
Viktor Barzin 240feda408 Reduce downtime during platform stack applies
CrowdSec fixes:
- Increase ResourceQuota requests.cpu 1→4 (was at 302%, blocking upgrades)
- Add LAPI startupProbe: 30 attempts × 10s = 5min startup window
  (LAPI pods were failing default startup probe during rolling upgrades)
- Reduce Helm timeout 3600s→900s with wait=true, wait_for_jobs=true

Prometheus startup guard on 8 rate-based alerts:
- PodCrashLooping, ContainerOOMKilled, CoreDNSErrors,
  HighServiceErrorRate, HighService4xxRate, HighServiceLatency,
  SSDHighWriteRate, HDDHighWriteRate
- Suppresses false positives for 15m after Prometheus restart
2026-03-18 08:03:59 +00:00
..
modules Reduce downtime during platform stack applies 2026-03-18 08:03:59 +00:00
.gitkeep [ci skip] Add Terragrunt directory skeleton and root config 2026-02-22 13:01:37 +00:00
.terraform.lock.hcl Woodpecker CI deploy commit [CI SKIP] 2026-03-07 20:47:22 +00:00
backend.tf Woodpecker CI deploy commit [CI SKIP] 2026-03-07 20:47:22 +00:00
main.tf Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
providers.tf [ci skip] fix false-positive sensitive=true on kube_config_path 2026-03-07 15:48:19 +00:00
redis-25.3.2.tgz [ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache 2026-03-06 23:55:57 +00:00
secrets [ci skip] Migrate 22 platform service states to stacks/platform 2026-02-22 13:35:10 +00:00
terragrunt.hcl [ci skip] Add platform stack (core services) for Terragrunt migration 2026-02-22 13:21:09 +00:00
tiers.tf [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk 2026-02-28 19:08:06 +00:00