infra


Viktor Barzin 240feda408 (2026-03-18 08:03:59 +00:00)
Reduce downtime during platform stack applies

CrowdSec fixes:
- Increase ResourceQuota requests.cpu 1→4 (was at 302%, blocking upgrades)
- Add LAPI startupProbe: 30 attempts × 10s = 5min startup window (LAPI pods were failing default startup probe during rolling upgrades)
- Reduce Helm timeout 3600s→900s with wait=true, wait_for_jobs=true

Prometheus startup guard on 8 rate-based alerts:
- PodCrashLooping, ContainerOOMKilled, CoreDNSErrors, HighServiceErrorRate, HighService4xxRate, HighServiceLatency, SSDHighWriteRate, HDDHighWriteRate
- Suppresses false positives for 15m after Prometheus restart
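The following Terraform sketch shows where the numbers from the commit message above could live. It is not the repository's actual code. The resource names, the `crowdsec` namespace, the chart repository URL, the `lapi.startupProbe` values key, the LAPI port 8080, the `job="prometheus"` label and the example PodCrashLooping expression are all assumptions. The sketch places each value as follows:

- `requests.cpu = "4"` goes in a ResourceQuota.
- The startup probe uses `periodSeconds = 10` and `failureThreshold = 30`, which gives 30 × 10 s = 300 s, the 5-minute window.
- The Helm release uses a 900 s timeout with `wait` and `wait_for_jobs`.
- A PromQL suffix keeps a rate-based alert silent until Prometheus has been up for 900 s (15 minutes).

```hcl
# Hypothetical quota: raise the namespace CPU request ceiling from 1 to 4 cores.
resource "kubernetes_resource_quota_v1" "crowdsec" {
  metadata {
    name      = "crowdsec-quota" # assumed name
    namespace = "crowdsec"       # assumed namespace
  }
  spec {
    hard = {
      "requests.cpu" = "4"
    }
  }
}

# Hypothetical release: shorter timeout, but block until pods and hook jobs are ready.
resource "helm_release" "crowdsec" {
  name       = "crowdsec"
  namespace  = "crowdsec"
  repository = "https://crowdsecurity.github.io/helm-charts" # assumed chart source
  chart      = "crowdsec"

  timeout       = 900 # seconds (previously 3600)
  wait          = true
  wait_for_jobs = true

  values = [yamlencode({
    lapi = {
      # Assumed values key; check the chart's values.yaml for the real layout.
      startupProbe = {
        tcpSocket        = { port = 8080 } # assumed LAPI port
        periodSeconds    = 10
        failureThreshold = 30 # 30 x 10 s = 300 s before the kubelet gives up
      }
    }
  })]
}

locals {
  # Hypothetical guard: returns a sample only once the Prometheus process has
  # been running for more than 900 s (15 minutes). on() ignores labels, so the
  # left-hand alert series pass through unchanged whenever the guard is true.
  prometheus_startup_guard = "and on() ((time() - process_start_time_seconds{job=\"prometheus\"}) > 900)"

  # Hypothetical example of one of the eight rate-based alerts with the guard appended.
  pod_crash_looping_expr = "rate(kube_pod_container_status_restarts_total[15m]) > 0 ${local.prometheus_startup_guard}"
}
```

Appending the guard as a suffix keeps each alert's own expression untouched. Right after a restart, `rate()` works from a short or empty window, so comparison thresholds can misfire. While uptime is under 900 s, the right-hand side of `and` is empty, so the whole expression returns nothing.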
modules	Reduce downtime during platform stack applies	2026-03-18 08:03:59 +00:00
.gitkeep	[ci skip] Add Terragrunt directory skeleton and root config	2026-02-22 13:01:37 +00:00
.terraform.lock.hcl	Woodpecker CI deploy commit [CI SKIP]	2026-03-07 20:47:22 +00:00
backend.tf	Woodpecker CI deploy commit [CI SKIP]	2026-03-07 20:47:22 +00:00
main.tf	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-18 08:03:58 +00:00
providers.tf	[ci skip] fix false-positive sensitive=true on kube_config_path	2026-03-07 15:48:19 +00:00
redis-25.3.2.tgz	[ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache	2026-03-06 23:55:57 +00:00
secrets	[ci skip] Migrate 22 platform service states to stacks/platform	2026-02-22 13:35:10 +00:00
terragrunt.hcl	[ci skip] Add platform stack (core services) for Terragrunt migration	2026-02-22 13:21:09 +00:00
tiers.tf	[ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk	2026-02-28 19:08:06 +00:00