infra

Viktor Barzin a66a8d0de2 Reduce downtime during platform stack applies		2026-03-18 08:03:59 +00:00

CrowdSec Helm fix:
- Increase ResourceQuota requests.cpu from 1 to 4: pods were at 302% of quota, preventing scheduling during rolling upgrades
- Reduce Helm timeout from 3600s to 600s: a 1 hour hang is excessive
- Add wait=true and wait_for_jobs=true for proper readiness checking

Prometheus startup guard:
- Add startup guard to 8 rate/increase-based alerts that false-fire after Prometheus restarts (needs 2 scrapes for rate() to work): PodCrashLooping, ContainerOOMKilled, CoreDNSErrors, HighServiceErrorRate, HighService4xxRate, HighServiceLatency, SSDHighWriteRate, HDDHighWriteRate
- Guard: and on() (time() - process_start_time_seconds) > 900 suppresses alerts for 15m after Prometheus startup
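
The changes described in this commit message could look roughly like the Terraform below. This is a minimal sketch, not the repository's actual code: the resource names, the `crowdsec` namespace, the chart repository URL, the `startup_guard` and `pod_crash_looping_expr` locals, and the base alert expression are all assumptions made for illustration. Only the values `requests.cpu = 4`, `timeout = 600`, `wait = true`, `wait_for_jobs = true`, and the guard expression itself come from the commit message.

```hcl
# Hypothetical sketch: names, namespace, and chart coordinates are assumptions.

resource "kubernetes_resource_quota" "crowdsec" {
  metadata {
    name      = "crowdsec-quota" # assumed name
    namespace = "crowdsec"       # assumed namespace
  }

  spec {
    hard = {
      # Raised from 1 to 4 so rolling upgrades can schedule replacement pods.
      "requests.cpu" = "4"
    }
  }
}

resource "helm_release" "crowdsec" {
  name       = "crowdsec"                                    # assumed release name
  namespace  = "crowdsec"                                    # assumed namespace
  repository = "https://crowdsecurity.github.io/helm-charts" # assumed chart source
  chart      = "crowdsec"

  timeout       = 600  # seconds; reduced from 3600
  wait          = true # block until resources report ready
  wait_for_jobs = true # also block until chart Jobs complete
}

locals {
  # Guard taken from the commit message: true only once more than
  # 900 seconds (15 minutes) have passed since process start.
  startup_guard = "and on() (time() - process_start_time_seconds) > 900"

  # Hypothetical base expression, shown only to illustrate how the guard
  # is appended to a rate/increase-based alert.
  pod_crash_looping_expr = "(increase(kube_pod_container_status_restarts_total[15m]) > 3) ${local.startup_guard}"
}
```

Keeping the guard in a single local means the same suffix can be appended to each of the eight affected alert expressions, so the 900-second window only needs to be changed in one place.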
modules	Reduce downtime during platform stack applies	2026-03-18 08:03:59 +00:00
.gitkeep	[ci skip] Add Terragrunt directory skeleton and root config	2026-02-22 13:01:37 +00:00
.terraform.lock.hcl	Woodpecker CI deploy commit [CI SKIP]	2026-03-07 20:47:22 +00:00
backend.tf	Woodpecker CI deploy commit [CI SKIP]	2026-03-07 20:47:22 +00:00
main.tf	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-18 08:03:58 +00:00
providers.tf	[ci skip] fix false-positive sensitive=true on kube_config_path	2026-03-07 15:48:19 +00:00
redis-25.3.2.tgz	[ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache	2026-03-06 23:55:57 +00:00
secrets	[ci skip] Migrate 22 platform service states to stacks/platform	2026-02-22 13:35:10 +00:00
terragrunt.hcl	[ci skip] Add platform stack (core services) for Terragrunt migration	2026-02-22 13:21:09 +00:00
tiers.tf	[ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk	2026-02-28 19:08:06 +00:00