infra

Viktor Barzin 17065304dc	2026-03-14 11:28:17 +00:00
Fix NFSServerUnresponsive false positives

Root cause: sum(rate(node_nfs_requests_total[5m])) == 0 was too fragile:
- rate() returns nothing after Prometheus restarts (needs 2 scrapes)
- Individual nodes show zero NFS rate during scrape gaps or low activity
- The sum() could hit zero during quiet hours + scrape gaps

New expression uses:
- changes() instead of rate(): works with a single scrape
- Per-instance aggregation: count nodes with any NFS counter change
- Threshold < 2 nodes: single-node restarts won't trigger, real NFS outage (all nodes affected) will
- Prometheus startup guard: skip first 15m after restart to avoid false positives from empty TSDB
- Wider 15m changes() window to smooth out scrape gaps
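
The commit message describes the new alert expression but does not quote it. A rough PromQL sketch of an expression with those properties might look like the following. This is not the expression from the commit: the `or vector(0)` fallback, the use of `process_start_time_seconds` for the startup guard, and the `job="prometheus"` label are all assumptions made for illustration.

```promql
# Sketch only; not the actual rule from this commit.
# Fires when fewer than 2 nodes saw any NFS counter change in the last 15m.
(
  count(
    # Per-instance aggregation: keep only nodes whose NFS counters changed.
    sum by (instance) (changes(node_nfs_requests_total[15m])) > 0
  )
  # Assumption: count() over an empty set returns nothing, so fall back to 0
  # to let a full outage (no active nodes at all) still satisfy "< 2".
  or vector(0)
) < 2
# Startup guard (assumed implementation): only evaluate once Prometheus
# has been up for more than 15 minutes. The job label is hypothetical.
and on() (time() - process_start_time_seconds{job="prometheus"}) > 15 * 60
```

Counting active nodes per instance, rather than summing a cluster-wide rate, is what lets a single quiet or restarting node pass without firing while an outage affecting every node still drops the count below the threshold.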
..
modules	Fix NFSServerUnresponsive false positives	2026-03-14 11:28:17 +00:00
.gitkeep	[ci skip] Add Terragrunt directory skeleton and root config	2026-02-22 13:01:37 +00:00
.terraform.lock.hcl	Woodpecker CI deploy commit [CI SKIP]	2026-03-07 20:47:22 +00:00
backend.tf	Woodpecker CI deploy commit [CI SKIP]	2026-03-07 20:47:22 +00:00
main.tf	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
providers.tf	[ci skip] fix false-positive sensitive=true on kube_config_path	2026-03-07 15:48:19 +00:00
redis-25.3.2.tgz	[ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache	2026-03-06 23:55:57 +00:00
secrets	[ci skip] Migrate 22 platform service states to stacks/platform	2026-02-22 13:35:10 +00:00
terragrunt.hcl	[ci skip] Add platform stack (core services) for Terragrunt migration	2026-02-22 13:21:09 +00:00
tiers.tf	[ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk	2026-02-28 19:08:06 +00:00