infra/stacks/monitoring
Viktor Barzin 5da6d75094 fix(monitoring): PodCrashLooping alert now fires only for active CrashLoopBackOff
Switch from restart-count based detection (increase restarts[1h] > 5) to
waiting-reason based (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}).
Alert auto-resolves when pod recovers, making it clear whether the issue is active.
2026-04-12 12:41:07 +01:00
..
modules/monitoring fix(monitoring): PodCrashLooping alert now fires only for active CrashLoopBackOff 2026-04-12 12:41:07 +01:00
main.tf add TrueNAS Cloud Sync monitor CronJob and bump Prometheus Helm timeout 2026-03-23 02:24:39 +02:00
secrets extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
terragrunt.hcl extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
tiers.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00