Switch from restart-count based detection (increase restarts[1h] > 5) to
waiting-reason based (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}).
Alert auto-resolves when pod recovers, making it clear whether the issue is active.
|
||
|---|---|---|
| .. | ||
| modules/monitoring | ||
| main.tf | ||
| secrets | ||
| terragrunt.hcl | ||
| tiers.tf | ||