After k8s-node1 was silently cordoned and broke Frigate camera streams, existing alerts (NvidiaExporterDown, PodUnschedulable) didn't catch the root cause proactively. This alert fires within 5m of the GPU node being cordoned, before any pod restart attempts to schedule and fails. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| modules/monitoring | ||
| main.tf | ||
| secrets | ||
| terragrunt.hcl | ||