infra/stacks/monitoring/modules
Viktor Barzin 877cd15b45 fix: increase tier-2-gpu quota to 12Gi, add NvidiaExporterDown alert
- Increase tier-2-gpu requests.memory from 8Gi to 12Gi to give immich
  ML pods scheduling headroom (was at 96% utilization)
- Add critical NvidiaExporterDown Prometheus alert that fires when GPU
  metrics are absent for >10 minutes (faster than generic ScrapeTargetDown)
2026-03-23 03:04:33 +02:00
monitoring fix: increase tier-2-gpu quota to 12Gi, add NvidiaExporterDown alert 2026-03-23 03:04:33 +02:00