- Increase tier-2-gpu requests.memory from 8Gi to 12Gi to give immich ML pods scheduling headroom (was at 96% utilization) - Add critical NvidiaExporterDown Prometheus alert that fires when GPU metrics are absent for >10 minutes (faster than generic ScrapeTargetDown) |
||
|---|---|---|
| .. | ||
| modules/kyverno | ||
| main.tf | ||
| secrets | ||
| terragrunt.hcl | ||
| tiers.tf | ||