- Increase tier-2-gpu requests.memory from 8Gi to 12Gi to give immich ML pods scheduling headroom (was at 96% utilization) - Add critical NvidiaExporterDown Prometheus alert that fires when GPU metrics are absent for >10 minutes (faster than generic ScrapeTargetDown) |
||
|---|---|---|
| .. | ||
| dependency-init-containers.tf | ||
| main.tf | ||
| registry-credentials.tf | ||
| resource-governance.tf | ||
| security-policies.tf | ||