infra/modules
Viktor Barzin 4a857ebefd Add per-pod GPU memory metrics exporter
- Add DaemonSet that runs on GPU node and exposes Prometheus metrics
- Uses nvidia-smi to collect per-process GPU memory usage
- Maps PIDs to container IDs via /proc/<pid>/cgroup
- Exposes gpu_pod_memory_used_bytes metric at :9401/metrics
- Add Prometheus scrape config for gpu-pod-memory job

[ci skip]
2026-01-31 16:58:14 +00:00
..
create-template-vm add nginx reverse proxy to serialize registyr requests for the same path to avoid race conditions [ci skip] 2025-12-29 20:16:13 +00:00
create-vm add startup_shutdown to qemu vms to avoid metadata reset [ci skip] 2025-12-29 10:19:22 +00:00
docker-registry add nginx reverse proxy to serialize registyr requests for the same path to avoid race conditions [ci skip] 2025-12-29 20:16:13 +00:00
kubernetes Add per-pod GPU memory metrics exporter 2026-01-31 16:58:14 +00:00