infra

Viktor Barzin 4a857ebefd	2026-01-31 16:58:14 +00:00
Add per-pod GPU memory metrics exporter
- Add DaemonSet that runs on GPU node and exposes Prometheus metrics
- Uses nvidia-smi to collect per-process GPU memory usage
- Maps PIDs to container IDs via /proc/<pid>/cgroup
- Exposes gpu_pod_memory_used_bytes metric at :9401/metrics
- Add Prometheus scrape config for gpu-pod-memory job
[ci skip]
create-template-vm	add nginx reverse proxy to serialize registry requests for the same path to avoid race conditions [ci skip]	2025-12-29 20:16:13 +00:00
create-vm	add startup_shutdown to qemu vms to avoid metadata reset [ci skip]	2025-12-29 10:19:22 +00:00
docker-registry	add nginx reverse proxy to serialize registry requests for the same path to avoid race conditions [ci skip]	2025-12-29 20:16:13 +00:00
kubernetes	Add per-pod GPU memory metrics exporter	2026-01-31 16:58:14 +00:00
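
The latest commit under kubernetes describes an exporter that reads per-process GPU memory from nvidia-smi, maps each PID to a container ID via /proc/<pid>/cgroup, and publishes gpu_pod_memory_used_bytes. The exporter's source is not shown in this listing, so the following is only a rough sketch of how those three steps could fit together. The function names, the container_id label, the 64-hex-character container ID pattern, and the assumption that nvidia-smi is queried with --query-compute-apps=pid,used_memory --format=csv,noheader,nounits (memory reported in MiB) are all illustrative and not taken from the repository.

```python
import re
from typing import Dict, Optional

# Assumption: container runtimes embed a 64-character hex container ID
# somewhere in the cgroup path (for example "cri-containerd-<id>.scope").
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text: str) -> Optional[str]:
    """Return the container ID found in the text of /proc/<pid>/cgroup, if any."""
    for line in cgroup_text.splitlines():
        matches = _CONTAINER_ID.findall(line)
        if matches:
            # The container ID is the innermost (last) path component.
            return matches[-1]
    return None


def parse_nvidia_smi_csv(output: str) -> Dict[int, int]:
    """Parse 'pid, used_memory' rows (memory in MiB) into {pid: bytes}."""
    usage: Dict[int, int] = {}
    for row in output.strip().splitlines():
        fields = [field.strip() for field in row.split(",")]
        if len(fields) != 2 or not fields[0].isdigit() or not fields[1].isdigit():
            continue  # skip rows such as "[N/A]" that cannot be parsed
        usage[int(fields[0])] = int(fields[1]) * 1024 * 1024
    return usage


def format_metric(container_id: str, used_bytes: int) -> str:
    """Render one sample in the Prometheus text exposition format."""
    # Hypothetical label name; the real exporter may label by pod or namespace.
    return f'gpu_pod_memory_used_bytes{{container_id="{container_id}"}} {used_bytes}'
```

A collector loop would run nvidia-smi, pass its output to parse_nvidia_smi_csv, read /proc/<pid>/cgroup for each PID, and serve the formatted lines on the metrics port. Resolving a container ID to a pod name would need an extra lookup that this sketch leaves out.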