- Add DaemonSet that runs on GPU node and exposes Prometheus metrics
- Uses nvidia-smi to collect per-process GPU memory usage
- Maps PIDs to container IDs via /proc/<pid>/cgroup
- Exposes gpu_pod_memory_used_bytes metric at :9401/metrics
- Add Prometheus scrape config for gpu-pod-memory job

[ci skip]

| |
|---|
| create-template-vm |
| create-vm |
| docker-registry |
| kubernetes |
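
The commit message describes mapping PIDs reported by nvidia-smi to container IDs via /proc/<pid>/cgroup and exposing the result as gpu_pod_memory_used_bytes. A minimal sketch of that flow, not the repository's actual code, might look like the following. The function names, the `container_id` label, and the assumption that container IDs appear as 64 hex characters in the cgroup path are all hypothetical; the input is assumed to come from `nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader,nounits`, which reports memory in MiB.

```python
import re

# Assumption: container IDs show up as 64 hex characters in the cgroup path
# (typical for Docker, containerd, and CRI-O; other runtimes may differ).
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text):
    """Return a container ID found in the text of /proc/<pid>/cgroup, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-ID:controllers:path".
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        matches = _CONTAINER_ID.findall(parts[2])
        if matches:
            # The innermost (last) match is taken as the container itself.
            return matches[-1]
    return None


def parse_nvidia_smi_csv(output):
    """Parse "pid, used_memory" CSV rows (MiB, no header, no units) into {pid: bytes}."""
    usage = {}
    for row in output.strip().splitlines():
        fields = [f.strip() for f in row.split(",")]
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue  # skip malformed rows or "[N/A]" values
        usage[int(fields[0])] = int(fields[1]) * 1024 * 1024
    return usage


def render_metrics(usage_by_container):
    """Render {container_id: bytes} in the Prometheus text exposition format."""
    lines = ["# TYPE gpu_pod_memory_used_bytes gauge"]
    for cid, used in sorted(usage_by_container.items()):
        # The label name "container_id" is hypothetical.
        lines.append('gpu_pod_memory_used_bytes{container_id="%s"} %d' % (cid, used))
    return "\n".join(lines) + "\n"
```

In a real exporter, the parsed PIDs would be looked up by reading `/proc/<pid>/cgroup` on the host, usage would be summed per container, and the rendered text would be served at :9401/metrics.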