6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
186 lines
6.2 KiB
Markdown
186 lines
6.2 KiB
Markdown
---
|
|
name: k8s-gpu-no-nvidia-devices
|
|
description: |
|
|
Fix for Kubernetes GPU pods showing "CUDA not supported" or no /dev/nvidia* devices
|
|
despite nvidia.com/gpu resource allocation. Use when: (1) container runs but torch.cuda.is_available()
|
|
returns False, (2) ls /dev/nvidia* shows "no matches found", (3) nvidia-smi fails inside pod
|
|
but works on host, (4) PyTorch/TensorFlow falls back to CPU despite GPU allocation.
|
|
Covers NVIDIA device plugin, time-slicing, and container runtime issues.
|
|
author: Claude Code
|
|
version: 1.1.0
|
|
date: 2026-03-01
|
|
---
|
|
|
|
# Kubernetes GPU Pod - No NVIDIA Devices Found
|
|
|
|
## Problem
|
|
|
|
A Kubernetes pod requests GPU resources (`nvidia.com/gpu: 1`) and schedules on a GPU node,
|
|
but inside the container there are no NVIDIA devices visible. The application falls back
|
|
to CPU with messages like "CUDA not supported by the Torch installed!" despite running
|
|
in a CUDA-enabled container image.
|
|
|
|
## Context / Trigger Conditions
|
|
|
|
- Pod shows `Running` status and is on a node with `gpu=true` label
|
|
- `kubectl describe pod` shows GPU limit/request is satisfied
|
|
- Inside container: `ls /dev/nvidia*` returns "no matches found"
|
|
- Inside container: `nvidia-smi` fails or command not found
|
|
- Application logs show: "CUDA not supported", "Switching to CPU", "torch.cuda.is_available() = False"
|
|
- On the host node: `nvidia-smi` works fine
|
|
|
|
## Solution
|
|
|
|
### Step 1: Verify GPU Availability
|
|
|
|
Check if other pods are consuming the GPU:
|
|
|
|
```bash
|
|
# List all pods using GPU resources
|
|
kubectl get pods -A -o json | jq -r '.items[] | select(.spec.containers[].resources.limits."nvidia.com/gpu" != null) | "\(.metadata.namespace)/\(.metadata.name)"'
|
|
|
|
# Check NVIDIA device plugin pods
|
|
kubectl get pods -n nvidia -l app=nvidia-device-plugin
|
|
kubectl logs -n nvidia -l app=nvidia-device-plugin --tail=50
|
|
```
|
|
|
|
### Step 2: Free GPU Resources
|
|
|
|
If another workload is using the GPU, unload it:
|
|
|
|
```bash
|
|
# For Ollama specifically
|
|
kubectl exec -n ollama deployment/ollama -- ollama stop <model_name>
|
|
|
|
# Or scale down the conflicting deployment
|
|
kubectl scale deployment/<name> -n <namespace> --replicas=0
|
|
```
|
|
|
|
### Step 3: Restart the Affected Pod
|
|
|
|
After freeing GPU resources, restart the pod to get fresh device allocation:
|
|
|
|
```bash
|
|
kubectl rollout restart deployment/<name> -n <namespace>
|
|
|
|
# Or delete the pod directly
|
|
kubectl delete pod <pod-name> -n <namespace>
|
|
```
|
|
|
|
### Step 4: Verify GPU Access
|
|
|
|
```bash
|
|
# Check devices are now visible
|
|
kubectl exec -n <namespace> deployment/<name> -- ls -la /dev/nvidia*
|
|
|
|
# Test nvidia-smi
|
|
kubectl exec -n <namespace> deployment/<name> -- nvidia-smi
|
|
|
|
# Test PyTorch CUDA
|
|
kubectl exec -n <namespace> deployment/<name> -- python3 -c "import torch; print('CUDA:', torch.cuda.is_available())"
|
|
```
|
|
|
|
## Verification
|
|
|
|
After restart, you should see:
|
|
|
|
```
|
|
/dev/nvidia0
|
|
/dev/nvidiactl
|
|
/dev/nvidia-uvm
|
|
/dev/nvidia-uvm-tools
|
|
```
|
|
|
|
And `nvidia-smi` should show the GPU with your container process.
|
|
|
|
## Example
|
|
|
|
```bash
|
|
# Problem: ebook2audiobook shows "CUDA not supported"
|
|
$ kubectl exec -n ebook2audiobook deployment/ebook2audiobook -- ls /dev/nvidia*
|
|
zsh:1: no matches found: /dev/nvidia*
|
|
|
|
# Solution: Unload Ollama model holding the GPU
|
|
$ kubectl exec -n ollama deployment/ollama -- ollama ps
|
|
NAME SIZE PROCESSOR
|
|
qwen2.5:14b 10 GB 33%/67% CPU/GPU
|
|
|
|
$ kubectl exec -n ollama deployment/ollama -- ollama stop qwen2.5:14b
|
|
|
|
# Restart the affected pod
|
|
$ kubectl rollout restart deployment/ebook2audiobook -n ebook2audiobook
|
|
|
|
# Verify
|
|
$ kubectl exec -n ebook2audiobook deployment/ebook2audiobook -- nvidia-smi
|
|
# Should now show the Tesla T4 GPU
|
|
```
|
|
|
|
## Notes
|
|
|
|
- **GPU Time-Slicing**: If using NVIDIA GPU time-slicing (configured in GPU Operator),
|
|
multiple pods can share a GPU. However, device injection still requires proper timing.
|
|
|
|
- **Pod Scheduling Order**: Pods that start while GPU is fully allocated may not get
|
|
devices injected even after GPU becomes available - a restart is required.
|
|
|
|
- **Container Runtime**: The NVIDIA Container Toolkit must be properly configured.
|
|
Issues can arise from:
|
|
- cgroup driver mismatch (systemd vs cgroupfs)
|
|
- Container updates causing device loss
|
|
- SELinux blocking device access
|
|
|
|
- **Image Compatibility**: The container image must have CUDA libraries matching the
|
|
driver version. Check with `nvidia-smi` on host for driver version.
|
|
|
|
- **This Cluster**: Uses NVIDIA GPU Operator with time-slicing (20 replicas per GPU).
|
|
GPU node is `k8s-node1` with Tesla T4.
|
|
|
|
## See Also
|
|
|
|
- Check GPU Operator status: `kubectl get pods -n nvidia`
|
|
- View time-slicing config: `kubectl get configmap -n nvidia time-slicing-config -o yaml`
|
|
|
|
## Automatic GPU Recovery via Liveness Probe
|
|
|
|
To prevent GPU loss from requiring manual intervention, add a liveness probe that checks
|
|
both GPU availability and application health. Example for Frigate (but applicable to any
|
|
GPU workload):
|
|
|
|
```hcl
|
|
# Restart pod if GPU becomes unavailable or app hangs
|
|
liveness_probe {
|
|
exec {
|
|
command = ["sh", "-c", "nvidia-smi > /dev/null 2>&1 && curl -sf http://localhost:<port>/health > /dev/null"]
|
|
}
|
|
initial_delay_seconds = 120
|
|
period_seconds = 60
|
|
timeout_seconds = 10
|
|
failure_threshold = 3
|
|
}
|
|
# Allow time for GPU model loading at startup
|
|
startup_probe {
|
|
http_get {
|
|
path = "/health"
|
|
port = <port>
|
|
}
|
|
period_seconds = 10
|
|
failure_threshold = 30 # up to 5 minutes
|
|
}
|
|
```
|
|
|
|
The liveness probe checks:
|
|
- `nvidia-smi` — fails if GPU devices are no longer accessible (CUDA context corruption, device plugin issues)
|
|
- `curl` health endpoint — fails if the application process is hung
|
|
|
|
If either fails 3 times in a row (3 minutes), Kubernetes automatically restarts the pod,
|
|
which re-acquires the GPU device through the NVIDIA device plugin.
|
|
|
|
**Important**: Always pair with a `startup_probe` when using GPU workloads — model loading
|
|
(TensorRT, ONNX, PyTorch) can take several minutes and would trip a liveness probe
|
|
configured with a short `initial_delay_seconds`.
|
|
|
|
## References
|
|
|
|
- [NVIDIA Container Toolkit Troubleshooting](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/troubleshooting.html)
|
|
- [Kubernetes GPU Device Plugin](https://github.com/NVIDIA/k8s-device-plugin)
|
|
- [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html)
|