Add GPU node taint tolerations and enhance GPU memory exporter
Add nvidia.com/gpu toleration to all GPU workloads (frigate, ollama) to support NoSchedule taint on GPU nodes. Update nvidia operator helm values with daemonset tolerations. Enhance GPU pod memory exporter with Kubernetes API integration to resolve container IDs to pod names/namespaces, adding RBAC resources for API access.
parent ffa80f0df6
commit 1275697f2b
5 changed files with 188 additions and 12 deletions
@@ -17,3 +17,11 @@ driver:
 devicePlugin:
   config:
     name: time-slicing-config
+
+# Tolerate GPU node taint for all GPU operator components
+daemonsets:
+  tolerations:
+    - key: "nvidia.com/gpu"
+      operator: "Equal"
+      value: "true"
+      effect: "NoSchedule"
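For reference, the workload side of this change (the toleration described in the commit message for frigate and ollama) would look roughly like the fragment below. This is a minimal sketch, not the manifests from this commit: the Deployment name, labels, image, and GPU limit are hypothetical placeholders, and the node taint is assumed to be `nvidia.com/gpu=true:NoSchedule`, inferred from the toleration in the operator values.

```yaml
# Hypothetical Deployment fragment; name, labels, and image are placeholders,
# not taken from the commit.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-gpu-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-gpu-workload
  template:
    metadata:
      labels:
        app: example-gpu-workload
    spec:
      # Assumed to match a node taint of nvidia.com/gpu=true:NoSchedule.
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: app
          image: example.invalid/gpu-app:latest
          resources:
            limits:
              nvidia.com/gpu: 1  # placeholder GPU request
```

A toleration only permits scheduling onto the tainted node; it does not pin the pod there. Pinning would need a `nodeSelector` or node affinity in addition.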