Add GPU node taint tolerations and enhance GPU memory exporter
Add an nvidia.com/gpu toleration to all GPU workloads (frigate, ollama) so they can schedule onto GPU nodes that carry a NoSchedule taint. Update the NVIDIA operator Helm values with daemonset tolerations. Enhance the GPU pod memory exporter with Kubernetes API integration that resolves container IDs to pod names and namespaces, and add the RBAC resources required for API access.
parent d9a4417257
commit 9689b67895
5 changed files with 188 additions and 12 deletions
@@ -17,3 +17,11 @@ driver:
 devicePlugin:
   config:
     name: time-slicing-config
+
+# Tolerate GPU node taint for all GPU operator components
+daemonsets:
+  tolerations:
+    - key: "nvidia.com/gpu"
+      operator: "Equal"
+      value: "true"
+      effect: "NoSchedule"
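
For context, the workload side of this change (frigate, ollama) is not shown in this hunk. A minimal sketch of what a matching toleration could look like on a GPU workload follows. The Deployment name, labels, image, and GPU limit are assumptions for illustration and are not taken from the commit. Only the toleration fields mirror the values in the diff above.

```yaml
# Hypothetical GPU workload; metadata, image, and resources are illustrative only.
# Assumes the GPU node carries the taint nvidia.com/gpu=true:NoSchedule.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama            # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ollama         # assumed label
  template:
    metadata:
      labels:
        app: ollama
    spec:
      # Same key/operator/value/effect as the operator daemonset toleration,
      # so the pod is allowed onto the tainted GPU node.
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: ollama
          image: ollama/ollama   # assumed image
          resources:
            limits:
              nvidia.com/gpu: 1  # assumed GPU request
```

A toleration only permits scheduling onto the tainted node; it does not steer the pod there. With `operator: "Equal"`, the toleration matches only when the taint's key, value, and effect all agree, so a taint with a different value would still repel the pod.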