nvidia: bump driver container memory limit 128Mi → 2Gi

After rolling back k8s-node1's kernel to 6.8.0-117 and spoofing
/etc/os-release to 24.04 so the operator picked the matching
ubuntu24.04 driver image (both steps per the workaround documented in
docs/known-issues.md), the driver container still went into a restart
loop. Container status:

    lastState.terminated: { reason: "OOMKilled", exitCode: 137 }

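A minimal sketch of reading that status. It assumes the pod JSON has the shape `kubectl get pod -o json` returns (`status.containerStatuses[].lastState.terminated`); the `pod` dict and the container name `driver` are hand-written, hypothetical stand-ins, not captured output. Exit codes above 128 conventionally mean "killed by signal N" as 128 + N, so 137 decodes to signal 9 (SIGKILL), which is what the kernel OOM killer sends.

```python
import signal

# Hypothetical stand-in for `kubectl get pod <name> -o json` output;
# only the fields read below are included.
pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "driver",  # hypothetical container name
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ]
    }
}


def describe_termination(terminated: dict) -> str:
    """Summarise a lastState.terminated block, decoding 128+N exit codes."""
    code = terminated["exitCode"]
    reason = terminated.get("reason", "unknown")
    if code > 128:
        # Death by signal N is conventionally reported as exit code 128 + N.
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = "unknown signal"
        return f"{reason}: exit {code} = 128 + {code - 128} ({name})"
    return f"{reason}: exit {code}"


for cs in pod["status"]["containerStatuses"]:
    term = cs.get("lastState", {}).get("terminated")
    if term:
        print(cs["name"], "->", describe_termination(term))
```

On Linux this prints `driver -> OOMKilled: exit 137 = 128 + 9 (SIGKILL)`.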
The driver-installer was hitting the namespace LimitRange default
limit of 128Mi during `apt-get install linux-headers-6.8.0-117-generic`:
the last log line on every restart was "Installing Linux kernel
headers..." before SIGKILL. A 2Gi limit gives apt and the DKMS compile
step enough headroom; peak usage observed during a successful compile
in a test container was ~1.4Gi.

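The headroom arithmetic behind the 2Gi choice, as a small sketch. It assumes only the binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) that appear in this change; Kubernetes quantities also accept decimal suffixes and exponent forms, which this hypothetical `to_bytes` helper deliberately does not handle. The ~1.4Gi peak is the approximate figure quoted above.

```python
# Binary (power-of-two) suffixes only; decimal suffixes (k, M, G) and
# exponent forms are intentionally out of scope for this sketch.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> float:
    """Convert a binary-suffixed memory quantity such as '128Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity)  # bare number: already in bytes


old_limit = to_bytes("128Mi")  # namespace LimitRange default
new_limit = to_bytes("2Gi")    # explicit limit set by this commit
peak = to_bytes("1.4Gi")       # approximate peak observed during the DKMS build

print(f"peak / old limit: {peak / old_limit:.1f}x")
print(f"peak / new limit: {peak / new_limit:.0%}")
print(f"headroom under new limit: {(new_limit - peak) / 2**20:.0f}Mi")
```

This prints `11.2x`, `70%` and `614Mi`: the old default was roughly an order of magnitude too small, while 2Gi leaves about 30% of the limit free above the observed peak.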
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

parent d06a34ccc7
commit a2d23c1dfb
1 changed file with 13 additions and 0 deletions

@@ -28,6 +28,19 @@ driver:
   upgradePolicy:
     autoUpgrade: false
 
+  # 2026-05-17: bumped from the namespace LimitRange default of 128Mi.
+  # The driver-installer's `apt-get install linux-headers-<kernel>` step
+  # exceeded 128Mi and OOMKilled (exit 137) before producing any visible
+  # output beyond "Installing Linux kernel headers...". 2Gi limit gives
+  # the apt + module-compile phase enough headroom (peak observed ~1.4Gi
+  # while DKMS builds the kernel module).
+  resources:
+    requests:
+      cpu: "50m"
+      memory: "256Mi"
+    limits:
+      memory: "2Gi"
+
 devicePlugin:
   config:
     name: time-slicing-config
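Why the explicit `limits.memory` fixes the restart loop: a LimitRange `default` is only filled in for resources the container spec leaves unset, so setting the limit in the values overrides the 128Mi namespace default. The sketch below is a deliberately simplified model of that admission-time defaulting (it ignores `min`/`max` validation and `defaultRequest` handling). The dict shapes mirror Kubernetes field names, but `apply_limitrange_default` is a hypothetical helper, not a Kubernetes API, and the `before` spec is an assumed "no resources block" container.

```python
import copy


def apply_limitrange_default(container: dict, default_limits: dict) -> dict:
    """Simplified model of LimitRange limit defaulting for one container.

    Fills only the limits the container leaves unset; real admission also
    validates min/max and applies defaultRequest, which this sketch skips.
    """
    result = copy.deepcopy(container)  # leave the caller's spec untouched
    limits = result.setdefault("resources", {}).setdefault("limits", {})
    for resource, value in default_limits.items():
        limits.setdefault(resource, value)  # explicit values always win
    return result


namespace_default = {"memory": "128Mi"}  # the LimitRange default from the commit

before = {"name": "driver"}  # hypothetical: no resources block in the values
after = {
    "name": "driver",
    "resources": {
        "requests": {"cpu": "50m", "memory": "256Mi"},
        "limits": {"memory": "2Gi"},
    },
}

for label, spec in (("before", before), ("after", after)):
    effective = apply_limitrange_default(spec, namespace_default)
    print(label, effective["resources"]["limits"]["memory"])
```

This prints `before 128Mi` and `after 2Gi`.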