nvidia: bump driver container memory limit 128Mi → 2Gi

After rolling back k8s-node1's kernel to 6.8.0-117 and spoofing
/etc/os-release to 24.04 so that the operator picked the matching
ubuntu24.04 driver image (both steps per the workaround documented in
docs/known-issues.md), the driver container still went into a restart
loop. Container status:

    lastState.terminated: { reason: "OOMKilled", exitCode: 137 }
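
The status above can also be checked programmatically. This is a minimal sketch, assuming the pod object has been exported as JSON (for example with `kubectl get pod <name> -o json`). The helper name `oom_killed_containers` and the container names in the sample are hypothetical and not part of this commit. Exit code 137 is 128 + 9, i.e. the process was terminated by SIGKILL.

```python
import json


def oom_killed_containers(pod: dict) -> list:
    """Return names of containers whose last termination reason was OOMKilled."""
    status = pod.get("status") or {}
    # Look at init containers and regular containers; either list may be absent or null.
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    names = []
    for cs in statuses:
        terminated = (cs.get("lastState") or {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(cs.get("name", "<unnamed>"))
    return names


# Hypothetical sample mirroring the status shown in the commit message.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "driver",
        "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}
      },
      {"name": "sidecar", "lastState": {}}
    ]
  }
}
""")

print(oom_killed_containers(sample_pod))
```

Only `lastState` is inspected here, since a container in a restart loop is usually running or waiting at the moment it is queried.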

The driver-installer was hitting the namespace LimitRange default of
128Mi during `apt-get install linux-headers-6.8.0-117-generic`: the
last log line on every restart was "Installing Linux kernel
headers..." before the container was SIGKILLed. 2Gi gives apt and the
DKMS compile step enough headroom; the peak observed during a
successful compile in a test container was ~1.4Gi.
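
The headroom arithmetic behind the new limit can be sketched as follows. This is an illustration only: `to_bytes` is a hypothetical helper that handles just the binary suffixes of Kubernetes memory quantities, and the ~1.4Gi peak is treated as exactly 1.4Gi.

```python
from decimal import Decimal

# Binary suffixes used by Kubernetes memory quantities (subset, for illustration).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '128Mi' or '1.4Gi' to a whole number of bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: plain byte count


old_limit = to_bytes("128Mi")  # namespace LimitRange default
new_limit = to_bytes("2Gi")    # limit set by this commit
peak = to_bytes("1.4Gi")       # approximate peak from the test container

print("peak exceeds old limit:", peak > old_limit)
print("peak / old limit:", round(peak / old_limit, 1))
print("new limit / peak:", round(new_limit / peak, 2))
print("headroom (Mi):", round((new_limit - peak) / 2**20, 1))
```

Under these assumptions the observed peak is roughly 11 times the old limit, and the new limit leaves about 0.6Gi above it.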

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-17 11:23:52 +00:00
parent d06a34ccc7
commit a2d23c1dfb

@@ -28,6 +28,19 @@ driver:
  upgradePolicy:
    autoUpgrade: false
  # 2026-05-17: bumped from the namespace LimitRange default of 128Mi.
  # The driver-installer's `apt-get install linux-headers-<kernel>` step
  # exceeded 128Mi and OOMKilled (exit 137) before producing any visible
  # output beyond "Installing Linux kernel headers...". 2Gi limit gives
  # the apt + module-compile phase enough headroom (peak observed ~1.4Gi
  # while DKMS builds the kernel module).
  resources:
    requests:
      cpu: "50m"
      memory: "256Mi"
    limits:
      memory: "2Gi"
devicePlugin:
  config:
    name: time-slicing-config