From a2d23c1dfb04f510fa4c7a4efd1de46b94db491a Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sun, 17 May 2026 11:23:52 +0000
Subject: [PATCH] nvidia: bump driver container memory limit 128Mi → 2Gi
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

After rolling back k8s-node1's kernel to 6.8.0-117 and spoofing
/etc/os-release to 24.04 so the operator picked the matching ubuntu24.04
driver image (everything per the workaround documented in
docs/known-issues.md), the driver container still went into a restart
loop. Container status:

    lastState.terminated: { reason: "OOMKilled", exitCode: 137 }

The driver-installer was hitting the namespace LimitRange default of
128Mi during `apt-get install linux-headers-6.8.0-117-generic`; the last
log line on every restart was "Installing Linux kernel headers..." before
SIGKILL.

2Gi gives apt + the DKMS compile step enough headroom; peak observed
during a successful compile in a test container was ~1.4Gi.

Co-Authored-By: Claude Opus 4.7
---
 stacks/nvidia/modules/nvidia/values.yaml | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/stacks/nvidia/modules/nvidia/values.yaml b/stacks/nvidia/modules/nvidia/values.yaml
index ac5f130d..5d115695 100644
--- a/stacks/nvidia/modules/nvidia/values.yaml
+++ b/stacks/nvidia/modules/nvidia/values.yaml
@@ -28,6 +28,19 @@ driver:
   upgradePolicy:
     autoUpgrade: false
 
+  # 2026-05-17: bumped from the namespace LimitRange default of 128Mi.
+  # The driver-installer's `apt-get install linux-headers-` step
+  # exceeded 128Mi and OOMKilled (exit 137) before producing any visible
+  # output beyond "Installing Linux kernel headers...". 2Gi limit gives
+  # the apt + module-compile phase enough headroom (peak observed ~1.4Gi
+  # while DKMS builds the kernel module).
+  resources:
+    requests:
+      cpu: "50m"
+      memory: "256Mi"
+    limits:
+      memory: "2Gi"
+
 devicePlugin:
   config:
     name: time-slicing-config
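The diagnosis in this commit (spotting `OOMKilled`/137 in `lastState.terminated`, then sizing the new limit against the observed peak) can be scripted. Below is a minimal sketch, assuming the input is pod JSON of the shape `kubectl get pod <name> -o json` returns. It lists containers whose previous run was OOM-killed and converts Kubernetes memory quantities to bytes so the old 128Mi default, the ~1.4Gi peak and the new 2Gi limit can be compared. The function names, the sample pod and its container names (`nvidia-driver-ctr`, `sidecar`) are hypothetical, and the quantity parser handles only the common suffixes.

```python
from decimal import Decimal

# Subset of Kubernetes memory-quantity suffixes (binary and decimal) and
# their byte multipliers. Exponent forms like "1e3" and the "m" suffix
# are deliberately not handled in this sketch.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "2Gi" to bytes."""
    for suffix, multiplier in _SUFFIXES.items():
        if quantity.endswith(suffix):
            # Decimal keeps fractional values like "1.4Gi" exact before truncating.
            return int(Decimal(quantity[: -len(suffix)]) * multiplier)
    return int(Decimal(quantity))  # bare byte count, no suffix


def oom_killed_containers(pod: dict) -> list[str]:
    """Return names of containers whose previous run ended in an OOM kill."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState may be empty (no previous termination) or hold "running"/"waiting".
        terminated = status.get("lastState", {}).get("terminated") or {}
        if terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Hypothetical pod status mirroring the restart loop described above;
# container names are illustrative, not taken from the cluster.
SAMPLE_POD = {
    "status": {
        "containerStatuses": [
            {"name": "nvidia-driver-ctr",
             "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
            {"name": "sidecar", "lastState": {}},
        ]
    }
}

if __name__ == "__main__":
    print("OOMKilled:", oom_killed_containers(SAMPLE_POD))
    old, peak, new = (parse_memory(q) for q in ("128Mi", "1.4Gi", "2Gi"))
    print(f"observed peak is {peak / old:.1f}x the old 128Mi default")
    print(f"2Gi leaves {1 - peak / new:.0%} headroom over the observed peak")
```

With the numbers from this commit, the ~1.4Gi peak is roughly eleven times the 128Mi LimitRange default, and a 2Gi limit leaves about 30% headroom above it.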