infra/stacks/nvidia
Viktor Barzin f5cf6ec051 nvidia: bump driver container memory limit 128Mi → 2Gi
After rolling back k8s-node1's kernel to 6.8.0-117 + spoofing
/etc/os-release to 24.04 so the operator picked the matching
ubuntu24.04 driver image (everything per the workaround documented in
docs/known-issues.md), the driver container still went into a restart
loop. Container status:

    lastState.terminated: { reason: "OOMKilled", exitCode: 137 }

The driver-installer was hitting the namespace LimitRange default of
128Mi during `apt-get install linux-headers-6.8.0-117-generic` — the
last log line on every restart was "Installing Linux kernel
headers..." before SIGKILL. 2Gi gives apt + the DKMS compile step
enough headroom; peak observed during a successful compile in a test
container was ~1.4Gi.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:56 +00:00
..
modules/nvidia nvidia: bump driver container memory limit 128Mi → 2Gi 2026-05-22 14:16:56 +00:00
main.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
secrets extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
terragrunt.hcl extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00