diff --git a/docs/post-mortems/2026-05-17-gpu-driver-ubuntu2604-mismatch.md b/docs/post-mortems/2026-05-17-gpu-driver-ubuntu2604-mismatch.md
index d2f40394..cbd825b8 100644
--- a/docs/post-mortems/2026-05-17-gpu-driver-ubuntu2604-mismatch.md
+++ b/docs/post-mortems/2026-05-17-gpu-driver-ubuntu2604-mismatch.md
@@ -130,8 +130,13 @@ to-working state pending an upstream fix or kernel rollback.
 
 - [x] Pin gpu-operator chart to v25.10.1 in TF
 - [x] Document situation in this post-mortem
-- [ ] Roll back k8s-node1 host kernel to 6.8.0-117-generic + apt-mark
-      hold (needs user authorization for node reboot)
+- [x] Roll back k8s-node1 host kernel to 6.8.0-117-generic (done by user;
+      kernel rollback succeeded and NFD now reports
+      `kernel-version.full=6.8.0-117-generic`, `os_release.VERSION_ID=24.04`)
+- [x] Extend driver daemonset startup probe `failureThreshold` from 120 to 300
+      (50 min) in TF `values.yaml` — 2026-05-25. On this hardware the
+      full install sequence (apt headers + gcc compilation + file copy) takes
+      ~21min which exactly exhausted the old 120×10s window.
 - [ ] Add Prometheus alert `GPUNodeNoGPUResource` — fires when a node
       labeled `nvidia.com/gpu.present=true` has `nvidia.com/gpu` capacity
       of 0 for >10m
@@ -143,6 +148,47 @@ to-working state pending an upstream fix or kernel rollback.
       `unattended-upgrades` — `do-release-upgrade` is a separate path
       that should be gated too
 
+## Follow-up Incident: Driver install hang (2026-05-25)
+
+**Date**: 2026-05-25  
+**Status**: Resolved  
+
+After the kernel rollback to 6.8.0-117-generic succeeded, the driver pod
+(`nvidia-driver-daemonset-529vg`) was still reported as "stuck at
+Installing Linux kernel headers..." with no progress for 15–20 min.
+
+**Actual root causes (two compounding issues)**:
+
+1. **Deadlock between k8s-driver-manager and operator-validator**: The
+   `k8s-driver-manager` init container waits for `nvidia-operator-validator`
+   to shut down before it can begin the install sequence. The validator's
+   `driver-validation` init container was in an infinite retry loop polling
+   `/run/nvidia/validations/.driver-ctr-ready` (which the driver creates when
+   ready). Since the driver never finished, the validator never exited. The
+   validator pod had `deletionTimestamp` set but kubelet on node1 couldn't GC
+   it — the container received SIGTERM but remained in "Terminating" state
+   indefinitely, blocking the new driver from starting.
+   **Fix**: Force-deleted the stuck validator pod
+   (`kubectl delete pod -n nvidia nvidia-operator-validator-sff98 --force --grace-period=0`).
+   This broke the deadlock immediately.
+
+2. **Startup probe timeout**: The full driver install sequence on this hardware
+   (6 vCPUs, 16Gi RAM) takes ~21 minutes:
+   - `apt-get install linux-headers-6.8.0-117-generic`: ~2 min
+   - `gcc/make -j16` kernel module build (nvidia, nvidia-uvm, nvidia-modeset,
+     nvidia-peermem): ~12 min  
+   - nvidia-installer file copy + archive integrity check: ~7 min
+   The default startup probe allows exactly `60 + (120 × 10) = 1260s = 21min`.
+   This caused a SIGKILL (exit 137) at 21 minutes even when the install was
+   progressing normally.
+   **Fix**: Patched `driver.startupProbe.failureThreshold` from 120 → 300
+   in `stacks/nvidia/modules/nvidia/values.yaml` (gives 51 min headroom).
+
+**Key observation**: "Installing Linux kernel headers..." is NOT a hang — the
+apt install just takes 2+ min and produces no log output during execution. The
+log line appears before apt runs, so it looks frozen. Check `ps auxf` inside
+the container to confirm apt/dpkg are actively running.
+
 ## Lessons
 
 - **Operator-style charts that auto-detect host OS can silently break
@@ -158,3 +204,9 @@ to-working state pending an upstream fix or kernel rollback.
   24.04 image on a 26.04 host), edit the NFD label — but only as a last
   resort; the chart upgrade made clear the operator will eventually
   reconcile this.
+- **A k8s-driver-manager deadlock on a stuck Terminating validator pod is
+  indistinguishable from an apt hang** — `ps auxf` inside the container is
+  the key diagnostic. Force-deleting a stuck Terminating pod with no
+  finalizers is safe and immediately resolves the deadlock.
+- **Driver startup probe must be sized for the full install wall-clock time**,
+  not just apt or just compilation. On slow hardware, 21 min is tight.
diff --git a/stacks/nvidia/modules/nvidia/values.yaml b/stacks/nvidia/modules/nvidia/values.yaml
index 7e345e59..f8cb7677 100644
--- a/stacks/nvidia/modules/nvidia/values.yaml
+++ b/stacks/nvidia/modules/nvidia/values.yaml
@@ -41,6 +41,16 @@ driver:
     limits:
       memory: "2Gi"
 
+  # 2026-05-25: extended startup probe from 120 to 300 failures.
+  # On k8s-node1 (6 vCPUs, 16Gi RAM, Ubuntu 24.04 + 6.8.0-117-generic),
+  # the full driver install sequence — apt install linux-headers (~2min) +
+  # gcc make -j16 kernel module compilation (~12min) + nvidia-installer
+  # file copy (~7min) = ~21min total, which exactly exhausted the default
+  # 120×10s=20min window (exit 137 = SIGKILL from startup probe).
+  # 300×10s = 50min gives 2.5× headroom on this hardware.
+  startupProbe:
+    failureThreshold: 300
+
   devicePlugin:
     config:
       name: time-slicing-config