diff --git a/docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md b/docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md
index 9a42d3ed..e6b11816 100644
--- a/docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md
+++ b/docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md
@@ -1,4 +1,8 @@
-# Post-mortem: kubeadm-config OIDC drift crash-looped the v1.35 apiserver upgrade (2026-06-24)
+# Post-mortem: k8s 1.34→1.35 upgrade stalled — etcd IO starvation (2026-06-24)
+
+> Filename kept for inbound links. The originally-suspected cause (kubeadm-config
+> OIDC drift) turned out **not** to be the crash — see "Correction" below. The OIDC
+> drift was a real *separate* latent bug fixed in the same change.
 
 **Impact:** The autonomous k8s-version-upgrade chain (23:00 UTC nightly) reached
 the master control-plane phase for the first time — preflight passed, etcd
@@ -6,85 +10,88 @@ snapshot taken, master cordoned + drained, etcd upgraded 3.6.5→3.6.6 — then
 kube-apiserver upgrade to v1.35.6 **crash-looped**. kubeadm waited its 5-minute
 static-pod-hash window across all internal retries, then auto-rolled-back to
 v1.34.9. The cluster stayed healthy on 1.34.9 (apiserver, all 7 nodes Ready), but
-the run left **k8s-master cordoned** and the chain **wedged on `in_flight=1`**
-(which correctly blocks subsequent runs). No data loss; no user-facing outage
-(the master carries control-plane taints, so no workloads were displaced).
+the run left **k8s-master cordoned** and the chain **wedged on `in_flight=1`**.
+No data loss; no user-facing outage (the master carries control-plane taints, so
+no workloads were displaced).
 
-**Trigger:** the first *minor* upgrade the chain ever attempted (1.34→1.35).
-Patch upgrades never hit this because the apiserver manifest content is identical
-across patches; a minor upgrade is the first time kubeadm regenerates the
-manifest with a new image.
+**Trigger:** the first *minor* upgrade the chain ever attempted (1.34→1.35) — the
+first time kubeadm upgrades etcd (3.6.5→3.6.6) and regenerates the control-plane
+static pods, i.e. the first time the upgrade pushes real write-IO at etcd.
 
-## Root cause
+## Root cause — etcd IO starvation on the shared HDD
 
-apiserver authentication was configured in **two** places that were allowed to
-drift from a **third**:
+The new kube-apiserver could not establish/keep a working connection to etcd
+during the upgrade because **etcd was IO-starved**. etcd's surviving container log
+from the crash window (`/var/log/pods/.../etcd/0.log`, 23:04–23:20 UTC) shows:
 
-1. `/etc/kubernetes/pki/auth-config.yaml` — a structured `AuthenticationConfiguration`
-   (apiserver.config.k8s.io/v1) carrying **two** JWT issuers (`kubernetes` for
-   kubectl/kubelogin + `k8s-dashboard` for the dashboard's oauth2-proxy), added
-   2026-06-19 (`docs/plans/2026-06-04-k8s-dashboard-sso-design.md`).
-2. the **live** kube-apiserver static-pod manifest — referenced it via
-   `--authentication-config=/etc/kubernetes/pki/auth-config.yaml`.
-3. the **kubeadm-config `ClusterConfiguration` ConfigMap** — still carried the
-   **legacy single-issuer `--oidc-*` extraArgs** (`oidc-issuer-url`,
-   `oidc-client-id`, `oidc-username-claim`, `oidc-groups-claim`). Never updated
-   when (1)+(2) switched to structured auth.
+- **1,180** `apply request took too long` warnings in 16 minutes;
+- individual applies of **4.3s / 2.9s / 2.7s / 1.8s** (healthy is <100ms),
+  clustered at **23:18:51 UTC** — exactly when kubeadm's final attempt was trying
+  to bring the new apiserver up.
 
-`kubeadm upgrade apply` **regenerates the static-pod manifests from
-kubeadm-config**. So it dropped `--authentication-config` and re-added the four
-`--oidc-*` flags. Proven by `kubeadm upgrade diff v1.35.6`:
+A reproduced 1.35.6 apiserver with no etcd dies with
+`F instance.go:233 Error creating leases: error creating storage factory: context
+deadline exceeded` — the same failure mode a multi-second etcd produces. etcd
+lives on the contended `sdc` HDD (**beads code-oflt**: "etcd/critical VM disks on
+shared sdc HDD — recurring IO-storm root cause"). The upgrade itself piled IO onto
+that spindle:
 
-```diff
--    - --authentication-config=/etc/kubernetes/pki/auth-config.yaml
-+    - --oidc-issuer-url=https://authentik.viktorbarzin.me/application/o/kubernetes/
-+    - --oidc-client-id=kubernetes
-+    - --oidc-username-claim=email
-+    - --oidc-groups-claim=groups
-```
+1. etcd's own upgrade-restart + WAL/db re-read (it restarted ~23:04, re-elected);
+2. kubeadm dumping a full **~400MB etcd DB backup** to
+   `/etc/kubernetes/tmp/kubeadm-backup-etcd-<ts>/` (on the same HDD) before the
+   etcd upgrade — and **145 of these had accumulated to 28GB** (kubeadm never
+   cleans them up), pushing master root fs to **73%**, above the 70% kubelet
+   image-GC threshold, so image GC churned during the drain too;
+3. master-drain pod evictions.
 
-The regenerated apiserver crash-looped (`CrashLoopBackOff`, `back-off 10s`, 8
-probe failures in the kubelet journal) — it exited within seconds, repeatedly, so
-kubeadm's hash-watch never saw a stable new pod and timed out → rollback. (The
-`--oidc-*` flags are NOT removed in 1.35; the crash is the auth-config swap in the
-live control-plane environment, the only functional delta in the diff. Image
-pull, etcd, OOM, and disk were all ruled out: all v1.35.6 images were pre-pulled,
-etcd upgraded cleanly, no OOM, master root disk at 73%.)
+### Correction — it was NOT the OIDC flag swap
 
-**Why the existing safety net missed it:** `stacks/rbac/modules/rbac/apiserver-oidc.tf`
-already *knew* kubeadm drops `--authentication-config` and published a
-`apiserver-oidc-restore` ConfigMap for the chain to re-run **after** the upgrade.
-But the apiserver crashes *during* `kubeadm upgrade apply`, which never returns
-success, so the post-upgrade restore step is never reached.
+`kubeadm upgrade diff v1.35.6` showed the regenerated manifest also swaps
+`--authentication-config` (structured multi-issuer OIDC) back to legacy
+single-issuer `--oidc-*` flags (kubeadm-config drift, see secondary finding). That
+was the *first* hypothesis — but an isolated repro of the 1.35.6 apiserver with
+those exact `--oidc-*` flags **and authentik reachable** initialised OIDC cleanly
+(`oidc.go:313`, no error) and ran fine until it hit the (deliberately dead) test
+etcd. So the auth swap does **not** crash the apiserver; it was a red herring for
+the crash. Image pull (all v1.35.6 images pre-pulled), OOM (none), and disk-full
+were also ruled out.
+
+## Secondary finding (real, fixed separately) — kubeadm-config OIDC drift
+
+apiserver auth is configured in three places that must agree:
+(1) `/etc/kubernetes/pki/auth-config.yaml` (structured, two issuers: `kubernetes`
++ `k8s-dashboard`, added 2026-06-19); (2) the live static-pod manifest
+(`--authentication-config`); (3) the kubeadm-config `ClusterConfiguration` CM —
+which still carried the legacy `--oidc-*` extraArgs. `kubeadm upgrade` regenerates
+the manifest from (3), so it would have reverted structured auth → **dashboard +
+kubectl SSO break after a successful upgrade** (recoverable: the chain's
+post-master `restore.sh` re-adds the flag). This is a real bug, just not the crash.
 
 ## Resolution
 
-1. **Reconciled kubeadm-config live** (2026-06-24, zero cluster impact — the CM is
-   only read during an upgrade): rewrote `apiServer.extraArgs` to drop the
-   `--oidc-*` args and add `--authentication-config`, via `kubeadm init phase
-   upload-config kubeadm`. `kubeadm upgrade diff v1.35.6` then showed **only** the
-   control-plane image bumps — no auth-flag changes.
-2. **Recovered:** uncordoned k8s-master, cleared the stuck `in_flight` gauge +
-   namespace annotation.
+1. **Reclaimed the 28GB kubeadm scratch** on master (`/etc/kubernetes/tmp/kubeadm-backup-*`) — root fs 73% → 23%.
+2. **Reconciled kubeadm-config live** (zero cluster impact — CM only read at upgrade time): dropped `--oidc-*`, added `--authentication-config` via `kubeadm init phase upload-config kubeadm`. `kubeadm upgrade diff` then shows only the control-plane image bumps.
+3. **Recovered:** uncordoned k8s-master, cleared the stuck `in_flight` gauge + annotation, deleted last night's Complete/Failed `1-35-6` phase jobs (a Complete preflight would otherwise make the detector idempotent-skip the re-run).
 
-## Prevention (all landed in this change)
+## Prevention (landed in this change)
 
 | Gap | Fix |
 |-----|-----|
-| kubeadm-config not managed alongside the live manifest | `apiserver-oidc.tf`'s remote script now **also** reconciles kubeadm-config (`kubeadm init phase upload-config`). It reaches the cluster two ways: the published `apiserver-oidc-restore` ConfigMap (a plain k8s resource — CI applies it with no ssh) which the chain's `phase_master` re-runs, and a local `-replace` apply with `TF_VAR_ssh_private_key`. (The null_resource trigger deliberately does NOT hash the script: CI has no ssh key, so it must stay a no-op on a plain CI apply.) |
-| The chain drained the master into a crash with no pre-check | new **preflight gate 4b** in `upgrade-step.sh`: runs `kubeadm upgrade diff v$TARGET` and `block`s (k8s_upgrade_blocked=1 → K8sUpgradeBlocked alert) BEFORE snapshot/in-flight/drain if a `-` line would drop `--authentication-config`. Fails safe — blocks only on a positive drift signal. |
-| The live fix had to be applied out-of-band (only `default` Vault policy on the workstation; CI can't ssh) | kubeadm-config reconciled live via `kubeadm init phase upload-config` on the master (2026-06-24); the committed code makes it durable for future upgrades. |
+| kubeadm leaks ~400MB etcd-DB backups into `/etc/kubernetes/tmp` forever (→ disk fills, image-GC churn, write-IO on etcd's spindle) | **`upgrade-step.sh` preflight now prunes** `/etc/kubernetes/tmp/kubeadm-backup-*` + `kubeadm-upgraded-manifests*` older than 3 days on master, every run. Best-effort, never aborts. |
+| kubeadm-config drift would silently break SSO after an upgrade | `apiserver-oidc.tf`'s remote script now **also reconciles kubeadm-config** (`kubeadm init phase upload-config`), delivered via the `apiserver-oidc-restore` ConfigMap the chain re-runs (CI needs no ssh) or a local `-replace` apply. Preflight **alerts** (not blocks — SSO drift is recoverable) if `kubeadm upgrade diff` would still drop `--authentication-config`. |
+| etcd on the contended `sdc` HDD starves under upgrade IO | **Durable fix is beads code-oflt** (move etcd/critical VM disks off `sdc`). Not in this change. Mitigations above reduce the upgrade's own IO; reclaimed disk removes the image-GC variable. |
 
 ## Lessons
 
-- **Out-of-band control-plane edits must be written back to kubeadm-config.**
-  Anything that edits a static-pod manifest directly (auth, admission, audit, API
-  flags) is silently reverted on the next `kubeadm upgrade` unless kubeadm-config
-  itself carries it. `kubeadm upgrade diff <target>` is the authoritative
-  pre-flight check for "what will the upgrade change?" and is non-mutating.
-- **A post-upgrade fixup can't repair something that breaks the upgrade itself.**
-  The restore-after-upgrade design assumed the apiserver would come up (degraded)
-  and be fixed afterward; it actually crash-looped, so the fix has to be in
-  kubeadm-config *before* `apply`, plus a preflight gate.
-- **Minor upgrades exercise manifest regeneration; patch upgrades don't.** First
-  minor bump is where this whole class of drift surfaces.
+- **Capture the failing component's own logs before concluding.** The `kubeadm
+  upgrade diff` made the OIDC swap look like the cause; only etcd's log (multi-second
+  applies) + an isolated apiserver repro showed the truth (etcd IO). A clean diff is
+  "what config changes," not "why it crashed."
+- **etcd on shared HDD is the cluster's recurring fragility** (immich IO storm
+  2026-05-25, this stall). Upgrades concentrate IO (etcd restart + kubeadm's 400MB
+  backup copy + drain) onto that spindle. code-oflt is the real fix.
+- **Tools that leave per-operation scratch must be reaped.** kubeadm's
+  `/etc/kubernetes/tmp` etcd backups are throwaway (real backups → NFS) but never
+  GC'd; 28GB had silently accumulated.
+- **Out-of-band control-plane edits must be written back to kubeadm-config** — else
+  `kubeadm upgrade` silently reverts them (here: SSO; could be admission/audit/API flags).
diff --git a/docs/runbooks/k8s-version-upgrade.md b/docs/runbooks/k8s-version-upgrade.md
index 86d1d31e..021c588f 100644
--- a/docs/runbooks/k8s-version-upgrade.md
+++ b/docs/runbooks/k8s-version-upgrade.md
@@ -41,7 +41,8 @@ Job 0 — preflight       (pinned: k8s-node1)
   ├── halt-on-alert (kured-style ignore-list)
   ├── 24h-quiet baseline (no Ready transitions <24h ago)
   ├── kubeadm upgrade plan matches target (skipped when master already at target — partial-resume)
-  ├── apiserver-OIDC drift gate: kubeadm upgrade diff must NOT drop --authentication-config (else BLOCK+alert)
+  ├── apiserver-OIDC drift check: kubeadm upgrade diff drops --authentication-config? → Slack WARN (recoverable; not a block)
+  ├── reclaim kubeadm scratch: prune /etc/kubernetes/tmp/kubeadm-backup-* >3d on master (kubeadm leaks ~400MB etcd-db backups)
   ├── Push k8s_upgrade_in_flight=1, k8s_upgrade_started_timestamp=$(date +%s)
   ├── Trigger backup-etcd Job, wait, verify snapshot byte count
   ├── SSH master: containerd skew fix (if master < workers)
@@ -229,10 +230,10 @@ Exposed in K8s via ExternalSecret `k8s-upgrade-creds` in the `k8s-upgrade` names
 from kubeadm-config**. apiserver auth uses a structured multi-issuer
 `--authentication-config` (kubectl + dashboard SSO), but kubeadm-config used to
 still carry the legacy single-issuer `--oidc-*` extraArgs — so every upgrade
-reverted the flag. On the **1.34→1.35** bump that regenerated apiserver
-**crash-looped and stalled the whole upgrade mid-flight** (master cordoned, etcd
-already bumped); the post-upgrade restore below never ran because `kubeadm
-upgrade apply` itself never returned success. Post-mortem:
+reverted the flag, **silently breaking SSO after the upgrade** (the apiserver does
+NOT crash on this — verified by isolated repro; it's recoverable via the restore
+script below). NB: the **1.34→1.35 stall on 2026-06-24 was a *separate* issue —
+etcd IO starvation**, not this drift; post-mortem:
 `docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md`.
 
 **Primary fix (2026-06-24):** `stacks/rbac/modules/rbac/apiserver-oidc.tf` now
@@ -243,9 +244,9 @@ upgrades with a pure image bump — `kubeadm upgrade diff <target>` shows only t
 image change. Zero live impact (the CM is read only during an upgrade).
 
 **Backstops:**
-- **Preflight gate 4b** runs `kubeadm upgrade diff` and BLOCKs (k8s_upgrade_blocked=1
-  → alert) BEFORE draining the master if `--authentication-config` would still be
-  dropped — so this can never again drain into a crash.
+- **Preflight check 4b** runs `kubeadm upgrade diff` and **alerts** (Slack WARN, does
+  NOT block — the drift only breaks SSO, which is recoverable) if
+  `--authentication-config` would still be dropped.
 - The `rbac` stack still publishes its restore script to the
   `kube-system/apiserver-oidc-restore` ConfigMap, and `phase_master` re-runs it on
   master right after `kubeadm upgrade apply` (idempotent, `/livez`-gated with
diff --git a/stacks/k8s-version-upgrade/scripts/upgrade-step.sh b/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
index 5bafd241..d393862c 100644
--- a/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
+++ b/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
@@ -416,25 +416,39 @@ phase_preflight() {
     fi
   fi
 
-  # 4b. apiserver-OIDC drift gate (backstop for the rbac stack's kubeadm-config
+  # 4b. apiserver-OIDC drift check (backstop for the rbac stack's kubeadm-config
   # reconciliation). A `kubeadm upgrade` REGENERATES the apiserver manifest from
   # kubeadm-config; if kubeadm-config still carries the legacy single-issuer
   # --oidc-* args instead of --authentication-config, the regenerated apiserver
-  # reverts structured multi-issuer auth and CRASH-LOOPS — stalling the chain
-  # mid-flight with the master cordoned and etcd already bumped (the 2026-06-24
-  # v1.35 stall; docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md).
-  # `kubeadm upgrade diff` shows exactly what the manifest regen will change; a
-  # '-' line dropping --authentication-config means the drift is still present.
-  # Skip on an at-target master (resume — no apiserver regen). Best-effort: blocks
-  # only on a POSITIVE drift signal, never merely because diff is unavailable.
+  # loses structured multi-issuer auth → kubectl + dashboard SSO break AFTER the
+  # upgrade. This is RECOVERABLE (the apiserver does NOT crash — verified by an
+  # isolated repro 2026-06-24; the chain's post-master restore.sh re-adds the flag,
+  # and the rbac stack reconciles kubeadm-config so it won't recur) — so this is an
+  # ALERT, not a block. (NB the 2026-06-24 stall was NOT this — it was etcd IO
+  # starvation; see docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md.)
+  # Skip on an at-target master (resume — no apiserver regen).
   if [ "$master_kubelet_v" != "$TARGET_VERSION" ]; then
     local apiserver_diff
     apiserver_diff=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" "sudo kubeadm upgrade diff v$TARGET_VERSION 2>/dev/null" || true)
     if echo "$apiserver_diff" | grep -qE '^-[[:space:]].*--authentication-config'; then
-      block "kubeadm upgrade would DROP --authentication-config from kube-apiserver (kubeadm-config OIDC drift → apiserver crash-loop). Re-apply the rbac stack (apiserver-oidc.tf reconciles kubeadm-config), then retry. Master NOT drained."
+      slack "WARN preflight — kubeadm upgrade will DROP --authentication-config (kubeadm-config OIDC drift). SSO breaks post-upgrade until restore.sh re-adds it; re-apply the rbac stack to reconcile kubeadm-config. Proceeding (recoverable, not a crash)."
     fi
   fi
 
+  # 4c. Reclaim kubeadm scratch on master. `kubeadm upgrade apply` dumps a full
+  # ~400MB etcd DB backup into /etc/kubernetes/tmp/kubeadm-backup-etcd-<ts>/ before
+  # every etcd upgrade and NEVER cleans it up — 145 dirs / 28GB had accumulated by
+  # 2026-06-24, pushing master root fs to 73% (image-GC churn + extra write IO on
+  # the shared HDD where etcd lives — a contributor to the etcd IO starvation that
+  # stalled that run, see post-mortem). Real etcd backups go to NFS, so these are
+  # throwaway. Prune ones >3 days old (keeps a short rollback window). Best-effort;
+  # never aborts the chain.
+  if [ "$master_kubelet_v" != "$TARGET_VERSION" ]; then
+    ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" \
+      "sudo find /etc/kubernetes/tmp -maxdepth 1 -type d \( -name 'kubeadm-backup-*' -o -name 'kubeadm-upgraded-manifests*' \) -mtime +3 -exec rm -rf {} + 2>/dev/null; echo -n 'master root after prune: '; df -h / | awk 'NR==2{print \$5\" used, \"\$4\" free\"}'" \
+      || echo "kubeadm-scratch prune skipped (ssh/df failed) — non-fatal"
+  fi
+
   # 5. Push in-flight + started_timestamp metrics + ns annotations
   $KUBECTL annotate ns "$NS" \
     "viktorbarzin.me/k8s-upgrade-in-flight=$(date -u +%FT%TZ)" \
diff --git a/stacks/rbac/modules/rbac/apiserver-oidc.tf b/stacks/rbac/modules/rbac/apiserver-oidc.tf
index a5ad08c8..f2d9b4be 100644
--- a/stacks/rbac/modules/rbac/apiserver-oidc.tf
+++ b/stacks/rbac/modules/rbac/apiserver-oidc.tf
@@ -18,13 +18,16 @@
 #   3. the kubeadm-config ClusterConfiguration CM   — what kubeadm regenerates from
 # Originally only (1)+(2) were managed, so every kubeadm upgrade rewrote the
 # manifest from the STALE CM, reverting --authentication-config to single-issuer
-# --oidc-* flags. On k8s 1.35 that regenerated apiserver CRASH-LOOPED and stalled
-# the whole upgrade mid-flight (master cordoned, etcd already bumped) — see
-# docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md. The
+# --oidc-* flags. The consequence is SSO breakage AFTER the upgrade: kubectl +
+# dashboard lose multi-issuer auth (the apiserver does NOT crash on this — verified
+# by an isolated repro 2026-06-24; the 2026-06-24 v1.35 upgrade *stall* was a
+# separate etcd IO-starvation issue, see
+# docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md). The
 # remote script below now ALSO reconciles (3) via `kubeadm init phase
 # upload-config`, so a future kubeadm upgrade regenerates a CORRECT manifest. The
-# k8s-version-upgrade chain additionally GATES on `kubeadm upgrade diff` in
-# preflight and blocks+alerts if --authentication-config would still be dropped.
+# k8s-version-upgrade chain additionally ALERTS (does not block — SSO drift is
+# recoverable) via `kubeadm upgrade diff` in preflight if --authentication-config
+# would still be dropped.
 #
 # SAFETY: the remote script health-gates on /livez and AUTO-ROLLS-BACK the
 # manifest from a timestamped backup if the apiserver does not recover, so a