k8s-upgrade: reclaim+auto-prune kubeadm /etc/kubernetes/tmp leak; correct crash root cause to etcd IO (not OIDC)
Some checks failed
ci/woodpecker/push/postmortem-todos Pipeline was successful
ci/woodpecker/push/default Pipeline failed

Digging into "why did the apiserver crash" disproved the earlier OIDC
explanation. An isolated v1.35.6 apiserver repro with authentik reachable
initialises OIDC cleanly (oidc.go:313, no error) and runs fine, so the
--authentication-config -> --oidc-* revert is NOT what crashed it. etcd's
log from the crash window survived and points to the real cause: 1180
"apply request took too long" warnings in 16 min, with individual applies
taking up to 4.3s (healthy is <100ms), right as kubeadm tried to bring up
the new apiserver. That's etcd IO starvation on the shared sdc HDD
(beads code-oflt).
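
A minimal sketch of how the crash-window numbers above can be tallied from a
saved etcd log. `summarize_slow_applies` is a hypothetical helper, not part of
the chain. It assumes etcd's compact JSON log lines carry "msg" and "took"
fields with no spaces after the colons; durations in units other than ms or s
are counted but left out of the max.

```shell
# Hypothetical helper: tally etcd "apply request took too long" warnings.
# ASSUMPTION: compact JSON log lines such as
#   {"level":"warn","msg":"apply request took too long","took":"4.3s","expected-duration":"100ms"}
# Only "ms" and "s" durations feed the max; others (e.g. "1m2s") are counted but skipped.
summarize_slow_applies() {
  awk '
    /"msg":"apply request took too long"/ {
      n++
      if (match($0, /"took":"[0-9.]+(ms|s)"/)) {
        v = substr($0, RSTART + 8, RLENGTH - 9)   # value between the quotes, e.g. 4.3s
        if (v ~ /ms$/) ms = substr(v, 1, length(v) - 2) + 0
        else           ms = (substr(v, 1, length(v) - 1) + 0) * 1000
        if (ms > max) max = ms
      }
    }
    END { printf "slow_applies=%d max_ms=%.0f\n", n, max }
  '
}
```

Usage, with an illustrative filename: `summarize_slow_applies < etcd-crash-window.log`.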

A big contributor, and the reason the master root fs sat at 73%: kubeadm
dumps a full ~400MB etcd DB backup into
/etc/kubernetes/tmp/kubeadm-backup-etcd-<ts>/ before every etcd upgrade and
never cleans it up. 145 dirs / 28GB had accumulated, driving image-GC churn
and extra write IO onto etcd's spindle. Reclaimed live (73% -> 23%) and
added a preflight prune (kubeadm-backup-* and kubeadm-upgraded-manifests*
dirs older than 3 days) so it can't re-accumulate.
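
A dry-run sketch for checking what the preflight prune would reclaim before
trusting it. `kubeadm_scratch_dryrun` is hypothetical: it reuses the prune's
name patterns and 3-day default but only runs `du -sk` on the matches. Note
that `find` discards the fractional part of a file's age in days, so
`-mtime +3` actually selects dirs at least four full days old.

```shell
# Hypothetical dry-run companion to the preflight prune: report how many kubeadm
# scratch dirs `-mtime +N` would match and their combined size, deleting nothing.
# Defaults mirror the prune (scratch dir /etc/kubernetes/tmp, age threshold 3).
kubeadm_scratch_dryrun() {
  scratch_dir=${1:-/etc/kubernetes/tmp}
  age_days=${2:-3}
  find "$scratch_dir" -maxdepth 1 -type d \
    \( -name 'kubeadm-backup-*' -o -name 'kubeadm-upgraded-manifests*' \) \
    -mtime +"$age_days" -exec du -sk {} + |
    awk '{ kib += $1 } END { printf "dirs=%d size_kib=%d\n", NR, kib }'
}
```

Run on the master, e.g. `kubeadm_scratch_dryrun /etc/kubernetes/tmp 3`, then
compare against `df -h /` after the real prune.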

Also corrected the OIDC handling: the kubeadm-config drift is real, but it
only breaks dashboard/kubectl SSO AFTER a successful upgrade (recoverable via
the chain's restore.sh + the kubeadm-config reconciliation); it does not
crash the apiserver. The preflight check was added on the wrong hypothesis,
so it is now an ALERT, not a block. Post-mortem, runbook, and
apiserver-oidc.tf header comment corrected to match.
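
A sketch of the ALERT-not-block shape as a standalone function.
`check_oidc_drift` is hypothetical; it applies the same `grep -E` pattern the
preflight uses to `kubeadm upgrade diff` output on stdin, and assumes lines the
manifest regen would remove start with "-" plus whitespace, as in a unified
diff. It prints a warning or an ok line and always returns 0, so a caller
running under `set -e` keeps going.

```shell
# Hypothetical standalone version of the preflight drift check (ALERT, not block).
# Reads `kubeadm upgrade diff` output on stdin. ASSUMPTION: removed manifest
# lines start with "-" followed by whitespace, as in a unified diff.
check_oidc_drift() {
  if grep -qE '^-[[:space:]].*--authentication-config'; then
    echo "WARN: kubeadm upgrade would drop --authentication-config; SSO breaks post-upgrade (recoverable)"
  else
    echo "ok: --authentication-config kept"
  fi
  return 0   # never fail: the drift is recoverable, so the caller must not abort
}
```

Usage, on the master: `sudo kubeadm upgrade diff "v$TARGET_VERSION" 2>/dev/null | check_oidc_drift`.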

Per Viktor: reclaim the disk and automate the cleanup so it never has to be
done by hand again; the durable IO fix remains code-oflt (move etcd off the
shared HDD).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-25 15:23:15 +00:00
parent 60a1cb9a25
commit 9c68d147e0
4 changed files with 112 additions and 87 deletions

@@ -416,25 +416,39 @@ phase_preflight() {
 fi
 fi
-# 4b. apiserver-OIDC drift gate (backstop for the rbac stack's kubeadm-config
+# 4b. apiserver-OIDC drift check (backstop for the rbac stack's kubeadm-config
 # reconciliation). A `kubeadm upgrade` REGENERATES the apiserver manifest from
 # kubeadm-config; if kubeadm-config still carries the legacy single-issuer
 # --oidc-* args instead of --authentication-config, the regenerated apiserver
-# reverts structured multi-issuer auth and CRASH-LOOPS — stalling the chain
-# mid-flight with the master cordoned and etcd already bumped (the 2026-06-24
-# v1.35 stall; docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md).
-# `kubeadm upgrade diff` shows exactly what the manifest regen will change; a
-# '-' line dropping --authentication-config means the drift is still present.
-# Skip on an at-target master (resume — no apiserver regen). Best-effort: blocks
-# only on a POSITIVE drift signal, never merely because diff is unavailable.
+# loses structured multi-issuer auth → kubectl + dashboard SSO break AFTER the
+# upgrade. This is RECOVERABLE (the apiserver does NOT crash — verified by an
+# isolated repro 2026-06-24; the chain's post-master restore.sh re-adds the flag,
+# and the rbac stack reconciles kubeadm-config so it won't recur) — so this is an
+# ALERT, not a block. (NB the 2026-06-24 stall was NOT this — it was etcd IO
+# starvation; see docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md.)
+# Skip on an at-target master (resume — no apiserver regen).
 if [ "$master_kubelet_v" != "$TARGET_VERSION" ]; then
 local apiserver_diff
 apiserver_diff=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" "sudo kubeadm upgrade diff v$TARGET_VERSION 2>/dev/null" || true)
 if echo "$apiserver_diff" | grep -qE '^-[[:space:]].*--authentication-config'; then
-block "kubeadm upgrade would DROP --authentication-config from kube-apiserver (kubeadm-config OIDC drift → apiserver crash-loop). Re-apply the rbac stack (apiserver-oidc.tf reconciles kubeadm-config), then retry. Master NOT drained."
+slack "WARN preflight — kubeadm upgrade will DROP --authentication-config (kubeadm-config OIDC drift). SSO breaks post-upgrade until restore.sh re-adds it; re-apply the rbac stack to reconcile kubeadm-config. Proceeding (recoverable, not a crash)."
 fi
 fi
+# 4c. Reclaim kubeadm scratch on master. `kubeadm upgrade apply` dumps a full
+# ~400MB etcd DB backup into /etc/kubernetes/tmp/kubeadm-backup-etcd-<ts>/ before
+# every etcd upgrade and NEVER cleans it up — 145 dirs / 28GB had accumulated by
+# 2026-06-24, pushing master root fs to 73% (image-GC churn + extra write IO on
+# the shared HDD where etcd lives — a contributor to the etcd IO starvation that
+# stalled that run, see post-mortem). Real etcd backups go to NFS, so these are
+# throwaway. Prune ones >3 days old (keeps a short rollback window). Best-effort;
+# never aborts the chain.
+if [ "$master_kubelet_v" != "$TARGET_VERSION" ]; then
+ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" \
+"sudo find /etc/kubernetes/tmp -maxdepth 1 -type d \( -name 'kubeadm-backup-*' -o -name 'kubeadm-upgraded-manifests*' \) -mtime +3 -exec rm -rf {} + 2>/dev/null; echo -n 'master root after prune: '; df -h / | awk 'NR==2{print \$5\" used, \"\$4\" free\"}'" \
+|| echo "kubeadm-scratch prune skipped (ssh/df failed) — non-fatal"
+fi
 # 5. Push in-flight + started_timestamp metrics + ns annotations
 $KUBECTL annotate ns "$NS" \
 "viktorbarzin.me/k8s-upgrade-in-flight=$(date -u +%FT%TZ)" \
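
The 4c prune nests an awk program inside the double-quoted ssh argument, so
the escaping is easy to misread. A small demo with hypothetical function
names: the local shell turns \$ into $ and \" into " before ssh sends the
string, and the remote awk then prints columns 5 and 4 of line 2 of
`df -h /`. The canned df output is illustrative, not taken from the master.

```shell
# Hypothetical demo of the quoting in the 4c prune's df/awk tail.
# Inside double quotes the local shell turns \$ into $ and \" into ", while the
# single quotes stay literal, so ssh ships a plain single-quoted awk program.
render_remote_df_cmd() {
  printf '%s\n' "df -h / | awk 'NR==2{print \$5\" used, \"\$4\" free\"}'"
}

# Run the same awk program over canned `df -h /` output (illustrative numbers).
sample_df_summary() {
  printf '%s\n' \
    'Filesystem Size Used Avail Use% Mounted on' \
    '/dev/sda1 100G 23G 77G 23% /' |
    awk 'NR==2{print $5" used, "$4" free"}'
}

render_remote_df_cmd   # df -h / | awk 'NR==2{print $5" used, "$4" free"}'
sample_df_summary      # 23% used, 77G free
```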

@@ -18,13 +18,16 @@
 # 3. the kubeadm-config ClusterConfiguration CM: what kubeadm regenerates from
 # Originally only (1)+(2) were managed, so every kubeadm upgrade rewrote the
 # manifest from the STALE CM, reverting --authentication-config to single-issuer
-# --oidc-* flags. On k8s 1.35 that regenerated apiserver CRASH-LOOPED and stalled
-# the whole upgrade mid-flight (master cordoned, etcd already bumped); see
-# docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md. The
+# --oidc-* flags. The consequence is SSO breakage AFTER the upgrade: kubectl +
+# dashboard lose multi-issuer auth (the apiserver does NOT crash on this, as verified
+# by an isolated repro 2026-06-24; the 2026-06-24 v1.35 upgrade *stall* was a
+# separate etcd IO-starvation issue, see
+# docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md). The
 # remote script below now ALSO reconciles (3) via `kubeadm init phase
 # upload-config`, so a future kubeadm upgrade regenerates a CORRECT manifest. The
-# k8s-version-upgrade chain additionally GATES on `kubeadm upgrade diff` in
-# preflight and blocks+alerts if --authentication-config would still be dropped.
+# k8s-version-upgrade chain additionally ALERTS (does not block: SSO drift is
+# recoverable) via `kubeadm upgrade diff` in preflight if --authentication-config
+# would still be dropped.
 #
 # SAFETY: the remote script health-gates on /livez and AUTO-ROLLS-BACK the
 # manifest from a timestamped backup if the apiserver does not recover, so a