k8s-upgrade: reconcile kubeadm-config OIDC drift that crash-looped the v1.35 apiserver upgrade

Last night's autonomous 1.34->1.35 run reached the master control-plane phase for the first time (preflight passed, etcd snapshot taken, etcd upgraded), then the kube-apiserver upgrade to v1.35.6 crash-looped and kubeadm auto-rolled-back to 1.34.9. The cluster stayed healthy but the master was left cordoned and the chain wedged on in_flight.

Root cause: kubeadm upgrade regenerates the apiserver static-pod manifest from the kubeadm-config ConfigMap. apiserver auth was switched on 2026-06-19 to a structured multi-issuer --authentication-config (kubectl + dashboard SSO), but kubeadm-config still carried the legacy single-issuer --oidc-* extraArgs, so the regenerated manifest reverted structured auth and the new apiserver crash-looped. Proven via `kubeadm upgrade diff`. The existing post-upgrade OIDC restore step never ran because the upgrade itself never succeeded.

Fix:
- rbac/apiserver-oidc.tf: the remote script now also reconciles kubeadm-config (kubeadm init phase upload-config: drop --oidc-*, add --authentication-config) so a future kubeadm upgrade regenerates a correct manifest. Delivered to the cluster via the apiserver-oidc-restore ConfigMap the chain re-runs (CI needs no ssh key); trigger deliberately not script-hashed since CI cannot ssh.
- k8s-version-upgrade/upgrade-step.sh: new preflight gate runs `kubeadm upgrade diff` and BLOCKS+alerts (never drains the master) if --authentication-config would still be dropped.
- Post-mortem + runbook updated.

The live kubeadm-config was reconciled directly on the master and verified (`kubeadm upgrade diff` now shows only the control-plane image bump), so tonight's run can complete the 1.34->1.35 upgrade.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent c6bba1da6e
commit 60a1cb9a25
4 changed files with 218 additions and 20 deletions
@@ -416,6 +416,25 @@ phase_preflight() {
     fi
   fi
 
+  # 4b. apiserver-OIDC drift gate (backstop for the rbac stack's kubeadm-config
+  # reconciliation). A `kubeadm upgrade` REGENERATES the apiserver manifest from
+  # kubeadm-config; if kubeadm-config still carries the legacy single-issuer
+  # --oidc-* args instead of --authentication-config, the regenerated apiserver
+  # reverts structured multi-issuer auth and CRASH-LOOPS — stalling the chain
+  # mid-flight with the master cordoned and etcd already bumped (the 2026-06-24
+  # v1.35 stall; docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md).
+  # `kubeadm upgrade diff` shows exactly what the manifest regen will change; a
+  # '-' line dropping --authentication-config means the drift is still present.
+  # Skip on an at-target master (resume — no apiserver regen). Best-effort: blocks
+  # only on a POSITIVE drift signal, never merely because diff is unavailable.
+  if [ "$master_kubelet_v" != "$TARGET_VERSION" ]; then
+    local apiserver_diff
+    apiserver_diff=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" "sudo kubeadm upgrade diff v$TARGET_VERSION 2>/dev/null" || true)
+    if echo "$apiserver_diff" | grep -qE '^-[[:space:]].*--authentication-config'; then
+      block "kubeadm upgrade would DROP --authentication-config from kube-apiserver (kubeadm-config OIDC drift → apiserver crash-loop). Re-apply the rbac stack (apiserver-oidc.tf reconciles kubeadm-config), then retry. Master NOT drained."
+    fi
+  fi
+
   # 5. Push in-flight + started_timestamp metrics + ns annotations
   $KUBECTL annotate ns "$NS" \
     "viktorbarzin.me/k8s-upgrade-in-flight=$(date -u +%FT%TZ)" \
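
The gate's decision rule can be exercised offline. The following is a minimal Python sketch of the same check, not the chain's actual code: the function name `drops_authentication_config` and both sample diffs are hypothetical, and real `kubeadm upgrade diff` output may be laid out differently. It only illustrates that the gate fires on a removed line (a `-` followed by whitespace) mentioning `--authentication-config`, and stays quiet when the diff is empty or unavailable.

```python
import re

# Same idea as the gate's grep pattern: a removed diff line ('-' followed by
# a space or tab) that mentions --authentication-config.
_DROP_PATTERN = re.compile(r"^-[ \t].*--authentication-config")


def drops_authentication_config(diff_text: str) -> bool:
    """Return True only on a positive drift signal in the diff text."""
    return any(_DROP_PATTERN.match(line) for line in diff_text.splitlines())


# Hypothetical diff in which the regenerated manifest reverts to legacy flags.
DRIFTED_DIFF = "\n".join([
    "--- /etc/kubernetes/manifests/kube-apiserver.yaml",
    "+++ /etc/kubernetes/manifests/kube-apiserver.yaml",
    "@@ -18,2 +18,3 @@",
    "     - --authorization-mode=Node,RBAC",
    "-    - --authentication-config=/etc/kubernetes/pki/auth-config.yaml",
    "+    - --oidc-issuer-url=https://issuer.example.invalid",
    "+    - --oidc-client-id=kubernetes",
])

# Hypothetical diff showing only the control-plane image bump.
CLEAN_DIFF = "\n".join([
    "--- /etc/kubernetes/manifests/kube-apiserver.yaml",
    "+++ /etc/kubernetes/manifests/kube-apiserver.yaml",
    "@@ -30,1 +30,1 @@",
    "-    image: registry.k8s.io/kube-apiserver:v1.34.9",
    "+    image: registry.k8s.io/kube-apiserver:v1.35.6",
])

if __name__ == "__main__":
    print("drifted:", drops_authentication_config(DRIFTED_DIFF))
    print("clean:", drops_authentication_config(CLEAN_DIFF))
    print("diff unavailable:", drops_authentication_config(""))
```

Matching line by line keeps the check from spanning two diff lines, and an empty string (the `|| true` fallback when ssh or kubeadm fails) never blocks, which matches the best-effort intent stated in the comment.
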
@@ -10,16 +10,26 @@
 # match the existing RBAC subjects (kind: User, name: <raw email>; group names
 # verbatim). Do NOT add a prefix or existing bindings break.
 #
-# DRIFT WARNING: this edits the kube-apiserver static-pod manifest on the single
-# master. A `kubeadm upgrade` regenerates that manifest and DROPS this flag (this
-# is exactly how OIDC silently broke before — the flag was wiped and the
-# content-hash trigger never re-fired). After any k8s control-plane upgrade,
-# re-apply the rbac stack to restore apiserver OIDC. See
-# docs/plans/2026-06-04-k8s-dashboard-sso-design.md.
+# DRIFT WARNING (and how it's now handled): apiserver auth lives in THREE places
+# that must stay in sync, because a `kubeadm upgrade` REGENERATES the static-pod
+# manifest from kubeadm-config:
+#   1. /etc/kubernetes/pki/auth-config.yaml — the structured authn file
+#   2. the live kube-apiserver static-pod manifest — references it via the flag
+#   3. the kubeadm-config ClusterConfiguration CM — what kubeadm regenerates from
+# Originally only (1)+(2) were managed, so every kubeadm upgrade rewrote the
+# manifest from the STALE CM, reverting --authentication-config to single-issuer
+# --oidc-* flags. On k8s 1.35 that regenerated apiserver CRASH-LOOPED and stalled
+# the whole upgrade mid-flight (master cordoned, etcd already bumped) — see
+# docs/post-mortems/2026-06-24-kubeadm-oidc-drift-apiserver-upgrade-stall.md. The
+# remote script below now ALSO reconciles (3) via `kubeadm init phase
+# upload-config`, so a future kubeadm upgrade regenerates a CORRECT manifest. The
+# k8s-version-upgrade chain additionally GATES on `kubeadm upgrade diff` in
+# preflight and blocks+alerts if --authentication-config would still be dropped.
 #
 # SAFETY: the remote script health-gates on /livez and AUTO-ROLLS-BACK the
 # manifest from a timestamped backup if the apiserver does not recover, so a
-# malformed config cannot leave the single master down.
+# malformed config cannot leave the single master down. Reconciling kubeadm-config
+# is zero-impact on the running cluster (the CM is only read during an upgrade).
 
 variable "k8s_master_host" {
   type = string
@@ -97,6 +107,40 @@ locals {
     print('flag-inserted' if done else 'ANCHOR-NOT-FOUND')
   PY
 
+  # Reconciles the kubeadm-config ClusterConfiguration's apiServer.extraArgs:
+  # drops the stale single-issuer --oidc-* args and ensures --authentication-config
+  # is present (anchored after --authorization-mode). Stdlib-only (the master is
+  # only guaranteed python3, not pyyaml/yq). Idempotent; preserves all other
+  # fields (etcd args, audit args, extraVolumes) verbatim. Exits 3 if the
+  # authorization-mode anchor is missing (fail loud, leave the CM untouched).
+  kubeadm_oidc_reconcile_py = <<-PY
+    import sys
+    lines = sys.stdin.read().split('\n')
+    out, i, n = [], 0, len(lines)
+    have_authn = any('name: authentication-config' in l for l in lines)
+    inserted = have_authn
+    while i < n:
+        ln = lines[i]; s = ln.strip()
+        if s.startswith('- name: oidc-'):
+            i += 2 if (i + 1 < n and lines[i + 1].strip().startswith('value:')) else 1
+            continue
+        out.append(ln)
+        if (not inserted) and s == '- name: authorization-mode':
+            indent = ln[:len(ln) - len(ln.lstrip())]
+            if i + 1 < n and lines[i + 1].strip().startswith('value:'):
+                out.append(lines[i + 1]); i += 2
+            else:
+                i += 1
+            out.append(indent + '- name: authentication-config')
+            out.append(indent + '  value: /etc/kubernetes/pki/auth-config.yaml')
+            inserted = True
+            continue
+        i += 1
+    if not inserted:
+        sys.stderr.write('ANCHOR-NOT-FOUND: authorization-mode\n'); sys.exit(3)
+    sys.stdout.write('\n'.join(out))
+  PY
+
   # Whole remote operation, base64-embedded for byte-exact transfer (no
   # heredoc/escaping hazards across SSH).
   apiserver_auth_remote_script = <<-SH
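
To make the reconcile step's effect concrete, here is a hedged, self-contained restatement of the same line-based transform wrapped in a function. The function name `reconcile_extra_args`, the `ValueError` (standing in for the script's exit code 3), and the sample ClusterConfiguration fragment are assumptions for illustration only. The sample assumes the name/value list form of `extraArgs` that the script's `- name:` matching implies.

```python
AUTH_CONFIG_PATH = "/etc/kubernetes/pki/auth-config.yaml"


def reconcile_extra_args(text: str) -> str:
    """Drop '- name: oidc-*' entries and ensure authentication-config exists."""
    lines = text.split("\n")
    out, i, n = [], 0, len(lines)
    # If the flag is already present, only the oidc-* cleanup applies.
    inserted = any("name: authentication-config" in l for l in lines)
    while i < n:
        ln = lines[i]
        s = ln.strip()
        if s.startswith("- name: oidc-"):
            # Skip the entry and its 'value:' line when one follows.
            has_value = i + 1 < n and lines[i + 1].strip().startswith("value:")
            i += 2 if has_value else 1
            continue
        out.append(ln)
        if not inserted and s == "- name: authorization-mode":
            indent = ln[: len(ln) - len(ln.lstrip())]
            if i + 1 < n and lines[i + 1].strip().startswith("value:"):
                out.append(lines[i + 1])
                i += 2
            else:
                i += 1
            # Insert the new entry right after the authorization-mode anchor.
            out.append(indent + "- name: authentication-config")
            out.append(indent + "  value: " + AUTH_CONFIG_PATH)
            inserted = True
            continue
        i += 1
    if not inserted:
        raise ValueError("ANCHOR-NOT-FOUND: authorization-mode")
    return "\n".join(out)


# Hypothetical fragment of a drifted ClusterConfiguration.
SAMPLE = "\n".join([
    "apiServer:",
    "  extraArgs:",
    "  - name: authorization-mode",
    "    value: Node,RBAC",
    "  - name: oidc-issuer-url",
    "    value: https://issuer.example.invalid",
    "  - name: oidc-client-id",
    "    value: kubernetes",
    "  extraVolumes: []",
])

if __name__ == "__main__":
    print(reconcile_extra_args(SAMPLE))
```

On the sample, the two `oidc-*` entries disappear, an `authentication-config` entry lands directly after `authorization-mode`, and every other line passes through untouched. Running the function on its own output returns the same text, which is the idempotence the comment in the hunk describes.
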
@@ -137,6 +181,30 @@ locals {
       echo "rolled back to previous manifest"; exit 1
     fi
     echo "kube-apiserver healthy with multi-issuer --authentication-config"
+
+    # 5. Reconcile kubeadm-config so a FUTURE `kubeadm upgrade` regenerates the
+    # apiserver manifest WITH --authentication-config instead of reverting to
+    # the stale single-issuer --oidc-* flags. Without this, kubeadm rewrote the
+    # manifest from kubeadm-config on every control-plane upgrade and the
+    # regenerated apiserver crash-looped (the 2026-06-24 v1.35 upgrade stall).
+    # Zero live impact (the CM is only read at upgrade time); idempotent;
+    # best-effort (the chain's `kubeadm upgrade diff` preflight gate is the
+    # backstop if this cannot run).
+    KC="sudo kubectl --kubeconfig /etc/kubernetes/admin.conf"
+    CC=$($KC -n kube-system get cm kubeadm-config -o jsonpath='{.data.ClusterConfiguration}' 2>/dev/null || true)
+    if [ -n "$CC" ] && { echo "$CC" | grep -q 'oidc-issuer-url' || ! echo "$CC" | grep -q 'authentication-config'; }; then
+      echo "Reconciling kubeadm-config (oidc-* -> authentication-config) so kubeadm upgrade keeps structured auth"
+      echo '${base64encode(local.kubeadm_oidc_reconcile_py)}' | base64 -d > /tmp/reconcile_kubeadm_oidc.py
+      if printf '%s' "$CC" | python3 /tmp/reconcile_kubeadm_oidc.py > /tmp/kubeadm-cc-new.yaml \
+        && sudo kubeadm init phase upload-config kubeadm --config /tmp/kubeadm-cc-new.yaml; then
+        echo "kubeadm-config reconciled: future control-plane upgrades keep --authentication-config"
+      else
+        echo "WARN: kubeadm-config reconcile failed; the upgrade-chain preflight gate will block the next upgrade"
+      fi
+      rm -f /tmp/reconcile_kubeadm_oidc.py /tmp/kubeadm-cc-new.yaml
+    else
+      echo "kubeadm-config already uses --authentication-config (no oidc drift)"
+    fi
   SH
 }
 
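
The condition guarding step 5 combines three substring checks. The sketch below restates it in Python so each case can be checked in isolation. The function name `needs_reconcile` and the sample strings are hypothetical, and the substring tests mirror the `grep -q` calls rather than parsing YAML.

```python
def needs_reconcile(cluster_configuration: str) -> bool:
    """Mirror of the step-5 guard: act only when the CM was read and is drifted."""
    if not cluster_configuration:
        # The ConfigMap could not be read; the script skips reconciliation.
        return False
    has_legacy_oidc = "oidc-issuer-url" in cluster_configuration
    has_structured_auth = "authentication-config" in cluster_configuration
    return has_legacy_oidc or not has_structured_auth


# Hypothetical ClusterConfiguration snippets covering each branch.
LEGACY_ONLY = "- name: oidc-issuer-url\n  value: https://issuer.example.invalid"
STRUCTURED_ONLY = "- name: authentication-config\n  value: /etc/kubernetes/pki/auth-config.yaml"
BOTH = LEGACY_ONLY + "\n" + STRUCTURED_ONLY
NEITHER = "- name: authorization-mode\n  value: Node,RBAC"

if __name__ == "__main__":
    for label, cc in [("legacy", LEGACY_ONLY), ("structured", STRUCTURED_ONLY),
                      ("both", BOTH), ("neither", NEITHER), ("unreadable", "")]:
        print(label, needs_reconcile(cc))
```

One consequence visible here: an unreadable ConfigMap takes the same `else` branch as a clean one, so in that case the script's "already uses --authentication-config" message does not by itself prove the absence of drift. The preflight gate described in the commit message remains the backstop.
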
@@ -155,6 +223,14 @@ resource "null_resource" "apiserver_oidc_config" {
   }
 
   triggers = {
+    # Intentionally hash ONLY the issuer config, NOT the remote script. CI applies
+    # the rbac stack with no ssh_private_key (var defaults to ""), so a re-run of
+    # this SSH provisioner in CI would fail — hence the null_resource must stay a
+    # no-op on a plain CI apply. Script changes (e.g. the 2026-06-24 kubeadm-config
+    # reconciliation) reach the cluster via the apiserver-oidc-restore ConfigMap
+    # below (a plain k8s resource, no ssh) which the upgrade chain re-runs. To force
+    # this provisioner to re-run after a script change, apply locally with
+    # `-replace` + TF_VAR_ssh_private_key (see docs/runbooks/k8s-version-upgrade.md).
     auth_config = sha256(local.apiserver_auth_config_yaml)
   }
 }