k8s-version-upgrade: preflight skips kubeadm-plan gate when master already at target
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The autonomous 1.34.9 version-upgrade chain has been failing its preflight every night. A prior run left k8s-master and k8s-node1 on 1.34.9 while node2-6 stayed on 1.34.8, and preflight's gate-4 runs `kubeadm upgrade plan` on master. On an already-at-target master, kubeadm prints no "kubeadm upgrade apply vX.Y.Z" line, so the parsed target came back empty and the `!= requested` check aborted the whole chain before any worker was touched. The failure is deterministic: the chain self-cleaned and re-failed identically each night, so it would have failed again tonight, leaving node2-6 stuck on the old patch.

Fix: skip the kubeadm-plan-target gate when master is already on TARGET_VERSION. This is the same at-target self-skip that phase_master and phase_worker already do. The remaining workers are still validated by their own per-node phases, and the detector already confirmed the target is installable via apt-cache. This lets tonight's unattended chain resume and finish node2-6 -> 1.34.9.

Runbook updated: node count 5 -> 7, the gate skip note, and a Past Incidents writeup (incl. the collateral apiserver OIDC wipe, restored via the rbac stack).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
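The empty-target failure described above can be reproduced without a cluster. The following is a minimal sketch, not the script's actual code path: `parse_plan_target` is a hypothetical wrapper around the same grep/head/tr pipeline the gate uses, and both sample inputs are illustrative placeholder text rather than real `kubeadm upgrade plan` output.

```bash
#!/usr/bin/env bash

# Hypothetical helper: the gate's parsing pipeline, reading plan text from stdin.
# `|| true` keeps a no-match grep from failing the caller under pipefail.
parse_plan_target() {
  { grep -oE 'kubeadm upgrade apply v[0-9]+\.[0-9]+\.[0-9]+' \
      | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -1 | tr -d v; } || true
}

# Placeholder inputs (NOT real kubeadm output): one contains an apply line,
# the other, like an at-target master, contains none.
pending_sample=$'placeholder plan text\n\tkubeadm upgrade apply v1.34.9\n'
at_target_sample=$'placeholder plan text with no apply line\n'

printf 'pending:   [%s]\n' "$(parse_plan_target <<<"$pending_sample")"
printf 'at-target: [%s]\n' "$(parse_plan_target <<<"$at_target_sample")"
# prints:
# pending:   [1.34.9]
# at-target: []
```

An empty string can never equal a non-empty TARGET_VERSION, which is why the `!=` comparison aborted on every resume run.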
parent 8787d361dc
commit 70e217db24
2 changed files with 36 additions and 12 deletions
@@ -356,13 +356,30 @@ phase_preflight() {
   # on a Keel-drifted CoreDNS (start version unsupported) and, under pipefail,
   # aborts this whole check. Ignore the two CoreDNS checks here too so plan
   # still emits its "kubeadm upgrade apply vX.Y.Z" line. (See update_k8s.sh.)
-  local plan_target
-  plan_target=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" 'sudo kubeadm upgrade plan --ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins' \
-    | grep -oE 'kubeadm upgrade apply v[0-9]+\.[0-9]+\.[0-9]+' \
-    | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -1 | tr -d v)
-  if [ "$plan_target" != "$TARGET_VERSION" ]; then
-    slack "ABORT preflight — kubeadm plan target $plan_target ≠ requested $TARGET_VERSION"
-    exit 1
+  #
+  # SKIP this gate when k8s-master is ALREADY on TARGET_VERSION — a partial-chain
+  # resume (master + earlier workers done, later workers still pending). `kubeadm
+  # upgrade plan` run on an at-target master prints NO "kubeadm upgrade apply
+  # vX.Y.Z" line, so the parse below yields an EMPTY plan_target and the `!=`
+  # check aborts every run — even though the chain just needs to finish the
+  # remaining workers (phase_master self-skips an at-target master the same way,
+  # below). Confirmed root cause of the 1.34.9 preflight aborts (2026-06-18):
+  # master was already on 1.34.9 while node2-6 lagged on 1.34.8, so every nightly
+  # preflight died here with an empty `plan target ≠ requested 1.34.9`.
+  local master_kubelet_v
+  master_kubelet_v=$($KUBECTL get node k8s-master -o jsonpath='{.status.nodeInfo.kubeletVersion}' 2>/dev/null | tr -d v)
+  if [ "$master_kubelet_v" = "$TARGET_VERSION" ]; then
+    slack "preflight — k8s-master already on v$TARGET_VERSION; skipping kubeadm-plan-target gate (workers still pending)"
+    echo "k8s-master already on v$TARGET_VERSION — skipping kubeadm-plan-target gate"
+  else
+    local plan_target
+    plan_target=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" 'sudo kubeadm upgrade plan --ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins' \
+      | grep -oE 'kubeadm upgrade apply v[0-9]+\.[0-9]+\.[0-9]+' \
+      | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -1 | tr -d v)
+    if [ "$plan_target" != "$TARGET_VERSION" ]; then
+      slack "ABORT preflight — kubeadm plan target $plan_target ≠ requested $TARGET_VERSION"
+      exit 1
+    fi
   fi
 
   # 5. Push in-flight + started_timestamp metrics + ns annotations
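The branch structure of the new gate can be exercised in isolation by separating the decision from the ssh and kubectl calls. The sketch below is one possible way to do that and is not part of the commit: `plan_gate_decision` is a hypothetical function, its three arguments stand in for `master_kubelet_v`, `TARGET_VERSION`, and `plan_target`, and it strips only a leading `v` where the script uses `tr -d v`.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not in the commit): prints skip, pass, or abort.
#   $1 = kubelet version reported for k8s-master (may be empty if the lookup failed)
#   $2 = requested target version
#   $3 = target parsed from the plan output (may be empty)
plan_gate_decision() {
  local master_v="${1#v}" target="${2#v}" plan_target="${3:-}"
  if [ -n "$master_v" ] && [ "$master_v" = "$target" ]; then
    echo skip    # master already at target: partial-chain resume
  elif [ "$plan_target" = "$target" ]; then
    echo pass    # plan offers exactly the requested version
  else
    echo abort   # mismatch or empty plan target on a not-at-target master
  fi
}

plan_gate_decision v1.34.9 1.34.9 ""       # prints: skip
plan_gate_decision v1.34.8 1.34.9 1.34.9   # prints: pass
plan_gate_decision ""      1.34.9 ""       # prints: abort
```

The third call mirrors what the committed code appears to do when the kubectl lookup fails: an empty kubelet version does not match the target, so control falls through to the plan comparison rather than skipping the gate.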