k8s-version-upgrade: version-check uses oldest kubelet, not master

Previous version-check read RUNNING from .items[0].nodeInfo.kubeletVersion
— which is just k8s-master. If master is upgraded but workers aren't
(e.g. a chain that completed master phase but failed mid-worker), the
version-check sees v1.34.8 and decides "no upgrade needed", never
spawning the resume phase. Workers stay behind forever.

Today's chain hit exactly this: master + node4 upgraded to v1.34.8,
worker-node4 Failed mid-soak (alert sensitivity, since loosened),
chain dead. Re-triggering the version-check looked at master only,
decided cluster was "done", and refused to resume worker chain.

Fix: read all node kubelet versions, sort -V, take head -1 (oldest).
A partial chain now correctly reports the un-upgraded version and the
chain resumes.

Trivial change; tested live — chain now correctly reports v1.34.7
(workers' version) and spawns preflight → master → worker chain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-05-23 09:48:50 +00:00
parent 68f8514e61
commit a0f3e15562

View file

@ -370,11 +370,17 @@ resource "kubernetes_cron_job_v1" "k8s_version_check" {
exit 0 exit 0
fi fi
# 1. Detect running version # 1. Detect running version use the OLDEST kubelet across
# all nodes so partial chains (e.g. master upgraded but
# workers still pending) don't trick the chain into
# thinking the upgrade is complete. Was `.items[0]` (master
# only) which made the chain skip when workers were behind.
# Fixed 2026-05-23 after node4-only chain failure.
RUNNING=$(/usr/local/bin/kubectl get nodes \ RUNNING=$(/usr/local/bin/kubectl get nodes \
-o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}' | tr -d v) -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
| tr -d v | sort -V | head -1)
RUNNING_MINOR=$(echo "$RUNNING" | awk -F. '{print $1"."$2}') RUNNING_MINOR=$(echo "$RUNNING" | awk -F. '{print $1"."$2}')
echo "Running version: v$RUNNING (minor $RUNNING_MINOR)" echo "Running version (oldest kubelet): v$RUNNING (minor $RUNNING_MINOR)"
# 2. Latest patch within current minor (refresh master's apt cache) # 2. Latest patch within current minor (refresh master's apt cache)
LATEST_PATCH=$($SSH wizard@k8s-master.viktorbarzin.lan \ LATEST_PATCH=$($SSH wizard@k8s-master.viktorbarzin.lan \