From a0f3e155624a9720de2941776cd58e687c7b8118 Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Sat, 23 May 2026 09:48:50 +0000 Subject: [PATCH] k8s-version-upgrade: version-check uses oldest kubelet, not master MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previous version-check read RUNNING from .items[0].nodeInfo.kubeletVersion — which is just k8s-master. If master is upgraded but workers aren't (e.g. a chain that completed master phase but failed mid-worker), the version-check sees v1.34.8 and decides "no upgrade needed", never spawning the resume phase. Workers stay behind forever. Today's chain hit exactly this: master + node4 upgraded to v1.34.8, worker-node4 Failed mid-soak (alert sensitivity, since loosened), chain dead. Re-triggering the version-check looked at master only, decided cluster was "done", and refused to resume worker chain. Fix: read all node kubelet versions, sort -V, take head -1 (oldest). A partial chain now correctly reports the un-upgraded version and the chain resumes. Trivial change; tested live — chain now correctly reports v1.34.7 (workers' version) and spawns preflight → master → worker chain. Co-Authored-By: Claude Opus 4.7 --- stacks/k8s-version-upgrade/main.tf | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/stacks/k8s-version-upgrade/main.tf b/stacks/k8s-version-upgrade/main.tf index 373cc2bc..e91592aa 100644 --- a/stacks/k8s-version-upgrade/main.tf +++ b/stacks/k8s-version-upgrade/main.tf @@ -370,11 +370,17 @@ resource "kubernetes_cron_job_v1" "k8s_version_check" { exit 0 fi - # 1. Detect running version + # 1. Detect running version — use the OLDEST kubelet across + # all nodes so partial chains (e.g. master upgraded but + # workers still pending) don't trick the chain into + # thinking the upgrade is complete. Was `.items[0]` (master + # only) which made the chain skip when workers were behind. + # Fixed 2026-05-23 after node4-only chain failure. RUNNING=$(/usr/local/bin/kubectl get nodes \ - -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}' | tr -d v) + -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \ + | tr -d v | sort -V | head -1) RUNNING_MINOR=$(echo "$RUNNING" | awk -F. '{print $1"."$2}') - echo "Running version: v$RUNNING (minor $RUNNING_MINOR)" + echo "Running version (oldest kubelet): v$RUNNING (minor $RUNNING_MINOR)" # 2. Latest patch within current minor (refresh master's apt cache) LATEST_PATCH=$($SSH wizard@k8s-master.viktorbarzin.lan \