infra/stacks/k8s-version-upgrade
Viktor Barzin a0f3e15562 k8s-version-upgrade: version-check uses oldest kubelet, not master
The previous version-check read RUNNING from
.items[0].status.nodeInfo.kubeletVersion, which is just k8s-master.
If master is upgraded but the workers aren't (e.g. a chain that
completed the master phase but failed mid-worker), the version-check
sees v1.34.8, decides "no upgrade needed", and never spawns the
resume phase. Workers stay behind forever.

Today's chain hit exactly this: master + node4 upgraded to v1.34.8,
worker-node4 Failed mid-soak (alert sensitivity, since loosened),
chain dead. The re-triggered version-check looked at master only,
decided the cluster was "done", and refused to resume the worker chain.

Fix: read all node kubelet versions, sort -V, take head -1 (oldest).
A partial chain now correctly reports the un-upgraded version and the
chain resumes.
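
The selection step described above can be sketched roughly as follows.
This is a minimal illustration, not the code in main.tf or scripts:
the function names oldest_version and running_version are hypothetical,
the jsonpath expression is an assumption about how the node versions
are listed, and sort -V assumes GNU coreutils.

```shell
# Pick the oldest version from a newline-separated list on stdin.
# sort -V orders version strings numerically per component, so
# v1.34.9 sorts before v1.34.10 (a plain lexical sort would not).
oldest_version() {
  sort -V | head -n 1
}

# Hypothetical wrapper (assumption: kubectl is configured for the cluster).
# Prints one kubelet version per node, then keeps the oldest one.
running_version() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
    | oldest_version
}
```

With a half-upgraded cluster, feeding v1.34.8, v1.34.7, v1.34.8 into
oldest_version yields v1.34.7, so the check still sees the workers
that were left behind.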

Trivial change, tested live: the chain now correctly reports v1.34.7
(the workers' version) and spawns the preflight → master → worker chain.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-23 09:48:50 +00:00
scripts k8s-version-upgrade: halt_on_alert allowlist (severity=critical only) 2026-05-23 09:14:39 +00:00
job-template.yaml k8s-version-upgrade: decompose into Job chain to fix self-preemption 2026-05-11 23:54:22 +00:00
main.tf k8s-version-upgrade: version-check uses oldest kubelet, not master 2026-05-23 09:48:50 +00:00
terragrunt.hcl k8s-version-upgrade: automated kubeadm/kubelet/kubectl upgrade pipeline 2026-05-10 19:07:42 +00:00