Previous version-check read RUNNING from .items[0].nodeInfo.kubeletVersion
— which is just k8s-master. If master is upgraded but workers aren't
(e.g. a chain that completed master phase but failed mid-worker), the
version-check sees v1.34.8 and decides "no upgrade needed", never
spawning the resume phase. Workers stay behind forever.
Today's chain hit exactly this: master + node4 upgraded to v1.34.8,
worker-node4 Failed mid-soak (alert sensitivity, since loosened),
chain dead. Re-triggering the version-check looked at master only,
decided cluster was "done", and refused to resume worker chain.
Fix: read all node kubelet versions, sort -V, take head -1 (oldest).
A partial chain now correctly reports the un-upgraded version and the
chain resumes.
Trivial change; tested live — chain now correctly reports v1.34.7
(workers' version) and spawns preflight → master → worker chain.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>