k8s-version-upgrade: dynamic worker enumeration + IP-based SSH (auto-cover all/new nodes)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The chain hardcoded master→node4→node3→node2→node1→postflight and SSHed by
FQDN. It silently SKIPPED node5/node6 (added 2026-05-26) — postflight would
have failed even if reachable — and node5/node6 had no .viktorbarzin.lan DNS
records, so the chain couldn't SSH to them at all.
Refactor (upgrade-step.sh):
- Worker set + order derived live from `kubectl get nodes` (worker_nodes /
next_pending_worker), so EVERY worker still off-target is upgraded and a
newly-joined node is covered with zero script change.
- SSH targets are node InternalIPs (ssh_target), removing the dependency on
node DNS records entirely — a new node is reachable the moment it joins.
- The two remaining hardcoded loops (containerd skew, apt-repo rewrite) now
enumerate workers/all-nodes dynamically too.
- Topology preserved: master-drain Job runs on the first worker; every
worker-drain Job runs on the already-upgraded k8s-master (self-preemption
invariant intact).
- next_pending_worker returns 0 explicitly on the no-match path — the
`while read … done < <(…)` loop exits 1 at EOF, which under set -e would
abort the LAST worker's Job before it spawns postflight (cluster upgraded
but no cleanup / in_flight reset). Caught in review.
Docs (runbook + architecture + headers) updated to the dynamic topology.
NOTE: nodes still need the k8s-upgrade SSH public key in authorized_keys; it was
deployed to node4/5/6 by hand this session. Baking it into node provisioning
(so new nodes get it automatically) is the remaining follow-up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
0c5a9b5f44
commit
ed53b34bf4
4 changed files with 112 additions and 79 deletions
|
|
@ -4,11 +4,12 @@
|
|||
# Job's pod runs on a node that is NOT its drain target — eliminates the
|
||||
# self-preemption bug that killed the agent-based v1 (2026-05-11 incident).
|
||||
#
|
||||
# Chain (Job 0 → Job 6):
|
||||
# preflight (pinned: k8s-node1)
|
||||
# master (pinned: k8s-node1; drains k8s-master)
|
||||
# worker (pinned: k8s-node1; drains k8s-node4 → 3 → 2)
|
||||
# worker (pinned: k8s-master + control-plane toleration; drains k8s-node1 last)
|
||||
# Chain (dynamic length — one worker Job per worker node, enumerated live):
|
||||
# preflight (pinned: first worker)
|
||||
# master (pinned: first worker; drains k8s-master)
|
||||
# worker (pinned: k8s-master + control-plane toleration; drains each worker)
|
||||
# ↳ repeats for EVERY worker still off-target (kubectl-derived, so
|
||||
# new nodes like node5/node6 are included automatically)
|
||||
# postflight (no pinning)
|
||||
#
|
||||
# Each phase Job's container runs scripts/upgrade-step.sh which:
|
||||
|
|
|
|||
|
|
@ -1,25 +1,23 @@
|
|||
#!/usr/bin/env bash
|
||||
#
|
||||
# Universal upgrade-step body. Each Job in the k8s-version-upgrade chain runs
|
||||
# this once, dispatching on $PHASE. On success it computes the next phase and
|
||||
# spawns the next Job. The chain is:
|
||||
# this once, dispatching on $PHASE. On success it computes the next phase from
|
||||
# LIVE cluster state and spawns the next Job. The chain is:
|
||||
#
|
||||
# preflight (run on k8s-node1)
|
||||
# preflight (run on the first worker)
|
||||
# ↓
|
||||
# master (drains k8s-master; run on k8s-node1)
|
||||
# ↓
|
||||
# worker k8s-node4 (run on k8s-node1)
|
||||
# ↓
|
||||
# worker k8s-node3 (run on k8s-node1)
|
||||
# ↓
|
||||
# worker k8s-node2 (run on k8s-node1)
|
||||
# ↓
|
||||
# worker k8s-node1 (drains k8s-node1; run on k8s-master with control-plane toleration)
|
||||
# master (drains k8s-master; run on the first worker = the "runner")
|
||||
# ↓
|
||||
# worker <W> (drains <W>; run on k8s-master w/ control-plane toleration) ── repeats
|
||||
# ↓ for EVERY worker still off-target, enumerated from kubectl
|
||||
# postflight (no node pinning)
|
||||
#
|
||||
# k8s-node1 hosts every Job except the one that drains k8s-node1 itself.
|
||||
# k8s-node1 is therefore upgraded LAST.
|
||||
# The worker list is derived dynamically (worker_nodes / next_pending_worker),
|
||||
# so newly-added nodes are upgraded with no script change — the old hardcoded
|
||||
# master→node4→3→2→1 chain silently skipped node5/node6 (added 2026-05-26).
|
||||
# Self-preemption invariant: a Job never runs on the node it drains — the
|
||||
# master-drain Job runs on a worker; each worker-drain Job runs on the
|
||||
# already-upgraded master. SSH targets are node InternalIPs (no DNS dependency).
|
||||
#
|
||||
# Required env vars (set on the Job pod by job-template.yaml):
|
||||
# PHASE preflight | master | worker | postflight
|
||||
|
|
@ -39,13 +37,11 @@ KUBECTL=kubectl
|
|||
JOB_TEMPLATE=/template/job-template.yaml
|
||||
UPDATE_K8S_SH=/scripts/update_k8s.sh
|
||||
|
||||
# Pod-side DNS: the cluster's CoreDNS has search domains
|
||||
# `<ns>.svc.cluster.local svc.cluster.local cluster.local` (plus ndots=2 via
|
||||
# Kyverno mutation). Unqualified `k8s-master` falls through all of these and
|
||||
# then queries the upstream DNS (Technitium) for bare `k8s-master`, which
|
||||
# returns NXDOMAIN. The FQDN `k8s-master.viktorbarzin.lan` is what Technitium
|
||||
# actually serves. Suffix every node SSH target with this domain.
|
||||
NODE_DOMAIN=".viktorbarzin.lan"
|
||||
# SSH targets are node InternalIPs, resolved live from `kubectl get nodes` (see
|
||||
# ssh_target() below) — the pipeline has NO dependency on node DNS records
|
||||
# (`k8s-node<N>.viktorbarzin.lan`). This is what lets a freshly-joined node be
|
||||
# upgraded with zero DNS/Kea provisioning. (Was FQDN-based until 2026-06-17:
|
||||
# node5/node6 had no .viktorbarzin.lan records, which blocked the 1.34.9 chain.)
|
||||
|
||||
# SSH key must be 0400 — refresh from secret mount (defaultMode does this but
|
||||
# bind-mount semantics can preserve loose perms; chmod is idempotent).
|
||||
|
|
@ -183,36 +179,66 @@ drain_node() {
|
|||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Chain definition — what comes after the current phase
|
||||
# Cluster topology — derived live so new nodes are covered automatically
|
||||
# ---------------------------------------------------------------------------
|
||||
# The old chain hardcoded master→node4→3→2→1 and silently skipped node5/node6
|
||||
# (added 2026-05-26). Everything below enumerates nodes from `kubectl get nodes`
|
||||
# instead, and SSHes by InternalIP (no .viktorbarzin.lan DNS record needed), so
|
||||
# a freshly-joined worker is upgraded with ZERO pipeline changes.
|
||||
#
|
||||
# Self-preemption invariant preserved: a phase Job never runs on the node it
|
||||
# drains. The master-drain Job runs on a worker (the "runner"); every
|
||||
# worker-drain Job runs on the (already-upgraded) master.
|
||||
worker_nodes() { $KUBECTL get nodes -l '!node-role.kubernetes.io/control-plane' \
|
||||
-o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort; }
|
||||
all_nodes() { { echo k8s-master; worker_nodes; } | sed '/^$/d'; }
|
||||
# SSH target = the node's InternalIP (avoids any DNS dependency).
|
||||
ssh_target() { echo "wizard@$($KUBECTL get node "$1" \
|
||||
-o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')"; }
|
||||
# First worker not yet on $TARGET_VERSION, excluding $1 (the node the current
|
||||
# phase is upgrading — still pending when this runs). Empty → all workers done.
|
||||
next_pending_worker() {
|
||||
local n v exclude="${1:-}"
|
||||
while read -r n; do
|
||||
[ -z "$n" ] && continue
|
||||
[ "$n" = "$exclude" ] && continue
|
||||
v=$($KUBECTL get node "$n" -o jsonpath='{.status.nodeInfo.kubeletVersion}' 2>/dev/null | tr -d v)
|
||||
[ "$v" != "$TARGET_VERSION" ] && { echo "$n"; return 0; }
|
||||
done < <(worker_nodes)
|
||||
# No worker left off-target. Explicit success — the loop's final `read` exits
|
||||
# 1 at EOF, and `next_w="$(next_pending_worker …)"` under `set -e` would abort
|
||||
# the chain BEFORE the postflight branch (cluster upgraded but no cleanup /
|
||||
# in_flight reset / success). Verified blocker — keep this return 0.
|
||||
return 0
|
||||
}
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Chain definition — what comes after the current phase (dynamic)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
NEXT_PHASE=""
|
||||
NEXT_TARGET_NODE=""
|
||||
NEXT_RUN_ON=""
|
||||
|
||||
case "${PHASE}:${TARGET_NODE:-}" in
|
||||
preflight:)
|
||||
case "$PHASE" in
|
||||
preflight)
|
||||
# master upgrade drains the control-plane node → its Job runs on a worker.
|
||||
NEXT_PHASE=master
|
||||
NEXT_RUN_ON=k8s-node1 ;;
|
||||
master:)
|
||||
NEXT_PHASE=worker; NEXT_TARGET_NODE=k8s-node4
|
||||
NEXT_RUN_ON=k8s-node1 ;;
|
||||
worker:k8s-node4)
|
||||
NEXT_PHASE=worker; NEXT_TARGET_NODE=k8s-node3
|
||||
NEXT_RUN_ON=k8s-node1 ;;
|
||||
worker:k8s-node3)
|
||||
NEXT_PHASE=worker; NEXT_TARGET_NODE=k8s-node2
|
||||
NEXT_RUN_ON=k8s-node1 ;;
|
||||
worker:k8s-node2)
|
||||
NEXT_PHASE=worker; NEXT_TARGET_NODE=k8s-node1
|
||||
NEXT_RUN_ON=k8s-master ;; # control-plane toleration required
|
||||
worker:k8s-node1)
|
||||
NEXT_PHASE=postflight
|
||||
NEXT_RUN_ON="" ;; # no node pinning for postflight
|
||||
postflight:)
|
||||
NEXT_PHASE="" ;; # end of chain
|
||||
NEXT_RUN_ON="$(worker_nodes | head -1)" ;;
|
||||
master | worker)
|
||||
# Next worker still off-target (excluding the one this phase handled). Its
|
||||
# Job runs on the already-upgraded master (k8s-master, control-plane
|
||||
# toleration). Dynamic → every worker incl. node5/6 + any future node.
|
||||
next_w="$(next_pending_worker "${TARGET_NODE:-}")"
|
||||
if [ -n "$next_w" ]; then
|
||||
NEXT_PHASE=worker; NEXT_TARGET_NODE="$next_w"; NEXT_RUN_ON=k8s-master
|
||||
else
|
||||
NEXT_PHASE=postflight; NEXT_RUN_ON="" # all workers on target
|
||||
fi ;;
|
||||
postflight)
|
||||
NEXT_PHASE="" ;; # end of chain
|
||||
*)
|
||||
echo "ERROR: unknown phase/target combo: ${PHASE}/${TARGET_NODE:-}" >&2
|
||||
echo "ERROR: unknown phase: $PHASE" >&2
|
||||
exit 2 ;;
|
||||
esac
|
||||
|
||||
|
|
@ -331,7 +357,7 @@ phase_preflight() {
|
|||
# aborts this whole check. Ignore the two CoreDNS checks here too so plan
|
||||
# still emits its "kubeadm upgrade apply vX.Y.Z" line. (See update_k8s.sh.)
|
||||
local plan_target
|
||||
plan_target=$(ssh "${SSH_OPTS[@]}" "wizard@k8s-master$NODE_DOMAIN" 'sudo kubeadm upgrade plan --ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins' \
|
||||
plan_target=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" 'sudo kubeadm upgrade plan --ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins' \
|
||||
| grep -oE 'kubeadm upgrade apply v[0-9]+\.[0-9]+\.[0-9]+' \
|
||||
| grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -1 | tr -d v)
|
||||
if [ "$plan_target" != "$TARGET_VERSION" ]; then
|
||||
|
|
@ -372,16 +398,18 @@ phase_preflight() {
|
|||
|
||||
# 7. Containerd skew fix on master (if master < workers)
|
||||
local master_ctr worker_max=0.0.0
|
||||
master_ctr=$(ssh "${SSH_OPTS[@]}" "wizard@k8s-master$NODE_DOMAIN" "containerd --version | awk '{print \$3}' | tr -d v")
|
||||
for n in k8s-node1 k8s-node2 k8s-node3 k8s-node4; do
|
||||
master_ctr=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" "containerd --version | awk '{print \$3}' | tr -d v")
|
||||
local n
|
||||
while read -r n; do
|
||||
[ -z "$n" ] && continue
|
||||
local v
|
||||
v=$(ssh "${SSH_OPTS[@]}" "wizard@$n$NODE_DOMAIN" "containerd --version | awk '{print \$3}' | tr -d v")
|
||||
v=$(ssh "${SSH_OPTS[@]}" "$(ssh_target "$n")" "containerd --version | awk '{print \$3}' | tr -d v")
|
||||
[ "$(printf '%s\n%s' "$v" "$worker_max" | sort -V | tail -1)" = "$v" ] && worker_max="$v"
|
||||
done
|
||||
done < <(worker_nodes)
|
||||
if [ "$(printf '%s\n%s' "$master_ctr" "$worker_max" | sort -V | head -1)" = "$master_ctr" ] \
|
||||
&& [ "$master_ctr" != "$worker_max" ]; then
|
||||
slack "Master containerd $master_ctr < workers $worker_max — bumping"
|
||||
ssh "${SSH_OPTS[@]}" "wizard@k8s-master$NODE_DOMAIN" \
|
||||
ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" \
|
||||
"sudo apt-mark unhold containerd.io && sudo apt-get install -y containerd.io='$worker_max-1' \
|
||||
&& sudo apt-mark hold containerd.io && sudo systemctl restart containerd"
|
||||
wait_for_node_ready k8s-master "$($KUBECTL get node k8s-master -o jsonpath='{.status.nodeInfo.kubeletVersion}' | tr -d v)" \
|
||||
|
|
@ -392,14 +420,15 @@ phase_preflight() {
|
|||
# 8. Apt repo URL rewrite (minor only)
|
||||
if [ "$KIND" = "minor" ]; then
|
||||
local target_minor="${TARGET_VERSION%.*}"
|
||||
for n in k8s-master k8s-node1 k8s-node2 k8s-node3 k8s-node4; do
|
||||
ssh "${SSH_OPTS[@]}" "wizard@$n$NODE_DOMAIN" \
|
||||
while read -r n; do
|
||||
[ -z "$n" ] && continue
|
||||
ssh "${SSH_OPTS[@]}" "$(ssh_target "$n")" \
|
||||
"echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v$target_minor/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list \
|
||||
&& curl -fsSL 'https://pkgs.k8s.io/core:/stable:/v$target_minor/deb/Release.key' \
|
||||
| sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg --batch --yes \
|
||||
&& sudo apt-get update"
|
||||
done
|
||||
slack "Apt repo rewritten to v$target_minor/deb on all 5 nodes"
|
||||
done < <(all_nodes)
|
||||
slack "Apt repo rewritten to v$target_minor/deb on all nodes"
|
||||
fi
|
||||
|
||||
slack "Preflight clean. Snapshot at nfs://...$snap_file ($size bytes). Dispatching master Job."
|
||||
|
|
@ -449,7 +478,7 @@ phase_master() {
|
|||
drain_node k8s-master
|
||||
|
||||
slack "Running update_k8s.sh on k8s-master (--role master --release $TARGET_VERSION)"
|
||||
ssh "${SSH_OPTS[@]}" "wizard@k8s-master$NODE_DOMAIN" 'bash -s' \
|
||||
ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" 'bash -s' \
|
||||
< "$UPDATE_K8S_SH" -- --role master --release "$TARGET_VERSION"
|
||||
|
||||
$KUBECTL uncordon k8s-master
|
||||
|
|
@ -507,7 +536,7 @@ phase_worker() {
|
|||
drain_node "$TARGET_NODE"
|
||||
|
||||
slack "Running update_k8s.sh on $TARGET_NODE (--role worker --release $TARGET_VERSION)"
|
||||
ssh "${SSH_OPTS[@]}" "wizard@$TARGET_NODE$NODE_DOMAIN" 'bash -s' \
|
||||
ssh "${SSH_OPTS[@]}" "$(ssh_target "$TARGET_NODE")" 'bash -s' \
|
||||
< "$UPDATE_K8S_SH" -- --role worker --release "$TARGET_VERSION"
|
||||
|
||||
$KUBECTL uncordon "$TARGET_NODE"
|
||||
|
|
@ -539,7 +568,7 @@ phase_worker() {
|
|||
phase_postflight() {
|
||||
slack "Running postflight"
|
||||
|
||||
# All 5 nodes at target
|
||||
# All nodes at target
|
||||
local versions wrong
|
||||
versions=$($KUBECTL get nodes -o jsonpath='{range .items[*]}{.metadata.name}:{.status.nodeInfo.kubeletVersion}{"\n"}{end}')
|
||||
# `grep -v` returns 1 when all nodes are on target (the happy path —
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue