Adds a weekly detection CronJob (Sun 12:00 UTC) that probes apt-cache madison
on master for new patches + HEAD pkgs.k8s.io for next-minor availability,
then POSTs to claude-agent-service to dispatch the k8s-version-upgrade agent.
The agent (.claude/agents/k8s-version-upgrade.md) orchestrates:
pre-flight (5 nodes Ready + halt-on-alert + 24h-quiet + plan target match)
-> etcd snapshot save
-> optional master containerd skew fix
-> apt repo URL rewrite (minor bumps only)
-> drain/upgrade/uncordon master via ssh < update_k8s.sh
-> sequential workers k8s-node4 -> 3 -> 2 -> 1 with 10-min soak each
-> post-flight verification
Two new Upgrade Gates alerts catch failure modes:
- K8sVersionSkew (kubelet/apiserver gitVersion mismatch >30m)
- EtcdPreUpgradeSnapshotMissing (in_flight without snapshot_taken >10m)
update_k8s.sh refactored to take --role / --release args; the agent shells
it into each node via SSH pipe. update_node.sh annotated as OS-major path.
Operator-facing docs: docs/runbooks/k8s-version-upgrade.md and a new section
in docs/architecture/automated-upgrades.md.
Secrets: secret/k8s-upgrade/{ssh_key,ssh_key_pub,slack_webhook} (ed25519
keypair distributed to all 5 nodes via authorized_keys; slack_webhook
reuses kured webhook URL on initial deploy).
98 lines
3.3 KiB
Bash
Executable file
98 lines
3.3 KiB
Bash
Executable file
#!/usr/bin/env bash
|
|
#
|
|
# K8s component upgrader. Run on a single node (master OR worker) at a time.
|
|
# The caller is responsible for:
|
|
# - draining + uncordoning the node (this script does not touch kubectl)
|
|
# - sequencing nodes (master first, then workers one at a time)
|
|
# - pre-flight checks (etcd snapshot, halt-on-alert, etc)
|
|
#
|
|
# Used by:
|
|
# - the k8s-version-upgrade agent (infra/.claude/agents/k8s-version-upgrade.md)
|
|
# - manual operators following the runbook (infra/docs/runbooks/k8s-version-upgrade.md)
|
|
#
|
|
# Old manual orchestration loop (kept for reference — the agent does the
|
|
# equivalent now):
|
|
# for n in $(kbn | grep 'k8s-node' | awk '{print $1}'); do
|
|
# kb drain $n --ignore-daemonsets --delete-emptydir-data
|
|
# s wizard@$n 'bash -s' < update_k8s.sh --role worker --release 1.34.5
|
|
# kb uncordon $n
|
|
# done
|
|
|
|
set -euo pipefail
|
|
|
|
ROLE=""
|
|
RELEASE=""
|
|
|
|
usage() {
|
|
cat <<EOF
|
|
Usage: $0 --role <master|worker> --release <X.Y.Z>
|
|
|
|
--role master|worker (required)
|
|
--release kubeadm/kubelet/kubectl target patch version, e.g. 1.34.5
|
|
|
|
Behavior:
|
|
- Rewrites /etc/apt/sources.list.d/kubernetes.list to the v\$MINOR/deb repo
|
|
derived from --release (so a 1.34.x release uses v1.34/deb, 1.35.x uses
|
|
v1.35/deb, etc).
|
|
- apt-get install kubeadm=<release>-* (apt-mark unhold first).
|
|
- master: kubeadm upgrade plan && kubeadm upgrade apply v<release> -y
|
|
- worker: kubeadm upgrade node
|
|
- apt-get install kubelet=<release>-* kubectl=<release>-* then re-hold.
|
|
- systemctl daemon-reload && systemctl restart kubelet
|
|
EOF
|
|
}
|
|
|
|
while [[ $# -gt 0 ]]; do
|
|
case "$1" in
|
|
--role) ROLE="$2"; shift 2;;
|
|
--release) RELEASE="$2"; shift 2;;
|
|
-h|--help) usage; exit 0;;
|
|
*) echo "Unknown arg: $1" >&2; usage; exit 2;;
|
|
esac
|
|
done
|
|
|
|
if [[ -z "$ROLE" || -z "$RELEASE" ]]; then
|
|
echo "ERROR: --role and --release are required" >&2
|
|
usage
|
|
exit 2
|
|
fi
|
|
|
|
if [[ "$ROLE" != "master" && "$ROLE" != "worker" ]]; then
|
|
echo "ERROR: --role must be 'master' or 'worker' (got: $ROLE)" >&2
|
|
exit 2
|
|
fi
|
|
|
|
# Derive minor track (e.g. 1.34.5 → 1.34)
|
|
STABLE_VERSION="$(echo "$RELEASE" | awk -F. '{print $1"."$2}')"
|
|
|
|
echo "==> Upgrading $(hostname) ($ROLE) to v$RELEASE (track v$STABLE_VERSION)"
|
|
|
|
# Apt repo URL is pinned per minor track. Rewrite + re-import the signing key
|
|
# every run — cheap, idempotent, and handles the minor-bump case where the
|
|
# old track's repo no longer carries the target version.
|
|
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v$STABLE_VERSION/deb/ /" \
|
|
| sudo tee /etc/apt/sources.list.d/kubernetes.list
|
|
sudo mkdir -p /etc/apt/keyrings
|
|
curl -fsSL "https://pkgs.k8s.io/core:/stable:/v$STABLE_VERSION/deb/Release.key" \
|
|
| sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg --batch --yes
|
|
|
|
sudo apt-mark unhold kubeadm kubelet kubectl
|
|
sudo apt-get update
|
|
sudo apt-get install -y "kubeadm=$RELEASE-*"
|
|
|
|
if [[ "$ROLE" == "master" ]]; then
|
|
echo "==> Master path: kubeadm upgrade plan + apply"
|
|
sudo kubeadm upgrade plan
|
|
sudo kubeadm upgrade apply "v$RELEASE" -y
|
|
else
|
|
echo "==> Worker path: kubeadm upgrade node"
|
|
sudo kubeadm upgrade node
|
|
fi
|
|
|
|
sudo apt-get install -y "kubelet=$RELEASE-*" "kubectl=$RELEASE-*"
|
|
sudo apt-mark hold kubeadm kubelet kubectl
|
|
|
|
sudo systemctl daemon-reload
|
|
sudo systemctl restart kubelet
|
|
|
|
echo "==> Done: $(hostname) is on v$RELEASE"
|