From 037a609f272c094d2b6b4968bd8c14a0ceb35b0d Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Wed, 17 Jun 2026 13:45:05 +0000 Subject: [PATCH] =?UTF-8?q?k8s-version-upgrade:=20unblock=201.34.9=20?= =?UTF-8?q?=E2=80=94=20skip=20kubeadm=20CoreDNS=20addon=20+=20busybox-date?= =?UTF-8?q?=20fix?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The 1.34.9 master upgrade hard-failed `kubeadm upgrade apply` preflight: CoreDNS is at v1.12.4 (Keel auto-bumped it 1.12.1 -> 1.12.4 on 2026-05-26 via a stale kube-system out-of-band annotation), and 1.12.4 is ahead of kubeadm 1.34.9's bundled corefile-migration table ("start version not supported"). - scripts/update_k8s.sh: master `kubeadm upgrade apply` now runs with `--ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins --skip-phases=addon/coredns`. A dry-run proved --ignore ALONE would overwrite our custom split-horizon Corefile with kubeadm's default AND downgrade the image; --skip-phases leaves CoreDNS 100% untouched while the control plane upgrades. CoreDNS is pinned off Keel (keel.sh/policy=never) to stop the drift. - stacks/k8s-version-upgrade/scripts/upgrade-step.sh: fix the preflight quiet-baseline (settle-window) check, which silently no-op'd on the ghcr claude-agent-service image's busybox `date` (can't parse ISO8601). Now tries GNU then busybox `-D`, and warns+skips on parse failure (no silent fail-open). - docs: runbook + architecture document the CoreDNS handling. Co-Authored-By: Claude Opus 4.8 --- docs/architecture/automated-upgrades.md | 6 +++++- docs/runbooks/k8s-version-upgrade.md | 20 +++++++++++++++++++ scripts/update_k8s.sh | 15 +++++++++++++- .../scripts/upgrade-step.sh | 14 ++++++++++--- 4 files changed, 50 insertions(+), 5 deletions(-) diff --git a/docs/architecture/automated-upgrades.md b/docs/architecture/automated-upgrades.md index 2029c83d..d6a77038 100644 --- a/docs/architecture/automated-upgrades.md +++ b/docs/architecture/automated-upgrades.md @@ -309,7 +309,11 @@ each Job's pod and its drain target are always different nodes. ConfigMap, and a `template` ConfigMap into each Job pod. - **Per-node script**: `infra/scripts/update_k8s.sh`. Caller passes `--role master|worker --release X.Y.Z`. Piped via SSH into each node by - upgrade-step.sh. + upgrade-step.sh. The master path runs `kubeadm upgrade apply` with + `--ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins + --skip-phases=addon/coredns` so kubeadm never touches CoreDNS (custom Corefile + + separately-tracked image; CoreDNS is pinned off Keel via `keel.sh/policy=never`). + See the runbook's "CoreDNS is NOT upgraded by kubeadm here". - **Four Upgrade Gates alerts**: - `K8sVersionSkew` — kubelet/apiserver `gitVersion` count >1 for 30m. Catches a half-done rollout. - `EtcdPreUpgradeSnapshotMissing` — `k8s_upgrade_in_flight==1 && k8s_upgrade_snapshot_taken==0` for 10m. Catches preflight failing silently. diff --git a/docs/runbooks/k8s-version-upgrade.md b/docs/runbooks/k8s-version-upgrade.md index 00ff78f9..176f56e3 100644 --- a/docs/runbooks/k8s-version-upgrade.md +++ b/docs/runbooks/k8s-version-upgrade.md @@ -118,6 +118,26 @@ Pushed by upgrade-step.sh during phase execution; observed by the - **`K8sUpgradeChainJobFailed`** — `kube_job_status_failed{namespace="k8s-upgrade",job_name=~"k8s-upgrade-.*",reason=~"BackoffLimitExceeded|DeadlineExceeded"} > 0` for 15m (warning). Catches a phase Job that **terminally failed before `k8s_upgrade_in_flight` was set** — the preflight gates exit pre-metric, so the two `in_flight`-based alerts above are blind to a failed preflight (this is what hid the 5-day 1.34.9 wedge on 2026-06-12). Reason-scoped to terminal job conditions so a retry-success doesn't false-positive (a bare failed-pod-count would otherwise also block kured for the Job's 7d TTL). - All four alerts ALSO block kured (same `--prometheus-url` halt-on-alert mechanism) so the OS-reboot pipeline can't run on top of a half-done version upgrade. +### CoreDNS is NOT upgraded by kubeadm here + +CoreDNS runs a **custom split-horizon Corefile** (owned by the technitium stack) +and its image is tracked separately — it must NOT be touched by kubeadm. The +master `kubeadm upgrade apply` therefore runs with +`--ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins +--skip-phases=addon/coredns` (in `scripts/update_k8s.sh`), so kubeadm upgrades +the control plane but leaves CoreDNS 100% untouched (image + Corefile). Without +the `--skip-phases`, forcing past the preflight makes kubeadm overwrite the +Corefile with its default and downgrade the image (verified via +`kubeadm upgrade apply --dry-run`). + +**Keep CoreDNS off Keel.** On 2026-06-12 Keel had auto-bumped CoreDNS +v1.12.1 → v1.12.4 (kube-system out-of-band annotation from the 2026-05-26 Keel +cascade), and 1.12.4 is ahead of kubeadm 1.34.9's corefile-migration table — +which is what blocked the 1.34.9 upgrade. CoreDNS is now `keel.sh/policy=never` +(`kubectl -n kube-system annotate deploy/coredns keel.sh/policy=never`). If a +future kubeadm minor ships a CoreDNS that DOES know the running version, drop the +`--skip-phases` for that run to let kubeadm re-take ownership. + ### Vault secrets - `secret/k8s-upgrade/ssh_key` — ed25519 PRIVATE key, used by Jobs to SSH `wizard@` diff --git a/scripts/update_k8s.sh b/scripts/update_k8s.sh index 6e01d654..3684fc1d 100755 --- a/scripts/update_k8s.sh +++ b/scripts/update_k8s.sh @@ -98,7 +98,20 @@ if [[ "$ROLE" == "master" ]]; then # right version (which is the only case where this timeout fires). attempt=1 extra_flags="" - while ! sudo kubeadm upgrade apply "v$RELEASE" -y $extra_flags; do + # CoreDNS is managed OUTSIDE kubeadm on this cluster: the Corefile is a + # custom split-horizon config owned by the technitium stack, and the image + # is intentionally tracked separately. kubeadm's bundled corefile-migration + # library rejects CoreDNS versions it doesn't know (e.g. 1.12.4 -> "start + # version not supported"), which HARD-FAILS `upgrade apply` at preflight. + # Forcing past preflight with --ignore alone is NOT enough — kubeadm would + # then overwrite our custom Corefile with its default AND downgrade the + # image (verified via `kubeadm upgrade apply --dry-run`, 2026-06-17). So we + # also skip the coredns addon phase entirely: kubeadm leaves CoreDNS 100% + # untouched and only upgrades the control-plane components. (Root fix: keep + # CoreDNS off Keel — keel.sh/policy=never — so it stops drifting ahead of + # kubeadm's migration table.) + coredns_flags="--ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins --skip-phases=addon/coredns" + while ! sudo kubeadm upgrade apply "v$RELEASE" -y $coredns_flags $extra_flags; do if (( attempt >= 3 )); then echo "ERROR: kubeadm upgrade apply failed after 3 attempts" >&2 exit 1 diff --git a/stacks/k8s-version-upgrade/scripts/upgrade-step.sh b/stacks/k8s-version-upgrade/scripts/upgrade-step.sh index f46e1af7..57ef87fc 100644 --- a/stacks/k8s-version-upgrade/scripts/upgrade-step.sh +++ b/stacks/k8s-version-upgrade/scripts/upgrade-step.sh @@ -306,11 +306,19 @@ phase_preflight() { # reboot for an hour. 10min is sufficient for kubelet/control-plane to # stabilise; the kured-sentinel-gate DaemonSet enforces the broader # 24h-between-cluster-reboots invariant. - local recent=0 + local recent=0 now_ep ts_ep + now_ep=$(date -u +%s) while IFS= read -r ts; do [ -z "$ts" ] && continue - local diff=$(( $(date +%s) - $(date -d "$ts" +%s) )) - if [ "$diff" -lt 600 ]; then recent=1; break; fi + # Portable ISO8601(UTC) -> epoch. GNU `date -d` parses ISO8601 directly; + # busybox `date` (the ghcr claude-agent-service base) does NOT and needs an + # explicit -D format. Before 2026-06-17 the bare `date -d "$ts"` silently + # failed on busybox, making this whole settle-window check a no-op. On + # parse failure, warn + skip the node (never silently treat it as quiet). + ts_ep=$(date -u -d "$ts" +%s 2>/dev/null || true) + if [ -z "$ts_ep" ]; then ts_ep=$(date -u -D '%Y-%m-%dT%H:%M:%SZ' -d "$ts" +%s 2>/dev/null || true); fi + if [ -z "$ts_ep" ]; then echo "WARN quiet-baseline: cannot parse Ready ts '$ts' (date impl?); skipping"; continue; fi + if [ "$(( now_ep - ts_ep ))" -lt 600 ]; then recent=1; break; fi done < <($KUBECTL get nodes -o jsonpath='{range .items[*]}{range .status.conditions[?(@.type=="Ready")]}{.lastTransitionTime}{"\n"}{end}{end}') if [ "$recent" -eq 1 ]; then slack "ABORT preflight — node transitioned Ready <10min ago (settle window)"