k8s-version-upgrade: unblock 1.34.9 — skip kubeadm CoreDNS addon + busybox-date fix
All checks were successful
ci/woodpecker/push/default Pipeline was successful
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The 1.34.9 master upgrade hard-failed `kubeadm upgrade apply` preflight: CoreDNS
is at v1.12.4 (Keel auto-bumped it 1.12.1 -> 1.12.4 on 2026-05-26 via a stale
kube-system out-of-band annotation), and 1.12.4 is ahead of kubeadm 1.34.9's
bundled corefile-migration table ("start version not supported").
- scripts/update_k8s.sh: master `kubeadm upgrade apply` now runs with
`--ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins
--skip-phases=addon/coredns`. A dry-run proved --ignore ALONE would overwrite
our custom split-horizon Corefile with kubeadm's default AND downgrade the
image; --skip-phases leaves CoreDNS 100% untouched while the control plane
upgrades. CoreDNS is pinned off Keel (keel.sh/policy=never) to stop the drift.
- stacks/k8s-version-upgrade/scripts/upgrade-step.sh: fix the preflight
quiet-baseline (settle-window) check, which silently no-op'd on the ghcr
claude-agent-service image's busybox `date` (can't parse ISO8601). Now tries
GNU then busybox `-D`, and warns+skips on parse failure (no silent fail-open).
- docs: runbook + architecture document the CoreDNS handling.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
042d1ce1ac
commit
037a609f27
4 changed files with 50 additions and 5 deletions
|
|
@ -306,11 +306,19 @@ phase_preflight() {
|
|||
# reboot for an hour. 10min is sufficient for kubelet/control-plane to
|
||||
# stabilise; the kured-sentinel-gate DaemonSet enforces the broader
|
||||
# 24h-between-cluster-reboots invariant.
|
||||
local recent=0
|
||||
local recent=0 now_ep ts_ep
|
||||
now_ep=$(date -u +%s)
|
||||
while IFS= read -r ts; do
|
||||
[ -z "$ts" ] && continue
|
||||
local diff=$(( $(date +%s) - $(date -d "$ts" +%s) ))
|
||||
if [ "$diff" -lt 600 ]; then recent=1; break; fi
|
||||
# Portable ISO8601(UTC) -> epoch. GNU `date -d` parses ISO8601 directly;
|
||||
# busybox `date` (the ghcr claude-agent-service base) does NOT and needs an
|
||||
# explicit -D format. Before 2026-06-17 the bare `date -d "$ts"` silently
|
||||
# failed on busybox, making this whole settle-window check a no-op. On
|
||||
# parse failure, warn + skip the node (never silently treat it as quiet).
|
||||
ts_ep=$(date -u -d "$ts" +%s 2>/dev/null || true)
|
||||
if [ -z "$ts_ep" ]; then ts_ep=$(date -u -D '%Y-%m-%dT%H:%M:%SZ' -d "$ts" +%s 2>/dev/null || true); fi
|
||||
if [ -z "$ts_ep" ]; then echo "WARN quiet-baseline: cannot parse Ready ts '$ts' (date impl?); skipping"; continue; fi
|
||||
if [ "$(( now_ep - ts_ep ))" -lt 600 ]; then recent=1; break; fi
|
||||
done < <($KUBECTL get nodes -o jsonpath='{range .items[*]}{range .status.conditions[?(@.type=="Ready")]}{.lastTransitionTime}{"\n"}{end}{end}')
|
||||
if [ "$recent" -eq 1 ]; then
|
||||
slack "ABORT preflight — node transitioned Ready <10min ago (settle window)"
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue