From 566447a69840d71844c7db619cb048c1a812250b Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Wed, 24 Jun 2026 06:06:14 +0000
Subject: [PATCH] k8s-upgrade: preflight kubeadm-plan gate must pass explicit target (minor-upgrade fix)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Last night's 1.34.9->1.35.6 run passed the ESO/kyverno compat gate (the
migration worked!) but ABORTED at the kubeadm-plan-target gate: it ran
`kubeadm upgrade plan` with NO version, so master's old 1.34.9 kubeadm
auto-proposed only the current minor (Loki: "falling back to stable-1.34")
and plan_target != 1.35.6 -> abort. That gate worked for patch upgrades but
never for minors.

Fix: pass the explicit `v$TARGET_VERSION` (verified on master:
`kubeadm upgrade plan v1.35.6` emits "kubeadm upgrade apply v1.35.6").
Works for patches too. Applied live to the ConfigMap before tonight's run;
deleted the failed preflight-1-35-6 job.

Also: ESO 2.x took SSA ownership of .spec.refreshInterval, so terraform's
apply of the k8s-upgrade-creds ExternalSecret hit a field-manager conflict.
Added field_manager.force_conflicts=true (benign — interval is semantically
identical). This pattern affects all 104 migrated ESs fleet-wide (follow-up).

Co-Authored-By: Claude Opus 4.8
---
 stacks/k8s-version-upgrade/main.tf                 |  9 +++++++++
 stacks/k8s-version-upgrade/scripts/upgrade-step.sh | 10 +++++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/stacks/k8s-version-upgrade/main.tf b/stacks/k8s-version-upgrade/main.tf
index 077028f1..857d08e2 100644
--- a/stacks/k8s-version-upgrade/main.tf
+++ b/stacks/k8s-version-upgrade/main.tf
@@ -131,6 +131,15 @@ resource "kubernetes_manifest" "external_secret" {
       ]
     }
   }
+
+  # ESO 2.x took SSA ownership of .spec.refreshInterval (it normalizes the value),
+  # which conflicts with terraform's apply of this ExternalSecret. force_conflicts
+  # lets terraform reassert its spec — the interval is semantically identical, so
+  # this is benign. Surfaced after the ESO 0.12->2.6 migration (2026-06-24); the
+  # same pattern affects all migrated ExternalSecrets fleet-wide.
+  field_manager {
+    force_conflicts = true
+  }
 }
 
 # --- Unified ServiceAccount + RBAC ---
diff --git a/stacks/k8s-version-upgrade/scripts/upgrade-step.sh b/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
index 17f2d2d3..76fdf157 100644
--- a/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
+++ b/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
@@ -398,8 +398,16 @@ phase_preflight() {
       slack "preflight — k8s-master already on v$TARGET_VERSION; skipping kubeadm-plan-target gate (workers still pending)"
       echo "k8s-master already on v$TARGET_VERSION — skipping kubeadm-plan-target gate"
     else
+      # Pass the EXPLICIT target version. Without it, `kubeadm upgrade plan` (run by
+      # the OLD kubeadm still on master) auto-proposes only the current minor's
+      # latest patch — fine for a patch upgrade, but for a MINOR upgrade it never
+      # proposes the next minor, so plan_target came back as 1.34.x (or empty,
+      # "falling back to stable-1.34") and this gate ABORTED every 1.35 attempt
+      # (blocked last night's 1.34.9->1.35.6 run even though the compat gate passed,
+      # 2026-06-24). `kubeadm upgrade plan v1.35.6` correctly emits the
+      # "kubeadm upgrade apply v1.35.6" line (verified on master). Works for patches too.
       local plan_target
-      plan_target=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" 'sudo kubeadm upgrade plan --ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins' \
+      plan_target=$(ssh "${SSH_OPTS[@]}" "$(ssh_target k8s-master)" "sudo kubeadm upgrade plan v$TARGET_VERSION --ignore-preflight-errors=CoreDNSMigration,CoreDNSUnsupportedPlugins" \
         | grep -oE 'kubeadm upgrade apply v[0-9]+\.[0-9]+\.[0-9]+' \
         | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -1 | tr -d v)
       if [ "$plan_target" != "$TARGET_VERSION" ]; then