k8s-version-upgrade: compat gate — auto-upgrade when safe, halt + alert when not
Make k8s upgrades (patch AND minor) autonomous without being reckless: the chain attempts every upgrade but refuses unless it can prove the target is safe. A refusal is a BLOCK (not a crash): it halts the chain and signals for attention.

- compat-gate.py: read-only preflight check. Blocks if (a) a critical addon's running version doesn't support the target k8s minor, (b) an in-use deprecated API (apiserver_requested_deprecated_apis) is removed at or before the target, or (c) a node's containerd is below the target's floor. Validated against the live cluster: it correctly blocks 1.35/1.36 today on Calico 3.26 / ESO 0.12 / kyverno 1.16 (all behind), which is exactly the auto-halt we want until they're bumped.
- addon-compat.json: curated addon -> max-supported-k8s matrix (Calico, ESO, kyverno, gpu-operator, plus a containerd floor), sourced from each project's compat docs (2026-06-19). This is the keystone data the gate reads; keep it current.
- upgrade-step.sh: phase_preflight runs the gate FIRST (before any mutation); block() pushes k8s_upgrade_blocked=1, posts the reasons to Slack, and halts.
- main.tf: detector minor-probe fix (curl -sILo, so curl follows the 302 from pkgs.k8s.io to the final 200; previously minors were never detected). This is gated behind the compat gate above, so enabling minor detection can't roll out an unsafe minor.

Not pushed yet: this ships together with the K8sUpgradeBlocked alert, a deeper postflight, and a runbook (next commit), so the detector fix only goes live with the full safety net.
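Check (b) can be sketched in Python, the language compat-gate.py uses. This is a minimal illustration, not the gate's actual code: it assumes the metrics text has already been fetched from the apiserver (for example with `kubectl get --raw /metrics`), and the function names and the `widgets.example.io` sample are hypothetical. The gauge only reflects requests made since that apiserver instance started, so a freshly restarted apiserver can under-report.

```python
import re

# A sample line of the gauge in Prometheus text format: name{labels} value.
# Comment lines (# HELP / # TYPE) do not match.
SAMPLE_RE = re.compile(r"^apiserver_requested_deprecated_apis\{([^}]*)\}\s+(\S+)")
# label="value" pairs; escaped quotes inside values are not handled (sketch only).
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Illustrative metrics text; the widgets.example.io API is hypothetical.
EXAMPLE_METRICS = """\
# TYPE apiserver_requested_deprecated_apis gauge
apiserver_requested_deprecated_apis{group="policy",removed_release="1.25",resource="podsecuritypolicies",subresource="",version="v1beta1"} 1
apiserver_requested_deprecated_apis{group="example.io",removed_release="1.37",resource="widgets",subresource="",version="v1alpha1"} 1
"""


def minor(v: str) -> tuple[int, int]:
    """'1.35' -> (1, 35), so '1.9' < '1.10' compares numerically, not as text."""
    major_s, minor_s = v.split(".")[:2]
    return int(major_s), int(minor_s)


def removed_in_use_apis(metrics_text: str, target: str) -> list[str]:
    """Requested deprecated APIs whose removed_release is at or before target."""
    hits = []
    for line in metrics_text.splitlines():
        m = SAMPLE_RE.match(line)
        if not m or float(m.group(2)) == 0:
            continue  # not a sample of this gauge, or value 0
        labels = dict(LABEL_RE.findall(m.group(1)))
        removed = labels.get("removed_release", "")
        if removed and minor(removed) <= minor(target):
            group = labels.get("group") or "core"  # empty group = core API
            hits.append(f"{labels.get('resource')}.{group}/{labels.get('version')} (removed in {removed})")
    return hits


if __name__ == "__main__":
    # A real preflight would read this from each apiserver's /metrics endpoint;
    # here we use the illustrative sample above.
    for api in removed_in_use_apis(EXAMPLE_METRICS, "1.36"):
        print("BLOCK:", api)
```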
parent 9189560ac3
commit cecd9fe247
4 changed files with 230 additions and 3 deletions
stacks/k8s-version-upgrade/scripts/addon-compat.json (normal file, 57 lines added)
@@ -0,0 +1,57 @@
{
  "_comment": "Addon -> highest k8s minor each addon version supports. The preflight compat-gate (compat-gate.py) reads the RUNNING version of each addon and blocks a k8s upgrade whose target minor exceeds what that running version supports — so the chain auto-halts + alerts instead of breaking on an unsupported addon. Keep current; sources are the addons' own k8s compat matrices (last refreshed 2026-06-19 for the 1.34->1.36 catch-up). max_k8s keys are addon-version floors (major.minor); value is the highest k8s minor that floor supports.",
  "addons": [
    {
      "name": "calico",
      "namespace": "calico-system",
      "kind": "daemonset",
      "resource": "calico-node",
      "image_re": "node:v?([0-9]+\\.[0-9]+)",
      "max_k8s": {
        "3.26": "1.28",
        "3.27": "1.29",
        "3.28": "1.30",
        "3.29": "1.32",
        "3.30": "1.35",
        "3.31": "1.35",
        "3.32": "1.36"
      }
    },
    {
      "name": "external-secrets",
      "namespace": "external-secrets",
      "kind": "deployment",
      "resource": "external-secrets",
      "image_re": "external-secrets:v?([0-9]+\\.[0-9]+)",
      "max_k8s": {
        "0.12": "1.31",
        "2.0": "1.35"
      }
    },
    {
      "name": "kyverno",
      "namespace": "kyverno",
      "kind": "deployment",
      "resource": "kyverno-admission-controller",
      "image_re": "kyverno:v?([0-9]+\\.[0-9]+)",
      "max_k8s": {
        "1.16": "1.34",
        "1.18": "1.35"
      }
    },
    {
      "name": "gpu-operator",
      "namespace": "nvidia",
      "kind": "deployment",
      "resource": "gpu-operator",
      "image_re": "gpu-operator:v?([0-9]+\\.[0-9]+)",
      "max_k8s": {
        "25.10": "1.35",
        "26.3": "1.36"
      }
    }
  ],
  "containerd_min": {
    "1.37": "2.0"
  }
}
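The `_comment` describes a floor lookup: for a running addon version, take the largest `max_k8s` key not above it; that key's value is the highest k8s minor supported. A minimal Python sketch of checks (a) and (c) against this file follows. It is hypothetical, not compat-gate.py itself: the function names and `EXAMPLE_MATRIX` (a trimmed copy of the JSON above) are illustrative, and two behaviors are assumptions: an addon whose running image is missing or doesn't match `image_re` blocks (fail closed), and each `containerd_min` key is read as the k8s minor from which that containerd floor applies. Node runtimes are passed in the `containerd://<version>` form reported by a node's `status.nodeInfo.containerRuntimeVersion`.

```python
import json
import re
from typing import Optional

# Trimmed copy of addon-compat.json (calico only) for the usage example.
EXAMPLE_MATRIX = {
    "addons": [{
        "name": "calico",
        "image_re": r"node:v?([0-9]+\.[0-9]+)",
        "max_k8s": {"3.26": "1.28", "3.29": "1.32", "3.30": "1.35", "3.32": "1.36"},
    }],
    "containerd_min": {"1.37": "2.0"},
}


def ver(v: str) -> tuple[int, ...]:
    """'v3.26.4' -> (3, 26, 4); a leading 'v' and any '-suffix' are ignored."""
    m = re.match(r"v?(\d+(?:\.\d+)*)", v.strip())
    if not m:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(p) for p in m.group(1).split("."))


def load_matrix(path: str = "addon-compat.json") -> dict:
    with open(path) as f:
        return json.load(f)


def floor_lookup(table: dict[str, str], version: str) -> Optional[str]:
    """Value of the largest key <= version, or None if version is below every key."""
    floors = [k for k in table if ver(k) <= ver(version)]
    return table[max(floors, key=ver)] if floors else None


def addon_blocks(matrix: dict, images: dict[str, str], target: str) -> list[str]:
    """Check (a): addons whose running version can't support the target minor."""
    reasons = []
    for addon in matrix["addons"]:
        name = addon["name"]
        m = re.search(addon["image_re"], images.get(name, ""))
        if not m:
            reasons.append(f"{name}: running version unknown")  # fail closed (assumption)
            continue
        running = m.group(1)
        ceiling = floor_lookup(addon["max_k8s"], running)
        if ceiling is None:
            reasons.append(f"{name} {running}: older than every max_k8s floor")
        elif ver(target) > ver(ceiling):
            reasons.append(f"{name} {running} supports k8s <= {ceiling}; target is {target}")
    return reasons


def containerd_blocks(matrix: dict, runtimes: dict[str, str], target: str) -> list[str]:
    """Check (c): nodes whose containerd is below the floor for the target minor."""
    need = floor_lookup(matrix.get("containerd_min", {}), target)
    if need is None:
        return []  # no containerd floor applies at this target
    reasons = []
    for node, runtime in runtimes.items():
        v = runtime.split("://")[-1]  # 'containerd://1.7.27' -> '1.7.27'
        if ver(v) < ver(need):
            reasons.append(f"{node}: containerd {v} < {need} (required for k8s {target})")
    return reasons


if __name__ == "__main__":
    # A real gate would read running images and node runtimes from the cluster;
    # these values are illustrative.
    target = "1.36"
    reasons = addon_blocks(EXAMPLE_MATRIX, {"calico": "quay.io/calico/node:v3.26.4"}, target)
    reasons += containerd_blocks(EXAMPLE_MATRIX, {"node-a": "containerd://1.7.27"}, target)
    print("BLOCK" if reasons else "OK", reasons)
```

Versions are compared as integer tuples rather than strings: as text, "3.9" sorts after "3.26" and "1.9" after "1.10", which would silently invert the gate's decision in those cases.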