CI: some checks failed (ci/woodpecker/push/default: pipeline failed)

Last night (2026-06-20) the detector + compat-gate fixes worked: the chain resolved target 1.35.6 and the gate correctly REFUSED it (ESO 0.12 + kyverno 1.16 don't support 1.35), pushing k8s_upgrade_blocked=1 -> K8sUpgradeBlocked fired as designed.

But the refusal also made the preflight Job exit 1 (block() exits 1 on purpose so the Failed Job re-spawns nightly), which tripped K8sUpgradeChainJobFailed too: a duplicate, misleading "pipeline wedged" alarm for what is the intended halt-and-alert outcome.

Fix: gate the alert with `unless on() k8s_upgrade_blocked == 1`. A deliberate block sets that gauge (and it stays 1 until the next preflight resets it), so the chain-job-failed alert is suppressed for the blocked period; a genuine wedge / crash / halt-on-alert exits 1 WITHOUT setting it, so it still fires (preserving the alert's original purpose: catching the pre-in_flight preflight failure that hid the 5-day 1.34.9 wedge).

Runbook + automated-upgrades docs updated to match.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

| | | |
|---|---|---|
| agent-task-tracking.md | ||
| authentication.md | ||
| automated-upgrades.md | ||
| backup-dr.md | ||
| chrome-service.md | ||
| ci-cd.md | ||
| compute.md | ||
| databases.md | ||
| dns.md | ||
| homepage.md | ||
| incident-response.md | ||
| llama-cpp.md | ||
| mailserver.md | ||
| monitoring.md | ||
| multi-tenancy.md | ||
| networking.md | ||
| overview.md | ||
| secrets.md | ||
| security.md | ||
| storage.md | ||
| vpn.md | ||
| wave1-egress-observation-2026-05-22.md | ||
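
The alert gating described in the latest commit message above can be sketched as a small truth table. This is only an illustrative model, not the actual alert rule: the function name `chain_job_failed_fires` and its parameters are hypothetical, and the real expression on the left-hand side of `unless on() k8s_upgrade_blocked == 1` is not shown in the commit message.

```python
from typing import Optional


def chain_job_failed_fires(job_failed: bool, blocked_gauge: Optional[float]) -> bool:
    """Model `<failed-job expr> unless on() k8s_upgrade_blocked == 1`.

    job_failed    -- hypothetical stand-in for the (unshown) failed-job expression
    blocked_gauge -- value of k8s_upgrade_blocked, or None if the series is absent
    """
    # `unless on()` drops the left-hand result whenever the right-hand side
    # returns any sample, i.e. whenever the gauge currently equals 1.
    deliberately_blocked = blocked_gauge == 1
    return job_failed and not deliberately_blocked


if __name__ == "__main__":
    scenarios = {
        "deliberate block (gate refused target)": (True, 1),
        "genuine wedge / crash, gauge never set": (True, None),
        "genuine failure, gauge reset to 0": (True, 0),
        "job healthy": (False, 0),
    }
    for name, (failed, gauge) in scenarios.items():
        print(f"{name}: fires={chain_job_failed_fires(failed, gauge)}")
```

Under this model, a deliberate block is left to K8sUpgradeBlocked alone, while any failure that exits 1 without setting the gauge still raises K8sUpgradeChainJobFailed.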