kured: drop Mon-Fri restriction, reboot any day
The weekday-only schedule was a 2026-03-16-incident-era guardrail when the rest of the safety net was thin. Today's gates — halt-on-alert, sentinel-gate Check 4 (24h soak via node Ready transitions), the K8sUpgradeStalled alert, drainTimeout=30m, concurrency=1, and the sentinel-path fix from earlier today — make weekend reboots safe and just clear the backlog faster. Effect: 5 pending node reboots clear in 5 calendar days instead of queueing up over weekends. The K8s version-upgrade detection at Sun 12:00 UTC self-defers if a Sunday-morning kured reboot fires (the RecentNodeReboot alert is in the Upgrade Gates ignore-less list for the version-upgrade preflight — same mechanism kured uses). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
8cbfa6856c
commit
48abb7c520
4 changed files with 17 additions and 8 deletions
|
|
@ -14,9 +14,12 @@ detection CronJob → preflight Job → master Job → worker × 4 Jobs → post
|
|||
```
|
||||
|
||||
This is **independent** of the OS-side `unattended-upgrades + kured`
|
||||
pipeline (see `k8s-node-auto-upgrades.md`). They do not share rollouts and
|
||||
their schedules don't overlap (kured runs Mon-Fri 02:00-06:00 London;
|
||||
detection here runs Sun 12:00 UTC).
|
||||
pipeline (see `k8s-node-auto-upgrades.md`). They do not share rollouts.
|
||||
Schedules can overlap (kured runs daily 02:00-06:00 London; detection
|
||||
here runs Sun 12:00 UTC) — when a kured reboot lands within 24h of the
|
||||
Sunday detection, the `RecentNodeReboot` alert in the Upgrade Gates
|
||||
group blocks the version-upgrade preflight, so the chain self-defers
|
||||
to the next Sunday rather than rolling on top of a half-fresh node.
|
||||
|
||||
## Architecture
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue