kured: fix sentinel path mismatch that stalled rolling reboots
The kured Helm chart derives the sentinel hostPath from `dirname(configuration.rebootSentinel)`. Previously, rebootSentinel=/sentinel/gated-reboot-required pointed hostPath at `/sentinel/` (an empty, auto-created directory on every host) while the kured-sentinel-gate DaemonSet writes to /var/run/gated-reboot-required. Two different host directories → kured never saw the open gate, even though the gate's checks were all green every 5 min on every node.

Result: unattended-upgrades has had packages waiting on every node since 2026-05-10 (when uu was re-enabled), and kured's hourly log has said "Reboot not required" for the entire period.

Set rebootSentinel=/var/run/gated-reboot-required so the chart mounts hostPath /var/run, the same directory the gate writes to. The in-pod mountPath (/sentinel) is hardcoded by the chart and doesn't matter; the symlink chain works out: /sentinel/<file> inside the pod resolves to /var/run/<file> on the host.

Verified: kured pod can now list /sentinel/gated-reboot-required (0 B) AND /sentinel/reboot-required (32 B, set by uu on 2026-05-15). First gated reboot will land Mon 2026-05-18 02:00 London.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 80e6314bf0
commit 065982d978
1 changed file with 11 additions and 1 deletion
@@ -58,7 +58,17 @@ resource "helm_release" "kured" {
     startTime = "02:00"
     endTime = "06:00"
     rebootDays = ["mo", "tu", "we", "th", "fr"]
-    rebootSentinel = "/sentinel/gated-reboot-required"
+    # IMPORTANT: must match where kured-sentinel-gate writes (below):
+    # `touch /host/var-run/gated-reboot-required` → host
+    # `/var/run/gated-reboot-required`. The kured chart derives the host
+    # path from `dirname(rebootSentinel)`, so this single setting controls
+    # BOTH the in-pod mountPath AND the host hostPath. Previously
+    # `/sentinel/gated-reboot-required` — that pointed the chart's hostPath
+    # at `/sentinel/` (empty, auto-created by hostPath:Directory) while the
+    # gate kept writing to `/var/run/`. kured never saw the open gate so
+    # nodes stopped rebooting on 2026-05-10 when unattended-upgrades was
+    # re-enabled. Fixed 2026-05-16.
+    rebootSentinel = "/var/run/gated-reboot-required"
     notifyUrl = data.vault_kv_secret_v2.secrets.data["slack_kured_webhook"]
     concurrency = 1
     rebootDelay = "30s"
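The path derivation described in the commit message can be sketched in Terraform itself. This is only an illustration, not part of the commit: the local names and the `check` block are hypothetical, and it assumes, as the commit message states, that the chart mounts `dirname(rebootSentinel)` from the host at the fixed in-pod path `/sentinel`. The `check` block needs Terraform 1.5 or later.

```hcl
# Sketch only: these locals and the check block are not in the commit.
locals {
  # Value set by this commit.
  kured_reboot_sentinel = "/var/run/gated-reboot-required"

  # Host directory the kured-sentinel-gate DaemonSet writes into
  # (taken from the commit message).
  gate_host_dir = "/var/run"

  # Per the commit message, the chart uses dirname(rebootSentinel)
  # as the hostPath.
  kured_host_dir = dirname(local.kured_reboot_sentinel)

  # Assumed in-pod view: the chart's fixed /sentinel mount plus the file name.
  kured_in_pod_sentinel = "/sentinel/${basename(local.kured_reboot_sentinel)}"
}

# Emits a warning at plan/apply time if the two directories drift apart.
check "kured_sentinel_dir_matches_gate" {
  assert {
    condition     = local.kured_host_dir == local.gate_host_dir
    error_message = "The kured hostPath (${local.kured_host_dir}) differs from the gate directory (${local.gate_host_dir})."
  }
}
```

With the previous value, `/sentinel/gated-reboot-required`, `dirname()` yields `/sentinel`, so a guard like this would have flagged the mismatch at plan time.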