From 065982d9783b0dbb60742110132c57846dfbc8b9 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sat, 16 May 2026 11:19:13 +0000
Subject: [PATCH] kured: fix sentinel path mismatch that stalled rolling reboots
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The kured Helm chart derives the sentinel hostPath from
`dirname(configuration.rebootSentinel)`. Previously,
rebootSentinel=/sentinel/gated-reboot-required pointed the hostPath at
`/sentinel/` (an empty auto-created directory on every host), while the
kured-sentinel-gate DaemonSet writes to /var/run/gated-reboot-required.
The two sides used different host directories, so kured never saw the
open gate, even though the gate's checks were all green every 5 min on
every node.

Result: unattended-upgrades (uu) has had packages waiting on every node
since 2026-05-10 (when uu was re-enabled), and kured's hourly log has
said "Reboot not required" for the entire period.

Set rebootSentinel=/var/run/gated-reboot-required so the chart mounts
hostPath /var/run, the same directory the gate writes to. The in-pod
mountPath (/sentinel) is hardcoded by the chart and doesn't matter; the
symlink chain works out, and /sentinel/ inside the pod resolves to
/var/run/ on the host.

Verified: the kured pod can now list /sentinel/gated-reboot-required
(0 B) AND /sentinel/reboot-required (32 B, set by uu on 2026-05-15).
The first gated reboot will land Mon 2026-05-18 02:00 London.
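
For illustration, the path derivation described above can be sketched
with Terraform's built-in `dirname` and `basename` functions. This is a
minimal sketch, not the chart's actual template: the local names are
hypothetical, and the in-pod path assumes the hardcoded `/sentinel`
mountPath mentioned in this message.

```hcl
locals {
  # Hypothetical names, for illustration only.
  old_sentinel = "/sentinel/gated-reboot-required"
  new_sentinel = "/var/run/gated-reboot-required"

  # Host directory the chart would mount, per the dirname() rule above.
  old_host_dir = dirname(local.old_sentinel) # "/sentinel"
  new_host_dir = dirname(local.new_sentinel) # "/var/run"

  # Path kured checks inside the pod, assuming mountPath stays /sentinel.
  in_pod_path = "/sentinel/${basename(local.new_sentinel)}"
}
```

Only `new_host_dir` matches the directory the gate writes to, which is
why the old value never exposed the sentinel file to kured.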

Co-Authored-By: Claude Opus 4.7
---
 stacks/kured/main.tf | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/stacks/kured/main.tf b/stacks/kured/main.tf
index 7292bb00..514e07da 100644
--- a/stacks/kured/main.tf
+++ b/stacks/kured/main.tf
@@ -58,7 +58,17 @@ resource "helm_release" "kured" {
         startTime = "02:00"
         endTime = "06:00"
         rebootDays = ["mo", "tu", "we", "th", "fr"]
-        rebootSentinel = "/sentinel/gated-reboot-required"
+        # IMPORTANT: must match where kured-sentinel-gate writes (below):
+        # `touch /host/var-run/gated-reboot-required` → host
+        # `/var/run/gated-reboot-required`. The kured chart derives the host
+        # path from `dirname(rebootSentinel)`, so this setting controls the
+        # hostPath; the in-pod mountPath stays `/sentinel`. Previously
+        # `/sentinel/gated-reboot-required`, which pointed the chart's hostPath
+        # at `/sentinel/` (empty, auto-created by hostPath:Directory) while the
+        # gate kept writing to `/var/run/`. kured never saw the open gate, so
+        # nodes stopped rebooting on 2026-05-10 when unattended-upgrades was
+        # re-enabled. Fixed 2026-05-16.
+        rebootSentinel = "/var/run/gated-reboot-required"
         notifyUrl = data.vault_kv_secret_v2.secrets.data["slack_kured_webhook"]
         concurrency = 1
         rebootDelay = "30s"