From 01955916b293ed654260c51db2d7d44c981ddb9b Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sat, 18 Apr 2026 22:33:29 +0000
Subject: [PATCH] [infra] Adopt kured + sentinel-gate into Terraform (Wave 5a)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Context

Wave 5a of the state-drift consolidation plan. Two cluster-critical pieces
of infrastructure lived OUTSIDE Terraform — invisible to the repo's "all
cluster changes via TF" invariant and drifting silently:

1. **kured** (Helm release): deployed 265d ago via `helm install kured` on
   the CLI. Values were edited only via `helm upgrade` — never captured.
   Chart version `kured-5.11.0`, app `1.21.0`, configured for Mon–Fri
   02:00–06:00 London reboot window, Slack notifyUrl, and a custom
   `/sentinel/gated-reboot-required` sentinel file.

2. **kured-sentinel-gate**: a custom DaemonSet + ServiceAccount +
   ClusterRole + ClusterRoleBinding. Built after the 2026-03 post-mortem
   (memory 390) when kured rebooted nodes during a containerd overlayfs
   outage and turned a single-node blip into a 26h cluster outage. The gate
   DaemonSet creates `/var/run/gated-reboot-required` only when (a) host has
   `/var/run/reboot-required`, (b) all nodes Ready, (c) all calico-node pods
   Running, (d) no node transitioned Ready in the last 30 minutes
   (cool-down). kured's `rebootSentinel` then points at the gated file so
   reboots are effectively gated by cluster health. Applied 33d ago via
   `kubectl apply` — no TF footprint.

Both are now codified in the new `stacks/kured/` (Tier 1, PG state).

## This change

- New stack `stacks/kured/` with `main.tf` (252 lines) + `terragrunt.hcl`
  (standard platform-dep) + `secrets` symlink.
- All 6 resources adopted via Wave 8's HCL `import {}` block pattern (commit
  8a99be11) — written as `import {}` stanzas in the initial commit,
  plan-applied to zero, then stanzas deleted before this commit per the
  convention:
  - `kubernetes_namespace.kured` (id: `kured`)
  - `helm_release.kured` (id: `kured/kured`)
  - `kubernetes_service_account.kured_sentinel_gate` (id: `kured/kured-sentinel-gate`)
  - `kubernetes_cluster_role.kured_sentinel_gate` (id: `kured-sentinel-gate`)
  - `kubernetes_cluster_role_binding.kured_sentinel_gate` (id: `kured-sentinel-gate`)
  - `kubernetes_daemon_set_v1.kured_sentinel_gate` (id: `kured/kured-sentinel-gate`)
- Slack notifyUrl moved from inline helm values into Vault at `secret/kured`
  under key `slack_kured_webhook`, consumed via `data "vault_kv_secret_v2"`.
  No plaintext secret in git.
- Namespace gets `tier = "1-cluster"` label (new — previously untiered, so
  Kyverno auto-quotas applied cluster-tier defaults on kured pods). Benign
  additive change; pod specs have explicit resources anyway.
- DaemonSet + SA get `automount_service_account_token = false` /
  `enable_service_links = false` to match the live pod spec exactly —
  otherwise TF schema defaults would flip these fields.
- DaemonSet carries `# KYVERNO_LIFECYCLE_V1` suppressing dns_config drift
  (Wave 3A convention, commit c9d221d5 + 327ce215).
- Namespace carries the same marker on the
  `goldilocks.fairwinds.com/vpa-update-mode` label (Wave 3B sweep, commit
  8b43692a).

## Import outcomes

Apply result: `Resources: 6 imported, 0 added, 3 changed, 0 destroyed.`

The 3 in-place changes were all TF-schema reconciliation, not cluster
mutations:
- `helm_release.kured.values` — format reshuffle; the imported state stored
  values as a nested map, HCL uses `[yamlencode(...)]`. Semantic YAML is
  byte-identical, so the triggered Helm upgrade was a no-op on the cluster
  side (revision bumped 2→3, zero pod restarts).
- `kubernetes_namespace.kured.labels["tier"]` = `"1-cluster"` — new label,
  as described under "This change" above.
- `kubernetes_daemon_set_v1.kured_sentinel_gate.wait_for_rollout` = true —
  TF-only attribute, no k8s impact.

Post-apply `scripts/tg plan` on `stacks/kured` returns:
`No changes. Your infrastructure matches the configuration.`

## What is NOT in this change

- `import {}` stanzas — intentionally removed after the apply landed. They
  would be no-ops and would clutter future diffs. Per Wave 8 convention
  (AGENTS.md → "Adopting Existing Resources").
- Calico adoption (Wave 5b) — separate, higher-blast-radius change, needs a
  dedicated low-traffic window.
- local-path-storage (Wave 5c) — check-or-remove task still open.

## Verification

```
$ kubectl -n kured get ds
NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE
kured                 5         5         5       5            5
kured-sentinel-gate   5         5         5       5            5

$ helm -n kured list
NAME    NAMESPACE   REVISION   STATUS     CHART          APP VERSION
kured   kured       3          deployed   kured-5.11.0   1.21.0

$ cd stacks/kured && ../../scripts/tg plan | tail -1
No changes. Your infrastructure matches the configuration.
```

## Reproduce locally

1. `git pull`
2. `cd stacks/kured && ../../scripts/tg plan` → 0 changes
3. `kubectl -n kured get ds,pods` — 5 kured + 5 sentinel-gate pods Ready.

Closes: code-q8k

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 stacks/kured/main.tf        | 252 ++++++++++++++++++++++++++++++++++++
 stacks/kured/secrets        |   1 +
 stacks/kured/terragrunt.hcl |   8 ++
 3 files changed, 261 insertions(+)
 create mode 100644 stacks/kured/main.tf
 create mode 120000 stacks/kured/secrets
 create mode 100644 stacks/kured/terragrunt.hcl

diff --git a/stacks/kured/main.tf b/stacks/kured/main.tf
new file mode 100644
index 00000000..183974ea
--- /dev/null
+++ b/stacks/kured/main.tf
@@ -0,0 +1,252 @@
+# kured — Kubernetes Reboot Daemon
+#
+# Auto-reboots nodes when /var/run/reboot-required exists on the host (set by
+# unattended-upgrades). The reboot process is gated by a custom sentinel file
+# (kured-sentinel-gate DaemonSet below) so reboots only happen when:
+#   - all nodes Ready
+#   - all calico-node pods Running
+#   - no node has transitioned Ready in the last 30 minutes (cool-down)
+#
+# History:
+#   - 2026-03 post-mortem (memory 390): 26h cluster outage triggered by kured
+#     rebooting nodes while containerd's overlayfs snapshotter was corrupted.
+#     Remediation included the sentinel gate and a tight reboot window
+#     (Mon-Fri 02:00-06:00 London).
+#   - 2026-04-18: adopted into Terraform (Wave 5a). Previously helm-installed
+#     manually + kubectl-applied sentinel gate.
+
+resource "kubernetes_namespace" "kured" {
+  metadata {
+    name = "kured"
+    labels = {
+      "istio-injection" = "disabled"
+      tier              = local.tiers.cluster
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
+    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
+  }
+}
+
+# -----------------------------------------------------------------------------
+# kured Helm release
+# -----------------------------------------------------------------------------
+
+resource "helm_release" "kured" {
+  namespace        = kubernetes_namespace.kured.metadata[0].name
+  create_namespace = false
+  name             = "kured"
+  chart            = "kured"
+  repository       = "https://kubereboot.github.io/charts/"
+  version          = "5.11.0"
+
+  values = [yamlencode({
+    configuration = {
+      period         = "1h0m0s"
+      timeZone       = "Europe/London"
+      startTime      = "02:00"
+      endTime        = "06:00"
+      rebootDays     = ["mo", "tu", "we", "th", "fr"]
+      rebootSentinel = "/sentinel/gated-reboot-required"
+      notifyUrl      = data.vault_kv_secret_v2.secrets.data["slack_kured_webhook"]
+    }
+    reboot_days  = "mon,tue,wed,thu,fri"
+    window_end   = "06:00"
+    window_start = "22:00"
+    service = {
+      annotations = {
+        "prometheus.io/scrape" = "true"
+        "prometheus.io/port"   = "8080"
+        "prometheus.io/path"   = "/metrics"
+      }
+    }
+  })]
+}
+
+data "vault_kv_secret_v2" "secrets" {
+  mount = "secret"
+  name  = "kured"
+}
+
+# -----------------------------------------------------------------------------
+# kured-sentinel-gate
+#
+# Runs a DaemonSet that creates /var/run/gated-reboot-required ONLY when all
+# safety preconditions are met (see script). kured's rebootSentinel points at
+# this file, so reboots are effectively blocked unless every check passes.
+# -----------------------------------------------------------------------------
+
+resource "kubernetes_service_account" "kured_sentinel_gate" {
+  metadata {
+    name      = "kured-sentinel-gate"
+    namespace = kubernetes_namespace.kured.metadata[0].name
+  }
+  automount_service_account_token = false
+}
+
+resource "kubernetes_cluster_role" "kured_sentinel_gate" {
+  metadata {
+    name = "kured-sentinel-gate"
+  }
+  rule {
+    api_groups = [""]
+    resources  = ["nodes"]
+    verbs      = ["list"]
+  }
+  rule {
+    api_groups = [""]
+    resources  = ["pods"]
+    verbs      = ["list"]
+  }
+}
+
+resource "kubernetes_cluster_role_binding" "kured_sentinel_gate" {
+  metadata {
+    name = "kured-sentinel-gate"
+  }
+  role_ref {
+    api_group = "rbac.authorization.k8s.io"
+    kind      = "ClusterRole"
+    name      = kubernetes_cluster_role.kured_sentinel_gate.metadata[0].name
+  }
+  subject {
+    kind      = "ServiceAccount"
+    name      = kubernetes_service_account.kured_sentinel_gate.metadata[0].name
+    namespace = kubernetes_namespace.kured.metadata[0].name
+  }
+}
+
+resource "kubernetes_daemon_set_v1" "kured_sentinel_gate" {
+  metadata {
+    name      = "kured-sentinel-gate"
+    namespace = kubernetes_namespace.kured.metadata[0].name
+    labels = {
+      app  = "kured-sentinel-gate"
+      tier = local.tiers.cluster
+    }
+  }
+  spec {
+    selector {
+      match_labels = {
+        app = "kured-sentinel-gate"
+      }
+    }
+    template {
+      metadata {
+        labels = {
+          app = "kured-sentinel-gate"
+        }
+      }
+      spec {
+        service_account_name            = kubernetes_service_account.kured_sentinel_gate.metadata[0].name
+        automount_service_account_token = false
+        enable_service_links            = false
+        toleration {
+          effect   = "NoSchedule"
+          key      = "node-role.kubernetes.io/control-plane"
+          operator = "Equal"
+        }
+        toleration {
+          effect   = "NoSchedule"
+          key      = "node-role.kubernetes.io/master"
+          operator = "Equal"
+        }
+        container {
+          name              = "gate"
+          image             = "bitnami/kubectl:latest"
+          image_pull_policy = "Always"
+          command = [
+            "/bin/bash",
+            "-c",
+            <<-EOT
+            while true; do
+              echo "[$(date)] Checking reboot gate conditions..."
+
+              # Check 1: Does the host need a reboot?
+              if [ ! -f /host/var-run/reboot-required ]; then
+                echo " No reboot required on this host"
+                rm -f /host/var-run/gated-reboot-required
+                sleep 300
+                continue
+              fi
+              echo " Host has /var/run/reboot-required"
+
+              # Check 2: Are ALL nodes Ready?
+              NOT_READY=$(kubectl get nodes --no-headers | grep -v ' Ready' | wc -l | tr -d ' ')
+              if [ "$NOT_READY" -gt 0 ]; then
+                echo " BLOCKED: $NOT_READY node(s) not Ready"
+                rm -f /host/var-run/gated-reboot-required
+                sleep 300
+                continue
+              fi
+              echo " All nodes Ready"
+
+              # Check 3: Are ALL calico-node pods Running?
+              CALICO_NOT_RUNNING=$(kubectl get pods -n calico-system -l k8s-app=calico-node --no-headers 2>/dev/null | grep -v Running | wc -l | tr -d ' ')
+              if [ "$CALICO_NOT_RUNNING" -gt 0 ]; then
+                echo " BLOCKED: $CALICO_NOT_RUNNING calico-node pod(s) not Running"
+                rm -f /host/var-run/gated-reboot-required
+                sleep 300
+                continue
+              fi
+              echo " All calico-node pods Running"
+
+              # Check 4: No node rebooted in last 30 minutes (cool-down)
+              RECENT_REBOOT=0
+              while IFS= read -r transition_time; do
+                if [ -n "$transition_time" ]; then
+                  transition_epoch=$(date -d "$transition_time" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%SZ" "$transition_time" +%s 2>/dev/null)
+                  now_epoch=$(date +%s)
+                  diff=$(( now_epoch - transition_epoch ))
+                  if [ "$diff" -lt 1800 ]; then
+                    RECENT_REBOOT=1
+                    break
+                  fi
+                fi
+              done < <(kubectl get nodes -o jsonpath='{range .items[*]}{range .status.conditions[?(@.type=="Ready")]}{.lastTransitionTime}{"\n"}{end}{end}')
+
+              if [ "$RECENT_REBOOT" -eq 1 ]; then
+                echo " BLOCKED: A node transitioned Ready within the last 30 minutes (cool-down)"
+                rm -f /host/var-run/gated-reboot-required
+                sleep 300
+                continue
+              fi
+              echo " No recent node reboots (30m cool-down clear)"
+
+              # All checks passed — create gated sentinel
+              echo " ALL CHECKS PASSED — creating /var/run/gated-reboot-required"
+              touch /host/var-run/gated-reboot-required
+              sleep 300
+            done
+            EOT
+          ]
+          resources {
+            requests = {
+              cpu    = "10m"
+              memory = "32Mi"
+            }
+            limits = {
+              memory = "64Mi"
+            }
+          }
+          volume_mount {
+            name       = "var-run"
+            mount_path = "/host/var-run"
+          }
+        }
+        volume {
+          name = "var-run"
+          host_path {
+            path = "/var/run"
+            type = "Directory"
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+  }
+}
diff --git a/stacks/kured/secrets b/stacks/kured/secrets
new file mode 120000
index 00000000..ca54a7cf
--- /dev/null
+++ b/stacks/kured/secrets
@@ -0,0 +1 @@
+../../secrets
\ No newline at end of file
diff --git a/stacks/kured/terragrunt.hcl b/stacks/kured/terragrunt.hcl
new file mode 100644
index 00000000..0d1c8e53
--- /dev/null
+++ b/stacks/kured/terragrunt.hcl
@@ -0,0 +1,8 @@
+include "root" {
+  path = find_in_parent_folders()
+}
+
+dependency "platform" {
+  config_path  = "../platform"
+  skip_outputs = true
+}
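
Reviewer note (not part of the patch): the commit message says the six `import {}` stanzas were applied and then deleted, so they do not appear in the diff. A minimal sketch of what they might have looked like, assuming Terraform 1.5+ `import` block syntax and using only the resource addresses and IDs listed under "This change". The exact original stanzas are not shown in the source, so treat this as a hypothetical reconstruction.

```hcl
# Hypothetical reconstruction: addresses and ids are copied from the commit
# message; the file these stanzas lived in is not stated in the source.
import {
  to = kubernetes_namespace.kured
  id = "kured"
}

import {
  to = helm_release.kured
  id = "kured/kured"
}

import {
  to = kubernetes_service_account.kured_sentinel_gate
  id = "kured/kured-sentinel-gate"
}

import {
  to = kubernetes_cluster_role.kured_sentinel_gate
  id = "kured-sentinel-gate"
}

import {
  to = kubernetes_cluster_role_binding.kured_sentinel_gate
  id = "kured-sentinel-gate"
}

import {
  to = kubernetes_daemon_set_v1.kured_sentinel_gate
  id = "kured/kured-sentinel-gate"
}
```

With stanzas like these present, a plan should list each resource as an import instead of a create. Once the apply has landed they no longer do anything, which matches the commit's reason for removing them.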