infra/stacks/kured/main.tf
Viktor Barzin 01955916b2 [infra] Adopt kured + sentinel-gate into Terraform (Wave 5a)
## Context

Wave 5a of the state-drift consolidation plan. Two cluster-critical pieces
of infrastructure lived OUTSIDE Terraform — invisible to the repo's "all
cluster changes via TF" invariant and drifting silently:

1. **kured** (Helm release): deployed 265d ago via `helm install kured` on
   the CLI. Values were edited only via `helm upgrade` and never captured
   in git. Chart version `kured-5.11.0`, app `1.21.0`, configured with a
   Mon–Fri 02:00–06:00 London reboot window, a Slack notifyUrl, and a
   custom `/sentinel/gated-reboot-required` sentinel file.

2. **kured-sentinel-gate**: a custom DaemonSet + ServiceAccount +
   ClusterRole + ClusterRoleBinding. Built after the 2026-03 post-mortem
   (memory 390) when kured rebooted nodes during a containerd overlayfs
   outage and turned a single-node blip into a 26h cluster outage.
   The gate DaemonSet creates `/var/run/gated-reboot-required` only when
   (a) host has `/var/run/reboot-required`, (b) all nodes Ready, (c) all
   calico-node pods Running, (d) no node transitioned Ready in the last
   30 minutes (cool-down). kured's `rebootSentinel` then points at the
   gated file so reboots are effectively gated by cluster health.
   Applied 33d ago via `kubectl apply` — no TF footprint.

Both are now codified in the new `stacks/kured/` (Tier 1, PG state).

## This change

- New stack `stacks/kured/` with `main.tf` (247 lines) + `terragrunt.hcl`
  (standard platform-dep) + `secrets` symlink.
- All 6 resources adopted via Wave 8's HCL `import {}` block pattern
  (commit 8a99be11): written as `import {}` stanzas in the initial
  commit (a sketch of the pattern follows this list), plan-applied down
  to zero changes, then the stanzas deleted before this commit per the
  convention:
    - `kubernetes_namespace.kured` (id: `kured`)
    - `helm_release.kured` (id: `kured/kured`)
    - `kubernetes_service_account.kured_sentinel_gate` (id: `kured/kured-sentinel-gate`)
    - `kubernetes_cluster_role.kured_sentinel_gate` (id: `kured-sentinel-gate`)
    - `kubernetes_cluster_role_binding.kured_sentinel_gate` (id: `kured-sentinel-gate`)
    - `kubernetes_daemon_set_v1.kured_sentinel_gate` (id: `kured/kured-sentinel-gate`)
- Slack notifyUrl moved from inline helm values into Vault at
  `secret/kured` under key `slack_kured_webhook`, consumed via
  `data "vault_kv_secret_v2"`. No plaintext secret in git.
- Namespace gains a `tier = "1-cluster"` label (new; the namespace was
  previously untiered, so Kyverno auto-quotas now apply cluster-tier
  defaults to kured pods). Benign additive change; the pod specs set
  explicit resources anyway.
- DaemonSet + SA get `automount_service_account_token = false` /
  `enable_service_links = false` to match the live pod spec exactly —
  otherwise TF schema defaults would flip these fields.
- DaemonSet carries a `# KYVERNO_LIFECYCLE_V1` marker suppressing
  dns_config drift (Wave 3A convention, commits c9d221d5 + 327ce215).
- Namespace carries the same marker on the
  `goldilocks.fairwinds.com/vpa-update-mode` label (Wave 3B sweep,
  commit 8b43692a).
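
For reference, the deleted stanzas used Terraform's native `import` block
(1.5+). A minimal sketch for two of the six resources, with the addresses
and IDs exactly as listed above:

```
# Temporary adoption stanzas; plan-applied, then deleted per Wave 8 convention.
import {
  to = kubernetes_namespace.kured
  id = "kured"
}

import {
  to = helm_release.kured
  id = "kured/kured"
}
```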

## Import outcomes

Apply result: `Resources: 6 imported, 0 added, 3 changed, 0 destroyed.`

The 3 in-place changes were all TF-schema reconciliation, not cluster
mutations:

- `helm_release.kured.values` — format reshuffle: the imported state
  stored values as a nested map while the HCL expresses them as
  `[yamlencode(...)]` (sketched after this list). The rendered YAML is
  semantically identical, so the triggered Helm upgrade was a no-op on
  the cluster side (revision bumped 2→3, zero pod restarts).
- `kubernetes_namespace.kured.labels["tier"]` = `"1-cluster"` — new
  label added. Already discussed above.
- `kubernetes_daemon_set_v1.kured_sentinel_gate.wait_for_rollout` = true
  — TF-only attribute, no k8s impact.
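
A minimal sketch of the values reshuffle (illustrative shapes only, not
the actual state dump):

```
# State as imported (conceptually): a nested map of the live release's values.
# Same document as written in this stack:
values = [yamlencode({
  configuration = {
    startTime = "02:00"
    # ... remaining keys identical to the live release
  }
})]
```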

Post-apply `scripts/tg plan` on `stacks/kured` returns:
`No changes. Your infrastructure matches the configuration.`

## What is NOT in this change

- `import {}` stanzas — intentionally removed after the apply landed.
  They would be no-ops and would clutter future diffs. Per Wave 8
  convention (AGENTS.md → "Adopting Existing Resources").
- Calico adoption (Wave 5b) — separate higher-blast change, needs a
  dedicated low-traffic window.
- local-path-storage (Wave 5c) — check-or-remove task still open.

## Verification

```
$ kubectl -n kured get ds
NAME                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE
kured                 5         5         5       5            5
kured-sentinel-gate   5         5         5       5            5

$ helm -n kured list
NAME     NAMESPACE   REVISION  STATUS    CHART          APP VERSION
kured    kured       3         deployed  kured-5.11.0   1.21.0

$ cd stacks/kured && ../../scripts/tg plan | tail -1
No changes. Your infrastructure matches the configuration.
```

## Reproduce locally
1. `git pull`
2. `cd stacks/kured && ../../scripts/tg plan` → 0 changes
3. `kubectl -n kured get ds,pods` — 5 kured + 5 sentinel-gate pods Ready.

Closes: code-q8k

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 22:33:29 +00:00


# kured — Kubernetes Reboot Daemon
#
# Auto-reboots nodes when /var/run/reboot-required exists on the host (set by
# unattended-upgrades). The reboot process is gated by a custom sentinel file
# (kured-sentinel-gate DaemonSet below) so reboots only happen when:
#   - all nodes Ready
#   - all calico-node pods Running
#   - no node has transitioned Ready in the last 30 minutes (cool-down)
#
# History:
#   - 2026-03 post-mortem (memory 390): 26h cluster outage triggered by kured
#     rebooting nodes while containerd's overlayfs snapshotter was corrupted.
#     Remediation included the sentinel gate and a tight reboot window
#     (Mon-Fri 02:00-06:00 London).
#   - 2026-04-18: adopted into Terraform (Wave 5a). Previously helm-installed
#     manually + kubectl-applied sentinel gate.
resource "kubernetes_namespace" "kured" {
metadata {
name = "kured"
labels = {
"istio-injection" = "disabled"
tier = local.tiers.cluster
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
}
# -----------------------------------------------------------------------------
# kured Helm release
# -----------------------------------------------------------------------------
resource "helm_release" "kured" {
  namespace        = kubernetes_namespace.kured.metadata[0].name
  create_namespace = false
  name             = "kured"
  chart            = "kured"
  repository       = "https://kubereboot.github.io/charts/"
  version          = "5.11.0"

  values = [yamlencode({
    configuration = {
      period         = "1h0m0s"
      timeZone       = "Europe/London"
      startTime      = "02:00"
      endTime        = "06:00"
      rebootDays     = ["mo", "tu", "we", "th", "fr"]
      rebootSentinel = "/sentinel/gated-reboot-required"
      notifyUrl      = data.vault_kv_secret_v2.secrets.data["slack_kured_webhook"]
    }
    # Top-level keys below were carried over verbatim from the pre-Terraform
    # install (kept so the adopted values stay byte-identical with the live
    # release); the schedule kured actually uses comes from configuration.*.
    reboot_days  = "mon,tue,wed,thu,fri"
    window_end   = "06:00"
    window_start = "22:00"
    service = {
      annotations = {
        "prometheus.io/scrape" = "true"
        "prometheus.io/port"   = "8080"
        "prometheus.io/path"   = "/metrics"
      }
    }
  })]
}

data "vault_kv_secret_v2" "secrets" {
  mount = "secret"
  name  = "kured"
}
# -----------------------------------------------------------------------------
# kured-sentinel-gate
#
# Runs a DaemonSet that creates /var/run/gated-reboot-required ONLY when all
# safety preconditions are met (see script). kured's rebootSentinel points at
# this file, so reboots are effectively blocked unless every check passes.
# -----------------------------------------------------------------------------
resource "kubernetes_service_account" "kured_sentinel_gate" {
  metadata {
    name      = "kured-sentinel-gate"
    namespace = kubernetes_namespace.kured.metadata[0].name
  }
  automount_service_account_token = false
}

resource "kubernetes_cluster_role" "kured_sentinel_gate" {
  metadata {
    name = "kured-sentinel-gate"
  }
  rule {
    api_groups = [""]
    resources  = ["nodes"]
    verbs      = ["list"]
  }
  rule {
    api_groups = [""]
    resources  = ["pods"]
    verbs      = ["list"]
  }
}

resource "kubernetes_cluster_role_binding" "kured_sentinel_gate" {
  metadata {
    name = "kured-sentinel-gate"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = kubernetes_cluster_role.kured_sentinel_gate.metadata[0].name
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.kured_sentinel_gate.metadata[0].name
    namespace = kubernetes_namespace.kured.metadata[0].name
  }
}
resource "kubernetes_daemon_set_v1" "kured_sentinel_gate" {
metadata {
name = "kured-sentinel-gate"
namespace = kubernetes_namespace.kured.metadata[0].name
labels = {
app = "kured-sentinel-gate"
tier = local.tiers.cluster
}
}
spec {
selector {
match_labels = {
app = "kured-sentinel-gate"
}
}
template {
metadata {
labels = {
app = "kured-sentinel-gate"
}
}
spec {
service_account_name = kubernetes_service_account.kured_sentinel_gate.metadata[0].name
automount_service_account_token = false
enable_service_links = false
toleration {
effect = "NoSchedule"
key = "node-role.kubernetes.io/control-plane"
operator = "Equal"
}
toleration {
effect = "NoSchedule"
key = "node-role.kubernetes.io/master"
operator = "Equal"
}
container {
name = "gate"
image = "bitnami/kubectl:latest"
image_pull_policy = "Always"
command = [
"/bin/bash",
"-c",
<<-EOT
while true; do
echo "[$(date)] Checking reboot gate conditions..."
# Check 1: Does the host need a reboot?
if [ ! -f /host/var-run/reboot-required ]; then
echo " No reboot required on this host"
rm -f /host/var-run/gated-reboot-required
sleep 300
continue
fi
echo " Host has /var/run/reboot-required"
# Check 2: Are ALL nodes Ready?
NOT_READY=$(kubectl get nodes --no-headers | grep -v ' Ready' | wc -l | tr -d ' ')
if [ "$NOT_READY" -gt 0 ]; then
echo " BLOCKED: $NOT_READY node(s) not Ready"
rm -f /host/var-run/gated-reboot-required
sleep 300
continue
fi
echo " All nodes Ready"
# Check 3: Are ALL calico-node pods Running?
CALICO_NOT_RUNNING=$(kubectl get pods -n calico-system -l k8s-app=calico-node --no-headers 2>/dev/null | grep -v Running | wc -l | tr -d ' ')
if [ "$CALICO_NOT_RUNNING" -gt 0 ]; then
echo " BLOCKED: $CALICO_NOT_RUNNING calico-node pod(s) not Running"
rm -f /host/var-run/gated-reboot-required
sleep 300
continue
fi
echo " All calico-node pods Running"
# Check 4: No node rebooted in last 30 minutes (cool-down)
RECENT_REBOOT=0
while IFS= read -r transition_time; do
if [ -n "$transition_time" ]; then
transition_epoch=$(date -d "$transition_time" +%s 2>/dev/null || date -j -f "%Y-%m-%dT%H:%M:%SZ" "$transition_time" +%s 2>/dev/null)
now_epoch=$(date +%s)
diff=$(( now_epoch - transition_epoch ))
if [ "$diff" -lt 1800 ]; then
RECENT_REBOOT=1
break
fi
fi
done < <(kubectl get nodes -o jsonpath='{range .items[*]}{range .status.conditions[?(@.type=="Ready")]}{.lastTransitionTime}{"\n"}{end}{end}')
if [ "$RECENT_REBOOT" -eq 1 ]; then
echo " BLOCKED: A node transitioned Ready within the last 30 minutes (cool-down)"
rm -f /host/var-run/gated-reboot-required
sleep 300
continue
fi
echo " No recent node reboots (30m cool-down clear)"
# All checks passed — create gated sentinel
echo " ALL CHECKS PASSED — creating /var/run/gated-reboot-required"
touch /host/var-run/gated-reboot-required
sleep 300
done
EOT
]
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "64Mi"
}
}
volume_mount {
name = "var-run"
mount_path = "/host/var-run"
}
}
volume {
name = "var-run"
host_path {
path = "/var/run"
type = "Directory"
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
}