k8s-version-upgrade: switch detection cron from weekly to daily

Was `0 12 * * 0` (Sun 12:00 UTC), so a new patch release could wait
almost 7 days before the chain picked it up. Now `0 12 * * *` (daily
12:00 UTC, still outside kured's 02:00-06:00 London window). Concurrency
is bounded by the CronJob's `concurrencyPolicy: Forbid` plus deterministic
Job-name idempotency (the detection job exits early if a preflight Job
for the same target already exists), so back-to-back days can't pile up
parallel runs.
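A minimal sketch of that idempotency guard, assuming the preflight Job name is derived only from the target version. The `k8s-preflight-v` prefix, the `k8s-upgrade` namespace and the `k8s-preflight` CronJob passed to `kubectl create job --from` are hypothetical placeholders, not names taken from the stack:

```bash
#!/usr/bin/env bash
# Sketch of the "deterministic Job name as lock" pattern.
# Namespace, name prefix and source CronJob are HYPOTHETICAL placeholders.
set -euo pipefail

# Map a target version to a stable, DNS-1123-safe Job name:
# "1.31.4" -> "k8s-preflight-v1-31-4".
preflight_job_name() {
  local target=$1
  printf 'k8s-preflight-v%s\n' "${target//./-}"
}

# Create the preflight Job for $target unless one already exists.
spawn_preflight_once() {
  local target=$1
  local ns=${2:-k8s-upgrade}   # hypothetical namespace
  local name
  name=$(preflight_job_name "$target")
  if kubectl -n "$ns" get job "$name" >/dev/null 2>&1; then
    echo "preflight Job $name already exists; exiting early"
    return 0
  fi
  # If two runs race past the check, the API server still refuses a second
  # object with the same name, so at most one create succeeds.
  kubectl -n "$ns" create job "$name" --from=cronjob/k8s-preflight  # hypothetical CronJob
}
```

In this sketch `Forbid` only prevents a new CronJob run while the previous one is still active; the deterministic name is what stops a later day's run from starting a second chain for a target that is already in flight.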

- stacks/k8s-version-upgrade/main.tf: var.schedule default + rationale comment
- scripts/upgrade_state.sh: rename next_sunday_noon_utc -> next_daily_noon_utc
  (now returns "Tue 2026-05-19 12:00 UTC" form); change "(Sun cron)" label
  to "(daily cron)"
- .claude/skills/upgrade-state/SKILL.md: frontmatter trigger (5), K8s cadence column,
  and the "patch/minor available" note

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-18 18:29:08 +00:00
parent 018ef3790f
commit 3d43d96a5e
3 changed files with 22 additions and 21 deletions

.claude/skills/upgrade-state/SKILL.md

@@ -8,7 +8,7 @@ description: |
(2) User asks "what's pending upgrade" or "what's the upgrade state",
(3) User asks if Keel / kured / k8s-version-check is healthy,
(4) User asks about kept-back / held packages or pending reboots,
-(5) Before the Sunday `k8s-version-check` CronJob fires (weekly survey).
+(5) Periodic survey before the next `k8s-version-check` daily run.
Read-only — no `--fix`. Exits 0 healthy / 1 attention / 2 stalled.
author: Claude Code
version: 1.0.0
@@ -51,7 +51,7 @@ Exit codes: `0` healthy, `1` attention warranted, `2` stalled / broken.
|---|---|---|---|
| **Apps** | Keel polls every watched Deployment's container registry; rolls on new digest | hourly | Prom (`pending_approvals`, `registries_scanned_total`), Keel pod logs |
| **OS** | `unattended-upgrades` in-release patching; `kured` reboots when `/var/run/reboot-required` is set | daily 02:00-06:00 London | SSH fan-out to all 5 nodes |
-| **K8s** | `k8s-version-check` CronJob detects new kubeadm patch/minor; spawns the Job-chain that drains+upgrades node-by-node | Sun 12:00 UTC | Pushgateway (`k8s_upgrade_*`), `kubectl get nodes` |
+| **K8s** | `k8s-version-check` CronJob detects new kubeadm patch/minor; spawns the Job-chain that drains+upgrades node-by-node | daily 12:00 UTC | Pushgateway (`k8s_upgrade_*`), `kubectl get nodes` |
The K8s pipeline pushes a small set of gauges to the Prometheus
Pushgateway (`prometheus-prometheus-pushgateway.monitoring:9091`):
@@ -138,8 +138,9 @@ kubectl -n kured get pods -l name=kured-sentinel-gate
### K8s `→` — patch/minor available
-Detection ran, target identified, chain NOT started. This is normal
-between Sun 12:00 UTC detection and the next Job chain.
+Detection ran, target identified, chain NOT started. The chain spawns
+on the same daily detection cycle — typically within ~24h of the
+target first being detected.
```bash
# Inspect Pushgateway state

scripts/upgrade_state.sh

@@ -384,7 +384,7 @@ collect_k8s() {
if [[ "$last_run_int" -gt 0 ]]; then
local age=$((NOW_EPOCH - last_run_int))
-K8S_LAST_CHECK="$(human_age "$age") (Sun cron)"
+K8S_LAST_CHECK="$(human_age "$age") (daily cron)"
if [[ -n "$target_patch" ]]; then
K8S_LAST_DETECT_LINE="last run $(human_age "$age"): available v$target_patch (patch)"
elif [[ -n "$target_minor" ]]; then
@@ -415,7 +415,7 @@ collect_k8s() {
fi
fi
-K8S_NEXT="$(next_sunday_noon_utc)"
+K8S_NEXT="$(next_daily_noon_utc)"
# Status logic.
local stalled=0
@@ -453,20 +453,15 @@ collect_k8s() {
fi
}
-# Next Sun 12:00 UTC — pure bash date math, no croniter.
-next_sunday_noon_utc() {
-local now_iso target_iso
-now_iso=$(date -u +%FT%TZ)
-# date %u: Mon=1..Sun=7. Sun=7.
-local dow; dow=$(date -u +%u)
-local days_until=$(( (7 - dow) % 7 ))
-# If today is Sunday and it's before 12:00 UTC, "next" is today.
-if [[ "$dow" == "7" ]]; then
-local hr; hr=$(date -u +%H)
-[[ "$hr" -lt 12 ]] && days_until=0 || days_until=7
-fi
-target_iso=$(date -u -d "+$days_until days" +"%Y-%m-%d 12:00 UTC")
-echo "Sun $target_iso"
+# Next daily 12:00 UTC — pure bash date math, no croniter. Schedule was
+# weekly Sunday until 2026-05-18; now `0 12 * * *` in the
+# k8s-version-upgrade stack. If we're still before today's 12:00 UTC,
+# the next run is today; otherwise it's tomorrow.
+next_daily_noon_utc() {
+local hr days_ahead
+hr=$(date -u +%H)
+if [[ "$hr" -lt 12 ]]; then days_ahead=0; else days_ahead=1; fi
+date -u -d "+$days_ahead days" +"%a %Y-%m-%d 12:00 UTC"
}
# --- Renderers ---
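One edge worth covering in a test: `%H` is zero-padded, and inside `[[ ... ]]` the operands of `-lt` are evaluated as arithmetic, where a leading `0` means octal. At 08:xx and 09:xx UTC the comparison therefore fails with a "value too great for base" error instead of succeeding, so the `else` branch can run and the function reports tomorrow even though today's 12:00 run is still ahead. The sketch below keeps the same decision but forces base 10 with `10#$hr` and takes the current time as an optional epoch argument. The name `next_daily_noon_utc_at` is hypothetical, and GNU `date` is assumed, as in the original:

```bash
#!/usr/bin/env bash
# Sketch: same decision as next_daily_noon_utc, with the clock injectable
# as epoch seconds ($1, default: now). Assumes GNU date (-d "@EPOCH").
next_daily_noon_utc_at() {
  local now_epoch=${1:-$(date -u +%s)}
  local hr days_ahead
  # %H is zero-padded ("00".."23").
  hr=$(date -u -d "@$now_epoch" +%H)
  # 10# forces base 10; a bare "08"/"09" would be parsed as invalid octal.
  if (( 10#$hr < 12 )); then days_ahead=0; else days_ahead=1; fi
  # UTC has no DST, so adding whole days as 86400 s is exact.
  LC_ALL=C date -u -d "@$(( now_epoch + days_ahead * 86400 ))" \
    +"%a %Y-%m-%d 12:00 UTC"
}
```

For example, `next_daily_noon_utc_at "$(date -u -d '2026-05-18 08:30' +%s)"` prints `Mon 2026-05-18 12:00 UTC`, while any time from 12:00 UTC onwards that day yields `Tue 2026-05-19 12:00 UTC`.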

stacks/k8s-version-upgrade/main.tf

@@ -26,7 +26,12 @@
variable "schedule" {
type = string
-default = "0 12 * * 0" # Sunday 12:00 UTC outside kured window
+# Daily 12:00 UTC outside kured window (kured runs 02:00-06:00
+# London). Was weekly Sunday until 2026-05-18; daily picks up upstream
+# patch releases the same day they land. Concurrency is bounded by the
+# CronJob's Forbid policy + Job-name idempotency (the detection job
+# skips spawning a preflight Job if one already exists).
+default = "0 12 * * *"
}
variable "enabled" {
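The rationale comment relies on 12:00 UTC never landing inside kured's 02:00-06:00 London window. Because London switches between GMT (UTC+0) and BST (UTC+1), a daily 12:00 UTC run falls at 12:00 or 13:00 local time. A small check of that, assuming GNU `date` with tzdata installed and treating the window as half-open [02:00, 06:00); the function names are illustrative only:

```bash
#!/usr/bin/env bash
# Sketch: is a given UTC hour outside kured's 02:00-06:00 Europe/London
# reboot window on a given date? Assumes GNU date + tzdata; the window is
# ASSUMED half-open [02:00, 06:00).

# Print the Europe/London hour (0-23, unpadded) for UTC hour $2 on date $1.
london_hour() {
  local day=$1 utc_hour=$2 epoch
  epoch=$(date -u -d "$day $utc_hour:00:00" +%s)
  TZ=Europe/London date -d "@$epoch" +%-H
}

# Exit 0 if the run lands outside the window, 1 if inside it.
outside_kured_window() {
  local h
  h=$(london_hour "$1" "$2")
  (( h < 2 || h >= 6 ))
}
```

Run for a winter and a summer date (for example `2026-01-15` and `2026-07-15` with hour `12`), both calls succeed: the local hour is 12 in GMT and 13 in BST, well clear of the window either way.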