infra/stacks
Viktor Barzin 9e045e2c16 upgrade-state: skill + script + Keel scrape for periodic three-pipeline audit
Three autonomous-upgrade pipelines run independently — Keel for apps
(hourly registry polling), unattended-upgrades+kured for OS, and the
k8s-version-check chain for kubeadm/kubelet/kubectl. Until now there
was no single place to see whether each was healthy, what's pending,
or whether anything's stuck. The /upgrade-state skill collapses the
state of all three into one table you can run before each Sunday's
k8s-version-check fires.

- stacks/keel/main.tf: add Prometheus pod-annotation scrape on
  container port 9300. Surfaces pending_approvals,
  poll_trigger_tracked_images, and registries_scanned_total{image}
  so the skill has a real timeseries (also opens the door to a
  future "pending_approvals > 0 for 24h" alert).
- scripts/upgrade_state.sh: collector + renderer. Three-row table
  (Apps / OS / K8s) + drill-down, --json for piping, exit 0/1/2.
  SSH fan-out (parallel subshells) to all five nodes for apt
  state + reboot-required + uu log; Prometheus query for Keel;
  Pushgateway parse for k8s_upgrade_* gauges. Read-only.
- .claude/skills/upgrade-state/SKILL.md: hardlinked to
  ~/.claude/skills/upgrade-state/SKILL.md so the skill is
  discoverable from both monorepo-rooted and global sessions.

Verification: ran the script, stress-tested the ✗ stalled path by
pushing in_flight=1 + started_timestamp=-100min to Pushgateway and
resetting after — script correctly raised ✗ and exit 2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 10:50:43 +00:00
..
_template ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-10 18:53:49 +00:00
actualbudget recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
affine recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
authentik keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
beads-server beads-server: codify Keel annotations on Dolt deployment (drift cleanup) 2026-05-17 22:22:40 +00:00
blog final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
broker-sync broker-sync(fidelity): un-suspend monthly CronJob 2026-05-17 00:36:23 +00:00
calico final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
changedetection enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
chrome-service recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
city-guesser enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
claude-agent-service recruiter-triage: AI culture & tooling section + warm-engage AI ask 2026-05-16 13:14:27 +00:00
claude-memory recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
cloudflared keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
cnpg [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
coturn enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
crowdsec keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
cyberchef final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
dashy enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
dawarich enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
dbaas kured + cnpg: drain-safe defaults ahead of Monday reboot wave 2026-05-16 12:06:30 +00:00
descheduler keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
diun enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
ebook2audiobook enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
ebooks enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
echo enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
excalidraw enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
external-secrets recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
f1-stream final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
fire-planner ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
foolery recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
forgejo enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-16 13:24:16 +00:00
freedify recruiter-responder: bump image_tag to 189ef901 2026-05-16 12:41:05 +00:00
freshrss ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
frigate ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
grampsweb ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
hackmd ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
headscale keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
health ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
hermes-agent ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
homepage final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
immich final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
infra [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 18:30:02 +00:00
infra-maintenance [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
insta2spotify ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
instagram-poster Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
isponsorblocktv ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
job-hunter ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
jsoncrack final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
k8s-dashboard final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
k8s-portal Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
k8s-version-upgrade RecentNodeReboot: 24h → 1h threshold, matching upgrade-chain preflight 2026-05-17 22:22:01 +00:00
keel upgrade-state: skill + script + Keel scrape for periodic three-pipeline audit 2026-05-18 10:50:43 +00:00
kms final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
kured ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
kyverno keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
linkwarden ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
llama-cpp Bucket C: enroll 5 raw-deploy stacks in Keel auto-update 2026-05-16 23:14:43 +00:00
local-path final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
mailserver keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
matrix ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
meshcentral ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
metallb keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
metrics-server keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
monitoring RecentNodeReboot: 24h → 1h threshold, matching upgrade-chain preflight 2026-05-17 22:22:01 +00:00
n8n ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
navidrome ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
netbox ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
networking-toolbox ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
nextcloud ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
nfs-csi keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
nodelocal-dns [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C) 2026-04-19 15:46:41 +00:00
novelapp Woodpecker CI deploy [CI SKIP] 2026-05-16 23:17:44 +00:00
ntfy ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
nvidia nvidia: bump driver container memory limit 128Mi → 2Gi 2026-05-17 11:23:52 +00:00
onlyoffice ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
openclaw openclaw: native MCP servers + daily claude-memory sync 2026-05-16 14:01:46 +00:00
osm_routing final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
owntracks ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
paperless-mcp paperless-mcp: deploy MCP for AI document search 2026-05-17 11:14:35 +00:00
paperless-ngx ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
payslip-ingest ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
phpipam ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
platform [infra] Add Cloudflare provider to all stack lock files and generated providers 2026-04-16 16:31:36 +00:00
plotting-book Woodpecker CI deploy [CI SKIP] 2026-05-16 23:17:44 +00:00
poison-fountain ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
postiz Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
priority-pass ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
privatebin ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
proxmox-csi keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
pvc-autoresizer [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
rbac [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
real-estate-crawler final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
recruiter-responder recruiter-responder: bump image to 05b95943 (split callback routes) 2026-05-17 11:01:49 +00:00
redis keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
reloader keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
resume ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
reverse-proxy keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
rybbit ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
sealed-secrets keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
send ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
servarr keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
shadowsocks ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
speedtest ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
status-page [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
stirling-pdf ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
tandoor ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-16 13:47:13 +00:00
technitium keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
terminal terminal: probe + alerts after Traefik replica routing-table skew 2026-05-17 10:04:26 +00:00
tor-proxy ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
trading-bot final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
traefik keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
travel_blog final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
tuya-bridge ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
uptime-kuma Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
url ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
vault keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
vaultwarden Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-16 23:10:38 +00:00
vpa keel: enroll 11 more namespaces (operators + critical infra) 2026-05-17 20:59:14 +00:00
wealthfolio Woodpecker CI deploy [CI SKIP] 2026-05-16 13:45:45 +00:00
webhook_handler final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-16 23:19:20 +00:00
whisper ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
wireguard keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
woodpecker ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00
xray keel: enroll 15 critical-path namespaces for digest-only auto-update 2026-05-17 12:13:22 +00:00
ytdlp ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-16 14:06:39 +00:00