infra/.claude/skills
Viktor Barzin b107c0be8c upgrade-state: skill + script + Keel scrape for periodic three-pipeline audit
Three autonomous-upgrade pipelines run independently — Keel for apps
(hourly registry polling), unattended-upgrades+kured for OS, and the
k8s-version-check chain for kubeadm/kubelet/kubectl. Until now there
was no single place to see whether each was healthy, what's pending,
or whether anything's stuck. The /upgrade-state skill collapses the
state of all three into one table you can run before each Sunday's
k8s-version-check fires.

- stacks/keel/main.tf: add Prometheus pod-annotation scrape on
  container port 9300. Surfaces pending_approvals,
  poll_trigger_tracked_images, and registries_scanned_total{image}
  so the skill has a real timeseries (also opens the door to a
  future "pending_approvals > 0 for 24h" alert).
- scripts/upgrade_state.sh: collector + renderer. Three-row table
  (Apps / OS / K8s) + drill-down, --json for piping, exit 0/1/2.
  SSH fan-out (parallel subshells) to all five nodes for apt
  state + reboot-required + uu log; Prometheus query for Keel;
  Pushgateway parse for k8s_upgrade_* gauges. Read-only.
- .claude/skills/upgrade-state/SKILL.md: hardlinked to
  ~/.claude/skills/upgrade-state/SKILL.md so the skill is
  discoverable from both monorepo-rooted and global sessions.

Verification: ran the script, stress-tested the ✗ stalled path by
pushing in_flight=1 + started_timestamp=-100min to Pushgateway and
resetting after — script correctly raised ✗ and exit 2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:57 +00:00
..
add-user docs(add-user): update skill with actual working flow (no auto TF apply) 2026-03-18 00:28:46 +00:00
archived [claude-agent-service] Remove orphaned DevVM SSH key wiring 2026-04-18 13:31:15 +00:00
cluster-health [cluster-health] Expand to 42 checks, remove pod CronJob path 2026-04-19 15:13:03 +00:00
disk-wear [skill] Add /disk-wear skill for periodic disk write analysis 2026-04-17 11:15:26 +00:00
extend-vm-storage [ci skip] Import Claude skills into OpenClaw moltbot 2026-02-17 21:09:12 +00:00
home-assistant docs: add ha-sofia Version Control add-on to HA skill [ci skip] 2026-04-12 11:37:02 +01:00
k8s-ndots-search-domain-nxdomain-flood [ci skip] Add pfsense-dnsmasq-interface-binding skill, update ndots skill to v1.1.0 2026-02-16 22:30:57 +00:00
pfsense [ci skip] Add pfSense firewall management skill 2026-02-14 12:42:10 +00:00
post-mortem feat: add incident management system with user reporting 2026-04-14 20:00:31 +00:00
setup-project feat(setup-project): auto-PR working Dockerfiles back to upstream 2026-04-17 18:12:13 +00:00
upgrade-state upgrade-state: skill + script + Keel scrape for periodic three-pipeline audit 2026-05-22 14:16:57 +00:00
uptime-kuma [uptime-kuma] Codify MySQL monitor (id=663) via idempotent sync CronJob 2026-04-18 12:04:17 +00:00