infra/stacks
Viktor Barzin e75bcaf394 k8s-version-upgrade: automated kubeadm/kubelet/kubectl upgrade pipeline
Adds a weekly detection CronJob (Sun 12:00 UTC) that runs apt-cache madison
on master to find new patch releases and sends a HEAD request to pkgs.k8s.io
to check whether the next minor is available, then POSTs to claude-agent-service
to dispatch the k8s-version-upgrade agent.
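
The detection step above can be sketched in a few lines. This is a minimal illustration, not the CronJob's actual code. The function names (`parse_version`, `newest_patch`, `next_minor_repo_url`, `repo_exists`) are hypothetical. The sketch also assumes two details the commit message does not give: that madison rows have the form `pkg | version | source`, and that a minor's repo can be probed at `https://pkgs.k8s.io/core:/stable:/vX.Y/deb/Release`.

```python
from __future__ import annotations

import re
import urllib.error
import urllib.request
from typing import Optional

# Matches the upstream part of a Debian version such as "1.31.2-1.1".
_VERSION_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse_version(s: str) -> tuple[int, int, int]:
    m = _VERSION_RE.match(s.strip())
    if not m:
        raise ValueError(f"unparseable version: {s!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3))


def newest_patch(madison_output: str, installed: str) -> Optional[str]:
    """Newest version in the same minor as `installed` that is strictly newer, else None.

    Assumes `apt-cache madison` rows of the form "pkg | version | source".
    """
    cur = parse_version(installed)
    best: Optional[tuple[int, int, int]] = None
    for line in madison_output.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) < 3:
            continue  # skip blank or unexpected lines
        v = parse_version(cols[1])
        if v[:2] == cur[:2] and v > cur and (best is None or v > best):
            best = v
    return None if best is None else ".".join(map(str, best))


def next_minor_repo_url(installed: str) -> str:
    # Assumed layout: one apt repo per minor under core:/stable:/vX.Y/deb/.
    major, minor, _ = parse_version(installed)
    return f"https://pkgs.k8s.io/core:/stable:/v{major}.{minor + 1}/deb/Release"


def repo_exists(url: str, timeout: float = 10.0) -> bool:
    """HEAD the URL: True on 200, False on an HTTP error such as 404.

    Other network failures (URLError) propagate, so a broken probe is not
    mistaken for "no new minor yet".
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```

The patch check runs purely on the madison text, so it can be tested offline. Only `repo_exists` touches the network.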

The agent (.claude/agents/k8s-version-upgrade.md) orchestrates:
  pre-flight (5 nodes Ready + halt-on-alert + 24h-quiet + plan target match)
    -> etcd snapshot save
    -> optional master containerd skew fix
    -> apt repo URL rewrite (minor bumps only)
    -> drain/upgrade/uncordon master via ssh < update_k8s.sh
    -> sequential workers k8s-node4 -> 3 -> 2 -> 1 with 10-min soak each
    -> post-flight verification
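
The orchestration order above can be modelled as a gated step list. This is a hypothetical sketch, not the agent's actual logic. `Preflight`, `plan_steps`, `run` and the step names are illustrative. Two interpretations are assumptions: that "24h-quiet" means at least 24 hours since the last relevant alert, and that post-flight verification is a single final step. The worker order and the 10-minute soak come from the commit message. `execute` and `sleep` are injected so the flow can be exercised without touching nodes.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable

# Worker order and soak period as stated in the commit message.
WORKERS = ["k8s-node4", "k8s-node3", "k8s-node2", "k8s-node1"]
SOAK_SECONDS = 10 * 60


@dataclass
class Preflight:
    ready_nodes: int
    halt_alert_firing: bool
    quiet_hours: float  # assumed meaning of "24h-quiet": hours since last relevant alert
    plan_target: str
    detected_target: str

    def ok(self, expected_nodes: int = 5) -> bool:
        return (self.ready_nodes == expected_nodes
                and not self.halt_alert_firing
                and self.quiet_hours >= 24
                and self.plan_target == self.detected_target)


def plan_steps(minor_bump: bool, containerd_skew: bool) -> list[str]:
    """Ordered step names; optional steps appear only when needed."""
    steps = ["etcd-snapshot"]
    if containerd_skew:
        steps.append("fix-master-containerd")
    if minor_bump:
        steps.append("rewrite-apt-repo")  # apt repo URL rewrite, minor bumps only
    steps.append("upgrade:master")
    steps += [f"upgrade:{w}" for w in WORKERS]
    steps.append("post-flight")
    return steps


def run(pre: Preflight, steps: list[str], execute: Callable[[str], bool],
        sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Run steps in order, soaking after each worker; stop at the first failure.

    Returns the steps that completed (empty if pre-flight fails).
    """
    if not pre.ok():
        return []
    done: list[str] = []
    for step in steps:
        if not execute(step):
            break
        done.append(step)
        if step.startswith("upgrade:k8s-node"):
            sleep(SOAK_SECONDS)
    return done
```

Stopping at the first failed step leaves the remaining workers on the old version. This keeps a bad release confined to the nodes already upgraded.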

Two new Upgrade Gates alerts catch failure modes:
  - K8sVersionSkew (kubelet/apiserver gitVersion mismatch >30m)
  - EtcdPreUpgradeSnapshotMissing (in_flight without snapshot_taken >10m)
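
The two alert conditions above reduce to simple predicates plus a hold period. The real rules presumably live in the monitoring stack as Prometheus rules. This Python sketch only models their logic and is not the deployed rules. `ForDuration`, `version_skew` and `snapshot_missing` are hypothetical names. How `in_flight` and `snapshot_taken` are exported is not stated in the commit, so they are treated as plain booleans here.

```python
from __future__ import annotations

from typing import Optional

SKEW_FOR_S = 30 * 60      # K8sVersionSkew: mismatch must persist >30m
SNAPSHOT_FOR_S = 10 * 60  # EtcdPreUpgradeSnapshotMissing: >10m


class ForDuration:
    """Approximates a rule's `for:` clause: fire only once the condition
    has held continuously for `hold_s` seconds; any false reading resets it."""

    def __init__(self, hold_s: float):
        self.hold_s = hold_s
        self.since: Optional[float] = None

    def update(self, condition: bool, now: float) -> bool:
        if not condition:
            self.since = None
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.hold_s


def version_skew(apiserver: str, kubelets: dict[str, str]) -> list[str]:
    """Nodes whose kubelet gitVersion differs from the apiserver's."""
    return sorted(node for node, v in kubelets.items() if v != apiserver)


def snapshot_missing(in_flight: bool, snapshot_taken: bool) -> bool:
    return in_flight and not snapshot_taken
```

The hold period is what keeps K8sVersionSkew quiet during a normal rollout. During an upgrade, nodes are expected to be briefly skewed, one at a time.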

update_k8s.sh refactored to take --role / --release args; the agent pipes it
to each node over SSH. update_node.sh annotated as the OS-major upgrade path.

Operator-facing docs: docs/runbooks/k8s-version-upgrade.md and a new section
in docs/architecture/automated-upgrades.md.

Secrets: secret/k8s-upgrade/{ssh_key,ssh_key_pub,slack_webhook} (ed25519
keypair distributed to all 5 nodes via authorized_keys; slack_webhook
reuses the kured webhook URL on initial deploy).
2026-05-22 14:16:42 +00:00
_template ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
actualbudget ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
affine ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
authentik ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
beads-server ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
blog ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
broker-sync [broker-sync] unsuspend IMAP + Panel 15 RSU vest reconciliation (Phase D) 2026-04-19 18:29:01 +00:00
calico [infra] Partial Calico adoption: namespaces only (Wave 5b) 2026-04-18 22:52:56 +00:00
changedetection ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
chrome-service ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
city-guesser ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
claude-agent-service [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
claude-memory ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
cloudflared cloudflare: disable AI bot edge-block so x402 can issue payment offers 2026-05-22 14:16:42 +00:00
cnpg [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
coturn [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
crowdsec ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
cyberchef ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
dashy ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
dawarich ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
dbaas ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
descheduler [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
diun [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
ebook2audiobook ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
ebooks ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
echo ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
excalidraw ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
external-secrets [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
f1-stream ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
fire-planner fire-planner / k8s-portal / insta2spotify: revert auth=public to auth=none 2026-05-22 14:16:42 +00:00
foolery ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
forgejo ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
freedify ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
freshrss ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
frigate ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
grampsweb ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
hackmd ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
headscale ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
health ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
hermes-agent ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
homepage ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
immich ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
infra [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
infra-maintenance [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
insta2spotify fire-planner / k8s-portal / insta2spotify: revert auth=public to auth=none 2026-05-22 14:16:42 +00:00
instagram-poster instagram-poster: disable ig-ingest-stories CronJob until /ig-ingest ships 2026-05-22 14:16:42 +00:00
isponsorblocktv [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
job-hunter grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
jsoncrack ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
k8s-dashboard ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
k8s-portal fire-planner / k8s-portal / insta2spotify: revert auth=public to auth=none 2026-05-22 14:16:42 +00:00
k8s-version-upgrade k8s-version-upgrade: automated kubeadm/kubelet/kubectl upgrade pipeline 2026-05-22 14:16:42 +00:00
kms ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
kured kured(sentinel-gate): fix auth + write-perm so safety checks actually run 2026-05-22 14:16:41 +00:00
kyverno [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
linkwarden ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
llama-cpp infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
local-path [infra] Adopt local-path-provisioner into Terraform (Wave 5c) 2026-04-18 22:39:55 +00:00
mailserver ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
matrix ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
meshcentral ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
metallb [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
metrics-server [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
monitoring k8s-version-upgrade: automated kubeadm/kubelet/kubectl upgrade pipeline 2026-05-22 14:16:42 +00:00
n8n ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
navidrome ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
netbox ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
networking-toolbox ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
nextcloud ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
nfs-csi [infra] TrueNAS decommission — remove active references from Terraform + configs 2026-04-19 16:57:05 +00:00
nodelocal-dns [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C) 2026-04-19 15:46:41 +00:00
novelapp ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
ntfy ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
nvidia ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
onlyoffice ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
openclaw ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
osm_routing [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
owntracks Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:42 +00:00
paperless-ngx ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
payslip-ingest grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
phpipam ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
platform [infra] Add Cloudflare provider to all stack lock files and generated providers 2026-04-16 16:31:36 +00:00
plotting-book ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
poison-fountain ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
postiz ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
priority-pass ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
privatebin ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
proxmox-csi proxmox-csi: opt SCs into pvc-autoresizer (resize.topolvm.io/enabled=true) 2026-05-22 14:16:41 +00:00
pvc-autoresizer [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
rbac [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
real-estate-crawler ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
redis [redis] stabilise against node-crash flap cascade — RC1-RC5 fixes 2026-04-22 15:59:00 +00:00
reloader [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
resume ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
reverse-proxy chore: remove decommissioned registry.viktorbarzin.me ingress 2026-05-10 11:12:37 +00:00
rybbit ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
sealed-secrets [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
send ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
servarr ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
shadowsocks [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
speedtest ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
status-page [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
stirling-pdf ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
tandoor ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
technitium ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
terminal ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
tor-proxy ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
trading-bot ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
traefik ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
travel_blog ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
tuya-bridge ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
uptime-kuma ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
url ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
vault ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
vaultwarden ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
vpa ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
wealthfolio ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
webhook_handler ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
whisper gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
wireguard [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
woodpecker ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
xray ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
ytdlp ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00