infra/stacks
Viktor Barzin 25fcf80651 keel: expand critical-namespace exclude list — protects vault/cnpg/authentik/etc.
2026-05-17 incident: Keel rolled authentik 2026.2.2 → 2026.2.3 around 23:36.
The force+match-tag pairing should have restricted Keel to digest-only
updates on the current tag (no switch to a new tag), but a race between
Kyverno's mutation (which injects match-tag) and Keel's hourly poll meant the
workload still carried the old `force`-only annotation when Keel acted.
Result: the tag was rewritten, pods cycled, pgbouncer connections failed,
and login broke.
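
The difference between the two annotation states can be sketched as a small decision function. This is a simplified model for illustration only, not Keel's actual implementation: the function name `accepts_update` and the fallback to `never` for unannotated workloads are assumptions, and only the `force`, `force` + match-tag, and `never` cases described in this message are modelled.

```python
from typing import Mapping

# Annotation keys follow Keel's keel.sh/* convention. The decision logic is a
# hypothetical, simplified model of the behaviour described above.
POLICY = "keel.sh/policy"
MATCH_TAG = "keel.sh/match-tag"


def accepts_update(
    annotations: Mapping[str, str], current_tag: str, candidate_tag: str
) -> bool:
    """Return True if the modelled updater would roll to candidate_tag."""
    # Assumption: a workload without a policy annotation is treated as "never".
    policy = annotations.get(POLICY, "never")
    if policy != "force":
        return False  # only the force policy is modelled here
    if annotations.get(MATCH_TAG) == "true":
        # force + match-tag: only a new digest on the same tag is acceptable
        return candidate_tag == current_tag
    # force alone: any tag is acceptable, which is the window the race exposed
    return True


# State during the race: match-tag not yet injected, so the new tag is accepted.
racy = {POLICY: "force"}
# Intended state: match-tag present, so a tag change is refused.
intended = {POLICY: "force", MATCH_TAG: "true"}
print(accepts_update(racy, "2026.2.2", "2026.2.3"))
print(accepts_update(intended, "2026.2.2", "2026.2.3"))
```

In this model the only thing separating a digest refresh from a version jump is one annotation, so any window in which it is missing is enough for a tag rewrite.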

Manual rollback: `kubectl rollout undo` on all 5 authentik deployments back
to 2026.2.2. Auth restored within ~5 min.

Going forward, critical-namespace workloads are excluded at the policy level
so this race can't recur. They are upgraded via Terraform (Helm chart version
bumps) on a deliberate cadence, never by Keel.
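
A minimal sketch of the namespace-based split, assuming excluded namespaces end up on policy=never (as the live-state counts below suggest) and everything else receives force + match-tag. The set `CRITICAL_NAMESPACES` and the function `keel_annotations` are hypothetical names; the three namespaces shown are only the ones named in the commit subject, and the real exclude list lives in the Kyverno policy and is not reproduced here.

```python
from typing import Dict, FrozenSet

# Hypothetical, partial exclude list: only the namespaces named in the
# commit subject. The full list is defined in the Kyverno policy.
CRITICAL_NAMESPACES: FrozenSet[str] = frozenset({"vault", "cnpg", "authentik"})


def keel_annotations(namespace: str) -> Dict[str, str]:
    """Return the Keel annotations a workload in `namespace` should carry."""
    if namespace in CRITICAL_NAMESPACES:
        # Critical: never touched by Keel, upgraded deliberately via Terraform.
        return {"keel.sh/policy": "never"}
    # Everything else: digest-only auto-update pinned to the current tag.
    return {"keel.sh/policy": "force", "keel.sh/match-tag": "true"}


print(keel_annotations("authentik"))
print(keel_annotations("blog"))
```

Deciding by namespace rather than per workload means a critical workload never passes through the `force`-only state in the first place, so there is no interval for the poll to land in.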

Live state: 36 workloads on policy=never (35 critical + chrome-service pin
+ 7 CI-driven self-hosted from earlier), 190 on policy=force+match-tag, i.e.
opt-out auto-update for the remaining stateless apps.

This matches user direction (2026-05-17): "upgrading is fine as long as we
upgrade correctly and the latest version is healthy" + "keel responsible
for the latest version, phased rollout, graceful".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:55 +00:00
_template ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
actualbudget recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
affine recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
authentik infra: document auth = "app|none" tier on every legacy ingress 2026-05-22 14:16:44 +00:00
beads-server Bucket C: enroll 5 raw-deploy stacks in Keel auto-update 2026-05-22 14:16:55 +00:00
blog final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
broker-sync recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
calico final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
changedetection enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
chrome-service recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
city-guesser enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
claude-agent-service recruiter-triage: AI culture & tooling section + warm-engage AI ask 2026-05-22 14:16:50 +00:00
claude-memory recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
cloudflared cloudflare: disable AI bot edge-block so x402 can issue payment offers 2026-05-22 14:16:42 +00:00
cnpg [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
coturn enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
crowdsec ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
cyberchef final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
dashy enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
dawarich enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
dbaas kured + cnpg: drain-safe defaults ahead of Monday reboot wave 2026-05-22 14:16:48 +00:00
descheduler final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
diun enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
ebook2audiobook enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
ebooks enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
echo enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
excalidraw enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
external-secrets recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
f1-stream final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
fire-planner ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
foolery recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
forgejo enrolled-patch stacks: ignore image drift from Keel auto-update 2026-05-22 14:16:51 +00:00
freedify recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
freshrss ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
frigate ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
grampsweb ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
hackmd ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
headscale infra: document auth = "app|none" tier on every legacy ingress 2026-05-22 14:16:44 +00:00
health ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
hermes-agent ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
homepage final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
immich final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
infra [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
infra-maintenance [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
insta2spotify ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
instagram-poster Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-22 14:16:55 +00:00
isponsorblocktv ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
job-hunter ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
jsoncrack final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
k8s-dashboard final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
k8s-portal Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-22 14:16:55 +00:00
k8s-version-upgrade final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
keel keel: enable Slack notifications on every upgrade 2026-05-22 14:16:50 +00:00
kms final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
kured ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
kyverno keel: expand critical-namespace exclude list — protects vault/cnpg/authentik/etc. 2026-05-22 14:16:55 +00:00
linkwarden ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
llama-cpp Bucket C: enroll 5 raw-deploy stacks in Keel auto-update 2026-05-22 14:16:55 +00:00
local-path final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
mailserver fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
matrix ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
meshcentral ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
metallb [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
metrics-server [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
monitoring wealth: dav_corrected view fixes pension gains-offset miscategorisation 2026-05-22 14:16:52 +00:00
n8n ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
navidrome ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
netbox ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
networking-toolbox ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
nextcloud ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
nfs-csi [infra] TrueNAS decommission — remove active references from Terraform + configs 2026-04-19 16:57:05 +00:00
nodelocal-dns [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C) 2026-04-19 15:46:41 +00:00
novelapp Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:55 +00:00
ntfy ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
nvidia infra: document auth = "app|none" tier on every legacy ingress 2026-05-22 14:16:44 +00:00
onlyoffice ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
openclaw openclaw: native MCP servers + daily claude-memory sync 2026-05-22 14:16:53 +00:00
osm_routing final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
owntracks ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
paperless-ngx ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
payslip-ingest ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
phpipam ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
platform [infra] Add Cloudflare provider to all stack lock files and generated providers 2026-04-16 16:31:36 +00:00
plotting-book Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:55 +00:00
poison-fountain ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
postiz Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-22 14:16:55 +00:00
priority-pass ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
privatebin ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
proxmox-csi proxmox-csi: opt SCs into pvc-autoresizer (resize.topolvm.io/enabled=true) 2026-05-22 14:16:41 +00:00
pvc-autoresizer [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
rbac [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
real-estate-crawler final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
recruiter-responder recruiter-responder: bump image_tag to 94b37a9c (follow-up detection) 2026-05-22 14:16:55 +00:00
redis fix: pvc-autoresizer threshold should be 10%, not 80% 2026-05-22 14:16:43 +00:00
reloader recruiter-responder: bump image_tag to 189ef901 2026-05-22 14:16:49 +00:00
resume ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
reverse-proxy chore: remove decommissioned registry.viktorbarzin.me ingress 2026-05-10 11:12:37 +00:00
rybbit ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
sealed-secrets [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
send ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
servarr ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
shadowsocks ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
speedtest ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
status-page [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
stirling-pdf ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
tandoor ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
technitium fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
terminal ci: retrigger v2 — apply pending Keel-enrolled stacks (#697 was cancelled by #698) 2026-05-22 14:16:53 +00:00
tor-proxy ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-22 14:16:54 +00:00
trading-bot final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
traefik ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
travel_blog final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
tuya-bridge ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-22 14:16:54 +00:00
uptime-kuma Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-22 14:16:55 +00:00
url ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-22 14:16:54 +00:00
vault final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
vaultwarden Bucket A retrigger + Bucket D enrollment (5 module-nested stacks) 2026-05-22 14:16:55 +00:00
vpa ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
wealthfolio Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:53 +00:00
webhook_handler final wave: enroll immich + status-page, retrigger 17 pending Bucket A 2026-05-22 14:16:55 +00:00
whisper ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-22 14:16:54 +00:00
wireguard [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
woodpecker ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-22 14:16:54 +00:00
xray infra: document auth = "app|none" tier on every legacy ingress 2026-05-22 14:16:44 +00:00
ytdlp ci: retrigger v3 — apply remaining 22 Keel-enrolled stacks 2026-05-22 14:16:54 +00:00