infra/stacks/kyverno/modules/kyverno
Viktor Barzin 0365ed83ca keel: expand critical-namespace exclude list — protects vault/cnpg/authentik/etc.
2026-05-17 incident: Keel rolled authentik 2026.2.2 → 2026.2.3 around 23:36.
The force+match-tag pairing should have constrained Keel to digest-only on
the current tag (not switch to a new tag), but a race between Kyverno's
mutate (injecting match-tag) and Keel's hourly poll caused the workload to
still have the old `force`-only annotation when Keel acted. Result: tag
rewrite, pods cycled, pgbouncer connection failures, login broken.

Manual rollback: `kubectl rollout undo` on all 5 authentik deployments back
to 2026.2.2. Auth restored within ~5 min.

Going forward, critical-namespace workloads are excluded at the policy level
so this race can't recur. They get upgraded via TF (Helm chart version bumps)
on a deliberate cadence, never by Keel.

Live state: 36 workloads on policy=never (35 critical + chrome-service pin
+ 7 CI-driven self-hosted from earlier), 190 on policy=force+match-tag for
opt-out-pure auto-update on the remaining stateless apps.

This matches user direction (2026-05-17): "upgrading is fine as long as we
upgrade correctly and the latest version is healthy" + "keel responsible
for the latest version, phased rollout, graceful".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 00:07:32 +00:00
..
dependency-init-containers.tf [multi] Sweep Kyverno wait-for redis annotations to redis-master 2026-04-19 12:44:46 +00:00
keel-annotations.tf keel: expand critical-namespace exclude list — protects vault/cnpg/authentik/etc. 2026-05-17 00:07:32 +00:00
main.tf kyverno: bump background-controller memory 384Mi → 2Gi (OOMKilled processing keel URs) 2026-05-16 23:36:16 +00:00
registry-credentials.tf [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 18:30:02 +00:00
resource-governance.tf kyverno: strip resources.limits.cpu cluster-wide via ClusterPolicy 2026-04-18 11:34:39 +00:00
security-policies.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
tls-secret-sync.tf add Kyverno TLS secret sync + enhance renewal pipeline 2026-03-23 22:19:34 +02:00