After re-enabling Keel with `policy: patch` (commit f325b949), 3 of the
60 first-hour bumps broke things and need an explicit, cluster-wide opt-out
so that future Kyverno reconciles can't put them back under auto-update:
- `dbaas/mysql-standalone`: patch-bumped `mysql:8.4.8 → :8.4.9`, and the
DD upgrade stalled (we explicitly track this as beads `code-963q`: the
8.4.9 jump needs a wipe+reinit, not a rolling upgrade). The StatefulSet
already had `annotation=never` from TF but was missing the LABEL.
Kyverno's selector exclude reads the LABEL, so a reconcile that dropped
the annotation could resume auto-update. Added the LABEL.
- `redis/redis-v2`: patch-bumped `redis:8-alpine → :8.0.6-alpine`, and
the new image rejected the `aof-load-corrupt-tail-max-size` directive
from commit 1eee56d0, putting redis-v2-2 into CrashLoopBackOff. On top of
that, :8.0.6 is older in semver terms than :8-alpine (which resolves to
:8.6.2), so the "patch" bump was really a downgrade. This is the same Keel
tag-picking pathology as the 2026-05-26 morning incident, just in a
different shape. Added both LABEL and ANNOTATION.
- `nvidia/nvidia-exporter`: Keel rewrote `:latest → :4.5.2-4.8.1-ubuntu22.04`,
and the new dcgm-exporter was OOMKilled at the 192Mi memory limit
(4 restarts before I caught it). Added LABEL + ANNOTATION for the opt-out,
AND raised memory request/limit from 192Mi to 256Mi/512Mi so the new image
doesn't OOM (older versions fit in 192Mi; the new one needs ~250Mi at
steady state).
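A minimal model of why the LABEL matters more than the annotation. This is a hypothetical Python sketch, not Kyverno's or Keel's code. It assumes the opt-out key is `keel.sh/policy` (Keel accepts that key as either a label or an annotation), that the cluster's Kyverno policy excludes workloads by label selector only, and that a reconcile on a non-excluded workload re-applies `patch`. `kyverno_excludes` and `after_reconcile` are made-up names.

```python
from typing import Dict, Mapping, Tuple

# Assumed opt-out key; the real Kyverno exclude may match a different label.
POLICY_KEY = "keel.sh/policy"


def kyverno_excludes(labels: Mapping[str, str]) -> bool:
    """Model of a label-selector exclude: annotations are never consulted."""
    return labels.get(POLICY_KEY) == "never"


def after_reconcile(
    labels: Mapping[str, str], annotations: Mapping[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hypothetical mutate rule: unless the workload is excluded by label,
    a reconcile (re)writes its Keel policy annotation back to `patch`."""
    new_labels = dict(labels)
    new_annotations = dict(annotations)
    if not kyverno_excludes(new_labels):
        new_annotations[POLICY_KEY] = "patch"
    return new_labels, new_annotations


if __name__ == "__main__":
    # mysql-standalone before this commit: annotation only, so it reverts.
    _, ann = after_reconcile({}, {POLICY_KEY: "never"})
    print(ann[POLICY_KEY])  # patch
    # With the LABEL added, the exclude matches and `never` survives.
    _, ann = after_reconcile({POLICY_KEY: "never"}, {POLICY_KEY: "never"})
    print(ann[POLICY_KEY])  # never
```

Setting both LABEL and ANNOTATION, as done for redis-v2 and nvidia-exporter, covers whichever of the two a given component reads.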
The 56 other Keel bumps in that 10-minute window (coredns 1.12.1→1.12.4,
kyverno 1.16.1→1.16.4, nextcloud 32.0.3→32.0.9, grafana 12.3.1→12.3.6,
cnpg, mailserver, csi-nfs, metrics-server, etc.) landed cleanly, so
`patch` is still the right default policy. The maintenance cost is a
per-workload `never` opt-out for the few images that break.
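To make the redis case concrete, here is a simplified model of a `patch` policy check in Python. Keel's `patch` policy only accepts updates within the same major.minor; the tag parser below is an assumption, not Keel's implementation. It ignores everything after the first `-` and pads missing components with zero, so `8-alpine` reads as 8.0.0. Under that assumption `8.0.6-alpine` passes as a patch bump even though `:8-alpine` actually resolves to 8.6.2. `is_downgrade` is a hypothetical extra guard, not a Keel feature.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]

# Assumed, simplified tag grammar: up to three numeric components, then an
# optional '-suffix' that is ignored (e.g. '-alpine', '-4.8.1-ubuntu22.04').
TAG_RE = re.compile(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:-.*)?")


def parse_tag(tag: str) -> Optional[Version]:
    """Return (major, minor, patch), padding missing parts with 0.

    Non-numeric tags such as 'latest' return None.
    """
    m = TAG_RE.fullmatch(tag)
    if m is None:
        return None
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def is_patch_bump(current: str, candidate: str) -> bool:
    """`patch`-style check: same major.minor, strictly higher patch."""
    cur, new = parse_tag(current), parse_tag(candidate)
    if cur is None or new is None:
        return False
    return cur[:2] == new[:2] and new[2] > cur[2]


def is_downgrade(candidate: str, resolved_current: str) -> bool:
    """Hypothetical guard: compare against what the floating tag resolves to."""
    cur, new = parse_tag(resolved_current), parse_tag(candidate)
    return cur is not None and new is not None and new < cur


print(is_patch_bump("1.12.1", "1.12.4"))          # True  (coredns)
print(is_patch_bump("8-alpine", "8.0.6-alpine"))  # True  ('8' read as 8.0.0)
print(is_downgrade("8.0.6-alpine", "8.6.2"))      # True  (actually older)
```

The check is only as good as the version it compares against: a floating tag hides the real version, which is why pinning such workloads to `never` is simpler than trying to make the comparison smarter.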
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>