keel: default policy → patch (semver-bounded opt-out auto-update)

Move from `never` (no auto-update) to `patch` for the cluster-wide
default. Keel only auto-updates PATCH versions within the current
major.minor: 0.26.6 → 0.26.7 OK; 0.26.6 → :nightly-latest blocked.
Tag-rewrites that broke calico (v3.26.1 → :master) and affine
(0.26.6 → :nightly-latest) on 2026-05-16 cannot recur with patch.

Caveats:
  * Patch causes Terraform image drift for semver-pinned services —
    drift-detection pipeline will surface it; lifecycle ignore_changes
    on container[].image can be added per stack later if drift is
    noisy.
  * Tags that aren't parseable as semver (:latest, :11, :nightly,
    SHA tags) are ignored by patch — those workloads stay on their
    current image until promoted to `force` policy individually.
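
A minimal sketch of the per-stack `ignore_changes` opt-out from the
first caveat, assuming the workload is a `kubernetes_deployment` from
the hashicorp/kubernetes provider. The resource name, namespace, labels
and image are hypothetical; this commit does not add it to any stack.

```hcl
# Hypothetical stack: every name and the image below are illustrative.
resource "kubernetes_deployment" "example" {
  metadata {
    name      = "example"
    namespace = "example"
  }

  spec {
    selector {
      match_labels = { app = "example" }
    }

    template {
      metadata {
        labels = { app = "example" }
      }

      spec {
        container {
          name = "example"
          # Semver pin in Terraform; under `patch`, Keel may move the
          # live tag to a newer 0.26.x release.
          image = "registry.example.com/example:0.26.6"
        }
      }
    }
  }

  lifecycle {
    # Stop Terraform from reporting Keel's tag bump as drift.
    ignore_changes = [
      spec[0].template[0].spec[0].container[0].image,
    ]
  }
}
```

The index `container[0]` assumes a single container; a multi-container
pod would need one entry per image Keel is allowed to bump.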

Self-hosted CI-driven services + chrome-service kept on `never`
(deliberate pins / CI controls the tag):
  recruiter-responder, claude-agent-service, claude-memory,
  chrome-service, fire-planner, job-hunter, payslip-ingest

Live state already updated via kubectl apply + per-workload patches.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-16 13:17:33 +00:00
parent 662695908a
commit 8f18621dd5

@@ -69,29 +69,31 @@ resource "kubernetes_manifest" "policy_inject_keel_annotations" {
 annotations = {
 # `+(...)` only adds if not present; per-workload overrides win.
 #
-# DEFAULT IS `never`: Keel ignores the workload.
+# DEFAULT IS `patch`: Keel auto-updates only PATCH versions
+# within the current major.minor. e.g. 0.26.6 → 0.26.7 is OK,
+# 0.26.6 → 0.27.0 is NOT, 0.26.6 → :nightly-latest is NOT.
 #
-# Rationale (post 2026-05-16 incident): Keel's `force` policy
-# is documented as "always update to the newest tag in the
-# registry," not "watch current tag for digest changes." On
-# services pinned to semver (e.g. calico/node:v3.26.1,
-# affine:0.26.6), force triggers a tag REWRITE: Keel switched
-# affine → :nightly-latest and calico → :master. Calico was
-# auto-healed by tigera-operator; affine had to be rolled back.
+# Why not `force`: the 2026-05-16 incident. Keel's `force`
+# policy is "always update to the newest tag in the registry,"
+# not "watch current tag for digest changes." On semver-pinned
+# workloads, force triggered tag-rewrites (affine → nightly,
+# calico → master). `patch` is semver-parser-bounded and safe.
 #
-# Safe enablement now requires per-WORKLOAD opt-in:
-# (a) ensure the Deployment's image is on a MUTABLE tag:
-# `:latest` (force works), `:<major>` like `:16`/`:7`,
-# or a vendor "stable" tag.
-# (b) override THIS default by setting the Deployment's
-# metadata.annotations["keel.sh/policy"] to `force`
-# (digest tracking on the mutable tag) or `patch`/`minor`
-# (semver bumps, requires `ignore_changes` on image).
+# Caveats of `patch`:
+# - Tags that aren't parseable as semver (e.g. `:latest`,
+# `:11`, `:nightly`, SHA tags) are ignored by Keel.
+# - For services pinned to semver, Keel will REWRITE the
+# tag (0.26.6 → 0.26.7). This causes Terraform drift
+# until the stack is updated or its lifecycle adds
+# `ignore_changes` on the container[].image field.
+# For now, accepting periodic drift (drift_detection.yml
+# pipeline will surface it).
 #
-# The namespace enrollment label + V2 lifecycle remain in
-# place so opt-in is a one-line annotation per Deployment,
-# without touching the namespace or refactoring lifecycle.
-"+(keel.sh/policy)" = "never"
+# Per-workload overrides:
+# "keel.sh/policy" = "force": for mutable tags (:latest)
+# "keel.sh/policy" = "minor": wider semver bumps
+# "keel.sh/policy" = "never": opt out (CI-bumped, deliberate pins)
+"+(keel.sh/policy)" = "patch"
 "+(keel.sh/trigger)" = "poll"
 "+(keel.sh/pollSchedule)" = "@every 1h"
 }