CI now drives the Deployment rollout (kubectl set image to the build SHA in
.woodpecker.yml), so the stack moves to image_tag = "latest": the Deployment
runs whatever CI last set (image ignore_changes keeps TF from fighting it),
and the CronJob uses :latest + imagePullPolicy=Always (fresh pod each weekly
run). Keel stays enrolled in parallel as a redundant net.
Docs: rewrite the runbook "Deploying" section for build-triggers-deploy;
record the reversal of decision #12 in the auto-upgrade design doc (owned
apps drive their own rollout, Keel parallel — upstream stays Keel-only); add
the owned-app deploy model to infra/.claude/CLAUDE.md CI/CD section.
[ci skip] — applied locally (stack-scoped); avoids a broad CI auto-apply.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-enrolls Nextcloud in Keel (opted out after the 2026-05-26 32.0.3->32.0.9
bump stuck the pod in maintenance mode ~22h). Two safeguards engineer around
both failure modes:
- F1 (interrupted occ upgrade -> 503): nextcloud-watchdog CronJob runs
`occ upgrade` + clears maintenance mode when occ reports needsDbUpgrade=true;
Job deadline bumped 120->600s so it isn't killed mid-migration.
- F2 (helm re-renders a tag below the Keel-bumped live image -> downgrade
CrashLoop): chart_values renders the live tag via a plural
kubernetes_resources data source (empty-list-on-absence -> floor 32.0.9 on
fresh install/DR), so a re-render never downgrades below live.
Scope is patch -- Kyverno's shared inject-keel-annotations policy stamps it and
its background-controller overrides a TF-set value, and patch == minor for
Nextcloud in practice (32.0.x only; major 33 stays manual). Dropped the
per-workload keel.sh/policy override resources to avoid perpetual drift; ns
enrollment + Kyverno now own the keel annotations like other workloads.
Also bumps the external-storage bootstrap Job create timeout 1m->12m to match
its own 10m pod-wait, since Keel bumps now roll the pod mid-apply.
Verified: Keel auto-upgraded 32.0.9->32.0.10 on apply, entrypoint occ upgrade
completed clean (no watchdog needed), pod 2/2, HTTP 200, plan shows no drift.
Foundation for opt-out-pure auto-update model per
docs/plans/2026-05-16-auto-upgrade-apps-{design,plan}.md.
- New stack `stacks/keel/` deploys Keel via Helm (charts.keel.sh, v1.0.6).
Polls registries hourly per design decision #8. Default schedule
overridable per-workload via keel.sh/pollSchedule annotation.
- New Kyverno ClusterPolicy `inject-keel-annotations` mutates Deployments,
StatefulSets, and DaemonSets in namespaces labeled `keel.sh/enrolled=true`
with keel.sh/policy=force + trigger=poll + pollSchedule=@every 1h.
- Phase 0 enrolls no namespaces. Phase 1 (next session) labels the
self-hosted set.
- Per-workload opt-out: label `keel.sh/policy: never` (used by rollback
runbook and chrome-service-style deliberate pins).
- Keel namespace excluded from the mutate — supervisor self-update has
too-bad a failure mode (decision #11).
- AGENTS.md: KYVERNO_LIFECYCLE_V2 marker convention added for the
ignore_changes block enrolled workloads need.
- .claude/CLAUDE.md: docker-images rule flagged as transitional.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>