job-hunter: build-triggers-deploy model; CronJob :latest + docs

CI now drives the Deployment rollout (kubectl set image to the build SHA in
.woodpecker.yml), so the stack moves to image_tag = "latest": the Deployment
runs whatever CI last set (image ignore_changes keeps TF from fighting it),
and the CronJob uses :latest + imagePullPolicy=Always (fresh pod each weekly
run). Keel stays enrolled in parallel as a redundant net.

Docs: rewrite the runbook "Deploying" section for build-triggers-deploy;
record the reversal of decision #12 in the auto-upgrade design doc (owned
apps drive their own rollout, Keel parallel — upstream stays Keel-only); add
the owned-app deploy model to infra/.claude/CLAUDE.md CI/CD section.

[ci skip] — applied locally (stack-scoped); avoids a broad CI auto-apply.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-02 20:24:50 +00:00
parent 052c776eba
commit fe8db19aaf
5 changed files with 82 additions and 13 deletions

@@ -3,6 +3,21 @@
**Date**: 2026-05-16
**Status**: Approved (brainstorm + grill complete; implementation pending)
> **UPDATE 2026-06-02 — decision #12 / Q1 reversed for OWNED apps.** The
> original "uniform Keel-only, no per-repo `kubectl set image` step" call held
> only for **upstream** images (which we can't build, so Keel poll-and-bump is
> the only option). For **self-hosted apps we build**, CI now ALSO drives the
> rollout: `build-and-push` tags `latest` + `:<sha>`, then a `deploy` step runs
> `kubectl set image deployment/<app> ...:<sha>` + `rollout status`. Rationale
> (memory id=3183, proven on tuya-bridge 2026-05-29): the pipeline is atomic
> and deterministic — no wait for Keel's hourly poll, no risk of Keel resolving
> `:latest` to a stale concrete tag. **Keel stays enrolled in parallel** as a
> redundant net (it finds the just-deployed SHA already running → no-op), so
> upstream apps and owned apps share one mental model. Enabled cluster-wide by
> the `woodpecker-agent` SA being `cluster-admin` (no per-app RBAC). Owned apps
> are being moved to this pattern as of 2026-06-02; CronJobs in owned apps use
> `:latest` + `imagePullPolicy: Always` instead of a deploy step.
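
A minimal sketch of what the `deploy` step described above might run, written as a shell function. The function name, the `KUBECTL` override, the 120-second timeout, and the assumption that the container shares the Deployment's name are all illustrative and not taken from `.woodpecker.yml`.

```shell
#!/bin/sh
# Hypothetical helper: names and defaults are illustrative only.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print the commands as a dry run.
deploy_owned_app() {
  app="$1"    # Deployment name; the container is ASSUMED to share this name
  image="$2"  # image repository without a tag
  sha="$3"    # build SHA tag pushed by build-and-push
  kubectl_bin="${KUBECTL:-kubectl}"

  # Pin the Deployment to the immutable SHA tag rather than :latest.
  "$kubectl_bin" set image "deployment/$app" "$app=$image:$sha" || return 1

  # Block until the rollout completes so a failed deploy fails the pipeline.
  "$kubectl_bin" rollout status "deployment/$app" --timeout=120s
}
```

Calling `KUBECTL=echo deploy_owned_app <app> <image-repo> <sha>` prints the two commands instead of running them, which is one way to check the arguments before pointing the step at a cluster.
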
## Problem
Three constraints in tension across the cluster's ~70 services: