job-hunter: build-triggers-deploy model; CronJob :latest + docs
CI now drives the Deployment rollout (kubectl set image to the build SHA in .woodpecker.yml), so the stack moves to image_tag = "latest": the Deployment runs whatever CI last set (ignore_changes on the image keeps TF from fighting it), and the CronJob uses :latest + imagePullPolicy=Always (fresh pod each weekly run). Keel stays enrolled in parallel as a redundant net.

Docs: rewrite the runbook "Deploying" section for build-triggers-deploy; record the reversal of decision #12 in the auto-upgrade design doc (owned apps drive their own rollout with Keel in parallel; upstream stays Keel-only); add the owned-app deploy model to the CI/CD section of infra/.claude/CLAUDE.md.

[ci skip]: applied locally (stack-scoped) to avoid a broad CI auto-apply.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 052c776eba
commit fe8db19aaf
5 changed files with 82 additions and 13 deletions
@@ -89,7 +89,26 @@ Violations cause state drift, which causes future applies to break or silently r
 
 ## CI/CD Architecture — GHA Builds + Woodpecker Deploy
 
-**Flow**: `git push → GHA build+push DockerHub (8-char SHA) → POST Woodpecker API → kubectl set image`
+**Owned-app deploy model (build triggers the rollout — 2026-06-02):** For
+self-hosted apps **we build** (Forgejo `viktor/<name>` + Dockerfile +
+`.woodpecker.yml`), the build pipeline ALSO drives the rollout — atomic +
+deterministic, no wait for Keel's poll. Pattern (`build-and-push` tags `latest`
++ `${CI_COMMIT_SHA:0:8}`, then a `deploy` step): `kubectl set image
+deployment/<app> <container>=<repo>:${CI_COMMIT_SHA:0:8} -n <ns>` +
+`kubectl rollout status ... --timeout=300s`. The `woodpecker-agent` SA is
+`cluster-admin`, so the `bitnami/kubectl` step needs no kubeconfig/RBAC (uses
+its in-cluster SA). **Keel stays enrolled in parallel** as a redundant net
+(finds the deployed SHA already running → no-op). Requires the Deployment to
+have `ignore_changes` on `…container[0].image` (KEEL_IGNORE_IMAGE) so CI
+`set image` doesn't fight `terragrunt apply`. CronJobs in owned apps use
+`:latest` + `imagePullPolicy: Always` (fresh pod each run) instead of a deploy
+step. **Never** `set image`/`rollout restart` operator-managed StatefulSets
+(memory id=740). Reference impls: `tuya_bridge/.woodpecker.yml`,
+`job-hunter`. This reverses decision #12 of
+`docs/plans/2026-05-16-auto-upgrade-apps-design.md` for owned (not upstream)
+images.
+
+**Flow (GHA-migrated apps)**: `git push → GHA build+push DockerHub (8-char SHA) → POST Woodpecker API → kubectl set image`
 
 **Migrated to GHA** (10): Website, k8s-portal, f1-stream, claude-memory-mcp, apple-health-data, audiblez-web, plotting-book, insta2spotify, audiobook-search, council-complaints
 **Woodpecker-only**: travel_blog (1.4GB content too large for GHA), infra pipelines (terragrunt apply, certbot, build-cli — need cluster access)
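As a rough illustration of the owned-app deploy step described in the hunk above, the sketch below builds the two `kubectl` commands from a commit SHA and prints them instead of running them. The app, container, repository, and namespace values and the sample SHA are placeholders, not taken from any real pipeline; in Woodpecker, `CI_COMMIT_SHA` would be supplied by the CI environment.

```bash
# Dry-run sketch of the deploy step: prints the commands rather than executing them.

# Placeholder SHA for illustration; Woodpecker provides CI_COMMIT_SHA in a real run.
CI_COMMIT_SHA="0123456789abcdef0123456789abcdef01234567"

# Hypothetical names; substitute the real Deployment, container, image repo, namespace.
APP="example-app"
CONTAINER="example-app"
REPO="registry.example/viktor/example-app"
NS="example-app"

# First 8 characters of the SHA (same result as ${CI_COMMIT_SHA:0:8} in bash).
SHORT_SHA="$(printf '%s' "$CI_COMMIT_SHA" | cut -c1-8)"

SET_IMAGE_CMD="kubectl set image deployment/${APP} ${CONTAINER}=${REPO}:${SHORT_SHA} -n ${NS}"
ROLLOUT_CMD="kubectl rollout status deployment/${APP} -n ${NS} --timeout=300s"

echo "$SET_IMAGE_CMD"
echo "$ROLLOUT_CMD"
```

Printing the commands keeps the sketch safe to run outside the cluster. A real step would execute them, and `kubectl rollout status` exiting non-zero on timeout would fail the pipeline.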
@@ -3,6 +3,21 @@
 **Date**: 2026-05-16
 **Status**: Approved (brainstorm + grill complete; implementation pending)
 
+> **UPDATE 2026-06-02 — decision #12 / Q1 reversed for OWNED apps.** The
+> original "uniform Keel-only, no per-repo `kubectl set image` step" call held
+> only for **upstream** images (which we can't build, so Keel poll-and-bump is
+> the only option). For **self-hosted apps we build**, CI now ALSO drives the
+> rollout: `build-and-push` tags `latest` + `:<sha>`, then a `deploy` step runs
+> `kubectl set image deployment/<app> ...:<sha>` + `rollout status`. Rationale
+> (memory id=3183, proven on tuya-bridge 2026-05-29): the pipeline is atomic
+> and deterministic — no wait for Keel's hourly poll, no risk of Keel resolving
+> `:latest` to a stale concrete tag. **Keel stays enrolled in parallel** as a
+> redundant net (it finds the just-deployed SHA already running → no-op), so
+> upstream apps and owned apps share one mental model. Enabled cluster-wide by
+> the `woodpecker-agent` SA being `cluster-admin` (no per-app RBAC). Owned apps
+> being rolled out to this pattern 2026-06-02; CronJobs in owned apps use
+> `:latest` + `imagePullPolicy: Always` instead of a deploy step.
+
 ## Problem
 
 Three constraints in tension across the cluster's ~70 services:
@@ -108,6 +108,34 @@ kubectl -n job-hunter exec deploy/job-hunter -- python -m job_hunter cdio-reconc
 Changes hit `/webhook/cdio`; comp/role extraction from the diff is manual or
 LLM-side (CDIO only captures the changed text).
 
+### Deploying (build triggers the rollout)
+
+Deploys are **automatic on push to master** — we build the image, so CI also
+drives the rollout (`.woodpecker.yml`: `build-and-push` tags `latest` +
+`${CI_COMMIT_SHA:0:8}`, then a `deploy` step runs
+`kubectl set image deployment/job-hunter ...:${SHA}` + `rollout status`). The
+woodpecker-agent SA is cluster-admin, so no kubeconfig/RBAC is wired into the
+step. Keel stays enrolled in parallel as a redundant net (finds the SHA already
+running → no-op). So to ship code:
+
+```bash
+# in the job-hunter source repo (forgejo viktor/job-hunter)
+git push origin master # → lint+test → build (latest + :<sha>) → set image → rollout
+```
+
+The **Deployment** rolls to the just-built `:<sha>`. The **CronJob** runs
+`:latest` with `imagePullPolicy: Always`, so its next scheduled pod pulls the
+newest image (no rollout needed for a CronJob). `image_tag = "latest"` in
+`terragrunt.hcl` is just the TF baseline; the running Deployment digest is
+whatever CI last set (`kubectl -n job-hunter get deploy job-hunter -o jsonpath='{..image}'`).
+
+**Versioning** is still semver — bump `pyproject.toml` and cut a `git tag
+vX.Y.Z` to mark a release; that's the human version record, independent of the
+`:<sha>` deploy tag (map a running SHA back to a version with `git describe`).
+
+**Rollback**: `kubectl -n job-hunter rollout undo deployment/job-hunter` (last
+ReplicaSet), or push a revert commit (CI redeploys the reverted SHA).
+
 ### Applying the Terraform stack
 
 ```bash
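To connect the runbook's versioning and rollback notes in the hunk above, the sketch below extracts the deploy tag from an image reference and prints the `git describe` and `rollout undo` commands one might run next. The image reference is a made-up example (the registry host and SHA are assumptions); in practice it would come from the runbook's `kubectl ... -o jsonpath` query, which may list more than one image. The `--tags` flag is added on the assumption that release tags may be lightweight.

```bash
# Hypothetical image reference for illustration only.
IMAGE="registry.example/viktor/job-hunter:0123abcd"

# Everything after the last colon is the tag (the 8-char build SHA in this model).
# Note: this simple split assumes the reference actually carries a tag.
DEPLOYED_SHA="${IMAGE##*:}"

# Commands are printed, not executed, so the sketch runs without git or a cluster.
DESCRIBE_CMD="git describe --tags ${DEPLOYED_SHA}"
ROLLBACK_CMD="kubectl -n job-hunter rollout undo deployment/job-hunter"

echo "$DESCRIBE_CMD"
echo "$ROLLBACK_CMD"
```

Because the Deployment is pinned to a SHA tag rather than `:latest`, the tag alone identifies the exact commit that is running.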
@@ -5,8 +5,10 @@
 #
 # The alembic-migrate init container mirrors the Deployment so the CronJob can
 # never run a refresh against an un-migrated DB (snapshot inserts would fail).
-# Image is :latest (Keel-managed for the Deployment); the CronJob pulls the
-# current latest at each run, so it always executes the newest code.
+# Image is local.image (:latest via image_tag) with imagePullPolicy=Always: a
+# CronJob spawns a fresh pod each run, so Always pull = it always executes the
+# newest built code. The Deployment is rolled by CI (kubectl set image to the
+# build SHA); the CronJob needs no rollout — Always pull covers it.
 resource "kubernetes_cron_job_v1" "job_hunter_refresh" {
 metadata {
 name = "job-hunter-refresh"
@@ -40,9 +42,10 @@ resource "kubernetes_cron_job_v1" "job_hunter_refresh" {
 }
 
 init_container {
 name = "alembic-migrate"
 image = local.image
-command = ["python", "-m", "job_hunter", "migrate"]
+image_pull_policy = "Always"
+command = ["python", "-m", "job_hunter", "migrate"]
 env_from {
 secret_ref {
 name = "job-hunter-secrets"
@@ -65,8 +68,9 @@ resource "kubernetes_cron_job_v1" "job_hunter_refresh" {
 }
 
 container {
 name = "refresh"
 image = local.image
+image_pull_policy = "Always"
 command = ["python", "-m", "job_hunter", "refresh",
 "--source", "ats", "--source", "hn", "--source", "levels_fyi"]
 
@@ -18,9 +18,12 @@ dependency "external-secrets" {
 }
 
 inputs = {
-# 92afc38d = master HEAD with levels.fyi scraper + comp_table COALESCE
-# fix + Frankfurter FX backend (exchangerate.host free tier deprecated
-# in 2026). Built + pushed locally 2026-04-19 while the Woodpecker
-# Forgejo webhook remains broken.
-image_tag = "92afc38d"
+# :latest — CI drives the rollout. On every master push the pipeline builds
+# latest + :<sha> and runs `kubectl set image deployment/job-hunter ...:<sha>`
+# so the Deployment rolls to the just-built code immediately (no wait for
+# Keel's poll). Keel stays enrolled in parallel as a redundant net. The
+# CronJob uses :latest + Always pull (fresh pod each run). Project version
+# lives in pyproject.toml + git tag vX.Y.Z (semver), independent of the
+# deploy tag. CI OOM that had blocked all builds since 2026-04 is fixed.
+image_tag = "latest"
 }