docs: ADR-0002 — all owned image builds move off-infra to GHA + ghcr [ci skip]
Viktor asked to evaluate fully external image builders because in-cluster CI builds keep destabilising the homelab (Forgejo OOM under registry-push load, hairpin push timeouts, build IO on the shared sdc HDD, registry PVC at its 50Gi ceiling).

The evaluation was grilled to a decision set:

- every owned image builds on GitHub Actions and lives on ghcr.io (extends the 2026-06-09 tripit pilot to the whole fleet)
- per-repo visibility: 9 public mirrors + images (gated on a clean gitleaks/PII history scan), the personal/finance/gray ones stay private
- clean cut: no in-cluster fallback build pipelines; existing build-fallback.yml files are deleted
- Woodpecker becomes deploy-only; Forgejo registry freezes to one last-known-good tag per Service after a manual cleanup pass
- dead builders (terminal-lobby, webhook-handler, hmrc-sync, trading-bot, travel-agent, trip-planner) are decommissioned, not migrated; travel_blog is decommissioned outright; manual images (x402-gateway, chrome-service-novnc, chatterbox-tts, android-emulator) get formalized GHA builds; infra-ci + CLI builds move to GHA on the public infra repo

CONTEXT.md: updated 'GHA build + Woodpecker deploy', added 'Canonical repo', 'GitHub mirror', 'Forgejo registry' terms, image-path relationship, and a 'registry' ambiguity entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent 3978eec53a
commit 623d34628a
2 changed files with 40 additions and 2 deletions
24
docs/adr/0002-all-image-builds-off-infra-gha-ghcr.md
Normal file
@@ -0,0 +1,24 @@
---
status: accepted
date: 2026-06-12
---
# All owned images build off-infra on GitHub Actions and live on ghcr.io
In-cluster Woodpecker buildkit builds repeatedly hurt the homelab: registry-push load OOMKilled Forgejo (2026-06-09), buildkit→Forgejo pushes ride a flaky hairpin, build IO lands on the shared sdc HDD, and the Forgejo registry PVC sat at its 50Gi ceiling with retention stuck in DRY_RUN. We decided that every owned image is built by GitHub Actions and hosted on ghcr.io, extending the tripit pilot (2026-06-09) to the whole fleet. Forgejo stays the canonical git host, a one-way push-mirror feeds a GitHub mirror, and the mirror's workflow builds the image, pushes it to ghcr.io, then POSTs to Woodpecker's API to trigger the deploy. The Forgejo container registry is decommissioned as a build target: one manual cleanup pass keeps a last-known-good tag per Service, after which nothing pushes to it.
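
The last hop of that flow, the deploy trigger, could look roughly like the sketch below. It assumes Woodpecker's manual pipeline endpoint (`POST /api/repos/{repo_id}/pipelines`) with a bearer token. The server URL, repo id, the `IMAGE_REF` variable name, and the helper name are all hypothetical and not taken from this repo.

```python
import json
import urllib.request


def build_deploy_request(server: str, repo_id: int, token: str,
                         branch: str, image_ref: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Woodpecker to run a deploy pipeline."""
    # Assumed endpoint: the manual pipeline trigger in Woodpecker's REST API.
    url = f"{server.rstrip('/')}/api/repos/{repo_id}/pipelines"
    # Hypothetical convention: hand the freshly pushed image to the pipeline as a variable.
    body = json.dumps({"branch": branch, "variables": {"IMAGE_REF": image_ref}})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_deploy_request(
        "https://woodpecker.example.internal", 42, "dummy-token",
        "main", "ghcr.io/example-owner/example-app:abc1234",
    )
    print(req.get_method(), req.full_url)
    # Actually sending it would be: urllib.request.urlopen(req, timeout=30)
```

Keeping request construction separate from sending makes the trigger easy to check in a workflow step without touching the cluster.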
## Considered options
- **GHA builds pushing back into the Forgejo registry** — keeps images home and the pull path unchanged, but keeps the exact failure mode that motivated the move (Forgejo OOM under blob-push load), keeps the PVC growth, and keeps the circular dependency where the images needed to repair the cluster live inside the cluster. Rejected.
- **Per-repo in-cluster fallback builds** (the old `build-fallback.yml` pattern) — rejected in favour of a clean cut: a GitHub outage pauses image builds (running workloads are unaffected), and existing fallback files are deleted. The hedge against ghcr's "currently free" private storage ever being enforced is the visibility split (public images are permanently free) plus re-creating fallbacks if that day comes.
- **Paid builders (Docker Build Cloud, Depot)** — solve a multi-arch/persistent-cache problem this fleet doesn't have (everything is linux/amd64). Rejected.
## Consequences
- DR improves: images survive homelab loss, so a dead cluster can pull everything it needs to come back — the same doctrine that keeps the monorepo on GitHub ("Forgejo dies with the cluster").
- Private ghcr pulls bypass the registry VM's pull-through cache (it can't authenticate), so cold-node pulls of private images depend on GitHub availability; public images cache normally.
- Visibility is decided per repo: public = generic tooling that passes a gitleaks/PII history scan; private = personal, financial, or legally-gray domains. A failed scan means the repo stays private — canonical history is never rewritten for publication. For interpreted languages repo visibility ≈ image visibility (the image ships the source).
- Only private-repo builds consume GitHub free-plan minutes (~12 builders, well under the 2,000/mo free tier; usage is reviewed after rollout wave 2 before considering Pro).
- Woodpecker becomes deploy-only; its agents never build. The Kyverno-synced `registry-credentials` stays (Forgejo git + frozen last-known-good images); a cluster-wide Kyverno-synced `ghcr-credentials` joins it.
- Builders with no live consumer (terminal-lobby, webhook-handler, hmrc-sync, trading-bot, travel-agent, trip-planner) are decommissioned rather than migrated; travel_blog is decommissioned outright (service + CI). Any revival adopts this ADR's pattern.
- Workflows build single-manifest images (`provenance: false`, linux/amd64 only) so registry retention never faces the orphaned-index-children failure class that broke Forgejo's cleanup.
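
The single-manifest rule in the last consequence could be guarded by a small check like the sketch below. It classifies a raw manifest document by its media type: an OCI image index or a Docker manifest list has child manifests, while a plain image manifest does not. The function name is hypothetical, and how the raw manifest is fetched is left to the caller.

```python
import json

# Media types that denote a multi-entry index (manifest list) rather than one image.
INDEX_MEDIA_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}


def is_single_manifest(manifest_json: str) -> bool:
    """Return True when the document describes one image, not an index with children."""
    doc = json.loads(manifest_json)
    if doc.get("mediaType") in INDEX_MEDIA_TYPES:
        return False
    # OCI allows mediaType to be omitted; an index is then recognisable by "manifests".
    return "manifests" not in doc


if __name__ == "__main__":
    image = json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.manifest.v1+json",
        "config": {},
        "layers": [],
    })
    index = json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.index.v1+json",
        "manifests": [],
    })
    print(is_single_manifest(image), is_single_manifest(index))
```

One way to obtain the input is `docker buildx imagetools inspect --raw <image-ref>`, which prints the raw manifest for a tag.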