From 3e82c64a7659e1aa8d2a108bc2edb806f41605c5 Mon Sep 17 00:00:00 2001
From: Viktor Barzin <vbarzin@gmail.com>
Date: Sat, 13 Jun 2026 12:55:49 +0000
Subject: [PATCH] docs: sync CI/CD docs to ADR-0002 final state (ghcr +
 Woodpecker deploy-only) [ci skip]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ADR-0002 is fully landed (issues #11-#32 closed): every owned image now
builds on GitHub Actions and pushes to ghcr.io/viktorbarzin/<name>, with
Woodpecker reduced to deploy-only. The Forgejo container registry is frozen
and emptied; there are no in-cluster image builds or CI test runs anywhere.
The docs still described the old hybrid topology (DockerHub builds,
Woodpecker-native owned-app builds, the per-pattern migration lists, the
tripit-only pilot framing), which would mislead future sessions and
incident response.
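
For orientation, the image references a GHA build now produces can be
sketched as below. `ghcr_tags` is a hypothetical helper, not code from any
repo. It assumes, as the rewritten docs describe, that every build pushes an
8-char commit-SHA tag (the one Woodpecker pins via `kubectl set image`) plus
`:latest` (what owned-app CronJobs track with `imagePullPolicy: Always`).

```python
GHCR_PREFIX = "ghcr.io/viktorbarzin"
HEX = set("0123456789abcdef")


def ghcr_tags(name: str, commit_sha: str) -> list[str]:
    """Return the :<sha8> and :latest references pushed for one build.

    Hypothetical helper; the real tagging lives in each repo's
    .github/workflows/build.yml.
    """
    sha = commit_sha.lower()
    if len(sha) < 8 or not set(sha) <= HEX:
        raise ValueError(f"not a hex commit SHA: {commit_sha!r}")
    base = f"{GHCR_PREFIX}/{name}"
    # :<sha8> is the immutable tag Woodpecker pins via `kubectl set image`;
    # :latest is what owned-app CronJobs track with imagePullPolicy: Always.
    return [f"{base}:{sha[:8]}", f"{base}:latest"]


if __name__ == "__main__":
    print(ghcr_tags("f1-stream", "3e82c64a7659e1aa8d2a108bc2edb806f41605c5"))
```

Using this patch's own commit SHA, the helper yields
`ghcr.io/viktorbarzin/f1-stream:3e82c64a` and `...:latest`.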

This brings the docs to the completed reality (closes #33):

- docs/architecture/ci-cd.md: full rewrite as the canonical CI/CD reference —
  the fleet GHA->ghcr->Woodpecker-deploy pattern, public/private ghcr package
  split, infra-owned image workflows (incl. infra-ci on ghcr), the frozen
  Forgejo registry, what Woodpecker still runs, and the #31 decommissions.
- .claude/CLAUDE.md: rewrite the "CI/CD Architecture" section to the
  fleet-wide final state; FIX the stale claim that claude-memory-mcp builds
  to DockerHub (it is GHA->ghcr); rename the "Private registry" bullet to
  "Image registry" and lead it with the fact that owned images now live on
  ghcr and the Forgejo registry is frozen/break-glass.
- .claude/reference/service-catalog.md: f1-stream is GHA->ghcr + Woodpecker
  deploy-only (was "Woodpecker-native build->deploy").
- stacks/{tuya-bridge,android-emulator}/variables.tf + stacks/terminal/main.tf:
  cosmetic description/comment updates (forgejo -> ghcr; terminal-lobby has no
  CI pipeline). Description/comment text only — no stack logic changed.
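
The deploy hand-off that the rewritten docs describe (a GHA `deploy` job
POSTing Woodpecker's numeric-ID pipelines endpoint with `IMAGE_TAG` and
`IMAGE_NAME`) can be sketched as a request builder. This is a hypothetical
illustration, not the workflows' actual step: the JSON body shape (`branch`
plus a `variables` map) and the Bearer auth header are assumptions about
Woodpecker's manual-pipeline API, and the helper only builds the request
without sending it.

```python
import json
import urllib.request

WOODPECKER = "https://ci.viktorbarzin.me"


def deploy_request(repo_id: int, image_name: str, image_tag: str,
                   token: str, branch: str = "master") -> urllib.request.Request:
    """Build (not send) the POST that runs a repo's manual deploy pipeline."""
    # Numeric repo IDs only: owner/name paths return HTML, not the API.
    if isinstance(repo_id, bool) or not isinstance(repo_id, int) or repo_id <= 0:
        raise ValueError("Woodpecker API needs a positive numeric repo ID")
    body = {
        "branch": branch,  # assumption: deploys run against master
        "variables": {"IMAGE_NAME": image_name, "IMAGE_TAG": image_tag},
    }
    return urllib.request.Request(
        url=f"{WOODPECKER}/api/repos/{repo_id}/pipelines",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumption: token auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

With repo 167 (tripit's GitHub-mirror registration in the old CLAUDE.md
text) the URL becomes `https://ci.viktorbarzin.me/api/repos/167/pipelines`;
actually sending it would be `urllib.request.urlopen(req)` with a valid
`WOODPECKER_TOKEN`.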

Historical records (docs/post-mortems/*, docs/plans/*) and ADR-0002 itself
are left untouched as point-in-time records.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 .claude/CLAUDE.md                    | 131 ++++---
 .claude/reference/service-catalog.md |   2 +-
 docs/architecture/ci-cd.md           | 530 ++++++++++++++-------------
 stacks/android-emulator/variables.tf |   2 +-
 stacks/terminal/main.tf              |   7 +-
 stacks/tuya-bridge/variables.tf      |   2 +-
 6 files changed, 379 insertions(+), 295 deletions(-)
diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index d0bc9444..37ab99f3 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -38,7 +38,7 @@ Violations cause state drift, which causes future applies to break or silently r
   - **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`. Smoke-test target: `echo.viktorbarzin.me` (auth=public, header-reflecting backend).
 - **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://<backend>.<ns>.svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct, blog has `/net-diag.sh` direct), declare a second `ingress_factory` with `ingress_path = ["/<path>"]` pointing at the bare backend service. Active on: blog (except `/net-diag.sh`), www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.
 - **Docker images**: Always build for `linux/amd64`. SHA-tag rule is being phased out — see `docs/plans/2026-05-16-auto-upgrade-apps-{design,plan}.md`. New model: CI pushes `:latest` (optionally also `:<8-char-sha>` for traceability), Keel polls and triggers rollouts. Cache-staleness concern from the old rule is resolved at the nginx layer (URL-split — manifests pass through, blobs cached). Until Phase 1 of the migration completes (per the plan), follow the SHA-tag rule for new services to match existing pattern.
-- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. **Kubelet pulls** are kept off the hairpin **at the resolver, with zero node-side DNS config**: pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium (added 2026-06-10, `docs/runbooks/pfsense-unbound.md`), whose split-horizon zone CNAMEs every ingress host (auto-synced hourly by `technitium-ingress-dns-sync`) to the zone apex whose A record tracks the **live** Traefik LB IP (canary: `viktorbarzin-apex-probe`, alerts ViktorBarzinApexDrift). Nodes are stock — link DNS `10.0.20.1 94.140.14.14` via `qm set --nameserver`, no `/etc/hosts` pins, no resolved drop-ins (two same-day interim approaches on 2026-06-10 were removed the same day). The containerd `hosts.toml` mirror (`[host."https://10.0.20.203"]`, `skip_verify = true`) still exists but is **vestigial** — it can NOT keep pulls internal on its own: Traefik routes by Host/SNI and 404s the mirror's bare-IP requests, and the registry's Bearer auth realm is the absolute `https://forgejo.viktorbarzin.me/v2/token` URL fetched outside the mirror — without internal DNS every fresh pull degrades to public DNS → hairpin → intermittent `dial tcp 176.12.22.76:443: i/o timeout` ImagePullBackOff (tuya-bridge 7.5h outage 2026-06-10, tripit 2026-06-09; see `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`). **In-cluster pods are ordinary internal clients too** (since 2026-06-10 evening) — CoreDNS's dedicated `viktorbarzin.me:53` block (Corefile in `stacks/technitium/modules/technitium/main.tf`) forwards to the Technitium ClusterIP `10.96.0.53`, so pods get the same split-horizon answers as everyone else; forgejo stays pinned to Traefik's **ClusterIP** in that block (TF-interpolated from the live Service) so CI pushes survive a Technitium outage. This relies on a k8s-1.34 behavior verified 2026-06-10: **pods CAN reach the ETP=Local Traefik LB IP** (kube-proxy short-circuits in-cluster traffic to LB IPs via the cluster path) — re-verify after major k8s upgrades; canary = the uptime-kuma `[External]` fleet going red. (The block briefly forwarded to `8.8.8.8/1.1.1.1` earlier that day, which kept pods on the WAN IP and the broken TP-Link NAT loopback — 27 non-proxied `[External]` monitors dark; beads code-yh33.) **Was `.200` until 2026-06-01** — Traefik's 2026-05-30 move to its dedicated `.203` left the mirror pointing at the now-dead `.200:443`, silently breaking every *fresh* forgejo pull; a future LB renumber is now handled by DNS (apex record + drift probe) — only the vestigial hosts.toml literal would go stale. Mirror source lives in `modules/create-template-vm/k8s-node-containerd-setup.sh` (new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing nodes; also cleans up the legacy 2026-06-10 node-DNS customization). Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest` + any buildkit `*cache*` tag — **REVERTED to DRY_RUN 2026-06-10 after its first live run orphaned OCI index children** (multi-arch/attestation children are separate *untagged* sha256 versions that sort outside the newest-10 window while their parent index is kept; broke `kms-website:latest`+`:dfc83fb`, caught by the integrity probe, healed by re-tagging latest→a794d1a + deleting the corrupt version; see `docs/post-mortems/2026-06-10-forgejo-retention-orphaned-indexes.md`). Do NOT re-enable deletes until the keep-set resolves kept indexes' child digests (or skips untagged versions, or moves to Forgejo's native container-aware cleanup rules). The registry PVC remains at its 50Gi autoresize ceiling on the HDD (we did NOT move it to SSD, see beads code-oflt), so a container-aware retention is still needed. Integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
+- **Image registry**: **Owned images now live on `ghcr.io/viktorbarzin/<name>`** (ADR-0002, built by GHA — see the CI/CD Architecture section). The **Forgejo container registry is FROZEN + emptied** (break-glass only — `docs/runbooks/forgejo-registry-breakglass.md`); nothing pushes to it. The rest of this bullet documents the **still-live forgejo-pull DNS/mirror machinery** (it remains in place for the break-glass path + because `registry-credentials` is still Kyverno-synced; the hairpin lessons apply to any internal-registry pull). Historical usage was `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. **Kubelet pulls** are kept off the hairpin **at the resolver, with zero node-side DNS config**: pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium (added 2026-06-10, `docs/runbooks/pfsense-unbound.md`), whose split-horizon zone CNAMEs every ingress host (auto-synced hourly by `technitium-ingress-dns-sync`) to the zone apex whose A record tracks the **live** Traefik LB IP (canary: `viktorbarzin-apex-probe`, alerts ViktorBarzinApexDrift). Nodes are stock — link DNS `10.0.20.1 94.140.14.14` via `qm set --nameserver`, no `/etc/hosts` pins, no resolved drop-ins (two same-day interim approaches on 2026-06-10 were removed the same day). The containerd `hosts.toml` mirror (`[host."https://10.0.20.203"]`, `skip_verify = true`) still exists but is **vestigial** — it can NOT keep pulls internal on its own: Traefik routes by Host/SNI and 404s the mirror's bare-IP requests, and the registry's Bearer auth realm is the absolute `https://forgejo.viktorbarzin.me/v2/token` URL fetched outside the mirror — without internal DNS every fresh pull degrades to public DNS → hairpin → intermittent `dial tcp 176.12.22.76:443: i/o timeout` ImagePullBackOff (tuya-bridge 7.5h outage 2026-06-10, tripit 2026-06-09; see `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md`). **In-cluster pods are ordinary internal clients too** (since 2026-06-10 evening) — CoreDNS's dedicated `viktorbarzin.me:53` block (Corefile in `stacks/technitium/modules/technitium/main.tf`) forwards to the Technitium ClusterIP `10.96.0.53`, so pods get the same split-horizon answers as everyone else; forgejo stays pinned to Traefik's **ClusterIP** in that block (TF-interpolated from the live Service) so CI pushes survive a Technitium outage. This relies on a k8s-1.34 behavior verified 2026-06-10: **pods CAN reach the ETP=Local Traefik LB IP** (kube-proxy short-circuits in-cluster traffic to LB IPs via the cluster path) — re-verify after major k8s upgrades; canary = the uptime-kuma `[External]` fleet going red. (The block briefly forwarded to `8.8.8.8/1.1.1.1` earlier that day, which kept pods on the WAN IP and the broken TP-Link NAT loopback — 27 non-proxied `[External]` monitors dark; beads code-yh33.) **Was `.200` until 2026-06-01** — Traefik's 2026-05-30 move to its dedicated `.203` left the mirror pointing at the now-dead `.200:443`, silently breaking every *fresh* forgejo pull; a future LB renumber is now handled by DNS (apex record + drift probe) — only the vestigial hosts.toml literal would go stale. Mirror source lives in `modules/create-template-vm/k8s-node-containerd-setup.sh` (new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing nodes; also cleans up the legacy 2026-06-10 node-DNS customization). Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest` + any buildkit `*cache*` tag — **REVERTED to DRY_RUN 2026-06-10 after its first live run orphaned OCI index children** (multi-arch/attestation children are separate *untagged* sha256 versions that sort outside the newest-10 window while their parent index is kept; broke `kms-website:latest`+`:dfc83fb`, caught by the integrity probe, healed by re-tagging latest→a794d1a + deleting the corrupt version; see `docs/post-mortems/2026-06-10-forgejo-retention-orphaned-indexes.md`). Do NOT re-enable deletes until the keep-set resolves kept indexes' child digests (or skips untagged versions, or moves to Forgejo's native container-aware cleanup rules). The registry PVC remains at its 50Gi autoresize ceiling on the HDD (we did NOT move it to SSD, see beads code-oflt), so a container-aware retention is still needed. Integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
 - **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts.
 - **Node memory changes**: When changing VM memory on any k8s node, update kubelet `systemReserved`, `kubeReserved`, and eviction thresholds accordingly. Config: `/var/lib/kubelet/config.yaml`. Template: `stacks/infra/main.tf`. Current values: systemReserved=512Mi, kubeReserved=512Mi, evictionHard=500Mi, evictionSoft=1Gi.
 - **Node OS disk tuning** (in `stacks/infra/main.tf`): kubelet `imageGCHighThresholdPercent=70` (was 85), `imageGCLowThresholdPercent=60` (was 80), ext4 `commit=60` in fstab (was default 5s), journald `SystemMaxUse=200M` + `MaxRetentionSec=3day`.
@@ -87,62 +87,103 @@ Violations cause state drift, which causes future applies to break or silently r
 - **Pin database versions**: Disable Diun (image update monitoring) for MySQL, PostgreSQL, Redis.
 - **Quarterly right-sizing**: Run `krr` (Dockerized, against Prometheus) for recommendations; compare to current requests and adjust in TF. (Goldilocks dashboard removed 2026-06-12.)
 
-## CI/CD Architecture — GHA Builds + Woodpecker Deploy
+## CI/CD Architecture — GHA Builds → ghcr + Woodpecker Deploy
 
-**Doctrine (ADR-0002): leverage external infra for ALL CI compute.** Builds,
-tests, lint, and release jobs run on GitHub Actions hosted runners (public
-repos: unlimited free; private: 2000 free min/mo) — never on cluster nodes.
-In-cluster pipelines are reserved for cluster-touching steps only: Woodpecker
-deploys (`kubectl set image`), terragrunt applies, certbot. Do not
-(re)introduce in-cluster image builds or CI test runs — the fallback-build
-pattern was deliberately removed (clean cut). **Watch what you trigger**:
-after any push that fires a build chain, monitor it to completion (GHA run →
-Woodpecker deploy → `rollout status`) and fix failures immediately; verify
-via live state, not the checkmark. Fleet migration: PRD infra#10 (ADR-0002).
+**Doctrine (ADR-0002, fleet-wide as of 2026-06-13): ALL image builds + CI
+compute run OFF-infra.** Every owned image is built/linted/tested on GitHub
+Actions (public repos: free; private: 2000 free min/mo) and pushed to
+`ghcr.io/viktorbarzin/<name>`. **No in-cluster image builds or CI test runs
+exist anywhere** — the in-cluster Woodpecker buildkit and the fallback-build
+pattern were removed (clean cut). Woodpecker is **deploy-only** (plus infra
+applies + maintenance crons). Canonical CI/CD reference:
+`docs/architecture/ci-cd.md`; decision: `docs/adr/0002-all-image-builds-off-infra-gha-ghcr.md`.
+**Watch what you trigger**: after a push that fires a build chain, follow it to
+completion (GHA run → Woodpecker deploy → `rollout status`) and fix failures;
+verify via live state, not the checkmark.
 
-**Owned-app deploy model (build triggers the rollout — 2026-06-02):** For
-self-hosted apps **we build** (Forgejo `viktor/<name>` + Dockerfile +
-`.woodpecker.yml`), the build pipeline ALSO drives the rollout — atomic +
-deterministic, no wait for Keel's poll. Pattern (`build-and-push` tags `latest`
-+ `${CI_COMMIT_SHA:0:8}`, then a `deploy` step): `kubectl set image
-deployment/<app> <container>=<repo>:${CI_COMMIT_SHA:0:8} -n <ns>` +
-`kubectl rollout status ... --timeout=300s`. The `woodpecker-agent` SA is
-`cluster-admin`, so the `bitnami/kubectl` step needs no kubeconfig/RBAC (uses
-its in-cluster SA). **Keel stays enrolled in parallel** as a redundant net
-(finds the deployed SHA already running → no-op). Requires the Deployment to
-have `ignore_changes` on `…container[0].image` (KEEL_IGNORE_IMAGE) so CI
-`set image` doesn't fight `terragrunt apply`. CronJobs in owned apps use
-`:latest` + `imagePullPolicy: Always` (fresh pod each run) instead of a deploy
-step. **Never** `set image`/`rollout restart` operator-managed StatefulSets
-(memory id=740). Reference impls: `tuya_bridge/.woodpecker.yml`,
-`job-hunter`, `f1-stream` (viktor/f1-stream, extracted from this monorepo
-2026-06-05). This reverses decision #12 of
-`docs/plans/2026-05-16-auto-upgrade-apps-design.md` for owned (not upstream)
-images.
+**The fleet pattern (every owned app):** Forgejo `viktor/<repo>` (canonical)
+push-mirrors (`sync_on_commit`) → GitHub `ViktorBarzin/<repo>` → GHA
+`.github/workflows/build.yml` (committed on Forgejo, mirrors over): `on: push:
+branches:[master]` ONLY (feature branches mirror but build/deploy nothing — the
+safety valve). The `build` job: lint/test → `svu` cuts the next `vX.Y.Z` tag to
+CANONICAL Forgejo (GHA secret `FORGEJO_GIT_TOKEN` = write:repository PAT) + bakes
+`VERSION` → `buildx` `linux/amd64` `provenance:false` (single-manifest, dodges
+the orphaned-index-children class) → push `ghcr.io/viktorbarzin/<name>:<sha8>` +
+`:latest` → `delete-package-versions` keep-10. The `deploy` job POSTs
+`ci.viktorbarzin.me/api/repos/<id>/pipelines` (the GitHub-mirror's Woodpecker
+registration, github-forge; GHA secret `WOODPECKER_TOKEN`) with `IMAGE_TAG` +
+`IMAGE_NAME` → `.woodpecker/deploy.yml` (event:**manual** ONLY, so the raw
+Forgejo→GitHub mirror pushes don't fire a tag-less deploy) runs `kubectl set
+image deployment/<app> …` in-cluster (woodpecker-agent SA = cluster-admin, no
+kubeconfig). Deployment image is `ignore_changes`/KEEL_IGNORE_IMAGE so the SHA
+sticks vs `terragrunt apply`; CronJobs track `:latest` + `imagePullPolicy:
+Always`. **Keel stays enrolled** as a redundant net (sees the SHA already
+running → no-op). **Never** `set image`/`rollout restart` operator-managed
+StatefulSets (memory id=740). Onboarding tool: `scripts/offinfra-onboard` +
+`scripts/offinfra-templates/`; mirror + workflow commits via the Forgejo API over
+the internal Traefik LB (`curl --resolve forgejo.viktorbarzin.me:443:10.0.20.203`).
+Reference impls: tripit (the original pilot), f1-stream, job-hunter, tuya_bridge.
 
-**Flow (GHA-migrated apps)**: `git push → GHA build+push DockerHub (8-char SHA) → POST Woodpecker API → kubectl set image`
+**Migrated apps (issues #13–#27):** f1-stream, job-hunter, tuya_bridge,
+beadboard, nextcloud-todos, claude-agent-service, **claude-memory-mcp** (GHA →
+ghcr, NOT DockerHub), kms-website, Freedify, instagram-poster, payslip-ingest,
+broker-sync (image `wealthfolio-sync`), fire-planner, recruiter-responder,
+x402-gateway — plus tripit. Earlier public-repo apps already on GHA (Website,
+k8s-portal, apple-health-data, audiblez-web, plotting-book, insta2spotify,
+audiobook-search, council-complaints) now also land on ghcr.
+- **PUBLIC ghcr packages:** beadboard, nextcloud-todos, claude-agent-service,
+  claude-memory-mcp, kms-website, freedify, tuya_bridge, x402-gateway,
+  chrome-service-novnc, android-emulator.
+- **PRIVATE ghcr:** f1-stream, job-hunter, instagram-poster, payslip-ingest,
+  wealthfolio-sync, fire-planner, recruiter-responder, tripit, infra-cli,
+  infra-ci. Pulled via the Kyverno-synced `ghcr-credentials` allowlist
+  (`stacks/kyverno/modules/kyverno/ghcr-credentials.tf`; NOT cluster-wide; cred
+  = Vault `secret/viktor/ghcr_pull_token`, an alias of the admin `github_pat` —
+  GitHub has no token-mint API, swap the alias value if a scoped token is ever
+  UI-minted).
 
-**Migrated to GHA** (9): Website, k8s-portal, claude-memory-mcp, apple-health-data, audiblez-web, plotting-book, insta2spotify, audiobook-search, council-complaints
-**Woodpecker-native owned-app build** (Forgejo registry, build->deploy in one `.woodpecker.yml`): tuya_bridge, job-hunter, f1-stream (extracted to viktor/f1-stream 2026-06-05; Woodpecker repo id 166; the old github source is archived + its GHA repo-id-10 deactivated)
-**Woodpecker-only**: travel_blog (1.4GB content too large for GHA), infra pipelines (terragrunt apply, certbot, build-cli — need cluster access)
-**Private Forgejo repo → off-infra GHA → GHCR** (NEW 2026-06-09 — gentler builds: keeps build IO **and** the registry push OFF the homelab/sdc; replaces in-cluster Woodpecker buildkit for private repos): **tripit** is the pilot. Forgejo `viktor/tripit` (canonical) push-mirrors → PRIVATE `ViktorBarzin/tripit` GitHub repo (`sync_on_commit`); `.github/workflows/build.yml` (committed on Forgejo, mirrors over) builds + pushes `ghcr.io/viktorbarzin/tripit:<sha>+latest` on GHA (free, ~2min, GHA-native cache). Cluster pulls of PRIVATE ghcr images use the `ghcr-credentials` dockerconfigjson, cloned by the kyverno stack's `sync-ghcr-credentials` ClusterPolicy to an explicit ALLOWLIST of private-ghcr namespaces only (ADR-0002; source `stacks/kyverno/modules/kyverno/ghcr-credentials.tf`; cred = Vault `secret/viktor/ghcr_pull_token`, currently an alias of the admin `github_pat` — GitHub has no token-mint API, swap the alias value if a scoped token is ever UI-minted). **Auto-deploy** (verified 2026-06-09): the GHA `deploy` job POSTs `ci.viktorbarzin.me/api/repos/167/pipelines` (Woodpecker repo **167** = the GitHub mirror, registered github-forge; GHA secret `WOODPECKER_TOKEN`) with `IMAGE_TAG`+`IMAGE_NAME` → `.woodpecker/deploy.yml` (event:**manual** ONLY, so the Forgejo→GitHub mirror's raw pushes don't fire a tag-less deploy) runs `kubectl set image deployment/tripit tripit=… alembic-migrate=…` in-cluster (woodpecker-agent SA = cluster-admin, no kubeconfig). Image is KEEL_IGNORE_IMAGE so the SHA tag sticks; worker CronJobs track `:latest`. **Semver** (parallel layer): the GHA `build` job runs `svu` v3.4.1 over conventional commits, auto-cuts the next `vX.Y.Z` git tag pushed to CANONICAL Forgejo (GHA secret `FORGEJO_GIT_TOKEN` = write:repository PAT, NOT the package-scoped push token) and bakes `VERSION` → app reports it at `/api/version` (verified 0.2.1). Deploy tag stays the 8-char SHA. The old in-cluster `.woodpecker/build.yml` was DELETED (only `.woodpecker/deploy.yml` remains). GitHub default branch must be `master`. **Replicate to f1-stream, tuya_bridge, job-hunter** (currently Woodpecker-native in-cluster builds). Mirror + workflow-file commits are done via the Forgejo API over the internal Traefik LB (`curl --resolve forgejo.viktorbarzin.me:443:10.0.20.203`) since the devvm can't reach forgejo's public hairpin.
+**Infra-owned images (issues #29/#30)** build on GHA workflows IN the infra
+repo's own `.github/workflows/` (added to the GitHub lineage via PR; the
+github↔forgejo divergence was deliberately NOT reconciled):
+`build-chrome-service-novnc.yml` + `build-android-emulator.yml` → public ghcr;
+`build-cli.yml` → DockerHub `viktorbarzin/infra` (kept) + `ghcr.io/viktorbarzin/infra-cli`;
+`build-infra-ci.yml` → `ghcr.io/viktorbarzin/infra-ci`. **infra-ci** is the image
+the `.woodpecker/default.yml` apply step + `drift-detection.yml` run in (proven
+by pipelines 165/166). chatterbox-tts is already built by tripit's GHA → ghcr.
+The Woodpecker `build-ci-image.yml` + `build-cli.yml` pipelines were REMOVED;
+infra-ci break-glass is a manual `.woodpecker/breakglass-infra-ci.yml` (ghcr
+pull-and-save to the registry VM).
 
-**Per-project files**:
-- `.github/workflows/build-and-deploy.yml` — GHA: checkout, build, push DockerHub, POST Woodpecker API
-- `.woodpecker/deploy.yml` — Woodpecker: `kubectl set image` + Slack notify (event: `[manual, push]`)
-- `.woodpecker/build-fallback.yml` — Old full build pipeline preserved (event: `deployment` — never auto-fires)
+**Forgejo container registry: FROZEN + emptied** (issue #32 wiped all `viktor/*`
+container packages). Break-glass-only now; nothing pushes. `forgejo-cleanup`
+stays DRY_RUN. Pull-through caches on `10.0.20.10` are unchanged. Runbook:
+`docs/runbooks/forgejo-registry-breakglass.md`.
 
-**Woodpecker API**: Uses **numeric repo IDs** (`/api/repos/2/pipelines`), NOT owner/name paths (those return HTML).
-Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handler=6, audiblez-web=9, plotting-book=43, claude-memory-mcp=78, infra-onboarding=79, council-complaints=TBD (f1-stream's old GHA-era github repo id 10 is deactivated; it's now a Woodpecker-native Forgejo build at repo id 166)
+**Woodpecker now runs only:** per-app `deploy.yml` (manual, `kubectl set
+image`), `default.yml` (terragrunt apply), `renew-tls.yml` (certbot),
+maintenance crons (drift-detection, provision-user, registry-config-sync,
+pve-nfs-exports-sync, issue-automation, postmortem-todos, k8s-portal), and the
+manual `breakglass-infra-ci.yml`. **No build/test pipeline on any repo — do not
+(re)introduce one.**
+
+**Decommissioned (issue #31):** travel_blog (stack destroyed + dir removed), 6
+dead builders' pipelines (terminal-lobby, webhook-handler, hmrc-sync,
+trading-bot, travel-agent, trip-planner), and all `build-fallback.yml` files
+(only Website had one).
+
+**Woodpecker API**: numeric repo IDs (`/api/repos/<id>/pipelines`), NOT
+owner/name (those return HTML). The deploy registration for each app is the
+**GitHub mirror** repo (github-forge). Infra: Forgejo forge = repo 82, legacy
+GitHub forge = repo 1.
 
 **Woodpecker YAML gotchas**:
 - Commands with `${VAR}:${VAR}` must be **quoted** — unquoted `:` triggers YAML map parsing when vars are empty
 - Use `bitnami/kubectl:latest` (not pinned versions — entrypoint compatibility issues)
 - Global secrets must have `manual` in their events list for API-triggered pipelines
 
-**GitHub repo secrets** (set on all repos): `DOCKERHUB_USERNAME`, `DOCKERHUB_TOKEN`, `WOODPECKER_TOKEN`
-
-**Infra pipelines unchanged**: `default.yml` (terragrunt apply), `renew-tls.yml` (certbot cron), `build-cli.yml` (dual registry push), `k8s-portal.yml` (path-filtered build), `provision-user.yml` — all stay on Woodpecker.
+**GitHub repo secrets** (per repo): `WOODPECKER_TOKEN` (POST deploy pipeline),
+`FORGEJO_GIT_TOKEN` (write:repository PAT for the svu tag push). ghcr push uses
+the workflow's built-in `GITHUB_TOKEN` (`packages: write`).
 
 ## Database Host
 
diff --git a/.claude/reference/service-catalog.md b/.claude/reference/service-catalog.md
index 632505c0..ec78beac 100644
--- a/.claude/reference/service-catalog.md
+++ b/.claude/reference/service-catalog.md
@@ -47,7 +47,7 @@
 | nextcloud | File sync/share | nextcloud |
 | calibre | E-book management (may be merged into ebooks stack) | calibre |
 | onlyoffice | Document editing | onlyoffice |
-| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier); source in own repo `viktor/f1-stream` (Forgejo, extracted 2026-06-05), Woodpecker-native build->deploy (repo id 166) | f1-stream |
+| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier); canonical source in own repo `viktor/f1-stream` (Forgejo, extracted 2026-06-05); GHA-built → `ghcr.io/viktorbarzin/f1-stream` (private), Woodpecker deploy-only (ADR-0002) | f1-stream |
 | chrome-service | Headed Chromium over CDP (`http://chrome-service.chrome-service.svc:9222`, `connect_over_cdp`; legacy `:3000/<token>` WS pool removed 2026-06-04) for sibling services driving anti-bot pages — snapshot-harvester CronJob + tripit fare scrape | chrome-service |
 | rybbit | Analytics | rybbit |
 | isponsorblocktv | SponsorBlock for TV | isponsorblocktv |
diff --git a/docs/architecture/ci-cd.md b/docs/architecture/ci-cd.md
index e44df43d..c4493f86 100644
--- a/docs/architecture/ci-cd.md
+++ b/docs/architecture/ci-cd.md
@@ -2,334 +2,374 @@
 
 ## Overview
 
-The CI/CD pipeline uses a hybrid approach: GitHub Actions for building Docker images (providing free compute for public repos) and Woodpecker CI for deployments (leveraging cluster-internal access). Git pushes trigger GHA builds that produce Docker images with 8-character SHA tags, push to DockerHub, then POST to Woodpecker's API to trigger deployments that update Kubernetes workloads via `kubectl set image`.
+**Doctrine (ADR-0002): all image builds and CI compute run OFF-infra.** Every
+owned image is built, tested, and linted on **GitHub Actions** (free on public
+repos; 2000 free min/mo on private) and pushed to **`ghcr.io/viktorbarzin/<name>`**.
+Woodpecker is **deploy-only** — a GHA job POSTs its API with the freshly-built
+image tag and Woodpecker runs `kubectl set image` from inside the cluster.
+There are **no in-cluster image builds or CI test runs anywhere** — the
+in-cluster Woodpecker buildkit and the fallback-build pattern were removed as a
+clean cut (ADR-0002, 2026-06-13). The Forgejo container registry is **frozen
+and emptied** — break-glass only.
+
+This breaks the old circular dependency (images needed to repair the cluster
+used to be built and stored *inside* it) and keeps build IO + registry pushes
+off the homelab spindle.
 
 ## Architecture Diagram
 
 ```mermaid
 graph LR
-    A[Git Push] --> B[GitHub Actions]
-    B --> C[Build Docker Image<br/>linux/amd64, 8-char SHA tag]
-    C --> D[Push to DockerHub]
-    D --> E[POST Woodpecker API]
-    E --> F[Woodpecker Pipeline]
-    F --> G[Vault K8s Auth<br/>SA JWT]
-    G --> H[kubectl set image]
-    H --> I[K8s Deployment]
-    I --> J[Pull from DockerHub<br/>or Pull-Through Cache]
+    A[git push Forgejo<br/>viktor/&lt;repo&gt; canonical] --> B[push-mirror sync_on_commit]
+    B --> C[GitHub mirror<br/>ViktorBarzin/&lt;repo&gt;]
+    C --> D[GitHub Actions<br/>.github/workflows/build.yml]
+    D --> E[lint / test]
+    E --> F[buildx linux/amd64<br/>provenance:false]
+    F --> G[push ghcr.io/viktorbarzin/&lt;name&gt;<br/>:sha8 + :latest]
+    G --> H[svu tag -> Forgejo canonical]
+    G --> I[POST Woodpecker deploy repo]
+    I --> J[.woodpecker/deploy.yml<br/>event: manual]
+    J --> K[kubectl set image<br/>in-cluster SA cluster-admin]
+    K --> L[K8s Deployment<br/>pulls from ghcr]
 
-    K[Pull-Through Cache<br/>10.0.20.10] -.-> J
-    L[forgejo.viktorbarzin.me<br/>Private Registry on Forgejo] -.-> J
-
-    style B fill:#2088ff
-    style F fill:#4c9e47
-    style K fill:#f39c12
+    style D fill:#2088ff
+    style J fill:#4c9e47
+    style G fill:#f39c12
 ```
 
 ## Components
 
-| Component | Version | Location | Purpose |
-|-----------|---------|----------|---------|
-| GitHub Actions | Cloud | `.github/workflows/build-and-deploy.yml` | Build Docker images, push to DockerHub |
-| Woodpecker CI | Self-hosted | `ci.viktorbarzin.me` | Deploy to Kubernetes cluster |
-| DockerHub | Cloud | `viktorbarzin/*` | Public image registry |
-| Private Registry | Forgejo Packages | `forgejo.viktorbarzin.me/viktor` | Private container images (PAT auth, retention CronJob) — migrated from registry.viktorbarzin.me 2026-05-07 |
-| Pull-Through Cache | Custom | `10.0.20.10:5000` (docker.io)<br/>`10.0.20.10:5010` (ghcr.io) | LAN cache for remote registries |
-| Kyverno | Cluster | `kyverno` namespace | Auto-sync registry credentials to all namespaces |
-| Vault | Cluster | `vault.viktorbarzin.me` | K8s auth for Woodpecker pipelines |
+| Component | Location | Purpose |
+|-----------|----------|---------|
+| GitHub Actions | `.github/workflows/build.yml` (per repo) | Build + lint + test + push image; trigger deploy; cut semver tag |
+| ghcr.io | `ghcr.io/viktorbarzin/*` | Container registry for ALL owned images (public + private packages) |
+| Woodpecker CI | `ci.viktorbarzin.me` | **Deploy-only** — `kubectl set image` in-cluster; plus infra applies + maintenance crons |
+| Forgejo | `forgejo.viktorbarzin.me/viktor/<repo>` | **Canonical** git source (push-mirrors to GitHub). Container registry **FROZEN** (break-glass only) |
+| Pull-Through Cache | `10.0.20.10:5000/5010/5020/5030/5040` | LAN cache for upstream registries (DockerHub, ghcr, Quay, k8s.gcr, Kyverno) |
+| Kyverno | `kyverno` namespace | Syncs `ghcr-credentials` (private-ghcr allowlist) + `registry-credentials` to namespaces |
+| Vault | `vault.viktorbarzin.me` | K8s auth for Woodpecker deploy pipelines; CI tokens in `secret/ci/global` + `secret/viktor` |
 
 ## How It Works
 
-### Build Flow (GitHub Actions)
+### The fleet pattern (every owned app)
 
-1. **Trigger**: Git push to main/master branch
-2. **Build**: GHA builds Docker image for `linux/amd64` platform only
-3. **Tag**: Image tagged with 8-character commit SHA (e.g., `viktorbarzin/app:a1b2c3d4`)
-   - `:latest` tags are **never used** to prevent stale pull-through cache issues
-4. **Push**: Image pushed to DockerHub public registry
-5. **Trigger Deploy**: POST request to Woodpecker API with repo ID and commit SHA
+1. **Canonical source = Forgejo** `viktor/<repo>`. A **push-mirror**
+   (`sync_on_commit`) pushes every commit to the GitHub mirror
+   `ViktorBarzin/<repo>`. The `.github/workflows/build.yml` is committed on
+   Forgejo and mirrors over.
+2. **GHA `build` job** (triggers on `push` to `master` ONLY; feature branches
+   are mirrored but build and deploy nothing, which is the safety valve):
+   - lint + test
+   - `svu` computes the next `vX.Y.Z` from conventional commits and pushes the
+     tag back to **canonical Forgejo** (GHA secret `FORGEJO_GIT_TOKEN` =
+     write:repository PAT); `VERSION` is baked into the image
+   - `docker buildx` `linux/amd64`, **`provenance: false`** (single-manifest —
+     avoids the orphaned-index-children failure class), push
+     `ghcr.io/viktorbarzin/<name>:<sha8>` + `:latest`
+   - `delete-package-versions` keeps the newest ~10 ghcr versions
+3. **GHA `deploy` job** POSTs `ci.viktorbarzin.me/api/repos/<id>/pipelines`
+   (the Woodpecker registration for the **GitHub mirror**, github-forge; GHA
+   secret `WOODPECKER_TOKEN`) with `IMAGE_TAG` + `IMAGE_NAME`.
+4. **`.woodpecker/deploy.yml`** (event: **manual** only, so the raw
+   Forgejo→GitHub mirror pushes don't fire a tag-less deploy) runs `kubectl set
+   image deployment/<app> <container>=<image>` in-cluster. The `woodpecker-agent`
+   SA is `cluster-admin`, so the `bitnami/kubectl` step needs no
+   kubeconfig/RBAC. The Deployment image is in `lifecycle.ignore_changes`
+   (`KEEL_IGNORE_IMAGE`) so the SHA tag sticks and `terragrunt apply` doesn't
+   fight it. CronJobs in owned apps track `:latest` + `imagePullPolicy: Always`
+   instead of a deploy step.
 
-### Deploy Flow (Woodpecker CI)
+**Keel stays enrolled** as a redundant net (finds the deployed SHA already
+running → no-op).
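+
+The deploy hand-off in steps 3 and 4 comes down to an 8-character tag and a
+small JSON body. The sketch below assumes bash and uses hypothetical helper
+names (`sha8`, `deploy_payload`); the payload shape mirrors the `curl` example
+under Configuration and is not a verified copy of any repo's workflow.
+
+```bash
+#!/usr/bin/env bash
+# Sketch only: hypothetical helpers, not lifted from the real build.yml files.
+set -euo pipefail
+
+# 8-char image tag from a full 40-char commit SHA (GHA exposes it as $GITHUB_SHA).
+sha8() {
+  local full="$1"
+  if [[ ! "$full" =~ ^[0-9a-f]{40}$ ]]; then
+    echo "sha8: expected a 40-char hex SHA, got '$full'" >&2
+    return 1
+  fi
+  printf '%s\n' "${full:0:8}"
+}
+
+# Body for POST /api/repos/<id>/pipelines: the branch plus the two variables
+# that .woodpecker/deploy.yml substitutes into `kubectl set image`.
+deploy_payload() {
+  local image_name="$1" image_tag="$2"
+  printf '{"branch":"master","variables":{"IMAGE_NAME":"%s","IMAGE_TAG":"%s"}}\n' \
+    "$image_name" "$image_tag"
+}
+```
+
+For example, `deploy_payload ghcr.io/viktorbarzin/tripit "$(sha8 "$GITHUB_SHA")"`
+prints the body the GHA `deploy` job would POST.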
 
-1. **Receive Webhook**: Woodpecker API receives deployment trigger from GHA
-2. **Authenticate**: Pipeline uses Kubernetes ServiceAccount JWT to authenticate with Vault via K8s auth
-3. **Deploy**: `kubectl set image deployment/<name> <container>=viktorbarzin/<app>:<sha>`
-4. **Notify**: Slack notification on success/failure
+**Tooling**: `infra/scripts/offinfra-onboard` + `infra/scripts/offinfra-templates/`
+scaffold a repo onto this pattern (mirror, workflow, Woodpecker deploy repo,
+old-pipeline removal, default-branch flip). Mirror + workflow commits go via
+the Forgejo API over the internal Traefik LB
+(`curl --resolve forgejo.viktorbarzin.me:443:10.0.20.203`) since the devvm
+can't reach Forgejo through the public NAT hairpin.
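+
+A minimal sketch of that internal-LB call pattern, assuming bash and curl.
+`forgejo_api` and `push_mirror_body` are hypothetical helpers (not part of
+`offinfra-onboard`), `FORGEJO_TOKEN`/`GITHUB_MIRROR_TOKEN` are assumed variable
+names, and the `push_mirrors` endpoint is the Gitea-compatible API that Forgejo
+is assumed to expose; check the instance's `/api/swagger` before relying on it.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+FORGEJO_HOST="forgejo.viktorbarzin.me"
+FORGEJO_LB_IP="10.0.20.203"   # internal Traefik LB; avoids the public hairpin
+
+# Any Forgejo API call from the devvm. --resolve keeps the real hostname in the
+# URL (so SNI, Host and cert verification still match) but connects to the LB.
+forgejo_api() {
+  local method="$1" path="$2"
+  shift 2
+  curl -fsS --resolve "${FORGEJO_HOST}:443:${FORGEJO_LB_IP}" \
+    -X "$method" "https://${FORGEJO_HOST}/api/v1${path}" \
+    -H "Authorization: token ${FORGEJO_TOKEN:?set FORGEJO_TOKEN}" \
+    -H "Content-Type: application/json" \
+    "$@"
+}
+
+# Body for a sync_on_commit push-mirror to the GitHub mirror of the same name.
+# The interval value is an assumption; sync_on_commit does the real work.
+push_mirror_body() {
+  local repo="$1"
+  printf '{"remote_address":"https://github.com/ViktorBarzin/%s.git","remote_username":"ViktorBarzin","remote_password":"%s","interval":"8h0m0s","sync_on_commit":true}\n' \
+    "$repo" "${GITHUB_MIRROR_TOKEN:-}"
+}
+
+# Usage (not executed here):
+#   forgejo_api POST /repos/viktor/tripit/push_mirrors -d "$(push_mirror_body tripit)"
+```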
 
-### Project Migration Status
+### ghcr package visibility
 
-**Migrated to GHA (8 projects)**:
-- Website
-- k8s-portal
-- claude-memory-mcp
-- apple-health-data
-- audiblez-web
-- plotting-book
-- insta2spotify
-- book-search (audiobook-search)
+| Visibility | Packages | Pull mechanism |
+|------------|----------|----------------|
+| **Public** | beadboard, nextcloud-todos, claude-agent-service, claude-memory-mcp, kms-website, freedify, tuya_bridge, x402-gateway, chrome-service-novnc, android-emulator | Anonymous |
+| **Private** | f1-stream, job-hunter, instagram-poster, payslip-ingest, wealthfolio-sync, fire-planner, recruiter-responder, tripit, infra-cli, infra-ci | `ghcr-credentials` dockerconfigjson |
 
-**Woodpecker-native owned-app builds** (build + push to the Forgejo private
-registry + `kubectl set image` rollout, all in one `.woodpecker.yml`; Keel
-stays enrolled as a redundant net): `tuya_bridge`, `job-hunter`, `f1-stream`.
-`f1-stream` was extracted from this monorepo to `viktor/f1-stream` on
-2026-06-05 (Woodpecker repo id 166); the old github source is archived and its
-GHA-era Woodpecker repo (id 10) is deactivated.
+Private-image pulls use the `ghcr-credentials` dockerconfigjson, cloned by the
+kyverno stack's `sync-ghcr-credentials` ClusterPolicy to an explicit
+**ALLOWLIST** of private-ghcr namespaces only (NOT cluster-wide; source
+`stacks/kyverno/modules/kyverno/ghcr-credentials.tf`). Cred = Vault
+`secret/viktor/ghcr_pull_token` (an alias of the admin `github_pat` — GitHub
+has no token-mint API; swap the alias value if a scoped token is ever
+UI-minted).
 
-**Woodpecker-only (infra + large apps)**:
-- `travel_blog`: 5.7GB content directory exceeds GHA limits
-- Infra pipelines: require cluster access (terragrunt apply, certbot, build-cli)
+### Migrated apps (issues #13–#27)
 
-### Woodpecker Pipeline Files
+f1-stream, job-hunter, tuya_bridge, beadboard, nextcloud-todos,
+claude-agent-service, claude-memory-mcp, kms-website, Freedify,
+instagram-poster, payslip-ingest, broker-sync (image name `wealthfolio-sync`),
+fire-planner, recruiter-responder, x402-gateway — plus **tripit** (the original
+pilot, 2026-06-09). Earlier public-repo apps already on GHA (Website,
+k8s-portal, apple-health-data, audiblez-web, plotting-book, insta2spotify,
+audiobook-search, council-complaints) now also land on ghcr.
 
-Each project contains:
-- `.woodpecker/deploy.yml`: kubectl set image + Slack notification
-- `.woodpecker/build-fallback.yml`: Legacy full build pipeline (event: deployment, never auto-fires)
+### Infra-owned images (issues #29 / #30)
 
-### Woodpecker Repository IDs
+Images owned by the infra repo build on GHA workflows **in the infra repo's own
+`.github/workflows/`** (the github↔forgejo divergence was deliberately NOT
+reconciled — the workflows were added to the GitHub lineage via PR):
 
-Woodpecker API uses numeric IDs (not owner/name):
+| Image | Workflow | Destination |
+|-------|----------|-------------|
+| chrome-service-novnc | `build-chrome-service-novnc.yml` | public `ghcr.io/viktorbarzin/chrome-service-novnc` |
+| android-emulator | `build-android-emulator.yml` | public `ghcr.io/viktorbarzin/android-emulator` |
+| infra CLI | `build-cli.yml` | DockerHub `viktorbarzin/infra` (kept) + `ghcr.io/viktorbarzin/infra-cli` |
+| infra-ci | `build-infra-ci.yml` | private `ghcr.io/viktorbarzin/infra-ci` |
 
-| Repo | ID |
-|------|------|
-| infra | 1 |
-| Website | 2 |
-| finance | 3 |
-| health | 4 |
-| travel_blog | 5 |
-| webhook-handler | 6 |
-| audiblez-web | 9 |
-| plotting-book | 43 |
-| claude-memory-mcp | 78 |
-| infra-onboarding | 79 |
+**`infra-ci`** is the image in which the `.woodpecker/default.yml` apply step and
+`drift-detection.yml` run (confirmed by pipelines 165/166). `chatterbox-tts` is
+already built by tripit's GHA workflow and pushed to ghcr.
 
-### Image Registry Flow
+The Woodpecker `build-ci-image.yml` and `build-cli.yml` pipelines were
+**REMOVED**. Break-glass for infra-ci is now a manual
+`.woodpecker/breakglass-infra-ci.yml` (ghcr pull-and-save to the registry VM).
 
-1. **Containerd hosts.toml** redirects pulls from docker.io and ghcr.io to pull-through cache at `10.0.20.10`
-2. **Pull-through cache** serves cached images from LAN, fetches from upstream on cache miss
-3. **Kyverno ClusterPolicy** auto-syncs `registry-credentials` Secret to all namespaces for private registry access
-4. **Private registry** has been Forgejo's built-in OCI registry at `forgejo.viktorbarzin.me/viktor/<image>` since 2026-05-07. Auth via PAT (Vault `secret/ci/global/forgejo_push_token` for push, `secret/viktor/forgejo_pull_token` for pull). The pre-migration `registry:2.8.3`-based private registry on `registry.viktorbarzin.me:5050` was the root cause of three orphan-index incidents in three weeks (2026-04-13, 2026-04-19, 2026-05-04 — see `docs/post-mortems/2026-04-19-registry-orphan-index.md` and the full migration writeup at `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md`). The five pull-through caches on `10.0.20.10` (ports 5000/5010/5020/5030/5040) stay in place for upstream registries.
-5. **Integrity probe** (`registry-integrity-probe` CronJob in `monitoring` ns, every 15m) walks `/v2/_catalog` → tags → indexes → child manifests via HEAD and pushes `registry_manifest_integrity_failures` to Pushgateway; alerts `RegistryManifestIntegrityFailure` / `RegistryIntegrityProbeStale` / `RegistryCatalogInaccessible` page on broken state. Authoritative check (HTTP API, not filesystem).
+### Forgejo container registry — FROZEN
 
-### Infra Pipelines (Woodpecker-only)
+Issue #32 wiped all `viktor/*` container packages (~19G reclaimed, `/data`
+58%→20%). The registry is **break-glass-only** now; nothing pushes to it. The
+`forgejo-cleanup` CronJob stays in `DRY_RUN` (nothing to clean). Pull-through
+caches on the registry VM (`10.0.20.10`) are unchanged. See
+`docs/runbooks/forgejo-registry-breakglass.md`.
+
+### Image registry / pull path
+
+1. **Containerd `hosts.toml`** redirects pulls from docker.io and ghcr.io to the
+   pull-through cache at `10.0.20.10` (5000 = docker.io, 5010 = ghcr.io).
+2. **Pull-through cache** serves cached images from the LAN, fetches upstream on
+   a miss.
+3. **Kyverno ClusterPolicies** sync `ghcr-credentials` (private-ghcr allowlist)
+   and `registry-credentials` to namespaces.
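+
+For reference, the containerd registry-host format behind step 1 looks like the
+output of this sketch. It assumes containerd's `config_path` points at
+`/etc/containerd/certs.d` and that the cache is reached over plain HTTP; the
+helper name is hypothetical and the nodes' real files are managed elsewhere.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+CACHE_IP="10.0.20.10"
+
+# Print a hosts.toml that sends pulls and resolves for <upstream> to the LAN
+# cache on <port>; containerd tries the [host] mirror first and treats
+# `server` as the upstream default.
+render_hosts_toml() {
+  local upstream="$1" port="$2"
+  cat <<EOF
+server = "https://${upstream}"
+
+[host."http://${CACHE_IP}:${port}"]
+  capabilities = ["pull", "resolve"]
+EOF
+}
+
+# Usage (5010 = ghcr.io cache):
+#   render_hosts_toml ghcr.io 5010 | sudo tee /etc/containerd/certs.d/ghcr.io/hosts.toml
+```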
+
+## Woodpecker — what it still runs
+
+Woodpecker is **deploy + cluster-touching steps only**:
 
 | Pipeline | File | Purpose |
 |----------|------|---------|
-| default | `.woodpecker/default.yml` | Terragrunt apply on push |
-| renew-tls | `.woodpecker/renew-tls.yml` | Certbot renewal cron |
-| build-cli | `.woodpecker/build-cli.yml` | Build and push to dual registries |
-| build-ci-image | `.woodpecker/build-ci-image.yml` | Build `infra-ci` tooling image (triggered by `ci/Dockerfile` change or manual); post-push HEADs every blob via `verify-integrity` step to catch orphan-index pushes |
-| k8s-portal | `.woodpecker/k8s-portal.yml` | Path-filtered build for k8s-portal subdirectory |
-| registry-config-sync | `.woodpecker/registry-config-sync.yml` | SCP `modules/docker-registry/*` to `/opt/registry/` on `10.0.20.10` when any managed file changes; bounces containers + nginx per `docs/runbooks/registry-vm.md` |
-| pve-nfs-exports-sync | `.woodpecker/pve-nfs-exports-sync.yml` | Sync `scripts/pve-nfs-exports` → `/etc/exports` on PVE host |
-| postmortem-todos | `.woodpecker/postmortem-todos.yml` | Auto-resolve safe TODOs from new `docs/post-mortems/*.md` via headless Claude agent |
-| drift-detection | `.woodpecker/drift-detection.yml` | Nightly Terraform drift detection |
-| issue-automation | `.woodpecker/issue-automation.yml` | Triage + respond to `ViktorBarzin/infra` GitHub issues |
+| per-app deploy | `.woodpecker/deploy.yml` (each repo) | `kubectl set image` + Slack notify (event: **manual**) |
+| terragrunt apply | `.woodpecker/default.yml` | Changed-stacks apply on push to master (runs in `infra-ci`) |
+| certbot | `.woodpecker/renew-tls.yml` | TLS renewal cron |
+| drift-detection | `.woodpecker/drift-detection.yml` | Nightly Terraform drift (runs in `infra-ci`) |
 | provision-user | `.woodpecker/provision-user.yml` | Add namespace-owner user from Vault spec |
+| registry-config-sync | `.woodpecker/registry-config-sync.yml` | SCP `modules/docker-registry/*` → `10.0.20.10` on change |
+| pve-nfs-exports-sync | `.woodpecker/pve-nfs-exports-sync.yml` | Sync `scripts/pve-nfs-exports` → `/etc/exports` on PVE |
+| issue-automation | `.woodpecker/issue-automation.yml` | Triage + respond to `ViktorBarzin/infra` GitHub issues |
+| postmortem-todos | `.woodpecker/postmortem-todos.yml` | Auto-resolve safe TODOs from new post-mortems |
+| k8s-portal | `.woodpecker/k8s-portal.yml` | Path-filtered deploy for the portal |
+| breakglass-infra-ci | `.woodpecker/breakglass-infra-ci.yml` | **Manual** ghcr pull-and-save of infra-ci to the registry VM |
+
+**No build/test pipeline exists on any repo.** Do not (re)introduce one.
+
+### Woodpecker API
+
+The API uses **numeric repo IDs** (`/api/repos/<id>/pipelines`), NOT owner/name paths
+(those return HTML). The deploy registration for each app is the **GitHub
+mirror** repo (registered github-forge). IDs are stable across renames and must
+be looked up from the Woodpecker UI/DB.
+
+### Woodpecker YAML gotchas
+
+- Commands with `${VAR}:${VAR}` must be **quoted** — an unquoted `:` triggers
+  YAML map parsing when the vars are empty.
+- Use `bitnami/kubectl:latest` (not pinned versions — entrypoint compatibility).
+- Global secrets must include `manual` in their events list for API-triggered
+  pipelines.
+
+### GitHub repo secrets
+
+Per repo: `WOODPECKER_TOKEN` (POST the deploy pipeline), `FORGEJO_GIT_TOKEN`
+(write:repository PAT for the `svu` tag push). ghcr push uses the workflow's
+built-in `GITHUB_TOKEN` (`packages: write`).
+
+## Infra repo CI topology
+
+The infra repo runs on Woodpecker via **two** forge registrations: the Forgejo
+forge (repo id 82, registered 2026-06-08) and the legacy GitHub forge (repo id
+1). Pushes to **Forgejo** `master` fire `.woodpecker/default.yml`
+(changed-stacks terragrunt apply, in `infra-ci`) plus the `notify-nonadmin-push`
+Slack audit step. Operational facts (2026-06-10):
+
+- **Webhook URL is the IN-CLUSTER service**:
+  `http://woodpecker-server.woodpecker.svc.cluster.local/api/hook?...` (PATCHed
+  via the Forgejo API). The Woodpecker default (`https://ci.viktorbarzin.me/...`)
+  resolves to the non-proxied public A record from pods → NAT hairpin →
+  intermittent `context deadline exceeded`, silently dropping push events. If
+  Woodpecker "repairs" the repo it rewrites the hook back to `ci.viktorbarzin.me`
+  — re-apply the in-cluster URL.
+- **Repo-scoped secrets must exist on BOTH repos**: pipelines reference
+  repo-level secrets (`registry_ssh_key`, `pve_ssh_key`, `CLOUDFLARE_TOKEN`, …).
+  When registering a new forge repo for infra, clone the secret set too.
+- **Empty commits defeat path filters**: a commit with no changed files makes
+  Woodpecker include ALL workflow files (path conditions can't exclude), so every
+  repo secret must resolve. Normal commits with real files only compile the
+  matching workflows.
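+
+Re-applying the in-cluster webhook URL means swapping only the scheme and host
+while keeping the path and query string intact. A bash sketch with a
+hypothetical helper name; the PATCH in the usage comment assumes the
+Gitea-compatible hooks API, and `<hook-id>` and the `viktor/infra` path are
+placeholders to look up first.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+PUBLIC_CI="https://ci.viktorbarzin.me"
+INCLUSTER_CI="http://woodpecker-server.woodpecker.svc.cluster.local"
+
+# Rewrite a Woodpecker-generated hook URL to the in-cluster service.
+# Idempotent: URLs that don't start with the public origin are returned as-is.
+to_incluster_hook_url() {
+  local url="$1"
+  if [[ "$url" == "$PUBLIC_CI"/* ]]; then
+    printf '%s\n' "${INCLUSTER_CI}${url#"$PUBLIC_CI"}"
+  else
+    printf '%s\n' "$url"
+  fi
+}
+
+# Usage (not executed here; FORGEJO_TOKEN and <hook-id> are placeholders):
+#   new=$(to_incluster_hook_url "$old_url")
+#   curl -fsS --resolve forgejo.viktorbarzin.me:443:10.0.20.203 -X PATCH \
+#     "https://forgejo.viktorbarzin.me/api/v1/repos/viktor/infra/hooks/<hook-id>" \
+#     -H "Authorization: token $FORGEJO_TOKEN" -H "Content-Type: application/json" \
+#     -d "{\"config\":{\"url\":\"$new\",\"content_type\":\"json\"}}"
+```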
+
+The Forgejo trigger is not fully dependable — land infra changes by pushing
+Forgejo master (as viktor), use `[ci skip]` for docs/no-op commits, and verify
+deploys via `scripts/tg` + live cluster state rather than trusting the CI
+checkmark. The two remotes have **diverged** (parallel histories under
+different SHAs); expect github pushes to reject non-fast-forward and leave them
+— never force-push.
 
 ## Configuration
 
-### GitHub Actions
-
-**File**: `.github/workflows/build-and-deploy.yml`
+### GitHub Actions (per-app `.github/workflows/build.yml`)
 
 ```yaml
-name: Build and Deploy
+name: build
 on:
   push:
-    branches: [main, master]
+    branches: [master]
 jobs:
   build:
     runs-on: ubuntu-latest
+    permissions:
+      contents: write   # svu tag push
+      packages: write    # ghcr push
     steps:
-      - name: Build Docker image
-        run: docker build --platform linux/amd64 -t viktorbarzin/app:${SHORT_SHA} .
-      - name: Push to DockerHub
-        run: docker push viktorbarzin/app:${SHORT_SHA}
-      - name: Trigger Woodpecker Deploy
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0   # svu needs the full tag history to compute the next version
+      - name: lint + test
+        run: make lint test
+      - name: svu tag -> Forgejo
         run: |
-          curl -X POST https://ci.viktorbarzin.me/api/repos/<REPO_ID>/pipelines \
-            -H "Authorization: Bearer ${{ secrets.WOODPECKER_TOKEN }}"
+          VERSION=$(svu next)
+          # ... push tag to canonical Forgejo with FORGEJO_GIT_TOKEN
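+      - name: short SHA tag
+        id: vars
+        run: echo "sha8=${GITHUB_SHA::8}" >> "$GITHUB_OUTPUT"
+      - uses: docker/setup-buildx-action@v3
+      - uses: docker/login-action@v3   # ghcr push via the built-in GITHUB_TOKEN
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+      - uses: docker/build-push-action@v6
+        with:
+          platforms: linux/amd64
+          provenance: false
+          push: true
+          tags: |
+            ghcr.io/viktorbarzin/<name>:${{ steps.vars.outputs.sha8 }}
+            ghcr.io/viktorbarzin/<name>:latest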
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    steps:
+      - name: Trigger Woodpecker deploy
+        run: |
+          curl -fsS -X POST https://ci.viktorbarzin.me/api/repos/<DEPLOY_REPO_ID>/pipelines \
+            -H "Authorization: Bearer ${{ secrets.WOODPECKER_TOKEN }}" \
+            -H "Content-Type: application/json" \
+            -d '{"branch":"master","variables":{"IMAGE_TAG":"...","IMAGE_NAME":"..."}}'
 ```
 
-**Required GitHub Secrets**:
-- `DOCKERHUB_USERNAME`
-- `DOCKERHUB_TOKEN`
-- `WOODPECKER_TOKEN`
-
-### Woodpecker Deploy Pipeline
-
-**File**: `.woodpecker/deploy.yml`
+### Woodpecker deploy pipeline (per-app `.woodpecker/deploy.yml`)
 
 ```yaml
 when:
-  event: [deployment]
+  event: manual
 
 steps:
   deploy:
-    image: bitnami/kubectl:latest
+    image: bitnami/kubectl:latest   # uses the in-cluster woodpecker-agent SA (cluster-admin)
     commands:
-      - kubectl set image deployment/app app=viktorbarzin/app:${CI_COMMIT_SHA:0:8}
-    secrets: [k8s_token]
-
+      - "kubectl set image deployment/app app=${IMAGE_NAME}:${IMAGE_TAG} -n <ns>"
+      - "kubectl rollout status deployment/app -n <ns> --timeout=300s"
   notify:
     image: plugins/slack
-    settings:
-      webhook: ${SLACK_WEBHOOK}
     when:
       status: [success, failure]
 ```
 
-**YAML Gotchas**:
-- Commands with `${VAR}:${VAR}` syntax must be quoted to prevent YAML map parsing when vars are empty
-- Use `bitnami/kubectl:latest` (not pinned versions)
-- Global secrets must be manually added to `secrets:` list in pipeline
+### CI/CD secrets sync
 
-### Vault Configuration
-
-**K8s Auth for Woodpecker**:
-- Woodpecker pipelines authenticate using ServiceAccount JWT
-- Vault K8s auth mount validates JWT and issues token
-- Policies grant access to secrets and dynamic credentials
-
-### CI/CD Secrets Sync
-
-**CronJob**: Pushes `secret/ci/global` from Vault → Woodpecker API every 6 hours
-- Keeps Woodpecker global secrets in sync with Vault
-- Runs in `woodpecker` namespace
-
-## Infra repo CI (Woodpecker repo 82 — Forgejo forge)
-
-The infra repo itself runs on Woodpecker via the **Forgejo** forge (repo id 82,
-registered 2026-06-08; the GitHub-side repo id 1 also remains registered).
-Pushes to `master` fire `.woodpecker/default.yml` (changed-stacks terragrunt
-apply) plus the `notify-nonadmin-push` Slack audit step (allow-then-audit
-contribution model — see `multi-tenancy.md`). Operational facts (2026-06-10):
-
-- **Webhook URL is the IN-CLUSTER service**: `http://woodpecker-server.woodpecker.svc.cluster.local/api/hook?...`
-  (PATCHed via the Forgejo API). The Woodpecker-generated default
-  (`https://ci.viktorbarzin.me/...`) resolves to the non-proxied public A
-  record from pods → NAT hairpin → intermittent `context deadline exceeded`,
-  silently dropping push events (found when a push produced no pipeline).
-  If Woodpecker ever "repairs" the repo it will rewrite the hook back to
-  `ci.viktorbarzin.me` — re-apply the in-cluster URL (or pin `ci.viktorbarzin.me`
-  in the CoreDNS pod carve-out alongside forgejo).
-- **Repo-scoped secrets must exist on BOTH repos**: pipelines reference
-  repo-level secrets (`registry_ssh_key`, `pve_ssh_key`, `CLOUDFLARE_TOKEN`,
-  …). Repo 82 was registered without them and every all-workflow compile
-  errored with `secret "registry_ssh_key" not found`. Fixed by cloning repo-1
-  rows to repo 82 in the Woodpecker DB (`insert into secrets … select … where
-  repo_id=1`). When registering a new forge repo for infra, clone the secret
-  set too.
-- **Empty commits defeat path filters**: a commit with no changed files makes
-  Woodpecker include ALL workflow files (path conditions can't exclude), so
-  every repo secret must resolve. Normal commits with real files only compile
-  the matching workflows.
+A CronJob in the `woodpecker` namespace pushes `secret/ci/global` from Vault →
+the Woodpecker API every 6h, keeping global secrets in sync. Woodpecker deploy
+pipelines authenticate to the cluster via the in-cluster `woodpecker-agent` SA
+(cluster-admin); Vault K8s auth backs any secret reads.
 
 ## Decisions & Rationale
 
-### Why GitHub Actions + Woodpecker?
+### Why all builds off-infra (ADR-0002)?
 
-**Alternatives considered**:
-1. **Woodpecker-only**: Simple, but wastes cluster resources on builds
-2. **GHA-only**: No cluster access, requires kubectl from outside (security risk)
-3. **Hybrid (chosen)**: GHA for compute-heavy builds (free), Woodpecker for privileged deployments (secure cluster access)
+- **Breaks the circular dependency** — the images needed to repair the cluster
+  no longer live inside it (they're on ghcr, an external registry).
+- **Removes build IO + registry push load** from the contended homelab spindle.
+- GHA is free on public repos and generous on private; buildx provenance:false
+  sidesteps the orphaned-index-children failure class that plagued the
+  in-cluster registry.
+- **Clean cut** — no in-cluster fallback builds anywhere; one pattern,
+  fleet-wide.
 
-**Benefits**:
-- Free compute for builds on public repos
-- Cluster access stays internal (Woodpecker has direct K8s access)
-- Separation of concerns: build vs deploy
+### Why ghcr (not push back to Forgejo)?
 
-### Why 8-Character SHA Tags (Not :latest)?
+The self-hosted private registries repeatedly orphaned OCI index children: the
+old `registry:2.8.3` registry on 2026-04-13, 2026-04-19 and 2026-05-04, then
+Forgejo's registry on 2026-06-10; Forgejo's retention is also not
+container-aware. ghcr is external (DR-safe), free at this scale, and has native
+multi-arch handling. The Forgejo registry was frozen + emptied (issue #32).
 
-- Pull-through cache serves stale `:latest` tags indefinitely
-- SHA tags ensure every deployment pulls the correct image
-- 8 characters provide sufficient collision resistance (16^8 = 4.3 billion combinations)
+### Why Woodpecker stays for deploy?
 
-### Why Numeric Repo IDs for Woodpecker API?
+`kubectl set image` needs in-cluster privileged access; doing it from GHA would
+mean exposing kube-apiserver or a long-lived kubeconfig. Woodpecker's
+`woodpecker-agent` SA is already cluster-admin in-cluster — the deploy step
+needs no credentials.
 
-- Woodpecker API requires numeric IDs (not owner/name slugs)
-- IDs are stable across repo renames
-- Must be manually looked up from Woodpecker UI or database
+### Why `event: manual` on deploy.yml?
 
-### Why linux/amd64 Only?
+The Forgejo→GitHub push-mirror sends raw, tag-less pushes to the GitHub mirror.
+If `deploy.yml` fired on `push`, every mirror sync would trigger a deploy with no
+image tag. `manual` means only the GHA `deploy` job's explicit API POST (with
+`IMAGE_TAG`) deploys.
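+
+`event: manual` keeps the mirror pushes out; a pre-flight check in the deploy
+step could additionally refuse an empty or `:latest` reference. This is a
+hypothetical bash helper, not something the current `deploy.yml` runs, shown as
+plain shell (inside a Woodpecker command, shell-level `$` would need Woodpecker's
+`$$` escaping).
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Print "<name>:<tag>" or fail loudly if either part is missing, or if the tag
+# is :latest (Deployments are meant to pin the 8-char SHA tag).
+image_ref() {
+  local name="${1:-}" tag="${2:-}"
+  if [[ -z "$name" || -z "$tag" ]]; then
+    echo "refusing to deploy: IMAGE_NAME and IMAGE_TAG must both be set" >&2
+    return 1
+  fi
+  if [[ "$tag" == latest ]]; then
+    echo "refusing to deploy :latest; pass the 8-char SHA tag" >&2
+    return 1
+  fi
+  printf '%s:%s\n' "$name" "$tag"
+}
+
+# Usage in plain shell:
+#   ref=$(image_ref "$IMAGE_NAME" "$IMAGE_TAG") && kubectl set image deployment/app "app=$ref" -n <ns>
+```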
 
-- Cluster runs on x86_64 nodes only
-- ARM builds would waste time and storage
-- Multi-arch images add complexity without benefit
+### Why linux/amd64 only?
+
+The cluster runs on x86_64 nodes only; ARM builds waste time and storage.
 
 ## Troubleshooting
 
-### GHA Build Fails: "denied: requested access to the resource is denied"
+### GHA build fails: ghcr push "denied"
 
-**Cause**: DockerHub credentials expired or incorrect
+The workflow `GITHUB_TOKEN` needs `packages: write` permission and the package
+must allow the repo to push. Check the workflow `permissions:` block and the
+package's "Manage Actions access" settings.
+
+### Image pull fails: "ErrImagePull" / "ImagePullBackOff"
 
-**Fix**:
 ```bash
-# Regenerate DockerHub token
-# Update GitHub repo secrets: DOCKERHUB_USERNAME, DOCKERHUB_TOKEN
+# Public image — check the pull-through cache is up
+curl http://10.0.20.10:5010/v2/_catalog
+
+# Private image — verify the ghcr-credentials Secret exists in the namespace
+kubectl get secret ghcr-credentials -n <namespace>
+# It's Kyverno-synced to an allowlist; if missing, the namespace isn't on the
+# allowlist in stacks/kyverno/modules/kyverno/ghcr-credentials.tf
 ```
 
-### Woodpecker Deploy Fails: "Unauthorized"
+If the cause is the internal-DNS hairpin (fresh pulls timing out on the public
+Forgejo path), see the CoreDNS `viktorbarzin.me` carve-out in
+`docs/architecture/networking.md` and `docs/runbooks/registry-vm.md`.
 
-**Cause**: Vault K8s auth token expired or invalid
+### Deploy didn't happen after a push
 
-**Fix**:
-```bash
-# Restart Woodpecker pipeline (token auto-renewed)
-# Check Vault K8s auth role exists: vault read auth/kubernetes/role/woodpecker-deployer
-```
+Confirm the push was to **master** (feature branches build/deploy nothing).
+Check the GHA run completed the `deploy` job, then check Woodpecker received the
+manual pipeline (`ci.viktorbarzin.me`, the GitHub-mirror deploy repo). Verify
+live with `kubectl rollout status` — not the CI checkmark.
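+
+One way to verify live rather than trusting the checkmark: compare the tag the
+Deployment actually references with the expected 8-char SHA. A bash sketch with
+hypothetical helper names; it reads only the first container's image.
+
+```bash
+#!/usr/bin/env bash
+set -euo pipefail
+
+# Extract the tag from an image reference. Only the last path component is
+# inspected, so a registry port (10.0.20.10:5010/...) is not mistaken for a tag.
+image_tag_of() {
+  local ref="${1%%@*}"        # drop any @sha256:... digest
+  local last="${ref##*/}"     # e.g. app:abcd1234
+  if [[ "$last" == *:* ]]; then
+    printf '%s\n' "${last##*:}"
+  else
+    printf 'latest\n'          # no tag: Docker's implicit :latest
+  fi
+}
+
+# Wait for the rollout, then succeed only if the live tag matches <want>.
+verify_deployed() {
+  local ns="$1" deploy="$2" want="$3" image
+  image=$(kubectl get deployment "$deploy" -n "$ns" \
+    -o jsonpath='{.spec.template.spec.containers[0].image}')
+  kubectl rollout status deployment/"$deploy" -n "$ns" --timeout=300s
+  [[ "$(image_tag_of "$image")" == "$want" ]]
+}
+
+# Usage: verify_deployed <ns> <app> abcd1234 && echo "live on abcd1234"
+```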
 
-### Image Pull Fails: "ErrImagePull"
+### Woodpecker deploy fails: "YAML: did not find expected key"
 
-**Cause**: Pull-through cache or registry credentials issue
-
-**Fix**:
-```bash
-# Check pull-through cache is running
-curl http://10.0.20.10:5000/v2/_catalog
-
-# Verify registry-credentials Secret exists in namespace
-kubectl get secret registry-credentials -n <namespace>
-
-# Manually sync credentials if missing
-kubectl get secret registry-credentials -n default -o yaml | \
-  sed 's/namespace: default/namespace: <namespace>/' | kubectl apply -f -
-```
-
-### Woodpecker Pipeline: "YAML: did not find expected key"
-
-**Cause**: Unquoted command with `${VAR}:${VAR}` syntax when VAR is empty
-
-**Fix**: Quote the command:
-```yaml
-commands:
-  - "kubectl set image deployment/app app=viktorbarzin/app:${SHORT_SHA}"
-```
-
-### travel_blog Build Times Out on GHA
-
-**Cause**: 5.7GB content directory exceeds GHA disk/time limits
-
-**Fix**: Keep on Woodpecker (no migration). Build uses cluster storage and resources.
-
-### CI/CD Secrets Out of Sync
-
-**Cause**: CronJob failed to sync Vault → Woodpecker
-
-**Fix**:
-```bash
-# Check CronJob status
-kubectl get cronjob -n woodpecker
-
-# Manually trigger sync
-kubectl create job --from=cronjob/sync-secrets manual-sync -n woodpecker
-```
+Unquoted command with `${VAR}:${VAR}` syntax when a VAR is empty. Quote the
+command (see the deploy.yml example above).
 
 ## Related
 
-- [Databases Architecture](./databases.md) — Database credentials via Vault
-- [Multi-Tenancy](./multi-tenancy.md) — Per-user Woodpecker access
-- Runbook: `../runbooks/deploy-new-app.md` — How to set up CI/CD for a new app
-- Runbook: `../runbooks/troubleshoot-image-pull.md` — Debug image pull issues
-- Vault documentation: K8s auth configuration
-- Woodpecker documentation: API reference
+- ADR: `../adr/0002-all-image-builds-off-infra-gha-ghcr.md` — the decision
+- [Databases Architecture](./databases.md) — database credentials via Vault
+- [Multi-Tenancy](./multi-tenancy.md) — per-user Woodpecker access
+- Runbook: `../runbooks/forgejo-registry-breakglass.md` — using the frozen registry
+- Runbook: `../runbooks/registry-vm.md` — pull-through cache VM + image-pull debugging
+- Onboarding tool: `../../scripts/offinfra-onboard` + `../../scripts/offinfra-templates/`
diff --git a/stacks/android-emulator/variables.tf b/stacks/android-emulator/variables.tf
index bcc24a0d..822b7527 100644
--- a/stacks/android-emulator/variables.tf
+++ b/stacks/android-emulator/variables.tf
@@ -6,5 +6,5 @@ variable "tls_secret_name" {
 variable "image_tag" {
   type        = string
   default     = "latest"
-  description = "android-emulator image tag at forgejo.viktorbarzin.me/viktor/android-emulator. Built by GHA (.github/workflows/build-android-emulator.yml) -> ghcr.io/viktorbarzin/android-emulator on changes to stacks/android-emulator/docker/ (ADR-0002). :latest tracks the newest build."
+  description = "android-emulator image tag at ghcr.io/viktorbarzin/android-emulator. Built by GHA (.github/workflows/build-android-emulator.yml) on changes to stacks/android-emulator/docker/ (ADR-0002). :latest tracks the newest build."
 }
diff --git a/stacks/terminal/main.tf b/stacks/terminal/main.tf
index c2f3f50b..3737817d 100644
--- a/stacks/terminal/main.tf
+++ b/stacks/terminal/main.tf
@@ -225,8 +225,11 @@ module "ingress_ro" {
 #   https://forgejo.viktorbarzin.me/viktor/terminal-lobby
 #
 # That repo's ./scripts/deploy.sh ships everything to wizard@10.0.10.10
-# and restarts ttyd / ttyd-ro / tmux-api / clipboard-upload. This stack
-# only owns the Kubernetes side: Services, Endpoints pointing at
+# and restarts ttyd / ttyd-ro / tmux-api / clipboard-upload. Deploy is
+# MANUAL via that script — there is no CI pipeline (the lobby's
+# .woodpecker.yml was removed under ADR-0002, issue #31; it builds no
+# image, so it is not part of the GHA->ghcr fleet). This stack only owns
+# the Kubernetes side: Services, Endpoints pointing at
 # 10.0.10.10:{7681,7682,7683,7684}, the IngressRoutes, and the Traefik
 # middlewares that gate everything behind Authentik forward-auth.
 #
diff --git a/stacks/tuya-bridge/variables.tf b/stacks/tuya-bridge/variables.tf
index 5c2be4d3..58e0a005 100644
--- a/stacks/tuya-bridge/variables.tf
+++ b/stacks/tuya-bridge/variables.tf
@@ -6,5 +6,5 @@ variable "tls_secret_name" {
 variable "image_tag" {
   type        = string
   default     = "latest"
-  description = "tuya_bridge image tag pushed to forgejo.viktorbarzin.me/viktor/tuya_bridge. Each Woodpecker run does `kubectl set image` to the 8-char git SHA; this variable is only used on initial create / TF recreate (image is in lifecycle.ignore_changes)."
+  description = "tuya_bridge image tag at ghcr.io/viktorbarzin/tuya_bridge (built by GHA, ADR-0002). The GHA deploy job drives a Woodpecker `kubectl set image` to the 8-char git SHA; this variable is only used on initial create / TF recreate (image is in lifecycle.ignore_changes)."
 }