docs: sync CI/CD docs to ADR-0002 final state (ghcr + Woodpecker deploy-only) [ci skip]
ADR-0002 is fully landed (issues #11-#32 closed): every owned image now builds on GitHub Actions and pushes to ghcr.io/viktorbarzin/<name>, with Woodpecker reduced to deploy-only. The Forgejo container registry is frozen and emptied; there are no in-cluster image builds or CI test runs anywhere. The docs still described the old hybrid topology (DockerHub builds, Woodpecker-native owned-app builds, the per-pattern migration lists, the tripit-only pilot framing), which would mislead future sessions and incident response.

This brings the docs to the completed reality (closes #33):

- docs/architecture/ci-cd.md: full rewrite as the canonical CI/CD reference — the fleet GHA->ghcr->Woodpecker-deploy pattern, public/private ghcr package split, infra-owned image workflows (incl. infra-ci on ghcr), the frozen Forgejo registry, what Woodpecker still runs, and the #31 decommissions.
- .claude/CLAUDE.md: rewrite the "CI/CD Architecture" section to the fleet-wide final state; FIX the stale claim that claude-memory-mcp builds to DockerHub (it is GHA->ghcr); note owned images now live on ghcr and the Forgejo registry is frozen/break-glass near the image-registry bullet.
- .claude/reference/service-catalog.md: f1-stream is GHA->ghcr + Woodpecker deploy-only (was "Woodpecker-native build->deploy").
- stacks/{tuya-bridge,android-emulator}/variables.tf + stacks/terminal/main.tf: cosmetic description/comment updates (forgejo -> ghcr; terminal-lobby has no CI pipeline). Description/comment text only — no stack logic changed.

Historical records (docs/post-mortems/*, docs/plans/*) and ADR-0002 itself are left untouched as point-in-time records.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent 6e4db0ddc6
commit 3e82c64a76
6 changed files with 379 additions and 295 deletions
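The central hand-off these docs describe is the GHA `deploy` job POSTing Woodpecker's numeric-ID endpoint `/api/repos/<id>/pipelines` with `IMAGE_TAG` and `IMAGE_NAME` variables. `.woodpecker/deploy.yml` (event: manual) then runs `kubectl set image`. The Bash sketch below shows one way to assemble that request. It rests on assumptions and is not the workflow the repos actually run. The helper names `sha8`, `pipelines_url` and `deploy_payload`, the `WOODPECKER_URL` override and the repo ID `123` are all hypothetical. The 8-character tag follows the docs' `:<sha8>` convention.

```bash
#!/usr/bin/env bash
# Sketch of the GHA -> Woodpecker deploy hand-off (ADR-0002 fleet pattern).
# Hypothetical helpers: only the endpoint shape and the IMAGE_TAG/IMAGE_NAME
# variable names come from the ci-cd.md docs.
set -euo pipefail

# Short image tag: the first 8 characters of a full commit SHA (`:<sha8>`).
sha8() {
  local full_sha="$1"
  printf '%s' "${full_sha:0:8}"
}

# Woodpecker addresses repos by numeric ID, not owner/name.
# WOODPECKER_URL is an assumed override; it defaults to the documented host.
pipelines_url() {
  local repo_id="$1"
  printf '%s/api/repos/%s/pipelines' \
    "${WOODPECKER_URL:-https://ci.viktorbarzin.me}" "$repo_id"
}

# JSON body for the manual pipeline. No JSON escaping is done, which is only
# safe because image names and hex tags contain no quotes or backslashes.
deploy_payload() {
  local image_name="$1" image_tag="$2"
  printf '{"branch":"master","variables":{"IMAGE_TAG":"%s","IMAGE_NAME":"%s"}}' \
    "$image_tag" "$image_name"
}

# Example call from a GHA step (not executed here). It needs WOODPECKER_TOKEN,
# GITHUB_SHA and a real deploy repo ID in place of the hypothetical 123:
#   curl -fsS -X POST "$(pipelines_url 123)" \
#     -H "Authorization: Bearer ${WOODPECKER_TOKEN}" \
#     -H "Content-Type: application/json" \
#     -d "$(deploy_payload ghcr.io/viktorbarzin/tripit "$(sha8 "$GITHUB_SHA")")"
```

The explicit `Content-Type: application/json` header is a defensive choice in this sketch. The docs' own curl example omits it.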
File diff suppressed because one or more lines are too long
@@ -47,7 +47,7 @@
 | nextcloud | File sync/share | nextcloud |
 | calibre | E-book management (may be merged into ebooks stack) | calibre |
 | onlyoffice | Document editing | onlyoffice |
-| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier); source in own repo `viktor/f1-stream` (Forgejo, extracted 2026-06-05), Woodpecker-native build->deploy (repo id 166) | f1-stream |
+| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier); canonical source in own repo `viktor/f1-stream` (Forgejo, extracted 2026-06-05); GHA-built → `ghcr.io/viktorbarzin/f1-stream` (private), Woodpecker deploy-only (ADR-0002) | f1-stream |
 | chrome-service | Headed Chromium over CDP (`http://chrome-service.chrome-service.svc:9222`, `connect_over_cdp`; legacy `:3000/<token>` WS pool removed 2026-06-04) for sibling services driving anti-bot pages — snapshot-harvester CronJob + tripit fare scrape | chrome-service |
 | rybbit | Analytics | rybbit |
 | isponsorblocktv | SponsorBlock for TV | isponsorblocktv |
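In the rewritten ci-cd.md, each GHA `build` job runs `svu` to compute the next `vX.Y.Z` from conventional commits and then tags canonical Forgejo. The Bash function below is a simplified, hypothetical approximation of that version bump, not svu's implementation or its exact rules and flags. It reads only commit subjects. A `!` after the type, or `BREAKING CHANGE` in the subject, gives a major bump. `feat` gives a minor bump and `fix` a patch bump. Anything else leaves the version unchanged.

```bash
#!/usr/bin/env bash
# Simplified conventional-commit semver bump in the spirit of `svu next`.
# Hypothetical helper: not svu's implementation; it inspects subjects only.
set -euo pipefail

# Usage: next_version <current vX.Y.Z> <commit subject>...
next_version() {
  local current="${1#v}"
  shift
  local major minor patch subject bump=none
  IFS=. read -r major minor patch <<<"$current"

  # Regexes kept in variables so bash applies them as POSIX EREs.
  local re_breaking='^[a-z]+(\([^)]*\))?!:'
  local re_feat='^feat(\([^)]*\))?:'
  local re_fix='^fix(\([^)]*\))?:'

  for subject in "$@"; do
    if [[ $subject =~ $re_breaking || $subject == *"BREAKING CHANGE"* ]]; then
      bump=major
      break                         # nothing outranks a major bump
    elif [[ $subject =~ $re_feat ]]; then
      bump=minor
    elif [[ $subject =~ $re_fix && $bump == none ]]; then
      bump=patch                    # a fix never downgrades an earlier feat
    fi
  done

  case "$bump" in
    major) printf 'v%d.0.0' $((major + 1)) ;;
    minor) printf 'v%d.%d.0' "$major" $((minor + 1)) ;;
    patch) printf 'v%d.%d.%d' "$major" "$minor" $((patch + 1)) ;;
    *)     printf 'v%d.%d.%d' "$major" "$minor" "$patch" ;;
  esac
}
```

For example, `next_version v1.2.3 'feat(api): add endpoint' 'fix: typo'` prints `v1.3.0`.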
@@ -2,334 +2,374 @@
 
 ## Overview
 
-The CI/CD pipeline uses a hybrid approach: GitHub Actions for building Docker images (providing free compute for public repos) and Woodpecker CI for deployments (leveraging cluster-internal access). Git pushes trigger GHA builds that produce Docker images with 8-character SHA tags, push to DockerHub, then POST to Woodpecker's API to trigger deployments that update Kubernetes workloads via `kubectl set image`.
+**Doctrine (ADR-0002): all image builds and CI compute run OFF-infra.** Every
+owned image is built, tested, and linted on **GitHub Actions** (free on public
+repos; 2000 free min/mo on private) and pushed to **`ghcr.io/viktorbarzin/<name>`**.
+Woodpecker is **deploy-only** — a GHA job POSTs its API with the freshly-built
+image tag and Woodpecker runs `kubectl set image` from inside the cluster.
+There are **no in-cluster image builds or CI test runs anywhere** — the
+in-cluster Woodpecker buildkit and the fallback-build pattern were removed as a
+clean cut (ADR-0002, 2026-06-13). The Forgejo container registry is **frozen
+and emptied** — break-glass only.
+
+This breaks the old circular dependency (images needed to repair the cluster
+used to be built and stored *inside* it) and keeps build IO + registry pushes
+off the homelab spindle.
 
 ## Architecture Diagram
 
 ```mermaid
 graph LR
-A[Git Push] --> B[GitHub Actions]
-B --> C[Build Docker Image<br/>linux/amd64, 8-char SHA tag]
-C --> D[Push to DockerHub]
-D --> E[POST Woodpecker API]
-E --> F[Woodpecker Pipeline]
-F --> G[Vault K8s Auth<br/>SA JWT]
-G --> H[kubectl set image]
-H --> I[K8s Deployment]
-I --> J[Pull from DockerHub<br/>or Pull-Through Cache]
+A[git push Forgejo<br/>viktor/<repo> canonical] --> B[push-mirror sync_on_commit]
+B --> C[GitHub mirror<br/>ViktorBarzin/<repo>]
+C --> D[GitHub Actions<br/>.github/workflows/build.yml]
+D --> E[lint / test]
+E --> F[buildx linux/amd64<br/>provenance:false]
+F --> G[push ghcr.io/viktorbarzin/<name><br/>:sha8 + :latest]
+G --> H[svu tag -> Forgejo canonical]
+G --> I[POST Woodpecker deploy repo]
+I --> J[.woodpecker/deploy.yml<br/>event: manual]
+J --> K[kubectl set image<br/>in-cluster SA cluster-admin]
+K --> L[K8s Deployment<br/>pulls from ghcr]
 
-K[Pull-Through Cache<br/>10.0.20.10] -.-> J
-L[forgejo.viktorbarzin.me<br/>Private Registry on Forgejo] -.-> J
-
-style B fill:#2088ff
-style F fill:#4c9e47
-style K fill:#f39c12
+style D fill:#2088ff
+style J fill:#4c9e47
+style G fill:#f39c12
 ```
 
 ## Components
 
-| Component | Version | Location | Purpose |
-|-----------|---------|----------|---------|
-| GitHub Actions | Cloud | `.github/workflows/build-and-deploy.yml` | Build Docker images, push to DockerHub |
-| Woodpecker CI | Self-hosted | `ci.viktorbarzin.me` | Deploy to Kubernetes cluster |
-| DockerHub | Cloud | `viktorbarzin/*` | Public image registry |
-| Private Registry | Forgejo Packages | `forgejo.viktorbarzin.me/viktor` | Private container images (PAT auth, retention CronJob) — migrated from registry.viktorbarzin.me 2026-05-07 |
-| Pull-Through Cache | Custom | `10.0.20.10:5000` (docker.io)<br/>`10.0.20.10:5010` (ghcr.io) | LAN cache for remote registries |
-| Kyverno | Cluster | `kyverno` namespace | Auto-sync registry credentials to all namespaces |
-| Vault | Cluster | `vault.viktorbarzin.me` | K8s auth for Woodpecker pipelines |
+| Component | Location | Purpose |
+|-----------|----------|---------|
+| GitHub Actions | `.github/workflows/build.yml` (per repo) | Build + lint + test + push image; trigger deploy; cut semver tag |
+| ghcr.io | `ghcr.io/viktorbarzin/*` | Container registry for ALL owned images (public + private packages) |
+| Woodpecker CI | `ci.viktorbarzin.me` | **Deploy-only** — `kubectl set image` in-cluster; plus infra applies + maintenance crons |
+| Forgejo | `forgejo.viktorbarzin.me/viktor/<repo>` | **Canonical** git source (push-mirrors to GitHub). Container registry **FROZEN** (break-glass only) |
+| Pull-Through Cache | `10.0.20.10:5000/5010/5020/5030/5040` | LAN cache for upstream registries (DockerHub, ghcr, Quay, k8s.gcr, Kyverno) |
+| Kyverno | `kyverno` namespace | Syncs `ghcr-credentials` (private-ghcr allowlist) + `registry-credentials` to namespaces |
+| Vault | `vault.viktorbarzin.me` | K8s auth for Woodpecker deploy pipelines; CI tokens in `secret/ci/global` + `secret/viktor` |
 
 ## How It Works
 
-### Build Flow (GitHub Actions)
+### The fleet pattern (every owned app)
 
-1. **Trigger**: Git push to main/master branch
-2. **Build**: GHA builds Docker image for `linux/amd64` platform only
-3. **Tag**: Image tagged with 8-character commit SHA (e.g., `viktorbarzin/app:a1b2c3d4`)
-   - `:latest` tags are **never used** to prevent stale pull-through cache issues
-4. **Push**: Image pushed to DockerHub public registry
-5. **Trigger Deploy**: POST request to Woodpecker API with repo ID and commit SHA
+1. **Canonical source = Forgejo** `viktor/<repo>`. A **push-mirror**
+   (`sync_on_commit`) pushes every commit to the GitHub mirror
+   `ViktorBarzin/<repo>`. The `.github/workflows/build.yml` is committed on
+   Forgejo and mirrors over.
+2. **GHA `build` job** (triggers `on: push: branches: [master]` ONLY — feature
+   branches mirror but build/deploy nothing, the safety valve):
+   - lint + test
+   - `svu` computes the next `vX.Y.Z` from conventional commits and pushes the
+     tag back to **canonical Forgejo** (GHA secret `FORGEJO_GIT_TOKEN` =
+     write:repository PAT); `VERSION` is baked into the image
+   - `docker buildx` `linux/amd64`, **`provenance: false`** (single-manifest —
+     avoids the orphaned-index-children failure class), push
+     `ghcr.io/viktorbarzin/<name>:<sha8>` + `:latest`
+   - `delete-package-versions` keeps the newest ~10 ghcr versions
+3. **GHA `deploy` job** POSTs `ci.viktorbarzin.me/api/repos/<id>/pipelines`
+   (the Woodpecker registration for the **GitHub mirror**, github-forge; GHA
+   secret `WOODPECKER_TOKEN`) with `IMAGE_TAG` + `IMAGE_NAME`.
+4. **`.woodpecker/deploy.yml`** (event: **manual** only, so the raw
+   Forgejo→GitHub mirror pushes don't fire a tag-less deploy) runs `kubectl set
+   image deployment/<app> <container>=<image>` in-cluster. The `woodpecker-agent`
+   SA is `cluster-admin`, so the `bitnami/kubectl` step needs no
+   kubeconfig/RBAC. The Deployment image is in `lifecycle.ignore_changes`
+   (`KEEL_IGNORE_IMAGE`) so the SHA tag sticks and `terragrunt apply` doesn't
+   fight it. CronJobs in owned apps track `:latest` + `imagePullPolicy: Always`
+   instead of a deploy step.
 
-### Deploy Flow (Woodpecker CI)
+**Keel stays enrolled** as a redundant net (finds the deployed SHA already
+running → no-op).
 
-1. **Receive Webhook**: Woodpecker API receives deployment trigger from GHA
-2. **Authenticate**: Pipeline uses Kubernetes ServiceAccount JWT to authenticate with Vault via K8s auth
-3. **Deploy**: `kubectl set image deployment/<name> <container>=viktorbarzin/<app>:<sha>`
-4. **Notify**: Slack notification on success/failure
+**Tooling**: `infra/scripts/offinfra-onboard` + `infra/scripts/offinfra-templates/`
+scaffold a repo onto this pattern (mirror, workflow, Woodpecker deploy repo,
+old-pipeline removal, default-branch flip). Mirror + workflow commits go via
+the Forgejo API over the internal Traefik LB
+(`curl --resolve forgejo.viktorbarzin.me:443:10.0.20.203`) since the devvm
+can't reach Forgejo's public hairpin.
 
-### Project Migration Status
+### ghcr package visibility
 
-**Migrated to GHA (8 projects)**:
-- Website
-- k8s-portal
-- claude-memory-mcp
-- apple-health-data
-- audiblez-web
-- plotting-book
-- insta2spotify
-- book-search (audiobook-search)
+| Visibility | Packages | Pull mechanism |
+|------------|----------|----------------|
+| **Public** | beadboard, nextcloud-todos, claude-agent-service, claude-memory-mcp, kms-website, freedify, tuya_bridge, x402-gateway, chrome-service-novnc, android-emulator | Anonymous |
+| **Private** | f1-stream, job-hunter, instagram-poster, payslip-ingest, wealthfolio-sync, fire-planner, recruiter-responder, tripit, infra-cli, infra-ci | `ghcr-credentials` dockerconfigjson |
 
-**Woodpecker-native owned-app builds** (build + push to the Forgejo private
-registry + `kubectl set image` rollout, all in one `.woodpecker.yml`; Keel
-stays enrolled as a redundant net): `tuya_bridge`, `job-hunter`, `f1-stream`.
-`f1-stream` was extracted from this monorepo to `viktor/f1-stream` on
-2026-06-05 (Woodpecker repo id 166); the old github source is archived and its
-GHA-era Woodpecker repo (id 10) is deactivated.
+Private-image pulls use the `ghcr-credentials` dockerconfigjson, cloned by the
+kyverno stack's `sync-ghcr-credentials` ClusterPolicy to an explicit
+**ALLOWLIST** of private-ghcr namespaces only (NOT cluster-wide; source
+`stacks/kyverno/modules/kyverno/ghcr-credentials.tf`). Cred = Vault
+`secret/viktor/ghcr_pull_token` (an alias of the admin `github_pat` — GitHub
+has no token-mint API; swap the alias value if a scoped token is ever
+UI-minted).
 
-**Woodpecker-only (infra + large apps)**:
-- `travel_blog`: 5.7GB content directory exceeds GHA limits
-- Infra pipelines: require cluster access (terragrunt apply, certbot, build-cli)
+### Migrated apps (issues #13–#27)
 
-### Woodpecker Pipeline Files
+f1-stream, job-hunter, tuya_bridge, beadboard, nextcloud-todos,
+claude-agent-service, claude-memory-mcp, kms-website, Freedify,
+instagram-poster, payslip-ingest, broker-sync (image name `wealthfolio-sync`),
+fire-planner, recruiter-responder, x402-gateway — plus **tripit** (the original
+pilot, 2026-06-09). Earlier public-repo apps already on GHA (Website,
+k8s-portal, apple-health-data, audiblez-web, plotting-book, insta2spotify,
+audiobook-search, council-complaints) now also land on ghcr.
 
-Each project contains:
-- `.woodpecker/deploy.yml`: kubectl set image + Slack notification
-- `.woodpecker/build-fallback.yml`: Legacy full build pipeline (event: deployment, never auto-fires)
+### Infra-owned images (issues #29 / #30)
 
-### Woodpecker Repository IDs
+Images owned by the infra repo build on GHA workflows **in the infra repo's own
+`.github/workflows/`** (the github↔forgejo divergence was deliberately NOT
+reconciled — the workflows were added to the GitHub lineage via PR):
 
-Woodpecker API uses numeric IDs (not owner/name):
+| Image | Workflow | Destination |
+|-------|----------|-------------|
+| chrome-service-novnc | `build-chrome-service-novnc.yml` | public `ghcr.io/viktorbarzin/chrome-service-novnc` |
+| android-emulator | `build-android-emulator.yml` | public `ghcr.io/viktorbarzin/android-emulator` |
+| infra CLI | `build-cli.yml` | DockerHub `viktorbarzin/infra` (kept) + `ghcr.io/viktorbarzin/infra-cli` |
+| infra-ci | `build-infra-ci.yml` | private `ghcr.io/viktorbarzin/infra-ci` |
 
-| Repo | ID |
-|------|------|
-| infra | 1 |
-| Website | 2 |
-| finance | 3 |
-| health | 4 |
-| travel_blog | 5 |
-| webhook-handler | 6 |
-| audiblez-web | 9 |
-| plotting-book | 43 |
-| claude-memory-mcp | 78 |
-| infra-onboarding | 79 |
+**`infra-ci`** is the image the `.woodpecker/default.yml` apply step and
+`drift-detection.yml` run in (proven by pipelines 165/166). `chatterbox-tts` is
+already built by tripit's GHA → ghcr.
 
-### Image Registry Flow
+The Woodpecker `build-ci-image.yml` and `build-cli.yml` pipelines were
+**REMOVED**. Break-glass for infra-ci is now a manual
+`.woodpecker/breakglass-infra-ci.yml` (ghcr pull-and-save to the registry VM).
 
-1. **Containerd hosts.toml** redirects pulls from docker.io and ghcr.io to pull-through cache at `10.0.20.10`
-2. **Pull-through cache** serves cached images from LAN, fetches from upstream on cache miss
-3. **Kyverno ClusterPolicy** auto-syncs `registry-credentials` Secret to all namespaces for private registry access
-4. **Private registry** has been Forgejo's built-in OCI registry at `forgejo.viktorbarzin.me/viktor/<image>` since 2026-05-07. Auth via PAT (Vault `secret/ci/global/forgejo_push_token` for push, `secret/viktor/forgejo_pull_token` for pull). The pre-migration `registry:2.8.3`-based private registry on `registry.viktorbarzin.me:5050` was the root cause of three orphan-index incidents in three weeks (2026-04-13, 2026-04-19, 2026-05-04 — see `docs/post-mortems/2026-04-19-registry-orphan-index.md` and the full migration writeup at `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md`). The five pull-through caches on `10.0.20.10` (ports 5000/5010/5020/5030/5040) stay in place for upstream registries.
-5. **Integrity probe** (`registry-integrity-probe` CronJob in `monitoring` ns, every 15m) walks `/v2/_catalog` → tags → indexes → child manifests via HEAD and pushes `registry_manifest_integrity_failures` to Pushgateway; alerts `RegistryManifestIntegrityFailure` / `RegistryIntegrityProbeStale` / `RegistryCatalogInaccessible` page on broken state. Authoritative check (HTTP API, not filesystem).
+### Forgejo container registry — FROZEN
 
-### Infra Pipelines (Woodpecker-only)
+Issue #32 wiped all `viktor/*` container packages (~19G reclaimed, `/data`
+58%→20%). The registry is **break-glass-only** now; nothing pushes to it. The
+`forgejo-cleanup` CronJob stays in `DRY_RUN` (nothing to clean). Pull-through
+caches on the registry VM (`10.0.20.10`) are unchanged. See
+`docs/runbooks/forgejo-registry-breakglass.md`.
 
+### Image registry / pull path
+
+1. **Containerd `hosts.toml`** redirects pulls from docker.io and ghcr.io to the
+   pull-through cache at `10.0.20.10` (5000 = docker.io, 5010 = ghcr.io).
+2. **Pull-through cache** serves cached images from the LAN, fetches upstream on
+   a miss.
+3. **Kyverno ClusterPolicies** sync `ghcr-credentials` (private-ghcr allowlist)
+   and `registry-credentials` to namespaces.
+
+## Woodpecker — what it still runs
+
+Woodpecker is **deploy + cluster-touching steps only**:
+
 | Pipeline | File | Purpose |
 |----------|------|---------|
-| default | `.woodpecker/default.yml` | Terragrunt apply on push |
-| renew-tls | `.woodpecker/renew-tls.yml` | Certbot renewal cron |
-| build-cli | `.woodpecker/build-cli.yml` | Build and push to dual registries |
-| build-ci-image | `.woodpecker/build-ci-image.yml` | Build `infra-ci` tooling image (triggered by `ci/Dockerfile` change or manual); post-push HEADs every blob via `verify-integrity` step to catch orphan-index pushes |
-| k8s-portal | `.woodpecker/k8s-portal.yml` | Path-filtered build for k8s-portal subdirectory |
-| registry-config-sync | `.woodpecker/registry-config-sync.yml` | SCP `modules/docker-registry/*` to `/opt/registry/` on `10.0.20.10` when any managed file changes; bounces containers + nginx per `docs/runbooks/registry-vm.md` |
-| pve-nfs-exports-sync | `.woodpecker/pve-nfs-exports-sync.yml` | Sync `scripts/pve-nfs-exports` → `/etc/exports` on PVE host |
-| postmortem-todos | `.woodpecker/postmortem-todos.yml` | Auto-resolve safe TODOs from new `docs/post-mortems/*.md` via headless Claude agent |
-| drift-detection | `.woodpecker/drift-detection.yml` | Nightly Terraform drift detection |
-| issue-automation | `.woodpecker/issue-automation.yml` | Triage + respond to `ViktorBarzin/infra` GitHub issues |
+| per-app deploy | `.woodpecker/deploy.yml` (each repo) | `kubectl set image` + Slack notify (event: **manual**) |
+| terragrunt apply | `.woodpecker/default.yml` | Changed-stacks apply on push to master (runs in `infra-ci`) |
+| certbot | `.woodpecker/renew-tls.yml` | TLS renewal cron |
+| drift-detection | `.woodpecker/drift-detection.yml` | Nightly Terraform drift (runs in `infra-ci`) |
+| provision-user | `.woodpecker/provision-user.yml` | Add namespace-owner user from Vault spec |
+| registry-config-sync | `.woodpecker/registry-config-sync.yml` | SCP `modules/docker-registry/*` → `10.0.20.10` on change |
+| pve-nfs-exports-sync | `.woodpecker/pve-nfs-exports-sync.yml` | Sync `scripts/pve-nfs-exports` → `/etc/exports` on PVE |
+| issue-automation | `.woodpecker/issue-automation.yml` | Triage + respond to `ViktorBarzin/infra` GitHub issues |
+| postmortem-todos | `.woodpecker/postmortem-todos.yml` | Auto-resolve safe TODOs from new post-mortems |
+| k8s-portal | `.woodpecker/k8s-portal.yml` | Path-filtered deploy for the portal |
+| breakglass-infra-ci | `.woodpecker/breakglass-infra-ci.yml` | **Manual** ghcr pull-and-save of infra-ci to the registry VM |
 
+**No build/test pipeline exists on any repo.** Do not (re)introduce one.
+
+### Woodpecker API
+
+Uses **numeric repo IDs** (`/api/repos/<id>/pipelines`), NOT owner/name paths
+(those return HTML). The deploy registration for each app is the **GitHub
+mirror** repo (registered github-forge). IDs are stable across renames and must
+be looked up from the Woodpecker UI/DB.
+
+### Woodpecker YAML gotchas
+
+- Commands with `${VAR}:${VAR}` must be **quoted** — an unquoted `:` triggers
+  YAML map parsing when the vars are empty.
+- Use `bitnami/kubectl:latest` (not pinned versions — entrypoint compatibility).
+- Global secrets must include `manual` in their events list for API-triggered
+  pipelines.
+
+### GitHub repo secrets
+
+Per repo: `WOODPECKER_TOKEN` (POST the deploy pipeline), `FORGEJO_GIT_TOKEN`
+(write:repository PAT for the `svu` tag push). ghcr push uses the workflow's
+built-in `GITHUB_TOKEN` (`packages: write`).
+
+## Infra repo CI topology
+
+The infra repo runs on Woodpecker via **two** forge registrations: the Forgejo
+forge (repo id 82, registered 2026-06-08) and the legacy GitHub forge (repo id
+1). Pushes to **Forgejo** `master` fire `.woodpecker/default.yml`
+(changed-stacks terragrunt apply, in `infra-ci`) plus the `notify-nonadmin-push`
+Slack audit step. Operational facts (2026-06-10):
+
+- **Webhook URL is the IN-CLUSTER service**:
+  `http://woodpecker-server.woodpecker.svc.cluster.local/api/hook?...` (PATCHed
+  via the Forgejo API). The Woodpecker default (`https://ci.viktorbarzin.me/...`)
+  resolves to the non-proxied public A record from pods → NAT hairpin →
+  intermittent `context deadline exceeded`, silently dropping push events. If
+  Woodpecker "repairs" the repo it rewrites the hook back to `ci.viktorbarzin.me`
+  — re-apply the in-cluster URL.
+- **Repo-scoped secrets must exist on BOTH repos**: pipelines reference
+  repo-level secrets (`registry_ssh_key`, `pve_ssh_key`, `CLOUDFLARE_TOKEN`, …).
+  When registering a new forge repo for infra, clone the secret set too.
+- **Empty commits defeat path filters**: a commit with no changed files makes
+  Woodpecker include ALL workflow files (path conditions can't exclude), so every
+  repo secret must resolve. Normal commits with real files only compile the
+  matching workflows.
+
+The Forgejo trigger is not fully dependable — land infra changes by pushing
+Forgejo master (as viktor), use `[ci skip]` for docs/no-op commits, and verify
+deploys via `scripts/tg` + live cluster state rather than trusting the CI
+checkmark. The two remotes have **diverged** (parallel histories under
+different SHAs); expect github pushes to reject non-fast-forward and leave them
+— never force-push.
+
 ## Configuration
 
-### GitHub Actions
-
-**File**: `.github/workflows/build-and-deploy.yml`
+### GitHub Actions (per-app `.github/workflows/build.yml`)
 
 ```yaml
-name: Build and Deploy
+name: build
 on:
   push:
-    branches: [main, master]
+    branches: [master]
 jobs:
   build:
     runs-on: ubuntu-latest
+    permissions:
+      contents: write # svu tag push
+      packages: write # ghcr push
     steps:
-      - name: Build Docker image
-        run: docker build --platform linux/amd64 -t viktorbarzin/app:${SHORT_SHA} .
-      - name: Push to DockerHub
-        run: docker push viktorbarzin/app:${SHORT_SHA}
-      - name: Trigger Woodpecker Deploy
+      - uses: actions/checkout@v4
+      - name: lint + test
+        run: make lint test
+      - name: svu tag -> Forgejo
         run: |
-          curl -X POST https://ci.viktorbarzin.me/api/repos/<REPO_ID>/pipelines \
-            -H "Authorization: Bearer ${{ secrets.WOODPECKER_TOKEN }}"
+          VERSION=$(svu next)
+          # ... push tag to canonical Forgejo with FORGEJO_GIT_TOKEN
+      - uses: docker/setup-buildx-action@v3
+      - uses: docker/build-push-action@v6
+        with:
+          platforms: linux/amd64
+          provenance: false
+          push: true
+          tags: |
+            ghcr.io/viktorbarzin/<name>:${{ github.sha }}
+            ghcr.io/viktorbarzin/<name>:latest
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    steps:
+      - name: Trigger Woodpecker deploy
+        run: |
+          curl -X POST https://ci.viktorbarzin.me/api/repos/<DEPLOY_REPO_ID>/pipelines \
+            -H "Authorization: Bearer ${{ secrets.WOODPECKER_TOKEN }}" \
+            -d '{"branch":"master","variables":{"IMAGE_TAG":"...","IMAGE_NAME":"..."}}'
 ```
 
-**Required GitHub Secrets**:
-- `DOCKERHUB_USERNAME`
-- `DOCKERHUB_TOKEN`
-- `WOODPECKER_TOKEN`
-
-### Woodpecker Deploy Pipeline
-
-**File**: `.woodpecker/deploy.yml`
+### Woodpecker deploy pipeline (per-app `.woodpecker/deploy.yml`)
 
 ```yaml
 when:
-  event: [deployment]
+  event: manual
 
 steps:
   deploy:
-    image: bitnami/kubectl:latest
+    image: bitnami/kubectl:latest # uses the in-cluster woodpecker-agent SA (cluster-admin)
     commands:
-      - kubectl set image deployment/app app=viktorbarzin/app:${CI_COMMIT_SHA:0:8}
-    secrets: [k8s_token]
-
+      - "kubectl set image deployment/app app=${IMAGE_NAME}:${IMAGE_TAG} -n <ns>"
+      - "kubectl rollout status deployment/app -n <ns> --timeout=300s"
   notify:
     image: plugins/slack
     settings:
       webhook: ${SLACK_WEBHOOK}
     when:
       status: [success, failure]
 ```
 
-**YAML Gotchas**:
-- Commands with `${VAR}:${VAR}` syntax must be quoted to prevent YAML map parsing when vars are empty
-- Use `bitnami/kubectl:latest` (not pinned versions)
-- Global secrets must be manually added to `secrets:` list in pipeline
+### CI/CD secrets sync
 
-### Vault Configuration
-
-**K8s Auth for Woodpecker**:
-- Woodpecker pipelines authenticate using ServiceAccount JWT
-- Vault K8s auth mount validates JWT and issues token
-- Policies grant access to secrets and dynamic credentials
-
-### CI/CD Secrets Sync
-
-**CronJob**: Pushes `secret/ci/global` from Vault → Woodpecker API every 6 hours
-- Keeps Woodpecker global secrets in sync with Vault
-- Runs in `woodpecker` namespace
-
-## Infra repo CI (Woodpecker repo 82 — Forgejo forge)
-
-The infra repo itself runs on Woodpecker via the **Forgejo** forge (repo id 82,
-registered 2026-06-08; the GitHub-side repo id 1 also remains registered).
-Pushes to `master` fire `.woodpecker/default.yml` (changed-stacks terragrunt
-apply) plus the `notify-nonadmin-push` Slack audit step (allow-then-audit
-contribution model — see `multi-tenancy.md`). Operational facts (2026-06-10):
-
-- **Webhook URL is the IN-CLUSTER service**: `http://woodpecker-server.woodpecker.svc.cluster.local/api/hook?...`
-  (PATCHed via the Forgejo API). The Woodpecker-generated default
-  (`https://ci.viktorbarzin.me/...`) resolves to the non-proxied public A
-  record from pods → NAT hairpin → intermittent `context deadline exceeded`,
-  silently dropping push events (found when a push produced no pipeline).
-  If Woodpecker ever "repairs" the repo it will rewrite the hook back to
-  `ci.viktorbarzin.me` — re-apply the in-cluster URL (or pin `ci.viktorbarzin.me`
-  in the CoreDNS pod carve-out alongside forgejo).
-- **Repo-scoped secrets must exist on BOTH repos**: pipelines reference
-  repo-level secrets (`registry_ssh_key`, `pve_ssh_key`, `CLOUDFLARE_TOKEN`,
-  …). Repo 82 was registered without them and every all-workflow compile
-  errored with `secret "registry_ssh_key" not found`. Fixed by cloning repo-1
-  rows to repo 82 in the Woodpecker DB (`insert into secrets … select … where
-  repo_id=1`). When registering a new forge repo for infra, clone the secret
-  set too.
-- **Empty commits defeat path filters**: a commit with no changed files makes
-  Woodpecker include ALL workflow files (path conditions can't exclude), so
-  every repo secret must resolve. Normal commits with real files only compile
-  the matching workflows.
+A CronJob in the `woodpecker` namespace pushes `secret/ci/global` from Vault →
+the Woodpecker API every 6h, keeping global secrets in sync. Woodpecker deploy
+pipelines authenticate to the cluster via the in-cluster `woodpecker-agent` SA
+(cluster-admin); Vault K8s auth backs any secret reads.
 
## Decisions & Rationale
|
||||
|
||||
### Why GitHub Actions + Woodpecker?
|
||||
### Why all builds off-infra (ADR-0002)?
|
||||
|
||||
**Alternatives considered**:
|
||||
1. **Woodpecker-only**: Simple, but wastes cluster resources on builds
|
||||
2. **GHA-only**: No cluster access, requires kubectl from outside (security risk)
|
||||
3. **Hybrid (chosen)**: GHA for compute-heavy builds (free), Woodpecker for privileged deployments (secure cluster access)
|
||||
- **Breaks the circular dependency** — the images needed to repair the cluster
|
||||
no longer live inside it (they're on ghcr, an external registry).
|
||||
- **Removes build IO + registry push load** from the contended homelab spindle.
|
||||
- GHA is free on public repos and generous on private; buildx provenance:false
|
||||
sidesteps the orphaned-index-children failure class that plagued the
|
||||
in-cluster registry.
|
||||
- **Clean cut** — no in-cluster fallback builds anywhere; one pattern,
|
||||
fleet-wide.
|
||||
|
||||
**Benefits**:
|
||||
- Free compute for builds on public repos
|
||||
- Cluster access stays internal (Woodpecker has direct K8s access)
|
||||
- Separation of concerns: build vs deploy
|
||||
### Why ghcr (not push back to Forgejo)?
|
||||
|
||||
### Why 8-Character SHA Tags (Not :latest)?
|
||||
Forgejo's container registry repeatedly orphaned OCI index children
|
||||
(2026-04-13/19, 2026-05-04, 2026-06-10) and its retention is not container-aware.
|
||||
ghcr is external (DR-safe), free for this scale, and has native multi-arch
|
||||
handling. The Forgejo registry was frozen + emptied (issue #32).
|
||||
|
||||
- Pull-through cache serves stale `:latest` tags indefinitely
|
||||
- SHA tags ensure every deployment pulls the correct image
|
||||
- 8 characters provide sufficient collision resistance (16^8 = 4.3 billion combinations)
|
||||
-### Why Numeric Repo IDs for Woodpecker API?
-
-- Woodpecker API requires numeric IDs (not owner/name slugs)
-- IDs are stable across repo renames
-- Must be manually looked up from Woodpecker UI or database
+### Why Woodpecker stays for deploy?
+
+`kubectl set image` needs in-cluster privileged access; doing it from GHA would
+mean exposing kube-apiserver or a long-lived kubeconfig. Woodpecker's
+`woodpecker-agent` SA is already cluster-admin in-cluster — the deploy step
+needs no credentials.

-### Why linux/amd64 Only?
-
-- Cluster runs on x86_64 nodes only
-- ARM builds would waste time and storage
-- Multi-arch images add complexity without benefit
+### Why `event: manual` on deploy.yml?
+
+The Forgejo→GitHub push-mirror sends raw, tag-less pushes to the GitHub mirror.
+If `deploy.yml` fired on `push`, every mirror sync would trigger a deploy with no
+image tag. `manual` means only the GHA `deploy` job's explicit API POST (with
+`IMAGE_TAG`) deploys.

+### Why linux/amd64 only?
+
+The cluster runs on x86_64 nodes only; ARM builds waste time and storage.
+
 ## Troubleshooting

-### GHA Build Fails: "denied: requested access to the resource is denied"
+### GHA build fails: ghcr push "denied"

-**Cause**: DockerHub credentials expired or incorrect
+The workflow `GITHUB_TOKEN` needs `packages: write` permission and the package
+must allow the repo to push. Check the workflow `permissions:` block and the
+package's "Manage Actions access" settings.
+
+### Image pull fails: "ErrImagePull" / "ImagePullBackOff"

-**Fix**:
 ```bash
-# Regenerate DockerHub token
-# Update GitHub repo secrets: DOCKERHUB_USERNAME, DOCKERHUB_TOKEN
+# Public image — check the pull-through cache is up
+curl http://10.0.20.10:5010/v2/_catalog
+
+# Private image — verify the ghcr-credentials Secret exists in the namespace
+kubectl get secret ghcr-credentials -n <namespace>
+# It's Kyverno-synced to an allowlist; if missing, the namespace isn't on the
+# allowlist in stacks/kyverno/modules/kyverno/ghcr-credentials.tf
 ```

-### Woodpecker Deploy Fails: "Unauthorized"
+If the cause is the internal-DNS hairpin (fresh pulls timing out on the public
+Forgejo path), see the CoreDNS `viktorbarzin.me` carve-out in
+`docs/architecture/networking.md` and `docs/runbooks/registry-vm.md`.

-**Cause**: Vault K8s auth token expired or invalid
+### Deploy didn't happen after a push

-**Fix**:
-```bash
-# Restart Woodpecker pipeline (token auto-renewed)
-# Check Vault K8s auth role exists: vault read auth/kubernetes/role/woodpecker-deployer
-```
+Confirm the push was to **master** (feature branches build/deploy nothing).
+Check the GHA run completed the `deploy` job, then check Woodpecker received the
+manual pipeline (`ci.viktorbarzin.me`, the GitHub-mirror deploy repo). Verify
+live with `kubectl rollout status` — not the CI checkmark.

-### Image Pull Fails: "ErrImagePull"
+### Woodpecker deploy fails: "YAML: did not find expected key"

-**Cause**: Pull-through cache or registry credentials issue
-
-**Fix**:
-```bash
-# Check pull-through cache is running
-curl http://10.0.20.10:5000/v2/_catalog
-
-# Verify registry-credentials Secret exists in namespace
-kubectl get secret registry-credentials -n <namespace>
-
-# Manually sync credentials if missing
-kubectl get secret registry-credentials -n default -o yaml | \
-  sed 's/namespace: default/namespace: <namespace>/' | kubectl apply -f -
-```
-
-### Woodpecker Pipeline: "YAML: did not find expected key"
-
-**Cause**: Unquoted command with `${VAR}:${VAR}` syntax when VAR is empty
-
-**Fix**: Quote the command:
-```yaml
-commands:
-  - "kubectl set image deployment/app app=viktorbarzin/app:${SHORT_SHA}"
-```
-
-### travel_blog Build Times Out on GHA
-
-**Cause**: 5.7GB content directory exceeds GHA disk/time limits
-
-**Fix**: Keep on Woodpecker (no migration). Build uses cluster storage and resources.
-
-### CI/CD Secrets Out of Sync
-
-**Cause**: CronJob failed to sync Vault → Woodpecker
-
-**Fix**:
-```bash
-# Check CronJob status
-kubectl get cronjob -n woodpecker
-
-# Manually trigger sync
-kubectl create job --from=cronjob/sync-secrets manual-sync -n woodpecker
-```
+Unquoted command with `${VAR}:${VAR}` syntax when a VAR is empty. Quote the
+command (see the deploy.yml example above).

 ## Related

-- [Databases Architecture](./databases.md) — Database credentials via Vault
-- [Multi-Tenancy](./multi-tenancy.md) — Per-user Woodpecker access
-- Runbook: `../runbooks/deploy-new-app.md` — How to set up CI/CD for a new app
-- Runbook: `../runbooks/troubleshoot-image-pull.md` — Debug image pull issues
-- Vault documentation: K8s auth configuration
-- Woodpecker documentation: API reference
+- ADR: `../adr/0002-all-image-builds-off-infra-gha-ghcr.md` — the decision
+- [Databases Architecture](./databases.md) — database credentials via Vault
+- [Multi-Tenancy](./multi-tenancy.md) — per-user Woodpecker access
+- Runbook: `../runbooks/forgejo-registry-breakglass.md` — using the frozen registry
+- Runbook: `../runbooks/registry-vm.md` — pull-through cache VM + image-pull debugging
+- Onboarding tool: `../../scripts/offinfra-onboard` + `../../scripts/offinfra-templates/`
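The `event: manual` rationale and the numeric-repo-ID note it replaces describe the same hand-off. The GHA `deploy` job POSTs to Woodpecker with an `IMAGE_TAG` and addresses the repo by numeric ID. The shell below is a minimal sketch of that hand-off, not the workflow the repo actually ships. It assumes Woodpecker's manual-pipeline endpoint (`POST /api/repos/<repo-id>/pipelines` with a `branch` plus `variables` JSON body, as in recent Woodpecker releases). It also assumes two hypothetical secrets, `WOODPECKER_TOKEN` and `WOODPECKER_REPO_ID`.

```bash
# Sketch of the GHA -> Woodpecker deploy hand-off. The endpoint, payload shape
# and the WOODPECKER_TOKEN / WOODPECKER_REPO_ID names are assumptions.
set -eu

WOODPECKER_URL="https://ci.viktorbarzin.me"

# Cut an exact 8-character image tag from a full 40-hex commit SHA
# (for example $GITHUB_SHA). Slicing never returns a longer prefix.
short_tag() {
  local sha="$1"
  case $sha in
    '' | *[!0-9a-f]*) echo "not a hex commit SHA: '$sha'" >&2; return 1 ;;
  esac
  if [ "${#sha}" -ne 40 ]; then
    echo "expected 40 characters, got ${#sha}" >&2
    return 1
  fi
  printf '%s\n' "$sha" | cut -c1-8
}

# JSON body for a manual pipeline: the branch plus the IMAGE_TAG variable that
# deploy.yml reads. Callers pass a hex tag from short_tag, so no JSON escaping.
manual_pipeline_payload() {
  printf '{"branch":"%s","variables":{"IMAGE_TAG":"%s"}}\n' "$1" "$2"
}

# Trigger the manual pipeline; the repo is addressed by numeric ID,
# not by an owner/name slug.
trigger_deploy() {
  local tag
  tag=$(short_tag "$1")
  curl -fsS -X POST \
    -H "Authorization: Bearer ${WOODPECKER_TOKEN:?set WOODPECKER_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(manual_pipeline_payload master "$tag")" \
    "$WOODPECKER_URL/api/repos/${WOODPECKER_REPO_ID:?set WOODPECKER_REPO_ID}/pipelines"
}
```

In a workflow step this would run as `trigger_deploy "$GITHUB_SHA"`. The tag is cut by slicing rather than with `git rev-parse --short=8` because git returns a longer prefix when 8 characters are ambiguous, which would break the fixed 8-character tag convention.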

@@ -6,5 +6,5 @@ variable "tls_secret_name" {
 variable "image_tag" {
   type        = string
   default     = "latest"
-  description = "android-emulator image tag at forgejo.viktorbarzin.me/viktor/android-emulator. Built by GHA (.github/workflows/build-android-emulator.yml) -> ghcr.io/viktorbarzin/android-emulator on changes to stacks/android-emulator/docker/ (ADR-0002). :latest tracks the newest build."
+  description = "android-emulator image tag at ghcr.io/viktorbarzin/android-emulator. Built by GHA (.github/workflows/build-android-emulator.yml) on changes to stacks/android-emulator/docker/ (ADR-0002). :latest tracks the newest build."
 }
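The android-emulator variable defaults to `:latest`. The SHA-tag rationale removed in this diff notes that the pull-through cache can keep serving a stale `:latest` indefinitely. One way to check for that is to compare the manifest digest ghcr reports for a tag with the one the cache returns. This is a sketch under several assumptions. First, the package must be public, so that an anonymous pull token from `https://ghcr.io/token` works. Second, the cache at `10.0.20.10:5010` (the address in the troubleshooting notes) must proxy ghcr under the same repository paths.

```bash
# Sketch: detect a stale tag in the pull-through cache by comparing manifest
# digests. Assumes a public ghcr package, anonymous ghcr pull tokens, and that
# the cache at CACHE mirrors ghcr repository paths.
set -eu

CACHE="http://10.0.20.10:5010"
# Ask for an index / manifest list first so multi-arch tags report the index digest.
ACCEPT="application/vnd.oci.image.index.v1+json, application/vnd.docker.distribution.manifest.list.v2+json, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v2+json"

# Print the Docker-Content-Digest value from raw HTTP response headers.
digest_from_headers() {
  tr -d '\r' | awk 'tolower($1) == "docker-content-digest:" { print $2 }'
}

# Anonymous pull token for a public ghcr repository, e.g. viktorbarzin/android-emulator.
ghcr_token() {
  curl -fsS "https://ghcr.io/token?scope=repository:$1:pull" |
    sed -n 's/.*"token":"\([^"]*\)".*/\1/p'
}

# Digest ghcr serves for REPO:TAG (HEAD request, headers only).
ghcr_digest() {
  local token
  token=$(ghcr_token "$1")
  curl -fsSI -H "Accept: $ACCEPT" -H "Authorization: Bearer $token" \
    "https://ghcr.io/v2/$1/manifests/$2" | digest_from_headers
}

# Digest the pull-through cache serves for the same REPO:TAG.
cache_digest() {
  curl -fsSI -H "Accept: $ACCEPT" "$CACHE/v2/$1/manifests/$2" | digest_from_headers
}
```

For example: `[ "$(ghcr_digest viktorbarzin/android-emulator latest)" = "$(cache_digest viktorbarzin/android-emulator latest)" ] || echo "cache :latest is stale"`. A private package would need real credentials in place of the anonymous token.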

@@ -225,8 +225,11 @@ module "ingress_ro" {
 # https://forgejo.viktorbarzin.me/viktor/terminal-lobby
 #
 # That repo's ./scripts/deploy.sh ships everything to wizard@10.0.10.10
-# and restarts ttyd / ttyd-ro / tmux-api / clipboard-upload. This stack
-# only owns the Kubernetes side: Services, Endpoints pointing at
+# and restarts ttyd / ttyd-ro / tmux-api / clipboard-upload. Deploy is
+# MANUAL via that script — there is no CI pipeline (the lobby's
+# .woodpecker.yml was removed under ADR-0002, issue #31; it builds no
+# image, so it is not part of the GHA->ghcr fleet). This stack only owns
+# the Kubernetes side: Services, Endpoints pointing at
 # 10.0.10.10:{7681,7682,7683,7684}, the IngressRoutes, and the Traefik
 # middlewares that gate everything behind Authentik forward-auth.
 #

@@ -6,5 +6,5 @@ variable "tls_secret_name" {
 variable "image_tag" {
   type        = string
   default     = "latest"
-  description = "tuya_bridge image tag pushed to forgejo.viktorbarzin.me/viktor/tuya_bridge. Each Woodpecker run does `kubectl set image` to the 8-char git SHA; this variable is only used on initial create / TF recreate (image is in lifecycle.ignore_changes)."
+  description = "tuya_bridge image tag at ghcr.io/viktorbarzin/tuya_bridge (built by GHA, ADR-0002). The GHA deploy job drives a Woodpecker `kubectl set image` to the 8-char git SHA; this variable is only used on initial create / TF recreate (image is in lifecycle.ignore_changes)."
 }
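The tuya_bridge description places the Woodpecker side of the deploy at a `kubectl set image` to the 8-char SHA. The troubleshooting notes earlier in this diff trace "did not find expected key" to an unquoted command with an empty variable. The sketch below shows what a deploy step's shell could do: refuse an empty tag, set the image, then wait on `kubectl rollout status`, which the docs recommend checking instead of the CI checkmark. The namespace, Deployment, and container names are hypothetical. Only the `ghcr.io/viktorbarzin/<name>` prefix and the `IMAGE_TAG` variable come from the source.

```bash
# Sketch of a deploy step's shell. Namespace, Deployment and container names
# are hypothetical placeholders; IMAGE_TAG is the variable the GHA job sends.
set -eu

GHCR_PREFIX="ghcr.io/viktorbarzin"

# Full image reference for NAME:TAG. An empty tag would yield "NAME:", the
# same empty-variable case that breaks unquoted YAML commands, so reject it.
image_ref() {
  if [ -z "$2" ]; then
    echo "IMAGE_TAG is empty; refusing to deploy $1" >&2
    return 1
  fi
  printf '%s/%s:%s\n' "$GHCR_PREFIX" "$1" "$2"
}

# Point the container at the new image and block until the rollout finishes.
deploy() {
  local ns="$1" deployment="$2" container="$3" image="$4" tag="$5" ref
  ref=$(image_ref "$image" "$tag")
  kubectl -n "$ns" set image "deployment/$deployment" "$container=$ref"
  kubectl -n "$ns" rollout status "deployment/$deployment" --timeout=5m
}
```

A call might look like `deploy tuya-bridge tuya-bridge tuya-bridge tuya_bridge "$IMAGE_TAG"`, with all three Kubernetes names as placeholders. The Deployment and container names are passed separately from the image name because Kubernetes object names cannot contain the underscore used in `tuya_bridge`.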