[forgejo] Phase 0 of registry consolidation: prepare Forgejo OCI registry

Stage 1 of moving private images off the registry:2 container at
registry.viktorbarzin.me:5050 (which has hit distribution#3324 corruption
3x in 3 weeks) onto Forgejo's built-in OCI registry. No cutover risk —
pods still pull from the existing registry until Phase 3.

What changes:
* Forgejo deployment: memory 384Mi→1Gi, PVC 5Gi→15Gi (cap 50Gi).
  Explicit FORGEJO__packages__ENABLED + CHUNKED_UPLOAD_PATH (defensive,
  v11 default-on).
* ingress_factory: max_body_size variable was declared but never wired
  in after the nginx→Traefik migration. Now creates a per-ingress
  Buffering middleware when set; default null = no limit (preserves
  existing behavior). Forgejo ingress sets max_body_size=5g to allow
  multi-GB layer pushes.
* Cluster-wide registry-credentials Secret: 4th auths entry for
  forgejo.viktorbarzin.me, populated from Vault secret/viktor/
  forgejo_pull_token (cluster-puller PAT, read:package). Existing
  Kyverno ClusterPolicy syncs cluster-wide — no policy edits.
* Containerd hosts.toml redirect: forgejo.viktorbarzin.me → in-cluster
  Traefik LB 10.0.20.200 (avoids hairpin NAT for in-cluster pulls).
  Cloud-init for new VMs + scripts/setup-forgejo-containerd-mirror.sh
  for existing nodes.
* Forgejo retention CronJob (0 4 * * *): keeps newest 10 versions per
  package + always :latest. First 7 days dry-run (DRY_RUN=true);
  flip the local in cleanup.tf after log review.
* Forgejo integrity probe CronJob (*/15): same algorithm as the
  existing registry-integrity-probe. Existing Prometheus alerts
  (RegistryManifestIntegrityFailure et al) made instance-aware so
  they cover both registries during the bake.
* Docs: design+plan in docs/plans/, setup runbook in docs/runbooks/.

Operational note — the apply order is non-trivial because the new
Vault keys (forgejo_pull_token, forgejo_cleanup_token,
secret/ci/global/forgejo_*) must exist BEFORE terragrunt apply in the
kyverno + monitoring + forgejo stacks. The setup runbook documents
the bootstrap sequence.

Phase 1 (per-project dual-push pipelines) follows in subsequent
commits. Bake clock starts when the last project goes dual-push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-07 15:51:34 +00:00
parent b1c21f78b9
commit 5d22b449f9
13 changed files with 1072 additions and 10 deletions


@ -0,0 +1,195 @@
# Forgejo Registry Consolidation — Design
**Date**: 2026-05-07
**Status**: Approved
## Problem
`registry-private` (the `registry:2` container on the docker-registry
VM at `10.0.20.10`) has hit `distribution#3324` corruption three
times in three weeks (2026-04-13, 2026-04-19, 2026-05-04). Each
incident required manual blob recovery and another round of
hardening to `cleanup-tags.sh` and the GC procedure. The integrity
probe catches it within 15 minutes now, but every hit still costs
~1h of cleanup, and we keep tightening the same loose screw.
Root cause is a known race in `distribution`: tag deletes that race
with concurrent garbage collection produce orphan OCI-index children.
Upstream has not patched it; our mitigations (probe, blob
fix-up script, idempotent cleanup) reduce blast radius but don't
remove the failure mode.
Forgejo (deployed for OAuth and personal repos at
`forgejo.viktorbarzin.me`) ships a built-in OCI registry as part of
the Packages feature, default-on in v11. Using it removes
`distribution`-the-engine from the path entirely, replaces it with
Forgejo's own implementation backed by Forgejo's DB+blob store, and
gets us source hosting + image hosting in one resource.
The PVE host RAM upgrade from 142GB to 272GB (memory id=569) means
the cluster can absorb the resource bump Forgejo needs for the
registry workload (384Mi → 1Gi).
## Decision
Move every image currently on `registry.viktorbarzin.me:5050` to
Forgejo's OCI registry at `forgejo.viktorbarzin.me`. Decommission
`registry-private` after a 14-day dual-push bake.
Pull-through caches for upstream registries (DockerHub, GHCR, Quay,
k8s.gcr, Kyverno) stay on the registry VM permanently — Forgejo
won't serve as a pull-through, so the chicken-and-egg of "Forgejo
pulling its own image through itself" never arises.
## Design
### Registry hostname
Image references become `forgejo.viktorbarzin.me/viktor/<image>:<tag>`.
The `viktor/` prefix is the Forgejo owner namespace; all current
private images ship under that single owner.
### Auth
Two service-account users:
| User | Scope | Vault key | Used by |
|---|---|---|---|
| `cluster-puller` | `read:package` | `secret/viktor/forgejo_pull_token` | cluster-wide `registry-credentials` Secret, monitoring probe |
| `ci-pusher` | `write:package` | `secret/ci/global/forgejo_push_token` | Woodpecker pipelines (synced via `vault-woodpecker-sync` CronJob) |
A third PAT (`secret/viktor/forgejo_cleanup_token`, also belongs to
`ci-pusher`) drives the retention CronJob — kept separate from the
push PAT so a leaked CI token doesn't immediately enable mass deletes.
PATs have no expiry. Rotation policy: regenerate via Forgejo Web UI
and `vault kv patch` if a leak is suspected; ESO/sync downstream is
automatic.
### Cluster pull path
`registry-credentials` is a single Secret in `kyverno` ns, cloned
into every namespace by the existing
`sync-registry-credentials` ClusterPolicy. We extend its
`dockerconfigjson` `auths` map with a fourth entry for
`forgejo.viktorbarzin.me`. **No new Secret, no new ClusterPolicy,
no `imagePullSecrets =` line edits across stacks.**
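As a rough illustration of what the fourth `auths` entry contains, the sketch below renders one entry in the standard Docker config shape, where `auth` is the base64 of `user:token`. The real Secret is built in Terraform; the function name `docker_auth_entry` and the placeholder token are assumptions made for this example only.

```bash
#!/usr/bin/env bash
# Sketch only: the function name and the token are hypothetical. The real
# Secret is generated by registry-credentials.tf, not by this script.

# docker_auth_entry REGISTRY USER TOKEN
# Prints one "auths" map entry: "<registry>": {"auth": "<base64(user:token)>"}
docker_auth_entry() {
  local registry=$1 user=$2 token=$3 auth
  # Strip the newlines some base64 implementations add when wrapping output.
  auth=$(printf '%s:%s' "$user" "$token" | base64 | tr -d '\n')
  printf '"%s": {"auth": "%s"}\n' "$registry" "$auth"
}

# Example with a placeholder token (not a real credential).
docker_auth_entry forgejo.viktorbarzin.me cluster-puller example-token
```

Decoding the `auth` value of a cloned Secret with `base64 -d` is a quick way to confirm which user a namespace will pull as.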
Containerd `hosts.toml` redirects `forgejo.viktorbarzin.me` → in-cluster
Traefik LB at `10.0.20.200`, the same pattern used for
`registry.viktorbarzin.me` → `10.0.20.10:5050`. Avoids hairpin NAT
through the WAN gateway for in-cluster pulls.
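A minimal sketch of what the redirect file could look like, written as a shell function that prints a containerd `hosts.toml`. The function name `render_hosts_toml`, the `https` scheme for the mirror, and the target path in the comment are assumptions. TLS handling for a mirror addressed by IP is not covered by the design text, so it is left out here.

```bash
#!/usr/bin/env bash
# Sketch only: render_hosts_toml is a hypothetical helper, not the contents
# of setup-forgejo-containerd-mirror.sh.

# render_hosts_toml REGISTRY_HOST MIRROR
# Prints a hosts.toml that sends pulls for REGISTRY_HOST to MIRROR.
render_hosts_toml() {
  local registry_host=$1 mirror=$2
  cat <<EOF
server = "https://${registry_host}"

[host."https://${mirror}"]
  capabilities = ["pull", "resolve"]
EOF
}

# Assumed destination (depends on containerd's config_path setting):
#   /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
render_hosts_toml forgejo.viktorbarzin.me 10.0.20.200
```

Only `pull` and `resolve` are listed, so pushes from a node would still go to the public hostname.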
### Push path
Woodpecker pipelines push to BOTH targets during the bake:
```yaml
- name: build-and-push
  image: woodpeckerci/plugin-docker-buildx
  settings:
    repo:
      - registry.viktorbarzin.me/<name>
      - forgejo.viktorbarzin.me/viktor/<name>
    logins:
      - registry: registry.viktorbarzin.me
        username:
          from_secret: registry_user
        password:
          from_secret: registry_password
      - registry: forgejo.viktorbarzin.me
        username:
          from_secret: forgejo_user
        password:
          from_secret: forgejo_push_token
```
The `vault-woodpecker-sync` CronJob (every 6h) propagates
`secret/ci/global` keys to every Woodpecker repo as global secrets.
### Retention
Forgejo's per-package "Cleanup Rules" UI is per-user runtime DB
state, not Terraform-driven. Retention runs as a CronJob in the
`forgejo` namespace, schedule `0 4 * * *`, that:
1. Lists all container packages under the `viktor` owner.
2. Groups by package name.
3. Keeps newest 10 versions + always keeps `latest`.
4. DELETEs the rest via `/api/v1/packages/{owner}/{type}/{name}/{version}`.
First 7 days run with `DRY_RUN=true` — script logs what it would
delete but issues no DELETE calls. After log review, flip the
`forgejo_cleanup_dry_run` local in `cleanup.tf` to false.
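The selection and dry-run logic described above can be sketched as follows. This is not the contents of `cleanup.sh`. It assumes the listing step has already produced lines of `<package> <version> <created_at>` with ISO-8601 UTC timestamps, that `latest` does not consume one of the 10 slots, and that the token lives in a hypothetical `FORGEJO_TOKEN` variable.

```bash
#!/usr/bin/env bash
# Sketch only: function names, FORGEJO_TOKEN and the input format are
# assumptions made for illustration.
DRY_RUN=${DRY_RUN:-true}

# select_deletions [KEEP]
# stdin:  "<package> <version> <created_at>" per line, single spaces.
# stdout: "<package> <version>" for versions outside the newest KEEP.
select_deletions() {
  local keep=${1:-10}
  # Group by package, newest first (ISO-8601 UTC sorts lexically).
  LC_ALL=C sort -t ' ' -k1,1 -k3,3r |
    awk -v keep="$keep" '
      $2 == "latest" { next }              # always kept, uses no slot
      ++seen[$1] > keep { print $1, $2 }   # beyond the newest KEEP
    '
}

# delete_version OWNER PACKAGE VERSION: logs only when DRY_RUN=true.
delete_version() {
  local url="https://forgejo.viktorbarzin.me/api/v1/packages/$1/container/$2/$3"
  if [ "$DRY_RUN" = "true" ]; then
    echo "DRY_RUN: would DELETE ${url}"
  else
    curl -fsS -X DELETE -H "Authorization: token ${FORGEJO_TOKEN}" "$url"
  fi
}

# Example with made-up versions and KEEP=2; only v1 is selected.
printf '%s\n' \
  'demo v1 2026-05-01T00:00:00Z' \
  'demo v2 2026-05-02T00:00:00Z' \
  'demo v3 2026-05-03T00:00:00Z' \
  'demo latest 2026-05-03T00:00:00Z' |
  select_deletions 2 |
  while read -r package version; do
    delete_version viktor "$package" "$version"
  done
```

Keeping selection separate from deletion means the dry-run log shows exactly the set that a later `DRY_RUN=false` run would remove.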
### Integrity monitoring
Mirror the existing `registry-integrity-probe` CronJob: walk
`/v2/_catalog`, walk every tag, HEAD every manifest + index child,
push `registry_manifest_integrity_*` metrics. Existing
Prometheus alerts fire on the `instance` label, so they cover both
probes automatically once the alert annotations are made
instance-aware (done in this change).
### Source migration
Projects currently living as plain dirs in the local-only monorepo
become standalone Forgejo repos. Two GitHub-hosted private repos
(`beadboard`, `claude-memory-mcp`) move to Forgejo and are archived
on GitHub.
CI standardises on Woodpecker for everything in scope. The two
projects that used GHA (build + Woodpecker-deploy via GHA-hosted
DockerHub push) keep DockerHub for legacy compatibility but their
canonical image source becomes Forgejo.
### Break-glass for infra-ci
`infra-ci` is the Docker image used by all infra Woodpecker
pipelines, including `default.yml` (terragrunt apply). If Forgejo is
unreachable at the moment we need to apply, `infra-ci` is
unreachable, and we can't apply our way out.
Mitigation: the dual-push step also runs `docker save | gzip` on the
built infra-ci image and writes the tarball to:
- `/opt/registry/data/private/_breakglass/infra-ci-<sha>.tar.gz` on
the registry VM disk (Copy 1)
- `/srv/nfs/forgejo-breakglass/` on the NAS (Copy 2)
A `latest` symlink in each location points at the most recent.
Recovery procedure (`docs/runbooks/forgejo-registry-breakglass.md`):
scp tarball → `docker load` or `ctr -n k8s.io images import` → fix
Forgejo via that node.
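A sketch of the tarball step and the `latest` symlink, under stated assumptions: the helper names are hypothetical, and `docker save -o` followed by `gzip` is used instead of a pipe so that a failed save cannot leave behind a valid-looking empty archive.

```bash
#!/usr/bin/env bash
# Sketch only: pin_latest and save_breakglass are hypothetical helpers,
# not the actual pipeline step.

# pin_latest DEST_DIR FILE: repoint DEST_DIR/latest at FILE.
pin_latest() {
  local dest_dir=$1 file=$2
  # -n replaces an existing symlink instead of following it.
  ln -sfn "$file" "${dest_dir}/latest"
}

# save_breakglass IMAGE SHA DEST_DIR
# Writes DEST_DIR/infra-ci-<sha>.tar.gz and repoints the latest symlink.
save_breakglass() {
  local image=$1 sha=$2 dest_dir=$3
  local tarball="${dest_dir}/infra-ci-${sha}.tar"
  docker save -o "$tarball" "$image" || return 1
  gzip -f "$tarball" || return 1   # leaves infra-ci-<sha>.tar.gz
  pin_latest "$dest_dir" "infra-ci-${sha}.tar.gz"
}

# Example of the symlink step alone, in a scratch directory.
demo_dir=$(mktemp -d)
touch "${demo_dir}/infra-ci-abc123.tar.gz"
pin_latest "$demo_dir" infra-ci-abc123.tar.gz
readlink "${demo_dir}/latest"
```

The symlink target is relative, so it stays valid if the directory is copied or mounted at a different path.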
### Cutover style
**Dual-push bake**: pipelines push to both registries for ≥14 days.
Pods continue pulling from `registry.viktorbarzin.me`. After bake:
1. Per-project PR: flip `image=` lines in Terraform stacks. Pod
re-pull naturally on next rollout.
2. Phase 4: stop `registry-private` container, remove its
`auths` entry from the cluster Secret, drop containerd hosts.toml
entry.
## Why not alternatives
| Option | Rejected because |
|---|---|
| Stay on `registry-private` | Three corruption incidents in three weeks; mitigation cost rising |
| Run a fresh registry container alongside (no Forgejo) | Same upstream, same `distribution#3324` failure mode |
| GHCR / DockerHub for all private images | Public-by-default model + push rate limits; loses owner-owned blob storage |
| Harbor | Heavier than Forgejo registry, would need its own DB + ingress, no source-hosting integration |
## Risks
See plan doc § "Risk register" for the full table. Top three:
1. **Forgejo registry hits the same corruption pattern.** Mitigated
by 14-day bake + integrity probe within 15 min.
2. **Forgejo down → infra-ci unreachable → can't apply.** Mitigated
by tarball break-glass on VM + NAS.
3. **Pod re-pulls fail after `image=` flip due to containerd cache
poisoning.** Mitigated by hosts.toml deployment + per-project
`kubectl rollout restart` in Phase 3.


@ -0,0 +1,152 @@
# Forgejo Registry Consolidation — Plan
**Date**: 2026-05-07
**Status**: Approved — execution in progress (Phase 0)
**Design**: `2026-05-07-forgejo-registry-consolidation-design.md`
This is the implementation roadmap for migrating off `registry-private`
onto Forgejo's OCI registry. See the design doc for problem
statement and rationale. Execution spans 6 phases (0 through 5) over ≥3 weeks.
## Phase 0 — Prepare Forgejo (1 PR, no cutover risk)
| Task | File / artifact |
|---|---|
| Bump Forgejo memory request+limit 384Mi → 1Gi | `infra/stacks/forgejo/main.tf` |
| Add `FORGEJO__packages__ENABLED=true` and `FORGEJO__packages__CHUNKED_UPLOAD_PATH=/data/tmp/package-upload` env vars (defensive — already default in v11) | `infra/stacks/forgejo/main.tf` |
| Bump Forgejo PVC 5Gi → 15Gi, auto-resize cap 20Gi → 50Gi | `infra/stacks/forgejo/main.tf` |
| Bump ingress `max_body_size = "5g"` (wired into ingress_factory as a Buffering middleware) | `infra/stacks/forgejo/main.tf`, `infra/modules/kubernetes/ingress_factory/main.tf` |
| Create `cluster-puller` (read:package), `ci-pusher` (write:package), and a third `cleanup` PAT on `ci-pusher`; store PATs in Vault | runbook: `docs/runbooks/forgejo-registry-setup.md` |
| Extend `registry-credentials` Secret with 4th `auths` entry for `forgejo.viktorbarzin.me` | `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` |
| Add containerd `hosts.toml` entry redirecting `forgejo.viktorbarzin.me` → in-cluster Traefik LB `10.0.20.200` | `infra/stacks/infra/main.tf` cloud-init + new `infra/scripts/setup-forgejo-containerd-mirror.sh` for existing nodes |
| Forgejo retention CronJob (`0 4 * * *`, dry-run for first 7 days) | new `infra/stacks/forgejo/cleanup.tf` + `infra/stacks/forgejo/files/cleanup.sh` |
| Forgejo integrity probe CronJob (`*/15 * * * *`) | `infra/stacks/monitoring/modules/monitoring/main.tf` |
| Make existing alerts instance-aware so they cover both registries | `infra/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl` |
**Smoke test (must pass before declaring Phase 0 done):**
- `docker login forgejo.viktorbarzin.me` succeeds.
- Push a hello-world image to `forgejo.viktorbarzin.me/viktor/smoketest:1` succeeds.
- `crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1` from a k8s
node succeeds, using the auto-synced `registry-credentials` Secret.
- A fresh namespace gets the cloned Secret with 4 `auths` entries.
- Delete the smoketest package via API.
- Forgejo integrity probe completes once and pushes metrics.
## Phase 1 — Source migration (parallel-safe, no production impact)
For each project the recipe is identical:
1. `git init` + push to `forgejo.viktorbarzin.me/viktor/<name>`, then
register in Woodpecker via OAuth.
2. Add `.woodpecker.yml` based on `payslip-ingest/.woodpecker.yml`.
Push step uses `woodpeckerci/plugin-docker-buildx` with TWO
`repo:` entries (dual-push).
3. Confirm first build pushes to BOTH registries.
Projects (bake clock starts at "all dual-push"):
| Project | Action |
|---|---|
| `claude-agent-service` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `fire-planner` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `wealthfolio-sync` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `hmrc-sync` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `freedify` | Push from monorepo to Forgejo. New `.woodpecker.yml`. (Upstream is gone.) |
| `payslip-ingest` | Already on Forgejo. Add second `repo:` entry to `.woodpecker.yml`. |
| `job-hunter` | Already on Forgejo. Add second `repo:` entry. |
| `beadboard` | Push to Forgejo. New `.woodpecker.yml`. Disable GHA workflow. **Don't archive GitHub yet** (deferred to Phase 3). |
| `claude-memory-mcp` | Push to Forgejo. New `.woodpecker.yml`. |
| `infra-ci` | Edit `.woodpecker/build-ci-image.yml` to dual-push. ALSO `docker save \| gzip` to `/opt/registry/data/private/_breakglass/` on VM AND `/srv/nfs/forgejo-breakglass/` on NAS. Pin a `latest` symlink. |
Break-glass runbook (`docs/runbooks/forgejo-registry-breakglass.md`)
documents the recovery path.
## Phase 2 — Bake (≥14 days)
- No `image=` lines change. Pods still pull from
`registry.viktorbarzin.me`.
- **Daily smoke check**: pull a recent image from Forgejo as
`cluster-puller`, verify integrity (HEAD on manifest + each blob).
- **Bake exit criteria**:
- Zero `RegistryManifestIntegrityFailure` alerts on Forgejo.
- Zero `ContainerNearOOM` for the forgejo pod.
- Retention CronJob has run ≥14 times successfully.
- At least one full Sunday GC cycle has elapsed.
- Switch retention CronJob to `DRY_RUN=false` on day 7, observe
until day 14.
## Phase 3 — Cutover (one PR per project, single session)
Order = lowest blast radius first. Each step:
`image=` flip → `kubectl rollout restart` → verify pull from Forgejo.
1. `payslip-ingest` (`infra/stacks/payslip-ingest/main.tf`)
2. `job-hunter` (`infra/stacks/job-hunter/main.tf`)
3. `claude-agent-service` (`infra/stacks/claude-agent-service/main.tf`)
4. `fire-planner` (`infra/stacks/fire-planner/main.tf`)
5. `wealthfolio-sync` (`infra/stacks/wealthfolio/main.tf`)
6. `freedify` (`infra/stacks/freedify/factory/main.tf`)
7. `chrome-service` (`infra/stacks/chrome-service/main.tf`)
8. `beads-server` / `beadboard` (`infra/stacks/beads-server/main.tf`).
Then `gh repo archive ViktorBarzin/beadboard`.
9. `infra-ci`: flip `image:` references in the `.woodpecker/*.yml`
files in the infra repo that consume it (`default`, `drift-detection`,
`build-cli`). Verify next push to master applies cleanly.
10. `claude-memory-mcp` — update `CLAUDE.md` install instruction from
`claude plugins install github:ViktorBarzin/claude-memory-mcp` to
`claude plugins install https://forgejo.viktorbarzin.me/viktor/claude-memory-mcp.git`.
`gh repo archive ViktorBarzin/claude-memory-mcp`.
## Phase 4 — Decommission
| Step | File / location |
|---|---|
| Stop `registry-private` container on VM (10.0.20.10): edit `/opt/registry/docker-compose.yml`, comment out service, `docker compose up -d --remove-orphans`. (Manual SSH — cloud-init won't redeploy on TF apply per memory id=1078.) | live VM |
| Update cloud-init template to match the new compose file | `infra/stacks/infra/main.tf:288` |
| Delete `auths` entries for `registry.viktorbarzin.me` / `:5050` / `10.0.20.10:5050` from the dockerconfigjson | `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` |
| Drop `registry.viktorbarzin.me` and `10.0.20.10:5050` `hosts.toml` entries on each node + cloud-init template | `infra/stacks/infra/main.tf` cloud-init + ad-hoc script |
| After 1 week of no incidents, delete `/opt/registry/data/private/` blob storage on the VM (~2.6GB freed) | manual SSH |
## Phase 5 — Docs
In the same commit as the Phase 4 closing:
| Doc | Update |
|---|---|
| `docs/runbooks/registry-vm.md` | Note `registry-private` is gone; pull-through caches and break-glass tarballs only |
| `docs/runbooks/registry-rebuild-image.md` | Replaced by NEW `forgejo-registry-rebuild-image.md` |
| `docs/runbooks/forgejo-registry-rebuild-image.md` (NEW) | Forgejo PVC restore procedure |
| `docs/runbooks/forgejo-registry-breakglass.md` (NEW) | infra-ci tarball recovery |
| `docs/architecture/ci-cd.md` | Image registry section flips to Forgejo |
| `docs/architecture/monitoring.md` | Integrity probe target updated |
| `infra/.claude/CLAUDE.md` | Registry references updated |
| `CLAUDE.md` (monorepo root) | claude-memory-mcp install URL updated |
| `infra/.claude/reference/service-catalog.md` | Cross-reference checked |
## Critical files modified
| File | Phase | What |
|---|---|---|
| `infra/stacks/forgejo/main.tf` | 0 | Memory bump, packages env vars, PVC bump, ingress max_body_size |
| `infra/stacks/forgejo/cleanup.tf` (NEW) | 0 | Retention CronJob |
| `infra/stacks/forgejo/files/cleanup.sh` (NEW) | 0 | Retention script (mounted via ConfigMap) |
| `infra/modules/kubernetes/ingress_factory/main.tf` | 0 | Wire `max_body_size` into a Traefik Buffering middleware |
| `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` | 0 | Add 4th `auths` entry |
| `infra/stacks/infra/main.tf` | 0 + 4 | Containerd hosts.toml block (add Forgejo, later remove registry-private); compose template update |
| `infra/scripts/setup-forgejo-containerd-mirror.sh` (NEW) | 0 | One-shot rollout for existing nodes |
| `infra/stacks/monitoring/modules/monitoring/main.tf` | 0 | Forgejo integrity probe CronJob |
| `infra/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl` | 0 | Make alerts instance-aware |
| `infra/stacks/monitoring/main.tf` | 0 | Plumb `forgejo_pull_token` into module |
| `infra/.woodpecker/build-ci-image.yml` | 1 | Dual-push to add Forgejo target + tarball break-glass |
| `<each-project>/.woodpecker.yml` | 1 | Dual-push (NEW for fire-planner, wealthfolio-sync, hmrc-sync, freedify, beadboard, claude-memory-mcp; EDIT for payslip-ingest, job-hunter, claude-agent-service) |
| `infra/.woodpecker/{default,drift-detection,build-cli}.yml` | 3 | Flip `image:` to Forgejo for infra-ci |
| `infra/stacks/{beads-server,chrome-service,claude-agent-service,fire-planner,freedify/factory,job-hunter,payslip-ingest,wealthfolio}/main.tf` | 3 | Flip `image =` to Forgejo |
## Verification
- **Push** (Phase 0/1): `docker push forgejo.viktorbarzin.me/viktor/<name>` visible in Forgejo Web UI under viktor/.
- **Pull** (Phase 0): `crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1` succeeds with auto-synced Secret.
- **Dual-push** (Phase 1): every Woodpecker pipeline run pushes to BOTH endpoints — confirmed via HEAD checks on `<reg>:<sha>` for both.
- **Bake** (Phase 2): existing daily Forgejo `/api/healthz` external monitor stays green; integrity probe stays green; no `ContainerNearOOM` for forgejo pod.
- **Cutover** (Phase 3): `kubectl rollout status deploy/<svc> -n <ns>` succeeds. `kubectl describe pod` shows the image was pulled from `forgejo.viktorbarzin.me`.
- **Decommission** (Phase 4): `docker ps` on registry VM no longer shows `registry-private`. Brand-new namespace gets the Secret with only the Forgejo `auths` entry. Pull still works.
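
The dual-push HEAD check from the list above could be scripted roughly as below. The URL shape is the standard OCI distribution manifest endpoint. The helper names, the use of HTTP Basic auth against both registries, and the `payslip-ingest` / `abc1234` values are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the example repo/tag are hypothetical.

# manifest_url REGISTRY REPO REF: print the manifest endpoint for an image.
manifest_url() {
  printf 'https://%s/v2/%s/manifests/%s\n' "$1" "$2" "$3"
}

# manifest_exists URL USER TOKEN: succeeds when a HEAD request returns 2xx.
# Assumes the registry accepts Basic auth directly on /v2 endpoints.
manifest_exists() {
  curl -fsSI -u "$2:$3" \
    -H 'Accept: application/vnd.oci.image.index.v1+json' \
    -H 'Accept: application/vnd.oci.image.manifest.v1+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
    -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    -o /dev/null "$1"
}

# The two URLs a dual-push of <name>:<sha> should make resolvable.
manifest_url registry.viktorbarzin.me payslip-ingest abc1234
manifest_url forgejo.viktorbarzin.me viktor/payslip-ingest abc1234
```

Calling `manifest_exists` on both URLs after a pipeline run, and failing if either returns non-zero, gives a per-build confirmation that neither target was silently skipped.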


@ -0,0 +1,163 @@
# Runbook: Forgejo OCI registry — initial setup
Last updated: 2026-05-07
This runbook covers the **one-time** bootstrap of Forgejo's container
registry, executed during Phase 0 of the registry consolidation plan
(`docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md`).
After this runbook is complete, the Forgejo OCI registry at
`forgejo.viktorbarzin.me` accepts pushes from CI and pulls from the
cluster, with retention and integrity monitoring in place.
## Order of operations
The Terraform stacks reference Vault keys that don't exist on a fresh
cluster. Create the keys **before** running `scripts/tg apply`.
1. Apply the resource bumps (memory, PVC, ingress body size,
packages env vars) — these don't depend on the new Vault keys.
2. Create the service-account users + PATs in Forgejo.
3. Push the PATs to Vault.
4. Apply the rest of Phase 0 (registry-credentials extension,
monitoring probe, retention CronJob).
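Because step 4 fails if a key is absent, a small preflight check can catch a missed `vault kv patch` before the apply. The sketch below is an assumption-laden helper, not part of the repo: `missing_keys` is a hypothetical name, and the commented `vault`/`jq` pipeline assumes a KV v2 mount, where `vault kv get -format=json` nests values under `.data.data`.

```bash
#!/usr/bin/env bash
# Sketch only: missing_keys is a hypothetical helper.

# missing_keys REQUIRED...
# stdin:  key names that exist, one per line.
# stdout: each REQUIRED key that is not present.
missing_keys() {
  local present key
  present=$(cat)
  for key in "$@"; do
    printf '%s\n' "$present" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
}

# Live usage (assumes KV v2 and jq on the workstation):
#   vault kv get -format=json secret/viktor | jq -r '.data.data | keys[]' |
#     missing_keys forgejo_pull_token forgejo_cleanup_token

# Offline example with made-up input; prints forgejo_cleanup_token.
printf '%s\n' forgejo_pull_token other_key |
  missing_keys forgejo_pull_token forgejo_cleanup_token
```

An empty result means every required key is in place and the remaining stacks can be applied.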
### Step 1 — apply Forgejo deployment bumps
```bash
cd infra/stacks/forgejo
scripts/tg apply
```
Wait for the new pod to come up at the bumped 1Gi memory request and
the resized 15Gi PVC. Verify packages are enabled:
```bash
kubectl exec -n forgejo deploy/forgejo -- forgejo manager flush-queues
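# The variable names use lowercase "packages" (FORGEJO__packages__ENABLED),
# so match case-insensitively.
kubectl exec -n forgejo deploy/forgejo -- env | grep -i packages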
```
### Step 2 — create service-account users
`forgejo admin user create` is not idempotent: re-running it for an
existing user errors out. That's fine; skip the command on rerun.
`--must-change-password=false` lets the account sign in with the
throwaway password without a forced reset.
```bash
# cluster-puller: read:package PAT for in-cluster pulls.
kubectl exec -n forgejo deploy/forgejo -- \
  forgejo admin user create \
    --username cluster-puller \
    --email cluster-puller@viktorbarzin.me \
    --password "$(openssl rand -base64 24)" \
    --must-change-password=false
# ci-pusher: write:package PAT for CI dual-push, also reused as the
# cleanup CronJob credential (write:package includes delete).
kubectl exec -n forgejo deploy/forgejo -- \
  forgejo admin user create \
    --username ci-pusher \
    --email ci-pusher@viktorbarzin.me \
    --password "$(openssl rand -base64 24)" \
    --must-change-password=false
```
The user passwords are throwaway — we only ever auth via PAT. Forgejo
admin can reset them at any time from the Web UI.
### Step 3 — generate the PATs
PATs **must** be generated through the Web UI logged in as the
respective user (the CLI doesn't expose token creation). To log in
without OAuth (registration is disabled for everyone except `viktor`,
the admin), use the per-user temporary password from step 2.
For each of `cluster-puller` and `ci-pusher`:
1. Sign out of `viktor`.
2. Go to `https://forgejo.viktorbarzin.me/user/login` and sign in
with the throwaway password.
3. Settings → Applications → Generate new token.
4. Name: `cluster-pull` / `ci-push`. **Expiration: never.**
5. Scopes:
- `cluster-puller`: `read:package`
- `ci-pusher`: `write:package` (covers read+write+delete)
6. Save the token shown on the next page — it is **not** displayed again.
For the cleanup CronJob, generate a third PAT on `ci-pusher`:
7. Repeat steps 3-6 with name `cleanup`, scope `write:package`.
### Step 4 — push PATs to Vault
```bash
vault login -method=oidc
# Read-only, used by the cluster-wide registry-credentials Secret and
# by the Forgejo integrity probe.
vault kv patch secret/viktor \
  forgejo_pull_token=<paste cluster-puller PAT>
# Write+delete, used by the retention CronJob inside Forgejo's
# namespace.
vault kv patch secret/viktor \
  forgejo_cleanup_token=<paste ci-pusher cleanup PAT>
# Write, propagated by vault-woodpecker-sync to all Woodpecker repos.
vault kv patch secret/ci/global \
  forgejo_user=ci-pusher \
  forgejo_push_token=<paste ci-pusher push PAT>
```
### Step 5 — apply the rest of Phase 0
```bash
# Each apply runs in a subshell so the working directory is restored
# before the next relative cd.
# Registry credential Secret (now reads forgejo_pull_token).
(cd infra/stacks/kyverno && scripts/tg apply)
# Monitoring probe + retention CronJob.
(cd infra/stacks/monitoring && scripts/tg apply)
(cd infra/stacks/forgejo && scripts/tg apply)
# Containerd hosts.toml on each existing k8s node — VM cloud-init
# only fires on first boot.
infra/scripts/setup-forgejo-containerd-mirror.sh
```
## Verification
```bash
# Login from a workstation with docker.
echo "<ci-pusher PAT>" | docker login forgejo.viktorbarzin.me -u ci-pusher --password-stdin
# Push a smoketest image.
docker pull alpine:3.20
docker tag alpine:3.20 forgejo.viktorbarzin.me/viktor/smoketest:1
docker push forgejo.viktorbarzin.me/viktor/smoketest:1
# Pull from a k8s node.
ssh wizard@<node> sudo crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1
# Confirm the cluster-wide Secret was synced into a fresh namespace.
kubectl create namespace forgejo-smoketest
kubectl get secret -n forgejo-smoketest registry-credentials \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys'
# Expect: ["10.0.20.10:5050", "forgejo.viktorbarzin.me",
# "registry.viktorbarzin.me", "registry.viktorbarzin.me:5050"]
kubectl delete namespace forgejo-smoketest
# Delete the smoketest package via API.
curl -X DELETE -H "Authorization: token <ci-pusher cleanup PAT>" \
  https://forgejo.viktorbarzin.me/api/v1/packages/viktor/container/smoketest/1
```
## When to revisit
- **PAT rotation**: PATs created here have no expiry by design. If a
PAT leaks, regenerate via the Web UI and `vault kv patch` the new
value into the same key. The next `terragrunt apply` refreshes the
cluster-wide Secret, which the Kyverno ClusterPolicy then clones into
every namespace; Woodpecker picks up the new value on the next
vault-woodpecker-sync run (every 6h).
- **New service account**: if a future workload needs different
scopes, add a parallel user/PAT here rather than expanding existing
PAT scope. Principle of least privilege.