infra/docs/runbooks/forgejo-registry-setup.md
Viktor Barzin f793a5f50b [forgejo] Phase 0 of registry consolidation: prepare Forgejo OCI registry
Stage 1 of moving private images off the registry:2 container at
registry.viktorbarzin.me:5050 (which has hit distribution#3324 corruption
3x in 3 weeks) onto Forgejo's built-in OCI registry. No cutover risk —
pods still pull from the existing registry until Phase 3.

What changes:
* Forgejo deployment: memory 384Mi→1Gi, PVC 5Gi→15Gi (cap 50Gi).
  Explicit FORGEJO__packages__ENABLED + CHUNKED_UPLOAD_PATH (defensive,
  v11 default-on).
* ingress_factory: max_body_size variable was declared but never wired
  in after the nginx→Traefik migration. Now creates a per-ingress
  Buffering middleware when set; default null = no limit (preserves
  existing behavior). Forgejo ingress sets max_body_size=5g to allow
  multi-GB layer pushes.
* Cluster-wide registry-credentials Secret: 4th auths entry for
  forgejo.viktorbarzin.me, populated from Vault secret/viktor/
  forgejo_pull_token (cluster-puller PAT, read:package). Existing
  Kyverno ClusterPolicy syncs cluster-wide — no policy edits.
* Containerd hosts.toml redirect: forgejo.viktorbarzin.me → in-cluster
  Traefik LB 10.0.20.200 (avoids hairpin NAT for in-cluster pulls).
  Cloud-init for new VMs + scripts/setup-forgejo-containerd-mirror.sh
  for existing nodes.
* Forgejo retention CronJob (0 4 * * *): keeps newest 10 versions per
  package + always :latest. First 7 days dry-run (DRY_RUN=true);
  flip the local in cleanup.tf after log review.
* Forgejo integrity probe CronJob (*/15): same algorithm as the
  existing registry-integrity-probe. Existing Prometheus alerts
  (RegistryManifestIntegrityFailure et al) made instance-aware so
  they cover both registries during the bake.
* Docs: design+plan in docs/plans/, setup runbook in docs/runbooks/.

Operational note — the apply order is non-trivial because the new
Vault keys (forgejo_pull_token, forgejo_cleanup_token,
secret/ci/global/forgejo_*) must exist BEFORE terragrunt apply in the
kyverno + monitoring + forgejo stacks. The setup runbook documents
the bootstrap sequence.

Phase 1 (per-project dual-push pipelines) follows in subsequent
commits. Bake clock starts when the last project goes dual-push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00


Runbook: Forgejo OCI registry — initial setup

Last updated: 2026-05-07

This runbook covers the one-time bootstrap of Forgejo's container registry, executed during Phase 0 of the registry consolidation plan (docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md).

After this runbook is complete, the Forgejo OCI registry at forgejo.viktorbarzin.me accepts pushes from CI and pulls from the cluster, with retention and integrity monitoring in place.

Order of operations

The Terraform stacks reference Vault keys that don't exist on a fresh cluster. Create the keys before applying the stacks that read them.

  1. Apply the resource bumps (memory, PVC, ingress body size, packages env vars). These don't depend on the new Vault keys (Step 1).
  2. Create the service-account users and their PATs in Forgejo (Steps 2 and 3).
  3. Push the PATs to Vault (Step 4).
  4. Apply the rest of Phase 0: registry-credentials extension, monitoring probe, retention CronJob (Step 5).
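The dependency on the Vault keys can be checked mechanically before the final apply. The following is a minimal sketch, not part of the repository: `missing_keys` is a hypothetical helper that reads the key names that exist on stdin and prints the required ones that are absent. The usage lines assume a KV v2 mount (implied by `vault kv patch`) and `jq` on the workstation.

```shell
# missing_keys KEY...
# Hypothetical helper: reads existing key names (one per line) on stdin and
# prints every required KEY that is absent. Returns non-zero if any is missing.
missing_keys() {
  mk_present="$(cat)"
  mk_status=0
  for mk_key in "$@"; do
    # -F: fixed string, -x: whole-line match, -q: quiet.
    if ! printf '%s\n' "$mk_present" | grep -Fxq -- "$mk_key"; then
      printf '%s\n' "$mk_key"
      mk_status=1
    fi
  done
  return "$mk_status"
}

# Usage (assumes KV v2, so the payload sits under .data.data):
#   vault kv get -format=json secret/viktor | jq -r '.data.data | keys[]' \
#     | missing_keys forgejo_pull_token forgejo_cleanup_token
#   vault kv get -format=json secret/ci/global | jq -r '.data.data | keys[]' \
#     | missing_keys forgejo_user forgejo_push_token
```

An empty output with exit status 0 means every key the Step 5 stacks read is in place.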

Step 1 — apply Forgejo deployment bumps

cd infra/stacks/forgejo
scripts/tg apply

Wait for the new pod to come up at the bumped 1Gi memory request and the resized 15Gi PVC. Verify packages are enabled:

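kubectl exec -n forgejo deploy/forgejo -- forgejo manager flush-queues
# The variable names use a lowercase section (FORGEJO__packages__ENABLED),
# so match case-insensitively.
kubectl exec -n forgejo deploy/forgejo -- env | grep -i packages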
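For a scripted check, the env output can be turned into a pass/fail result. This is a small sketch under one assumption: the deployment sets the variable to the literal value `true` (the runbook only states that the variable is set explicitly). `packages_enabled` is a hypothetical name.

```shell
# packages_enabled
# Hypothetical helper: reads `env` output on stdin and succeeds only if
# FORGEJO__packages__ENABLED is present with the value "true".
# Assumption: the deployment uses the literal value "true".
packages_enabled() {
  grep -Eiq '^FORGEJO__packages__ENABLED=true$'
}

# Usage:
#   kubectl exec -n forgejo deploy/forgejo -- env | packages_enabled \
#     && echo "packages enabled" || echo "packages NOT enabled"
```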

Step 2 — create service-account users

forgejo admin user create is not idempotent: re-running it for an existing user errors out. That's fine; skip the command on rerun. --must-change-password=false lets the account sign in with the generated password without a forced reset.

# cluster-puller: holds the read:package PAT for in-cluster pulls.
# Keep the generated password in a variable; Step 3 needs it for the Web UI login.
CLUSTER_PULLER_PW="$(openssl rand -base64 24)"
kubectl exec -n forgejo deploy/forgejo -- \
  forgejo admin user create \
  --username cluster-puller \
  --email cluster-puller@viktorbarzin.me \
  --password "$CLUSTER_PULLER_PW" \
  --must-change-password=false
echo "cluster-puller password: $CLUSTER_PULLER_PW"

# ci-pusher: holds the write:package PAT for CI dual-push, plus a separate
# PAT for the cleanup CronJob (write:package includes delete).
CI_PUSHER_PW="$(openssl rand -base64 24)"
kubectl exec -n forgejo deploy/forgejo -- \
  forgejo admin user create \
  --username ci-pusher \
  --email ci-pusher@viktorbarzin.me \
  --password "$CI_PUSHER_PW" \
  --must-change-password=false
echo "ci-pusher password: $CI_PUSHER_PW"

The user passwords are throwaway: we only ever authenticate via PAT. The Forgejo admin can reset them at any time from the Web UI.
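Since re-running the create command on an existing user fails, a rerun can first check whether the user is already there. The sketch below parses the output of `forgejo admin user list`. It assumes that output is a whitespace-separated table with a header row and the username in the second column; confirm the layout on the deployed version before relying on it. `user_exists` is a hypothetical helper.

```shell
# user_exists USERNAME
# Hypothetical helper: reads `forgejo admin user list` output on stdin and
# succeeds if USERNAME appears in the second column of any data row.
# Assumption: header on line 1, columns separated by whitespace,
# username in column 2.
user_exists() {
  awk -v user="$1" '
    NR > 1 && $2 == user { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage:
#   if kubectl exec -n forgejo deploy/forgejo -- forgejo admin user list \
#        | user_exists ci-pusher; then
#     echo "ci-pusher already exists, skipping create"
#   fi
```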

Step 3 — generate the PATs

PATs are generated through the Web UI while logged in as the respective user. (Forgejo's CLI also has forgejo admin user generate-access-token; this runbook uses the Web UI.) To log in without OAuth (registration is disabled for everyone except viktor, the admin), sign in with the per-user temporary password from Step 2.

For each of cluster-puller and ci-pusher:

  1. Sign out of the viktor account.
  2. Go to https://forgejo.viktorbarzin.me/user/login and sign in with the throwaway password.
  3. Settings → Applications → Generate new token.
  4. Name: cluster-pull (for cluster-puller) or ci-push (for ci-pusher). Expiration: never.
  5. Scopes:
    • cluster-puller: read:package
    • ci-pusher: write:package (covers read+write+delete)
  6. Save the token shown on the next page — it is not displayed again.

For the cleanup CronJob, generate a third PAT while still signed in as ci-pusher: repeat steps 3-6 with name cleanup and scope write:package.

Step 4 — push PATs to Vault

vault login -method=oidc

# Read-only, used by the cluster-wide registry-credentials Secret and
# by the Forgejo integrity probe.
vault kv patch secret/viktor \
  forgejo_pull_token=<paste cluster-puller PAT>

# Write+delete, used by the retention CronJob inside Forgejo's
# namespace.
vault kv patch secret/viktor \
  forgejo_cleanup_token=<paste ci-pusher cleanup PAT>

# Write, propagated by vault-woodpecker-sync to all Woodpecker repos.
vault kv patch secret/ci/global \
  forgejo_user=ci-pusher \
  forgejo_push_token=<paste ci-pusher push PAT>
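Pasting a token on the command line leaves it in shell history. As an alternative sketch, the token can be read without echo, sanity-checked, and piped to Vault using the `key=-` form, which reads the value from stdin. Both helpers are hypothetical. The format check assumes Forgejo PATs are 40 lowercase hex characters, which matches current Gitea and Forgejo releases as far as I know; drop or adjust the check if your tokens look different.

```shell
# looks_like_pat TOKEN
# Hypothetical check. Assumption: a PAT is exactly 40 lowercase hex chars.
# Catches truncated pastes and stray whitespace.
looks_like_pat() {
  [ "${#1}" -eq 40 ] || return 1
  case "$1" in
    *[!0123456789abcdef]*) return 1 ;;
  esac
  return 0
}

# store_pat VAULT_PATH KEY
# Prompts for the token with terminal echo off, validates it, and patches
# it into Vault via stdin so it never appears in argv or shell history.
store_pat() {
  printf 'Paste PAT for %s: ' "$2" >&2
  stty -echo
  IFS= read -r sp_token
  stty echo
  printf '\n' >&2
  if ! looks_like_pat "$sp_token"; then
    echo "value does not look like a PAT; nothing written" >&2
    return 1
  fi
  printf '%s' "$sp_token" | vault kv patch "$1" "$2=-"
}

# Usage:
#   store_pat secret/viktor forgejo_pull_token
#   store_pat secret/viktor forgejo_cleanup_token
#   store_pat secret/ci/global forgejo_push_token
```

`forgejo_user=ci-pusher` is not a secret and can still be patched with the plain `key=value` form shown above.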

Step 5 — apply the rest of Phase 0

# Run from the repository root; each subshell returns there afterwards.

# Registry credential Secret (now reads forgejo_pull_token).
(cd infra/stacks/kyverno && scripts/tg apply)

# Monitoring probe + retention CronJob.
(cd infra/stacks/monitoring && scripts/tg apply)
(cd infra/stacks/forgejo && scripts/tg apply)

# Containerd hosts.toml on each existing k8s node. VM cloud-init
# only fires on first boot.
infra/scripts/setup-forgejo-containerd-mirror.sh

Verification

# Login from a workstation with docker.
echo "<ci-pusher PAT>" | docker login forgejo.viktorbarzin.me -u ci-pusher --password-stdin

# Push a smoketest image.
docker pull alpine:3.20
docker tag alpine:3.20 forgejo.viktorbarzin.me/viktor/smoketest:1
docker push forgejo.viktorbarzin.me/viktor/smoketest:1

# Pull from a k8s node.
ssh wizard@<node> sudo crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1

# Confirm the cluster-wide Secret was synced into a fresh namespace.
kubectl create namespace forgejo-smoketest
kubectl get secret -n forgejo-smoketest registry-credentials \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys'
# Expect: ["10.0.20.10:5050", "forgejo.viktorbarzin.me",
#         "registry.viktorbarzin.me", "registry.viktorbarzin.me:5050"]
kubectl delete namespace forgejo-smoketest

# Delete the smoketest package via API.
curl -X DELETE -H "Authorization: token <ci-pusher cleanup PAT>" \
  https://forgejo.viktorbarzin.me/api/v1/packages/viktor/container/smoketest/1
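Kyverno clones the Secret asynchronously, so the `kubectl get secret` check can fail if it runs immediately after the namespace is created. A generic retry wrapper, sketched here as a hypothetical helper, avoids a false negative. The attempt count and delay in the usage line are arbitrary starting points.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARG...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is
# exhausted, sleeping DELAY_SECONDS between tries.
retry() {
  retry_attempts="$1"
  retry_delay="$2"
  shift 2
  retry_n=1
  while :; do
    "$@" && return 0
    [ "$retry_n" -ge "$retry_attempts" ] && return 1
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Usage:
#   kubectl create namespace forgejo-smoketest
#   retry 10 3 kubectl get secret -n forgejo-smoketest registry-credentials
```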

When to revisit

  • PAT rotation: PATs created here have no expiry by design. If a PAT leaks, regenerate it via the Web UI and vault kv patch the new value into the same key. The next terragrunt apply syncs it to the in-cluster consumers within minutes (the Kyverno ClusterPolicy clones the Secret); Woodpecker repos pick it up on the next vault-woodpecker-sync run, which happens every 6h.
  • New service account: if a future workload needs different scopes, add a parallel user/PAT here rather than expanding an existing PAT's scope, in keeping with the principle of least privilege.