[woodpecker] Programmatic Forgejo repo registration

Earlier I claimed the OAuth Web UI flow was the only way to onboard
new Forgejo repos in Woodpecker. That's wrong.

Two parts to the actual workaround:
1. Woodpecker session JWTs are HS256-signed with the per-user `hash`
   column from the PG `users` table (NOT the global agent secret).
   Mint a session JWT for the Forgejo viktor user (id=2, forge_id=2),
   and you're authenticated as that user.
2. POST /api/repos?forge_remote_id=N as viktor → Woodpecker calls
   Forgejo with viktor's stored OAuth access_token to create the
   webhook + per-repo signing key. Works.

The 500 I saw earlier was from POST'ing as ViktorBarzin (GitHub
admin), whose user row has no Forgejo OAuth token — Woodpecker's
forge-API call fails for that user, surfacing as a 500.

scripts/woodpecker-register-forgejo-repo.sh wraps the whole flow:
extract hash from PG → mint JWT → activate repo. Verified against
viktor/{broker-sync,claude-agent-service,freedify,hmrc-sync} in
this session — all activated cleanly.

Also updated the runbook with the actual mechanism + the
WOODPECKER_FORGE_TIMEOUT=30s tip (the too-short forge timeout was the
real root cause of the 'context deadline exceeded' failures; the
v3.14 upgrade alone does not fix them).
Viktor Barzin 2026-05-07 23:33:06 +00:00
parent afd78f8d3e
commit ffa1d6d5dc
2 changed files with 190 additions and 56 deletions

@@ -2,72 +2,85 @@
Last updated: 2026-05-07
When you create a new repo on `forgejo.viktorbarzin.me`, Woodpecker
does NOT auto-discover it via the cluster's existing OAuth session.
The `forgejo` user inside Woodpecker (Forgejo-OAuth'd) needs to:
## Programmatic (preferred)
1. Open `https://ci.viktorbarzin.me/` in a browser.
2. Log in via Forgejo OAuth (the "Sign in with Forgejo" button).
3. Click "Add Repository" — your new repo should appear.
4. Click the toggle to activate it. Woodpecker will:
- Add a webhook on the Forgejo repo (push, PR, release events).
- Register the repo's `forge_remote_id` in its DB so subsequent
hooks deserialize correctly.
5. Push a commit (or hit "Run pipeline" in Woodpecker UI) — first
build fires.
```bash
infra/scripts/woodpecker-register-forgejo-repo.sh viktor/<repo-name>
```
## Why API-only doesn't work
The script:
1. Pulls the `viktor` (Forgejo-OAuth'd) user's `hash` from the
Woodpecker PG `users` table.
2. Mints a session JWT (HS256, signed with that hash) — Woodpecker
per-user session JWTs have payload
`{"type":"user","user-id":"<id>"}` and the signing key is the
user's `hash` column. (Confirmed against a known-good admin
token: same payload shape, signature reproducible from the user's
stored hash via `openssl dgst -sha256 -hmac "$HASH"`.)
3. Looks up the Forgejo repo id and POSTs to
`https://ci.viktorbarzin.me/api/repos?forge_remote_id=<id>` as
that user. Woodpecker server creates the per-repo webhook +
per-repo signing key on the Forgejo side automatically (uses
the user's stored Forgejo OAuth `access_token` to do so — that's
why this only works with viktor's user, not the GitHub admin's).
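The signature claim in step 2 can be checked offline. The following is a minimal sketch, not the script's own code: it mints a token with the payload shape described above and then recomputes the HS256 signature from a candidate key. The function names `b64url`, `mint_wp_token`, and `verify_wp_token` and the demo key are hypothetical; it assumes only that `openssl` is installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# base64url-encode stdin without padding (the encoding JWT segments use).
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

# Mint an HS256 token with the session payload shape.
# $1 = user id, $2 = signing key (the user's `hash` value).
mint_wp_token() {
  local header payload sig
  header=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64url)
  payload=$(printf '{"type":"user","user-id":"%s"}' "$1" | b64url)
  sig=$(printf '%s.%s' "$header" "$payload" \
    | openssl dgst -sha256 -hmac "$2" -binary | b64url)
  printf '%s.%s.%s' "$header" "$payload" "$sig"
}

# Return 0 if the token's signature can be reproduced from the key.
# $1 = token, $2 = candidate signing key.
verify_wp_token() {
  local signed=${1%.*} sig=${1##*.} expected
  expected=$(printf '%s' "$signed" \
    | openssl dgst -sha256 -hmac "$2" -binary | b64url)
  [ "$sig" = "$expected" ]
}

# Demo with a made-up key; a real check would use the hash from the DB.
DEMO_HASH='not-a-real-hash'
DEMO_TOKEN=$(mint_wp_token 2 "$DEMO_HASH")
if verify_wp_token "$DEMO_TOKEN" "$DEMO_HASH"; then
  echo "signature matches"
fi
if ! verify_wp_token "$DEMO_TOKEN" 'some-other-key'; then
  echo "signature rejected with the wrong key"
fi
```

Running `verify_wp_token` against an existing session token and a `hash` pulled from the `users` table is one way to confirm which key signed it before relying on a freshly minted token.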
The webhook URL contains a JWT signed with a per-server key that's
stored in the DB and only accessible at OAuth-flow time. POST'ing
`/api/repos` as the admin (`ViktorBarzin` GitHub user) returns 500
because the lookup queries forge-side OAuth state for THAT user,
which doesn't exist for the Forgejo `viktor` user. We confirmed:
Pre-requisites:
- `vault login -method=oidc` with read access to
`database/static-creds/pg-woodpecker`.
- `kubectl` cluster access (the script spawns a 5-min psql pod in
the `woodpecker` namespace to query the DB).
- A Forgejo PAT in the `forgejo_admin_token` field of Vault KV
`secret/viktor` (or pass `FORGEJO_TOKEN=…` env), used to look up
the repo's numeric ID.
- The `viktor` Woodpecker user must already exist (i.e., they've
logged in via Forgejo OAuth at least once on the Web UI).
If user_id=2 / forge_id=2 doesn't exist in `users`, the OAuth
bootstrap is unavoidable — but it only needs to happen once for
the lifetime of the Woodpecker DB.
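The tool side of these prerequisites can be checked before running anything against the cluster. This is a minimal sketch under the assumption that the script needs `vault`, `kubectl`, `jq`, `curl`, and `openssl` on `PATH`; the helper name `missing_tools` is hypothetical and it does not check Vault login state or cluster access.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Print the name of every listed command that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools the registration script shells out to.
MISSING=$(missing_tools vault kubectl jq curl openssl)
if [ -n "$MISSING" ]; then
  # Unquoted on purpose: collapse the newline-separated list onto one line.
  echo "missing required tools:" $MISSING >&2
else
  echo "all required tools found"
fi
```

A real preflight would probably `exit 1` when the list is non-empty; the sketch only reports so it can be run anywhere.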
- Direct `POST /api/repos?forge_remote_id=N` → HTTP 500 server-side.
- Generating a JWT with the agent secret → "token is unverifiable"
on hook delivery (the signing key is repo-specific, not the
global agent secret).
## Why the GitHub admin token can't do this
There's no admin endpoint that side-steps the OAuth flow.
The earlier 500 from `POST /api/repos?forge_remote_id=N` was
because my admin session token authenticates as `ViktorBarzin`
(GitHub user, forge_id=1). Woodpecker tries to call Forgejo as
that user (using their stored Forgejo OAuth token) — which doesn't
exist for the GitHub user, hence the lookup error. There's no way
around this without acting as the Forgejo user.
## Bootstrap when UI access isn't available
## Why the previous "JWT for the webhook" approach didn't work
If you absolutely need to bootstrap a new image without UI access
(e.g., during an outage), the workaround is:
I tried generating a webhook JWT signed with `WOODPECKER_AGENT_SECRET`
(the global agent secret) and registering it directly on Forgejo.
That fails because the webhook JWT verification path runs through a
DB-backed `keyfunc` — Woodpecker stores a per-repo signing key when
the repo is activated, and rejects any JWT signed with a different
key. POST /api/repos is what creates that per-repo key.
1. Build locally:
```bash
docker build -t forgejo.viktorbarzin.me/viktor/<name>:<tag> /path/to/source
docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
```
2. Or pull from another already-built source and retag:
```bash
docker pull viktorbarzin/<name>:<tag> # DockerHub
docker tag viktorbarzin/<name>:<tag> forgejo.viktorbarzin.me/viktor/<name>:<tag>
docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
```
3. Flip the cluster `image=` reference and restart deployments.
## After registration
Document the bootstrap in the relevant stack so future maintainers
know the image was put there by hand. After Woodpecker UI onboarding,
the next pipeline run replaces the bootstrap image with a CI-built one.
Pipelines fire automatically on push. The `WOODPECKER_FORGE_TIMEOUT`
default of 3s was too tight for our cluster (Forgejo response time
spikes to 1-2s under load), so it was bumped to 30s in
`infra/stacks/woodpecker/values.yaml` on 2026-05-07. Without that bump,
config-loader hits the deadline and every pipeline errors with
`could not load config from forge: context deadline exceeded`.
## Repos onboarded in flight 2026-05-07
## When the v3.13 → v3.14 server upgrade matters
These were created during the forgejo-registry-consolidation but the
UI step above hasn't been done yet — their `.woodpecker.yml` /
`.woodpecker/build.yml` exists on Forgejo but no pipeline fires:
`v3.14.0` doesn't fix this on its own — the timeout default is the
same. Set `WOODPECKER_FORGE_TIMEOUT` regardless of version. The
v3.14 upgrade was useful for unrelated forge-API changes (smarter
config-loader, fewer redundant calls per trigger).
- `viktor/broker-sync` — image bootstrapped via DockerHub (see
`infra/stacks/wealthfolio/main.tf` comment).
- `viktor/fire-planner` — image bootstrapped via local docker build.
- `viktor/hmrc-sync`
- `viktor/freedify`
- `viktor/claude-agent-service`
- `viktor/beadboard` — image bootstrapped via local docker build.
- `viktor/claude-memory-mcp`
## Troubleshooting
Walk through each in the Woodpecker UI to enable. Pipelines for
already-onboarded repos (payslip-ingest, job-hunter, infra) fired
correctly after the v3.13 → v3.14 upgrade.
- Pipeline status `error` with `could not load config from forge`:
bump `WOODPECKER_FORGE_TIMEOUT`. 30s is plenty.
- Pipeline status `error` with `secret "registry-password" not found`:
the repo's `.woodpecker.yml` still references registry-private
credentials. Drop the `registry.viktorbarzin.me` block — Forgejo
is the only registry now.
- Pipeline status `failure` with `"/vault": not found` (or any
other COPY of a binary): the gitignored binary wasn't pushed to
Forgejo. Switch the Dockerfile to `curl … && unzip` from the
HashiCorp/upstream release URL. See `claude-agent-service/Dockerfile`
commit bab6dd2 for the pattern.
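The three cases above can be folded into a small lookup when triaging many failed pipelines at once. This is an illustrative sketch only: the function name `triage` and the wording of the hints are hypothetical, and the patterns simply match the error strings quoted in the list.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Map a Woodpecker pipeline error message to the remedy listed above.
triage() {
  case "$1" in
    *"could not load config from forge"*)
      echo "raise WOODPECKER_FORGE_TIMEOUT" ;;
    *'secret "registry-password" not found'*)
      echo "drop the registry.viktorbarzin.me block from .woodpecker.yml" ;;
    *'": not found'*)
      echo "download the binary in the Dockerfile instead of COPYing it" ;;
    *)
      echo "no known remedy" ;;
  esac
}

triage 'could not load config from forge: context deadline exceeded'
```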

@@ -0,0 +1,121 @@
#!/usr/bin/env bash
# Programmatically register a Forgejo repo in Woodpecker without needing the
# Web UI's OAuth flow.
#
# Earlier we believed only the OAuth login could create a working webhook
# because the webhook URL contains a JWT signed with a server-side key.
# That's true for the JWT, BUT the webhook is created server-side when the
# repo is activated through POST /api/repos — Woodpecker handles the JWT
# generation internally. We just need to call that endpoint as the right
# user (the one whose forge OAuth token can read the repo).
#
# The Woodpecker admin token (mine, ViktorBarzin@github) is a session JWT
# of the form `{"type":"user","user-id":"1"}` signed with the user's
# `hash` column (per-user, stored in the `users` table). Forge-API calls
# made on behalf of that user use the user's stored OAuth `access_token`
# from the same row. My GitHub admin can't read Forgejo repos, so the
# admin token can't activate Forgejo repos.
#
# The fix: mint a session JWT for the Forgejo `viktor` user (user_id=2)
# using `viktor`'s `hash`. Then POST /api/repos as viktor — viktor's
# stored Forgejo OAuth token has the access needed.
#
# Usage:
# ./woodpecker-register-forgejo-repo.sh <forgejo-org/repo> [<forgejo-org/repo> ...]
# Example:
# ./woodpecker-register-forgejo-repo.sh viktor/broker-sync viktor/freedify
#
# Requires:
# - vault CLI logged in (oidc or token), with read access to
# database/static-creds/pg-woodpecker AND a Forgejo PAT in the
# forgejo_admin_token field of secret/viktor (or pass FORGEJO_TOKEN env var)
# - kubectl with cluster access (for the temporary psql pod)
# - openssl
set -euo pipefail
NS=${NS:-woodpecker}
WP_URL=${WP_URL:-https://ci.viktorbarzin.me}
FORGEJO_URL=${FORGEJO_URL:-https://forgejo.viktorbarzin.me}
FORGEJO_USER_LOGIN=${FORGEJO_USER_LOGIN:-viktor}
if [ "$#" -lt 1 ]; then
echo "usage: $0 <org/repo> [<org/repo> ...]" >&2
exit 1
fi
# Pull viktor's `hash` from the woodpecker DB (used to sign the session JWT).
# Read the DB credentials once so username and password come from the same read.
WP_DB_CREDS=$(vault read -format=json database/static-creds/pg-woodpecker)
WP_DB_USER=$(jq -r .data.username <<<"$WP_DB_CREDS")
WP_DB_PASS=$(jq -r .data.password <<<"$WP_DB_CREDS")
PG_POD=tmp-wp-register-$$
cat <<EOF | kubectl apply -f - >/dev/null
apiVersion: v1
kind: Pod
metadata: { name: $PG_POD, namespace: $NS }
spec:
restartPolicy: Never
containers:
- name: psql
image: postgres:15
env: [{name: PGPASSWORD, value: "$WP_DB_PASS"}]
command: ["sleep", "300"]
EOF
trap "kubectl delete pod -n $NS $PG_POD --wait=false >/dev/null 2>&1 || true" EXIT
# Block until the pod is Ready; with `set -e` a timeout aborts the script
# instead of falling through to a failing `kubectl exec`.
kubectl wait -n "$NS" --for=condition=Ready "pod/$PG_POD" --timeout=30s >/dev/null
VIKTOR_HASH=$(kubectl exec -n $NS $PG_POD -- psql -h pg-cluster-rw.dbaas -U "$WP_DB_USER" -d woodpecker -tA -c \
"SELECT hash FROM users WHERE login='$FORGEJO_USER_LOGIN' AND forge_id=2" | tr -d '[:space:]')
if [ -z "$VIKTOR_HASH" ]; then
echo "ERROR: no woodpecker user found for forge_id=2 login=$FORGEJO_USER_LOGIN" >&2
echo " (have they ever logged in via Forgejo OAuth?)" >&2
exit 1
fi
# Mint a session JWT (HS256) for that user.
b64() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }
HEADER=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64)
PAYLOAD=$(printf '{"type":"user","user-id":"%s"}' \
"$(kubectl exec -n $NS $PG_POD -- psql -h pg-cluster-rw.dbaas -U "$WP_DB_USER" -d woodpecker -tA -c \
"SELECT id FROM users WHERE login='$FORGEJO_USER_LOGIN' AND forge_id=2" | tr -d '[:space:]')" | b64)
SIG=$(printf '%s.%s' "$HEADER" "$PAYLOAD" | openssl dgst -sha256 -hmac "$VIKTOR_HASH" -binary | b64)
TOKEN="$HEADER.$PAYLOAD.$SIG"
# Sanity check: am I really logged in as viktor?
# `|| true` keeps `set -e` from exiting silently, so the error below is printed.
ME=$(curl -sf "$WP_URL/api/user" -H "Authorization: Bearer $TOKEN" | jq -r '.login' || true)
if [ "$ME" != "$FORGEJO_USER_LOGIN" ]; then
echo "ERROR: minted token authenticates as '$ME', not '$FORGEJO_USER_LOGIN'" >&2
exit 1
fi
echo "Authenticated as: $ME"
# Activate each repo via POST /api/repos?forge_remote_id=N
# Forgejo repo ID is fetched via the Forgejo API.
FORGEJO_AUTH="${FORGEJO_TOKEN:-$(vault kv get -field=forgejo_admin_token secret/viktor 2>/dev/null || true)}"
if [ -z "$FORGEJO_AUTH" ]; then
echo "ERROR: set FORGEJO_TOKEN env or seed secret/viktor/forgejo_admin_token in vault" >&2
exit 1
fi
for repo in "$@"; do
FRID=$(curl -sf "$FORGEJO_URL/api/v1/repos/$repo" -H "Authorization: token $FORGEJO_AUTH" | jq -r .id 2>/dev/null || true)
if [ -z "$FRID" ] || [ "$FRID" = "null" ]; then
echo " $repo: ERROR resolving Forgejo repo id" >&2
continue
fi
HTTP=$(curl -s -X POST "$WP_URL/api/repos?forge_remote_id=$FRID" \
-H "Authorization: Bearer $TOKEN" \
-o /tmp/wp-add-$FRID.json -w "%{http_code}")
case "$HTTP" in
200) echo " $repo: activated (id=$(jq -r .id /tmp/wp-add-$FRID.json))" ;;
409) echo " $repo: already active" ;;
*) echo " $repo: HTTP $HTTP $(cat /tmp/wp-add-$FRID.json)" ;;
esac
rm -f /tmp/wp-add-$FRID.json
done