forgejo: survive CI-build registry-push storms (mem 3Gi + working retention)

Heavy in-cluster builds (e.g. tripit buildkit) were taking Forgejo down via
two vectors. Fixes both, without moving Forgejo off the sdc HDD (code-oflt
deferred):

- Memory 1Gi -> 3Gi (requests=limits). Forgejo was OOMKilled (exit 137) under
  registry-push load; VPA upperBound ~1.5Gi was suppressed by the 1Gi cap it
  kept OOMing against. Size for the push spike.

- Activate registry retention (DRY_RUN false). Verified the delete list
  against all running viktor/* images first: 0 running images affected.
  Pruned 478 -> 161 package versions; PVC was at its 50Gi autoresize ceiling.

- FIX broken retention auth: the cleanup PAT was ci-pusher's, but Forgejo
  scopes container packages per-user, so DELETE on viktor/* returned 403 (the
  dry-run only did GETs, hiding it). Repointed forgejo_cleanup_token to
  viktor's write:package PAT. Retention had never actually worked.

- Protect buildkit *cache* tags from retention (cleanup.sh keep-set) so the
  gentler-builds layer cache survives daily pruning.
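
The 403 went unnoticed because the dry-run never issued a DELETE. As a rough sketch of how a live run could surface that failure loudly: the endpoint path is the one cleanup.sh documents, while FORGEJO_URL, FORGEJO_TOKEN and both helper names are hypothetical, and the 204-on-success status is an assumption about the API rather than something this commit shows.

```shell
#!/bin/sh
# Sketch only: FORGEJO_URL, FORGEJO_TOKEN and the helper names are hypothetical.

# Map the HTTP status of a package DELETE to a short verdict.
classify_delete_status() {
  case "$1" in
    204) echo "deleted" ;;      # assumed success status
    403) echo "forbidden" ;;    # token lacks write access to this owner's packages
    404) echo "not-found" ;;
    *)   echo "unexpected" ;;
  esac
}

# Delete one container package version: delete_version OWNER NAME VERSION
# Returns non-zero on 403 so the caller can abort instead of logging and moving on.
delete_version() {
  status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
    -H "Authorization: token ${FORGEJO_TOKEN}" \
    "${FORGEJO_URL}/api/v1/packages/$1/container/$2/$3")
  result=$(classify_delete_status "$status")
  echo "[$2] $3: HTTP $status ($result)"
  [ "$result" != "forbidden" ]
}
```

Failing hard on the first 403 would have exposed the wrong-owner PAT on the first non-dry run.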

[ci skip] — already applied via scripts/tg.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-09 14:23:33 +00:00
parent 1e6e5c4ee9
commit e0452611b5
4 changed files with 39 additions and 14 deletions

@@ -2,8 +2,13 @@
 # Forgejo container-package retention.
 #
 # For each container package owned by ${FORGEJO_OWNER}, keep newest
-# ${KEEP_LAST_N} versions + always keep tag "latest". Deletes the rest via
+# ${KEEP_LAST_N} versions + always keep tag "latest" + always keep any
+# buildkit cache tag (matches "cache", e.g. tripit:cache — these back
+# --cache-from/--cache-to and must survive retention or every build is a
+# cold rebuild). Deletes the rest via
 # DELETE /api/v1/packages/{owner}/container/{name}/{version}.
+# (Note: an 8-char SHA tag is pure hex and cannot contain "cache" — 'h' is
+# not a hex digit — so the cache match never catches a real image tag.)
 #
 # DRY_RUN=true logs what would be deleted but issues no DELETE calls.
 #
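
The hex argument in the note can be checked mechanically. The sketch below uses two hypothetical helpers (is_sha_tag, matches_cache) that are not part of cleanup.sh; it assumes image tags are exactly 8 lowercase hex characters, as the note states.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of cleanup.sh.

# True if the tag looks like an 8-character lowercase hex SHA prefix.
is_sha_tag() { printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{8}$'; }

# True if the tag contains "cache", case-insensitively (same rule as the jq test).
matches_cache() { printf '%s\n' "$1" | grep -qi 'cache'; }

for tag in cafef00d acac0e00 cache buildCache; do
  if is_sha_tag "$tag" && matches_cache "$tag"; then
    echo "$tag: collision"
  elif matches_cache "$tag"; then
    echo "$tag: kept as cache tag"
  else
    echo "$tag: normal retention"
  fi
done
```

No tag can reach the "collision" branch, because 'h' never appears in a hex string. The guarantee covers only pure SHA tags: any other tag that merely contains the substring, such as a hypothetical nocache-test, would also be protected.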
@@ -72,9 +77,11 @@ for NAME in $NAMES; do
 N_VERSIONS=$(jq 'length' "$TMPDIR/$NAME.json")
 echo "[$NAME] $N_VERSIONS version(s)"
-# Build the keep set: top $KEEP + anything tagged 'latest'.
+# Build the keep set: top $KEEP + always 'latest' + any buildkit cache tag.
 jq -r --argjson keep "$KEEP" '
-[.[0:$keep][].version] + [.[] | select(.version == "latest") | .version]
+[.[0:$keep][].version]
++ [.[] | select(.version == "latest") | .version]
++ [.[] | select(.version | test("cache"; "i")) | .version]
 | unique
 | .[]
 ' "$TMPDIR/$NAME.json" > "$TMPDIR/$NAME.keep"
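
To reason about the new keep rule without a Forgejo instance, here is a jq-free sketch of the same three conditions applied to a plain list of tags, newest first. The helpers keep_set and delete_set are hypothetical, and the delete step is an assumption about how the rest of cleanup.sh consumes the .keep file, which this diff does not show.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the jq filter; input is one tag per line, newest first.

# keep_set N: newest N tags + "latest" + anything containing "cache" (any case),
# de-duplicated and sorted, like jq's `unique`.
keep_set() {
  awk -v keep="$1" '
    NR <= keep            { print; next }   # newest N versions
    $0 == "latest"        { print; next }   # always keep latest
    tolower($0) ~ /cache/ { print; next }   # always keep buildkit cache tags
  ' | LC_ALL=C sort -u
}

# delete_set N: every input tag that is not in the keep set, in input order.
delete_set() {
  all=$(cat)
  kept=$(printf '%s\n' "$all" | keep_set "$1")
  printf '%s\n' "$all" | grep -vxF -e "$kept"
}

printf 'a1b2c3d4\n0badc0de\nlatest\ndeadbeef\ncache\ncafef00d\n' | delete_set 2
```

With a keep count of 2, the sample list retains a1b2c3d4, 0badc0de, latest and cache, leaving deadbeef and cafef00d as delete candidates.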