forgejo: survive CI-build registry-push storms (mem 3Gi + working retention)

Heavy in-cluster builds (e.g. tripit buildkit) were taking Forgejo down via two vectors. Fixes both, without moving Forgejo off the sdc HDD (code-oflt deferred):

- Memory 1Gi -> 3Gi (requests=limits). Forgejo was OOMKilled (exit 137) under registry-push load; VPA upperBound ~1.5Gi was suppressed by the 1Gi cap it kept OOMing against. Size for the push spike.
- Activate registry retention (DRY_RUN false). Verified the delete list against all running viktor/* images first: 0 running images affected. Pruned 478 -> 161 package versions; PVC was at its 50Gi autoresize ceiling.
- FIX broken retention auth: the cleanup PAT was ci-pusher's, but Forgejo scopes container packages per-user, so DELETE on viktor/* returned 403 (the dry-run only did GETs, hiding it). Repointed forgejo_cleanup_token to viktor's write:package PAT. Retention had never actually worked.
- Protect buildkit *cache* tags from retention (cleanup.sh keep-set) so the gentler-builds layer cache survives daily pruning.

[ci skip] - already applied via scripts/tg.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
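
The auth bug stayed hidden because the dry-run never issued a DELETE. As a minimal sketch of how a cleanup script could make that failure loud, assume the script captures the HTTP status of each DELETE (for example with curl's `-w '%{http_code}'`). The helper name `classify_delete_status` and the status mapping are illustrative assumptions, not the contents of cleanup.sh.

```shell
#!/bin/sh
# Hypothetical helper (not taken from cleanup.sh): map the HTTP status of a
# package DELETE to an outcome. Returns non-zero whenever the status means
# retention did not actually happen, so the caller can fail the job.
classify_delete_status() {
  case "$1" in
    2??)     echo "deleted" ;;                 # any 2xx: version removed
    404)     echo "already-gone" ;;            # nothing left to delete
    401|403) echo "auth-failure"; return 1 ;;  # wrong token owner or scope
    *)       echo "unexpected-$1"; return 1 ;; # treat anything else as failure
  esac
}

# Assumed usage inside a delete loop; $url and $TOKEN are placeholders:
#   status=$(curl -s -o /dev/null -w '%{http_code}' -X DELETE \
#              -H "Authorization: token $TOKEN" "$url")
#   classify_delete_status "$status" || exit 1
```

Exiting non-zero on 401/403 would mark the CronJob run as failed instead of letting a token mismatch pass as a successful run.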
parent 1e6e5c4ee9
commit e0452611b5
4 changed files with 39 additions and 14 deletions

@@ -4,9 +4,17 @@
 # it's per-user runtime state inside the Forgejo DB. Driving retention from
 # a CronJob hitting the public API keeps the policy versioned in this repo.
 #
-# Auth: a write:package PAT belonging to ci-pusher (same user that pushes
-# from CI). DELETE on packages requires write:package scope. PAT lives in
-# Vault at secret/viktor/forgejo_cleanup_token.
+# Auth: a write:package PAT belonging to VIKTOR (the package OWNER). PAT
+# lives in Vault at secret/viktor/forgejo_cleanup_token.
+#
+# CORRECTION 2026-06-09: this previously said the PAT belonged to ci-pusher.
+# That was wrong and silently broke retention — Forgejo container packages
+# are scoped per-user, so ci-pusher gets HTTP 403 on DELETE of viktor/*
+# (the dry-run only does GETs, which DO work, so the 403 stayed hidden until
+# the first live run). DELETE requires a write:package PAT owned by viktor.
+# forgejo_cleanup_token is therefore set to viktor's write:package PAT (today
+# the same value as secret/ci/global/forgejo_push_token). IF that push token
+# is ever regenerated, re-mirror it here or retention silently 403s again.
 
 data "vault_kv_secret_v2" "forgejo_viktor" {
   mount = "secret"
@@ -14,8 +22,12 @@ data "vault_kv_secret_v2" "forgejo_viktor" {
 }
 
 locals {
-  # Flip to false after first 7 days of dry-run logs look correct.
-  forgejo_cleanup_dry_run = true
+  # Activated 2026-06-09 after verifying a dry-run delete list against all
+  # running viktor/* images cluster-wide: 0 running images on the delete set
+  # (would prune 317 stale versions, keeping newest 10 + latest + cache tags).
+  # Live retention is what keeps the registry PVC from filling on the HDD
+  # (we deliberately did NOT move Forgejo to SSD — see beads code-oflt).
+  forgejo_cleanup_dry_run = false
 }
 
 resource "kubernetes_config_map" "forgejo_cleanup_script" {
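
The keep-set described in the comment above (newest 10 + latest + cache tags) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual cleanup.sh: it assumes version names arrive on stdin newest first, that cache tags can be recognised by the substring `cache`, and that protected tags do not count toward the newest-N window. The function name `select_deletions` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the retention keep-set. Reads version names on stdin
# (assumed newest first) and prints only the versions that would be deleted.
# $1 is the number of newest unprotected versions to keep (default 10).
select_deletions() {
  keep="${1:-10}"
  kept=0
  while IFS= read -r version; do
    case "$version" in
      latest|*cache*) continue ;;   # assumed protected: never deleted
    esac
    if [ "$kept" -lt "$keep" ]; then
      kept=$((kept + 1))            # still inside the newest-N window
      continue
    fi
    printf '%s\n' "$version"        # older than the window: delete candidate
  done
}
```

For example, `printf '%s\n' latest v5 buildcache v4 v3 | select_deletions 2` would list only `v3`. Comparing such a list against the images currently running in the cluster is the kind of check the commit describes doing before flipping `forgejo_cleanup_dry_run` to false.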