forgejo retention: revert to DRY_RUN — first live run orphaned OCI indexes [ci skip]

The keep-set (newest 10 versions + latest + *cache* tags) treats
multi-arch/attestation index CHILDREN — separate untagged sha256
versions — as deletable: for images not rebuilt recently they sort
outside the newest-10 window and were pruned while their kept parent
index survived. kms-website :latest and :dfc83fb children 404'd
(RegistryManifestIntegrityFailure, caught by forgejo-integrity-probe
within hours; deployed tag a794d1a unaffected).

Healed: :latest re-pointed at the intact a794d1a index (also the
newest commit), corrupt :dfc83fb version deleted, probe re-run clean
(0 failures / 22 repos / 63 tags / 59 indexes). DRY_RUN=true applied
live. Re-enable only with a container-aware keep-set — options in the
post-mortem.
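
For illustration only, a minimal sketch of option (a) from the code comment
in this change (resolve each kept index's child digests and add them to the
keep set). It is not the actual cleanup script: the Version record,
fetch_manifest callback and in-memory manifests are hypothetical stand-ins
for the Forgejo package list and the registry manifest endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Media types that mark a manifest as an index (multi-arch / attestation parent).
INDEX_MEDIA_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}


@dataclass(frozen=True)
class Version:
    """Hypothetical package-version record (not the real script's model)."""
    name: str     # tag, or the "sha256:..." digest for untagged versions
    digest: str   # manifest digest this version points at
    created: int  # sortable creation timestamp


def naive_keep(versions: list[Version], newest: int = 10) -> set[str]:
    """Version-only keep-set: newest N + 'latest' + *cache* tags."""
    by_age = sorted(versions, key=lambda v: v.created, reverse=True)
    keep = {v.name for v in by_age[:newest]}
    keep |= {v.name for v in versions if v.name == "latest" or "cache" in v.name}
    return keep


def container_aware_keep(
    versions: list[Version],
    fetch_manifest: Callable[[str], Optional[dict]],
    newest: int = 10,
) -> set[str]:
    """Naive keep-set plus every version referenced by a kept index."""
    keep = naive_keep(versions, newest)
    names_by_digest: dict[str, list[str]] = {}
    for v in versions:
        names_by_digest.setdefault(v.digest, []).append(v.name)

    queue = [v.digest for v in versions if v.name in keep]
    seen: set[str] = set()
    while queue:
        digest = queue.pop()
        if digest in seen:
            continue
        seen.add(digest)
        manifest = fetch_manifest(digest)
        if manifest is None or manifest.get("mediaType") not in INDEX_MEDIA_TYPES:
            continue  # plain image manifest: no child manifests to protect
        for child in manifest.get("manifests", []):
            keep.update(names_by_digest.get(child["digest"], []))
            queue.append(child["digest"])  # follow nested indexes, if any
    return keep


# Demo: an old :latest index with one untagged child, plus 10 newer builds.
manifests = {
    "sha256:idx": {
        "mediaType": "application/vnd.oci.image.index.v1+json",
        "manifests": [{"digest": "sha256:child"}],
    },
}
versions = [
    Version("latest", "sha256:idx", 1),
    Version("sha256:child", "sha256:child", 1),
] + [Version(f"build{i}", f"sha256:b{i}", 10 + i) for i in range(10)]

naive = naive_keep(versions)
aware = container_aware_keep(versions, manifests.get)
```

In the demo the version-only keep-set retains :latest but drops its
untagged child (the failure described above), while the container-aware
variant keeps both.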

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor Barzin 2026-06-10 09:22:47 +00:00
parent e49c91e60c
commit a1b7b0ca53
3 changed files with 84 additions and 7 deletions

@@ -22,12 +22,22 @@ data "vault_kv_secret_v2" "forgejo_viktor" {
 }
 locals {
-  # Activated 2026-06-09 after verifying a dry-run delete list against all
-  # running viktor/* images cluster-wide: 0 running images on the delete set
-  # (would prune 317 stale versions, keeping newest 10 + latest + cache tags).
-  # Live retention is what keeps the registry PVC from filling on the HDD
-  # (we deliberately did NOT move Forgejo to SSD; see beads code-oflt).
-  forgejo_cleanup_dry_run = false
+  # REVERTED TO DRY-RUN 2026-06-10: the first live runs ORPHANED OCI indexes.
+  # The keep-set is computed over package VERSIONS (newest 10 + tag "latest"
+  # + *cache* tags), but multi-arch/attestation index CHILDREN are separate
+  # UNTAGGED sha256 versions; for images not rebuilt recently they fall
+  # outside the newest-10 window and get deleted while their parent index is
+  # kept. Result: index children 404 (viktor/kms-website :latest + :dfc83fb,
+  # caught by forgejo-integrity-probe / RegistryManifestIntegrityFailure,
+  # 2026-06-10). Do NOT re-enable until the script either (a) resolves each
+  # kept index's child digests via the registry API and adds them to the
+  # keep set, or (b) skips untagged sha256 versions entirely, or (c) is
+  # replaced by Forgejo's native per-owner package cleanup rules (container-
+  # aware). The 2026-06-09 "0 running images on the delete set" verification
+  # checked running PODS, not index child references: insufficient.
+  # History: activated 2026-06-09 (would prune 317 stale versions); registry
+  # PVC pressure concern remains (HDD, no SSD move; see beads code-oflt).
+  forgejo_cleanup_dry_run = true
 }
 resource "kubernetes_config_map" "forgejo_cleanup_script" {