forgejo: survive CI-build registry-push storms (mem 3Gi + working retention)

Heavy in-cluster builds (e.g. tripit buildkit) were taking Forgejo down via two vectors: OOMKills under registry-push load, and the registry PVC filling up against its 50Gi autoresize ceiling. Fixes both without moving Forgejo off the sdc HDD (code-oflt deferred):

- Memory 1Gi -> 3Gi (requests=limits). Forgejo was OOMKilled (exit 137) under registry-push load; the VPA upperBound of ~1.5Gi was suppressed by the 1Gi cap it kept OOMing against, so size for the push spike instead.
- Activate registry retention (DRY_RUN=false). Verified the delete list against all running viktor/* images first: 0 running images affected. Pruned 478 -> 161 package versions; the PVC was at its 50Gi autoresize ceiling.
- FIX broken retention auth: the cleanup PAT was ci-pusher's, but Forgejo scopes container packages per-user, so DELETE on viktor/* returned 403 (the dry-run only did GETs, which hid it). Repointed forgejo_cleanup_token to viktor's write:package PAT. Retention had never actually worked.
- Protect buildkit *cache* tags from retention (cleanup.sh keep-set) so the gentler-builds layer cache survives daily pruning.

[ci skip]: already applied via scripts/tg.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Date: 2026-06-09 14:23:33 +00:00
commit c5bda77731
parent fd0f4a0365
4 changed files with 39 additions and 14 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -38,7 +38,7 @@ Violations cause state drift, which causes future applies to break or silently r
  - **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`. Smoke-test target: `echo.viktorbarzin.me` (auth=public, header-reflecting backend).
 - **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://<backend>.<ns>.svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct, blog has `/net-diag.sh` direct), declare a second `ingress_factory` with `ingress_path = ["/<path>"]` pointing at the bare backend service. Active on: blog (except `/net-diag.sh`), www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.
 - **Docker images**: Always build for `linux/amd64`. SHA-tag rule is being phased out — see `docs/plans/2026-05-16-auto-upgrade-apps-{design,plan}.md`. New model: CI pushes `:latest` (optionally also `:<8-char-sha>` for traceability), Keel polls and triggers rollouts. Cache-staleness concern from the old rule is resolved at the nginx layer (URL-split — manifests pass through, blobs cached). Until Phase 1 of the migration completes (per the plan), follow the SHA-tag rule for new services to match existing pattern.
-- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. Containerd `hosts.toml` on every node redirects to in-cluster Traefik LB `10.0.20.203` (with `skip_verify = true`, since the node dials Traefik by IP but the cert is for `forgejo.viktorbarzin.me`) to avoid hairpin NAT. That redirect covers **kubelet pulls** only — in-cluster pods (notably Woodpecker buildkit build pods pushing images) resolve `forgejo.viktorbarzin.me` via a CoreDNS `rewrite name exact ... traefik.traefik.svc.cluster.local` (Corefile in `stacks/technitium/modules/technitium/main.tf`), since they do NOT use the node containerd mirror; without it, buildkit pushes intermittently timed out on the public-IP hairpin (added 2026-06-04, beads code-yh33). **Was `.200` until 2026-06-01** — Traefik's 2026-05-30 move to its dedicated `.203` left this redirect pointing at the now-dead `.200:443`, silently breaking every *fresh* forgejo pull (cached images kept running, so it stayed hidden until a new image tag was pulled). Redirect source lives in `modules/create-template-vm/k8s-node-containerd-setup.sh` (new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing nodes). Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest`; integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
+- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. Containerd `hosts.toml` on every node redirects to in-cluster Traefik LB `10.0.20.203` (with `skip_verify = true`, since the node dials Traefik by IP but the cert is for `forgejo.viktorbarzin.me`) to avoid hairpin NAT. That redirect covers **kubelet pulls** only — in-cluster pods (notably Woodpecker buildkit build pods pushing images) resolve `forgejo.viktorbarzin.me` via a CoreDNS `rewrite name exact ... traefik.traefik.svc.cluster.local` (Corefile in `stacks/technitium/modules/technitium/main.tf`), since they do NOT use the node containerd mirror; without it, buildkit pushes intermittently timed out on the public-IP hairpin (added 2026-06-04, beads code-yh33). **Was `.200` until 2026-06-01** — Traefik's 2026-05-30 move to its dedicated `.203` left this redirect pointing at the now-dead `.200:443`, silently breaking every *fresh* forgejo pull (cached images kept running, so it stayed hidden until a new image tag was pulled). Redirect source lives in `modules/create-template-vm/k8s-node-containerd-setup.sh` (new nodes) and `scripts/setup-forgejo-containerd-mirror.sh` (existing nodes). Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest` + any buildkit `*cache*` tag (so `--cache-from`/`--cache-to` refs survive retention — added 2026-06-09); **went live (DRY_RUN=false) 2026-06-09** after verifying 0 running images on the delete set — the registry PVC is at its 50Gi autoresize ceiling on the HDD (we did NOT move it to SSD, see beads code-oflt), so live retention is what keeps it from filling. Integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
 - **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts.
 - **Node memory changes**: When changing VM memory on any k8s node, update kubelet `systemReserved`, `kubeReserved`, and eviction thresholds accordingly. Config: `/var/lib/kubelet/config.yaml`. Template: `stacks/infra/main.tf`. Current values: systemReserved=512Mi, kubeReserved=512Mi, evictionHard=500Mi, evictionSoft=1Gi.
 - **Node OS disk tuning** (in `stacks/infra/main.tf`): kubelet `imageGCHighThresholdPercent=70` (was 85), `imageGCLowThresholdPercent=60` (was 80), ext4 `commit=60` in fstab (was default 5s), journald `SystemMaxUse=200M` + `MaxRetentionSec=3day`.
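The kubelet-pull redirect described in the registry entry is a per-node containerd `hosts.toml`. Below is a minimal sketch of writing one, assuming containerd's registry `config_path` points at the directory passed in (commonly `/etc/containerd/certs.d`). It is not the actual contents of `scripts/setup-forgejo-containerd-mirror.sh`; `write_forgejo_mirror` is a hypothetical name, and the `capabilities` list is an assumption (pushes from pods do not go through the node mirror, so `push` is left out).

```sh
#!/bin/sh
# write_forgejo_mirror CONF_DIR
# Writes CONF_DIR/forgejo.viktorbarzin.me/hosts.toml so containerd (and thus
# kubelet pulls) reaches Forgejo via the in-cluster Traefik LB by IP instead
# of hairpinning through the public IP. skip_verify is required because the
# node dials 10.0.20.203 while Traefik serves a cert for forgejo.viktorbarzin.me.
# Assumption: CONF_DIR is containerd's registry config_path.
write_forgejo_mirror() {
  dir="$1/forgejo.viktorbarzin.me"
  mkdir -p "$dir"
  cat > "$dir/hosts.toml" <<'EOF'
server = "https://forgejo.viktorbarzin.me"

[host."https://10.0.20.203"]
  capabilities = ["pull", "resolve"]
  skip_verify = true
EOF
}
```

On a node this would be called as `write_forgejo_mirror /etc/containerd/certs.d`. Keeping the LB IP in one place matters here: the `.200` -> `.203` move broke fresh pulls precisely because a stale IP lived in this file.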
--- a/stacks/forgejo/cleanup.tf
+++ b/stacks/forgejo/cleanup.tf
@@ -4,9 +4,17 @@
 # it's per-user runtime state inside the Forgejo DB. Driving retention from
 # a CronJob hitting the public API keeps the policy versioned in this repo.
 #
-# Auth: a write:package PAT belonging to ci-pusher (same user that pushes
-# from CI). DELETE on packages requires write:package scope. PAT lives in
-# Vault at secret/viktor/forgejo_cleanup_token.
+# Auth: a write:package PAT belonging to VIKTOR (the package OWNER). PAT
+# lives in Vault at secret/viktor/forgejo_cleanup_token.
+#
+# CORRECTION 2026-06-09: this previously said the PAT belonged to ci-pusher.
+# That was wrong and silently broke retention — Forgejo container packages
+# are scoped per-user, so ci-pusher gets HTTP 403 on DELETE of viktor/*
+# (the dry-run only does GETs, which DO work, so the 403 stayed hidden until
+# the first live run). DELETE requires a write:package PAT owned by viktor.
+# forgejo_cleanup_token is therefore set to viktor's write:package PAT (today
+# the same value as secret/ci/global/forgejo_push_token). IF that push token
+# is ever regenerated, re-mirror it here or retention silently 403s again.

 data "vault_kv_secret_v2" "forgejo_viktor" {
  mount = "secret"
@@ -14,8 +22,12 @@ data "vault_kv_secret_v2" "forgejo_viktor" {
 }

 locals {
-  # Flip to false after first 7 days of dry-run logs look correct.
-  forgejo_cleanup_dry_run = true
+  # Activated 2026-06-09 after verifying a dry-run delete list against all
+  # running viktor/* images cluster-wide: 0 running images on the delete set
+  # (would prune 317 stale versions, keeping newest 10 + latest + cache tags).
+  # Live retention is what keeps the registry PVC from filling on the HDD
+  # (we deliberately did NOT move Forgejo to SSD — see beads code-oflt).
+  forgejo_cleanup_dry_run = false
 }

 resource "kubernetes_config_map" "forgejo_cleanup_script" {
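The activation comment rests on one check: no entry in the dry-run delete list is still referenced by a running pod. Below is a sketch of that cross-check, under two assumptions: the delete list has already been extracted into a file of `<name>:<version>` lines (the dry-run log format is not shown in this diff, so that step is omitted), and running images are referenced as `forgejo.viktorbarzin.me/viktor/<name>:<tag>` rather than by digest. `running_on_delete_set` is a hypothetical helper.

```sh
#!/bin/sh
# running_on_delete_set DELETE_FILE RUNNING_FILE
#   DELETE_FILE : one "<name>:<version>" per line (the would-be-deleted set)
#   RUNNING_FILE: one full image ref per line, e.g. collected with
#     kubectl get pods -A -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{range .spec.initContainers[*]}{.image}{"\n"}{end}{end}'
# Prints each delete-set entry that a running image still references, once.
# Empty output means no running image would lose its tag (an empty
# DELETE_FILE also yields empty output). Digest-pinned refs are not matched.
running_on_delete_set() {
  awk -v prefix='forgejo.viktorbarzin.me/viktor/' '
    # First file: load the delete set.
    NR == FNR { del[$0] = 1; next }
    # Second file: only refs into the viktor/* namespace are relevant.
    index($0, prefix) == 1 {
      ref = substr($0, length(prefix) + 1)
      if ((ref in del) && !(ref in seen)) { print ref; seen[ref] = 1 }
    }
  ' "$1" "$2"
}
```

Flipping `forgejo_cleanup_dry_run` to `false` is only safe when this prints nothing; `:latest` refs never match because `latest` is always in the keep set.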
--- a/stacks/forgejo/files/cleanup.sh
+++ b/stacks/forgejo/files/cleanup.sh
@@ -2,8 +2,13 @@
 # Forgejo container-package retention.
 #
 # For each container package owned by ${FORGEJO_OWNER}, keep newest
-# ${KEEP_LAST_N} versions + always keep tag "latest". Deletes the rest via
+# ${KEEP_LAST_N} versions + always keep tag "latest" + always keep any
+# buildkit cache tag (matches "cache", e.g. tripit:cache — these back
+# --cache-from/--cache-to and must survive retention or every build is a
+# cold rebuild). Deletes the rest via
 # DELETE /api/v1/packages/{owner}/container/{name}/{version}.
+# (Note: an 8-char SHA tag is pure hex and cannot contain "cache" — 'h' is
+#  not a hex digit — so the cache match never catches a real image tag.)
 #
 # DRY_RUN=true logs what would be deleted but issues no DELETE calls.
 #
@@ -72,9 +77,11 @@ for NAME in $NAMES; do
  N_VERSIONS=$(jq 'length' "$TMPDIR/$NAME.json")
  echo "[$NAME] $N_VERSIONS version(s)"

-  # Build the keep set: top $KEEP + anything tagged 'latest'.
+  # Build the keep set: top $KEEP + always 'latest' + any buildkit cache tag.
  jq -r --argjson keep "$KEEP" '
-    [.[0:$keep][].version] + [.[] | select(.version == "latest") | .version]
+    [.[0:$keep][].version]
+    + [.[] | select(.version == "latest") | .version]
+    + [.[] | select(.version | test("cache"; "i")) | .version]
    | unique
    | .[]
  ' "$TMPDIR/$NAME.json" > "$TMPDIR/$NAME.keep"
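The keep-set rule in the jq filter can be restated without jq to make the decision per version explicit. This sketch assumes, as the jq slice `.[0:$keep]` does, that versions arrive newest first; `plan_retention` is a hypothetical name, and it only prints a plan, issuing no DELETE calls.

```sh
#!/bin/sh
# plan_retention N
#   stdin : package versions, one per line, NEWEST FIRST
#   stdout: "KEEP <version>" or "DELETE <version>" per input line, in order
# Kept if among the first N, exactly "latest", or containing "cache" in any
# letter case (same three clauses as the jq keep-set).
plan_retention() {
  awk -v keep="$1" '
    (NR <= keep + 0) || ($0 == "latest") || (tolower($0) ~ /cache/) {
      print "KEEP " $0
      next
    }
    { print "DELETE " $0 }
  '
}
```

Piping `a1b2c3d4 e5f6a7b8 latest 0badf00d BuildCache 12345678` (one per line) through `plan_retention 2` keeps the first two, `latest`, and `BuildCache`, and marks `0badf00d` and `12345678` for deletion. The hex-only SHA tags can never hit the `cache` clause, since `h` is not a hex digit.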
--- a/stacks/forgejo/main.tf
+++ b/stacks/forgejo/main.tf
@@ -9,7 +9,7 @@ resource "kubernetes_namespace" "forgejo" {
    name = "forgejo"
    labels = {
      "istio-injection" : "disabled"
-      tier = local.tiers.edge
+      tier               = local.tiers.edge
      "keel.sh/enrolled" = "true"
    }
  }
@@ -94,7 +94,7 @@ resource "kubernetes_deployment" "forgejo" {
          fs_group = 1000
        }
        container {
-          name  = "forgejo"
+          name = "forgejo"
          # Pinned to 11.0.14 (latest 11.x as of 2026-05-12) — was on
          # floating `:11`. On 2026-05-24T15:35:37Z Keel force-policy
          # rewrote the tag from `11.0.14 → 1.18` (Gitea-era Forgejo
@@ -168,13 +168,19 @@ resource "kubernetes_deployment" "forgejo" {
            name       = "data"
            mount_path = "/data"
          }
+          # Bumped 1Gi -> 3Gi 2026-06-09: Forgejo was OOMKilled (exit 137)
+          # under registry-push load from in-cluster CI builds (tripit
+          # buildkit pushes large layers into the OCI registry). VPA
+          # upperBound reads ~1.5Gi, but that's suppressed by the 1Gi cap it
+          # kept OOMing against — size for the push spike, not steady-state.
+          # Memory requests=limits per the repo memory convention (no CPU limit, so QoS is Burstable).
          resources {
            requests = {
              cpu    = "15m"
-              memory = "1Gi"
+              memory = "3Gi"
            }
            limits = {
-              memory = "1Gi"
+              memory = "3Gi"
            }
          }
          port {
@@ -202,7 +208,7 @@ resource "kubernetes_deployment" "forgejo" {
      metadata[0].annotations["keel.sh/match-tag"],
      metadata[0].annotations["keel.sh/trigger"],
      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
-      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
+      spec[0].template[0].spec[0].container[0].image,  # KEEL_IGNORE_IMAGE — Keel manages tag updates
      metadata[0].annotations["kubernetes.io/change-cause"],
      metadata[0].annotations["deployment.kubernetes.io/revision"],
      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"],
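Pinning memory requests to limits does not by itself give Guaranteed QoS. Kubernetes assigns Guaranteed only when every container in the pod has both CPU and memory limits, with each request equal to its limit; an unset request defaults to its limit. Below is a single-container sketch of that rule. `qos_class` is a hypothetical helper, and quantities are compared as plain strings, so `3Gi` and `3072Mi` would wrongly count as different.

```sh
#!/bin/sh
# qos_class CPU_REQ CPU_LIM MEM_REQ MEM_LIM   (pass '' for "unset")
# Single-container sketch of the Kubernetes pod QoS classification.
qos_class() {
  cpu_req=$1 cpu_lim=$2 mem_req=$3 mem_lim=$4
  # An unset request defaults to the limit when only the limit is given.
  : "${cpu_req:=$cpu_lim}" "${mem_req:=$mem_lim}"
  if [ -z "$cpu_req$cpu_lim$mem_req$mem_lim" ]; then
    echo BestEffort               # nothing requested or limited
  elif [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
       [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo Guaranteed               # both resources limited, requests == limits
  else
    echo Burstable
  fi
}
```

With the values in this diff (CPU request `15m`, no CPU limit, memory `3Gi`/`3Gi`), `qos_class 15m '' 3Gi 3Gi` prints `Burstable`. Memory is still pinned, which is what protects against the OOM path; reaching Guaranteed would additionally require a CPU limit equal to the CPU request.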