kms: replace inline ConfigMap nginx with custom Hugo image

The kms-web-page deployment now pulls forgejo.viktorbarzin.me/viktor/kms-website:${var.image_tag} (source in the new Forgejo repo viktor/kms-website). The ConfigMap-mounted index.html is gone — the new site is a Hugo build with full GVLK catalog for every Microsoft KMS-eligible Windows + Office edition, copy-to-clipboard, dark/light themes. The container image tag is managed by CI (kubectl set image), so add lifecycle ignore_changes on container[0].image alongside the existing dns_config (Kyverno) ignore. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
f1-stream: Stremio addon extractor — TvVoo + StremVerse Sky F1 / DAZN F1
2026-05-07 23:29:35 +00:00 · 2026-05-07 23:29:35 +00:00 · 2026-05-07 23:29:35 +00:00 · 2026-05-07 23:29:35 +00:00 · 2026-05-07 23:29:35 +00:00 · 2026-05-07 23:29:35 +00:00
75 changed files with 9635 additions and 2415 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@ -30,7 +30,7 @@ Violations cause state drift, which causes future applies to break or silently r
 - **New service**: Use `setup-project` skill for full workflow
 - **Ingress**: `ingress_factory` module. Auth: `protected = true`. Anti-AI: on by default. **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`.
 - **Docker images**: Always build for `linux/amd64`. Use 8-char git SHA tags — `:latest` causes stale pull-through cache.
- **Private registry**: `registry.viktorbarzin.me` (htpasswd auth, credentials in Vault `secret/viktor`). Use `image: registry.viktorbarzin.me/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the secret to all namespaces. Build & push from registry VM (`10.0.20.10`). Containerd `hosts.toml` redirects pulls to LAN IP directly. Web UI at `docker.viktorbarzin.me` (Authentik-protected). Engine pinned to `registry:2.8.3` (see post-mortem 2026-04-19); on-VM configs deploy via `.woodpecker/registry-config-sync.yml`; integrity probed every 15m by `registry-integrity-probe` CronJob in `monitoring` ns — the HTTP API is the authoritative integrity check, NOT `/blobs/*/data` presence (revision-link absence is the real failure mode).
+- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. Containerd `hosts.toml` on every node redirects to in-cluster Traefik LB `10.0.20.200` to avoid hairpin NAT. Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest`; integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
 - **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts.
 - **Node memory changes**: When changing VM memory on any k8s node, update kubelet `systemReserved`, `kubeReserved`, and eviction thresholds accordingly. Config: `/var/lib/kubelet/config.yaml`. Template: `stacks/infra/main.tf`. Current values: systemReserved=512Mi, kubeReserved=512Mi, evictionHard=500Mi, evictionSoft=1Gi.
 - **Node OS disk tuning** (in `stacks/infra/main.tf`): kubelet `imageGCHighThresholdPercent=70` (was 85), `imageGCLowThresholdPercent=60` (was 80), ext4 `commit=60` in fstab (was default 5s), journald `SystemMaxUse=200M` + `MaxRetentionSec=3day`.
--- a/.claude/reference/service-catalog.md
+++ b/.claude/reference/service-catalog.md
@ -45,7 +45,8 @@
 | nextcloud | File sync/share | nextcloud |
 | calibre | E-book management (may be merged into ebooks stack) | calibre |
 | onlyoffice | Document editing | onlyoffice |
-| f1-stream | F1 streaming | f1-stream |
+| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier) | f1-stream |
+| chrome-service | Headed Chromium WebSocket pool (`ws://chrome-service.chrome-service.svc:3000/<token>`) for sibling services driving anti-bot embeds | chrome-service |
 | rybbit | Analytics | rybbit |
 | isponsorblocktv | SponsorBlock for TV | isponsorblocktv |
 | actualbudget | Budgeting (factory pattern) | actualbudget |
--- a/.woodpecker/build-ci-image.yml
+++ b/.woodpecker/build-ci-image.yml
@ -14,104 +14,72 @@ steps:
  - name: build-and-push
    image: woodpeckerci/plugin-docker-buildx
    settings:
-      repo: registry.viktorbarzin.me:5050/infra-ci
+      # Phase 4 of forgejo-registry-consolidation 2026-05-07 —
+      # registry.viktorbarzin.me dropped, Forgejo is the only target.
+      repo:
+        - forgejo.viktorbarzin.me/viktor/infra-ci
      dockerfile: ci/Dockerfile
      context: ci/
      tags:
        - latest
        - "${CI_COMMIT_SHA:0:8}"
      platforms: linux/amd64
-      registry: registry.viktorbarzin.me:5050
      logins:
-        - registry: registry.viktorbarzin.me:5050
+        - registry: forgejo.viktorbarzin.me
          username:
-            from_secret: registry_user
+            from_secret: forgejo_user
          password:
-            from_secret: registry_password
+            from_secret: forgejo_push_token

-  # Post-push integrity check. Re-resolves the image we just pushed and HEADs
-  # every blob it references — top-level manifest (index or single), each child
-  # platform manifest, each config blob, each layer blob. If any returns !=200
-  # the pipeline fails loudly here so we never ship a broken index downstream.
-  # Historical context: 2026-04-13 and 2026-04-19 incidents both shipped indexes
-  # whose platform/attestation children had been GC-orphaned on the registry VM.
-  - name: verify-integrity
+  # Post-push integrity check is now redundant with the every-15min
+  # forgejo-integrity-probe in stacks/monitoring/, which walks
+  # /v2/_catalog + HEADs every blob across the entire Forgejo registry.
+  # If a corruption pattern emerges that the periodic probe misses,
+  # restore a verify step similar to the pre-Phase-4 version (see
+  # commit 49f4956f) but pointed at forgejo.viktorbarzin.me.
+
+  # Break-glass tarball: save the just-pushed infra-ci image to disk on the
+  # registry VM (10.0.20.10) so we can `docker load` it back into a node
+  # when Forgejo is unreachable. Pulls from Forgejo (the only registry now).
+  # Best-effort — failure here doesn't fail the pipeline.
+  # Recovery procedure: docs/runbooks/forgejo-registry-breakglass.md.
+  - name: breakglass-tarball
    image: alpine:3.20
+    failure: ignore
    environment:
-      REG_USER:
-        from_secret: registry_user
-      REG_PASS:
-        from_secret: registry_password
+      REGISTRY_SSH_KEY:
+        from_secret: registry_ssh_key
+      FORGEJO_USER:
+        from_secret: forgejo_user
+      FORGEJO_PASS:
+        from_secret: forgejo_push_token
    commands:
-      - apk add --no-cache curl jq
-      - REG=registry.viktorbarzin.me:5050
-      - REPO=infra-ci
+      - apk add --no-cache openssh-client
+      - mkdir -p ~/.ssh && chmod 700 ~/.ssh
+      - printf '%s\n' "$REGISTRY_SSH_KEY" > ~/.ssh/id_ed25519
+      - chmod 600 ~/.ssh/id_ed25519
+      - ssh-keyscan -t ed25519 10.0.20.10 >> ~/.ssh/known_hosts 2>/dev/null
      - SHA=${CI_COMMIT_SHA:0:8}
-      - AUTH="$REG_USER:$REG_PASS"
      - |
-        set -euo pipefail
-        ACCEPT='Accept: application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json,application/vnd.docker.distribution.manifest.list.v2+json,application/vnd.docker.distribution.manifest.v2+json'
-
-        fetch_manifest() {
-          # Prints the body to $2, returns the HTTP code as stdout.
-          curl -sk -u "$AUTH" -H "$ACCEPT" \
-            -o "$2" -w '%{http_code}' \
-            "https://$REG/v2/$REPO/manifests/$1"
-        }
-        head_blob() {
-          curl -sk -u "$AUTH" -o /dev/null -w '%{http_code}' \
-            -I "https://$REG/v2/$REPO/blobs/$1"
-        }
-
-        verify_single_manifest() {
-          local ref="$1" tmp=/tmp/m-$$.json
-          local rc cfg
-          rc=$(fetch_manifest "$ref" "$tmp")
-          if [ "$rc" != "200" ]; then
-            echo "FAIL: manifest $ref returned HTTP $rc"; return 1
-          fi
-          cfg=$(jq -r '.config.digest // empty' "$tmp")
-          if [ -n "$cfg" ]; then
-            rc=$(head_blob "$cfg")
-            [ "$rc" = "200" ] || { echo "FAIL: config blob $cfg returned HTTP $rc"; return 1; }
-          fi
-          jq -r '.layers[]?.digest' "$tmp" > /tmp/layers-$$.txt
-          while IFS= read -r layer; do
-            [ -z "$layer" ] && continue
-            rc=$(head_blob "$layer")
-            [ "$rc" = "200" ] || { echo "FAIL: layer blob $layer returned HTTP $rc"; return 1; }
-          done < /tmp/layers-$$.txt
-          return 0
-        }
-
-        echo "=== Verifying push integrity for $REPO:$SHA ==="
-        TOP=/tmp/top-$$.json
-        rc=$(fetch_manifest "$SHA" "$TOP")
-        [ "$rc" = "200" ] || { echo "FAIL: top manifest :$SHA returned HTTP $rc"; exit 1; }
-
-        MT=$(jq -r '.mediaType // empty' "$TOP")
-        echo "Top-level media type: ${MT:-<unset>}"
-
-        if echo "$MT" | grep -Eq 'manifest\.list|image\.index'; then
-          jq -r '.manifests[].digest' "$TOP" > /tmp/children-$$.txt
-          echo "Multi-platform index: $(wc -l </tmp/children-$$.txt) child manifest(s)"
-          while IFS= read -r d; do
-            echo "--- child $d ---"
-            verify_single_manifest "$d" || exit 1
-          done < /tmp/children-$$.txt
-        else
-          echo "Single-platform manifest — verifying directly"
-          verify_single_manifest "$SHA" || exit 1
-        fi
-
-        echo "=== All manifests + blobs verified. Push integrity intact. ==="
+        ssh -n -o BatchMode=yes root@10.0.20.10 "
+          set -e
+          mkdir -p /opt/registry/data/private/_breakglass
+          IMAGE=forgejo.viktorbarzin.me/viktor/infra-ci:$SHA
+          echo \$FORGEJO_PASS | docker login forgejo.viktorbarzin.me -u \$FORGEJO_USER --password-stdin
+          docker pull \$IMAGE
+          docker save \$IMAGE | gzip > /opt/registry/data/private/_breakglass/infra-ci-$SHA.tar.gz
+          ln -sfn infra-ci-$SHA.tar.gz /opt/registry/data/private/_breakglass/infra-ci-latest.tar.gz
+          ls -t /opt/registry/data/private/_breakglass/infra-ci-*.tar.gz \
+            | grep -v 'latest' | tail -n +6 | xargs -r rm -v
+          ls -lh /opt/registry/data/private/_breakglass/
+        "

  - name: slack
    image: curlimages/curl
    commands:
      - |
        curl -s -X POST -H 'Content-type: application/json' \
-          --data "{\"text\":\"CI image built: registry.viktorbarzin.me:5050/infra-ci:${CI_COMMIT_SHA:0:8}\"}" \
+          --data "{\"text\":\"CI image built: forgejo.viktorbarzin.me/viktor/infra-ci:${CI_COMMIT_SHA:0:8} (and registry-private mirror)\"}" \
          "$SLACK_WEBHOOK" || true
    environment:
      SLACK_WEBHOOK:
--- a/.woodpecker/default.yml
+++ b/.woodpecker/default.yml
@ -25,7 +25,7 @@ clone:

 steps:
  - name: apply
-    image: registry.viktorbarzin.me/infra-ci:latest
+    image: forgejo.viktorbarzin.me/viktor/infra-ci:latest
    pull: true
    backend_options:
      kubernetes:
--- a/.woodpecker/drift-detection.yml
+++ b/.woodpecker/drift-detection.yml
@ -14,7 +14,7 @@ clone:

 steps:
  - name: detect-drift
-    image: registry.viktorbarzin.me/infra-ci:latest
+    image: forgejo.viktorbarzin.me/viktor/infra-ci:latest
    pull: true
    backend_options:
      kubernetes:
--- a/docs/architecture/chrome-service.md
+++ b/docs/architecture/chrome-service.md
@ -0,0 +1,136 @@
+# chrome-service — In-cluster headed Chromium pool
+
+## Overview
+
+`chrome-service` is a single-replica, persistent-profile, bearer-token-gated
+Playwright **launch-server** that exposes a headed Chromium browser over a
+WebSocket. Sibling services connect to it instead of running their own
+in-process Chromium when the upstream's anti-bot tooling
+(`disable-devtool.js` redirect-to-google trap, console-clear timing tricks,
+`navigator.webdriver` checks) defeats a headless browser.
+
+Initial caller: `f1-stream`'s `playback_verifier`. Future callers attach
+via the WS+token contract documented in `stacks/chrome-service/README.md`.
+
+## Why a separate stack
+
+In-process Chromium inside `f1-stream`:
+
+- Runs **headless** by default (no `Xvfb`/`DISPLAY`).
+- Has the `HeadlessChromium/...` UA suffix and `navigator.webdriver === true`.
+- Trips `disable-devtool.js`'s **Performance** detector — Playwright's CDP
+  adds latency to `console.log(largeArray)` vs `console.table(largeArray)`,
+  which the lib reads as "DevTools is open" and redirects to
+  `https://www.google.com/`.
+
+`chrome-service` solves this by:
+
+1. Running **headed** under `Xvfb :99` (via `playwright launch-server` with
+   a JSON config that pins `headless: false`).
+2. Living in a long-lived pod so JIT browser launch latency disappears.
+3. Allowing a per-context init script
+   (`stacks/chrome-service/files/stealth.js` ~ 40 lines, vendored from
+   `puppeteer-extra-plugin-stealth`) to spoof `webdriver`, `chrome.runtime`,
+   `plugins`, `languages`, `Permissions.query`, WebGL renderer strings, and
+   to hide the `disable-devtool-auto` script-tag attribute so the lib's
+   IIFE exits early.
+
+## Wire protocol
+
+```text
+                  ws://chrome-service.chrome-service.svc.cluster.local:3000/<TOKEN>
+                                            │
+            ┌───────────────────────────────┼───────────────────────────────┐
+            │ caller pod                    │                  chrome-service pod
+            │  (e.g. f1-stream)             │                  (single replica)
+            │                               │
+            │  CHROME_WS_URL  ──────────────┘
+            │  CHROME_WS_TOKEN ─── from `secret/chrome-service.api_bearer_token` (ESO)
+            │
+            │  await chromium.connect(f"{ws}/{token}")
+            │  await ctx.add_init_script(STEALTH_JS)
+            │  page.goto("https://upstream.com/embed/...")
+            │
+            └─── ←── pages render under Xvfb, headed Chromium ──── ─────────┘
+```
+
+## Image pin
+
+Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
+`stacks/chrome-service/main.tf`) and the Python client
+(`playwright==1.48.0` in callers' `requirements.txt`) **must match
+minor-versions**. Bump in lockstep — Playwright protocol changes between
+minors and the client cannot connect to a mismatched server.
+
+The Microsoft image ships only the browser binaries, not the `playwright`
+npm SDK; the start command runs `npx -y playwright@1.48.0 launch-server`
+which downloads the SDK on first start (cached under `$HOME/.npm` via the
+PVC) and reuses it on subsequent restarts.
+
+## Storage
+
+- **`chrome-service-profile-encrypted`** (PVC, 2Gi → 10Gi autoresize,
+  `proxmox-lvm-encrypted`) — Chromium user-data dir + npm cache.
+  Encrypted because cookies/localStorage may include third-party auth tokens
+  for sites callers drive. `HOME=/profile` so npx caches there.
+- **`chrome-service-backup-host`** (NFS, RWX) — destination for a 6-hourly
+  CronJob that `tar -czf /backup/<YYYY_MM_DD_HH>.tar.gz -C /profile .`,
+  retention 30 days.
+
+## Auth + secrets
+
+- Vault KV `secret/chrome-service.api_bearer_token` — 32-byte URL-safe
+  random, rotated by hand:
+  `vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
+- ESO syncs into namespace-local Secret `chrome-service-secrets`
+  (server pod) and `chrome-service-client-secrets` (each caller pod).
+- Reloader (`reloader.stakater.com/auto = "true"`) cascades token rotation
+  to both server and any annotated caller — no manual rollout.
+
+## Network controls
+
+- **`kubernetes_network_policy_v1.ws_ingress`** — two separate ingress
+  rules on the same policy:
+  - **TCP/3000** (Playwright WS): only namespaces labelled
+    `chrome-service.viktorbarzin.me/client = "true"` (plus an explicit
+    fallback for `f1-stream` by `kubernetes.io/metadata.name`).
+  - **TCP/6080** (noVNC HTTP+WS): only the `traefik` namespace, since
+    the public-facing path is `chrome.viktorbarzin.me` ingress →
+    Traefik → sidecar. Authentik forward-auth still gates external
+    access at the Traefik layer.
+- **WS port 3000** is internal-only (no ingress, no Cloudflare DNS).
+- **noVNC sidecar** (`forgejo.viktorbarzin.me/viktor/chrome-service-novnc`)
+  exposes a live HTML5 view of the headed Chromium session via
+  `x11vnc` (connected to Xvfb on `localhost:6099`) bridged to
+  `websockify` on port 6080. Service `chrome` maps :80 → :6080 and is
+  exposed via `ingress_factory` at `chrome.viktorbarzin.me`,
+  Authentik-gated. Both static page and WebSocket upgrade share the
+  same path — Cloudflare proxy, Cloudflared tunnel, Traefik, and
+  Authentik forward-auth all preserve `Upgrade: websocket`.
+
+## Adding a new caller
+
+See `stacks/chrome-service/README.md` for the four-step recipe:
+
+1. Label the caller's namespace.
+2. Add an `ExternalSecret` pulling `secret/chrome-service`.
+3. Inject `CHROME_WS_URL` + `CHROME_WS_TOKEN` env vars.
+4. Vendor `stealth.js` and apply via `await context.add_init_script(...)`
+   after every `new_context()`.
+
+## Limits + risks
+
+- **Anti-bot vs stealth arms race** — when an upstream beats us (DRM
+  license check, device-fingerprint mismatch, hotlink protection that
+  whitelists specific parent domains), the verifier returns
+  `is_playable=False` and the extractor moves on. No user-visible
+  breakage, just empty stream lists for that source.
+- **JWPlayer DRM error 102630** — observed with several hmembeds embeds
+  even from the headed chrome-service. The license check bails because
+  the request origin isn't on the embed's allowlist; this is upstream
+  policy, not an infra defect.
+- **Single replica + RWO PVC** — the deployment uses `Recreate` strategy.
+  Brief outage on rollout, ~30s for browser warmup.
+- **No `/metrics` endpoint** — the cluster's generic
+  `KubePodCrashLooping` rule covers basic alerting. A Prometheus scrape
+  exporter is day-2 work.
--- a/docs/architecture/ci-cd.md
+++ b/docs/architecture/ci-cd.md
@ -19,7 +19,7 @@ graph LR
    I --> J[Pull from DockerHub<br/>or Pull-Through Cache]

    K[Pull-Through Cache<br/>10.0.20.10] -.-> J
-    L[registry.viktorbarzin.me<br/>Private Registry] -.-> J
+    L[forgejo.viktorbarzin.me<br/>Private Registry on Forgejo] -.-> J

    style B fill:#2088ff
    style F fill:#4c9e47
@ -33,7 +33,7 @@ graph LR
 | GitHub Actions | Cloud | `.github/workflows/build-and-deploy.yml` | Build Docker images, push to DockerHub |
 | Woodpecker CI | Self-hosted | `ci.viktorbarzin.me` | Deploy to Kubernetes cluster |
 | DockerHub | Cloud | `viktorbarzin/*` | Public image registry |
-| Private Registry | Custom | `registry.viktorbarzin.me` | Private images, htpasswd auth |
+| Private Registry | Forgejo Packages | `forgejo.viktorbarzin.me/viktor` | Private container images (PAT auth, retention CronJob) — migrated from registry.viktorbarzin.me 2026-05-07 |
 | Pull-Through Cache | Custom | `10.0.20.10:5000` (docker.io)<br/>`10.0.20.10:5010` (ghcr.io) | LAN cache for remote registries |
 | Kyverno | Cluster | `kyverno` namespace | Auto-sync registry credentials to all namespaces |
 | Vault | Cluster | `vault.viktorbarzin.me` | K8s auth for Woodpecker pipelines |
@ -102,7 +102,7 @@ Woodpecker API uses numeric IDs (not owner/name):
 1. **Containerd hosts.toml** redirects pulls from docker.io and ghcr.io to pull-through cache at `10.0.20.10`
 2. **Pull-through cache** serves cached images from LAN, fetches from upstream on cache miss
 3. **Kyverno ClusterPolicy** auto-syncs `registry-credentials` Secret to all namespaces for private registry access
-4. **Private registry** (`registry.viktorbarzin.me`) uses htpasswd auth, credentials stored in Vault. Runs `registry:2.8.3` (pinned — floating `registry:2` was the root cause of the 2026-04-13 + 2026-04-19 orphan-index incidents; see `docs/post-mortems/2026-04-19-registry-orphan-index.md`).
+4. **Private registry** has been Forgejo's built-in OCI registry at `forgejo.viktorbarzin.me/viktor/<image>` since 2026-05-07. Auth via PAT (Vault `secret/ci/global/forgejo_push_token` for push, `secret/viktor/forgejo_pull_token` for pull). The pre-migration `registry:2.8.3`-based private registry on `registry.viktorbarzin.me:5050` was the root cause of three orphan-index incidents in three weeks (2026-04-13, 2026-04-19, 2026-05-04 — see `docs/post-mortems/2026-04-19-registry-orphan-index.md` and the full migration writeup at `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md`). The five pull-through caches on `10.0.20.10` (ports 5000/5010/5020/5030/5040) stay in place for upstream registries.
 5. **Integrity probe** (`registry-integrity-probe` CronJob in `monitoring` ns, every 15m) walks `/v2/_catalog` → tags → indexes → child manifests via HEAD and pushes `registry_manifest_integrity_failures` to Pushgateway; alerts `RegistryManifestIntegrityFailure` / `RegistryIntegrityProbeStale` / `RegistryCatalogInaccessible` page on broken state. Authoritative check (HTTP API, not filesystem).

 ### Infra Pipelines (Woodpecker-only)
--- a/docs/architecture/monitoring.md
+++ b/docs/architecture/monitoring.md
@ -63,7 +63,7 @@ graph TB
 | External Monitor Sync | Python 3.12 | `stacks/uptime-kuma/` | CronJob (10min) syncs `[External]` monitors from `cloudflare_proxied_names` |
 | dcgm-exporter | Configurable resources | `stacks/monitoring/modules/monitoring/` | NVIDIA GPU metrics collection |
 | Email Roundtrip Probe | Python 3.12 | `stacks/mailserver/modules/mailserver/` | E2E email delivery verification via Mailgun API + IMAP |
-| Registry Integrity Probe | Alpine 3.20 + curl/jq | `stacks/monitoring/modules/monitoring/main.tf` | CronJob every 15m: walks `/v2/_catalog` on `registry.viktorbarzin.me:5050`, HEADs every tagged manifest + index child; emits `registry_manifest_integrity_*` metrics to Pushgateway. Catches orphan OCI-index state that filesystem scans miss. |
+| Forgejo Registry Integrity Probe | Alpine 3.20 + curl/jq | `stacks/monitoring/modules/monitoring/main.tf` | CronJob every 15m: walks `/v2/_catalog` on `forgejo.viktorbarzin.me` (HTTP via in-cluster service), HEADs every tagged manifest + index child; emits `registry_manifest_integrity_*` metrics to Pushgateway. Replaces the legacy `registry-integrity-probe` against `registry.viktorbarzin.me:5050` decommissioned in Phase 4 of forgejo-registry-consolidation 2026-05-07. |

 ## How It Works

--- a/docs/plans/2026-05-07-forgejo-registry-consolidation-design.md
+++ b/docs/plans/2026-05-07-forgejo-registry-consolidation-design.md
@ -0,0 +1,195 @@
+# Forgejo Registry Consolidation — Design
+
+**Date**: 2026-05-07
+**Status**: Approved
+
+## Problem
+
+`registry-private` (the `registry:2` container on the docker-registry
+VM at `10.0.20.10`) has hit `distribution#3324` corruption three
+times in three weeks (2026-04-13, 2026-04-19, 2026-05-04). Each
+incident required manual blob recovery and another round of
+hardening to `cleanup-tags.sh` and the GC procedure. The integrity
+probe catches it within 15 minutes now, but every hit still costs
+~1h of cleanup, and we keep tightening the same loose screw.
+
+Root cause is a known race in `distribution`: tag deletes that race
+with concurrent garbage collection produce orphan OCI-index children.
+Upstream has not patched it; our mitigations (probe, blob
+fix-up script, idempotent cleanup) reduce blast radius but don't
+remove the failure mode.
+
+Forgejo (deployed for OAuth and personal repos at
+`forgejo.viktorbarzin.me`) ships a built-in OCI registry as part of
+the Packages feature, default-on in v11. Using it removes
+`distribution`-the-engine from the path entirely, replaces it with
+Forgejo's own implementation backed by Forgejo's DB+blob store, and
+gets us source hosting + image hosting in one resource.
+
+The PVE host RAM upgrade from 142GB to 272GB (memory id=569) means
+the cluster can absorb the resource bump Forgejo needs for the
+registry workload (1Gi → 1Gi).
+
+## Decision
+
+Move every image currently on `registry.viktorbarzin.me:5050` to
+Forgejo's OCI registry at `forgejo.viktorbarzin.me`. Decommission
+`registry-private` after a 14-day dual-push bake.
+
+Pull-through caches for upstream registries (DockerHub, GHCR, Quay,
+k8s.gcr, Kyverno) stay on the registry VM permanently — Forgejo
+won't serve as a pull-through, so the chicken-and-egg of "Forgejo
+pulling its own image through itself" never arises.
+
+## Design
+
+### Registry hostname
+
+Image references become `forgejo.viktorbarzin.me/viktor/<image>:<tag>`.
+The `viktor/` prefix is the Forgejo owner namespace; all current
+private images ship under that single owner.
+
+### Auth
+
+Two service-account users:
+
+| User | Scope | Vault key | Used by |
+|---|---|---|---|
+| `cluster-puller` | `read:package` | `secret/viktor/forgejo_pull_token` | cluster-wide `registry-credentials` Secret, monitoring probe |
+| `ci-pusher` | `write:package` | `secret/ci/global/forgejo_push_token` | Woodpecker pipelines (synced via `vault-woodpecker-sync` CronJob) |
+
+A third PAT (`secret/viktor/forgejo_cleanup_token`, also belongs to
+`ci-pusher`) drives the retention CronJob — kept separate from the
+push PAT so a leaked CI token doesn't immediately enable mass deletes.
+
+PATs have no expiry. Rotation policy: regenerate via Forgejo Web UI
+and `vault kv patch` if a leak is suspected; ESO/sync downstream is
+automatic.
+
+### Cluster pull path
+
+`registry-credentials` is a single Secret in `kyverno` ns, cloned
+into every namespace by the existing
+`sync-registry-credentials` ClusterPolicy. We extend its
+`dockerconfigjson` `auths` map with a fourth entry for
+`forgejo.viktorbarzin.me`. **No new Secret, no new ClusterPolicy,
+no `imagePullSecrets =` line edits across stacks.**
+
+Containerd `hosts.toml` redirects `forgejo.viktorbarzin.me` → in-cluster
+Traefik LB at `10.0.20.200`, the same pattern used for
+`registry.viktorbarzin.me` → `10.0.20.10:5050`. Avoids hairpin NAT
+through the WAN gateway for in-cluster pulls.
+
+### Push path
+
+Woodpecker pipelines push to BOTH targets during the bake:
+
+```yaml
+- name: build-and-push
+  image: woodpeckerci/plugin-docker-buildx
+  settings:
+    repo:
+      - registry.viktorbarzin.me/<name>
+      - forgejo.viktorbarzin.me/viktor/<name>
+    logins:
+      - registry: registry.viktorbarzin.me
+        username:
+          from_secret: registry_user
+        password:
+          from_secret: registry_password
+      - registry: forgejo.viktorbarzin.me
+        username:
+          from_secret: forgejo_user
+        password:
+          from_secret: forgejo_push_token
+```
+
+The `vault-woodpecker-sync` CronJob (every 6h) propagates
+`secret/ci/global` keys to every Woodpecker repo as global secrets.
+
+### Retention
+
+Forgejo's per-package "Cleanup Rules" UI is per-user runtime DB
+state, not Terraform-driven. Retention runs as a CronJob in the
+`forgejo` namespace, schedule `0 4 * * *`, that:
+
+1. Lists all container packages under the `viktor` owner.
+2. Groups by package name.
+3. Keeps newest 10 versions + always keeps `latest`.
+4. DELETEs the rest via `/api/v1/packages/{owner}/{type}/{name}/{version}`.
+
+First 7 days run with `DRY_RUN=true` — script logs what it would
+delete but issues no DELETE calls. After log review, flip the
+`forgejo_cleanup_dry_run` local in `cleanup.tf` to false.
+
+### Integrity monitoring
+
+Mirror the existing `registry-integrity-probe` CronJob: walk
+`/v2/_catalog`, walk every tag, HEAD every manifest + index child,
+push `registry_manifest_integrity_*` metrics. Existing
+Prometheus alerts fire on the `instance` label, so they cover both
+probes automatically once the alert annotations are made
+instance-aware (done in this change).
+
+### Source migration
+
+Projects currently living as plain dirs in the local-only monorepo
+become standalone Forgejo repos. Two GitHub-hosted private repos
+(`beadboard`, `claude-memory-mcp`) move to Forgejo and are archived
+on GitHub.
+
+CI standardises on Woodpecker for everything in scope. The two
+projects that used GHA (build + Woodpecker-deploy via GHA-hosted
+DockerHub push) keep DockerHub for legacy compatibility but their
+canonical image source becomes Forgejo.
+
+### Break-glass for infra-ci
+
+`infra-ci` is the Docker image used by all infra Woodpecker
+pipelines, including `default.yml` (terragrunt apply). If Forgejo is
+unreachable at the moment we need to apply, `infra-ci` is
+unreachable, and we can't apply our way out.
+
+Mitigation: dual-push step also `docker save | gzip` the built
+infra-ci image to:
+
+- `/opt/registry/data/private/_breakglass/infra-ci-<sha>.tar.gz` on
+  the registry VM disk (Copy 1)
+- `/srv/nfs/forgejo-breakglass/` on the NAS (Copy 2)
+
+A `latest` symlink in each location points at the most recent.
+Recovery procedure (`docs/runbooks/forgejo-registry-breakglass.md`):
+scp tarball → `docker load` → `ctr -n k8s.io images import` → fix
+Forgejo via that node.
+
+### Cutover style
+
+**Dual-push bake**: pipelines push to both registries for ≥14 days.
+Pods continue pulling from `registry.viktorbarzin.me`. After bake:
+
+1. Per-project PR: flip `image=` lines in Terraform stacks. Pod
+   re-pull naturally on next rollout.
+2. Phase 4: stop `registry-private` container, remove its
+   `auths` entry from the cluster Secret, drop containerd hosts.toml
+   entry.
+
+## Why not alternatives
+
+| Option | Rejected because |
+|---|---|
+| Stay on `registry-private` | Three corruption incidents in three weeks; mitigation cost rising |
+| Run a fresh registry container alongside (no Forgejo) | Same upstream, same `distribution#3324` failure mode |
+| GHCR / DockerHub for all private images | Public-by-default model + push rate limits; loses owner-owned blob storage |
+| Harbor | Heavier than Forgejo registry, would need its own DB + ingress, no source-hosting integration |
+
+## Risks
+
+See plan doc § "Risk register" for the full table. Top three:
+
+1. **Forgejo registry hits the same corruption pattern.** Mitigated
+   by 14-day bake + integrity probe within 15 min.
+2. **Forgejo down → infra-ci unreachable → can't apply.** Mitigated
+   by tarball break-glass on VM + NAS.
+3. **Pod re-pulls fail after `image=` flip due to containerd cache
+   poisoning.** Mitigated by hosts.toml deployment + per-project
+   `kubectl rollout restart` in Phase 3.
--- a/docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md
+++ b/docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md
@ -0,0 +1,152 @@
+# Forgejo Registry Consolidation — Plan
+
+**Date**: 2026-05-07
+**Status**: Approved — execution in progress (Phase 0)
+**Design**: `2026-05-07-forgejo-registry-consolidation-design.md`
+
+This is the implementation roadmap for migrating off `registry-private`
+onto Forgejo's OCI registry. See the design doc for problem
+statement and rationale. Execution spans 5 phases over ≥3 weeks.
+
+## Phase 0 — Prepare Forgejo (1 PR, no cutover risk)
+
+| Task | File / artifact |
+|---|---|
+| Bump Forgejo memory request+limit 384Mi → 1Gi | `infra/stacks/forgejo/main.tf` |
+| Add `FORGEJO__packages__ENABLED=true` and `FORGEJO__packages__CHUNKED_UPLOAD_PATH=/data/tmp/package-upload` env vars (defensive — already default in v11) | `infra/stacks/forgejo/main.tf` |
+| Bump Forgejo PVC 5Gi → 15Gi, auto-resize cap 20Gi → 50Gi | `infra/stacks/forgejo/main.tf` |
+| Bump ingress `max_body_size = "5g"` (wired into ingress_factory as a Buffering middleware) | `infra/stacks/forgejo/main.tf`, `infra/modules/kubernetes/ingress_factory/main.tf` |
+| Create `cluster-puller` (read:package), `ci-pusher` (write:package), and a third `cleanup` PAT on `ci-pusher`; store PATs in Vault | runbook: `docs/runbooks/forgejo-registry-setup.md` |
+| Extend `registry-credentials` Secret with 4th `auths` entry for `forgejo.viktorbarzin.me` | `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` |
+| Add containerd `hosts.toml` entry redirecting `forgejo.viktorbarzin.me` → in-cluster Traefik LB `10.0.20.200` | `infra/stacks/infra/main.tf` cloud-init + new `infra/scripts/setup-forgejo-containerd-mirror.sh` for existing nodes |
+| Forgejo retention CronJob (`0 4 * * *`, dry-run for first 7 days) | new `infra/stacks/forgejo/cleanup.tf` + `infra/stacks/forgejo/files/cleanup.sh` |
+| Forgejo integrity probe CronJob (`*/15 * * * *`) | `infra/stacks/monitoring/modules/monitoring/main.tf` |
+| Make existing alerts instance-aware so they cover both registries | `infra/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl` |
+
+**Smoke test (must pass before declaring Phase 0 done):**
+
+- `docker login forgejo.viktorbarzin.me` succeeds.
+- Push a hello-world image to `forgejo.viktorbarzin.me/viktor/smoketest:1` succeeds.
+- `crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1` from a k8s
+  node succeeds, using the auto-synced `registry-credentials` Secret.
+- A fresh namespace gets the cloned Secret with 4 `auths` entries.
+- Delete the smoketest package via API.
+- Forgejo integrity probe completes once and pushes metrics.
+
+## Phase 1 — Source migration (parallel-safe, no production impact)
+
+For each project the recipe is identical:
+
+1. `git init` + push to `forgejo.viktorbarzin.me/viktor/<name>` —
+   register in Woodpecker via OAuth.
+2. Add `.woodpecker.yml` based on `payslip-ingest/.woodpecker.yml`.
+   Push step uses `woodpeckerci/plugin-docker-buildx` with TWO
+   `repo:` entries (dual-push).
+3. Confirm first build pushes to BOTH registries.
+
+Projects (bake clock starts at "all dual-push"):
+
+| Project | Action |
+|---|---|
+| `claude-agent-service` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
+| `fire-planner` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
+| `wealthfolio-sync` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
+| `hmrc-sync` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
+| `freedify` | Push from monorepo to Forgejo. New `.woodpecker.yml`. (Upstream is gone.) |
+| `payslip-ingest` | Already on Forgejo. Add second `repo:` entry to `.woodpecker.yml`. |
+| `job-hunter` | Already on Forgejo. Add second `repo:` entry. |
+| `beadboard` | Push to Forgejo. New `.woodpecker.yml`. Disable GHA workflow. **Don't archive GitHub yet** (deferred to Phase 3). |
+| `claude-memory-mcp` | Push to Forgejo. New `.woodpecker.yml`. |
+| `infra-ci` | Edit `.woodpecker/build-ci-image.yml` to dual-push. ALSO `docker save | gzip` to `/opt/registry/data/private/_breakglass/` on VM AND `/srv/nfs/forgejo-breakglass/` on NAS. Pin a `latest` symlink. |
+
+Break-glass runbook (`docs/runbooks/forgejo-registry-breakglass.md`)
+documents the recovery path.
+
+## Phase 2 — Bake (≥14 days)
+
+- No `image=` lines change. Pods still pull from
+  `registry.viktorbarzin.me`.
+- **Daily smoke check**: pull a recent image from Forgejo as
+  `cluster-puller`, verify integrity (HEAD on manifest + each blob).
+- **Bake exit criteria**:
+  - Zero `RegistryManifestIntegrityFailure` alerts on Forgejo.
+  - Zero `ContainerNearOOM` for the forgejo pod.
+  - Retention CronJob has run ≥14 times successfully.
+  - At least one full Sunday GC cycle has elapsed.
+  - Switch retention CronJob to `DRY_RUN=false` on day 7, observe
+    until day 14.
+
+## Phase 3 — Cutover (one PR per project, single session)
+
+Order = lowest blast radius first. Each step:
+`image=` flip → `kubectl rollout restart` → verify pull from Forgejo.
+
+1. `payslip-ingest` (`infra/stacks/payslip-ingest/main.tf`)
+2. `job-hunter` (`infra/stacks/job-hunter/main.tf`)
+3. `claude-agent-service` (`infra/stacks/claude-agent-service/main.tf`)
+4. `fire-planner` (`infra/stacks/fire-planner/main.tf`)
+5. `wealthfolio-sync` (`infra/stacks/wealthfolio/main.tf`)
+6. `freedify` (`infra/stacks/freedify/factory/main.tf`)
+7. `chrome-service` (`infra/stacks/chrome-service/main.tf`)
+8. `beads-server` / `beadboard` (`infra/stacks/beads-server/main.tf`).
+   Then `gh repo archive ViktorBarzin/beadboard`.
+9. `infra-ci` — flip `image:` references in 4 `.woodpecker/*.yml`
+   files in the infra repo. Verify next push to master applies cleanly.
+10. `claude-memory-mcp` — update `CLAUDE.md` install instruction from
+    `claude plugins install github:ViktorBarzin/claude-memory-mcp` to
+    `claude plugins install https://forgejo.viktorbarzin.me/viktor/claude-memory-mcp.git`.
+    `gh repo archive ViktorBarzin/claude-memory-mcp`.
+
+## Phase 4 — Decommission
+
+| Step | File / location |
+|---|---|
+| Stop `registry-private` container on VM (10.0.20.10): edit `/opt/registry/docker-compose.yml`, comment out service, `docker compose up -d --remove-orphans`. (Manual SSH — cloud-init won't redeploy on TF apply per memory id=1078.) | live VM |
+| Update cloud-init template to match the new compose file | `infra/stacks/infra/main.tf:288` |
+| Delete `auths` entries for `registry.viktorbarzin.me` / `:5050` / `10.0.20.10:5050` from the dockerconfigjson | `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` |
+| Drop `registry.viktorbarzin.me` and `10.0.20.10:5050` `hosts.toml` entries on each node + cloud-init template | `infra/stacks/infra/main.tf` cloud-init + ad-hoc script |
+| After 1 week of no incidents, delete `/opt/registry/data/private/` blob storage on the VM (~2.6GB freed) | manual SSH |
+
+## Phase 5 — Docs
+
+In the same commit as the Phase 4 closing:
+
+| Doc | Update |
+|---|---|
+| `docs/runbooks/registry-vm.md` | Note `registry-private` is gone; pull-through caches and break-glass tarballs only |
+| `docs/runbooks/registry-rebuild-image.md` | Replaced by NEW `forgejo-registry-rebuild-image.md` |
+| `docs/runbooks/forgejo-registry-rebuild-image.md` (NEW) | Forgejo PVC restore procedure |
+| `docs/runbooks/forgejo-registry-breakglass.md` (NEW) | infra-ci tarball recovery |
+| `docs/architecture/ci-cd.md` | Image registry section flips to Forgejo |
+| `docs/architecture/monitoring.md` | Integrity probe target updated |
+| `infra/.claude/CLAUDE.md` | Registry references updated |
+| `CLAUDE.md` (monorepo root) | claude-memory-mcp install URL updated |
+| `infra/.claude/reference/service-catalog.md` | Cross-reference checked |
+
+## Critical files modified
+
+| File | Phase | What |
+|---|---|---|
+| `infra/stacks/forgejo/main.tf` | 0 | Memory bump, packages env vars, PVC bump, ingress max_body_size |
+| `infra/stacks/forgejo/cleanup.tf` (NEW) | 0 | Retention CronJob |
+| `infra/stacks/forgejo/files/cleanup.sh` (NEW) | 0 | Retention script (mounted via ConfigMap) |
+| `infra/modules/kubernetes/ingress_factory/main.tf` | 0 | Wire `max_body_size` into a Traefik Buffering middleware |
+| `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` | 0 | Add 4th `auths` entry |
+| `infra/stacks/infra/main.tf` | 0 + 4 | Containerd hosts.toml block (add Forgejo, later remove registry-private); compose template update |
+| `infra/scripts/setup-forgejo-containerd-mirror.sh` (NEW) | 0 | One-shot rollout for existing nodes |
+| `infra/stacks/monitoring/modules/monitoring/main.tf` | 0 | Forgejo integrity probe CronJob |
+| `infra/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl` | 0 | Make alerts instance-aware |
+| `infra/stacks/monitoring/main.tf` | 0 | Plumb `forgejo_pull_token` into module |
+| `infra/.woodpecker/build-ci-image.yml` | 1 | Dual-push to add Forgejo target + tarball break-glass |
+| `<each-project>/.woodpecker.yml` | 1 | Dual-push (NEW for fire-planner, wealthfolio-sync, hmrc-sync, freedify, beadboard, claude-memory-mcp; EDIT for payslip-ingest, job-hunter, claude-agent-service) |
+| `infra/.woodpecker/{default,drift-detection,build-cli}.yml` | 3 | Flip `image:` to Forgejo for infra-ci |
+| `infra/stacks/{beads-server,chrome-service,claude-agent-service,fire-planner,freedify/factory,job-hunter,payslip-ingest,wealthfolio}/main.tf` | 3 | Flip `image =` to Forgejo |
+
+## Verification
+
+- **Push** (Phase 0/1): `docker push forgejo.viktorbarzin.me/viktor/<name>` visible in Forgejo Web UI under viktor/.
+- **Pull** (Phase 0): `crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1` succeeds with auto-synced Secret.
+- **Dual-push** (Phase 1): every Woodpecker pipeline run pushes to BOTH endpoints — confirmed via HEAD checks on `<reg>:<sha>` for both.
+- **Bake** (Phase 2): existing daily Forgejo `/api/healthz` external monitor stays green; integrity probe stays green; no `ContainerNearOOM` for forgejo pod.
+- **Cutover** (Phase 3): `kubectl rollout status deploy/<svc> -n <ns>` succeeds. `kubectl describe pod` shows the image was pulled from `forgejo.viktorbarzin.me`.
+- **Decommission** (Phase 4): `docker ps` on registry VM no longer shows `registry-private`. Brand-new namespace gets the Secret with only the Forgejo `auths` entry. Pull still works.
--- a/docs/runbooks/forgejo-registry-breakglass.md
+++ b/docs/runbooks/forgejo-registry-breakglass.md
@ -0,0 +1,126 @@
+# Runbook: Forgejo registry break-glass — recovering infra-ci
+
+Last updated: 2026-05-07
+
+## When to use this runbook
+
+When **all** of the following are true:
+
+1. Forgejo (`forgejo.viktorbarzin.me`) is unreachable.
+2. `registry-private` is also gone (post-Phase 4 of the consolidation),
+   so you can't fall back to `registry.viktorbarzin.me:5050/infra-ci`.
+3. You need to run an infra Woodpecker pipeline (apply, build-cli,
+   drift-detection, etc.) — but those pipelines pull `infra-ci` and
+   crash because the registry is down.
+
+If only Forgejo is down but `registry-private` is still alive, the
+pipelines work — `image:` references in `infra/.woodpecker/*.yml`
+still hit `registry.viktorbarzin.me:5050/infra-ci` until Phase 3
+flips them. Skip this runbook entirely.
+
+## What's available
+
+The `build-ci-image.yml` Woodpecker pipeline saves a tarball after
+each successful push:
+
+| Location | Path |
+|---|---|
+| Registry VM disk (10.0.20.10) | `/opt/registry/data/private/_breakglass/infra-ci-<sha>.tar.gz` |
+| Registry VM disk (latest symlink) | `/opt/registry/data/private/_breakglass/infra-ci-latest.tar.gz` |
+| Synology NAS (offsite copy via daily-backup sync) | `/volume1/Backup/Viki/pve-backup/_forgejo-breakglass/` |
+
+The registry VM keeps the last 5 tarballs. Synology mirrors them
+through the existing offsite-sync-backup job (`/usr/local/bin/
+offsite-sync-backup`).
+
+## Recovery procedure
+
+The goal is to get a working `infra-ci` image onto a k8s node so
+Woodpecker pods can run it. Then run a Woodpecker pipeline that
+restores Forgejo from PVC backup or rebuilds it.
+
+### Step 1 — copy the tarball to a node
+
+From your workstation (the registry VM is reachable but Forgejo is
+not — the rest of the cluster might be in a similar partial state):
+
+```bash
+ssh wizard@10.0.20.103  # any responsive k8s node
+sudo mkdir -p /var/breakglass
+sudo scp root@10.0.20.10:/opt/registry/data/private/_breakglass/infra-ci-latest.tar.gz \
+  /var/breakglass/
+```
+
+If the registry VM is also down, fall back to Synology:
+
+```bash
+sudo scp 192.168.1.13:/volume1/Backup/Viki/pve-backup/_forgejo-breakglass/infra-ci-latest.tar.gz \
+  /var/breakglass/
+```
+
+### Step 2 — load into containerd
+
+`docker load` won't help on a k8s node — it loads into the docker
+daemon, which kubelet/containerd doesn't see. Use `ctr`:
+
+```bash
+sudo ctr -n k8s.io images import /var/breakglass/infra-ci-latest.tar.gz
+sudo ctr -n k8s.io images list | grep infra-ci
+```
+
+Confirm the image is tagged with the original repository name
+(`registry.viktorbarzin.me:5050/infra-ci:<sha>` — the tarball was
+saved with that tag, NOT the Forgejo name).
+
+### Step 3 — pin pods to this node
+
+Add a node selector or taint-toleration to whatever pipeline you
+need to run. Simplest: cordon the other nodes briefly so Woodpecker
+schedules onto this one.
+
+```bash
+for n in $(kubectl get nodes -o name | grep -v $(hostname)); do
+  kubectl cordon ${n#node/}
+done
+```
+
+Run the pipeline. After it completes:
+
+```bash
+for n in $(kubectl get nodes -o name); do
+  kubectl uncordon ${n#node/}
+done
+```
+
+### Step 4 — fix the underlying problem
+
+The pipeline you just ran was meant to restore Forgejo. Common
+options:
+
+- **Forgejo PVC corrupt** — `docs/runbooks/forgejo-registry-rebuild-image.md`
+  walks through PVC restore from LVM snapshot or PVE backup.
+- **Forgejo OOM-loop** — bump memory request+limit in
+  `infra/stacks/forgejo/main.tf` and apply.
+- **Forgejo unreachable due to network** — check Traefik, MetalLB,
+  pfSense.
+
+Once Forgejo is back, run `build-ci-image.yml` manually so the
+tarball regenerates with the latest commit.
+
+## Why this exists
+
+The 2026-04-19 post-mortem on the registry-orphan-index incident
+showed that a single registry going corrupt could block ALL infra
+pipelines (because every pipeline pulls `infra-ci` from that
+registry). The dual-push to Forgejo + registry-private removes that
+single-point-of-failure during the bake. After Phase 4
+decommissions registry-private, the tarball is the last line of
+defense.
+
+## Why on the registry VM and not in-cluster
+
+The Forgejo pod and registry-private pod both depend on cluster
+networking + storage. The registry VM is an independent
+non-clustered VM with local storage. If the cluster is in a bad
+state, the VM's disk is still readable from any other host on the
+LAN.
--- a/docs/runbooks/forgejo-registry-rebuild-image.md
+++ b/docs/runbooks/forgejo-registry-rebuild-image.md
@ -0,0 +1,128 @@
+# Runbook: Rebuild an Image on the Forgejo OCI Registry
+
+Last updated: 2026-05-07
+
+## When to use this
+
+Pipelines pulling from `forgejo.viktorbarzin.me/viktor/<image>` fail with:
+
+- `failed to resolve reference … : not found`
+- `manifest unknown`
+- HEAD on a manifest/blob digest returns 404
+- `forgejo-integrity-probe` CronJob in `monitoring` reports
+  `registry_manifest_integrity_failures > 0` for
+  `instance="forgejo.viktorbarzin.me"`
+
+This is the Forgejo equivalent of the registry-private orphan-index
+failure mode (`docs/post-mortems/2026-04-19-registry-orphan-index.md`).
+Cause is usually package-version delete races with an in-flight pull,
+or PVC corruption. Fix is to rebuild the image from source and
+re-push, so Forgejo receives a complete, fresh upload.
+
+If the symptom is different (Forgejo unreachable, PVC OOM,
+authentication failure), use:
+- `docs/runbooks/forgejo-registry-setup.md` for auth + token issues
+- `docs/runbooks/forgejo-registry-breakglass.md` if Forgejo + the
+  cluster are both unreachable
+- `docs/runbooks/restore-pvc-from-backup.md` for PVC corruption
+
+## Phase 1 — Confirm the diagnosis
+
+From any host:
+
+```sh
+REG=forgejo.viktorbarzin.me
+USER=cluster-puller
+PASS="$(vault kv get -field=forgejo_pull_token secret/viktor)"
+IMAGE=viktor/payslip-ingest
+TAG=latest
+
+# 1. Confirm the manifest exists at all.
+curl -sk -u "$USER:$PASS" \
+  -H 'Accept: application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json' \
+  "https://$REG/v2/$IMAGE/manifests/$TAG" | jq '.mediaType, .manifests[].digest // .config.digest'
+
+# 2. HEAD each child / config / layer digest. Any non-200 = confirmed.
+for d in $(curl -sk -u "$USER:$PASS" -H 'Accept: application/vnd.oci.image.index.v1+json' \
+             "https://$REG/v2/$IMAGE/manifests/$TAG" | jq -r '.manifests[].digest // empty'); do
+  code=$(curl -sk -u "$USER:$PASS" -o /dev/null -w '%{http_code}' \
+         -I "https://$REG/v2/$IMAGE/manifests/$d")
+  echo "$d → $code"
+done
+```
+
+The probe's last log run is also a fast way to see what's affected:
+
+```sh
+kubectl -n monitoring logs \
+  $(kubectl -n monitoring get pods -l job-name -o name \
+     | grep forgejo-integrity-probe | head -1)
+```
+
+## Phase 2 — Rebuild and re-push
+
+Forgejo lets you delete a specific package version through the API.
+Doing this **before** the rebuild ensures the new push doesn't
+collide with the half-broken existing entry.
+
+```sh
+# Delete the broken version (replace TAG with the actual tag).
+curl -X DELETE -H "Authorization: token $(vault kv get -field=forgejo_cleanup_token secret/viktor)" \
+  "https://$REG/api/v1/packages/viktor/container/$(basename $IMAGE)/$TAG"
+```
+
+Rebuild via Woodpecker (manual run if the pipeline isn't triggered
+by a code change):
+
+1. Open `https://ci.viktorbarzin.me/repos/<repo>/manual` for the
+   project.
+2. Click **Run pipeline** with `branch=master`.
+3. Wait for the build-and-push step to complete.
+4. Confirm the new version is visible in Forgejo Web UI under
+   `viktor/<image>` → Packages → Container.
+
+## Phase 3 — Restart consumers
+
+Pods that already cached the broken digest may continue using it.
+Force a fresh pull:
+
+```sh
+kubectl rollout restart deploy/<service> -n <ns>
+```
+
+If the pod still fails, the new manifest digest may not have
+propagated through containerd's cache. Drain + restart containerd on
+the affected node:
+
+```sh
+kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
+ssh wizard@<node> sudo systemctl restart containerd
+kubectl uncordon <node>
+```
+
+## Phase 4 — Verify integrity recovery
+
+The next probe run (every 15 min) will report:
+
+```
+registry_manifest_integrity_failures{instance="forgejo.viktorbarzin.me"} 0
+```
+
+The `RegistryManifestIntegrityFailure` alert resolves automatically
+30 minutes after the metric goes back to 0.
+
+## Why this happens
+
+Forgejo's OCI registry stores blobs in its own DB+filesystem. Unlike
+`registry:2` + `distribution`, it doesn't have the
+[`distribution#3324`](https://github.com/distribution/distribution/issues/3324)
+GC-vs-tag-delete race. But it can still reach a broken state if:
+
+- The retention CronJob deletes a version while a pull is in flight
+  on the same digest.
+- The PVC fills up mid-push (`docs/runbooks/restore-pvc-from-backup.md`).
+- A Forgejo upgrade migrates the package schema and a row is dropped.
+
+In all cases the recovery procedure is identical: delete the broken
+version through the API, rebuild from source, force consumers to
+re-pull.
--- a/docs/runbooks/forgejo-registry-setup.md
+++ b/docs/runbooks/forgejo-registry-setup.md
@ -0,0 +1,163 @@
+# Runbook: Forgejo OCI registry — initial setup
+
+Last updated: 2026-05-07
+
+This runbook covers the **one-time** bootstrap of Forgejo's container
+registry, executed during Phase 0 of the registry consolidation plan
+(`docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md`).
+
+After this runbook is complete, the Forgejo OCI registry at
+`forgejo.viktorbarzin.me` accepts pushes from CI and pulls from the
+cluster, with retention and integrity monitoring in place.
+
+## Order of operations
+
+The Terraform stacks reference Vault keys that don't exist on a fresh
+cluster. Create the keys **before** running `scripts/tg apply`.
+
+1. Apply the resource bumps (memory, PVC, ingress body size,
+   packages env vars) — these don't depend on the new Vault keys.
+2. Create the service-account users + PATs in Forgejo.
+3. Push the PATs to Vault.
+4. Apply the rest of Phase 0 (registry-credentials extension,
+   monitoring probe, retention CronJob).
+
+### Step 1 — apply Forgejo deployment bumps
+
+```bash
+cd infra/stacks/forgejo
+scripts/tg apply
+```
+
+Wait for the new pod to come up at the bumped 1Gi memory request and
+the resized 15Gi PVC. Verify packages are enabled:
+
+```bash
+kubectl exec -n forgejo deploy/forgejo -- forgejo manager flush-queues
+kubectl exec -n forgejo deploy/forgejo -- env | grep PACKAGES
+```
+
+### Step 2 — create service-account users
+
+`forgejo admin user create` is idempotent only with
+`--must-change-password=false`. Re-running it on an existing user
+errors out — that's fine; skip on rerun.
+
+```bash
+# cluster-puller — read:package PAT for in-cluster pulls.
+kubectl exec -n forgejo deploy/forgejo -- \
+  forgejo admin user create \
+  --username cluster-puller \
+  --email cluster-puller@viktorbarzin.me \
+  --password "$(openssl rand -base64 24)" \
+  --must-change-password=false
+
+# ci-pusher — write:package PAT for CI dual-push, also reused as the
+# cleanup CronJob credential (write:package includes delete).
+kubectl exec -n forgejo deploy/forgejo -- \
+  forgejo admin user create \
+  --username ci-pusher \
+  --email ci-pusher@viktorbarzin.me \
+  --password "$(openssl rand -base64 24)" \
+  --must-change-password=false
+```
+
+The user passwords are throwaway — we only ever auth via PAT. Forgejo
+admin can reset them at any time from the Web UI.
+
+### Step 3 — generate the PATs
+
+PATs **must** be generated through the Web UI logged in as the
+respective user (the CLI doesn't expose token creation). To log in
+without OAuth (registration is disabled for everyone except `viktor`,
+the admin), use the per-user temporary password from step 2.
+
+For each of `cluster-puller` and `ci-pusher`:
+
+1. Sign out of `viktor`.
+2. Go to `https://forgejo.viktorbarzin.me/user/login` and sign in
+   with the throwaway password.
+3. Settings → Applications → Generate new token.
+4. Name: `cluster-pull` / `ci-push`. **Expiration: never.**
+5. Scopes:
+   - `cluster-puller`: `read:package`
+   - `ci-pusher`: `write:package` (covers read+write+delete)
+6. Save the token shown on the next page — it is **not** displayed again.
+
+For the cleanup CronJob, generate a third PAT on `ci-pusher`:
+
+7. Repeat steps 4-6 with name `cleanup`, scope `write:package`.
+
+### Step 4 — push PATs to Vault
+
+```bash
+vault login -method=oidc
+
+# Read-only, used by the cluster-wide registry-credentials Secret and
+# by the Forgejo integrity probe.
+vault kv patch secret/viktor \
+  forgejo_pull_token=<paste cluster-puller PAT>
+
+# Write+delete, used by the retention CronJob inside Forgejo's
+# namespace.
+vault kv patch secret/viktor \
+  forgejo_cleanup_token=<paste ci-pusher cleanup PAT>
+
+# Write, propagated by vault-woodpecker-sync to all Woodpecker repos.
+vault kv patch secret/ci/global \
+  forgejo_user=ci-pusher \
+  forgejo_push_token=<paste ci-pusher push PAT>
+```
+
+### Step 5 — apply the rest of Phase 0
+
+```bash
+# Registry credential Secret (now reads forgejo_pull_token).
+cd infra/stacks/kyverno && scripts/tg apply
+
+# Monitoring probe + retention CronJob.
+cd infra/stacks/monitoring && scripts/tg apply
+cd infra/stacks/forgejo && scripts/tg apply
+
+# Containerd hosts.toml on each existing k8s node — VM cloud-init
+# only fires on first boot.
+infra/scripts/setup-forgejo-containerd-mirror.sh
+```
+
+## Verification
+
+```bash
+# Login from a workstation with docker.
+echo "<ci-pusher PAT>" | docker login forgejo.viktorbarzin.me -u ci-pusher --password-stdin
+
+# Push a smoketest image.
+docker pull alpine:3.20
+docker tag alpine:3.20 forgejo.viktorbarzin.me/viktor/smoketest:1
+docker push forgejo.viktorbarzin.me/viktor/smoketest:1
+
+# Pull from a k8s node.
+ssh wizard@<node> sudo crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1
+
+# Confirm the cluster-wide Secret was synced into a fresh namespace.
+kubectl create namespace forgejo-smoketest
+kubectl get secret -n forgejo-smoketest registry-credentials \
+  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys'
+# Expect: ["10.0.20.10:5050", "forgejo.viktorbarzin.me",
+#         "registry.viktorbarzin.me", "registry.viktorbarzin.me:5050"]
+kubectl delete namespace forgejo-smoketest
+
+# Delete the smoketest package via API.
+curl -X DELETE -H "Authorization: token <ci-pusher cleanup PAT>" \
+  https://forgejo.viktorbarzin.me/api/v1/packages/viktor/container/smoketest/1
+```
+
+## When to revisit
+
+- **PAT rotation**: PATs created here have no expiry by design. If a
+  PAT leaks, regenerate via the Web UI and `vault kv patch` the new
+  value into the same key — the next `terragrunt apply` will sync it
+  to all consumers within minutes (Kyverno ClusterPolicy clones the
+  Secret, vault-woodpecker-sync runs every 6h).
+- **New service account**: if a future workload needs different
+  scopes, add a parallel user/PAT here rather than expanding existing
+  PAT scope. Principle of least privilege.
--- a/docs/runbooks/registry-vm.md
+++ b/docs/runbooks/registry-vm.md
@ -1,12 +1,30 @@
 # Runbook: Registry VM (docker-registry, 10.0.20.10)

-Last updated: 2026-04-19
+Last updated: 2026-05-07

-The registry VM hosts `registry.viktorbarzin.me` (private Docker
-registry, htpasswd-auth, NGINX → registry:2). It is an Ubuntu 24.04
-VM on the cluster LAN subnet `10.0.20.0/24`, with a static netplan
-config (no DHCP). Because it sits on a subnet that only has pfSense
-as its gateway, its DNS must be statically configured.
+The registry VM is an Ubuntu 24.04 VM on the cluster LAN subnet
+`10.0.20.0/24`, with a static netplan config (no DHCP). Because it
+sits on a subnet that only has pfSense as its gateway, its DNS must
+be statically configured.
+
+**As of Phase 4 of forgejo-registry-consolidation 2026-05-07** the VM
+no longer hosts the private R/W registry. It hosts pull-through
+caches only:
+
+| Port | Upstream |
+|---|---|
+| 5000 | docker.io (Docker Hub) — auth via dockerhub_registry_password |
+| 5010 | ghcr.io |
+| 5020 | quay.io |
+| 5030 | registry.k8s.io |
+| 5040 | reg.kyverno.io |
+
+The decommissioned private registry (port 5050) is now hosted on
+Forgejo at `forgejo.viktorbarzin.me/viktor/<image>`. See
+`docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md` for the
+migration. Break-glass tarballs of `infra-ci` are still produced on
+each build to `/opt/registry/data/private/_breakglass/` — see
+`docs/runbooks/forgejo-registry-breakglass.md`.

 ## DNS configuration

--- a/docs/runbooks/woodpecker-onboard-forgejo-repo.md
+++ b/docs/runbooks/woodpecker-onboard-forgejo-repo.md
@ -0,0 +1,73 @@
+# Runbook: Onboarding a new Forgejo repo to Woodpecker
+
+Last updated: 2026-05-07
+
+When you create a new repo on `forgejo.viktorbarzin.me`, Woodpecker
+does NOT auto-discover it via the cluster's existing OAuth session.
+The `forgejo` user inside Woodpecker (Forgejo-OAuth'd) needs to:
+
+1. Open `https://ci.viktorbarzin.me/` in a browser.
+2. Log in via Forgejo OAuth (the "Sign in with Forgejo" button).
+3. Click "Add Repository" — your new repo should appear.
+4. Click the toggle to activate it. Woodpecker will:
+   - Add a webhook on the Forgejo repo (push, PR, release events).
+   - Register the repo's `forge_remote_id` in its DB so subsequent
+     hooks deserialize correctly.
+5. Push a commit (or hit "Run pipeline" in Woodpecker UI) — first
+   build fires.
+
+## Why API-only doesn't work
+
+The webhook URL contains a JWT signed with a per-server key that's
+stored in the DB and only accessible at OAuth-flow time. POST'ing
+`/api/repos` as the admin (`ViktorBarzin` GitHub user) returns 500
+because the lookup queries forge-side OAuth state for THAT user,
+which doesn't exist for the Forgejo `viktor` user. We confirmed:
+
+- Direct `POST /api/repos?forge_remote_id=N` → HTTP 500 server-side.
+- Generating a JWT with the agent secret → "token is unverifiable"
+  on hook delivery (the signing key is repo-specific, not the
+  global agent secret).
+
+There's no admin endpoint that side-steps the OAuth flow.
+
+## Bootstrap when UI access isn't available
+
+If you absolutely need to bootstrap a new image without UI access
+(e.g., during an outage), the workaround is:
+
+1. Build locally:
+   ```bash
+   docker build -t forgejo.viktorbarzin.me/viktor/<name>:<tag> /path/to/source
+   docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
+   ```
+2. Or pull from another already-built source and retag:
+   ```bash
+   docker pull viktorbarzin/<name>:<tag>          # DockerHub
+   docker tag  viktorbarzin/<name>:<tag>     forgejo.viktorbarzin.me/viktor/<name>:<tag>
+   docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
+   ```
+3. Flip the cluster `image=` reference and restart deployments.
+
+Document the bootstrap in the relevant stack so future maintainers
+know the image was put there by hand. After Woodpecker UI onboarding,
+the next pipeline run replaces the bootstrap image with a CI-built one.
+
+## Repos onboarded in flight 2026-05-07
+
+These were created during the forgejo-registry-consolidation but the
+UI step above hasn't been done yet — their `.woodpecker.yml` /
+`.woodpecker/build.yml` exists on Forgejo but no pipeline fires:
+
+- `viktor/broker-sync` — image bootstrapped via DockerHub (see
+  `infra/stacks/wealthfolio/main.tf` comment).
+- `viktor/fire-planner` — image bootstrapped via local docker build.
+- `viktor/hmrc-sync`
+- `viktor/freedify`
+- `viktor/claude-agent-service`
+- `viktor/beadboard` — image bootstrapped via local docker build.
+- `viktor/claude-memory-mcp`
+
+Walk through each in the Woodpecker UI to enable. Pipelines for
+already-onboarded repos (payslip-ingest, job-hunter, infra) fired
+correctly after the v3.13 → v3.14 upgrade.
--- a/modules/docker-registry/docker-compose.yml
+++ b/modules/docker-registry/docker-compose.yml
@ -89,35 +89,26 @@ services:
      retries: 3
      start_period: 10s

-  registry-private:
-    image: registry:2.8.3
-    container_name: registry-private
-    restart: always
-    volumes:
-      - /opt/registry/data/private:/var/lib/registry
-      - /opt/registry/config-private.yml:/etc/docker/registry/config.yml:ro
-      - /opt/registry/htpasswd:/auth/htpasswd:ro
-    networks:
-      - registry
-    healthcheck:
-      # 401 is expected (auth required) — any HTTP response means the registry is healthy
-      test: ["CMD", "sh", "-c", "wget -qS -O /dev/null http://127.0.0.1:5000/v2/ 2>&1 | grep -q 'HTTP/'"]
-      interval: 30s
-      timeout: 10s
-      retries: 3
-      start_period: 10s
+  # registry-private decommissioned in Phase 4 of
+  # forgejo-registry-consolidation 2026-05-07 — image migration completed,
+  # cluster flipped to forgejo.viktorbarzin.me/viktor/<image>. The remaining
+  # five services on this VM are pull-through caches for upstream registries.
+  # After 1 week of no incidents, `rm -rf /opt/registry/data/private/` on the
+  # VM frees ~2.6 GB. The tarball break-glass under
+  # /opt/registry/data/private/_breakglass/ stays — it's how we recover
+  # infra-ci if Forgejo ever goes fully down.

  nginx:
    image: nginx:alpine
    container_name: registry-nginx
    restart: always
+    # 5050 dropped Phase 4 of forgejo-registry-consolidation 2026-05-07.
    ports:
      - "5000:5000"
      - "5010:5010"
      - "5020:5020"
      - "5030:5030"
      - "5040:5040"
-      - "5050:5050"
    volumes:
      - /opt/registry/nginx.conf:/etc/nginx/nginx.conf:ro
      - /opt/registry/tls:/etc/nginx/tls:ro
@ -135,8 +126,6 @@ services:
        condition: service_healthy
      registry-kyverno:
        condition: service_healthy
-      registry-private:
-        condition: service_healthy
    healthcheck:
      test: ["CMD", "sh", "-c", "wget -qO- http://127.0.0.1:5000/v2/ >/dev/null 2>&1"]
      interval: 30s
--- a/modules/docker-registry/nginx_registry.conf
+++ b/modules/docker-registry/nginx_registry.conf
@ -33,10 +33,9 @@ http {
        keepalive 32;
    }

-    upstream private {
-        server registry-private:5000;
-        keepalive 32;
-    }
+    # `upstream private` removed in Phase 4 of forgejo-registry-consolidation
+    # 2026-05-07. The /v2/ private registry is now Forgejo at
+    # forgejo.viktorbarzin.me/viktor/.

    # --- Docker Hub (port 5000) ---

@ -168,37 +167,8 @@ http {
        }
    }

-    # --- Private R/W Registry (port 5050, TLS) ---
-
-    server {
-        listen 5050 ssl;
-        server_name registry.viktorbarzin.me;
-
-        ssl_certificate /etc/nginx/tls/fullchain.pem;
-        ssl_certificate_key /etc/nginx/tls/privkey.pem;
-        ssl_protocols TLSv1.2 TLSv1.3;
-
-        client_max_body_size 0;
-        proxy_request_buffering off;
-        proxy_buffering off;
-        chunked_transfer_encoding on;
-
-        location /v2/ {
-            proxy_pass http://private;
-            proxy_http_version 1.1;
-            proxy_set_header Host $http_host;
-            proxy_set_header Connection "";
-            proxy_set_header X-Real-IP $remote_addr;
-            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
-            proxy_set_header X-Forwarded-Proto $scheme;
-
-            proxy_read_timeout 900;
-            proxy_send_timeout 900;
-        }
-
-        location / {
-            return 200 'ok';
-            add_header Content-Type text/plain;
-        }
-    }
+    # --- Private R/W Registry (port 5050) decommissioned Phase 4 2026-05-07 ---
+    # The TLS port 5050 server block previously fronted `registry-private`.
+    # Migrated to Forgejo at forgejo.viktorbarzin.me/viktor/. Both
+    # docker-compose.yml and this nginx config no longer reference port 5050.
 }
--- a/modules/kubernetes/ingress_factory/main.tf
+++ b/modules/kubernetes/ingress_factory/main.tf
@ -40,8 +40,9 @@ variable "ingress_path" {
  default = ["/"]
 }
 variable "max_body_size" {
-  type    = string
-  default = "50m"
+  type        = string
+  default     = null
+  description = "Maximum request body size, e.g. '5g'. null = no limit (Traefik default). When set, a per-ingress Buffering middleware is created and attached."
 }
 variable "extra_annotations" {
  default = {}
@ -203,6 +204,17 @@ locals {
    "gethomepage.dev/href"    = "https://${local.effective_host}"
    "gethomepage.dev/icon"    = "${replace(var.name, "-", "")}.png"
  } : {}
+
+  # Parse "5g"/"50m"/"1024k"/"42" into bytes. Traefik's Buffering middleware
+  # takes maxRequestBodyBytes as an integer. Empty unit = bytes.
+  body_size_match = var.max_body_size == null ? null : regex("^([0-9]+)([kmgKMG]?)$", var.max_body_size)
+  body_size_unit_multiplier = var.max_body_size == null ? 0 : (
+    lower(local.body_size_match[1]) == "g" ? 1073741824 :
+    lower(local.body_size_match[1]) == "m" ? 1048576 :
+    lower(local.body_size_match[1]) == "k" ? 1024 :
+    1
+  )
+  max_body_size_bytes = var.max_body_size == null ? 0 : tonumber(local.body_size_match[0]) * local.body_size_unit_multiplier
 }


@ -245,6 +257,7 @@ resource "kubernetes_ingress_v1" "proxied-ingress" {
        var.protected ? "traefik-authentik-forward-auth@kubernetescrd" : null,
        var.allow_local_access_only ? "traefik-local-only@kubernetescrd" : null,
        var.custom_content_security_policy != null ? "${var.namespace}-custom-csp-${var.name}@kubernetescrd" : null,
+        var.max_body_size != null ? "${var.namespace}-buffering-${var.name}@kubernetescrd" : null,
      ], var.extra_middlewares)))
      "traefik.ingress.kubernetes.io/router.entrypoints" = "websecure"
      }, local.homepage_defaults, var.extra_annotations,
@ -302,6 +315,27 @@ resource "kubernetes_manifest" "custom_csp" {
  }
 }

+# Buffering middleware - created per service when max_body_size is set.
+# Traefik default is unlimited; setting maxRequestBodyBytes enforces a limit
+# (e.g. Forgejo container pushes can ship multi-GB layer blobs).
+resource "kubernetes_manifest" "buffering" {
+  count = var.max_body_size != null ? 1 : 0
+
+  manifest = {
+    apiVersion = "traefik.io/v1alpha1"
+    kind       = "Middleware"
+    metadata = {
+      name      = "buffering-${var.name}"
+      namespace = var.namespace
+    }
+    spec = {
+      buffering = {
+        maxRequestBodyBytes = local.max_body_size_bytes
+      }
+    }
+  }
+}
+
 # Cloudflare DNS records — created automatically when dns_type is set.
 # Proxied: CNAME to Cloudflare tunnel. Non-proxied: A + AAAA to public IP.
 resource "cloudflare_record" "proxied" {
--- a/scripts/forgejo-migrate-orphan-images.sh
+++ b/scripts/forgejo-migrate-orphan-images.sh
@ -0,0 +1,76 @@
+#!/usr/bin/env bash
+# One-shot migration of every private image on registry.viktorbarzin.me to
+# Forgejo. Used as a stop-gap when the dual-push CI pipelines aren't
+# producing Forgejo images on their own (Forgejo-Woodpecker forge driver
+# context-deadline-exceeded issue, see bd code-d3y / 2026-05-07).
+#
+# Pulls each image from registry.viktorbarzin.me, retags, pushes to
+# forgejo.viktorbarzin.me/viktor/<name>:<tag> — preserving the blob bytes
+# verbatim so the cluster can flip image= without a rebuild.
+#
+# Run from any host with docker + network reach to BOTH registries. Auth
+# from `docker login` (~/.docker/config.json) — make sure both registries
+# are logged in:
+#   docker login registry.viktorbarzin.me -u viktorbarzin
+#   docker login forgejo.viktorbarzin.me -u viktor   # use viktor PAT, not ci-pusher
+#
+# (ci-pusher CANNOT push to viktor/<image> — Forgejo container packages
+# are scoped to the pushing user. Only viktor's PAT can write to viktor/*.)
+#
+# After the script, the new image lives at
+#   forgejo.viktorbarzin.me/viktor/<name>:<tag>
+# Phase 3 of the consolidation flips infra/stacks/<svc>/main.tf image=
+# to that path.
+
+set -euo pipefail
+
+OLD_REG=registry.viktorbarzin.me
+NEW_REG=forgejo.viktorbarzin.me/viktor
+
+# Image list: <name>:<tag>. Generated 2026-05-07 from `grep -rEn 'image\s*=\s*
+# "registry\.viktorbarzin\.me'` across infra/stacks/.
+#
+# Excluded:
+# - wealthfolio-sync: registry repo exists but has 0 tags (CronJob has been
+#   broken for 36+ days, separate decision needed). User to triage before
+#   migration.
+# - fire-planner: registry repo exists but has 0 tags. Dockerfile + CI added
+#   in this session (commit 8b53d99e); rebuild via Woodpecker before flipping.
+IMAGES=(
+  "chrome-service-novnc:v4"
+  "chrome-service-novnc:latest"
+  "payslip-ingest:latest"
+  "job-hunter:latest"
+  "claude-agent-service:latest"
+  "freedify:latest"
+  "beadboard:latest"
+  "infra-ci:latest"
+)
+
+for img in "${IMAGES[@]}"; do
+  echo "=== $img ==="
+  src="$OLD_REG/$img"
+  dst="$NEW_REG/$img"
+
+  if ! docker pull "$src" 2>&1 | tee /tmp/pull-$$ | grep -q 'Status: '; then
+    if grep -q 'not found' /tmp/pull-$$; then
+      echo "  SKIP — image not present in source registry"
+      rm -f /tmp/pull-$$
+      continue
+    fi
+  fi
+  rm -f /tmp/pull-$$
+
+  echo "  tag → $dst"
+  docker tag "$src" "$dst"
+
+  echo "  push $dst"
+  docker push "$dst" 2>&1 | tail -2
+
+  echo "  cleanup local copy"
+  docker rmi "$src" "$dst" 2>&1 | tail -1 || true
+done
+
+echo ""
+echo "Done. Verify in Forgejo Web UI: https://forgejo.viktorbarzin.me/viktor/-/packages?type=container"
+echo "Phase 3 of the plan flips infra/stacks/{wealthfolio,fire-planner}/main.tf image= references."
--- a/scripts/setup-forgejo-containerd-mirror.sh
+++ b/scripts/setup-forgejo-containerd-mirror.sh
@ -0,0 +1,59 @@
+#!/usr/bin/env bash
+# One-shot deployment of the forgejo.viktorbarzin.me containerd hosts.toml
+# entry across every k8s node. Cloud-init only fires on VM provision, so
+# existing nodes need this manual rollout.
+#
+# What it does, per node:
+#   1. drain (ignore-daemonsets, delete-emptydir-data)
+#   2. ssh in: mkdir + write /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
+#   3. systemctl restart containerd
+#   4. uncordon
+#
+# hosts.toml is documented as hot-reloaded but the post-2026-04-19
+# containerd corruption playbook calls for an explicit restart so the
+# config is unambiguously in effect. Running drain/uncordon around it
+# avoids pulling against an in-flight containerd restart.
+#
+# Re-run is safe: writes are idempotent.
+
+set -euo pipefail
+
+CERTS_DIR=/etc/containerd/certs.d/forgejo.viktorbarzin.me
+HOSTS_TOML='server = "https://forgejo.viktorbarzin.me"
+
+[host."https://10.0.20.200"]
+  capabilities = ["pull", "resolve"]
+'
+
+NODES=$(kubectl get nodes -o name | sed 's|^node/||')
+if [[ -z "$NODES" ]]; then
+  echo "ERROR: no nodes returned from kubectl get nodes" >&2
+  exit 1
+fi
+
+for n in $NODES; do
+  echo "=== $n ==="
+  kubectl drain "$n" --ignore-daemonsets --delete-emptydir-data --force --grace-period=60
+
+  ssh -o StrictHostKeyChecking=accept-new "wizard@$n" sudo bash <<EOF
+set -euo pipefail
+mkdir -p "$CERTS_DIR"
+cat > "$CERTS_DIR/hosts.toml" <<'TOML'
+$HOSTS_TOML
+TOML
+systemctl restart containerd
+EOF
+
+  kubectl uncordon "$n"
+
+  # Wait for the node to report Ready before moving to the next one.
+  for i in {1..30}; do
+    if kubectl get node "$n" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' | grep -q True; then
+      echo "    node Ready"
+      break
+    fi
+    sleep 2
+  done
+done
+
+echo "All nodes updated."
--- a/stacks/beads-server/main.tf
+++ b/stacks/beads-server/main.tf
@ -567,7 +567,8 @@ resource "kubernetes_deployment" "beadboard" {

        container {
          name  = "beadboard"
-          image = "registry.viktorbarzin.me:5050/beadboard:${var.beadboard_image_tag}"
+          # Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
+          image = "forgejo.viktorbarzin.me/viktor/beadboard:${var.beadboard_image_tag}"

          port {
            name           = "http"
@ -725,7 +726,8 @@ resource "kubernetes_config_map" "beads_metadata" {
 }

 locals {
-  claude_agent_service_image = "registry.viktorbarzin.me/claude-agent-service:${var.claude_agent_service_image_tag}"
+  # Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
+  claude_agent_service_image = "forgejo.viktorbarzin.me/viktor/claude-agent-service:${var.claude_agent_service_image_tag}"
  beadboard_internal_url     = "http://${kubernetes_service.beadboard.metadata[0].name}.${kubernetes_namespace.beads.metadata[0].name}.svc.cluster.local"

  beads_script_prelude = <<-EOT
--- a/stacks/chrome-service/README.md
+++ b/stacks/chrome-service/README.md
@ -0,0 +1,90 @@
+# chrome-service
+
+In-cluster headed Chromium exposed over Playwright's WebSocket protocol.
+Sibling services drive it instead of running their own in-process browser
+— useful when the upstream tries to detect headless mode (e.g. hmembeds'
+`disable-devtool.js` redirect-to-google trap).
+
+## Connect
+
+```python
+from playwright.async_api import async_playwright
+
+WS_URL   = "ws://chrome-service.chrome-service.svc.cluster.local:3000"
+WS_TOKEN = os.environ["CHROME_WS_TOKEN"]   # 32-byte URL-safe random
+
+async with async_playwright() as p:
+    browser = await p.chromium.connect(f"{WS_URL}/{WS_TOKEN}", timeout=15_000)
+    context = await browser.new_context()
+    await context.add_init_script(STEALTH_JS)   # see files/stealth.js
+    page = await context.new_page()
+    ...
+    await browser.close()
+```
+
+The token comes from Vault KV `secret/chrome-service.api_bearer_token`,
+which ESO syncs into a per-namespace K8s Secret in each caller stack
+(see f1-stream's `chrome-service-client-secrets`).
+
+## Add a new caller
+
+1. **Label the caller's namespace** so the chrome-service NetworkPolicy
+   admits it:
+   ```hcl
+   resource "kubernetes_namespace" "<ns>" {
+     metadata {
+       labels = {
+         "chrome-service.viktorbarzin.me/client" = "true"
+       }
+     }
+   }
+   ```
+2. **Add an ExternalSecret** in the caller stack pulling the token:
+   ```hcl
+   resource "kubernetes_manifest" "chrome_token" {
+     manifest = {
+       apiVersion = "external-secrets.io/v1beta1"
+       kind       = "ExternalSecret"
+       metadata = { name = "chrome-service-client-secrets", namespace = "<ns>" }
+       spec = {
+         refreshInterval = "15m"
+         secretStoreRef  = { name = "vault-kv", kind = "ClusterSecretStore" }
+         target          = { name = "chrome-service-client-secrets" }
+         dataFrom        = [{ extract = { key = "chrome-service" } }]
+       }
+     }
+   }
+   ```
+3. **Inject `CHROME_WS_URL` + `CHROME_WS_TOKEN`** into the caller's pod env.
+   Use `secret_key_ref` for the token; the URL is a plain value.
+4. **Vendor `stealth.js`** into the caller (or just paste — it's ~40 lines)
+   and apply via `await context.add_init_script(STEALTH_JS)` after every
+   `new_context()`. Without it, hmembeds-class anti-bot still trips.
+
+## Image pin
+
+Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
+`main.tf`) and the client (`playwright==1.48.0` in callers' requirements)
+must match minor-versions. Bump in lockstep — Playwright protocol changes
+between minors.
+
+## Operations
+
+- **Storage**: encrypted PVC at `/profile` for cookies + npm cache. Ephemeral
+  contexts (`browser.new_context()`) bypass the profile; persistent contexts
+  share it. Backed up tar+gzip every 6h to `/srv/nfs/chrome-service-backup/`,
+  30-day retention.
+- **Probes**: TCP/3000. Playwright run-server has no HTTP `/health`; a TCP
+  open is the only liveness signal available without spinning a browser.
+- **Health page**: visit `https://chrome.viktorbarzin.me` (Authentik-gated)
+  to confirm the pod is up. The WS port stays internal-only.
+- **Token rotation**: `vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
+  Reloader cascades the rotation to both the server pod and any caller
+  whose secret has the `reloader.stakater.com/auto = "true"` annotation.
+
+## Why headed (Xvfb) instead of headless?
+
+`disable-devtool.js` and similar libraries detect `navigator.webdriver`,
+console-clear timing, and the `HeadlessChromium/...` user-agent suffix.
+Running headed inside `Xvfb :99` reports as a normal Chromium, and the
+stealth init script handles the JS-visible giveaways.
--- a/stacks/chrome-service/files/novnc/Dockerfile
+++ b/stacks/chrome-service/files/novnc/Dockerfile
@ -0,0 +1,19 @@
+FROM docker.io/library/ubuntu:24.04
+
+RUN apt-get update \
+ && apt-get install -y --no-install-recommends \
+      x11vnc \
+      novnc \
+      websockify \
+      ca-certificates \
+ && rm -rf /var/lib/apt/lists/*
+
+# noVNC ships /usr/share/novnc/vnc.html; alias to index.html so / works.
+RUN ln -sf /usr/share/novnc/vnc.html /usr/share/novnc/index.html
+
+EXPOSE 6080
+
+COPY entrypoint.sh /entrypoint.sh
+RUN chmod +x /entrypoint.sh
+
+CMD ["/entrypoint.sh"]
--- a/stacks/chrome-service/files/novnc/entrypoint.sh
+++ b/stacks/chrome-service/files/novnc/entrypoint.sh
@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+# Connect to the chrome-service container's Xvfb (shared pod network, TCP)
+# and serve the noVNC HTML5 client + websockify bridge on :6080.
+set -e
+
+for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
+  if echo > /dev/tcp/127.0.0.1/6099 2>/dev/null; then
+    echo "Xvfb TCP up after attempt $i"
+    break
+  fi
+  echo "waiting for Xvfb TCP 6099 attempt=$i"
+  sleep 2
+done
+
+# websockify runs as PID 1; x11vnc is a child so its logs land on container stdout
+# `-noshm` skips MIT-SHM probes that fail across container boundaries (each
+# container has its own /dev/shm); `-noxdamage` skips XDAMAGE which Xvfb
+# doesn't expose; `-quiet` keeps the polling chatter out of pod logs.
+echo "starting x11vnc -> :5900"
+x11vnc -display localhost:99 -nopw -listen 0.0.0.0 -rfbport 5900 \
+       -forever -shared -noshm -noxdamage -quiet 2>&1 &
+X11VNC_PID=$!
+
+for i in 1 2 3 4 5 6 7 8 9 10; do
+  if echo > /dev/tcp/127.0.0.1/5900 2>/dev/null; then
+    echo "x11vnc bound 5900 after attempt $i"
+    break
+  fi
+  echo "waiting for x11vnc :5900 attempt=$i"
+  sleep 2
+done
+
+if ! echo > /dev/tcp/127.0.0.1/5900 2>/dev/null; then
+  echo "ERROR: x11vnc did not bind 5900"
+  exit 1
+fi
+
+echo "starting websockify -> :6080"
+exec websockify --web=/usr/share/novnc 6080 localhost:5900
--- a/stacks/chrome-service/files/stealth.js
+++ b/stacks/chrome-service/files/stealth.js
@ -0,0 +1,54 @@
+// Minimal stealth init script for Playwright-driven Chromium.
+// Vendored from puppeteer-extra-plugin-stealth/evasions/* (MIT) — covers:
+//   webdriver, chrome.runtime, navigator.plugins, navigator.languages,
+//   Permissions.query, WebGL getParameter (vendor + renderer spoof).
+// Run via context.add_init_script() so it executes before any page script.
+(() => {
+  // navigator.webdriver — most common detection, removed entirely.
+  Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => undefined });
+
+  // window.chrome.runtime — many sites check that real Chrome exposes this.
+  if (!window.chrome) window.chrome = {};
+  window.chrome.runtime = window.chrome.runtime || {};
+
+  // navigator.plugins — headless reports zero; spoof a plausible PDF viewer.
+  Object.defineProperty(navigator, 'plugins', {
+    get: () => [{ name: 'Chrome PDF Plugin' }, { name: 'Chrome PDF Viewer' }, { name: 'Native Client' }],
+  });
+
+  // navigator.languages — headless returns empty array.
+  Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
+
+  // Permissions.query — headless returns 'denied' for notifications instead of 'default'.
+  const origQuery = window.navigator.permissions && window.navigator.permissions.query;
+  if (origQuery) {
+    window.navigator.permissions.query = (parameters) =>
+      parameters && parameters.name === 'notifications'
+        ? Promise.resolve({ state: Notification.permission })
+        : origQuery(parameters);
+  }
+
+  // WebGL getParameter — spoof vendor + renderer strings to a real GPU.
+  const spoofGl = (proto) => {
+    if (!proto) return;
+    const orig = proto.getParameter;
+    proto.getParameter = function (parameter) {
+      if (parameter === 37445) return 'Intel Inc.';                   // UNMASKED_VENDOR_WEBGL
+      if (parameter === 37446) return 'Intel Iris OpenGL Engine';     // UNMASKED_RENDERER_WEBGL
+      return orig.apply(this, arguments);
+    };
+  };
+  spoofGl(window.WebGLRenderingContext && window.WebGLRenderingContext.prototype);
+  spoofGl(window.WebGL2RenderingContext && window.WebGL2RenderingContext.prototype);
+
+  // disable-devtool.js (theajack/disable-devtool) auto-inits via a script
+  // tag with `disable-devtool-auto`. Its Performance detector trips under
+  // Playwright (CDP adds console.log latency vs console.table) and the
+  // redirect URL is hard-coded — for hmembeds that's google.com.
+  // Hide the auto-init marker so the library's IIFE exits early.
+  const origQS = Document.prototype.querySelector;
+  Document.prototype.querySelector = function (sel) {
+    if (typeof sel === 'string' && sel.indexOf('disable-devtool-auto') !== -1) return null;
+    return origQS.apply(this, arguments);
+  };
+})();
--- a/stacks/chrome-service/main.tf
+++ b/stacks/chrome-service/main.tf
@ -0,0 +1,504 @@
+variable "tls_secret_name" {
+  type      = string
+  sensitive = true
+}
+variable "nfs_server" { type = string }
+
+locals {
+  namespace = "chrome-service"
+  labels = {
+    app = "chrome-service"
+  }
+  # Pin to the same Playwright minor that the Python client requires.
+  # If you bump this image, also bump `playwright==X.Y.Z` in the client
+  # (currently f1-stream) and re-run the connect smoke test.
+  image = "mcr.microsoft.com/playwright:v1.48.0-noble"
+}
+
+# --- Namespace ---
+
+resource "kubernetes_namespace" "chrome_service" {
+  metadata {
+    name = local.namespace
+    labels = {
+      "istio-injection"                       = "disabled"
+      tier                                    = local.tiers.aux
+      "chrome-service.viktorbarzin.me/server" = "true"
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
+    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
+  }
+}
+
+# --- Secrets (single-key extract: api_bearer_token) ---
+
+resource "kubernetes_manifest" "external_secret" {
+  manifest = {
+    apiVersion = "external-secrets.io/v1beta1"
+    kind       = "ExternalSecret"
+    metadata = {
+      name      = "chrome-service-secrets"
+      namespace = local.namespace
+    }
+    spec = {
+      refreshInterval = "15m"
+      secretStoreRef = {
+        name = "vault-kv"
+        kind = "ClusterSecretStore"
+      }
+      target = {
+        name = "chrome-service-secrets"
+      }
+      dataFrom = [{
+        extract = {
+          key = "chrome-service"
+        }
+      }]
+    }
+  }
+  depends_on = [kubernetes_namespace.chrome_service]
+}
+
+# tls-secret for the chrome.viktorbarzin.me ingress is auto-cloned into
+# every namespace by Kyverno's `sync-tls-secret` ClusterPolicy — no local
+# module call needed.
+
+# --- Encrypted profile PVC ---
+# Holds Chromium user data: cookies, localStorage, IndexedDB. Sites we
+# drive may set auth tokens or session cookies — encrypted is correct.
+resource "kubernetes_persistent_volume_claim" "profile_encrypted" {
+  wait_until_bound = false
+  metadata {
+    name      = "chrome-service-profile-encrypted"
+    namespace = kubernetes_namespace.chrome_service.metadata[0].name
+    annotations = {
+      "resize.topolvm.io/threshold"     = "80%"
+      "resize.topolvm.io/increase"      = "100%"
+      "resize.topolvm.io/storage_limit" = "10Gi"
+    }
+  }
+  spec {
+    access_modes       = ["ReadWriteOnce"]
+    storage_class_name = "proxmox-lvm-encrypted"
+    resources {
+      requests = {
+        storage = "2Gi"
+      }
+    }
+  }
+}
+
+# --- NFS backup target ---
+module "nfs_chrome_service_backup_host" {
+  source     = "../../modules/kubernetes/nfs_volume"
+  name       = "chrome-service-backup-host"
+  namespace  = kubernetes_namespace.chrome_service.metadata[0].name
+  nfs_server = "192.168.1.127"
+  nfs_path   = "/srv/nfs/chrome-service-backup"
+}
+
+# --- Deployment ---
+
+resource "kubernetes_deployment" "chrome_service" {
+  metadata {
+    name      = "chrome-service"
+    namespace = kubernetes_namespace.chrome_service.metadata[0].name
+    labels = merge(local.labels, {
+      tier = local.tiers.aux
+    })
+    annotations = {
+      "reloader.stakater.com/auto" = "true"
+    }
+  }
+  spec {
+    replicas = 1
+    strategy {
+      type = "Recreate"
+    }
+    selector {
+      match_labels = local.labels
+    }
+    template {
+      metadata {
+        labels = local.labels
+      }
+      spec {
+        # The noVNC sidecar pulls from registry.viktorbarzin.me which needs
+        # auth. Kyverno's `sync-registry-credentials` ClusterPolicy syncs
+        # the secret into every namespace.
+        image_pull_secrets {
+          name = "registry-credentials"
+        }
+        security_context {
+          run_as_user  = 1000
+          run_as_group = 1000
+          fs_group     = 1000
+          seccomp_profile {
+            type = "RuntimeDefault"
+          }
+        }
+
+        # Fix profile dir ownership (PVC may have root-owned files from prior run).
+        init_container {
+          name    = "fix-perms"
+          image   = "busybox:1.37"
+          command = ["sh", "-c", "chown -R 1000:1000 /profile"]
+          security_context {
+            run_as_user = 0
+          }
+          volume_mount {
+            name       = "profile"
+            mount_path = "/profile"
+          }
+          resources {
+            requests = { memory = "32Mi" }
+            limits   = { memory = "64Mi" }
+          }
+        }
+
+        container {
+          name              = "chrome-service"
+          image             = local.image
+          image_pull_policy = "IfNotPresent"
+
+          # `launch-server` (not `run-server`) lets us pin headed mode +
+          # specific args. `run-server` defaults to headless, which the
+          # disable-devtool.js Performance detector trips under Playwright
+          # (CDP adds latency to console.log; lib detects + redirects).
+          # The Microsoft image ships only the browsers, not the playwright
+          # npm package itself — `npx -y playwright@<ver>` downloads it on
+          # first start (cached under $HOME/.npm via the PVC) and pins to
+          # the same minor as the Python client. Bump in lockstep.
+          command = ["bash", "-c"]
+          args = [
+            <<-EOT
+            set -e
+            # `-listen tcp` enables localhost:6099 so the noVNC sidecar can
+            # connect over the pod's shared network namespace (Ubuntu 24.04
+            # defaults Xvfb to -nolisten tcp).
+            # `-ac` disables X access control so the noVNC sidecar can
+            # attach without an MIT-MAGIC-COOKIE; safe because Xvfb only
+            # listens on localhost (pod's lo).
+            Xvfb :99 -screen 0 1280x720x24 -listen tcp -ac &
+            sleep 1
+            cat > /tmp/launch.json <<JSON
+            {
+              "headless": false,
+              "port": 3000,
+              "host": "0.0.0.0",
+              "wsPath": "/$${PW_TOKEN}",
+              "args": [
+                "--no-sandbox",
+                "--disable-blink-features=AutomationControlled",
+                "--disable-features=IsolateOrigins,site-per-process",
+                "--autoplay-policy=no-user-gesture-required",
+                "--disable-dev-shm-usage"
+              ]
+            }
+            JSON
+            exec npx -y playwright@1.48.0 launch-server --browser chromium --config /tmp/launch.json
+            EOT
+          ]
+
+          env {
+            name  = "DISPLAY"
+            value = ":99"
+          }
+          env {
+            name  = "HOME"
+            value = "/profile"
+          }
+          env {
+            name = "PW_TOKEN"
+            value_from {
+              secret_key_ref {
+                name = "chrome-service-secrets"
+                key  = "api_bearer_token"
+              }
+            }
+          }
+
+          port {
+            name           = "ws"
+            container_port = 3000
+            protocol       = "TCP"
+          }
+
+          # Playwright run-server exposes only the WS endpoint; no /health.
+          liveness_probe {
+            tcp_socket { port = 3000 }
+            initial_delay_seconds = 30
+            period_seconds        = 30
+            failure_threshold     = 3
+          }
+          readiness_probe {
+            tcp_socket { port = 3000 }
+            initial_delay_seconds = 10
+            period_seconds        = 10
+          }
+          startup_probe {
+            tcp_socket { port = 3000 }
+            period_seconds    = 5
+            failure_threshold = 24 # up to 2 minutes
+          }
+
+          volume_mount {
+            name       = "profile"
+            mount_path = "/profile"
+          }
+          volume_mount {
+            name       = "dshm"
+            mount_path = "/dev/shm"
+          }
+
+          resources {
+            requests = {
+              cpu    = "200m"
+              memory = "1500Mi"
+            }
+            limits = {
+              memory = "2Gi"
+            }
+          }
+        }
+
+        # noVNC sidecar — exposes a live HTML5 view of the headed Chromium
+        # session via x11vnc + websockify, gated by the Authentik-protected
+        # ingress at chrome.viktorbarzin.me. WS port 3000 (the Playwright
+        # endpoint) stays internal-only.
+        container {
+          name = "novnc"
+          # Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
+          image             = "forgejo.viktorbarzin.me/viktor/chrome-service-novnc:v4"
+          image_pull_policy = "IfNotPresent"
+          port {
+            name           = "http"
+            container_port = 6080
+            protocol       = "TCP"
+          }
+          # x11vnc connects to the chrome-service container's Xvfb over
+          # localhost TCP (shared pod network). Same uid 1000 as chrome
+          # container so we can read MIT-MAGIC-COOKIE if Xvfb adds one.
+          resources {
+            requests = { cpu = "10m", memory = "32Mi" }
+            limits   = { memory = "96Mi" }
+          }
+        }
+
+        volume {
+          name = "profile"
+          persistent_volume_claim {
+            claim_name = kubernetes_persistent_volume_claim.profile_encrypted.metadata[0].name
+          }
+        }
+        volume {
+          name = "dshm"
+          empty_dir {
+            medium     = "Memory"
+            size_limit = "256Mi"
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+  }
+}
+
+# --- Services ---
+# WS endpoint (internal only, gated by NetworkPolicy + token).
+resource "kubernetes_service" "chrome_service" {
+  metadata {
+    name      = "chrome-service"
+    namespace = kubernetes_namespace.chrome_service.metadata[0].name
+    labels    = local.labels
+  }
+
+  spec {
+    selector = local.labels
+    port {
+      name        = "ws"
+      port        = 3000
+      target_port = 3000
+      protocol    = "TCP"
+    }
+  }
+}
+
+# noVNC view (Authentik-gated, exposed via ingress).
+resource "kubernetes_service" "chrome_novnc" {
+  metadata {
+    name      = "chrome"
+    namespace = kubernetes_namespace.chrome_service.metadata[0].name
+    labels    = local.labels
+  }
+
+  spec {
+    selector = local.labels
+    port {
+      name        = "http"
+      port        = 80
+      target_port = 6080
+      protocol    = "TCP"
+    }
+  }
+}
+
+module "ingress" {
+  source          = "../../modules/kubernetes/ingress_factory"
+  dns_type        = "proxied"
+  namespace       = kubernetes_namespace.chrome_service.metadata[0].name
+  name            = "chrome"
+  tls_secret_name = var.tls_secret_name
+  protected       = true
+  # noVNC defaults to /vnc.html — auto-redirect / there.
+  ingress_path = ["/"]
+  extra_annotations = {
+    "gethomepage.dev/enabled"     = "true"
+    "gethomepage.dev/name"        = "Chrome Service"
+    "gethomepage.dev/description" = "Live noVNC view of headed Chromium"
+    "gethomepage.dev/icon"        = "chromium.png"
+    "gethomepage.dev/group"       = "Infrastructure"
+  }
+}
+
+# --- NetworkPolicy: scoped ingress.
+# - TCP/3000 (Playwright WS): only from labelled client namespaces.
+# - TCP/6080 (noVNC HTTP+WS): only from the traefik namespace, since the
+#   public-facing path is `chrome.viktorbarzin.me` ingress → Traefik →
+#   sidecar. Authentik forward-auth still gates external access at the
+#   Traefik layer.
+# The cluster has no default-deny, so this NP only takes effect inside
+# chrome-service ns — pods elsewhere remain unaffected.
+resource "kubernetes_network_policy_v1" "ws_ingress" {
+  metadata {
+    name      = "chrome-service-ws-ingress"
+    namespace = kubernetes_namespace.chrome_service.metadata[0].name
+  }
+  spec {
+    pod_selector {
+      match_labels = local.labels
+    }
+    policy_types = ["Ingress"]
+    ingress {
+      from {
+        namespace_selector {
+          match_labels = {
+            "chrome-service.viktorbarzin.me/client" = "true"
+          }
+        }
+      }
+      # Explicit fallback list — admit f1-stream by name in case the label
+      # is removed by accident. Keep this in sync with the labels above.
+      from {
+        namespace_selector {
+          match_labels = {
+            "kubernetes.io/metadata.name" = "f1-stream"
+          }
+        }
+      }
+      ports {
+        port     = "3000"
+        protocol = "TCP"
+      }
+    }
+    ingress {
+      from {
+        namespace_selector {
+          match_labels = {
+            "kubernetes.io/metadata.name" = "traefik"
+          }
+        }
+      }
+      ports {
+        port     = "6080"
+        protocol = "TCP"
+      }
+    }
+  }
+}
+
+# --- Backup CronJob: tar+gzip the profile every 6h, 30-day retention. ---
+resource "kubernetes_cron_job_v1" "chrome_service_backup" {
+  metadata {
+    name      = "chrome-service-backup"
+    namespace = kubernetes_namespace.chrome_service.metadata[0].name
+  }
+  spec {
+    concurrency_policy            = "Replace"
+    failed_jobs_history_limit     = 3
+    successful_jobs_history_limit = 1
+    schedule                      = "47 */6 * * *"
+    starting_deadline_seconds     = 60
+    job_template {
+      metadata {}
+      spec {
+        backoff_limit              = 2
+        ttl_seconds_after_finished = 300
+        template {
+          metadata {}
+          spec {
+            # PVC is RWO — colocate the backup pod with the chrome-service
+            # pod so both can mount the volume on the same node.
+            affinity {
+              pod_affinity {
+                required_during_scheduling_ignored_during_execution {
+                  label_selector {
+                    match_labels = local.labels
+                  }
+                  topology_key = "kubernetes.io/hostname"
+                }
+              }
+            }
+            container {
+              name  = "backup"
+              image = "docker.io/library/alpine:3.20"
+              command = ["/bin/sh", "-c", <<-EOT
+                set -euxo pipefail
+                ts=$(date +"%Y_%m_%d_%H")
+                tar -czf /backup/$${ts}.tar.gz -C /profile .
+                find /backup -maxdepth 1 -type f -name '*.tar.gz' -mtime +30 -delete
+                echo "Backup complete: $${ts}.tar.gz"
+              EOT
+              ]
+              volume_mount {
+                name       = "profile"
+                mount_path = "/profile"
+                read_only  = true
+              }
+              volume_mount {
+                name       = "backup"
+                mount_path = "/backup"
+              }
+              resources {
+                requests = { cpu = "10m", memory = "32Mi" }
+                limits   = { memory = "64Mi" }
+              }
+            }
+            volume {
+              name = "profile"
+              persistent_volume_claim {
+                claim_name = kubernetes_persistent_volume_claim.profile_encrypted.metadata[0].name
+              }
+            }
+            volume {
+              name = "backup"
+              persistent_volume_claim {
+                claim_name = module.nfs_chrome_service_backup_host.claim_name
+              }
+            }
+            restart_policy = "OnFailure"
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+  }
+}
--- a/stacks/chrome-service/terragrunt.hcl
+++ b/stacks/chrome-service/terragrunt.hcl
@ -0,0 +1,8 @@
+include "root" {
+  path = find_in_parent_folders()
+}
+
+dependency "platform" {
+  config_path  = "../platform"
+  skip_outputs = true
+}
--- a/stacks/claude-agent-service/main.tf
+++ b/stacks/claude-agent-service/main.tf
@ -10,7 +10,8 @@ data "vault_kv_secret_v2" "viktor_secrets" {

 locals {
  namespace = "claude-agent"
-  image     = "registry.viktorbarzin.me/claude-agent-service"
+  # Phase 3 cutover 2026-05-07 — see infra/docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md.
+  image     = "forgejo.viktorbarzin.me/viktor/claude-agent-service"
  image_tag = "2fd7670d"
  labels = {
    app = "claude-agent-service"
--- a/stacks/claude-memory/main.tf
+++ b/stacks/claude-memory/main.tf
@ -175,8 +175,10 @@ resource "kubernetes_deployment" "claude-memory" {
          }
        }
        container {
-          name  = "claude-memory"
-          image = "viktorbarzin/claude-memory-mcp:17"
+          name = "claude-memory"
+          # Phase 3 cutover 2026-05-07 — moved off DockerHub to Forgejo as
+          # part of the registry consolidation. Old: viktorbarzin/claude-memory-mcp:17
+          image = "forgejo.viktorbarzin.me/viktor/claude-memory-mcp:17"

          port {
            container_port = 8000
--- a/stacks/f1-stream/files/Dockerfile
+++ b/stacks/f1-stream/files/Dockerfile
@ -14,9 +14,26 @@ FROM python:3.13-slim-bookworm

 WORKDIR /app

+# Headless Chromium runtime libs for the playback verifier. Listed inline
+# (instead of running `playwright install-deps`) so the image build doesn't
+# need root-network apt fetches at runtime.
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ca-certificates \
+    libnss3 libnspr4 \
+    libatk1.0-0 libatk-bridge2.0-0 libcups2 \
+    libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
+    libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 \
+    libasound2 libatspi2.0-0 \
+    fonts-liberation fonts-noto-color-emoji \
+ && rm -rf /var/lib/apt/lists/*
+
 COPY backend/requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

+# Install the Chromium browser binary used by the verifier. Skip
+# --with-deps because we already installed the system libs above.
+RUN playwright install chromium
+
 COPY backend/ ./backend/

 # Copy built frontend into the image
--- a/stacks/f1-stream/files/backend/embed_proxy.py
+++ b/stacks/f1-stream/files/backend/embed_proxy.py
@ -0,0 +1,359 @@
+"""Embed iframe-stripping reverse proxy.
+
+Serves third-party embed pages (e.g. https://hmembeds.one/embed/{hash},
+https://pooembed.eu/embed/{slug}) through our origin so we can:
+
+1. Strip X-Frame-Options and Content-Security-Policy: frame-ancestors headers,
+   so the embed loads in our <iframe> regardless of upstream policy.
+2. Inject <base> + a frame-buster-defeat <script> at the top of <head> so
+   the embed's JS sees `window.top === window` and a plausible
+   `document.referrer` pointing at the upstream origin.
+3. Forward Referer / User-Agent matching the upstream's own pages so
+   the upstream's hotlink / origin-allowlist checks pass.
+
+Two endpoints:
+- GET /embed?url=<base64url> — the embed HTML page (rewritten).
+- GET /embed-asset?url=<base64url> — fallback for any subresource the
+  upstream blocks based on hotlink protection. Most assets load directly
+  via the injected <base> tag and bypass our proxy.
+"""
+
+import logging
+import re
+from typing import AsyncGenerator
+from urllib.parse import urlparse
+
+import httpx
+from fastapi import HTTPException
+
+from backend.m3u8_rewriter import decode_url
+
+logger = logging.getLogger(__name__)
+
+EMBED_TIMEOUT = 20.0
+ASSET_TIMEOUT = 30.0
+RELAY_CHUNK_SIZE = 65536
+
+USER_AGENT = (
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+    "AppleWebKit/537.36 (KHTML, like Gecko) "
+    "Chrome/120.0.0.0 Safari/537.36"
+)
+
+# Response headers we never forward (they break frame embedding or leak upstream policy).
+STRIP_RESPONSE_HEADERS = {
+    "x-frame-options",
+    "content-security-policy",
+    "content-security-policy-report-only",
+    "set-cookie",
+    "report-to",
+    "nel",
+    "permissions-policy",
+    "cross-origin-opener-policy",
+    "cross-origin-embedder-policy",
+    "cross-origin-resource-policy",
+    # let httpx/uvicorn re-set these
+    "transfer-encoding",
+    "content-encoding",
+    "content-length",
+    "connection",
+}
+
+# Inject this <script> at the top of <head> to defeat JS frame-busters.
+# - Locks window.top, window.parent, and window.self to the embed window
+#   itself, so `self !== window.top` checks pass.
+# - Forces document.referrer to the upstream origin so allowlist checks
+#   like `document.referrer.includes("timstreams.net")` keep working.
+# - No-ops anything that would call window.parent.location or attempt to
+#   reload the top frame.
+_FRAME_BUSTER_DEFEAT_TEMPLATE = """
+<script>(function(){{
+  try {{
+    var fakeWindow = window;
+    Object.defineProperty(window, 'top', {{get: function(){{return fakeWindow;}}, configurable: false}});
+    Object.defineProperty(window, 'parent', {{get: function(){{return fakeWindow;}}, configurable: false}});
+    Object.defineProperty(window, 'frameElement', {{get: function(){{return null;}}, configurable: false}});
+    Object.defineProperty(document, 'referrer', {{get: function(){{return {referrer!r};}}, configurable: false}});
+  }} catch (e) {{}}
+  // Defeat the `disable-devtool.js` redirect trap that hmembeds and similar
+  // embed hosts use. The trap fires `console.clear`/`console.table` in a
+  // tight loop, then if it thinks DevTools is open, calls
+  // `window.location = "https://www.google.com"`. We block those redirect
+  // sinks while leaving normal playback unaffected.
+  try {{
+    var noop = function(){{}};
+    console.clear = noop;
+    console.table = noop;
+    console.dir = noop;
+    var loc = window.location;
+    Object.defineProperty(window, 'location', {{
+      get: function(){{ return loc; }},
+      set: function(v){{ /* swallow assignment */ }},
+      configurable: false,
+    }});
+    var origAssign = loc.assign && loc.assign.bind(loc);
+    var origReplace = loc.replace && loc.replace.bind(loc);
+    loc.assign = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origAssign) origAssign(u); }};
+    loc.replace = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origReplace) origReplace(u); }};
+  }} catch (e) {{}}
+
+  // Route all cross-origin fetch/XHR requests through our /embed-asset
+  // proxy. The hmembeds player calls a token-binding endpoint
+  // (hghndasw.gbgdhdffhf.shop/sec/<JWT>) that CORS-rejects requests from
+  // any origin other than hmembeds.one. By rewriting the URL to
+  // /embed-asset?url=..., the browser fetches our same-origin endpoint
+  // (no CORS issue), and our backend fetches the upstream with the
+  // correct Referer/Origin server-side (no CORS issue there either).
+  try {{
+    var b64url = function(s) {{
+      return btoa(unescape(encodeURIComponent(s)))
+        .replace(/\\+/g, '-').replace(/\\//g, '_').replace(/=+$/, '');
+    }};
+    var sameOrigin = function(u) {{
+      try {{ return (new URL(u, document.baseURI || location.href)).origin === location.origin; }}
+      catch (_) {{ return true; }}
+    }};
+    var toAbsolute = function(u) {{
+      try {{ return (new URL(u, document.baseURI || location.href)).toString(); }}
+      catch (_) {{ return u; }}
+    }};
+    var proxify = function(u) {{
+      var abs = toAbsolute(u);
+      if (sameOrigin(abs)) return u;
+      // Don't double-proxy.
+      if (abs.indexOf('/embed-asset?') !== -1 || abs.indexOf('/embed?') !== -1) return u;
+      return location.origin + '/embed-asset?url=' + b64url(abs);
+    }};
+
+    var _fetch = window.fetch && window.fetch.bind(window);
+    if (_fetch) {{
+      window.fetch = function(input, init) {{
+        try {{
+          if (typeof input === 'string') {{
+            return _fetch(proxify(input), init);
+          }} else if (input && input.url) {{
+            var newUrl = proxify(input.url);
+            if (newUrl !== input.url) {{
+              return _fetch(new Request(newUrl, input), init);
+            }}
+          }}
+        }} catch (e) {{}}
+        return _fetch(input, init);
+      }};
+    }}
+
+    var XHR = window.XMLHttpRequest;
+    if (XHR && XHR.prototype && XHR.prototype.open) {{
+      var _open = XHR.prototype.open;
+      XHR.prototype.open = function(method, url) {{
+        try {{ url = proxify(url); }} catch (e) {{}}
+        var args = Array.prototype.slice.call(arguments);
+        args[1] = url;
+        return _open.apply(this, args);
+      }};
+    }}
+  }} catch (e) {{}}
+}})();</script>
+"""
+
+
+def _decode(encoded_url: str) -> str:
+    try:
+        return decode_url(encoded_url)
+    except Exception as e:
+        raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")
+
+
+def _filter_headers(upstream_headers: httpx.Headers) -> dict[str, str]:
+    """Forward upstream headers minus the ones we strip."""
+    out: dict[str, str] = {}
+    for k, v in upstream_headers.items():
+        if k.lower() in STRIP_RESPONSE_HEADERS:
+            continue
+        out[k] = v
+    # Always allow our domain to embed and load cross-origin
+    out["Access-Control-Allow-Origin"] = "*"
+    out["X-Frame-Options-Stripped"] = "by-f1-embed-proxy"
+    return out
+
+
+def _make_referer(upstream_url: str) -> str:
+    """Build a plausible Referer header — the upstream's own root."""
+    parsed = urlparse(upstream_url)
+    return f"{parsed.scheme}://{parsed.netloc}/"
+
+
+def _make_origin(upstream_url: str) -> str:
+    parsed = urlparse(upstream_url)
+    return f"{parsed.scheme}://{parsed.netloc}"
+
+
+def _inject_into_head(html: str, upstream_url: str) -> str:
+    """Inject <base> tag + frame-buster defeat script into the response HTML."""
+    parsed = urlparse(upstream_url)
+    base_href = f"{parsed.scheme}://{parsed.netloc}/"
+
+    # The frame-buster-defeat script. Use the upstream's own URL as the spoofed referrer.
+    busted = _FRAME_BUSTER_DEFEAT_TEMPLATE.format(referrer=upstream_url)
+
+    base_tag = f'<base href="{base_href}">'
+
+    injection = base_tag + busted
+
+    # Drop any inline CSP <meta> tags first so they can't override our header strip.
+    html = re.sub(
+        r'<meta[^>]+http-equiv=[\'"]?Content-Security-Policy[\'"]?[^>]*>',
+        "",
+        html,
+        flags=re.IGNORECASE,
+    )
+
+    # Strip disable-devtool.js script tags. The library runs detection heuristics
+    # and redirects on match. Removing it reduces attack surface even with our
+    # location-setter lockdown — saves redundant work and one fewer thing to
+    # bypass in case the lockdown misses an edge case.
+    html = re.sub(
+        r'<script[^>]+(?:disable-devtool|devtool|disabledevtool)[^<]*</script>',
+        "",
+        html,
+        flags=re.IGNORECASE,
+    )
+    html = re.sub(
+        r'<script[^>]+src=["\'][^"\']*disable-devtool[^"\']*["\'][^>]*></script>',
+        "",
+        html,
+        flags=re.IGNORECASE,
+    )
+
+    # Insert immediately after the opening <head> (case-insensitive).
+    head_match = re.search(r"<head[^>]*>", html, flags=re.IGNORECASE)
+    if head_match:
+        idx = head_match.end()
+        return html[:idx] + injection + html[idx:]
+
+    # No <head> — prepend at the start of the document so the script runs first.
+    return injection + html
+
+
+def _looks_blocked_by_anti_bot(content: str) -> bool:
+    """Detect Cloudflare-style challenge interstitials in the upstream body."""
+    sample = content[:4096].lower()
+    markers = (
+        "cf-chl-bypass",
+        "checking your browser",
+        "just a moment",
+        "attention required",
+        "cf-browser-verification",
+    )
+    return any(m in sample for m in markers)
+
+
+async def fetch_embed(encoded_url: str) -> tuple[bytes, dict[str, str], int]:
+    """Fetch an upstream embed page, rewrite the HTML, and return the response.
+
+    Returns: (body_bytes, headers_dict, status_code).
+    Raises HTTPException on transport errors.
+    """
+    url = _decode(encoded_url)
+    logger.info("Embed-proxying: %s", url)
+
+    upstream_headers = {
+        "User-Agent": USER_AGENT,
+        "Referer": _make_referer(url),
+        "Origin": _make_origin(url),
+        "Accept": (
+            "text/html,application/xhtml+xml,application/xml;q=0.9,"
+            "image/avif,image/webp,*/*;q=0.8"
+        ),
+        "Accept-Language": "en-US,en;q=0.9",
+    }
+
+    try:
+        async with httpx.AsyncClient(
+            timeout=EMBED_TIMEOUT,
+            follow_redirects=True,
+        ) as client:
+            response = await client.get(url, headers=upstream_headers)
+    except httpx.TimeoutException:
+        raise HTTPException(status_code=504, detail="Upstream embed timeout")
+    except httpx.HTTPError as e:
+        raise HTTPException(status_code=502, detail=f"Upstream embed error: {e}")
+
+    status_code = response.status_code
+    upstream_ct = response.headers.get("content-type", "")
+    headers_out = _filter_headers(response.headers)
+
+    body = response.content
+
+    # Detect Cloudflare-style challenge so the frontend can show a clear error.
+    if "html" in upstream_ct.lower():
+        text = response.text
+        if _looks_blocked_by_anti_bot(text):
+            logger.warning("Upstream returned anti-bot challenge: %s", url)
+            raise HTTPException(
+                status_code=502,
+                detail="Upstream returned anti-bot challenge — proxy cannot bypass",
+            )
+
+        rewritten = _inject_into_head(text, url)
+        body = rewritten.encode("utf-8")
+        headers_out["Content-Type"] = "text/html; charset=utf-8"
+
+    return body, headers_out, status_code
+
+
+async def relay_asset(
+    encoded_url: str, range_header: str | None
+) -> tuple[AsyncGenerator[bytes, None], dict[str, str], int]:
+    """Relay an upstream subresource (JS/CSS/image/font) as a chunked stream.
+
+    Used as a fallback when an upstream blocks hotlinked assets via Referer
+    or Origin checks. The injected <base> tag handles most of these cases
+    by letting the browser hit upstream directly — the relay is only for
+    the awkward few that need a proxied origin.
+    """
+    url = _decode(encoded_url)
+    logger.debug("Embed-asset relay: %s", url)
+
+    headers = {
+        "User-Agent": USER_AGENT,
+        "Referer": _make_referer(url),
+        "Origin": _make_origin(url),
+        "Accept": "*/*",
+    }
+    if range_header:
+        headers["Range"] = range_header
+
+    client = httpx.AsyncClient(timeout=ASSET_TIMEOUT, follow_redirects=True)
+
+    try:
+        response = await client.send(
+            client.build_request("GET", url, headers=headers),
+            stream=True,
+        )
+    except httpx.TimeoutException:
+        await client.aclose()
+        raise HTTPException(status_code=504, detail="Upstream asset timeout")
+    except httpx.HTTPError as e:
+        await client.aclose()
+        raise HTTPException(status_code=502, detail=f"Upstream asset error: {e}")
+
+    if response.status_code >= 400:
+        await response.aclose()
+        await client.aclose()
+        raise HTTPException(
+            status_code=502,
+            detail=f"Upstream asset returned HTTP {response.status_code}",
+        )
+
+    headers_out = _filter_headers(response.headers)
+
+    async def _stream() -> AsyncGenerator[bytes, None]:
+        try:
+            async for chunk in response.aiter_bytes(chunk_size=RELAY_CHUNK_SIZE):
+                yield chunk
+        finally:
+            await response.aclose()
+            await client.aclose()
+
+    return _stream(), headers_out, response.status_code
--- a/stacks/f1-stream/files/backend/extractors/init.py
+++ b/stacks/f1-stream/files/backend/extractors/init.py
@ -12,12 +12,20 @@ Example:
 """

 from backend.extractors.aceztrims import AceztrimsExtractor
+from backend.extractors.chrome_browser import ChromeBrowserExtractor
+from backend.extractors.curated import CuratedExtractor
+from backend.extractors.dd12 import DD12Extractor
+from backend.extractors.stremio import StremioAddonExtractor
+from backend.extractors.subreddit import SubredditExtractor
 from backend.extractors.daddylive import DaddyLiveExtractor
-from backend.extractors.demo import DemoExtractor
+from backend.extractors.discord_source import DiscordExtractor
 from backend.extractors.models import ExtractedStream
+from backend.extractors.pitsport import PitsportExtractor
+from backend.extractors.ppv import PPVExtractor
 from backend.extractors.registry import ExtractorRegistry
 from backend.extractors.service import ExtractionService
 from backend.extractors.streamed import StreamedExtractor
+from backend.extractors.timstreams import TimStreamsExtractor

 __all__ = [
    "ExtractedStream",
@ -36,10 +44,36 @@ def create_registry() -> ExtractorRegistry:
    registry = ExtractorRegistry()

    # --- Register extractors below ---
-    registry.register(DemoExtractor())
+    # CuratedExtractor previously surfaced two hmembeds 24/7 channels (Sky
+    # Sports F1, DAZN F1) but their JW Player decoder produces an empty
+    # playlist in our environment (error 102630) regardless of headed mode,
+    # IP, or fingerprint we tried. The streams loaded the upstream's ad
+    # overlay but never produced a video element, so they confused users —
+    # disabled until/unless we find a working bypass.
+    # registry.register(CuratedExtractor())
    registry.register(StreamedExtractor())
+    # ChromeBrowserExtractor drives the in-cluster chrome-service via the
+    # CHROME_WS_URL / CHROME_WS_TOKEN env vars to scrape JS-rendered
+    # pages whose m3u8 is computed at runtime.
+    registry.register(ChromeBrowserExtractor())
+    # SubredditExtractor pulls live-stream posts from motorsport subreddits.
+    # Returns embed-type streams; the verifier will visit each via
+    # chrome-service to confirm playability.
+    registry.register(SubredditExtractor())
+    # DD12Extractor scrapes DD12Streams' per-channel pages for the inline
+    # JW Player file URL. The site embeds the m3u8 in HTML so curl-based
+    # parsing is enough — no browser needed.
+    registry.register(DD12Extractor())
+    # StremioAddonExtractor calls Stremio addon HTTP APIs (TvVoo, StremVerse)
+    # which already index Sky F1 / DAZN F1 / Vavoo IPTV channels. No
+    # Stremio client needed — just /stream/<type>/<id>.json calls.
+    registry.register(StremioAddonExtractor())
    registry.register(DaddyLiveExtractor())
    registry.register(AceztrimsExtractor())
+    registry.register(PitsportExtractor())
+    registry.register(PPVExtractor())
+    registry.register(TimStreamsExtractor())
+    registry.register(DiscordExtractor())

    return registry

--- a/stacks/f1-stream/files/backend/extractors/chrome_browser.py
+++ b/stacks/f1-stream/files/backend/extractors/chrome_browser.py
@ -0,0 +1,243 @@
+"""Generic chrome-service-driven extractor.
+
+Drives the in-cluster headed Chromium pool (chrome-service) to load a list
+of stream/aggregator pages, captures any HLS playlist URL the page fetches
+at runtime, and returns one ExtractedStream per discovered playlist.
+
+Unlike the API-based extractors (pitsport/streamed/ppv) this one handles
+sites where the m3u8 is computed by JavaScript at page load time — the
+URL only exists after the page evaluates an obfuscated decoder, fetches a
+token, etc. Curl can't see it; a real browser can.
+
+Add new targets via the `TARGETS` constant below. Each entry is a (label,
+title, page_url) tuple. The extractor visits each URL with a stealthed
+context, waits for the JS to settle, and yields any captured HLS URL.
+"""
+
+import asyncio
+import logging
+import os
+import re
+import urllib.parse
+from dataclasses import dataclass
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+# Best-effort pause between navigation and capture. The decoder usually
+# fires within 5s; 12s gives slow JS time to settle without dragging the
+# extraction round.
+DEFAULT_SETTLE_SECONDS = 12
+
+USER_AGENT = (
+    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
+    "Version/17.4 Safari/605.1.15"
+)
+
+
+@dataclass(frozen=True)
+class _Target:
+    label: str         # site_name (homepage label in the UI)
+    title: str         # human-readable stream title
+    url: str           # page to navigate
+    settle: int = DEFAULT_SETTLE_SECONDS
+
+
+# ---------------------------------------------------------------------------
+# Target list. F1-relevant 24/7 channels and motorsport aggregator pages
+# whose m3u8 is JS-computed. Add freely — each one takes ~12s to scrape.
+# ---------------------------------------------------------------------------
+TARGETS: tuple[_Target, ...] = (
+    # MotoMundo embed pages — the community-curated WordPress site for
+    # MotoGP. Each /e/<id> URL is one of the iframes their "Watch Online"
+    # post lists for the active session (FP/Q/Race). The m3u8 is
+    # JS-computed at load time so a real browser is required to capture
+    # it. Update IDs each weekend to match the current race; subreddit.py
+    # discovers them from the Reddit "[Watch / Download]" thread.
+    _Target(
+        label="MotoMundo",
+        title="MotoGP Live (MotoMundo) — French GP / Le Mans",
+        url="https://motomundo.top/e/9yzn08jk9py4",
+        settle=15,
+    ),
+    _Target(
+        label="MotoMundo",
+        title="MotoGP Live (MotoMundo upns) — French GP / Le Mans",
+        url="https://motomundo.upns.xyz/#kqasde",
+        settle=15,
+    ),
+)
+
+
+# Heuristic to recognise an HLS playlist URL from network capture. Most CDNs
+# use `.m3u8`; some (pushembdz/oe1.ossfeed) disguise the playlist as `.css`
+# under a /out/v… or /hls/ path. Filter out obvious junk (.css for actual
+# stylesheets, .ts segments — we only want the playlist).
+_HLS_URL_RE = re.compile(r"\.m3u8(\?|$)|/out/v[0-9]+/.+\.css(\?|$)|/hls/.+/master\.css(\?|$)")
+_SEGMENT_EXT_RE = re.compile(r"\.(ts|m4s|aac|key)(\?|$)")
+
+
+def _looks_like_hls_playlist(url: str) -> bool:
+    if _SEGMENT_EXT_RE.search(url):
+        return False
+    return bool(_HLS_URL_RE.search(url))
+
+
+def _resolve_chrome_ws() -> str | None:
+    base = os.getenv("CHROME_WS_URL")
+    token = os.getenv("CHROME_WS_TOKEN")
+    if not base or not token:
+        return None
+    return f"{base.rstrip('/')}/{token}"
+
+
+class ChromeBrowserExtractor(BaseExtractor):
+    """Drive chrome-service to capture m3u8 URLs from JS-heavy pages."""
+
+    @property
+    def site_key(self) -> str:
+        return "chrome-browser"
+
+    @property
+    def site_name(self) -> str:
+        return "Chrome Browser"
+
+    async def extract(self) -> list[ExtractedStream]:
+        ws_url = _resolve_chrome_ws()
+        if not ws_url:
+            logger.warning(
+                "[chrome-browser] CHROME_WS_URL/TOKEN not set — extractor disabled"
+            )
+            return []
+
+        try:
+            from playwright.async_api import async_playwright
+        except ImportError:
+            logger.warning("[chrome-browser] playwright not installed — disabled")
+            return []
+
+        # One Playwright instance + one browser connection per extraction
+        # round. Contexts are cheap; the browser is shared.
+        async with async_playwright() as p:
+            try:
+                browser = await p.chromium.connect(ws_url, timeout=15_000)
+            except Exception:
+                logger.exception("[chrome-browser] connect to chrome-service failed")
+                return []
+
+            results: list[ExtractedStream] = []
+            for target in TARGETS:
+                try:
+                    stream = await self._scrape(browser, target)
+                    if stream:
+                        results.append(stream)
+                except Exception:
+                    logger.exception(
+                        "[chrome-browser] failed to scrape %s", target.url
+                    )
+
+            try:
+                await browser.close()
+            except Exception:
+                pass
+
+        logger.info("[chrome-browser] returned %d stream(s)", len(results))
+        return results
+
+    async def _scrape(self, browser, target: _Target) -> ExtractedStream | None:
+        ctx = await browser.new_context(
+            user_agent=USER_AGENT,
+            viewport={"width": 1280, "height": 720},
+            bypass_csp=True,
+        )
+        # Inject the same stealth script the verifier uses so anti-bot
+        # checks don't trip the page before its decoder runs.
+        try:
+            from backend.stealth import STEALTH_JS
+            await ctx.add_init_script(STEALTH_JS)
+        except Exception:
+            pass
+
+        page = await ctx.new_page()
+        captured: list[str] = []
+
+        def on_response(resp):
+            try:
+                if _looks_like_hls_playlist(resp.url):
+                    captured.append(resp.url)
+            except Exception:
+                pass
+
+        page.on("response", on_response)
+        # Some pages (DD12 variants) load the player in a child iframe;
+        # frame events catch nested navigations.
+        page.on(
+            "framenavigated",
+            lambda fr: captured.append(fr.url) if _looks_like_hls_playlist(fr.url) else None,
+        )
+
+        try:
+            await page.goto(target.url, wait_until="domcontentloaded", timeout=20_000)
+        except Exception as e:
+            logger.debug("[chrome-browser] %s goto failed: %s", target.url, e)
+            await ctx.close()
+            return None
+
+        # Let the page's JS settle.
+        await asyncio.sleep(target.settle)
+
+        # Also probe child iframes — `pushembdz`, `pooembed`, `embedsports`
+        # all live behind one. Collect any HLS URL the iframes loaded.
+        for fr in page.frames:
+            if fr is page.main_frame:
+                continue
+            try:
+                # JW Player and Clappr both expose the playing source via
+                # a <video>/`<source>` element after setup completes.
+                sources = await fr.evaluate(
+                    "() => Array.from(document.querySelectorAll('video, source')).map(e => e.currentSrc || e.src || '').filter(s => s.includes('.m3u8') || s.includes('.css'))"
+                )
+                for s in sources:
+                    if _looks_like_hls_playlist(s):
+                        captured.append(s)
+            except Exception:
+                pass
+
+        await ctx.close()
+
+        # Pick the first plausible URL (any subsequent are usually variant
+        # playlists referenced from the master). Prefer URLs that look like
+        # full master playlists.
+        unique = list(dict.fromkeys(captured))
+        if not unique:
+            logger.debug("[chrome-browser] %s yielded no HLS URL", target.url)
+            return None
+
+        # Prefer URLs that look like a master/index playlist over variant
+        # playlists when both are captured.
+        master = next(
+            (u for u in unique if "master" in u.lower() or "index" in u.lower()),
+            unique[0],
+        )
+        # Strip query strings on URLs that include short-lived tokens —
+        # the verifier and frontend re-resolve them per request.
+        # (Some CDNs require the query though; only strip when obvious.)
+        m3u8 = master
+        # Decode URL-encoded characters so the proxy gets a clean URL.
+        m3u8 = urllib.parse.unquote(m3u8)
+
+        logger.info(
+            "[chrome-browser] %s -> %s",
+            target.url, m3u8[:120],
+        )
+        return ExtractedStream(
+            url=m3u8,
+            site_key=self.site_key,
+            site_name=target.label,
+            quality="",
+            title=target.title,
+            stream_type="m3u8",
+        )
--- a/stacks/f1-stream/files/backend/extractors/curated.py
+++ b/stacks/f1-stream/files/backend/extractors/curated.py
@ -0,0 +1,61 @@
+"""Curated extractor — known-good 24/7 F1 channels via direct embed URLs.
+
+Returns a small, hand-picked list of embed URLs that are reliable enough to
+be served as fallback "always-on" streams when the dynamic extractors find
+nothing (e.g. between race weekends, when API providers are down).
+
+These are direct embed URLs. The frontend routes them through /embed so the
+iframe-stripping proxy bypasses any frame-buster JS in the upstream player.
+"""
+
+import logging
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+
+# Curated list. Each entry is a known direct embed URL. These were sourced
+# from the timstreams.py ALWAYS_INCLUDE_HASHES list (Sky Sports F1, DAZN F1)
+# and are documented as 24/7 channels that play F1 content year-round.
+_CURATED_STREAMS = [
+    {
+        "url": "https://hmembeds.one/embed/888520f36cd94c5da4c71fddc1a5fc9b",
+        "title": "Sky Sports F1 (24/7)",
+        "quality": "HD",
+    },
+    {
+        "url": "https://hmembeds.one/embed/fc3a54634d0867b0c02ee3223292e7c6",
+        "title": "DAZN F1 (24/7)",
+        "quality": "HD",
+    },
+]
+
+
+class CuratedExtractor(BaseExtractor):
+    """Returns curated known-good 24/7 F1 channel embed URLs."""
+
+    @property
+    def site_key(self) -> str:
+        return "curated"
+
+    @property
+    def site_name(self) -> str:
+        return "Curated 24/7 Channels"
+
+    async def extract(self) -> list[ExtractedStream]:
+        streams = [
+            ExtractedStream(
+                url=entry["url"],
+                site_key=self.site_key,
+                site_name=self.site_name,
+                quality=entry["quality"],
+                title=entry["title"],
+                stream_type="embed",
+                embed_url=entry["url"],
+            )
+            for entry in _CURATED_STREAMS
+        ]
+        logger.info("[curated] Returning %d curated stream(s)", len(streams))
+        return streams
--- a/stacks/f1-stream/files/backend/extractors/dd12.py
+++ b/stacks/f1-stream/files/backend/extractors/dd12.py
@ -0,0 +1,111 @@
+"""DD12Streams extractor — scrapes inline m3u8 URLs from per-channel pages.
+
+Each DD12 sport page (`/nas`, `/f1`, `/sky`, etc.) renders an iframe to
+`/<channel>c1` which 302-redirects to `/new-<channel>/jwplayer`. That
+page contains a JW Player setup with the m3u8 URL hard-coded inline:
+
+    playerInstance.setup({
+      file: "https://...b-cdn.net/.../master.m3u8",
+      ...
+    });
+
+The JW Player runtime fails in our cluster (same fingerprint trap as
+hmembeds), but we don't need it — the file URL is in the HTML and any
+browser with H.264 codecs can play it directly via hls.js.
+
+Channel discovery: probe a known list. New ones can be added by checking
+DD12's own homepage / nav.
+"""
+
+import logging
+import re
+
+import httpx
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+BASE = "https://dd12streams.com"
+USER_AGENT = (
+    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
+    "Version/17.4 Safari/605.1.15"
+)
+
+# (path, channel_label, title). Add as DD12 surfaces new channels.
+CHANNELS = (
+    ("nas", "DD12Streams", "NASCAR Cup Series (24/7) — DD12"),
+)
+
+_FILE_URL_RE = re.compile(r"""file\s*:\s*["']([^"']+\.m3u8[^"']*)["']""")
+
+
+class DD12Extractor(BaseExtractor):
+    @property
+    def site_key(self) -> str:
+        return "dd12"
+
+    @property
+    def site_name(self) -> str:
+        return "DD12Streams"
+
+    async def extract(self) -> list[ExtractedStream]:
+        results: list[ExtractedStream] = []
+        async with httpx.AsyncClient(
+            timeout=15.0,
+            follow_redirects=True,
+            headers={"User-Agent": USER_AGENT},
+        ) as client:
+            for path, label, title in CHANNELS:
+                try:
+                    page_url = f"{BASE}/{path}"
+                    resp = await client.get(page_url)
+                    if resp.status_code != 200:
+                        continue
+                    iframe_path = self._extract_iframe(resp.text)
+                    if not iframe_path:
+                        continue
+                    iframe_url = (
+                        iframe_path
+                        if iframe_path.startswith("http")
+                        else f"{BASE}{iframe_path}"
+                    )
+                    iframe_resp = await client.get(
+                        iframe_url, headers={"Referer": page_url}
+                    )
+                    if iframe_resp.status_code != 200:
+                        continue
+                    m3u8 = self._find_m3u8(iframe_resp.text)
+                    if not m3u8:
+                        continue
+                    results.append(
+                        ExtractedStream(
+                            url=m3u8,
+                            site_key=self.site_key,
+                            site_name=label,
+                            quality="",
+                            title=title,
+                            stream_type="m3u8",
+                        )
+                    )
+                except Exception:
+                    logger.debug(
+                        "[dd12] /%s extraction failed", path, exc_info=True
+                    )
+        logger.info("[dd12] Extracted %d stream(s)", len(results))
+        return results
+
+    @staticmethod
+    def _extract_iframe(html: str) -> str | None:
+        m = re.search(
+            r'<iframe[^>]+id=["\']vplayer["\'][^>]+src=["\']([^"\']+)["\']',
+            html,
+        )
+        return m.group(1) if m else None
+
+    @staticmethod
+    def _find_m3u8(html: str) -> str | None:
+        m = _FILE_URL_RE.search(html)
+        return m.group(1) if m else None
--- a/stacks/f1-stream/files/backend/extractors/discord_source.py
+++ b/stacks/f1-stream/files/backend/extractors/discord_source.py
@ -0,0 +1,203 @@
+"""Discord extractor - monitors Discord channels for F1 stream links.
+
+Reads recent messages from configured Discord channels using a user token,
+extracts URLs that look like stream links, and returns them as embed streams.
+"""
+
+import logging
+import os
+import re
+
+import httpx
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+DISCORD_API = "https://discord.com/api/v9"
+DISCORD_TOKEN = os.getenv("DISCORD_TOKEN", "")
+# Comma-separated channel IDs to monitor
+DISCORD_CHANNELS = os.getenv("DISCORD_CHANNELS", "").split(",")
+# How many messages to fetch per channel
+MESSAGE_LIMIT = 50
+
+USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+
+# URL pattern to match stream links (exclude Discord CDN, images, etc.)
+URL_PATTERN = re.compile(r"https?://[^\s<>\)\]\"']+", re.IGNORECASE)
+
+# Domains that publish news/articles, not playable streams. Discord users share
+# these links during race weekends; they are NOT streams and pollute the list.
+EXCLUDED_DOMAINS = {
+    "discord.com", "discord.gg", "cdn.discordapp.com",
+    "tenor.com", "giphy.com", "imgur.com",
+    "youtube.com", "youtu.be", "twitter.com", "x.com",
+    "reddit.com", "instagram.com", "tiktok.com",
+    "fmhy.net", "github.com", "freemotorsports.com",
+    # News / official sites — never playable embeds
+    "formula1.com", "fia.com", "skysports.com", "motorsport.com",
+    "driverdb.com", "autosport.com", "the-race.com", "racefans.net",
+    "wikipedia.org", "fantasy.formula1.com",
+}
+
+# A URL is treated as a candidate stream embed only if its path looks like
+# a *direct* player/embed page — `/embed/{id}`, `/player/{...}`, `*.m3u8`,
+# `*.php` (legacy iframe1.php style). Aggregator landing pages
+# (`/event/...`, `/watch?session=...`, etc.) are rejected because they
+# show a list of links instead of playing automatically — those produce
+# verifier-passing UI without actual playback.
+_PATH_KEYWORDS = (
+    "/embed/", "/player/", ".m3u8", ".php",
+)
+
+
+def _is_stream_url(url: str) -> bool:
+    """Heuristic: does this URL look like an actual stream/embed/player link?
+
+    Discord users share lots of news links during race weekends. The old
+    filter only blocked specific domains and let everything else through,
+    which produced a stream list dominated by formula1.com news articles.
+    The new filter is positive-match: a URL must contain at least one
+    stream-shaped path keyword to be included.
+    """
+    from urllib.parse import urlparse
+
+    try:
+        parsed = urlparse(url)
+        domain = parsed.netloc.lower()
+        path = parsed.path.lower()
+    except Exception:
+        return False
+
+    if not domain:
+        return False
+
+    for excluded in EXCLUDED_DOMAINS:
+        if excluded in domain:
+            return False
+
+    if any(path.endswith(ext) for ext in (".png", ".jpg", ".jpeg", ".gif", ".webp", ".mp4", ".webm", ".svg", ".css", ".js")):
+        return False
+
+    full = path + ("?" + parsed.query if parsed.query else "")
+    if not any(kw in full for kw in _PATH_KEYWORDS):
+        return False
+
+    return True
+
+
+class DiscordExtractor(BaseExtractor):
+    """Extracts stream links from Discord channel messages.
+
+    Monitors configured Discord channels for URLs shared by users,
+    filters to likely stream links, and returns them as embed streams.
+    """
+
+    @property
+    def site_key(self) -> str:
+        return "discord"
+
+    @property
+    def site_name(self) -> str:
+        return "Discord Community"
+
+    async def extract(self) -> list[ExtractedStream]:
+        """Fetch recent messages from Discord channels and extract URLs."""
+        if not DISCORD_TOKEN:
+            logger.info("[discord] No DISCORD_TOKEN set, skipping")
+            return []
+
+        channels = [c.strip() for c in DISCORD_CHANNELS if c.strip()]
+        if not channels:
+            logger.info("[discord] No DISCORD_CHANNELS configured, skipping")
+            return []
+
+        streams: list[ExtractedStream] = []
+        seen_urls: set[str] = set()
+
+        try:
+            async with httpx.AsyncClient(
+                timeout=15.0,
+                follow_redirects=True,
+                headers={
+                    "Authorization": DISCORD_TOKEN,
+                    "User-Agent": USER_AGENT,
+                },
+            ) as client:
+                for channel_id in channels:
+                    try:
+                        channel_streams = await self._fetch_channel(
+                            client, channel_id, seen_urls
+                        )
+                        streams.extend(channel_streams)
+                    except Exception:
+                        logger.debug(
+                            "[discord] Failed to fetch channel %s",
+                            channel_id,
+                            exc_info=True,
+                        )
+        except Exception:
+            logger.exception("[discord] Failed to connect to Discord API")
+
+        logger.info("[discord] Extracted %d stream(s) from %d channel(s)", len(streams), len(channels))
+        return streams
+
+    async def _fetch_channel(
+        self,
+        client: httpx.AsyncClient,
+        channel_id: str,
+        seen_urls: set[str],
+    ) -> list[ExtractedStream]:
+        """Fetch messages from a single channel and extract stream URLs."""
+        resp = await client.get(
+            f"{DISCORD_API}/channels/{channel_id}/messages",
+            params={"limit": MESSAGE_LIMIT},
+        )
+        if resp.status_code != 200:
+            logger.warning(
+                "[discord] Channel %s returned HTTP %d", channel_id, resp.status_code
+            )
+            return []
+
+        messages = resp.json()
+        if not isinstance(messages, list):
+            return []
+
+        streams: list[ExtractedStream] = []
+
+        for msg in messages:
+            content = msg.get("content", "")
+            author = msg.get("author", {}).get("username", "unknown")
+
+            # Extract URLs from message content
+            urls = URL_PATTERN.findall(content)
+
+            # Also check embeds
+            for embed in msg.get("embeds", []):
+                if embed.get("url"):
+                    urls.append(embed["url"])
+
+            for url in urls:
+                # Clean trailing punctuation
+                url = url.rstrip(".,;:!?)")
+
+                if url in seen_urls:
+                    continue
+                if not _is_stream_url(url):
+                    continue
+
+                seen_urls.add(url)
+                streams.append(
+                    ExtractedStream(
+                        url=url,
+                        site_key=self.site_key,
+                        site_name=self.site_name,
+                        quality="",
+                        title=f"Shared by {author}",
+                        stream_type="embed",
+                        embed_url=url,
+                    )
+                )
+
+        return streams
--- a/stacks/f1-stream/files/backend/extractors/pitsport.py
+++ b/stacks/f1-stream/files/backend/extractors/pitsport.py
@ -0,0 +1,544 @@
+"""Pitsport.xyz extractor - fetches F1 streams from the Next.js RSC payload.
+
+Architecture:
+- Main page (pitsport.xyz) has a "Live Now" section with event cards containing
+  category, title, time, imageUrl props and /watch/{UUID} links.
+- Schedule page (pitsport.xyz/schedule) lists all events grouped by category
+  (h2 headings) with /watch/{UUID} links and event titles.
+- Watch pages (/watch/{UUID}) embed iframes from pushembdz.store/embed/{EMBED_UUID}.
+- Embed pages contain an RSC payload with a stream config: {title, link, method}.
+- When method is "player" or "hls", the link field points to a serveplay.site
+  m3u8 playlist. Otherwise we return the embed URL for iframe playback.
+"""
+
+import logging
+import re
+from dataclasses import dataclass
+
+import httpx
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+PITSPORT_BASE = "https://pitsport.xyz"
+EMBED_BASE = "https://pushembdz.store"
+USER_AGENT = (
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+    "AppleWebKit/537.36 (KHTML, like Gecko) "
+    "Chrome/120.0.0.0 Safari/537.36"
+)
+
+# Categories to include (case-insensitive match). Broadened beyond F1
+# to also surface MotoGP and adjacent motorsports — keeps the f1-stream
+# UI useful between race weekends and during the off-season.
+MOTORSPORT_CATEGORIES = {
+    "formula 1", "formula 2", "formula 3",
+    "motogp", "moto gp", "moto2", "moto3", "motoe",
+    "world rally championship", "wrc",
+    "world endurance championship", "wec",
+    "indycar series", "indycar", "indynxt",
+    "nascar cup series", "nascar truck series", "nascar o'reilly auto parts series",
+    "nascar xfinity series", "nascar",
+}
+
+# Title keywords that are strong positives even when the category text
+# is missing (live-now cards sometimes elide it).
+MOTORSPORT_KEYWORDS = {
+    "formula 1", "formula one", "f1",
+    "motogp", "moto gp", "moto2", "moto3",
+    "rally", "wrc",
+    "indycar", "indy car",
+    "nascar",
+    "le mans", "lemans", "wec", "endurance",
+}
+GP_KEYWORD = "grand prix"
+
+
+@dataclass
+class _PitsportEvent:
+    """An event discovered from the Pitsport site."""
+
+    category: str
+    title: str
+    watch_uuid: str
+
+
+def _is_motorsport_category(category: str) -> bool:
+    """Check if a category string matches an included motorsport series."""
+    return category.strip().lower() in MOTORSPORT_CATEGORIES
+
+
+def _is_motorsport_event(category: str, title: str) -> bool:
+    """Accept anything pitsport.xyz lists. Pitsport curates sports
+    broadcasts (WRC, MotoGP, IndyCar, NASCAR, Premier League Darts,
+    Premier League football, etc.) — the site's own selection is the
+    filter we want. Empty/garbage events still get filtered downstream
+    when `_resolve_event_streams` produces no playable URL."""
+    return bool(category or title)
+
+
+# Aliases kept so older call-sites stay compiling. Both now point at the
+# broadened motorsport filter.
+_is_f1_category = _is_motorsport_category
+_is_f1_event = _is_motorsport_event
+
+
+def _parse_live_events(html: str) -> list[_PitsportEvent]:
+    """Parse live events from the main page RSC payload.
+
+    The main page contains event cards with props:
+        category, title, time, imageUrl
+    wrapped in <a href="/watch/{UUID}"> links.
+    """
+    events: list[_PitsportEvent] = []
+
+    # Match event cards in the RSC payload - they appear as JSON-like structures
+    # Pattern: href="/watch/UUID" ... category":"...", "title":"..."
+    # In the RSC payload, the data is in the format:
+    #   ["$","$L2","/watch/UUID",{"href":"/watch/UUID","children":["$","$L10",null,
+    #     {"category":"...","title":"...","time":...,"imageUrl":"..."}]}]
+    pattern = re.compile(
+        r'"href":"(/watch/([0-9a-f-]{36}))"[^}]*?"category":"([^"]+)","title":"([^"]+)"',
+    )
+    for match in pattern.finditer(html):
+        _, uuid, category, title = match.groups()
+        events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
+
+    return events
+
+
+def _parse_schedule_events(html: str) -> list[_PitsportEvent]:
+    """Parse events from the schedule page.
+
+    The schedule page groups events under category headers (h2 elements).
+    In the rendered HTML:
+        <h2 ...>Formula 1</h2>
+        <div ...>
+            <a href="/watch/UUID">...</a>
+            ...
+        </div>
+
+    In the RSC payload, similar structure with section divs containing
+    a category h2 and child event links with titles.
+    """
+    events: list[_PitsportEvent] = []
+
+    # Strategy 1: Parse from rendered HTML
+    # Find category sections: >CategoryName</h2> followed by watch links
+    # Split HTML at each category header
+    section_pattern = re.compile(
+        r'>([^<]+)</h2>\s*<div[^>]*class="flex flex-wrap gap-6">(.*?)(?=</div>\s*</div>\s*(?:<div|</div>|$))',
+        re.DOTALL,
+    )
+    for section_match in section_pattern.finditer(html):
+        category = section_match.group(1).strip()
+        section_html = section_match.group(2)
+
+        # Find all watch links in this section
+        link_pattern = re.compile(
+            r'href="/watch/([0-9a-f-]{36})".*?<h1[^>]*>([^<]+)</h1>',
+            re.DOTALL,
+        )
+        for link_match in link_pattern.finditer(section_html):
+            uuid = link_match.group(1)
+            title = link_match.group(2).strip()
+            events.append(
+                _PitsportEvent(category=category, title=title, watch_uuid=uuid)
+            )
+
+    # Strategy 2: Parse from RSC payload if rendered HTML didn't yield results
+    # The RSC payload has patterns like:
+    #   "children":"Formula 1"}] ... "/watch/UUID" ... "title":"EventTitle"
+    if not events:
+        events = _parse_schedule_rsc(html)
+
+    return events
+
+
+def _parse_schedule_rsc(html: str) -> list[_PitsportEvent]:
+    """Parse events from schedule page RSC payload as fallback.
+
+    Extracts category section divs from the RSC JSON structure.
+    """
+    events: list[_PitsportEvent] = []
+
+    # Find the RSC payload chunks
+    rsc_chunks = re.findall(
+        r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', html, re.DOTALL
+    )
+    if not rsc_chunks:
+        return events
+
+    # Concatenate and unescape
+    full_payload = ""
+    for chunk in rsc_chunks:
+        try:
+            full_payload += chunk.encode().decode("unicode_escape")
+        except Exception:
+            full_payload += chunk
+
+    # Find category sections in the RSC data
+    # Pattern: "children":"CategoryName"}],["$","div",...watch links...
+    # Each section div contains an h2 with the category name and watch links
+    cat_pattern = re.compile(
+        r'border-gray-700 pb-2","children":"([^"]+)"\}.*?'
+        r'(?=border-gray-700 pb-2","children"|$)',
+        re.DOTALL,
+    )
+    for cat_match in cat_pattern.finditer(full_payload):
+        category = cat_match.group(1)
+        section_text = cat_match.group(0)
+
+        # Find watch UUIDs and titles in this section
+        # Pattern: "/watch/UUID" ... "title":"EventTitle"
+        event_pattern = re.compile(
+            r'/watch/([0-9a-f-]{36}).*?"title":"([^"]+)"',
+        )
+        for ev_match in event_pattern.finditer(section_text):
+            uuid = ev_match.group(1)
+            title = ev_match.group(2)
+            events.append(
+                _PitsportEvent(category=category, title=title, watch_uuid=uuid)
+            )
+
+    return events
+
+
+def _parse_embed_uuids(html: str) -> list[str]:
+    """Extract embed UUIDs from a watch page.
+
+    Watch pages contain iframes like:
+        <iframe src="https://pushembdz.store/embed/{EMBED_UUID}" ...>
+
+    And in the RSC payload:
+        "iframe":"https://pushembdz.store/embed/{EMBED_UUID}"
+    """
+    uuids: list[str] = []
+
+    # From rendered HTML
+    iframe_pattern = re.compile(
+        r'pushembdz\.store/embed/([0-9a-f-]{36})',
+    )
+    for match in iframe_pattern.finditer(html):
+        uuid = match.group(1)
+        if uuid not in uuids:
+            uuids.append(uuid)
+
+    return uuids
+
+
+@dataclass
+class _StreamConfig:
+    """Stream configuration extracted from an embed page."""
+
+    title: str
+    link: str
+    method: str
+
+
+def _parse_stream_config(html: str) -> _StreamConfig | None:
+    """Extract stream config from an embed page RSC payload.
+
+    The embed page now uses a `safeStream` payload that elides the link:
+        4:["$","$Ld",null,{"safeStream":{"title":"Rally TV","method":"jwp"},
+           "error":null,"slug":"..."}]
+    The actual stream URL is fetched at runtime via
+    pushembdz.store/api/stream/<slug>. Older payloads used "stream" with
+    inline title+link+method — kept as fallback.
+    """
+    # Current format: safeStream with title + method only (link via API).
+    pattern_safe = re.compile(
+        r'\\?"safeStream\\?"\s*:\s*\{'
+        r'\\?"title\\?"\s*:\s*\\?"([^"\\]+)\\?"\s*,\s*'
+        r'\\?"method\\?"\s*:\s*\\?"([^"\\]+)\\?"',
+    )
+    match = pattern_safe.search(html)
+    if match:
+        return _StreamConfig(
+            title=match.group(1),
+            link="",  # filled in by the caller via the api/stream endpoint
+            method=match.group(2),
+        )
+
+    # Legacy: escaped RSC payload with inline link.
+    pattern = re.compile(
+        r'"stream":\{["\']?\\?"title\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
+        r'["\']?\\?"link\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
+        r'["\']?\\?"method\\?"["\']?:["\']?\\?"([^"\\]+)\\?"',
+    )
+    match = pattern.search(html)
+    if match:
+        return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
+
+    pattern2 = re.compile(
+        r'\\?"stream\\?":\{\\?"title\\?":\\?"([^\\]+)\\?",'
+        r'\\?"link\\?":\\?"([^\\]+)\\?",'
+        r'\\?"method\\?":\\?"([^\\]+)\\?"',
+    )
+    match = pattern2.search(html)
+    if match:
+        return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
+
+    pattern3 = re.compile(
+        r'"stream"\s*:\s*\{\s*"title"\s*:\s*"([^"]+)"\s*,'
+        r'\s*"link"\s*:\s*"([^"]+)"\s*,'
+        r'\s*"method"\s*:\s*"([^"]+)"',
+    )
+    match = pattern3.search(html)
+    if match:
+        return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
+
+    return None
+
+
+def _is_m3u8_method(method: str) -> bool:
+    """Check if the stream method indicates a direct HLS stream."""
+    # `jwp` (current pushembdz format) returns an m3u8 from the api/stream
+    # endpoint regardless of player UI; treat it as HLS.
+    return method.lower() in ("player", "hls", "jwp")
+
+
+def _extract_m3u8_url(link: str) -> str:
+    """Convert a serveplay.site player URL to an m3u8 playlist URL.
+
+    Input:  https://dash.serveplay.site/{channel}/index.html
+    Output: https://dash.serveplay.site/{channel}/index.html
+
+    The index.html IS the m3u8 playlist (served with proper content-type
+    when fetched with the correct Referer header).
+    """
+    return link
+
+
+class PitsportExtractor(BaseExtractor):
+    """Extracts F1 streams from Pitsport.xyz.
+
+    Scrapes the Next.js RSC payload from the main page and schedule page
+    to find F1 events, then resolves embed UUIDs to stream configurations.
+    """
+
+    @property
+    def site_key(self) -> str:
+        return "pitsport"
+
+    @property
+    def site_name(self) -> str:
+        return "Pitsport"
+
+    async def extract(self) -> list[ExtractedStream]:
+        """Fetch F1 events and return stream URLs or embed URLs."""
+        streams: list[ExtractedStream] = []
+
+        try:
+            async with httpx.AsyncClient(
+                timeout=20.0,
+                follow_redirects=True,
+                headers={"User-Agent": USER_AGENT},
+            ) as client:
+                # Fetch both pages to get comprehensive event data
+                events = await self._discover_events(client)
+                logger.info(
+                    "[pitsport] Found %d F1 event(s) to process", len(events)
+                )
+
+                # Deduplicate by watch UUID
+                seen_uuids: set[str] = set()
+                unique_events: list[_PitsportEvent] = []
+                for ev in events:
+                    if ev.watch_uuid not in seen_uuids:
+                        seen_uuids.add(ev.watch_uuid)
+                        unique_events.append(ev)
+
+                # For each event, resolve streams
+                for event in unique_events:
+                    event_streams = await self._resolve_event_streams(
+                        client, event
+                    )
+                    streams.extend(event_streams)
+
+        except Exception:
+            logger.exception("[pitsport] Failed to extract streams")
+
+        logger.info("[pitsport] Extracted %d stream(s)", len(streams))
+        return streams
+
+    async def _discover_events(
+        self, client: httpx.AsyncClient
+    ) -> list[_PitsportEvent]:
+        """Discover F1 events from both main page and schedule page."""
+        all_events: list[_PitsportEvent] = []
+
+        # Fetch main page for live events
+        try:
+            resp = await client.get(PITSPORT_BASE)
+            if resp.status_code == 200:
+                live_events = _parse_live_events(resp.text)
+                logger.info(
+                    "[pitsport] Main page: %d live event(s)", len(live_events)
+                )
+                for ev in live_events:
+                    if _is_f1_event(ev.category, ev.title):
+                        all_events.append(ev)
+            else:
+                logger.warning(
+                    "[pitsport] Main page returned HTTP %d", resp.status_code
+                )
+        except Exception:
+            logger.exception("[pitsport] Failed to fetch main page")
+
+        # Fetch schedule page for upcoming events
+        try:
+            resp = await client.get(f"{PITSPORT_BASE}/schedule")
+            if resp.status_code == 200:
+                schedule_events = _parse_schedule_events(resp.text)
+                logger.info(
+                    "[pitsport] Schedule page: %d total event(s)",
+                    len(schedule_events),
+                )
+                for ev in schedule_events:
+                    if _is_f1_event(ev.category, ev.title):
+                        all_events.append(ev)
+            else:
+                logger.warning(
+                    "[pitsport] Schedule page returned HTTP %d",
+                    resp.status_code,
+                )
+        except Exception:
+            logger.exception("[pitsport] Failed to fetch schedule page")
+
+        return all_events
+
+    async def _resolve_event_streams(
+        self, client: httpx.AsyncClient, event: _PitsportEvent
+    ) -> list[ExtractedStream]:
+        """Resolve an event's watch page to actual stream URLs."""
+        streams: list[ExtractedStream] = []
+
+        try:
+            # Fetch the watch page to get embed UUIDs
+            watch_url = f"{PITSPORT_BASE}/watch/{event.watch_uuid}"
+            resp = await client.get(watch_url)
+            if resp.status_code != 200:
+                logger.debug(
+                    "[pitsport] Watch page %s returned HTTP %d",
+                    event.watch_uuid,
+                    resp.status_code,
+                )
+                return []
+
+            embed_uuids = _parse_embed_uuids(resp.text)
+            if not embed_uuids:
+                logger.debug(
+                    "[pitsport] No embed UUIDs found for %s", event.watch_uuid
+                )
+                return []
+
+            logger.debug(
+                "[pitsport] Event '%s' has %d embed(s)",
+                event.title,
+                len(embed_uuids),
+            )
+
+            # Resolve each embed to a stream config
+            for i, embed_uuid in enumerate(embed_uuids):
+                stream = await self._resolve_embed(
+                    client, embed_uuid, event, stream_num=i + 1
+                )
+                if stream:
+                    streams.append(stream)
+
+        except Exception:
+            logger.debug(
+                "[pitsport] Failed to resolve event %s",
+                event.watch_uuid,
+                exc_info=True,
+            )
+
+        return streams
+
+    async def _resolve_embed(
+        self,
+        client: httpx.AsyncClient,
+        embed_uuid: str,
+        event: _PitsportEvent,
+        stream_num: int,
+    ) -> ExtractedStream | None:
+        """Resolve an embed UUID to a stream configuration."""
+        try:
+            embed_url = f"{EMBED_BASE}/embed/{embed_uuid}"
+            resp = await client.get(embed_url)
+            if resp.status_code != 200:
+                logger.debug(
+                    "[pitsport] Embed page %s returned HTTP %d",
+                    embed_uuid,
+                    resp.status_code,
+                )
+                return None
+
+            config = _parse_stream_config(resp.text)
+            if not config:
+                logger.debug(
+                    "[pitsport] No stream config found in embed %s",
+                    embed_uuid,
+                )
+                return None
+
+            # Build the stream title
+            stream_title = f"{event.category} - {event.title}"
+            if config.title:
+                stream_title += f" ({config.title})"
+            if stream_num > 1:
+                stream_title += f" #{stream_num}"
+
+            # `safeStream` payload elides the link — fetch it from the
+            # pushembdz.store/api/stream/<slug> endpoint. Older `stream`
+            # payloads provided the link inline.
+            link = config.link
+            if not link and _is_m3u8_method(config.method):
+                api_url = f"{EMBED_BASE}/api/stream/{embed_uuid}"
+                try:
+                    api_resp = await client.get(
+                        api_url,
+                        headers={"Referer": embed_url, "Accept": "application/json"},
+                    )
+                    if api_resp.status_code == 200:
+                        link = (api_resp.json() or {}).get("link", "")
+                except Exception:
+                    logger.debug(
+                        "[pitsport] api/stream lookup failed for %s",
+                        embed_uuid,
+                        exc_info=True,
+                    )
+
+            # Treat any HLS-ish URL (m3u8, or pushembdz's .css disguise) as m3u8.
+            looks_hls = link and (".m3u8" in link or link.endswith(".css") or "serveplay.site" in link)
+            if _is_m3u8_method(config.method) and looks_hls:
+                return ExtractedStream(
+                    url=link,
+                    site_key=self.site_key,
+                    site_name=self.site_name,
+                    quality="",
+                    title=stream_title,
+                    stream_type="m3u8",
+                )
+            else:
+                # Iframe embed fallback
+                return ExtractedStream(
+                    url=embed_url,
+                    site_key=self.site_key,
+                    site_name=self.site_name,
+                    quality="",
+                    title=stream_title,
+                    stream_type="embed",
+                    embed_url=embed_url,
+                )
+
+        except Exception:
+            logger.debug(
+                "[pitsport] Failed to resolve embed %s",
+                embed_uuid,
+                exc_info=True,
+            )
+            return None
--- a/stacks/f1-stream/files/backend/extractors/ppv.py
+++ b/stacks/f1-stream/files/backend/extractors/ppv.py
@ -0,0 +1,270 @@
+"""PPV.to extractor - fetches F1 streams via the public PPV API.
+
+Returns embed URLs (pooembed.eu) for iframe playback.
+The API at api.ppv.to/api/streams requires no authentication.
+Falls back to api.ppv.st if the primary API is unreachable.
+"""
+
+import logging
+
+import httpx
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+PRIMARY_API = "https://api.ppv.to/api/streams"
+FALLBACK_API = "https://api.ppv.st/api/streams"
+EMBED_BASE = "https://pooembed.eu/embed"
+
+USER_AGENT = (
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+    "AppleWebKit/537.36 (KHTML, like Gecko) "
+    "Chrome/120.0.0.0 Safari/537.36"
+)
+
+# Category name for motorsport on PPV.to
+MOTORSPORT_CATEGORY = "motorsports"
+
+# Only include events matching these keywords (case-insensitive)
+F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1"}
+# Grand Prix is shared with MotoGP/IndyCar — only match if no other series keywords
+GP_KEYWORD = "grand prix"
+NON_F1_KEYWORDS = {
+    "motogp", "moto gp", "moto2", "moto3", "motoe",
+    "indycar", "indy car", "firestone", "nascar",
+    "rally", "wrc", "wec", "lemans", "le mans",
+    "superbike", "dtm", "supercars",
+}
+
+
+def _is_f1_stream(name: str, category_name: str = "") -> bool:
+    """Check if a stream is Formula 1 related.
+
+    Checks both the stream name and the category name.
+    A stream qualifies if:
+    - It is in the motorsport category AND matches F1 keywords, OR
+    - It matches F1 keywords regardless of category.
+    """
+    lower_name = name.lower()
+    lower_cat = category_name.lower()
+
+    # Reject if it contains non-F1 motorsport keywords
+    if any(kw in lower_name for kw in NON_F1_KEYWORDS):
+        return False
+
+    # Direct F1 keyword match in the stream name
+    if any(kw in lower_name for kw in F1_KEYWORDS):
+        return True
+
+    # "grand prix" in the name, only if in motorsports category and no non-F1 keywords
+    if GP_KEYWORD in lower_name and MOTORSPORT_CATEGORY in lower_cat:
+        return True
+
+    # If the category is motorsport, also check category-level keywords
+    if MOTORSPORT_CATEGORY in lower_cat and any(kw in lower_cat for kw in F1_KEYWORDS):
+        return True
+
+    return False
+
+
+class PPVExtractor(BaseExtractor):
+    """Extracts embed URLs from PPV.to's public JSON API.
+
+    Uses the endpoint:
+    - GET https://api.ppv.to/api/streams -> all streams grouped by category
+    - Fallback: https://api.ppv.st/api/streams
+
+    Each stream object contains an `iframe` field with the embed URL,
+    or a `uri_name` from which the embed URL can be constructed.
+    """
+
+    @property
+    def site_key(self) -> str:
+        return "ppv"
+
+    @property
+    def site_name(self) -> str:
+        return "PPV.to"
+
+    async def _fetch_streams(self, client: httpx.AsyncClient) -> dict | None:
+        """Try primary and fallback APIs, return parsed JSON or None."""
+        for api_url in (PRIMARY_API, FALLBACK_API):
+            try:
+                resp = await client.get(api_url)
+                if resp.status_code == 200:
+                    data = resp.json()
+                    logger.info("[ppv] Fetched streams from %s", api_url)
+                    return data
+                logger.warning(
+                    "[ppv] %s returned HTTP %d", api_url, resp.status_code
+                )
+            except Exception:
+                logger.debug(
+                    "[ppv] Failed to reach %s", api_url, exc_info=True
+                )
+        return None
+
+    async def extract(self) -> list[ExtractedStream]:
+        """Fetch F1 streams and return embed URLs for iframe playback."""
+        streams: list[ExtractedStream] = []
+
+        try:
+            async with httpx.AsyncClient(
+                timeout=15.0,
+                follow_redirects=True,
+                headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
+            ) as client:
+                data = await self._fetch_streams(client)
+                if data is None:
+                    logger.warning("[ppv] Could not fetch streams from any API")
+                    return []
+
+                # The API returns:
+                # { "streams": [ { "category": "Name", "id": N, "streams": [...] }, ... ] }
+                # Flatten into (category_name, stream_obj) tuples.
+                all_streams = self._normalize_streams(data)
+
+                logger.info(
+                    "[ppv] Found %d total stream(s) across all categories",
+                    len(all_streams),
+                )
+
+                for category_name, stream_obj in all_streams:
+                    name = stream_obj.get("name", "") or stream_obj.get("title", "")
+
+                    if not _is_f1_stream(name, category_name):
+                        continue
+
+                    # Build the embed URL
+                    embed_url = self._get_embed_url(stream_obj)
+                    if not embed_url:
+                        logger.debug("[ppv] No embed URL for stream: %s", name)
+                        continue
+
+                    # Extract quality from tag if present
+                    tag = stream_obj.get("tag", "")
+                    quality = tag if tag else ""
+
+                    # Build descriptive title
+                    title = name
+                    viewers = stream_obj.get("viewers")
+                    if viewers and int(viewers) > 0:
+                        title += f" ({viewers} viewers)"
+
+                    # Check for substreams (multiple quality/language options)
+                    substreams = stream_obj.get("substreams")
+                    if isinstance(substreams, list) and substreams:
+                        for i, sub in enumerate(substreams):
+                            sub_embed = sub.get("iframe", "") or sub.get("embed_url", "")
+                            if not sub_embed:
+                                # Fall back to the parent embed URL
+                                sub_embed = embed_url
+                            sub_name = sub.get("name", "") or sub.get("label", "")
+                            sub_quality = sub.get("tag", "") or sub.get("quality", "") or quality
+                            sub_title = f"{name}"
+                            if sub_name:
+                                sub_title += f" - {sub_name}"
+                            elif i > 0:
+                                sub_title += f" #{i + 1}"
+
+                            streams.append(
+                                ExtractedStream(
+                                    url=sub_embed,
+                                    site_key=self.site_key,
+                                    site_name=self.site_name,
+                                    quality=sub_quality,
+                                    title=sub_title,
+                                    stream_type="embed",
+                                    embed_url=sub_embed,
+                                )
+                            )
+                    else:
+                        # Single stream, no substreams
+                        streams.append(
+                            ExtractedStream(
+                                url=embed_url,
+                                site_key=self.site_key,
+                                site_name=self.site_name,
+                                quality=quality,
+                                title=title,
+                                stream_type="embed",
+                                embed_url=embed_url,
+                            )
+                        )
+
+        except Exception:
+            logger.exception("[ppv] Failed to extract streams")
+
+        logger.info("[ppv] Extracted %d F1 stream(s)", len(streams))
+        return streams
+
+    @staticmethod
+    def _normalize_streams(data: dict | list) -> list[tuple[str, dict]]:
+        """Normalize the API response into a flat list of (category_name, stream_dict) tuples.
+
+        The PPV API returns data in this shape:
+        {
+            "streams": [
+                {
+                    "category": "Motorsports",
+                    "id": 35,
+                    "streams": [ { stream objects... } ]
+                },
+                ...
+            ]
+        }
+
+        Each category group has a "category" string and a nested "streams" list.
+        """
+        result: list[tuple[str, dict]] = []
+
+        # Handle the top-level wrapper
+        if isinstance(data, dict):
+            categories = data.get("streams", [])
+        elif isinstance(data, list):
+            categories = data
+        else:
+            return result
+
+        for category_group in categories:
+            if not isinstance(category_group, dict):
+                continue
+
+            category_name = category_group.get("category", "")
+
+            # The nested streams within this category
+            inner_streams = category_group.get("streams", [])
+            if isinstance(inner_streams, list):
+                for stream_obj in inner_streams:
+                    if isinstance(stream_obj, dict):
+                        # Attach category_name to each stream for filtering
+                        result.append((category_name, stream_obj))
+            elif isinstance(category_group, dict) and "name" in category_group:
+                # Fallback: the item itself is a stream (flat list format)
+                result.append((category_name, category_group))
+
+        return result
+
+    @staticmethod
+    def _get_embed_url(stream: dict) -> str:
+        """Extract or construct the embed URL for a stream."""
+        # Prefer the iframe field directly
+        iframe = stream.get("iframe", "")
+        if iframe:
+            return iframe
+
+        # Construct from uri_name
+        uri_name = stream.get("uri_name", "") or stream.get("uri", "")
+        if uri_name:
+            # Strip leading slash if present
+            uri_name = uri_name.lstrip("/")
+            return f"{EMBED_BASE}/{uri_name}"
+
+        # Last resort: use the stream id
+        stream_id = stream.get("id")
+        if stream_id:
+            return f"{EMBED_BASE}/{stream_id}"
+
+        return ""
--- a/stacks/f1-stream/files/backend/extractors/service.py
+++ b/stacks/f1-stream/files/backend/extractors/service.py
@ -6,6 +6,7 @@ from datetime import datetime, timezone
 from backend.extractors.models import ExtractedStream
 from backend.extractors.registry import ExtractorRegistry
 from backend.health import StreamHealthChecker
+from backend.playback_verifier import PlaybackVerifier

 logger = logging.getLogger(__name__)

@ -29,6 +30,11 @@ class ExtractionService:
        self._last_run: str | None = None
        self._last_run_stream_count: int = 0
        self._health_checker = StreamHealthChecker()
+        self._playback_verifier = PlaybackVerifier()
+
+    async def shutdown(self) -> None:
+        """Release the headless browser instance owned by the verifier."""
+        await self._playback_verifier.shutdown()

    async def run_extraction(self) -> None:
        """Run all extractors, health-check results, and cache them.
@ -43,31 +49,93 @@ class ExtractionService:

        streams = await self._registry.extract_all()

-        # Run health checks on all extracted streams
+        # Dedupe by canonical URL — pitsport surfaces every WRC stage as a
+        # separate event but they all point at the same RallyTV master.m3u8
+        # (and similar for MotoGP weekend sessions). Keep the first
+        # occurrence so the user sees one entry per actual stream.
+        deduped: list[ExtractedStream] = []
+        seen_urls: set[str] = set()
+        for stream in streams:
+            key = (stream.embed_url or "").strip() or (stream.url or "").strip()
+            if not key or key in seen_urls:
+                continue
+            seen_urls.add(key)
+            deduped.append(stream)
+        if len(deduped) < len(streams):
+            logger.info(
+                "Deduped streams: %d -> %d (collapsed %d duplicate URL(s))",
+                len(streams), len(deduped), len(streams) - len(deduped),
+            )
+        streams = deduped
+
+        # Run health checks + headless-browser playback verification.
+        # Both stream types are now verified end-to-end so the user only
+        # ever sees streams that actually play in a browser.
        if streams:
-            # Separate m3u8 streams (need health check) from embed streams (skip)
            m3u8_streams = [s for s in streams if s.stream_type != "embed"]
            embed_streams = [s for s in streams if s.stream_type == "embed"]

-            # Mark embed streams as live (no health check possible for iframes)
-            for stream in embed_streams:
-                stream.is_live = True
-                stream.response_time_ms = 0
-                stream.checked_at = start.isoformat()
-
-            # Health-check only m3u8 streams
+            # m3u8 streams: cheap structural health check (validates manifest,
+            # checks first variant playlist), then a headless-browser test
+            # to confirm hls.js can decode and render frames.
            if m3u8_streams:
                stream_dicts = [s.to_dict() for s in m3u8_streams]
                health_map = await self._health_checker.check_all(stream_dicts)
-
                for stream in m3u8_streams:
                    health = health_map.get(stream.url)
                    if health:
-                        stream.is_live = health.is_live
                        stream.response_time_ms = health.response_time_ms
                        stream.checked_at = health.checked_at
                        if health.bitrate > 0:
                            stream.bitrate = health.bitrate
+                        # tentatively mark live; final word comes from the verifier
+                        stream.is_live = health.is_live
+
+            # Browser verification: applies to both m3u8 (only those that
+            # passed structural health) and embed (always — they have no
+            # other way to verify).
+            verify_items: list[tuple[str, str]] = []
+            for stream in m3u8_streams:
+                if stream.is_live:
+                    verify_items.append((stream.url, "m3u8"))
+            for stream in embed_streams:
+                verify_items.append((stream.embed_url or stream.url, "embed"))
+
+            verdicts = await self._playback_verifier.verify_many(verify_items)
+
+            now_iso = datetime.now(timezone.utc).isoformat()
+            for stream in m3u8_streams:
+                if not stream.is_live:
+                    continue  # already failed health check
+                verdict = verdicts.get(stream.url)
+                if verdict is None:
+                    continue  # verifier disabled or unavailable
+                stream.is_live = verdict.is_playable
+                stream.checked_at = now_iso
+
+            # Curated streams skip the verifier — they are hand-picked
+            # 24/7 channels whose embed pages aggressively detect headless
+            # automation. We can't reliably confirm playback server-side,
+            # but we trust the curator. The user's real browser does NOT
+            # trigger the same anti-bot heuristics (real plugins, real
+            # mouse movements, etc.).
+            CURATED_BYPASS = {"curated"}
+            for stream in embed_streams:
+                stream.checked_at = now_iso
+                if stream.site_key in CURATED_BYPASS:
+                    stream.is_live = True
+                    stream.response_time_ms = 0
+                    continue
+                key = stream.embed_url or stream.url
+                verdict = verdicts.get(key)
+                if verdict is None:
+                    # Verifier unavailable — fall back to "trust extractor".
+                    # This keeps the service usable even without playwright.
+                    stream.is_live = True
+                    stream.response_time_ms = 0
+                else:
+                    stream.is_live = verdict.is_playable
+                    stream.response_time_ms = verdict.elapsed_ms

        # Group streams by site_key and update cache
        new_cache: dict[str, list[ExtractedStream]] = {}
--- a/stacks/f1-stream/files/backend/extractors/streamed.py
+++ b/stacks/f1-stream/files/backend/extractors/streamed.py
@ -9,7 +9,9 @@ from backend.extractors.models import ExtractedStream

 logger = logging.getLogger(__name__)

-BASE_URL = "https://streamed.su"
+# Site renamed from streamed.su → streamed.pk in 2026; the .su domain
+# stopped resolving the API host (only the marketing page is left).
+BASE_URL = "https://streamed.pk"
 USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
--- a/stacks/f1-stream/files/backend/extractors/stremio.py
+++ b/stacks/f1-stream/files/backend/extractors/stremio.py
@ -0,0 +1,161 @@
+"""Stremio-addon-driven extractor.
+
+Stremio addons expose a public HTTP API: each addon has a manifest at
+`<base>/manifest.json` and per-resource endpoints like
+`<base>/stream/<type>/<id>.json` returning `{streams:[{url,name,...}]}`.
+
+This extractor calls a curated set of live-TV addons that surface F1
+and Sky-Sports-class motorsport channels. We treat each returned URL as
+an ExtractedStream and let the playback verifier confirm playability.
+We don't need a Stremio client — we just call the documented HTTP API.
+
+Findings from initial research (2026-05-07):
+- **TvVoo** (`tvvoo.hayd.uk`) — wraps the Vavoo IPTV network, lists
+  Sky Sports F1 (UK + IT + DE), DAZN F1, Movistar F1, Canal+ F1,
+  Viaplay F1. The returned m3u8 URLs are IP-bound at the Vavoo CDN
+  (`*.ngolpdkyoctjcddxshli469r.org/sunshine/...`); they're tokenised
+  to whichever IP fetched the manifest. Currently their SSL certs have
+  expired which fails most clients — the addon framework is right but
+  delivery is degraded today.
+- **StremVerse** (`stremverse.onrender.com`) — returns 11+ streams per
+  catalog id (`stremevent_591`=F1, `stremevent_866`=MotoGP). Mix of
+  DRM-walled DASH, JW-Player-broken-chain JWT, and apar151 HuggingFace
+  proxy URLs. Master playlists parse; variant URLs sometimes return 404
+  if they're meant to be resolved by the addon's player rather than
+  directly.
+
+Adding a new addon = one entry in `_ADDONS`. Each addon's resolver only
+needs the manifest + stream endpoints; the addon does the heavy lifting.
+"""
+
+import asyncio
+import logging
+from dataclasses import dataclass
+from typing import Iterable
+
+import httpx
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+USER_AGENT = (
+    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
+    "Version/17.4 Safari/605.1.15"
+)
+
+
+@dataclass(frozen=True)
+class _Addon:
+    name: str
+    base: str               # e.g. "https://tvvoo.hayd.uk"
+    stream_ids: tuple[tuple[str, str, str], ...]
+    """(stream_type, stream_id, label) per F1/motorsport entry."""
+
+
+# Curated addon list — see module docstring. These IDs are documented in
+# the addons' manifests / channel lists. Update when channel names/IDs
+# rotate.
+_ADDONS: tuple[_Addon, ...] = (
+    _Addon(
+        name="TvVoo",
+        base="https://tvvoo.hayd.uk",
+        stream_ids=(
+            ("tv", "vavoo_SKY%20SPORTS%20F1|group:uk", "Sky Sports F1 UK (Vavoo)"),
+            ("tv", "vavoo_SKY%20SPORTS%20F1%20HD|group:uk", "Sky Sports F1 HD UK (Vavoo)"),
+            ("tv", "vavoo_SKY%20SPORT%20F1|group:it", "Sky Sport F1 IT (Vavoo)"),
+            ("tv", "vavoo_SKY%20SPORT%20F1%20HD|group:de", "Sky Sport F1 DE (Vavoo)"),
+            ("tv", "vavoo_DAZN%20F1|group:es", "DAZN F1 ES (Vavoo)"),
+        ),
+    ),
+    _Addon(
+        name="StremVerse",
+        base="https://stremverse.onrender.com",
+        stream_ids=(
+            ("tv", "stremevent_591", "Formula 1 (StremVerse)"),
+            ("tv", "stremevent_866", "MotoGP (StremVerse)"),
+        ),
+    ),
+)
+
+
+class StremioAddonExtractor(BaseExtractor):
+    """Pull F1 + Sky-class motorsport URLs from public Stremio addons."""
+
+    @property
+    def site_key(self) -> str:
+        return "stremio"
+
+    @property
+    def site_name(self) -> str:
+        return "Stremio Addon"
+
+    async def extract(self) -> list[ExtractedStream]:
+        async with httpx.AsyncClient(
+            timeout=15.0,
+            follow_redirects=True,
+            headers={"User-Agent": USER_AGENT},
+            # Some addons (TvVoo→Vavoo) hand back URLs whose origin certs
+            # are expired; honest-default verify=True is preserved here so
+            # the verifier sees the same TLS errors a browser would.
+        ) as client:
+            tasks = []
+            for addon in _ADDONS:
+                for stype, sid, label in addon.stream_ids:
+                    tasks.append(self._resolve(client, addon, stype, sid, label))
+            results = await asyncio.gather(*tasks, return_exceptions=True)
+
+        streams: list[ExtractedStream] = []
+        for r in results:
+            if isinstance(r, Exception):
+                logger.debug("[stremio] resolve failed: %s", r)
+                continue
+            streams.extend(r)
+
+        logger.info("[stremio] surfaced %d candidate stream URL(s) across %d addon(s)",
+                    len(streams), len(_ADDONS))
+        return streams
+
+    async def _resolve(
+        self, client: httpx.AsyncClient, addon: _Addon,
+        stype: str, sid: str, label: str,
+    ) -> list[ExtractedStream]:
+        url = f"{addon.base}/stream/{stype}/{sid}.json"
+        try:
+            resp = await client.get(url)
+        except Exception as e:
+            logger.debug("[stremio] %s fetch failed: %s", url, e)
+            return []
+        if resp.status_code != 200:
+            logger.debug("[stremio] %s -> HTTP %d", url, resp.status_code)
+            return []
+        try:
+            data = resp.json()
+        except Exception:
+            return []
+
+        out: list[ExtractedStream] = []
+        for idx, s in enumerate(data.get("streams") or []):
+            stream_url = (s.get("url") or "").strip()
+            if not stream_url:
+                continue
+            # Skip DRM-tagged entries — they need Widevine which neither
+            # our verifier nor a clean hls.js path can play.
+            if "DRM" in (s.get("name") or "").upper():
+                continue
+            title = label
+            if idx > 0:
+                title = f"{label} #{idx + 1}"
+            out.append(
+                ExtractedStream(
+                    url=stream_url,
+                    site_key=self.site_key,
+                    site_name=f"{addon.name}",
+                    quality="",
+                    title=title,
+                    stream_type="m3u8",
+                )
+            )
+        return out
--- a/stacks/f1-stream/files/backend/extractors/subreddit.py
+++ b/stacks/f1-stream/files/backend/extractors/subreddit.py
@ -0,0 +1,249 @@
+"""Subreddit extractor — pulls community-curated live-stream URLs from
+the *MotorsportsReplays* subreddit (and a few siblings).
+
+The community follows a stable pattern: a single mod-curated post titled
+`[Watch / Download] <Series> <Year> - <Round> | <Event>` goes up on or
+near each race weekend with a `**Watch Online:**` link in the selftext,
+pointing at an admin-run WordPress site (motomundo.net for MotoGP, the
+F1 equivalent has rotated over the years). That WordPress page hosts
+iframe embeds whose m3u8 is JS-computed at load time — ideal target for
+the chrome-service pipeline downstream.
+
+This extractor:
+- Hits Reddit with a real-browser User-Agent (httpx default UA + cluster
+  IP combo gets HTTP 403'd on r/motogp; a Safari UA does not).
+- Searches for the `[Watch` thread pattern AND scans `/new.json` for
+  any flair set to LIVE.
+- Pulls selftext URLs and returns each candidate as an `embed`-type
+  ExtractedStream. The verifier already drives chrome-service for embed
+  streams, so the m3u8 capture happens there.
+"""
+
+import asyncio
+import logging
+import re
+import urllib.parse
+from typing import NamedTuple
+
+import httpx
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+USER_AGENT = (
+    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
+    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
+    "Version/17.4 Safari/605.1.15"
+)
+
+# Subreddits to scan.
+# - r/motorsportsstreams2 is the active 12.5k-sub successor to the banned
+#   r/motorsportstreams; race-weekend "[F1 STREAM]" posts include
+#   `boxboxbox.pro/stream-1` URLs and similar fresh aggregator links.
+# - r/MotorsportsReplays runs the [Watch / Download] mod-post pattern
+#   linking to motomundo.net (MotoGP) and sister sites.
+# - The rest are low-yield but cost nothing.
+SUBREDDITS: tuple[str, ...] = (
+    "motorsportsstreams2",
+    "MotorsportsReplays",
+    "f1streams",
+    "motorsports",
+    "formula1",
+    "motogp",
+)
+
+# Search queries fired against r/motorsportsstreams2 + r/MotorsportsReplays.
+# The first set captures the [Watch / Download] mod posts; the second set
+# catches race-weekend live discussion threads.
+SEARCH_QUERIES: tuple[str, ...] = (
+    "Watch Download F1 2026",
+    "Watch Download MotoGP 2026",
+    "Watch Online F1 2026",
+    "F1 STREAM live",
+    "Sky Sports F1 live",
+    "Sky F1 stream",
+)
+
+# Hosts we accept as "interesting" stream-page URLs. These are the
+# admin-curated WordPress / aggregator sites the community links to.
+# Anchored to what r/motorsportsstreams2 currently posts (May 2026 sweep).
+_INTERESTING_HOSTS = (
+    # WordPress wrappers / community-run sites
+    "motomundo.net",        # MotoGP — admin-curated WP
+    "motomundo.top",        # MotoMundo embed host
+    "motomundo.upns.xyz",   # MotoMundo embed host (newer)
+    "freemotorsports.com",  # WAC successor curated link list
+    "boxboxbox.pro",        # F1 race-weekend aggregator (community fav)
+    "boxboxbox.live",       # boxboxbox sister
+    "boxboxbox.lol",
+    # Aggregators we already have direct extractors for, but Reddit may
+    # surface event-specific deeplinks (e.g. /watch/<UUID>) we'd miss
+    # otherwise.
+    "pitsport.xyz",
+    "pitsport.live",
+    "rerace.io",
+    "dd12streams.com",
+    "ppv.to",
+    "streamed.pk",
+    "acestrlms.pages.dev",
+    "aceztrims.pages.dev",
+    # Sport-specific direct CDNs that occasionally appear in posts
+    "racelive.jp",          # Super Formula
+    "cdn.sfgo.jp",          # Super Formula CDN
+    # Speculative F1 sister sites — pattern likely if motomundo for MotoGP
+    "f1mundo.net",
+    "f1.live",
+    "f1live",
+    "skystreams",
+    "raceon",
+    "watchf1",
+)
+
+# URLs we actively never try to scrape (auth-walled, social media,
+# direct downloads with no live stream).
+_REJECT_HOSTS = (
+    "discord.gg", "discord.com",
+    "twitter.com", "x.com",
+    "youtube.com", "youtu.be",
+    "instagram.com", "tiktok.com",
+    "f1tv.formula1.com",
+    "viktorbarzin.me",
+    "gofile.io",
+    "mega.nz", "drive.google.com",
+    "1fichier.com", "rapidgator", "uploaded.net",
+    "magnet:",
+)
+
+_URL_RE = re.compile(r"https?://[^\s\)\]\>\"']+")
+
+
+class _Candidate(NamedTuple):
+    title: str
+    url: str
+    subreddit: str
+    flair: str
+
+
+def _is_interesting(url: str) -> bool:
+    low = url.lower()
+    if any(host in low for host in _REJECT_HOSTS):
+        return False
+    return any(host in low for host in _INTERESTING_HOSTS)
+
+
+def _has_live_marker(post: dict) -> bool:
+    title = (post.get("title") or "").lower()
+    flair = (post.get("link_flair_text") or "").lower()
+    if "[watch" in title or "watch online" in title or "live" in flair:
+        return True
+    return False
+
+
+class SubredditExtractor(BaseExtractor):
+    """Scan motorsport subreddits for community-curated live-stream URLs."""
+
+    @property
+    def site_key(self) -> str:
+        return "subreddit"
+
+    @property
+    def site_name(self) -> str:
+        return "Subreddit"
+
+    async def extract(self) -> list[ExtractedStream]:
+        # NB: do NOT send `Accept: application/json` — Reddit's anti-bot
+        # fingerprint flags that header from datacenter IPs and returns
+        # HTTP 403 with HTML. Default Accept (`*/*`) gets through fine
+        # and `.json` URLs always return JSON regardless.
+        async with httpx.AsyncClient(
+            timeout=15.0,
+            follow_redirects=True,
+            headers={"User-Agent": USER_AGENT},
+        ) as client:
+            tasks = [self._fetch_new(client, sub) for sub in SUBREDDITS]
+            tasks.extend(self._search(client, q) for q in SEARCH_QUERIES)
+            results = await asyncio.gather(*tasks, return_exceptions=True)
+
+        candidates: list[_Candidate] = []
+        for r in results:
+            if isinstance(r, Exception):
+                logger.debug("[subreddit] fetch failed: %s", r)
+                continue
+            candidates.extend(r)
+
+        # Dedupe by URL, keep first occurrence.
+        seen: set[str] = set()
+        picks: list[_Candidate] = []
+        for c in candidates:
+            if c.url in seen:
+                continue
+            seen.add(c.url)
+            picks.append(c)
+
+        logger.info(
+            "[subreddit] scanned %d source(s) — %d unique candidate URL(s)",
+            len(SUBREDDITS) + len(SEARCH_QUERIES), len(picks),
+        )
+        return [
+            ExtractedStream(
+                url=c.url,
+                site_key=self.site_key,
+                site_name=f"r/{c.subreddit}",
+                quality="",
+                title=c.title[:100],
+                stream_type="embed",
+                embed_url=c.url,
+            )
+            for c in picks
+        ]
+
+    async def _fetch_new(self, client: httpx.AsyncClient, sub: str) -> list[_Candidate]:
+        return await self._collect(
+            client,
+            f"https://www.reddit.com/r/{sub}/new.json?limit=25",
+            sub,
+        )
+
+    async def _search(self, client: httpx.AsyncClient, query: str) -> list[_Candidate]:
+        q = urllib.parse.quote_plus(query)
+        return await self._collect(
+            client,
+            f"https://www.reddit.com/r/MotorsportsReplays/search.json?q={q}&restrict_sr=on&sort=new&limit=10",
+            "MotorsportsReplays",
+        )
+
+    async def _collect(
+        self, client: httpx.AsyncClient, url: str, sub: str
+    ) -> list[_Candidate]:
+        try:
+            resp = await client.get(url)
+        except Exception as e:
+            logger.debug("[subreddit] fetch %s failed: %s", url, e)
+            return []
+        if resp.status_code != 200:
+            logger.debug("[subreddit] %s -> HTTP %d", url, resp.status_code)
+            return []
+        try:
+            data = resp.json()
+        except Exception:
+            return []
+        out: list[_Candidate] = []
+        for child in (data.get("data", {}) or {}).get("children", []):
+            d = child.get("data", {}) or {}
+            if not _has_live_marker(d):
+                continue
+            text = (d.get("selftext") or "")
+            title = d.get("title") or ""
+            flair = d.get("link_flair_text") or ""
+            # First, the linked URL itself (if it's a recognised live site).
+            top = d.get("url") or ""
+            if top and _is_interesting(top):
+                out.append(_Candidate(title, top, sub, flair))
+            # Then any URL embedded in the selftext that points at a
+            # community-curated live page.
+            for u in _URL_RE.findall(text):
+                if _is_interesting(u):
+                    out.append(_Candidate(title, u, sub, flair))
+        return out
--- a/stacks/f1-stream/files/backend/extractors/timstreams.py
+++ b/stacks/f1-stream/files/backend/extractors/timstreams.py
@ -0,0 +1,190 @@
+"""TimStreams extractor - fetches F1 streams from the TimStreams JSON API.
+
+Returns embed URLs from hmembeds.one for iframe playback.
+The public API at stra.viaplus.site/main requires no authentication
+and returns all events/channels across Events, Replays, and 24/7 categories.
+"""
+
+import logging
+
+import httpx
+
+from backend.extractors.base import BaseExtractor
+from backend.extractors.models import ExtractedStream
+
+logger = logging.getLogger(__name__)
+
+API_URL = "https://stra.viaplus.site/main"
+USER_AGENT = (
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+    "AppleWebKit/537.36 (KHTML, like Gecko) "
+    "Chrome/120.0.0.0 Safari/537.36"
+)
+
+# Direct F1 keyword matches (case-insensitive)
+F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1", "dazn f1"}
+# "Grand prix" is F1-related only if non-F1 motorsport keywords are absent
+GP_KEYWORD = "grand prix"
+# Exclude these motorsport series when matching on "grand prix"
+NON_F1_KEYWORDS = {
+    "motogp", "moto gp", "moto2", "moto3", "motoe",
+    "indycar", "indy car", "nascar",
+    "rally", "wrc", "wec", "lemans", "le mans",
+    "superbike", "dtm", "supercars",
+}
+
+# 24/7 channels that should always be included (embed hashes on hmembeds.one)
+ALWAYS_INCLUDE_HASHES = {
+    "888520f36cd94c5da4c71fddc1a5fc9b",  # Sky Sports F1
+    "fc3a54634d0867b0c02ee3223292e7c6",  # DAZN F1
+}
+
+
+def _is_f1_event(name: str) -> bool:
+    """Check if an event/channel is Formula 1 related by name.
+
+    Returns True when the name contains a direct F1 keyword, or contains
+    "grand prix" without non-F1 series keywords.
+
+    Note: The TimStreams API genre field (genre=2) covers ALL sports channels,
+    not just motorsport, so we rely solely on name-based matching.
+    """
+    lower = name.lower()
+
+    # Direct F1 keyword match
+    if any(kw in lower for kw in F1_KEYWORDS):
+        return True
+
+    # Grand prix without competing series
+    if GP_KEYWORD in lower and not any(kw in lower for kw in NON_F1_KEYWORDS):
+        return True
+
+    return False
+
+
+def _extract_embed_hash(url: str) -> str | None:
+    """Extract the hash from an hmembeds.one embed URL.
+
+    Expected format: https://hmembeds.one/embed/{hash}
+    Returns the hash string, or None if the URL is not in the expected format.
+    """
+    if not url:
+        return None
+    # Handle both with and without trailing slash
+    url = url.rstrip("/")
+    prefix = "https://hmembeds.one/embed/"
+    alt_prefix = "http://hmembeds.one/embed/"
+    if url.startswith(prefix):
+        return url[len(prefix):] or None
+    if url.startswith(alt_prefix):
+        return url[len(alt_prefix):] or None
+    return None
+
+
+def _is_always_include(url: str) -> bool:
+    """Check if a stream URL is one of the always-include 24/7 channels."""
+    embed_hash = _extract_embed_hash(url)
+    return embed_hash in ALWAYS_INCLUDE_HASHES if embed_hash else False
+
+
+class TimStreamsExtractor(BaseExtractor):
+    """Extracts embed URLs from TimStreams' public JSON API.
+
+    The API at stra.viaplus.site/main returns a JSON array of categories,
+    each containing events with stream URLs pointing to hmembeds.one embeds.
+    """
+
+    @property
+    def site_key(self) -> str:
+        return "timstreams"
+
+    @property
+    def site_name(self) -> str:
+        return "TimStreams"
+
+    async def extract(self) -> list[ExtractedStream]:
+        """Fetch F1 events/channels and return embed URLs for iframe playback."""
+        streams: list[ExtractedStream] = []
+        seen_urls: set[str] = set()
+
+        try:
+            async with httpx.AsyncClient(
+                timeout=15.0,
+                follow_redirects=True,
+                headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
+            ) as client:
+                resp = await client.get(API_URL)
+                if resp.status_code != 200:
+                    logger.warning(
+                        "[timstreams] API returned HTTP %d", resp.status_code
+                    )
+                    return []
+
+                data = resp.json()
+                if not isinstance(data, list):
+                    logger.warning("[timstreams] Unexpected API response type: %s", type(data).__name__)
+                    return []
+
+                logger.info("[timstreams] API returned %d categorie(s)", len(data))
+
+                for category in data:
+                    category_name = category.get("category", "Unknown")
+                    events = category.get("events", [])
+                    if not isinstance(events, list):
+                        continue
+
+                    for event in events:
+                        event_name = event.get("name", "Unknown")
+                        event_streams = event.get("streams", [])
+
+                        if not isinstance(event_streams, list) or not event_streams:
+                            continue
+
+                        # Check if any stream URL matches an always-include channel
+                        always_include = any(
+                            _is_always_include(s.get("url", ""))
+                            for s in event_streams
+                        )
+
+                        # Filter: must be F1-related or an always-include channel
+                        if not always_include and not _is_f1_event(event_name):
+                            continue
+
+                        for stream_info in event_streams:
+                            stream_name = stream_info.get("name", "")
+                            stream_url = stream_info.get("url", "")
+
+                            if not stream_url:
+                                continue
+
+                            # Deduplicate by URL
+                            if stream_url in seen_urls:
+                                continue
+                            seen_urls.add(stream_url)
+
+                            # Build a descriptive title
+                            title = event_name
+                            if stream_name and stream_name.lower() != event_name.lower():
+                                title = f"{event_name} - {stream_name}"
+                            if category_name:
+                                title = f"[{category_name}] {title}"
+
+                            streams.append(
+                                ExtractedStream(
+                                    url=stream_url,
+                                    site_key=self.site_key,
+                                    site_name=self.site_name,
+                                    quality="",
+                                    title=title,
+                                    stream_type="embed",
+                                    embed_url=stream_url,
+                                )
+                            )
+
+        except httpx.TimeoutException:
+            logger.warning("[timstreams] API request timed out")
+        except Exception:
+            logger.exception("[timstreams] Failed to fetch from API")
+
+        logger.info("[timstreams] Extracted %d stream(s)", len(streams))
+        return streams
--- a/stacks/f1-stream/files/backend/main.py
+++ b/stacks/f1-stream/files/backend/main.py
@ -3,6 +3,7 @@
 import logging
 import os
 from contextlib import asynccontextmanager
+from datetime import datetime, timedelta, timezone

 from apscheduler.schedulers.asyncio import AsyncIOScheduler
 from apscheduler.triggers.cron import CronTrigger
@ -13,6 +14,7 @@ from fastapi.staticfiles import StaticFiles
 from pydantic import BaseModel
 from starlette.responses import Response, StreamingResponse

+from backend.embed_proxy import fetch_embed, relay_asset
 from backend.extractors import create_extraction_service
 from backend.proxy import proxy_playlist, relay_stream
 from backend.schedule import ScheduleService
@ -117,10 +119,6 @@ async def lifespan(app: FastAPI):
    # Startup: load schedule and start background scheduler
    await schedule_service.initialize()

-    # Run initial extraction
-    logger.info("Running initial stream extraction...")
-    await extraction_service.run_extraction()
-
    # Schedule daily schedule refresh
    scheduler.add_job(
        _scheduled_refresh,
@ -130,13 +128,18 @@ async def lifespan(app: FastAPI):
        replace_existing=True,
    )

-    # Schedule periodic stream extraction (default: every 30 minutes)
+    # Schedule periodic stream extraction (default: every 30 minutes).
+    # next_run_time fires the first run 8s after startup. We don't run
+    # extraction inline here because it calls the playback verifier,
+    # which hits http://127.0.0.1:8000/embed for embed streams — uvicorn
+    # isn't listening yet inside the lifespan startup phase.
    scheduler.add_job(
        _scheduled_extraction,
        trigger=IntervalTrigger(minutes=30),
        id="stream_extraction",
        name="Extract streams from all registered sites",
        replace_existing=True,
+        next_run_time=datetime.now(timezone.utc) + timedelta(seconds=8),
    )

    # Schedule token refresh every 4 minutes (safe margin for 5-min CDN tokens).
@ -159,6 +162,10 @@ async def lifespan(app: FastAPI):
    # Shutdown
    scheduler.shutdown(wait=False)
    logger.info("APScheduler shut down")
+    try:
+        await extraction_service.shutdown()
+    except Exception:
+        logger.exception("extraction_service shutdown failed")


 app = FastAPI(title="F1 Streams", lifespan=lifespan)
@ -409,6 +416,37 @@ async def relay_endpoint(
    )


+# --- Embed iframe-stripping proxy ---
+
+
+@app.get("/embed")
+async def embed_proxy(url: str = Query(..., description="Base64url-encoded embed URL")):
+    """Proxy a third-party embed page so it can be iframed in our origin.
+
+    Strips X-Frame-Options and CSP frame-ancestors from the upstream
+    response, injects a base href + frame-buster-defeat script, and
+    forwards a plausible Referer/Origin to bypass upstream allowlists.
+    """
+    body, headers, status_code = await fetch_embed(url)
+    return Response(content=body, headers=headers, status_code=status_code)
+
+
+@app.get("/embed-asset")
+async def embed_asset(
+    request: Request,
+    url: str = Query(..., description="Base64url-encoded subresource URL"),
+):
+    """Relay an upstream subresource (JS/CSS/image/etc.) for the embed proxy.
+
+    Used as a fallback when an upstream blocks hotlinked assets via Origin
+    or Referer checks. Most assets load directly via the injected <base>
+    tag without going through this endpoint.
+    """
+    range_header = request.headers.get("range")
+    stream_gen, headers, status_code = await relay_asset(url, range_header)
+    return StreamingResponse(stream_gen, headers=headers, status_code=status_code)
+
+
 # --- Frontend Static Files ---
 # Mount the SvelteKit static build AFTER all API routes so API endpoints take priority.
 # SvelteKit adapter-static with ssr=false produces {page}.html files and a fallback index.html.
--- a/stacks/f1-stream/files/backend/playback_verifier.py
+++ b/stacks/f1-stream/files/backend/playback_verifier.py
@ -0,0 +1,449 @@
+"""Headless-browser playback verification for extracted streams.
+
+The basic health checker (backend/health.py) only validates m3u8 syntax.
+For embed/iframe streams it has nothing to check — the previous code blindly
+marked every embed `is_live=True`, which meant the stream list was full of
+news articles and aggregator landing pages that never actually played.
+
+This module loads each candidate stream URL in headless Chromium (via
+Playwright) and looks for *codec-independent* signals that the upstream
+serves a playable stream:
+
+- For m3u8: hls.js receives MANIFEST_PARSED + at least one FRAG_LOADED
+  event. We don't wait for `<video>` to gain dimensions, because Playwright's
+  chromium build doesn't include the H.264/AAC codecs. The user's real
+  browser does, so confirming "manifest + segment fetch succeed" is the
+  right server-side signal.
+- For embed: a `<video>` element appears at top level OR inside the iframe
+  (the embed proxy strips X-Frame-Options + frame-buster JS so we can
+  introspect the iframe content), OR the player has set up a MediaSource.
+
+Designed to be called from the extraction service's run_extraction()
+hook, with bounded concurrency. Each verification typically takes
+4-12 seconds.
+"""
+
+import asyncio
+import base64
+import logging
+import os
+import time
+from dataclasses import dataclass
+
+logger = logging.getLogger(__name__)
+
+# Toggle off in development by setting PLAYBACK_VERIFY_ENABLED=false.
+VERIFY_ENABLED = os.getenv("PLAYBACK_VERIFY_ENABLED", "true").lower() in ("true", "1", "yes")
+
+# Maximum number of concurrent browser pages.
+MAX_CONCURRENCY = int(os.getenv("PLAYBACK_VERIFY_CONCURRENCY", "2"))
+
+# Per-stream verification budget (seconds). Beyond this we declare unplayable.
+PER_STREAM_TIMEOUT = float(os.getenv("PLAYBACK_VERIFY_TIMEOUT", "20"))
+
+# Where the embed proxy lives, used to wrap embed URLs so they bypass
+# X-Frame-Options/CSP/JS frame-busters during verification. Defaults to
+# loopback because verification runs inside the same FastAPI process.
+PROXY_BASE = os.getenv("PLAYBACK_VERIFY_PROXY_BASE", "http://127.0.0.1:8000")
+
+USER_AGENT = (
+    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+    "AppleWebKit/537.36 (KHTML, like Gecko) "
+    "Chrome/120.0.0.0 Safari/537.36"
+)
+
+
+@dataclass
+class PlaybackVerdict:
+    is_playable: bool
+    signal: str = ""  # which check triggered the positive verdict
+    elapsed_ms: int = 0
+    error: str = ""
+
+
+def _b64url(s: str) -> str:
+    """URL-safe base64 with padding stripped — matches m3u8_rewriter.encode_url."""
+    return base64.urlsafe_b64encode(s.encode()).decode().rstrip("=")
+
+
+def _hls_test_html(m3u8_url: str) -> str:
+    """A self-contained HTML page that loads an m3u8 via hls.js into a <video>.
+
+    The page exposes window._verifier with manifest_parsed / frag_loaded
+    booleans the verifier polls. It also marks media-error or fatal-error
+    so we can distinguish 'upstream is unreachable' from 'codec missing'.
+    """
+    return f"""<!doctype html>
+<html><head><meta charset="utf-8"><title>verify</title>
+<script src="https://cdn.jsdelivr.net/npm/hls.js@1.5/dist/hls.min.js"></script>
+</head><body>
+<video id="v" muted playsinline width="640" height="360"></video>
+<script>
+window._verifier = {{
+  manifest_parsed: false,
+  frag_loaded: false,
+  media_loaded: false,  // true when MSE has appended any buffer
+  fatal_network_error: false,  // upstream truly unreachable
+  manifest_incompatible: false,  // codec missing — separate from network reachability
+  hls_error_details: ""
+}};
+const v = document.getElementById('v');
+const url = {m3u8_url!r};
+function start() {{
+  if (window.Hls && Hls.isSupported()) {{
+    const hls = new Hls({{enableWorker: true}});
+    hls.on(Hls.Events.MANIFEST_PARSED, () => {{ window._verifier.manifest_parsed = true; }});
+    hls.on(Hls.Events.FRAG_LOADED, () => {{ window._verifier.frag_loaded = true; }});
+    hls.on(Hls.Events.BUFFER_APPENDED, () => {{ window._verifier.media_loaded = true; }});
+    hls.on(Hls.Events.ERROR, (_, d) => {{
+      window._verifier.hls_error_details = d.details || "";
+      if (d.fatal && d.type === Hls.ErrorTypes.NETWORK_ERROR) {{
+        window._verifier.fatal_network_error = true;
+      }}
+      if (d.details === Hls.ErrorDetails.MANIFEST_INCOMPATIBLE_CODECS_ERROR) {{
+        window._verifier.manifest_incompatible = true;
+      }}
+    }});
+    hls.loadSource(url);
+    hls.attachMedia(v);
+  }} else if (v.canPlayType('application/vnd.apple.mpegurl')) {{
+    v.src = url;
+    v.addEventListener('loadedmetadata', () => {{ window._verifier.manifest_parsed = true; window._verifier.frag_loaded = true; }});
+    v.addEventListener('error', () => {{ window._verifier.fatal_network_error = true; }});
+  }} else {{
+    window._verifier.hls_error_details = "no hls support";
+  }}
+}}
+window.addEventListener('load', start);
+</script></body></html>"""
+
+
+def _embed_test_html(_proxied_embed_url: str) -> str:
+    """No longer used — verifier navigates the page directly to the proxy URL.
+
+    The earlier iframe-wrapper approach hit same-origin policy when inspecting
+    the iframe's contentDocument (the wrapper page was a data: URL, the iframe
+    was http://127.0.0.1:8000), so we couldn't read the embed's DOM.
+    """
+    return ""
+
+
+_M3U8_POLL_JS = """
+() => {
+  const v = window._verifier || {};
+  const vid = document.querySelector('video');
+  return {
+    manifest_parsed: !!v.manifest_parsed,
+    frag_loaded: !!v.frag_loaded,
+    media_loaded: !!v.media_loaded,
+    fatal_network_error: !!v.fatal_network_error,
+    manifest_incompatible: !!v.manifest_incompatible,
+    hls_error_details: v.hls_error_details || "",
+    video_width: vid ? vid.videoWidth : 0,
+    video_ready: vid ? vid.readyState : 0,
+  };
+}
+"""
+
+
+_EMBED_POLL_JS = """
+() => {
+  try {
+    const vids = document.querySelectorAll('video');
+    if (vids.length > 0) {
+      const v = vids[0];
+      return {
+        has_video: true,
+        src: v.currentSrc || v.src || "",
+        width: v.videoWidth,
+        ready: v.readyState,
+        duration: isFinite(v.duration) ? v.duration : 0,
+        media_keys: !!v.mediaKeys,
+        sources: v.querySelectorAll('source').length,
+      };
+    }
+    return {has_video: false};
+  } catch (e) {
+    return {has_video: false, err: String(e)};
+  }
+}
+"""
+
+
+async def _verify_m3u8(page, m3u8_url: str, deadline: float) -> PlaybackVerdict:
+    """Confirm an m3u8 URL is fetchable via hls.js end-to-end.
+
+    Positive signal hierarchy:
+      1. media_loaded (MSE buffer appended) — strongest, codec-supported.
+      2. frag_loaded (hls.js fetched at least one segment) — upstream is OK
+         even if the local browser lacks codecs.
+      3. manifest_parsed without media_loaded but with manifest_incompatible
+         — indicates upstream playlist is valid; player can't decode here
+         but a real user's browser will.
+    Negative signal:
+      - fatal_network_error: upstream is unreachable.
+      - timeout with no manifest_parsed: upstream did not respond.
+    """
+    start = time.monotonic()
+    html = _hls_test_html(m3u8_url)
+    data_url = "data:text/html;base64," + base64.b64encode(html.encode()).decode()
+
+    try:
+        await page.goto(data_url, wait_until="domcontentloaded", timeout=10_000)
+    except Exception as e:
+        return PlaybackVerdict(
+            is_playable=False, error=f"goto failed: {e}",
+            elapsed_ms=int((time.monotonic() - start) * 1000),
+        )
+
+    last_state: dict = {}
+    while time.monotonic() < deadline:
+        try:
+            state = await page.evaluate(_M3U8_POLL_JS)
+        except Exception as e:
+            return PlaybackVerdict(
+                is_playable=False, error=f"evaluate failed: {e}",
+                elapsed_ms=int((time.monotonic() - start) * 1000),
+            )
+        last_state = state
+        if state.get("media_loaded"):
+            return PlaybackVerdict(
+                is_playable=True, signal="media_loaded",
+                elapsed_ms=int((time.monotonic() - start) * 1000),
+            )
+        if state.get("frag_loaded"):
+            return PlaybackVerdict(
+                is_playable=True, signal="frag_loaded",
+                elapsed_ms=int((time.monotonic() - start) * 1000),
+            )
+        # MANIFEST_INCOMPATIBLE_CODECS_ERROR fires after hls.js successfully
+        # fetched and parsed the manifest — the failure is purely local
+        # (chromium lacks H.264). The user's real browser has codecs, so
+        # this URL is playable from the user's perspective.
+        if state.get("manifest_incompatible"):
+            return PlaybackVerdict(
+                is_playable=True, signal="manifest_parsed_codec_missing_in_verifier",
+                elapsed_ms=int((time.monotonic() - start) * 1000),
+            )
+        if state.get("manifest_parsed"):
+            return PlaybackVerdict(
+                is_playable=True, signal="manifest_parsed",
+                elapsed_ms=int((time.monotonic() - start) * 1000),
+            )
+        if state.get("fatal_network_error"):
+            return PlaybackVerdict(
+                is_playable=False, error="upstream network error",
+                elapsed_ms=int((time.monotonic() - start) * 1000),
+            )
+        await asyncio.sleep(0.25)
+
+    err = "no playback signal"
+    if last_state.get("hls_error_details"):
+        err = f"hls.js error: {last_state['hls_error_details']}"
+    return PlaybackVerdict(
+        is_playable=False, error=err,
+        elapsed_ms=int((time.monotonic() - start) * 1000),
+    )
+
+
+async def _verify_embed(page, proxied_url: str, deadline: float) -> PlaybackVerdict:
+    """Navigate directly to the proxied embed and confirm a player rendered.
+
+    Positive signals (in priority order):
+      - <video> with src/sources/mediaKeys set (player wired up).
+      - <video> element exists with any state (script ran, player attaching).
+      - A player container div (jwplayer, video-js, [id*=player], etc.).
+
+    Loading the embed page directly (not via iframe wrapper) avoids the
+    same-origin policy that prevented earlier iframe-introspection runs
+    from seeing the embed DOM.
+    """
+    start = time.monotonic()
+    try:
+        await page.goto(proxied_url, wait_until="domcontentloaded", timeout=15_000)
+    except Exception as e:
+        return PlaybackVerdict(
+            is_playable=False, error=f"goto failed: {e}",
+            elapsed_ms=int((time.monotonic() - start) * 1000),
+        )
+
+    # Track the best state seen across all polls. Some embeds load a player
+    # briefly then anti-bot JS tears the DOM down (hmembeds redirects to
+    # google.com if its devtool-detection trips). We accept any positive
+    # signal observed during the window, even if it's gone by timeout.
+    #
+    # We require an actual <video> element — a "player container div"
+    # is too weak (sportsurge has player-class divs but no real player).
+    seen_video_wired = False
+    seen_video_tag = False
+    last_err = ""
+
+    while time.monotonic() < deadline:
+        try:
+            r = await page.evaluate(_EMBED_POLL_JS)
+        except Exception as e:
+            return PlaybackVerdict(
+                is_playable=False, error=f"evaluate failed: {e}",
+                elapsed_ms=int((time.monotonic() - start) * 1000),
+            )
+        if r.get("has_video"):
+            seen_video_tag = True
+            if r.get("src") or r.get("width", 0) > 0 or r.get("media_keys") or r.get("sources", 0) > 0:
+                seen_video_wired = True
+                return PlaybackVerdict(
+                    is_playable=True, signal="video.wired",
+                    elapsed_ms=int((time.monotonic() - start) * 1000),
+                )
+        last_err = r.get("err", "")
+        await asyncio.sleep(0.5)
+
+    if seen_video_wired:
+        return PlaybackVerdict(is_playable=True, signal="video.wired",
+                               elapsed_ms=int((time.monotonic() - start) * 1000))
+    if seen_video_tag:
+        return PlaybackVerdict(is_playable=True, signal="video.tag_only",
+                               elapsed_ms=int((time.monotonic() - start) * 1000))
+
+    err = "no <video> element rendered"
+    if last_err:
+        err += f"; last_err: {last_err}"
+    return PlaybackVerdict(is_playable=False, error=err,
+                           elapsed_ms=int((time.monotonic() - start) * 1000))
+
+
+class PlaybackVerifier:
+    """Verifies playability of m3u8 and embed URLs via headless Chromium.
+
+    Manages a single browser instance for the process lifetime (cheap per-page
+    contexts) and bounds concurrency with a semaphore.
+    """
+
+    def __init__(self) -> None:
+        self._browser = None
+        self._playwright = None
+        self._sem = asyncio.Semaphore(MAX_CONCURRENCY)
+        self._lock = asyncio.Lock()
+
+    async def _ensure_browser(self):
+        if self._browser is not None:
+            return self._browser
+        async with self._lock:
+            if self._browser is not None:
+                return self._browser
+            try:
+                from playwright.async_api import async_playwright
+            except ImportError:
+                logger.error("playwright not installed — playback verification disabled")
+                return None
+            self._playwright = await async_playwright().start()
+            ws_base = os.getenv("CHROME_WS_URL")
+            ws_token = os.getenv("CHROME_WS_TOKEN")
+            if ws_base and ws_token:
+                self._browser = await self._playwright.chromium.connect(
+                    f"{ws_base.rstrip('/')}/{ws_token}", timeout=15_000,
+                )
+                logger.info("connected to remote chrome-service (concurrency=%d)", MAX_CONCURRENCY)
+            else:
+                self._browser = await self._playwright.chromium.launch(
+                    headless=True,
+                    args=[
+                        "--disable-dev-shm-usage",
+                        "--disable-web-security",
+                        "--no-sandbox",
+                        "--disable-setuid-sandbox",
+                        "--disable-features=IsolateOrigins,site-per-process",
+                        "--autoplay-policy=no-user-gesture-required",
+                    ],
+                )
+                logger.warning("CHROME_WS_URL not set — using in-process Chromium (concurrency=%d)", MAX_CONCURRENCY)
+            return self._browser
+
+    async def shutdown(self) -> None:
+        if self._browser is not None:
+            try:
+                await self._browser.close()
+            except Exception:
+                logger.exception("error closing browser")
+        if self._playwright is not None:
+            try:
+                await self._playwright.stop()
+            except Exception:
+                logger.exception("error stopping playwright")
+        self._browser = None
+        self._playwright = None
+
+    async def verify(self, url: str, stream_type: str) -> PlaybackVerdict:
+        if not VERIFY_ENABLED:
+            return PlaybackVerdict(is_playable=True, error="disabled")
+
+        browser = await self._ensure_browser()
+        if browser is None:
+            return PlaybackVerdict(is_playable=False, error="playwright unavailable")
+
+        is_m3u8 = stream_type == "m3u8"
+        if not is_m3u8:
+            url = f"{PROXY_BASE}/embed?url={_b64url(url)}"
+
+        async with self._sem:
+            # Set the per-stream deadline AFTER acquiring the semaphore.
+            # Otherwise queued streams that wait behind earlier ones
+            # would have already-expired deadlines when they start.
+            deadline = time.monotonic() + PER_STREAM_TIMEOUT
+            try:
+                context = await browser.new_context(
+                    user_agent=USER_AGENT,
+                    viewport={"width": 1280, "height": 720},
+                    bypass_csp=True,
+                )
+                from backend.stealth import STEALTH_JS
+                await context.add_init_script(STEALTH_JS)
+                page = await context.new_page()
+            except Exception as e:
+                return PlaybackVerdict(
+                    is_playable=False, error=f"context create failed: {e}",
+                )
+            try:
+                if is_m3u8:
+                    verdict = await _verify_m3u8(page, url, deadline)
+                else:
+                    verdict = await _verify_embed(page, url, deadline)
+            except asyncio.TimeoutError:
+                verdict = PlaybackVerdict(is_playable=False, error="overall timeout")
+            except Exception as e:
+                verdict = PlaybackVerdict(
+                    is_playable=False, error=f"verify exception: {e}",
+                )
+            finally:
+                try:
+                    await page.close()
+                    await context.close()
+                except Exception:
+                    pass
+            logger.info(
+                "[verify] %s -> playable=%s signal=%s err=%s elapsed=%dms",
+                url[:120], verdict.is_playable, verdict.signal,
+                verdict.error, verdict.elapsed_ms,
+            )
+            return verdict
+
+    async def verify_many(self, items: list[tuple[str, str]]) -> dict[str, PlaybackVerdict]:
+        if not items:
+            return {}
+        if not VERIFY_ENABLED:
+            return {url: PlaybackVerdict(is_playable=True, error="disabled") for url, _ in items}
+
+        async def _run(url: str, stream_type: str):
+            verdict = await self.verify(url, stream_type)
+            return url, verdict
+
+        results = await asyncio.gather(
+            *[_run(url, st) for url, st in items], return_exceptions=True
+        )
+        out: dict[str, PlaybackVerdict] = {}
+        for r in results:
+            if isinstance(r, Exception):
+                logger.exception("verify task crashed: %s", r)
+                continue
+            url, verdict = r
+            out[url] = verdict
+        return out
--- a/stacks/f1-stream/files/backend/requirements.txt
+++ b/stacks/f1-stream/files/backend/requirements.txt
@ -3,3 +3,4 @@ uvicorn[standard]
 httpx>=0.27.0
 apscheduler>=3.10.0,<4.0
 pydantic>=2.0.0
+playwright==1.48.0
--- a/stacks/f1-stream/files/backend/stealth.py
+++ b/stacks/f1-stream/files/backend/stealth.py
@ -0,0 +1,43 @@
+"""Vendored Playwright stealth init script.
+
+Mirror of `stacks/chrome-service/files/stealth.js`. Kept in sync by hand
+— update both files together if the JS is changed.
+"""
+
+STEALTH_JS = r"""
+(() => {
+  Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => undefined });
+  if (!window.chrome) window.chrome = {};
+  window.chrome.runtime = window.chrome.runtime || {};
+  Object.defineProperty(navigator, 'plugins', {
+    get: () => [{ name: 'Chrome PDF Plugin' }, { name: 'Chrome PDF Viewer' }, { name: 'Native Client' }],
+  });
+  Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
+  const origQuery = window.navigator.permissions && window.navigator.permissions.query;
+  if (origQuery) {
+    window.navigator.permissions.query = (parameters) =>
+      parameters && parameters.name === 'notifications'
+        ? Promise.resolve({ state: Notification.permission })
+        : origQuery(parameters);
+  }
+  const spoofGl = (proto) => {
+    if (!proto) return;
+    const orig = proto.getParameter;
+    proto.getParameter = function (parameter) {
+      if (parameter === 37445) return 'Intel Inc.';
+      if (parameter === 37446) return 'Intel Iris OpenGL Engine';
+      return orig.apply(this, arguments);
+    };
+  };
+  spoofGl(window.WebGLRenderingContext && window.WebGLRenderingContext.prototype);
+  spoofGl(window.WebGL2RenderingContext && window.WebGL2RenderingContext.prototype);
+  // disable-devtool.js auto-init evasion: hide the marker attribute so the
+  // library's IIFE exits early. Without this, hmembeds-class players redirect
+  // to google.com when the Performance detector trips under Playwright.
+  const origQS = Document.prototype.querySelector;
+  Document.prototype.querySelector = function (sel) {
+    if (typeof sel === 'string' && sel.indexOf('disable-devtool-auto') !== -1) return null;
+    return origQS.apply(this, arguments);
+  };
+})();
+"""
--- a/stacks/f1-stream/files/frontend/src/lib/api.js
+++ b/stacks/f1-stream/files/frontend/src/lib/api.js
@ -44,6 +44,20 @@ export function getProxyUrl(m3u8Url) {
 	return `${API_BASE}/proxy?url=${encoded}`;
 }

+/**
+ * Get the embed-proxy URL for an upstream iframe embed page.
+ *
+ * The proxy strips X-Frame-Options / CSP frame-ancestors and injects a
+ * frame-buster-defeat script so the embed renders inside our iframe even
+ * when the upstream tries to block it.
+ * @param {string} embedUrl - The original embed page URL
+ * @returns {string} URL pointing at our /embed proxy
+ */
+export function getEmbedProxyUrl(embedUrl) {
+	const encoded = toBase64Url(embedUrl);
+	return `${API_BASE}/embed?url=${encoded}`;
+}
+
 /**
 * Mark a stream as actively being watched (enables token refresh).
 * @param {string} url - The stream URL
--- a/stacks/f1-stream/files/frontend/src/routes/watch/+page.svelte
+++ b/stacks/f1-stream/files/frontend/src/routes/watch/+page.svelte
@ -1,5 +1,5 @@
 <script>
-	import { fetchStreams, fetchSchedule, getProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
+	import { fetchStreams, fetchSchedule, getProxyUrl, getEmbedProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
 	import { onMount, onDestroy } from 'svelte';
 	import { page } from '$app/state';

@ -107,12 +107,14 @@
 		}

 		if (stream.stream_type === 'embed') {
-			// Embed/iframe player — no hls.js needed
+			// Embed/iframe player — route through our /embed proxy so the
+			// upstream's X-Frame-Options / CSP / JS frame-busters can't
+			// block the iframe.
 			const newPlayer = {
 				id: Date.now(),
 				proxyUrl: '',
 				originalUrl: stream.embed_url,
-				embedUrl: stream.embed_url,
+				embedUrl: getEmbedProxyUrl(stream.embed_url),
 				streamType: 'embed',
 				siteKey: stream.site_key || '',
 				siteName: stream.site_name || stream.site_key || 'Unknown',
@ -173,9 +175,13 @@
 		if (!player || !player.videoEl) return;

 		if (Hls.isSupported()) {
+			// `lowLatencyMode` previously broke playback on regular (non-LL-HLS)
+			// providers like RallyTV — they don't ship the LL-HLS extensions
+			// hls.js needs in that mode. Default off; explicit per-stream flag
+			// can re-enable later.
 			const hlsInstance = new Hls({
 				enableWorker: true,
-				lowLatencyMode: true,
+				lowLatencyMode: false,
 				backBufferLength: 90
 			});

--- a/stacks/f1-stream/main.tf
+++ b/stacks/f1-stream/main.tf
@ -11,7 +11,8 @@ resource "kubernetes_namespace" "f1-stream" {
    name = "f1-stream"
    labels = {
      "istio-injection" : "disabled"
-      tier = local.tiers.aux
+      tier                                    = local.tiers.aux
+      "chrome-service.viktorbarzin.me/client" = "true"
    }
  }
  lifecycle {
@ -47,6 +48,35 @@ resource "kubernetes_manifest" "external_secret" {
  depends_on = [kubernetes_namespace.f1-stream]
 }

+# Pull the chrome-service bearer token into this namespace as a separate
+# Secret so the verifier can reach the in-cluster Playwright pool.
+resource "kubernetes_manifest" "chrome_service_client_secret" {
+  manifest = {
+    apiVersion = "external-secrets.io/v1beta1"
+    kind       = "ExternalSecret"
+    metadata = {
+      name      = "chrome-service-client-secrets"
+      namespace = "f1-stream"
+    }
+    spec = {
+      refreshInterval = "15m"
+      secretStoreRef = {
+        name = "vault-kv"
+        kind = "ClusterSecretStore"
+      }
+      target = {
+        name = "chrome-service-client-secrets"
+      }
+      dataFrom = [{
+        extract = {
+          key = "chrome-service"
+        }
+      }]
+    }
+  }
+  depends_on = [kubernetes_namespace.f1-stream]
+}
+
 resource "kubernetes_persistent_volume_claim" "data_proxmox" {
  wait_until_bound = false
  metadata {
@ -104,11 +134,11 @@ resource "kubernetes_deployment" "f1-stream" {
          name              = "f1-stream"
          resources {
            limits = {
-              memory = "256Mi"
+              memory = "1Gi"
            }
            requests = {
-              cpu    = "25m"
-              memory = "256Mi"
+              cpu    = "100m"
+              memory = "1Gi"
            }
          }
          port {
@ -127,6 +157,29 @@ resource "kubernetes_deployment" "f1-stream" {
            name  = "DISCORD_CHANNELS"
            value = var.discord_f1_channel_ids
          }
+          # Verifier connects to in-cluster headed Chromium pool — see
+          # stacks/chrome-service/. Falls back to in-process headless if unset.
+          env {
+            name  = "CHROME_WS_URL"
+            value = "ws://chrome-service.chrome-service.svc.cluster.local:3000"
+          }
+          env {
+            name = "CHROME_WS_TOKEN"
+            value_from {
+              secret_key_ref {
+                name = "chrome-service-client-secrets"
+                key  = "api_bearer_token"
+              }
+            }
+          }
+          # The embed proxy (this pod's /embed?url=…) must be reachable from
+          # the remote chrome-service pod. Default 127.0.0.1 only works for
+          # in-process Chromium — for the remote browser we point it at our
+          # own ClusterIP service.
+          env {
+            name  = "PLAYBACK_VERIFY_PROXY_BASE"
+            value = "http://f1.f1-stream.svc.cluster.local"
+          }
          volume_mount {
            name       = "data"
            mount_path = "/data"
--- a/stacks/fire-planner/main.tf
+++ b/stacks/fire-planner/main.tf
@ -8,7 +8,11 @@ variable "postgresql_host" { type = string }

 locals {
  namespace = "fire-planner"
-  image     = "registry.viktorbarzin.me/fire-planner:${var.image_tag}"
+  # Phase 3 cutover 2026-05-07. NOTE: the registry-private repo for
+  # fire-planner has 0 tags — first build via Woodpecker on the new Forgejo
+  # repo (viktor/fire-planner, Dockerfile + .woodpecker.yml added 2026-05-07)
+  # must succeed BEFORE the next pod restart, otherwise pulls will 404.
+  image = "forgejo.viktorbarzin.me/viktor/fire-planner:${var.image_tag}"
  labels = {
    app = "fire-planner"
  }
--- a/stacks/forgejo/cleanup.tf
+++ b/stacks/forgejo/cleanup.tf
@ -0,0 +1,123 @@
+# Forgejo container-package retention CronJob.
+#
+# Forgejo's per-package "Cleanup Rules" UI is not exposed via Terraform —
+# it's per-user runtime state inside the Forgejo DB. Driving retention from
+# a CronJob hitting the public API keeps the policy versioned in this repo.
+#
+# Auth: a write:package PAT belonging to ci-pusher (same user that pushes
+# from CI). DELETE on packages requires write:package scope. PAT lives in
+# Vault at secret/viktor/forgejo_cleanup_token.
+
+data "vault_kv_secret_v2" "forgejo_viktor" {
+  mount = "secret"
+  name  = "viktor"
+}
+
+locals {
+  # Flip to false after first 7 days of dry-run logs look correct.
+  forgejo_cleanup_dry_run = true
+}
+
+resource "kubernetes_config_map" "forgejo_cleanup_script" {
+  metadata {
+    name      = "forgejo-cleanup-script"
+    namespace = kubernetes_namespace.forgejo.metadata[0].name
+  }
+  data = {
+    "cleanup.sh" = file("${path.module}/files/cleanup.sh")
+  }
+}
+
+resource "kubernetes_secret" "forgejo_cleanup_token" {
+  metadata {
+    name      = "forgejo-cleanup-token"
+    namespace = kubernetes_namespace.forgejo.metadata[0].name
+  }
+  type = "Opaque"
+  data = {
+    # try() so the apply succeeds before the Vault key is populated during
+    # Phase 0 bootstrap (see docs/runbooks/forgejo-registry-setup.md). Empty
+    # token causes the cleanup CronJob to fail visibly — that's intended.
+    FORGEJO_TOKEN = try(data.vault_kv_secret_v2.forgejo_viktor.data["forgejo_cleanup_token"], "")
+  }
+}
+
+resource "kubernetes_cron_job_v1" "forgejo_cleanup" {
+  metadata {
+    name      = "forgejo-cleanup"
+    namespace = kubernetes_namespace.forgejo.metadata[0].name
+  }
+  spec {
+    concurrency_policy            = "Forbid"
+    schedule                      = "0 4 * * *"
+    failed_jobs_history_limit     = 3
+    successful_jobs_history_limit = 3
+    job_template {
+      metadata {}
+      spec {
+        backoff_limit              = 1
+        ttl_seconds_after_finished = 3600
+        template {
+          metadata {}
+          spec {
+            container {
+              name    = "cleanup"
+              image   = "docker.io/library/alpine:3.20"
+              command = ["/bin/sh", "/scripts/cleanup.sh"]
+              env {
+                name = "FORGEJO_TOKEN"
+                value_from {
+                  secret_key_ref {
+                    name = kubernetes_secret.forgejo_cleanup_token.metadata[0].name
+                    key  = "FORGEJO_TOKEN"
+                  }
+                }
+              }
+              env {
+                name  = "FORGEJO_HOST"
+                value = "http://forgejo.forgejo.svc.cluster.local"
+              }
+              env {
+                name  = "FORGEJO_OWNER"
+                value = "viktor"
+              }
+              env {
+                name  = "KEEP_LAST_N"
+                value = "10"
+              }
+              env {
+                name  = "DRY_RUN"
+                value = local.forgejo_cleanup_dry_run ? "true" : "false"
+              }
+              volume_mount {
+                name       = "scripts"
+                mount_path = "/scripts"
+              }
+              resources {
+                requests = {
+                  cpu    = "10m"
+                  memory = "32Mi"
+                }
+                limits = {
+                  memory = "96Mi"
+                }
+              }
+            }
+            volume {
+              name = "scripts"
+              config_map {
+                name         = kubernetes_config_map.forgejo_cleanup_script.metadata[0].name
+                default_mode = "0755"
+              }
+            }
+            restart_policy = "OnFailure"
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+  }
+}
--- a/stacks/forgejo/files/cleanup.sh
+++ b/stacks/forgejo/files/cleanup.sh
@ -0,0 +1,109 @@
+#!/bin/sh
+# Forgejo container-package retention.
+#
+# For each container package owned by ${FORGEJO_OWNER}, keep newest
+# ${KEEP_LAST_N} versions + always keep tag "latest". Deletes the rest via
+# DELETE /api/v1/packages/{owner}/container/{name}/{version}.
+#
+# DRY_RUN=true logs what would be deleted but issues no DELETE calls.
+#
+# Required env:
+#   FORGEJO_HOST   e.g. http://forgejo.forgejo.svc.cluster.local
+#   FORGEJO_OWNER  e.g. viktor
+#   FORGEJO_USER   PAT owner (write:package scope)
+#   FORGEJO_TOKEN  PAT
+#   KEEP_LAST_N    integer (default 10)
+#   DRY_RUN        true|false (default true)
+
+set -eu
+
+apk add --no-cache curl jq >/dev/null
+
+OWNER="${FORGEJO_OWNER}"
+KEEP="${KEEP_LAST_N:-10}"
+DRY="${DRY_RUN:-true}"
+BASE="${FORGEJO_HOST%/}/api/v1"
+
+AUTH_HEADER="Authorization: token $FORGEJO_TOKEN"
+
+echo "Forgejo cleanup: owner=$OWNER keep_last=$KEEP dry_run=$DRY"
+echo "API base: $BASE"
+
+# Page through ALL container packages.
+TMPDIR=$(mktemp -d)
+trap 'rm -rf "$TMPDIR"' EXIT
+ALL="$TMPDIR/all.json"
+echo "[]" > "$ALL"
+
+PAGE=1
+while :; do
+  RESP=$(curl -sf -H "$AUTH_HEADER" \
+    "$BASE/packages/$OWNER?type=container&limit=50&page=$PAGE")
+  COUNT=$(echo "$RESP" | jq 'length')
+  if [ "$COUNT" = "0" ]; then break; fi
+  jq -s '.[0] + .[1]' "$ALL" <(echo "$RESP") > "$TMPDIR/merged.json"
+  mv "$TMPDIR/merged.json" "$ALL"
+  PAGE=$((PAGE + 1))
+  # Safety: never run away.
+  if [ "$PAGE" -gt 100 ]; then break; fi
+done
+
+TOTAL=$(jq 'length' "$ALL")
+echo "Found $TOTAL package version(s)."
+
+if [ "$TOTAL" = "0" ]; then
+  echo "Nothing to do."
+  exit 0
+fi
+
+# Group by name and process each group.
+NAMES=$(jq -r '.[].name' "$ALL" | sort -u)
+
+DEL=0
+KEPT=0
+
+for NAME in $NAMES; do
+  # All versions of this name, sorted by created_at descending.
+  jq --arg n "$NAME" '
+    [.[] | select(.name == $n)]
+    | sort_by(.created_at) | reverse
+  ' "$ALL" > "$TMPDIR/$NAME.json"
+
+  N_VERSIONS=$(jq 'length' "$TMPDIR/$NAME.json")
+  echo "[$NAME] $N_VERSIONS version(s)"
+
+  # Build the keep set: top $KEEP + anything tagged 'latest'.
+  jq -r --argjson keep "$KEEP" '
+    [.[0:$keep][].version] + [.[] | select(.version == "latest") | .version]
+    | unique
+    | .[]
+  ' "$TMPDIR/$NAME.json" > "$TMPDIR/$NAME.keep"
+
+  # Build the delete set.
+  jq -r '.[].version' "$TMPDIR/$NAME.json" \
+    | grep -vxFf "$TMPDIR/$NAME.keep" > "$TMPDIR/$NAME.delete" || true
+
+  D_COUNT=$(wc -l < "$TMPDIR/$NAME.delete" | tr -d ' ')
+  K_COUNT=$(wc -l < "$TMPDIR/$NAME.keep" | tr -d ' ')
+  echo "  keep=$K_COUNT delete=$D_COUNT"
+  KEPT=$((KEPT + K_COUNT))
+
+  while IFS= read -r VER; do
+    [ -z "$VER" ] && continue
+    URL="$BASE/packages/$OWNER/container/$NAME/$VER"
+    if [ "$DRY" = "true" ]; then
+      echo "  DRY_RUN would DELETE $URL"
+    else
+      HTTP=$(curl -s -o /dev/null -w '%{http_code}' \
+        -X DELETE -H "$AUTH_HEADER" "$URL" || echo "000")
+      if [ "$HTTP" = "204" ] || [ "$HTTP" = "200" ]; then
+        echo "  deleted $NAME:$VER"
+      else
+        echo "  FAIL $NAME:$VER HTTP $HTTP"
+      fi
+    fi
+    DEL=$((DEL + 1))
+  done < "$TMPDIR/$NAME.delete"
+done
+
+echo "Summary: kept=$KEPT to_delete=$DEL dry_run=$DRY"
--- a/stacks/forgejo/main.tf
+++ b/stacks/forgejo/main.tf
@ -32,7 +32,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
    annotations = {
      "resize.topolvm.io/threshold"     = "80%"
      "resize.topolvm.io/increase"      = "50%"
-      "resize.topolvm.io/storage_limit" = "20Gi"
+      "resize.topolvm.io/storage_limit" = "50Gi"
    }
  }
  spec {
@ -40,7 +40,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
    storage_class_name = "proxmox-lvm-encrypted"
    resources {
      requests = {
-        storage = "5Gi"
+        storage = "15Gi"
      }
    }
  }
@ -72,6 +72,14 @@ resource "kubernetes_deployment" "forgejo" {
        }
      }
      spec {
+        # fsGroup chowns the mounted PVC to GID 1000 (the forgejo user) on
+        # mount. Without this, /data is owned by root and the
+        # `[packages].CHUNKED_UPLOAD_PATH` default at /data/tmp is not
+        # writable, crashlooping the pod when packages is enabled. Pre-23-day
+        # Forgejo ran without packages on so this never surfaced.
+        security_context {
+          fs_group = 1000
+        }
        container {
          name  = "forgejo"
          image = "codeberg.org/forgejo/forgejo:11"
@ -101,10 +109,30 @@ resource "kubernetes_deployment" "forgejo" {
            name  = "FORGEJO__openid__ENABLE_OPENID_SIGNIN"
            value = "false"
          }
-          # Allow webhook delivery to internal k8s services
+          # Allow webhook delivery to internal k8s services AND to the public
+          # ingress hostnames Forgejo's own webhooks point to (ci.viktorbarzin.me
+          # for Woodpecker pipelines).
          env {
            name  = "FORGEJO__webhook__ALLOWED_HOST_LIST"
-            value = "*.svc.cluster.local"
+            value = "*.svc.cluster.local,ci.viktorbarzin.me,*.viktorbarzin.me"
+          }
+          # Default DELIVER_TIMEOUT is 5s — too tight for the Cloudflare-tunnel
+          # round-trip on first request after pod restart (cold TLS handshake
+          # can hit 6-8s). 30s comfortably covers retries.
+          env {
+            name  = "FORGEJO__webhook__DELIVER_TIMEOUT"
+            value = "30"
+          }
+          # OCI registry (container packages). Default-on in Forgejo v11 but
+          # explicit so it can't be silently disabled by an upstream config
+          # change. CHUNKED_UPLOAD_PATH defaults to `data/tmp/package-upload`
+          # under Forgejo's AppDataPath (resolves to a writable subdir of
+          # /data/gitea/) — overriding to /data/tmp directly hits a perms
+          # issue because /data is the volume mount root and is not chowned
+          # to the forgejo user.
+          env {
+            name  = "FORGEJO__packages__ENABLED"
+            value = "true"
          }
          volume_mount {
            name       = "data"
@ -113,10 +141,10 @@ resource "kubernetes_deployment" "forgejo" {
          resources {
            requests = {
              cpu    = "15m"
-              memory = "384Mi"
+              memory = "1Gi"
            }
            limits = {
-              memory = "384Mi"
+              memory = "1Gi"
            }
          }
          port {
@ -165,6 +193,9 @@ module "ingress" {
  namespace       = kubernetes_namespace.forgejo.metadata[0].name
  name            = "forgejo"
  tls_secret_name = var.tls_secret_name
+  # OCI registry pushes ship full image layer blobs in one request; default
+  # Traefik buffering chokes on anything past a few hundred MB.
+  max_body_size = "5g"
  extra_annotations = {
    "gethomepage.dev/enabled"                      = "true"
    "gethomepage.dev/name"                         = "Forgejo"
--- a/stacks/freedify/factory/main.tf
+++ b/stacks/freedify/factory/main.tf
@ -105,7 +105,8 @@ resource "kubernetes_deployment" "freedify" {
          name = "registry-credentials"
        }
        container {
-          image = "registry.viktorbarzin.me/freedify:${var.tag}"
+          # Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
+          image = "forgejo.viktorbarzin.me/viktor/freedify:${var.tag}"
          name  = "freedify"

          port {
--- a/stacks/infra/main.tf
+++ b/stacks/infra/main.tf
@ -75,13 +75,13 @@ module "k8s-node-template" {
  mkdir -p /etc/containerd/certs.d/ghcr.io
  printf 'server = "https://ghcr.io"\n\n[host."http://10.0.20.10:5010"]\n  capabilities = ["pull", "resolve"]\n\n[host."https://ghcr.io"]\n  capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/ghcr.io/hosts.toml

-  # Create hosts.toml for private registry — both IP and hostname entries
-  # IP-based (10.0.20.10:5050): direct access, skip TLS verify (wildcard cert, no IP SAN)
-  mkdir -p /etc/containerd/certs.d/10.0.20.10:5050
-  printf 'server = "https://10.0.20.10:5050"\n\n[host."https://10.0.20.10:5050"]\n  capabilities = ["pull", "resolve", "push"]\n  skip_verify = true\n' > /etc/containerd/certs.d/10.0.20.10:5050/hosts.toml
-  # Hostname-based (registry.viktorbarzin.me): redirects to LAN IP to avoid Traefik round-trip
-  mkdir -p /etc/containerd/certs.d/registry.viktorbarzin.me
-  printf 'server = "https://registry.viktorbarzin.me"\n\n[host."https://10.0.20.10:5050"]\n  capabilities = ["pull", "resolve", "push"]\n  skip_verify = true\n' > /etc/containerd/certs.d/registry.viktorbarzin.me/hosts.toml
+  # Forgejo OCI registry: redirect to in-cluster Traefik LB (10.0.20.200) so
+  # pulls don't hairpin out through the WAN gateway. Traefik serves the
+  # *.viktorbarzin.me wildcard so SNI verification still passes.
+  # registry.viktorbarzin.me / 10.0.20.10:5050 entries removed in Phase 4 of
+  # the forgejo-registry-consolidation 2026-05-07 — registry-private is gone.
+  mkdir -p /etc/containerd/certs.d/forgejo.viktorbarzin.me
+  printf 'server = "https://forgejo.viktorbarzin.me"\n\n[host."https://10.0.20.200"]\n  capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml

  # Low-traffic registries (registry.k8s.io, quay.io, reg.kyverno.io) pull directly.
  # Pull-through cache removed: caused corrupted images (truncated downloads)
--- a/stacks/job-hunter/main.tf
+++ b/stacks/job-hunter/main.tf
@ -8,7 +8,8 @@ variable "postgresql_host" { type = string }

 locals {
  namespace = "job-hunter"
-  image     = "registry.viktorbarzin.me/job-hunter:${var.image_tag}"
+  # Phase 3 cutover 2026-05-07 — see infra/docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md.
+  image = "forgejo.viktorbarzin.me/viktor/job-hunter:${var.image_tag}"
  labels = {
    app = "job-hunter"
  }
--- a/stacks/kms/.terraform.lock.hcl
+++ b/stacks/kms/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }

+provider "registry.terraform.io/goauthentik/authentik" {
+  version     = "2024.12.1"
+  constraints = "~> 2024.10"
+  hashes = [
+    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
+  ]
+}
+
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/kms/main.tf
+++ b/stacks/kms/main.tf
@ -24,16 +24,6 @@ module "tls_secret" {
  tls_secret_name = var.tls_secret_name
 }

-resource "kubernetes_config_map" "kms-web-page" {
-  metadata {
-    name      = "kms-web-page-config"
-    namespace = kubernetes_namespace.kms.metadata[0].name
-  }
-  data = {
-    "index.html" = var.index_html
-  }
-}
-
 resource "kubernetes_deployment" "kms-web-page" {
  metadata {
    name      = "kms-web-page"
@ -59,8 +49,11 @@ resource "kubernetes_deployment" "kms-web-page" {
        }
      }
      spec {
+        image_pull_secrets {
+          name = "registry-credentials"
+        }
        container {
-          image             = "nginx"
+          image             = "forgejo.viktorbarzin.me/viktor/kms-website:${var.image_tag}"
          name              = "kms-web-page"
          image_pull_policy = "IfNotPresent"
          resources {
@ -76,29 +69,17 @@ resource "kubernetes_deployment" "kms-web-page" {
            container_port = 80
            protocol       = "TCP"
          }
-          volume_mount {
-            name       = "config"
-            mount_path = "/usr/share/nginx/html/"
-          }
-        }
-
-        volume {
-          name = "config"
-          config_map {
-            name = "kms-web-page-config"
-            items {
-              key  = "index.html"
-              path = "index.html"
-            }
-          }
        }
      }
    }
  }
-  depends_on = [kubernetes_config_map.kms-web-page]
  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [
+      # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+      spec[0].template[0].spec[0].dns_config,
+      # CI (Woodpecker) manages the live image tag via `kubectl set image`
+      spec[0].template[0].spec[0].container[0].image,
+    ]
  }
 }

--- a/stacks/kms/variables.tf
+++ b/stacks/kms/variables.tf
@ -1,68 +1,5 @@
-variable "index_html" {
-
-  default = <<EOT
-<h1>How to activate windows</h1>
-Open the following link and find a key for you version of windows: </br>
-<b><a href="https://goo.gl/BcrPjW" target="_blank">https://goo.gl/BcrPjW</a></b>
-</br>
-</br>
-Open cmd as <b>Administrator</b> and run the following: </br>
-</br>
-<b>slmgr.vbs /ipk key_for_your_windows</b>
-</br>
-<b>slmgr.vbs /skms kms.viktorbarzin.me </b>
-<br>
-<b>
-    slmgr /ato
-</b>
-<br>
-<p>
-<h3> If you have an evaluation windows, you need to change it to retail one. This is how:</h3>
-<br>
-From an elevated command prompt, determine the current edition name with the command <br>
-<strong>DISM /online /Get-CurrentEdition</strong>.
-<br>Make note of the edition ID, an abbreviated form of the edition name. Then run
-<br>
-<strong>DISM /online /Set-Edition:<edition ID> /ProductKey:XXXXX-XXXXX-XXXXX-XXXXX-XXXXX /AcceptEula</strong>
-<br> providing the edition ID and a retail product key. The server will restart
-</p>
-<hr>
-
-
-<h1>How to activate Microsoft Office</h1>
-<br>
-<b>
-    CD \Program Files\Microsoft Office\Office16 </b> OR <b>CD \Program Files (x86)\Microsoft Office\Office16
-</b>
-<br>
-<b>
-    cscript ospp.vbs /sethst:kms.viktorbarzin.me
-</b>
-<br>
-<b>
-    cscript ospp.vbs /inpkey:xxxxx-xxxxx-xxxxx-xxxxx-xxxxx
-</b>
-<br>
-where 'xxxx' is a key for your office. Some examples for office 2016 - <a
-    href="https://www.techdee.com/microsoft-office-2016-product-key/">https://www.techdee.com/microsoft-office-2016-product-key/</a>
-<br>
-<b>
-    cscript ospp.vbs /act
-</b>
-
-<br>
-<br>
-If you messed up activation settings reset them using
-<br>
-slmgr /upk
-
-<br>
-slmgr /cpky
-<br>
-and
-<br>
-slmgr /rearm
-
-<h3>Buy me a beer :P</h3>
-EOT
+variable "image_tag" {
+  type        = string
+  default     = "latest"
+  description = "kms-website image tag pushed to forgejo.viktorbarzin.me/viktor/kms-website. Use 8-char git SHA in CI."
 }
--- a/stacks/kyverno/modules/kyverno/registry-credentials.tf
+++ b/stacks/kyverno/modules/kyverno/registry-credentials.tf
@ -20,14 +20,14 @@ resource "kubernetes_secret" "registry_credentials" {
  data = {
    ".dockerconfigjson" = jsonencode({
      auths = {
-        "registry.viktorbarzin.me" = {
-          auth = base64encode("${data.vault_kv_secret_v2.viktor.data["registry_user"]}:${data.vault_kv_secret_v2.viktor.data["registry_password"]}")
-        }
-        "registry.viktorbarzin.me:5050" = {
-          auth = base64encode("${data.vault_kv_secret_v2.viktor.data["registry_user"]}:${data.vault_kv_secret_v2.viktor.data["registry_password"]}")
-        }
-        "10.0.20.10:5050" = {
-          auth = base64encode("${data.vault_kv_secret_v2.viktor.data["registry_user"]}:${data.vault_kv_secret_v2.viktor.data["registry_password"]}")
+        # Phase 4 of forgejo-registry-consolidation 2026-05-07 — registry-
+        # private decommissioned. Old auths entries (registry.viktorbarzin.me,
+        # registry.viktorbarzin.me:5050, 10.0.20.10:5050) removed to prevent
+        # silent fallback. If a pod somehow references the old hostname now,
+        # it will visibly fail with auth missing rather than silently pulling
+        # potentially-stale blobs.
+        "forgejo.viktorbarzin.me" = {
+          auth = base64encode("cluster-puller:${try(data.vault_kv_secret_v2.viktor.data["forgejo_pull_token"], "")}")
        }
      }
    })
--- a/stacks/monitoring/main.tf
+++ b/stacks/monitoring/main.tf
@ -33,5 +33,10 @@ module "monitoring" {
  kube_config_path              = var.kube_config_path
  registry_user                 = data.vault_kv_secret_v2.viktor.data["registry_user"]
  registry_password             = data.vault_kv_secret_v2.viktor.data["registry_password"]
-  tier                          = local.tiers.cluster
+  # try() so apply succeeds before the Vault key is populated during Phase 0
+  # bootstrap (see docs/runbooks/forgejo-registry-setup.md). Empty token =
+  # probe will report an auth failure and fire RegistryCatalogInaccessible —
+  # that's the intended visible-broken state until the PAT is created.
+  forgejo_pull_token = try(data.vault_kv_secret_v2.viktor.data["forgejo_pull_token"], "")
+  tier               = local.tiers.cluster
 }
--- a/stacks/monitoring/modules/monitoring/dashboards/openclaw.json
+++ b/stacks/monitoring/modules/monitoring/dashboards/openclaw.json
@ -0,0 +1,476 @@
+{
+  "annotations": {"list": []},
+  "editable": true,
+  "fiscalYearStartMonth": 0,
+  "graphTooltip": 0,
+  "id": null,
+  "links": [],
+  "liveNow": false,
+  "refresh": "30s",
+  "schemaVersion": 38,
+  "tags": ["openclaw", "ai", "codex"],
+  "time": {"from": "now-6h", "to": "now"},
+  "timepicker": {},
+  "timezone": "",
+  "title": "OpenClaw — Codex Usage",
+  "uid": "openclaw-codex",
+  "version": 1,
+  "panels": [
+    {
+      "type": "row",
+      "id": 100,
+      "title": "Now",
+      "gridPos": {"h": 1, "w": 24, "x": 0, "y": 0},
+      "collapsed": false,
+      "panels": []
+    },
+    {
+      "type": "stat",
+      "id": 1,
+      "title": "Messages last 5h — gpt-5.4-mini",
+      "description": "Plus rate-card lower bound: 1,200 / 5h. Hard cap at the upper bound: 7,000 / 5h.",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 0, "y": 1},
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
+        "textMode": "auto"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "decimals": 0,
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {"color": "green", "value": null},
+              {"color": "yellow", "value": 960},
+              {"color": "orange", "value": 1500},
+              {"color": "red", "value": 5600}
+            ]
+          },
+          "unit": "short"
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum(increase(openclaw_codex_messages_total{provider=\"openai-codex\",model=\"gpt-5.4-mini\"}[5h]))",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "gauge",
+      "id": 2,
+      "title": "% of Plus 5h floor (1,200 cap)",
+      "description": "Conservative gauge against the lower bound of the published rate-card. Real ceiling depends on dynamic allocation (1,200–7,000). Re-baseline if you observe throttling at <80%.",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 6, "y": 1},
+      "options": {
+        "orientation": "auto",
+        "showThresholdLabels": false,
+        "showThresholdMarkers": true,
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
+      },
+      "fieldConfig": {
+        "defaults": {
+          "min": 0,
+          "max": 100,
+          "decimals": 1,
+          "unit": "percent",
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {"color": "green", "value": null},
+              {"color": "yellow", "value": 60},
+              {"color": "orange", "value": 80},
+              {"color": "red", "value": 95}
+            ]
+          }
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "100 * sum(increase(openclaw_codex_messages_total{provider=\"openai-codex\",model=\"gpt-5.4-mini\"}[5h])) / 1200",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "stat",
+      "id": 3,
+      "title": "Tokens last 5h (input + output, codex)",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 12, "y": 1},
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
+      },
+      "fieldConfig": {
+        "defaults": {
+          "decimals": 0,
+          "unit": "short",
+          "thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum(increase(openclaw_codex_input_tokens_total{provider=\"openai-codex\"}[5h])) + sum(increase(openclaw_codex_output_tokens_total{provider=\"openai-codex\"}[5h]))",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "stat",
+      "id": 4,
+      "title": "Cache hit ratio (codex, 5h)",
+      "description": "cacheRead / (cacheRead + input). Higher is better — caching cuts effective Plus quota burn.",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 18, "y": 1},
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
+      },
+      "fieldConfig": {
+        "defaults": {
+          "min": 0,
+          "max": 100,
+          "decimals": 1,
+          "unit": "percent",
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {"color": "red", "value": null},
+              {"color": "yellow", "value": 30},
+              {"color": "green", "value": 60}
+            ]
+          }
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "100 * sum(increase(openclaw_codex_cache_read_tokens_total{provider=\"openai-codex\"}[5h])) / clamp_min(sum(increase(openclaw_codex_input_tokens_total{provider=\"openai-codex\"}[5h])) + sum(increase(openclaw_codex_cache_read_tokens_total{provider=\"openai-codex\"}[5h])), 1)",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "stat",
+      "id": 5,
+      "title": "OAuth token expiry",
+      "description": "Days until the openai-codex OAuth token expires. Re-run `openclaw models auth login --provider openai-codex` before this hits 0.",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 0, "y": 6},
+      "options": {
+        "colorMode": "background",
+        "graphMode": "none",
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
+      },
+      "fieldConfig": {
+        "defaults": {
+          "decimals": 1,
+          "unit": "d",
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {"color": "red", "value": null},
+              {"color": "orange", "value": 1},
+              {"color": "yellow", "value": 3},
+              {"color": "green", "value": 5}
+            ]
+          }
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "max(openclaw_codex_oauth_expiry_seconds{provider=\"openai-codex\"}) / 86400",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "stat",
+      "id": 6,
+      "title": "Active sessions",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 6, "y": 6},
+      "options": {
+        "colorMode": "value",
+        "graphMode": "none",
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": true},
+        "textMode": "value_and_name"
+      },
+      "fieldConfig": {
+        "defaults": {
+          "unit": "short",
+          "thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "openclaw_codex_active_sessions",
+          "legendFormat": "{{kind}}",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "stat",
+      "id": 7,
+      "title": "Last assistant turn",
+      "description": "Time since the latest assistant message landed in any session.",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 12, "y": 6},
+      "options": {
+        "colorMode": "background",
+        "graphMode": "none",
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
+      },
+      "fieldConfig": {
+        "defaults": {
+          "unit": "s",
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {"color": "green", "value": null},
+              {"color": "yellow", "value": 1800},
+              {"color": "orange", "value": 7200},
+              {"color": "red", "value": 86400}
+            ]
+          }
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "time() - openclaw_codex_last_run_timestamp",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "stat",
+      "id": 8,
+      "title": "Errors last 24h",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 5, "w": 6, "x": 18, "y": 6},
+      "options": {
+        "colorMode": "background",
+        "graphMode": "area",
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
+      },
+      "fieldConfig": {
+        "defaults": {
+          "decimals": 0,
+          "unit": "short",
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {"color": "green", "value": null},
+              {"color": "yellow", "value": 1},
+              {"color": "red", "value": 10}
+            ]
+          }
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum(increase(openclaw_codex_message_errors_total[24h]))",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "row",
+      "id": 200,
+      "title": "Over time",
+      "gridPos": {"h": 1, "w": 24, "x": 0, "y": 11},
+      "collapsed": false,
+      "panels": []
+    },
+    {
+      "type": "timeseries",
+      "id": 10,
+      "title": "Messages / min by model",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 8, "w": 24, "x": 0, "y": 12},
+      "fieldConfig": {
+        "defaults": {
+          "color": {"mode": "palette-classic"},
+          "custom": {
+            "drawStyle": "bars",
+            "fillOpacity": 60,
+            "lineWidth": 1,
+            "stacking": {"mode": "normal"}
+          },
+          "unit": "short"
+        }
+      },
+      "options": {
+        "legend": {"displayMode": "table", "placement": "right", "showLegend": true, "calcs": ["sum"]},
+        "tooltip": {"mode": "multi", "sort": "desc"}
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum by (provider, model) (rate(openclaw_codex_messages_total[1m])) * 60",
+          "legendFormat": "{{provider}}/{{model}}",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "timeseries",
+      "id": 11,
+      "title": "Tokens / min by type (codex)",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 20},
+      "fieldConfig": {
+        "defaults": {
+          "color": {"mode": "palette-classic"},
+          "custom": {
+            "drawStyle": "line",
+            "fillOpacity": 25,
+            "lineWidth": 2,
+            "stacking": {"mode": "none"}
+          },
+          "unit": "short"
+        }
+      },
+      "options": {
+        "legend": {"displayMode": "list", "placement": "bottom", "showLegend": true},
+        "tooltip": {"mode": "multi", "sort": "desc"}
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum(rate(openclaw_codex_input_tokens_total{provider=\"openai-codex\"}[5m])) * 60",
+          "legendFormat": "input",
+          "refId": "A"
+        },
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum(rate(openclaw_codex_output_tokens_total{provider=\"openai-codex\"}[5m])) * 60",
+          "legendFormat": "output",
+          "refId": "B"
+        },
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum(rate(openclaw_codex_cache_read_tokens_total{provider=\"openai-codex\"}[5m])) * 60",
+          "legendFormat": "cache_read",
+          "refId": "C"
+        }
+      ]
+    },
+    {
+      "type": "bargauge",
+      "id": 12,
+      "title": "Messages / 5h by model",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 8, "w": 12, "x": 12, "y": 20},
+      "options": {
+        "displayMode": "gradient",
+        "orientation": "horizontal",
+        "showUnfilled": true,
+        "reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
+      },
+      "fieldConfig": {
+        "defaults": {
+          "min": 0,
+          "decimals": 0,
+          "unit": "short",
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [
+              {"color": "green", "value": null},
+              {"color": "yellow", "value": 100},
+              {"color": "orange", "value": 500},
+              {"color": "red", "value": 1000}
+            ]
+          }
+        }
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum by (provider, model) (increase(openclaw_codex_messages_total[5h]))",
+          "legendFormat": "{{provider}}/{{model}}",
+          "refId": "A"
+        }
+      ]
+    },
+    {
+      "type": "row",
+      "id": 300,
+      "title": "Errors",
+      "gridPos": {"h": 1, "w": 24, "x": 0, "y": 28},
+      "collapsed": false,
+      "panels": []
+    },
+    {
+      "type": "table",
+      "id": 20,
+      "title": "Recent errors by model and reason",
+      "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+      "gridPos": {"h": 8, "w": 24, "x": 0, "y": 29},
+      "options": {
+        "showHeader": true
+      },
+      "fieldConfig": {
+        "defaults": {
+          "custom": {"align": "auto", "displayMode": "auto"}
+        },
+        "overrides": [
+          {
+            "matcher": {"id": "byName", "options": "Value"},
+            "properties": [
+              {"id": "displayName", "value": "Errors (24h)"},
+              {"id": "custom.displayMode", "value": "color-background"},
+              {
+                "id": "thresholds",
+                "value": {
+                  "mode": "absolute",
+                  "steps": [
+                    {"color": "green", "value": null},
+                    {"color": "yellow", "value": 1},
+                    {"color": "red", "value": 10}
+                  ]
+                }
+              }
+            ]
+          }
+        ]
+      },
+      "targets": [
+        {
+          "datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
+          "expr": "sum by (provider, model, reason) (increase(openclaw_codex_message_errors_total[24h])) > 0",
+          "format": "table",
+          "instant": true,
+          "refId": "A"
+        }
+      ],
+      "transformations": [
+        {
+          "id": "organize",
+          "options": {
+            "excludeByName": {"Time": true, "__name__": true, "instance": true, "job": true, "namespace": true, "pod": true, "app": true},
+            "indexByName": {"provider": 0, "model": 1, "reason": 2, "Value": 3},
+            "renameByName": {}
+          }
+        }
+      ]
+    }
+  ]
+}
--- a/stacks/monitoring/modules/monitoring/dashboards/wealth.json
+++ b/stacks/monitoring/modules/monitoring/dashboards/wealth.json
--- a/stacks/monitoring/modules/monitoring/grafana.tf
+++ b/stacks/monitoring/modules/monitoring/grafana.tf
@ -134,6 +134,7 @@ locals {
    # Applications
    "qbittorrent.json"        = "Applications"
    "realestate-crawler.json" = "Applications"
+    "openclaw.json"           = "Applications"
    "uk-payslip.json"         = "Finance (Personal)"
    "wealth.json"             = "Finance (Personal)"
    "job-hunter.json"         = "Finance"
--- a/stacks/monitoring/modules/monitoring/main.tf
+++ b/stacks/monitoring/modules/monitoring/main.tf
@ -41,6 +41,11 @@ variable "registry_password" {
  type      = string
  sensitive = true
 }
+variable "forgejo_pull_token" {
+  type        = string
+  sensitive   = true
+  description = "PAT for the cluster-puller user, used by the Forgejo registry integrity probe."
+}

 resource "kubernetes_namespace" "monitoring" {
  metadata {
@ -238,27 +243,42 @@ resource "kubernetes_cron_job_v1" "dns_anomaly_monitor" {
 }

 # -----------------------------------------------------------------------------
-# Registry manifest-integrity probe — HEADs every tag in the private R/W
-# registry's catalog, walks multi-platform image indexes, and reports blob
-# availability. Catches the orphan-index failure mode seen 2026-04-13 and
-# 2026-04-19 before downstream pipelines hit it.
+# Phase 4 of forgejo-registry-consolidation 2026-05-07: registry-private
+# decommissioned. The integrity probe below caught the orphan-index failure
+# mode in `registry:2.8.3` (post-mortem 2026-04-19). With that engine
+# retired, the probe is replaced by `forgejo_integrity_probe` below.
+#
+# Resource definitions stripped wholesale — terragrunt apply destroys the
+# in-cluster CronJob + Secret on the next run.
 # See: docs/post-mortems/2026-04-19-registry-orphan-index.md
 # -----------------------------------------------------------------------------
-resource "kubernetes_secret" "registry_probe_credentials" {
+
+# -----------------------------------------------------------------------------
+# Forgejo registry integrity probe — same algorithm as registry-integrity-probe
+# above, but targets the Forgejo OCI registry instead of registry-private. Runs
+# in parallel with the existing probe during the dual-push bake; once Phase 4
+# decommissions registry-private, the registry-integrity-probe CronJob is
+# deleted and only this one remains.
+#
+# Auth: HTTP Basic with cluster-puller PAT (read:package scope is enough to
+# walk catalog + manifests). Reaches Forgejo via the in-cluster service so we
+# don't hairpin out through Traefik for every probe run.
+# -----------------------------------------------------------------------------
+resource "kubernetes_secret" "forgejo_probe_credentials" {
  metadata {
-    name      = "registry-probe-credentials"
+    name      = "forgejo-probe-credentials"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }
  type = "Opaque"
  data = {
-    REG_USER = var.registry_user
-    REG_PASS = var.registry_password
+    REG_USER = "cluster-puller"
+    REG_PASS = var.forgejo_pull_token
  }
 }

-resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
+resource "kubernetes_cron_job_v1" "forgejo_integrity_probe" {
  metadata {
-    name      = "registry-integrity-probe"
+    name      = "forgejo-integrity-probe"
    namespace = kubernetes_namespace.monitoring.metadata[0].name
  }
  spec {
@ -275,13 +295,13 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
          metadata {}
          spec {
            container {
-              name  = "registry-integrity-probe"
+              name  = "forgejo-integrity-probe"
              image = "docker.io/library/alpine:3.20"
              env {
                name = "REG_USER"
                value_from {
                  secret_key_ref {
-                    name = kubernetes_secret.registry_probe_credentials.metadata[0].name
+                    name = kubernetes_secret.forgejo_probe_credentials.metadata[0].name
                    key  = "REG_USER"
                  }
                }
@ -290,22 +310,26 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
                name = "REG_PASS"
                value_from {
                  secret_key_ref {
-                    name = kubernetes_secret.registry_probe_credentials.metadata[0].name
+                    name = kubernetes_secret.forgejo_probe_credentials.metadata[0].name
                    key  = "REG_PASS"
                  }
                }
              }
              env {
                name  = "REGISTRY_HOST"
-                value = "10.0.20.10:5050"
+                value = "forgejo.forgejo.svc.cluster.local"
+              }
+              env {
+                name  = "REGISTRY_SCHEME"
+                value = "http"
              }
              env {
                name  = "REGISTRY_INSTANCE"
-                value = "registry.viktorbarzin.me:5050"
+                value = "forgejo.viktorbarzin.me"
              }
              env {
                name  = "PUSHGATEWAY"
-                value = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/registry-integrity-probe"
+                value = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/forgejo-integrity-probe"
              }
              env {
                name  = "TAGS_PER_REPO"
@ -316,16 +340,16 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
                apk add --no-cache curl jq >/dev/null

                REG="$REGISTRY_HOST"
+                SCHEME="$${REGISTRY_SCHEME:-https}"
                INSTANCE="$REGISTRY_INSTANCE"
                AUTH="$REG_USER:$REG_PASS"
                ACCEPT='application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json,application/vnd.docker.distribution.manifest.list.v2+json,application/vnd.docker.distribution.manifest.v2+json'

                push() {
-                  # Prometheus pushgateway — body ends with blank line. Ignore push errors.
                  curl -sf --max-time 10 --data-binary @- "$PUSHGATEWAY" >/dev/null 2>&1 || true
                }

-                CATALOG=$(curl -sk -u "$AUTH" --max-time 30 "https://$REG/v2/_catalog?n=1000" || echo "")
+                CATALOG=$(curl -sk -u "$AUTH" --max-time 30 "$SCHEME://$REG/v2/_catalog?n=1000" || echo "")
                REPOS=$(echo "$CATALOG" | jq -r '.repositories[]?' 2>/dev/null || echo "")

                if [ -z "$REPOS" ]; then
@ -350,7 +374,7 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
                  [ -z "$repo" ] && continue
                  REPOS_N=$((REPOS_N + 1))

-                  TAGS_JSON=$(curl -sk -u "$AUTH" --max-time 15 "https://$REG/v2/$repo/tags/list" || echo "")
+                  TAGS_JSON=$(curl -sk -u "$AUTH" --max-time 15 "$SCHEME://$REG/v2/$repo/tags/list" || echo "")
                  echo "$TAGS_JSON" | jq -r '.tags[]?' 2>/dev/null | tail -n "$TAGS_PER_REPO" > /tmp/tags.txt || true

                  while IFS= read -r tag; do
@ -359,7 +383,7 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {

                    HTTP=$(curl -sk -u "$AUTH" -o /tmp/m.json -w '%%{http_code}' \
                      -H "Accept: $ACCEPT" --max-time 15 \
-                      "https://$REG/v2/$repo/manifests/$tag")
+                      "$SCHEME://$REG/v2/$repo/manifests/$tag")
                    if [ "$HTTP" != "200" ]; then
                      echo "FAIL: $repo:$tag manifest HTTP $HTTP"
                      FAIL=$((FAIL + 1))
@ -374,7 +398,7 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
                        [ -z "$d" ] && continue
                        CH=$(curl -sk -u "$AUTH" -o /dev/null -w '%%{http_code}' \
                          -H "Accept: $ACCEPT" --max-time 10 -I \
-                          "https://$REG/v2/$repo/manifests/$d")
+                          "$SCHEME://$REG/v2/$repo/manifests/$d")
                        if [ "$CH" != "200" ]; then
                          echo "FAIL: $repo:$tag index child $d HTTP $CH"
                          FAIL=$((FAIL + 1))
--- a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
+++ b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
@ -1656,22 +1656,22 @@ serverFiles:
            labels:
              severity: critical
            annotations:
-              summary: "Registry has {{ $value }} broken manifest reference(s) — orphan index or missing blob"
-              description: "The registry-integrity-probe CronJob in the monitoring namespace found {{ $value }} manifest/blob references that return non-200 on the private registry. Almost certainly an orphan OCI-index child from the cleanup-tags.sh+GC race. Rebuild the affected image per docs/runbooks/registry-rebuild-image.md and investigate which tag(s) the probe logs flagged."
+              summary: "{{ $labels.instance }}: {{ $value }} broken manifest reference(s) — orphan index or missing blob"
+              description: "The forgejo-integrity-probe CronJob found {{ $value }} manifest/blob references that return non-200 on {{ $labels.instance }}. Rebuild the affected image per docs/runbooks/forgejo-registry-rebuild-image.md. (registry.viktorbarzin.me retired Phase 4 of forgejo-registry-consolidation 2026-05-07 — only forgejo.viktorbarzin.me remains.)"
          - alert: RegistryIntegrityProbeStale
            expr: time() - registry_manifest_integrity_last_run_timestamp > 3600
            for: 15m
            labels:
              severity: warning
            annotations:
-              summary: "Registry integrity probe has not reported in >1h — CronJob may be broken"
+              summary: "{{ $labels.instance }} integrity probe has not reported in >1h — CronJob may be broken"
          - alert: RegistryCatalogInaccessible
            expr: registry_manifest_integrity_catalog_accessible == 0
            for: 15m
            labels:
              severity: critical
            annotations:
-              summary: "Registry probe cannot fetch /v2/_catalog — auth failure or registry down"
+              summary: "{{ $labels.instance }} probe cannot fetch /v2/_catalog — auth failure or registry down"
          - alert: NodeHighCPUUsage
            expr: pve_cpu_usage_ratio * 100 > 60
            for: 6h
--- a/stacks/openclaw/files/exporter.py
+++ b/stacks/openclaw/files/exporter.py
@ -0,0 +1,264 @@
+#!/usr/bin/env python3
+"""OpenClaw / Codex usage exporter.
+
+Reads ~/.openclaw/agents/*/sessions/*.jsonl (assistant messages with usage)
+and ~/.openclaw/agents/*/agent/auth-state.json (OAuth profiles), then exposes
+Prometheus text-format metrics on :9099/metrics. Stdlib only — no pip install
+needed at startup.
+
+Metrics (all cumulative-since-session-start; use Prometheus increase()/rate()
+for windowed views):
+
+  openclaw_codex_messages_total{provider,model,session_kind}    counter
+  openclaw_codex_input_tokens_total{provider,model}             counter
+  openclaw_codex_output_tokens_total{provider,model}            counter
+  openclaw_codex_cache_read_tokens_total{provider,model}        counter
+  openclaw_codex_cache_write_tokens_total{provider,model}       counter
+  openclaw_codex_message_errors_total{provider,model,reason}    counter
+  openclaw_codex_active_sessions{kind}                          gauge
+  openclaw_codex_oauth_expiry_seconds{provider,account}         gauge
+  openclaw_codex_last_run_timestamp                             gauge
+  openclaw_codex_exporter_scrape_duration_ms                    gauge
+"""
+import glob
+import json
+import os
+import re
+import time
+from datetime import datetime
+from http.server import BaseHTTPRequestHandler, HTTPServer
+from threading import Lock
+
+OPENCLAW_HOME = os.environ.get("OPENCLAW_HOME", "/home/node/.openclaw")
+PORT = int(os.environ.get("METRICS_PORT", "9099"))
+CACHE_SEC = float(os.environ.get("CACHE_SEC", "5"))
+SKIP_FRAGMENTS = (".broken.", ".reset.", ".deleted.", ".bak.")
+SESSION_RE = re.compile(r"^([0-9a-f-]{36})\.jsonl$")
+
+_lock = Lock()
+_cache = {"text": "", "ts": 0.0}
+
+
+def _esc(value: str) -> str:
+    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
+
+
+def _line(name: str, labels: dict, value) -> str:
+    if labels:
+        rendered = ",".join(f'{k}="{_esc(v)}"' for k, v in sorted(labels.items()))
+        return f"{name}{{{rendered}}} {value}"
+    return f"{name} {value}"
+
+
+def _kind_for(session_id: str, sessions_index: dict) -> str:
+    for key, val in sessions_index.items():
+        if val.get("sessionId") != session_id:
+            continue
+        if key.startswith("agent:main:cron:"):
+            return "cron"
+        if key.startswith("telegram:slash:"):
+            return "telegram-slash"
+        if key.startswith("agent:main:"):
+            return "main"
+        surface = (val.get("origin") or {}).get("surface")
+        if surface:
+            return surface
+        return key.split(":", 1)[0]
+    return "unknown"
+
+
+def _parse_ts(value):
+    if isinstance(value, (int, float)):
+        return float(value)
+    if isinstance(value, str):
+        try:
+            return datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp()
+        except ValueError:
+            return 0.0
+    return 0.0
+
+
+def _build_text() -> str:
+    start = time.monotonic()
+    out = []
+
+    sessions_index: dict = {}
+    for sp in glob.glob(os.path.join(OPENCLAW_HOME, "agents/*/sessions/sessions.json")):
+        try:
+            with open(sp) as f:
+                sessions_index.update(json.load(f))
+        except Exception:
+            pass
+
+    msg_count: dict = {}
+    in_tok: dict = {}
+    out_tok: dict = {}
+    cr_tok: dict = {}
+    cw_tok: dict = {}
+    err_count: dict = {}
+    latest_ts = 0.0
+
+    for jsonl in glob.glob(os.path.join(OPENCLAW_HOME, "agents/*/sessions/*.jsonl")):
+        bn = os.path.basename(jsonl)
+        if any(s in bn for s in SKIP_FRAGMENTS):
+            continue
+        m = SESSION_RE.match(bn)
+        if not m:
+            continue
+        sid = m.group(1)
+        kind = _kind_for(sid, sessions_index)
+        try:
+            with open(jsonl) as f:
+                for line in f:
+                    line = line.strip()
+                    if not line:
+                        continue
+                    try:
+                        obj = json.loads(line)
+                    except Exception:
+                        continue
+                    if obj.get("type") != "message":
+                        continue
+                    msg = obj.get("message") or {}
+                    if msg.get("role") != "assistant":
+                        continue
+                    provider = msg.get("provider") or "unknown"
+                    model = msg.get("model") or "unknown"
+                    usage = msg.get("usage") or {}
+                    ts = _parse_ts(obj.get("timestamp"))
+                    if ts > latest_ts:
+                        latest_ts = ts
+                    if msg.get("stopReason") == "error":
+                        reason = (msg.get("errorMessage") or "unknown")[:80]
+                        ek = (provider, model, reason)
+                        err_count[ek] = err_count.get(ek, 0) + 1
+                        continue
+                    mk = (provider, model, kind)
+                    msg_count[mk] = msg_count.get(mk, 0) + 1
+                    pm = (provider, model)
+                    in_tok[pm] = in_tok.get(pm, 0) + (usage.get("input") or 0)
+                    out_tok[pm] = out_tok.get(pm, 0) + (usage.get("output") or 0)
+                    cr_tok[pm] = cr_tok.get(pm, 0) + (usage.get("cacheRead") or 0)
+                    cw_tok[pm] = cw_tok.get(pm, 0) + (usage.get("cacheWrite") or 0)
+        except Exception:
+            pass
+
+    out.append("# HELP openclaw_codex_messages_total Cumulative assistant messages")
+    out.append("# TYPE openclaw_codex_messages_total counter")
+    for (p, mdl, k), c in msg_count.items():
+        out.append(_line("openclaw_codex_messages_total",
+                         {"provider": p, "model": mdl, "session_kind": k}, c))
+
+    for name, src, hlp in [
+        ("openclaw_codex_input_tokens_total", in_tok, "Cumulative input tokens"),
+        ("openclaw_codex_output_tokens_total", out_tok, "Cumulative output tokens"),
+        ("openclaw_codex_cache_read_tokens_total", cr_tok, "Cumulative cache-read tokens"),
+        ("openclaw_codex_cache_write_tokens_total", cw_tok, "Cumulative cache-write tokens"),
+    ]:
+        out.append(f"# HELP {name} {hlp}")
+        out.append(f"# TYPE {name} counter")
+        for (p, mdl), c in src.items():
+            out.append(_line(name, {"provider": p, "model": mdl}, c))
+
+    out.append("# HELP openclaw_codex_message_errors_total Cumulative assistant errors")
+    out.append("# TYPE openclaw_codex_message_errors_total counter")
+    for (p, mdl, r), c in err_count.items():
+        out.append(_line("openclaw_codex_message_errors_total",
+                         {"provider": p, "model": mdl, "reason": r}, c))
+
+    out.append("# HELP openclaw_codex_active_sessions Active sessions in sessions.json")
+    out.append("# TYPE openclaw_codex_active_sessions gauge")
+    kc: dict = {}
+    for k in sessions_index:
+        if k.startswith("agent:main:cron:"):
+            kk = "cron"
+        elif k.startswith("telegram:slash:"):
+            kk = "telegram-slash"
+        elif k.startswith("agent:main:"):
+            kk = "main"
+        else:
+            kk = k.split(":", 1)[0]
+        kc[kk] = kc.get(kk, 0) + 1
+    for k, c in kc.items():
+        out.append(_line("openclaw_codex_active_sessions", {"kind": k}, c))
+
+    if latest_ts:
+        out.append("# HELP openclaw_codex_last_run_timestamp Unix ts of newest assistant message")
+        out.append("# TYPE openclaw_codex_last_run_timestamp gauge")
+        out.append(_line("openclaw_codex_last_run_timestamp", {}, latest_ts))
+
+    out.append("# HELP openclaw_codex_oauth_expiry_seconds Seconds until OAuth token expires")
+    out.append("# TYPE openclaw_codex_oauth_expiry_seconds gauge")
+    now = time.time()
+    for af in glob.glob(os.path.join(OPENCLAW_HOME, "agents/*/agent/auth-profiles.json")):
+        try:
+            with open(af) as f:
+                data = json.load(f)
+        except Exception:
+            continue
+        # Schema: {"version": 1, "profiles": {"<id>": {...}}}.
+        # `expires` is Unix milliseconds.
+        for profile in (data.get("profiles") or {}).values():
+            exp_ms = profile.get("expires")
+            if not isinstance(exp_ms, (int, float)):
+                continue
+            exp_ts = exp_ms / 1000.0
+            out.append(_line(
+                "openclaw_codex_oauth_expiry_seconds",
+                {
+                    "provider": profile.get("provider", "unknown"),
+                    "account": profile.get("email") or profile.get("account") or "unknown",
+                    "plan": profile.get("chatgptPlanType") or "unknown",
+                },
+                max(0, exp_ts - now),
+            ))
+
+    out.append("# HELP openclaw_codex_exporter_scrape_duration_ms Last scrape duration ms")
+    out.append("# TYPE openclaw_codex_exporter_scrape_duration_ms gauge")
+    out.append(_line("openclaw_codex_exporter_scrape_duration_ms", {},
+                     (time.monotonic() - start) * 1000))
+
+    return "\n".join(out) + "\n"
+
+
+class Handler(BaseHTTPRequestHandler):
+    def do_GET(self):
+        if self.path == "/healthz":
+            self.send_response(200)
+            self.send_header("Content-Type", "text/plain")
+            self.end_headers()
+            self.wfile.write(b"ok\n")
+            return
+        if self.path != "/metrics":
+            self.send_response(404)
+            self.end_headers()
+            return
+        with _lock:
+            now = time.time()
+            if now - _cache["ts"] > CACHE_SEC:
+                try:
+                    _cache["text"] = _build_text()
+                except Exception as exc:  # noqa: BLE001
+                    _cache["text"] = (
+                        f'openclaw_codex_exporter_errors_total{{kind="scrape"}} 1\n'
+                        f'# scrape error: {_esc(str(exc))[:200]}\n'
+                    )
+                _cache["ts"] = now
+            body = _cache["text"].encode()
+        self.send_response(200)
+        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
+        self.send_header("Content-Length", str(len(body)))
+        self.end_headers()
+        self.wfile.write(body)
+
+    def log_message(self, *args, **kwargs):
+        pass
+
+
+def main():
+    print(f"openclaw exporter listening on :{PORT}", flush=True)
+    HTTPServer(("0.0.0.0", PORT), Handler).serve_forever()
+
+
+if __name__ == "__main__":
+    main()
--- a/stacks/openclaw/main.tf
+++ b/stacks/openclaw/main.tf
@ -131,8 +131,12 @@ resource "kubernetes_config_map" "openclaw_config" {
            mode = "off"
          }
          model = {
-            primary   = "nim/qwen/qwen3.5-397b-a17b"
-            fallbacks = ["nim/mistralai/mistral-large-3-675b-instruct-2512", "nim/nvidia/llama-3.1-nemotron-ultra-253b-v1", "modelrelay/auto-fastest"]
+            # ChatGPT Plus OAuth via openai-codex plugin (account: ancaelena98@gmail.com).
+            # gpt-5.4-mini is the only mini variant the Codex backend accepts for Plus tier;
+            # gpt-5-mini / gpt-5.1-codex-mini return model_not_found / "not supported with
+            # ChatGPT account". Plus rate-card: 1,200–7,000 local msgs / 5h on gpt-5.4-mini.
+            primary   = "openai-codex/gpt-5.4-mini"
+            fallbacks = ["openai-codex/gpt-5.5", "nim/qwen/qwen3-coder-480b-a35b-instruct", "modelrelay/auto-fastest"]
          }
          models = {
            "modelrelay/auto-fastest"                                = {}
@ -146,6 +150,8 @@ resource "kubernetes_config_map" "openclaw_config" {
            "llama-as-openai/Llama-4-Scout-17B-16E-Instruct-FP8"     = {}
            "openrouter/stepfun/step-3.5-flash:free"                 = {}
            "openrouter/arcee-ai/trinity-large-preview:free"         = {}
+            "openai-codex/gpt-5.4-mini"                              = {}
+            "openai-codex/gpt-5.5"                                   = {}
          }
        }
      }
@ -255,6 +261,19 @@ resource "random_password" "gateway_token" {
  special = false
 }

+# Prometheus exporter script — read by the openclaw-exporter sidecar.
+# Stdlib-only Python so no pip install at startup. Reads sessions JSONL +
+# auth-profiles.json from the NFS-backed openclaw home volume (mounted ro).
+resource "kubernetes_config_map" "openclaw_exporter" {
+  metadata {
+    name      = "openclaw-exporter"
+    namespace = kubernetes_namespace.openclaw.metadata[0].name
+  }
+  data = {
+    "exporter.py" = file("${path.module}/files/exporter.py")
+  }
+}
+
 module "nfs_tools_host" {
  source     = "../../modules/kubernetes/nfs_volume"
  name       = "openclaw-tools-host"
@ -344,6 +363,11 @@ resource "kubernetes_deployment" "openclaw" {
        }
        annotations = {
          "reloader.stakater.com/search" = "true"
+          # Prometheus auto-discovers pods with these annotations.
+          # Scraped by the openclaw-exporter sidecar — exposes /metrics on :9099.
+          "prometheus.io/scrape" = "true"
+          "prometheus.io/port"   = "9099"
+          "prometheus.io/path"   = "/metrics"
        }
      }
      spec {
@ -383,8 +407,10 @@ resource "kubernetes_deployment" "openclaw" {
        # Main container: OpenClaw
        container {
          name    = "openclaw"
-          image   = "ghcr.io/openclaw/openclaw:2026.2.26"
-          command = ["sh", "-c", "node openclaw.mjs doctor --fix 2>/dev/null; exec node openclaw.mjs gateway --allow-unconfigured --bind lan"]
+          image   = "ghcr.io/openclaw/openclaw:2026.5.4"
+          # Doctor --fix auto-promotes the highest-tier codex model (gpt-5-pro) after
+          # auth-profile-based model discovery; pin gpt-5.4-mini back to default after it.
+          command = ["sh", "-c", "node openclaw.mjs doctor --fix 2>/dev/null; node openclaw.mjs models set openai-codex/gpt-5.4-mini 2>/dev/null; exec node openclaw.mjs gateway --allow-unconfigured --bind lan"]
          port {
            container_port = 18789
          }
@ -510,6 +536,54 @@ resource "kubernetes_deployment" "openclaw" {
          }
        }

+        # Sidecar: openclaw-exporter — Prometheus exporter for Codex/OAuth usage.
+        # Reads sessions JSONL files + auth-profiles.json, exposes /metrics on :9099.
+        # Stdlib-only Python; no pip install at startup.
+        container {
+          name    = "openclaw-exporter"
+          image   = "docker.io/library/python:3.12-slim"
+          command = ["python3", "/scripts/exporter.py"]
+          port {
+            container_port = 9099
+            name           = "metrics"
+          }
+          env {
+            name  = "OPENCLAW_HOME"
+            value = "/home/node/.openclaw"
+          }
+          env {
+            name  = "METRICS_PORT"
+            value = "9099"
+          }
+          volume_mount {
+            name       = "openclaw-exporter-script"
+            mount_path = "/scripts"
+            read_only  = true
+          }
+          volume_mount {
+            name       = "openclaw-home"
+            mount_path = "/home/node/.openclaw"
+            read_only  = true
+          }
+          readiness_probe {
+            http_get {
+              path = "/healthz"
+              port = 9099
+            }
+            initial_delay_seconds = 5
+            period_seconds        = 30
+          }
+          resources {
+            requests = {
+              cpu    = "10m"
+              memory = "64Mi"
+            }
+            limits = {
+              memory = "128Mi"
+            }
+          }
+        }
+
        # Sidecar: modelrelay — auto-routes to fastest healthy free model
        container {
          name  = "modelrelay"
@ -598,6 +672,13 @@ resource "kubernetes_deployment" "openclaw" {
            name = kubernetes_config_map.openclaw_config.metadata[0].name
          }
        }
+        volume {
+          name = "openclaw-exporter-script"
+          config_map {
+            name         = kubernetes_config_map.openclaw_exporter.metadata[0].name
+            default_mode = "0555"
+          }
+        }
      }
    }
  }
--- a/stacks/payslip-ingest/main.tf
+++ b/stacks/payslip-ingest/main.tf
@ -8,7 +8,10 @@ variable "postgresql_host" { type = string }

 locals {
  namespace = "payslip-ingest"
-  image     = "registry.viktorbarzin.me/payslip-ingest:${var.image_tag}"
+  # Phase 3 of forgejo-registry-consolidation — image= flipped to Forgejo
+  # 2026-05-07. registry-private kept image at the same path, so the new
+  # Forgejo URL is `viktor/<name>` under forgejo.viktorbarzin.me.
+  image = "forgejo.viktorbarzin.me/viktor/payslip-ingest:${var.image_tag}"
  labels = {
    app = "payslip-ingest"
  }
--- a/stacks/traefik/modules/traefik/main.tf
+++ b/stacks/traefik/modules/traefik/main.tf
@ -307,6 +307,12 @@ resource "kubernetes_config_map" "bot_block_proxy_config" {
      server {
          listen 8080;
          location /auth {
+              access_by_lua_block {
+                  ngx.req.clear_header("If-Match")
+                  ngx.req.clear_header("If-None-Match")
+                  ngx.req.clear_header("If-Modified-Since")
+                  ngx.req.clear_header("If-Unmodified-Since")
+              }
              proxy_pass http://poison_fountain;
              proxy_connect_timeout 3s;
              proxy_read_timeout 5s;
@ -373,7 +379,7 @@ resource "kubernetes_deployment" "bot_block_proxy" {
        }
        container {
          name  = "nginx"
-          image = "nginx:1-alpine"
+          image = "openresty/openresty:alpine"

          port {
            container_port = 8080
--- a/stacks/wealthfolio/main.tf
+++ b/stacks/wealthfolio/main.tf
@ -515,7 +515,12 @@ resource "kubernetes_cron_job_v1" "wealthfolio_sync" {
            }
            container {
              name  = "sync"
-              image = "registry.viktorbarzin.me/wealthfolio-sync:latest"
+              # Phase 4 of forgejo-registry-consolidation 2026-05-07 +
+              # post-cutover wealthfolio-sync rebuild: image is now
+              # produced by /home/wizard/code/broker-sync (Forgejo
+              # viktor/broker-sync, DockerHub viktorbarzin/broker-sync,
+              # Forgejo viktor/wealthfolio-sync as the cluster pull path).
+              image = "forgejo.viktorbarzin.me/viktor/wealthfolio-sync:latest"
              env {
                name = "IMAP_HOST"
                value_from {
--- a/stacks/woodpecker/main.tf
+++ b/stacks/woodpecker/main.tf
@ -172,6 +172,31 @@ resource "helm_release" "woodpecker" {
  depends_on = [kubernetes_manifest.db_external_secret]
 }

+# Patch hostAliases onto the woodpecker-server StatefulSet — the chart 3.5.1
+# does NOT expose this field, so we have to do it after the helm release.
+# Keeps the OAuth/forge-API path off the WAN gateway (forgejo.viktorbarzin.me
+# resolves to the public IP via DNS, which round-trips through Cloudflare
+# and routinely tripped 30s context-deadline timeouts when fetching pipeline
+# config). 10.0.20.200 is the Traefik LB that fronts forgejo internally;
+# Traefik serves the *.viktorbarzin.me wildcard so SNI verification still
+# passes.
+resource "null_resource" "woodpecker_server_host_alias" {
+  triggers = {
+    helm_revision = helm_release.woodpecker.metadata[0].revision
+  }
+
+  provisioner "local-exec" {
+    command     = <<-BASH
+      set -euo pipefail
+      kubectl -n woodpecker patch statefulset/woodpecker-server --type=strategic --patch '{"spec":{"template":{"spec":{"hostAliases":[{"ip":"10.0.20.200","hostnames":["forgejo.viktorbarzin.me"]}]}}}}'
+      kubectl -n woodpecker rollout status statefulset/woodpecker-server --timeout=120s
+    BASH
+    interpreter = ["/bin/bash", "-c"]
+  }
+
+  depends_on = [helm_release.woodpecker]
+}
+
 # ClusterRoleBinding - build pods need cluster-admin to PATCH deployments across namespaces
 resource "kubernetes_cluster_role_binding" "woodpecker" {
  metadata {
--- a/stacks/woodpecker/values.yaml
+++ b/stacks/woodpecker/values.yaml
@ -4,10 +4,19 @@ server:
    reloader.stakater.com/search: "true"
  statefulSet:
    replicaCount: 1
+  # NOTE: hostAliases is NOT exposed by the woodpecker Helm chart (3.5.1 verified) —
+  # see main.tf null_resource.woodpecker_server_host_alias which applies the same
+  # via `kubectl patch` post-helm. Pinned to the in-cluster Traefik LB
+  # (10.0.20.200) so the forge-API fetch path never round-trips through
+  # Cloudflare ("context deadline exceeded" was failing every Forgejo
+  # pipeline trigger).
  image:
    registry: docker.io
    repository: woodpeckerci/woodpecker-server
-    tag: "v3.13.0"
+    # Bumped 2026-05-07 from v3.13.0 → v3.14.0 to fix the
+    # "could not load config from forge: context deadline exceeded"
+    # issue when fetching .woodpecker.yml from Forgejo.
+    tag: "v3.14.0"
  extraSecretNamesForEnvFrom:
    - woodpecker-db-creds
  env:
@ -27,6 +36,14 @@ server:
    WOODPECKER_FORGEJO_CLIENT: "${forgejo_client_id}"
    WOODPECKER_FORGEJO_SECRET: "${forgejo_client_secret}"
    WOODPECKER_FORGEJO_URL: "${forgejo_url}"
+    # Default is 3s (cmd/server/flags.go @ default `--forge-timeout`).
+    # Forgejo responses on this cluster spike to 1-2s under load and the
+    # config-loader makes 4-6 sequential calls (.woodpecker dir, .woodpecker.yaml,
+    # .woodpecker.yml, raw .woodpecker/build.yml, etc.); occasionally the cumulative
+    # overhead trips the 3s deadline → "could not load config from forge: context
+    # deadline exceeded" on every pipeline. 30s removes the false-positive timeouts
+    # without regressing the legitimate-failure detection window meaningfully.
+    WOODPECKER_FORGE_TIMEOUT: "30s"
  service:
    type: ClusterIP
    port: 80
@ -46,7 +63,7 @@ agent:
  image:
    registry: docker.io
    repository: woodpeckerci/woodpecker-agent
-    tag: "v3.13.0"
+    tag: "v3.14.0"
  env:
    WOODPECKER_BACKEND: "kubernetes"
    WOODPECKER_BACKEND_K8S_NAMESPACE: "woodpecker"
--- a/state/stacks/vault/terraform.tfstate.enc
+++ b/state/stacks/vault/terraform.tfstate.enc