Compare commits
42 commits: 813148c4af...afd78f8d3e
| Author | SHA1 | Date |
|---|---|---|
| | afd78f8d3e | |
| | 4518aff71c | |
| | d832a33039 | |
| | afafc9928f | |
| | 5b255cf6f2 | |
| | 108bef7b1a | |
| | e110b40a4a | |
| | 84fd752747 | |
| | f1d69b0a7a | |
| | d942a21d93 | |
| | 8c73a0243a | |
| | 59885c21d0 | |
| | 3f3e5fc954 | |
| | 56fbd281c9 | |
| | a91bbe189e | |
| | 4ec40ea804 | |
| | e86efd107a | |
| | 874f80ecbe | |
| | ff19d86557 | |
| | a0b70482fe | |
| | 83496f6e0c | |
| | 76d2d0e536 | |
| | 413ceec35c | |
| | 3fb05825d8 | |
| | d67e8ddaf8 | |
| | a3024d1f51 | |
| | fbb41eff9d | |
| | 70ea1cf6fd | |
| | f793a5f50b | |
| | 00614a3302 | |
| | 18d96712c7 | |
| | 8146d05191 | |
| | f18cd1d314 | |
| | 41655096c7 | |
| | 115ca184ff | |
| | 574cdf08d2 | |
| | f90d79ed4e | |
| | 8b180f7662 | |
| | f006b48566 | |
| | 0f107aeacb | |
| | 87069ae5c3 | |
| | da7a11eb3b | |
75 changed files with 9635 additions and 2415 deletions

@@ -30,7 +30,7 @@ Violations cause state drift, which causes future applies to break or silently r
 - **New service**: Use `setup-project` skill for full workflow
 - **Ingress**: `ingress_factory` module. Auth: `protected = true`. Anti-AI: on by default. **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`.
 - **Docker images**: Always build for `linux/amd64`. Use 8-char git SHA tags — `:latest` causes stale pull-through cache.
-- **Private registry**: `registry.viktorbarzin.me` (htpasswd auth, credentials in Vault `secret/viktor`). Use `image: registry.viktorbarzin.me/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the secret to all namespaces. Build & push from registry VM (`10.0.20.10`). Containerd `hosts.toml` redirects pulls to LAN IP directly. Web UI at `docker.viktorbarzin.me` (Authentik-protected). Engine pinned to `registry:2.8.3` (see post-mortem 2026-04-19); on-VM configs deploy via `.woodpecker/registry-config-sync.yml`; integrity probed every 15m by `registry-integrity-probe` CronJob in `monitoring` ns — the HTTP API is the authoritative integrity check, NOT `/blobs/*/data` presence (revision-link absence is the real failure mode).
+- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. Containerd `hosts.toml` on every node redirects to in-cluster Traefik LB `10.0.20.200` to avoid hairpin NAT. Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest`; integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
 - **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts.
 - **Node memory changes**: When changing VM memory on any k8s node, update kubelet `systemReserved`, `kubeReserved`, and eviction thresholds accordingly. Config: `/var/lib/kubelet/config.yaml`. Template: `stacks/infra/main.tf`. Current values: systemReserved=512Mi, kubeReserved=512Mi, evictionHard=500Mi, evictionSoft=1Gi.
 - **Node OS disk tuning** (in `stacks/infra/main.tf`): kubelet `imageGCHighThresholdPercent=70` (was 85), `imageGCLowThresholdPercent=60` (was 80), ext4 `commit=60` in fstab (was default 5s), journald `SystemMaxUse=200M` + `MaxRetentionSec=3day`.

@@ -45,7 +45,8 @@
 | nextcloud | File sync/share | nextcloud |
 | calibre | E-book management (may be merged into ebooks stack) | calibre |
 | onlyoffice | Document editing | onlyoffice |
-| f1-stream | F1 streaming | f1-stream |
+| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier) | f1-stream |
+| chrome-service | Headed Chromium WebSocket pool (`ws://chrome-service.chrome-service.svc:3000/<token>`) for sibling services driving anti-bot embeds | chrome-service |
 | rybbit | Analytics | rybbit |
 | isponsorblocktv | SponsorBlock for TV | isponsorblocktv |
 | actualbudget | Budgeting (factory pattern) | actualbudget |

@@ -14,104 +14,72 @@ steps:
   - name: build-and-push
     image: woodpeckerci/plugin-docker-buildx
     settings:
-      repo: registry.viktorbarzin.me:5050/infra-ci
+      # Phase 4 of forgejo-registry-consolidation 2026-05-07 —
+      # registry.viktorbarzin.me dropped, Forgejo is the only target.
+      repo:
+        - forgejo.viktorbarzin.me/viktor/infra-ci
       dockerfile: ci/Dockerfile
       context: ci/
       tags:
         - latest
         - "${CI_COMMIT_SHA:0:8}"
       platforms: linux/amd64
-      registry: registry.viktorbarzin.me:5050
       logins:
-        - registry: registry.viktorbarzin.me:5050
+        - registry: forgejo.viktorbarzin.me
           username:
-            from_secret: registry_user
+            from_secret: forgejo_user
           password:
-            from_secret: registry_password
+            from_secret: forgejo_push_token

-  # Post-push integrity check. Re-resolves the image we just pushed and HEADs
-  # every blob it references — top-level manifest (index or single), each child
-  # platform manifest, each config blob, each layer blob. If any returns !=200
-  # the pipeline fails loudly here so we never ship a broken index downstream.
-  # Historical context: 2026-04-13 and 2026-04-19 incidents both shipped indexes
-  # whose platform/attestation children had been GC-orphaned on the registry VM.
-  - name: verify-integrity
+  # Post-push integrity check is now redundant with the every-15min
+  # forgejo-integrity-probe in stacks/monitoring/, which walks
+  # /v2/_catalog + HEADs every blob across the entire Forgejo registry.
+  # If a corruption pattern emerges that the periodic probe misses,
+  # restore a verify step similar to the pre-Phase-4 version (see
+  # commit 49f4956f) but pointed at forgejo.viktorbarzin.me.
+
+  # Break-glass tarball: save the just-pushed infra-ci image to disk on the
+  # registry VM (10.0.20.10) so we can `docker load` it back into a node
+  # when Forgejo is unreachable. Pulls from Forgejo (the only registry now).
+  # Best-effort — failure here doesn't fail the pipeline.
+  # Recovery procedure: docs/runbooks/forgejo-registry-breakglass.md.
+  - name: breakglass-tarball
     image: alpine:3.20
+    failure: ignore
     environment:
-      REG_USER:
-        from_secret: registry_user
-      REG_PASS:
-        from_secret: registry_password
+      REGISTRY_SSH_KEY:
+        from_secret: registry_ssh_key
+      FORGEJO_USER:
+        from_secret: forgejo_user
+      FORGEJO_PASS:
+        from_secret: forgejo_push_token
     commands:
-      - apk add --no-cache curl jq
-      - REG=registry.viktorbarzin.me:5050
-      - REPO=infra-ci
+      - apk add --no-cache openssh-client
+      - mkdir -p ~/.ssh && chmod 700 ~/.ssh
+      - printf '%s\n' "$REGISTRY_SSH_KEY" > ~/.ssh/id_ed25519
+      - chmod 600 ~/.ssh/id_ed25519
+      - ssh-keyscan -t ed25519 10.0.20.10 >> ~/.ssh/known_hosts 2>/dev/null
       - SHA=${CI_COMMIT_SHA:0:8}
-      - AUTH="$REG_USER:$REG_PASS"
       - |
-        set -euo pipefail
-        ACCEPT='Accept: application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json,application/vnd.docker.distribution.manifest.list.v2+json,application/vnd.docker.distribution.manifest.v2+json'
-
-        fetch_manifest() {
-          # Prints the body to $2, returns the HTTP code as stdout.
-          curl -sk -u "$AUTH" -H "$ACCEPT" \
-            -o "$2" -w '%{http_code}' \
-            "https://$REG/v2/$REPO/manifests/$1"
-        }
-        head_blob() {
-          curl -sk -u "$AUTH" -o /dev/null -w '%{http_code}' \
-            -I "https://$REG/v2/$REPO/blobs/$1"
-        }
-
-        verify_single_manifest() {
-          local ref="$1" tmp=/tmp/m-$$.json
-          local rc cfg
-          rc=$(fetch_manifest "$ref" "$tmp")
-          if [ "$rc" != "200" ]; then
-            echo "FAIL: manifest $ref returned HTTP $rc"; return 1
-          fi
-          cfg=$(jq -r '.config.digest // empty' "$tmp")
-          if [ -n "$cfg" ]; then
-            rc=$(head_blob "$cfg")
-            [ "$rc" = "200" ] || { echo "FAIL: config blob $cfg returned HTTP $rc"; return 1; }
-          fi
-          jq -r '.layers[]?.digest' "$tmp" > /tmp/layers-$$.txt
-          while IFS= read -r layer; do
-            [ -z "$layer" ] && continue
-            rc=$(head_blob "$layer")
-            [ "$rc" = "200" ] || { echo "FAIL: layer blob $layer returned HTTP $rc"; return 1; }
-          done < /tmp/layers-$$.txt
-          return 0
-        }
-
-        echo "=== Verifying push integrity for $REPO:$SHA ==="
-        TOP=/tmp/top-$$.json
-        rc=$(fetch_manifest "$SHA" "$TOP")
-        [ "$rc" = "200" ] || { echo "FAIL: top manifest :$SHA returned HTTP $rc"; exit 1; }
-
-        MT=$(jq -r '.mediaType // empty' "$TOP")
-        echo "Top-level media type: ${MT:-<unset>}"
-
-        if echo "$MT" | grep -Eq 'manifest\.list|image\.index'; then
-          jq -r '.manifests[].digest' "$TOP" > /tmp/children-$$.txt
-          echo "Multi-platform index: $(wc -l </tmp/children-$$.txt) child manifest(s)"
-          while IFS= read -r d; do
-            echo "--- child $d ---"
-            verify_single_manifest "$d" || exit 1
-          done < /tmp/children-$$.txt
-        else
-          echo "Single-platform manifest — verifying directly"
-          verify_single_manifest "$SHA" || exit 1
-        fi
-
-        echo "=== All manifests + blobs verified. Push integrity intact. ==="
+        ssh -n -o BatchMode=yes root@10.0.20.10 "
+          set -e
+          mkdir -p /opt/registry/data/private/_breakglass
+          IMAGE=forgejo.viktorbarzin.me/viktor/infra-ci:$SHA
+          echo \$FORGEJO_PASS | docker login forgejo.viktorbarzin.me -u \$FORGEJO_USER --password-stdin
+          docker pull \$IMAGE
+          docker save \$IMAGE | gzip > /opt/registry/data/private/_breakglass/infra-ci-$SHA.tar.gz
+          ln -sfn infra-ci-$SHA.tar.gz /opt/registry/data/private/_breakglass/infra-ci-latest.tar.gz
+          ls -t /opt/registry/data/private/_breakglass/infra-ci-*.tar.gz \
+            | grep -v 'latest' | tail -n +6 | xargs -r rm -v
+          ls -lh /opt/registry/data/private/_breakglass/
+        "

   - name: slack
     image: curlimages/curl
     commands:
       - |
         curl -s -X POST -H 'Content-type: application/json' \
-          --data "{\"text\":\"CI image built: registry.viktorbarzin.me:5050/infra-ci:${CI_COMMIT_SHA:0:8}\"}" \
+          --data "{\"text\":\"CI image built: forgejo.viktorbarzin.me/viktor/infra-ci:${CI_COMMIT_SHA:0:8} (and registry-private mirror)\"}" \
           "$SLACK_WEBHOOK" || true
     environment:
       SLACK_WEBHOOK:

@@ -25,7 +25,7 @@ clone:

 steps:
   - name: apply
-    image: registry.viktorbarzin.me/infra-ci:latest
+    image: forgejo.viktorbarzin.me/viktor/infra-ci:latest
     pull: true
     backend_options:
       kubernetes:

@@ -14,7 +14,7 @@ clone:

 steps:
   - name: detect-drift
-    image: registry.viktorbarzin.me/infra-ci:latest
+    image: forgejo.viktorbarzin.me/viktor/infra-ci:latest
     pull: true
     backend_options:
       kubernetes:

136
docs/architecture/chrome-service.md
Normal file
@@ -0,0 +1,136 @@
# chrome-service — In-cluster headed Chromium pool

## Overview

`chrome-service` is a single-replica, persistent-profile, bearer-token-gated
Playwright **launch-server** that exposes a headed Chromium browser over a
WebSocket. Sibling services connect to it instead of running their own
in-process Chromium when the upstream's anti-bot tooling
(`disable-devtool.js` redirect-to-google trap, console-clear timing tricks,
`navigator.webdriver` checks) defeats a headless browser.

Initial caller: `f1-stream`'s `playback_verifier`. Future callers attach
via the WS+token contract documented in `stacks/chrome-service/README.md`.

## Why a separate stack

In-process Chromium inside `f1-stream`:

- Runs **headless** by default (no `Xvfb`/`DISPLAY`).
- Has the `HeadlessChromium/...` UA suffix and `navigator.webdriver === true`.
- Trips `disable-devtool.js`'s **Performance** detector — Playwright's CDP
  adds latency to `console.log(largeArray)` vs `console.table(largeArray)`,
  which the lib reads as "DevTools is open" and redirects to
  `https://www.google.com/`.

`chrome-service` solves this by:

1. Running **headed** under `Xvfb :99` (via `playwright launch-server` with
   a JSON config that pins `headless: false`).
2. Living in a long-lived pod so JIT browser launch latency disappears.
3. Allowing a per-context init script
   (`stacks/chrome-service/files/stealth.js` ~ 40 lines, vendored from
   `puppeteer-extra-plugin-stealth`) to spoof `webdriver`, `chrome.runtime`,
   `plugins`, `languages`, `Permissions.query`, WebGL renderer strings, and
   to hide the `disable-devtool-auto` script-tag attribute so the lib's
   IIFE exits early.

## Wire protocol

```text
     ws://chrome-service.chrome-service.svc.cluster.local:3000/<TOKEN>
                                │
┌───────────────────────────────┼───────────────────────────────┐
│   caller pod                  │   chrome-service pod
│   (e.g. f1-stream)            │   (single replica)
│                               │
│   CHROME_WS_URL ──────────────┘
│   CHROME_WS_TOKEN ─── from `secret/chrome-service.api_bearer_token` (ESO)
│
│   await chromium.connect(f"{ws}/{token}")
│   await ctx.add_init_script(STEALTH_JS)
│   page.goto("https://upstream.com/embed/...")
│
└─── ←── pages render under Xvfb, headed Chromium ──── ─────────┘
```
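
The connection sequence in the diagram can be sketched as a small Python client. This is a minimal sketch, assuming the caller uses Playwright's async API at the pinned version and that `CHROME_WS_URL` holds the base URL without the token. The helper names `build_ws_endpoint` and `open_page` are hypothetical and are not taken from the `f1-stream` source.

```python
import os


def build_ws_endpoint(ws_url: str, token: str) -> str:
    """Join the base WebSocket URL and the bearer token into one endpoint."""
    return f"{ws_url.rstrip('/')}/{token}"


async def open_page(url: str, stealth_js: str) -> str:
    """Connect to the remote browser, apply the init script, return the page title."""
    # Imported lazily so the URL helper stays usable without Playwright installed.
    from playwright.async_api import async_playwright

    endpoint = build_ws_endpoint(
        os.environ["CHROME_WS_URL"], os.environ["CHROME_WS_TOKEN"]
    )
    async with async_playwright() as p:
        browser = await p.chromium.connect(endpoint)
        try:
            # The init script must be re-applied on every new context.
            context = await browser.new_context()
            await context.add_init_script(stealth_js)
            page = await context.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            await browser.close()
```

A caller would run it with `asyncio.run(open_page(url, stealth_source))`, where `stealth_source` is the text of the vendored `stealth.js`. Keeping the token in the URL path means the endpoint string itself is a secret and should not be logged.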

## Image pin

Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
`stacks/chrome-service/main.tf`) and the Python client
(`playwright==1.48.0` in callers' `requirements.txt`) **must match
minor-versions**. Bump in lockstep — Playwright protocol changes between
minors and the client cannot connect to a mismatched server.

The Microsoft image ships only the browser binaries, not the `playwright`
npm SDK; the start command runs `npx -y playwright@1.48.0 launch-server`
which downloads the SDK on first start (cached under `$HOME/.npm` via the
PVC) and reuses it on subsequent restarts.
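
The lockstep rule lends itself to a cheap pre-merge check. The sketch below is one hypothetical way to compare the image tag with the client pin; the function names are illustrative, and the only assumption about the inputs is that both strings contain a `major.minor` pair.

```python
import re


def minor_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.48.0-noble' or '1.48.0'."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no version found in {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_compatible(server_tag: str, client_version: str) -> bool:
    """True when server image and client agree on major and minor version."""
    return minor_version(server_tag) == minor_version(client_version)
```

Patch-level differences are deliberately ignored here, since the text above only requires matching minors.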

## Storage

- **`chrome-service-profile-encrypted`** (PVC, 2Gi → 10Gi autoresize,
  `proxmox-lvm-encrypted`) — Chromium user-data dir + npm cache.
  Encrypted because cookies/localStorage may include third-party auth tokens
  for sites callers drive. `HOME=/profile` so npx caches there.
- **`chrome-service-backup-host`** (NFS, RWX) — destination for a 6-hourly
  CronJob that `tar -czf /backup/<YYYY_MM_DD_HH>.tar.gz -C /profile .`,
  retention 30 days.
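
The backup naming scheme and the 30-day retention can be illustrated with a short sketch. The source only states the archive name pattern and the retention window, so the pruning logic below is an assumption about how such a job could select expired archives, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def backup_name(now: datetime) -> str:
    """Archive name in the <YYYY_MM_DD_HH>.tar.gz scheme."""
    return now.strftime("%Y_%m_%d_%H") + ".tar.gz"


def expired_backups(names: list[str], now: datetime, keep_days: int = 30) -> list[str]:
    """Return archive names whose embedded timestamp is older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            stamp = datetime.strptime(name.removesuffix(".tar.gz"), "%Y_%m_%d_%H")
        except ValueError:
            continue  # ignore files that do not follow the naming scheme
        if stamp < cutoff:
            expired.append(name)
    return expired
```

Parsing the timestamp from the file name, rather than relying on mtime, keeps the decision stable if archives are copied between volumes.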

## Auth + secrets

- Vault KV `secret/chrome-service.api_bearer_token` — 32-byte URL-safe
  random, rotated by hand:
  `vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
- ESO syncs into namespace-local Secret `chrome-service-secrets`
  (server pod) and `chrome-service-client-secrets` (each caller pod).
- Reloader (`reloader.stakater.com/auto = "true"`) cascades token rotation
  to both server and any annotated caller — no manual rollout.

## Network controls

- **`kubernetes_network_policy_v1.ws_ingress`** — two separate ingress
  rules on the same policy:
  - **TCP/3000** (Playwright WS): only namespaces labelled
    `chrome-service.viktorbarzin.me/client = "true"` (plus an explicit
    fallback for `f1-stream` by `kubernetes.io/metadata.name`).
  - **TCP/6080** (noVNC HTTP+WS): only the `traefik` namespace, since
    the public-facing path is `chrome.viktorbarzin.me` ingress →
    Traefik → sidecar. Authentik forward-auth still gates external
    access at the Traefik layer.
- **WS port 3000** is internal-only (no ingress, no Cloudflare DNS).
- **noVNC sidecar** (`forgejo.viktorbarzin.me/viktor/chrome-service-novnc`)
  exposes a live HTML5 view of the headed Chromium session via
  `x11vnc` (connected to Xvfb on `localhost:6099`) bridged to
  `websockify` on port 6080. Service `chrome` maps :80 → :6080 and is
  exposed via `ingress_factory` at `chrome.viktorbarzin.me`,
  Authentik-gated. Both static page and WebSocket upgrade share the
  same path — Cloudflare proxy, Cloudflared tunnel, Traefik, and
  Authentik forward-auth all preserve `Upgrade: websocket`.

## Adding a new caller

See `stacks/chrome-service/README.md` for the four-step recipe:

1. Label the caller's namespace.
2. Add an `ExternalSecret` pulling `secret/chrome-service`.
3. Inject `CHROME_WS_URL` + `CHROME_WS_TOKEN` env vars.
4. Vendor `stealth.js` and apply via `await context.add_init_script(...)`
   after every `new_context()`.

## Limits + risks

- **Anti-bot vs stealth arms race** — when an upstream beats us (DRM
  license check, device-fingerprint mismatch, hotlink protection that
  whitelists specific parent domains), the verifier returns
  `is_playable=False` and the extractor moves on. No user-visible
  breakage, just empty stream lists for that source.
- **JWPlayer DRM error 102630** — observed with several hmembeds embeds
  even from the headed chrome-service. The license check bails because
  the request origin isn't on the embed's allowlist; this is upstream
  policy, not an infra defect.
- **Single replica + RWO PVC** — the deployment uses `Recreate` strategy.
  Brief outage on rollout, ~30s for browser warmup.
- **No `/metrics` endpoint** — the cluster's generic
  `KubePodCrashLooping` rule covers basic alerting. A Prometheus scrape
  exporter is day-2 work.

@@ -19,7 +19,7 @@ graph LR
     I --> J[Pull from DockerHub<br/>or Pull-Through Cache]

     K[Pull-Through Cache<br/>10.0.20.10] -.-> J
-    L[registry.viktorbarzin.me<br/>Private Registry] -.-> J
+    L[forgejo.viktorbarzin.me<br/>Private Registry on Forgejo] -.-> J

     style B fill:#2088ff
     style F fill:#4c9e47
@@ -33,7 +33,7 @@ graph LR
 | GitHub Actions | Cloud | `.github/workflows/build-and-deploy.yml` | Build Docker images, push to DockerHub |
 | Woodpecker CI | Self-hosted | `ci.viktorbarzin.me` | Deploy to Kubernetes cluster |
 | DockerHub | Cloud | `viktorbarzin/*` | Public image registry |
-| Private Registry | Custom | `registry.viktorbarzin.me` | Private images, htpasswd auth |
+| Private Registry | Forgejo Packages | `forgejo.viktorbarzin.me/viktor` | Private container images (PAT auth, retention CronJob) — migrated from registry.viktorbarzin.me 2026-05-07 |
 | Pull-Through Cache | Custom | `10.0.20.10:5000` (docker.io)<br/>`10.0.20.10:5010` (ghcr.io) | LAN cache for remote registries |
 | Kyverno | Cluster | `kyverno` namespace | Auto-sync registry credentials to all namespaces |
 | Vault | Cluster | `vault.viktorbarzin.me` | K8s auth for Woodpecker pipelines |
@@ -102,7 +102,7 @@ Woodpecker API uses numeric IDs (not owner/name):
 1. **Containerd hosts.toml** redirects pulls from docker.io and ghcr.io to pull-through cache at `10.0.20.10`
 2. **Pull-through cache** serves cached images from LAN, fetches from upstream on cache miss
 3. **Kyverno ClusterPolicy** auto-syncs `registry-credentials` Secret to all namespaces for private registry access
-4. **Private registry** (`registry.viktorbarzin.me`) uses htpasswd auth, credentials stored in Vault. Runs `registry:2.8.3` (pinned — floating `registry:2` was the root cause of the 2026-04-13 + 2026-04-19 orphan-index incidents; see `docs/post-mortems/2026-04-19-registry-orphan-index.md`).
+4. **Private registry** has been Forgejo's built-in OCI registry at `forgejo.viktorbarzin.me/viktor/<image>` since 2026-05-07. Auth via PAT (Vault `secret/ci/global/forgejo_push_token` for push, `secret/viktor/forgejo_pull_token` for pull). The pre-migration `registry:2.8.3`-based private registry on `registry.viktorbarzin.me:5050` was the root cause of three orphan-index incidents in three weeks (2026-04-13, 2026-04-19, 2026-05-04 — see `docs/post-mortems/2026-04-19-registry-orphan-index.md` and the full migration writeup at `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md`). The five pull-through caches on `10.0.20.10` (ports 5000/5010/5020/5030/5040) stay in place for upstream registries.
 5. **Integrity probe** (`registry-integrity-probe` CronJob in `monitoring` ns, every 15m) walks `/v2/_catalog` → tags → indexes → child manifests via HEAD and pushes `registry_manifest_integrity_failures` to Pushgateway; alerts `RegistryManifestIntegrityFailure` / `RegistryIntegrityProbeStale` / `RegistryCatalogInaccessible` page on broken state. Authoritative check (HTTP API, not filesystem).

 ### Infra Pipelines (Woodpecker-only)

@@ -63,7 +63,7 @@ graph TB
 | External Monitor Sync | Python 3.12 | `stacks/uptime-kuma/` | CronJob (10min) syncs `[External]` monitors from `cloudflare_proxied_names` |
 | dcgm-exporter | Configurable resources | `stacks/monitoring/modules/monitoring/` | NVIDIA GPU metrics collection |
 | Email Roundtrip Probe | Python 3.12 | `stacks/mailserver/modules/mailserver/` | E2E email delivery verification via Mailgun API + IMAP |
-| Registry Integrity Probe | Alpine 3.20 + curl/jq | `stacks/monitoring/modules/monitoring/main.tf` | CronJob every 15m: walks `/v2/_catalog` on `registry.viktorbarzin.me:5050`, HEADs every tagged manifest + index child; emits `registry_manifest_integrity_*` metrics to Pushgateway. Catches orphan OCI-index state that filesystem scans miss. |
+| Forgejo Registry Integrity Probe | Alpine 3.20 + curl/jq | `stacks/monitoring/modules/monitoring/main.tf` | CronJob every 15m: walks `/v2/_catalog` on `forgejo.viktorbarzin.me` (HTTP via in-cluster service), HEADs every tagged manifest + index child; emits `registry_manifest_integrity_*` metrics to Pushgateway. Replaces the legacy `registry-integrity-probe` against `registry.viktorbarzin.me:5050` decommissioned in Phase 4 of forgejo-registry-consolidation 2026-05-07. |

 ## How It Works

195
docs/plans/2026-05-07-forgejo-registry-consolidation-design.md
Normal file
@@ -0,0 +1,195 @@
# Forgejo Registry Consolidation — Design

**Date**: 2026-05-07
**Status**: Approved

## Problem

`registry-private` (the `registry:2` container on the docker-registry
VM at `10.0.20.10`) has hit `distribution#3324` corruption three
times in three weeks (2026-04-13, 2026-04-19, 2026-05-04). Each
incident required manual blob recovery and another round of
hardening to `cleanup-tags.sh` and the GC procedure. The integrity
probe catches it within 15 minutes now, but every hit still costs
~1h of cleanup, and we keep tightening the same loose screw.

Root cause is a known race in `distribution`: tag deletes that race
with concurrent garbage collection produce orphan OCI-index children.
Upstream has not patched it; our mitigations (probe, blob
fix-up script, idempotent cleanup) reduce blast radius but don't
remove the failure mode.

Forgejo (deployed for OAuth and personal repos at
`forgejo.viktorbarzin.me`) ships a built-in OCI registry as part of
the Packages feature, default-on in v11. Using it removes
`distribution`-the-engine from the path entirely, replaces it with
Forgejo's own implementation backed by Forgejo's DB+blob store, and
gets us source hosting + image hosting in one resource.

The PVE host RAM upgrade from 142GB to 272GB (memory id=569) means
the cluster can absorb the resource bump Forgejo needs for the
registry workload (384Mi → 1Gi, per Phase 0 of the plan doc).

## Decision

Move every image currently on `registry.viktorbarzin.me:5050` to
Forgejo's OCI registry at `forgejo.viktorbarzin.me`. Decommission
`registry-private` after a 14-day dual-push bake.

Pull-through caches for upstream registries (DockerHub, GHCR, Quay,
k8s.gcr, Kyverno) stay on the registry VM permanently — Forgejo
won't serve as a pull-through, so the chicken-and-egg of "Forgejo
pulling its own image through itself" never arises.

## Design

### Registry hostname

Image references become `forgejo.viktorbarzin.me/viktor/<image>:<tag>`.
The `viktor/` prefix is the Forgejo owner namespace; all current
private images ship under that single owner.

### Auth

Two service-account users:

| User | Scope | Vault key | Used by |
|---|---|---|---|
| `cluster-puller` | `read:package` | `secret/viktor/forgejo_pull_token` | cluster-wide `registry-credentials` Secret, monitoring probe |
| `ci-pusher` | `write:package` | `secret/ci/global/forgejo_push_token` | Woodpecker pipelines (synced via `vault-woodpecker-sync` CronJob) |

A third PAT (`secret/viktor/forgejo_cleanup_token`, also belongs to
`ci-pusher`) drives the retention CronJob — kept separate from the
push PAT so a leaked CI token doesn't immediately enable mass deletes.

PATs have no expiry. Rotation policy: regenerate via Forgejo Web UI
and `vault kv patch` if a leak is suspected; ESO/sync downstream is
automatic.

### Cluster pull path

`registry-credentials` is a single Secret in `kyverno` ns, cloned
into every namespace by the existing
`sync-registry-credentials` ClusterPolicy. We extend its
`dockerconfigjson` `auths` map with a fourth entry for
`forgejo.viktorbarzin.me`. **No new Secret, no new ClusterPolicy,
no `imagePullSecrets =` line edits across stacks.**

Containerd `hosts.toml` redirects `forgejo.viktorbarzin.me` → in-cluster
Traefik LB at `10.0.20.200`, the same pattern used for
`registry.viktorbarzin.me` → `10.0.20.10:5050`. Avoids hairpin NAT
through the WAN gateway for in-cluster pulls.
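
Extending the `auths` map amounts to adding one key to the Docker config JSON. The sketch below shows the shape of that change using the standard Docker config layout (`auths` keyed by registry host, with a base64 `user:password` value in `auth`). The function name is hypothetical and the credential values are placeholders; the real Secret is managed in Terraform, not by this code.

```python
import base64


def add_registry_auth(dockerconfig: dict, host: str, username: str, token: str) -> dict:
    """Return a copy of a dockerconfigjson dict with one more `auths` entry."""
    auths = dict(dockerconfig.get("auths", {}))
    auths[host] = {
        "username": username,
        "password": token,
        # Docker clients read the base64-encoded "user:password" pair.
        "auth": base64.b64encode(f"{username}:{token}".encode()).decode(),
    }
    return {**dockerconfig, "auths": auths}
```

Because the existing entries are copied unchanged, pods that still pull from the other registries keep working while the new host is added.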

### Push path

Woodpecker pipelines push to BOTH targets during the bake:

```yaml
- name: build-and-push
  image: woodpeckerci/plugin-docker-buildx
  settings:
    repo:
      - registry.viktorbarzin.me/<name>
      - forgejo.viktorbarzin.me/viktor/<name>
    logins:
      - registry: registry.viktorbarzin.me
        username:
          from_secret: registry_user
        password:
          from_secret: registry_password
      - registry: forgejo.viktorbarzin.me
        username:
          from_secret: forgejo_user
        password:
          from_secret: forgejo_push_token
```

The `vault-woodpecker-sync` CronJob (every 6h) propagates
`secret/ci/global` keys to every Woodpecker repo as global secrets.

### Retention

Forgejo's per-package "Cleanup Rules" UI is per-user runtime DB
state, not Terraform-driven. Retention runs as a CronJob in the
`forgejo` namespace, schedule `0 4 * * *`, that:

1. Lists all container packages under the `viktor` owner.
2. Groups by package name.
3. Keeps newest 10 versions + always keeps `latest`.
4. DELETEs the rest via `/api/v1/packages/{owner}/{type}/{name}/{version}`.

First 7 days run with `DRY_RUN=true` — script logs what it would
delete but issues no DELETE calls. After log review, flip the
`forgejo_cleanup_dry_run` local in `cleanup.tf` to false.
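
The four retention steps and the dry-run switch can be sketched as follows. This is an illustration, not the CronJob's actual script: the `version` and `created_at` field names, the `container` package type in the path, and the reading that `latest` is kept in addition to the newest 10 are all assumptions, and `delete` stands in for whatever HTTP client issues the DELETE.

```python
from typing import Callable


def versions_to_delete(versions: list[dict], keep: int = 10) -> list[str]:
    """Pick versions beyond the newest `keep`, never selecting `latest`."""
    candidates = [v for v in versions if v["version"] != "latest"]
    # Assumes created_at is an ISO-8601 UTC string, which sorts lexicographically.
    candidates.sort(key=lambda v: v["created_at"], reverse=True)
    return [v["version"] for v in candidates[keep:]]


def run_cleanup(
    packages: dict[str, list[dict]],
    delete: Callable[[str], None],
    owner: str = "viktor",
    dry_run: bool = True,
) -> list[str]:
    """Return the API paths selected for deletion; call `delete` unless dry_run."""
    selected = []
    for name, versions in packages.items():
        for version in versions_to_delete(versions):
            path = f"/api/v1/packages/{owner}/container/{name}/{version}"
            selected.append(path)
            if dry_run:
                print(f"DRY_RUN: would DELETE {path}")
            else:
                delete(path)
    return selected
```

Returning the selected paths in both modes makes the dry-run output directly comparable with what a live run would remove.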

### Integrity monitoring

Mirror the existing `registry-integrity-probe` CronJob: walk
`/v2/_catalog`, walk every tag, HEAD every manifest + index child,
push `registry_manifest_integrity_*` metrics. Existing
Prometheus alerts fire on the `instance` label, so they cover both
probes automatically once the alert annotations are made
instance-aware (done in this change).
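
The per-tag walk follows the same shape as the shell `verify_single_manifest` logic in the removed pipeline step: resolve the tag, descend into index children, then check each config and layer blob. The sketch below restates that walk in Python with the HTTP calls injected as functions, so `fetch_manifest` and `head_blob` are hypothetical stand-ins for the real registry requests and the failure strings are illustrative.

```python
from typing import Callable

INDEX_MARKERS = ("manifest.list", "image.index")

FetchManifest = Callable[[str, str], tuple[int, dict]]  # (repo, ref) -> (status, body)
HeadBlob = Callable[[str, str], int]                    # (repo, digest) -> status


def verify_manifest(repo: str, ref: str, fetch_manifest: FetchManifest, head_blob: HeadBlob) -> list[str]:
    """Check one image manifest plus its config and layer blobs."""
    status, body = fetch_manifest(repo, ref)
    if status != 200:
        return [f"manifest {ref} returned HTTP {status}"]
    digests = []
    config = body.get("config", {}).get("digest")
    if config:
        digests.append(config)
    digests.extend(layer["digest"] for layer in body.get("layers", []))
    failures = []
    for digest in digests:
        code = head_blob(repo, digest)
        if code != 200:
            failures.append(f"blob {digest} returned HTTP {code}")
    return failures


def verify_tag(repo: str, tag: str, fetch_manifest: FetchManifest, head_blob: HeadBlob) -> list[str]:
    """Resolve a tag and verify every manifest it references."""
    status, top = fetch_manifest(repo, tag)
    if status != 200:
        return [f"top manifest :{tag} returned HTTP {status}"]
    if any(marker in top.get("mediaType", "") for marker in INDEX_MARKERS):
        failures = []
        for child in top.get("manifests", []):
            failures.extend(verify_manifest(repo, child["digest"], fetch_manifest, head_blob))
        return failures
    return verify_manifest(repo, tag, fetch_manifest, head_blob)
```

An empty list means the tag is intact; the length of the combined list across all tags is the kind of value a probe could push as a failure-count metric.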

### Source migration

Projects currently living as plain dirs in the local-only monorepo
become standalone Forgejo repos. Two GitHub-hosted private repos
(`beadboard`, `claude-memory-mcp`) move to Forgejo and are archived
on GitHub.

CI standardises on Woodpecker for everything in scope. The two
projects that used GHA (build + Woodpecker-deploy via GHA-hosted
DockerHub push) keep DockerHub for legacy compatibility but their
canonical image source becomes Forgejo.

### Break-glass for infra-ci

`infra-ci` is the Docker image used by all infra Woodpecker
pipelines, including `default.yml` (terragrunt apply). If Forgejo is
unreachable at the moment we need to apply, `infra-ci` is
unreachable, and we can't apply our way out.

Mitigation: dual-push step also `docker save | gzip` the built
infra-ci image to:

- `/opt/registry/data/private/_breakglass/infra-ci-<sha>.tar.gz` on
  the registry VM disk (Copy 1)
- `/srv/nfs/forgejo-breakglass/` on the NAS (Copy 2)

A `latest` symlink in each location points at the most recent.
Recovery procedure (`docs/runbooks/forgejo-registry-breakglass.md`):
scp tarball → `docker load` → `ctr -n k8s.io images import` → fix
Forgejo via that node.

### Cutover style

**Dual-push bake**: pipelines push to both registries for ≥14 days.
Pods continue pulling from `registry.viktorbarzin.me`. After bake:

1. Per-project PR: flip `image=` lines in Terraform stacks. Pod
   re-pull naturally on next rollout.
2. Phase 4: stop `registry-private` container, remove its
   `auths` entry from the cluster Secret, drop containerd hosts.toml
   entry.

## Why not alternatives

| Option | Rejected because |
|---|---|
| Stay on `registry-private` | Three corruption incidents in three weeks; mitigation cost rising |
| Run a fresh registry container alongside (no Forgejo) | Same upstream, same `distribution#3324` failure mode |
| GHCR / DockerHub for all private images | Public-by-default model + push rate limits; loses owner-owned blob storage |
| Harbor | Heavier than Forgejo registry, would need its own DB + ingress, no source-hosting integration |

## Risks

See plan doc § "Risk register" for the full table. Top three:

1. **Forgejo registry hits the same corruption pattern.** Mitigated
   by 14-day bake + integrity probe within 15 min.
2. **Forgejo down → infra-ci unreachable → can't apply.** Mitigated
   by tarball break-glass on VM + NAS.
3. **Pod re-pulls fail after `image=` flip due to containerd cache
   poisoning.** Mitigated by hosts.toml deployment + per-project
   `kubectl rollout restart` in Phase 3.

152
docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md
Normal file
|
|
@ -0,0 +1,152 @@
|
|||
# Forgejo Registry Consolidation — Plan
|
||||
|
||||
**Date**: 2026-05-07
|
||||
**Status**: Approved — execution in progress (Phase 0)
|
||||
**Design**: `2026-05-07-forgejo-registry-consolidation-design.md`
|
||||
|
||||
This is the implementation roadmap for migrating off `registry-private`
|
||||
onto Forgejo's OCI registry. See the design doc for problem
|
||||
statement and rationale. Execution spans 6 phases (0–5) over ≥3 weeks.
|
||||
|
||||
## Phase 0 — Prepare Forgejo (1 PR, no cutover risk)
|
||||
|
||||
| Task | File / artifact |
|---|---|
| Bump Forgejo memory request+limit 384Mi → 1Gi | `infra/stacks/forgejo/main.tf` |
| Add `FORGEJO__packages__ENABLED=true` and `FORGEJO__packages__CHUNKED_UPLOAD_PATH=/data/tmp/package-upload` env vars (defensive — already default in v11) | `infra/stacks/forgejo/main.tf` |
| Bump Forgejo PVC 5Gi → 15Gi, auto-resize cap 20Gi → 50Gi | `infra/stacks/forgejo/main.tf` |
| Bump ingress `max_body_size = "5g"` (wired into ingress_factory as a Buffering middleware) | `infra/stacks/forgejo/main.tf`, `infra/modules/kubernetes/ingress_factory/main.tf` |
| Create `cluster-puller` (read:package), `ci-pusher` (write:package), and a third `cleanup` PAT on `ci-pusher`; store PATs in Vault | runbook: `docs/runbooks/forgejo-registry-setup.md` |
| Extend `registry-credentials` Secret with 4th `auths` entry for `forgejo.viktorbarzin.me` | `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` |
| Add containerd `hosts.toml` entry redirecting `forgejo.viktorbarzin.me` → in-cluster Traefik LB `10.0.20.200` | `infra/stacks/infra/main.tf` cloud-init + new `infra/scripts/setup-forgejo-containerd-mirror.sh` for existing nodes |
| Forgejo retention CronJob (`0 4 * * *`, dry-run for first 7 days) | new `infra/stacks/forgejo/cleanup.tf` + `infra/stacks/forgejo/files/cleanup.sh` |
| Forgejo integrity probe CronJob (`*/15 * * * *`) | `infra/stacks/monitoring/modules/monitoring/main.tf` |
| Make existing alerts instance-aware so they cover both registries | `infra/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl` |
|
||||
|
||||
**Smoke test (must pass before declaring Phase 0 done):**
|
||||
|
||||
- `docker login forgejo.viktorbarzin.me` succeeds.
|
||||
- Push a hello-world image to `forgejo.viktorbarzin.me/viktor/smoketest:1` succeeds.
|
||||
- `crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1` from a k8s
|
||||
node succeeds, using the auto-synced `registry-credentials` Secret.
|
||||
- A fresh namespace gets the cloned Secret with 4 `auths` entries.
|
||||
- Delete the smoketest package via API.
|
||||
- Forgejo integrity probe completes once and pushes metrics.
|
||||
|
||||
## Phase 1 — Source migration (parallel-safe, no production impact)
|
||||
|
||||
For each project the recipe is identical:
|
||||
|
||||
1. `git init` + push to `forgejo.viktorbarzin.me/viktor/<name>` —
|
||||
register in Woodpecker via OAuth.
|
||||
2. Add `.woodpecker.yml` based on `payslip-ingest/.woodpecker.yml`.
|
||||
Push step uses `woodpeckerci/plugin-docker-buildx` with TWO
|
||||
`repo:` entries (dual-push).
|
||||
3. Confirm first build pushes to BOTH registries.
|
||||
|
||||
Projects (bake clock starts at "all dual-push"):
|
||||
|
||||
| Project | Action |
|---|---|
| `claude-agent-service` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `fire-planner` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `wealthfolio-sync` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `hmrc-sync` | Extract from monorepo to Forgejo. New `.woodpecker.yml`. |
| `freedify` | Push from monorepo to Forgejo. New `.woodpecker.yml`. (Upstream is gone.) |
| `payslip-ingest` | Already on Forgejo. Add second `repo:` entry to `.woodpecker.yml`. |
| `job-hunter` | Already on Forgejo. Add second `repo:` entry. |
| `beadboard` | Push to Forgejo. New `.woodpecker.yml`. Disable GHA workflow. **Don't archive GitHub yet** (deferred to Phase 3). |
| `claude-memory-mcp` | Push to Forgejo. New `.woodpecker.yml`. |
| `infra-ci` | Edit `.woodpecker/build-ci-image.yml` to dual-push. ALSO `docker save \| gzip` to `/opt/registry/data/private/_breakglass/` on VM AND `/srv/nfs/forgejo-breakglass/` on NAS. Pin a `latest` symlink. |
|
||||
|
||||
Break-glass runbook (`docs/runbooks/forgejo-registry-breakglass.md`)
|
||||
documents the recovery path.
|
||||
|
||||
## Phase 2 — Bake (≥14 days)
|
||||
|
||||
- No `image=` lines change. Pods still pull from
|
||||
`registry.viktorbarzin.me`.
|
||||
- **Daily smoke check**: pull a recent image from Forgejo as
|
||||
`cluster-puller`, verify integrity (HEAD on manifest + each blob).
|
||||
- **Bake exit criteria**:
|
||||
- Zero `RegistryManifestIntegrityFailure` alerts on Forgejo.
|
||||
- Zero `ContainerNearOOM` for the forgejo pod.
|
||||
- Retention CronJob has run ≥14 times successfully.
|
||||
- At least one full Sunday GC cycle has elapsed.
|
||||
- Switch retention CronJob to `DRY_RUN=false` on day 7, observe
|
||||
until day 14.
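
The dry-run switch in the retention job could be guarded roughly as in the sketch below. This is not the contents of `cleanup.sh`: the function name `delete_version` and the `FORGEJO_URL` / `FORGEJO_TOKEN` variable names are assumptions made for illustration. Only the `DRY_RUN` variable and the package-delete API path are taken from this change set.

```shell
# Sketch of a dry-run guard for a retention script (hypothetical names).
# Anything other than DRY_RUN=false only logs what would be deleted.
FORGEJO_URL="${FORGEJO_URL:-https://forgejo.viktorbarzin.me}"  # assumed variable
DRY_RUN="${DRY_RUN:-true}"

# delete_version OWNER PACKAGE VERSION
delete_version() {
  local owner="$1" package="$2" version="$3"
  local url="$FORGEJO_URL/api/v1/packages/$owner/container/$package/$version"
  if [ "$DRY_RUN" != "false" ]; then
    echo "DRY-RUN: would DELETE $url"
    return 0
  fi
  # FORGEJO_TOKEN is an assumed variable holding the cleanup PAT.
  curl -fsS -X DELETE -H "Authorization: token ${FORGEJO_TOKEN:?}" "$url"
}
```

Defaulting to dry-run means a missing or misspelled variable fails safe: nothing is deleted until `DRY_RUN=false` is set explicitly on day 7.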
|
||||
|
||||
## Phase 3 — Cutover (one PR per project, single session)
|
||||
|
||||
Order = lowest blast radius first. Each step:
|
||||
`image=` flip → `kubectl rollout restart` → verify pull from Forgejo.
|
||||
|
||||
1. `payslip-ingest` (`infra/stacks/payslip-ingest/main.tf`)
|
||||
2. `job-hunter` (`infra/stacks/job-hunter/main.tf`)
|
||||
3. `claude-agent-service` (`infra/stacks/claude-agent-service/main.tf`)
|
||||
4. `fire-planner` (`infra/stacks/fire-planner/main.tf`)
|
||||
5. `wealthfolio-sync` (`infra/stacks/wealthfolio/main.tf`)
|
||||
6. `freedify` (`infra/stacks/freedify/factory/main.tf`)
|
||||
7. `chrome-service` (`infra/stacks/chrome-service/main.tf`)
|
||||
8. `beads-server` / `beadboard` (`infra/stacks/beads-server/main.tf`).
|
||||
Then `gh repo archive ViktorBarzin/beadboard`.
|
||||
9. `infra-ci` — flip `image:` references in 4 `.woodpecker/*.yml`
|
||||
files in the infra repo. Verify next push to master applies cleanly.
|
||||
10. `claude-memory-mcp` — update `CLAUDE.md` install instruction from
|
||||
`claude plugins install github:ViktorBarzin/claude-memory-mcp` to
|
||||
`claude plugins install https://forgejo.viktorbarzin.me/viktor/claude-memory-mcp.git`.
|
||||
`gh repo archive ViktorBarzin/claude-memory-mcp`.
|
||||
|
||||
## Phase 4 — Decommission
|
||||
|
||||
| Step | File / location |
|---|---|
| Stop `registry-private` container on VM (10.0.20.10): edit `/opt/registry/docker-compose.yml`, comment out service, `docker compose up -d --remove-orphans`. (Manual SSH — cloud-init won't redeploy on TF apply per memory id=1078.) | live VM |
| Update cloud-init template to match the new compose file | `infra/stacks/infra/main.tf:288` |
| Delete `auths` entries for `registry.viktorbarzin.me` / `:5050` / `10.0.20.10:5050` from the dockerconfigjson | `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` |
| Drop `registry.viktorbarzin.me` and `10.0.20.10:5050` `hosts.toml` entries on each node + cloud-init template | `infra/stacks/infra/main.tf` cloud-init + ad-hoc script |
| After 1 week of no incidents, delete `/opt/registry/data/private/` blob storage on the VM (~2.6GB freed) | manual SSH |
|
||||
|
||||
## Phase 5 — Docs
|
||||
|
||||
In the same commit as the Phase 4 closing:
|
||||
|
||||
| Doc | Update |
|---|---|
| `docs/runbooks/registry-vm.md` | Note `registry-private` is gone; pull-through caches and break-glass tarballs only |
| `docs/runbooks/registry-rebuild-image.md` | Replaced by NEW `forgejo-registry-rebuild-image.md` |
| `docs/runbooks/forgejo-registry-rebuild-image.md` (NEW) | Forgejo PVC restore procedure |
| `docs/runbooks/forgejo-registry-breakglass.md` (NEW) | infra-ci tarball recovery |
| `docs/architecture/ci-cd.md` | Image registry section flips to Forgejo |
| `docs/architecture/monitoring.md` | Integrity probe target updated |
| `infra/.claude/CLAUDE.md` | Registry references updated |
| `CLAUDE.md` (monorepo root) | claude-memory-mcp install URL updated |
| `infra/.claude/reference/service-catalog.md` | Cross-reference checked |
|
||||
|
||||
## Critical files modified
|
||||
|
||||
| File | Phase | What |
|---|---|---|
| `infra/stacks/forgejo/main.tf` | 0 | Memory bump, packages env vars, PVC bump, ingress max_body_size |
| `infra/stacks/forgejo/cleanup.tf` (NEW) | 0 | Retention CronJob |
| `infra/stacks/forgejo/files/cleanup.sh` (NEW) | 0 | Retention script (mounted via ConfigMap) |
| `infra/modules/kubernetes/ingress_factory/main.tf` | 0 | Wire `max_body_size` into a Traefik Buffering middleware |
| `infra/stacks/kyverno/modules/kyverno/registry-credentials.tf` | 0 | Add 4th `auths` entry |
| `infra/stacks/infra/main.tf` | 0 + 4 | Containerd hosts.toml block (add Forgejo, later remove registry-private); compose template update |
| `infra/scripts/setup-forgejo-containerd-mirror.sh` (NEW) | 0 | One-shot rollout for existing nodes |
| `infra/stacks/monitoring/modules/monitoring/main.tf` | 0 | Forgejo integrity probe CronJob |
| `infra/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl` | 0 | Make alerts instance-aware |
| `infra/stacks/monitoring/main.tf` | 0 | Plumb `forgejo_pull_token` into module |
| `infra/.woodpecker/build-ci-image.yml` | 1 | Dual-push to add Forgejo target + tarball break-glass |
| `<each-project>/.woodpecker.yml` | 1 | Dual-push (NEW for fire-planner, wealthfolio-sync, hmrc-sync, freedify, beadboard, claude-memory-mcp; EDIT for payslip-ingest, job-hunter, claude-agent-service) |
| `infra/.woodpecker/{default,drift-detection,build-cli}.yml` | 3 | Flip `image:` to Forgejo for infra-ci |
| `infra/stacks/{beads-server,chrome-service,claude-agent-service,fire-planner,freedify/factory,job-hunter,payslip-ingest,wealthfolio}/main.tf` | 3 | Flip `image =` to Forgejo |
|
||||
|
||||
## Verification
|
||||
|
||||
- **Push** (Phase 0/1): `docker push forgejo.viktorbarzin.me/viktor/<name>` visible in Forgejo Web UI under viktor/.
|
||||
- **Pull** (Phase 0): `crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1` succeeds with auto-synced Secret.
|
||||
- **Dual-push** (Phase 1): every Woodpecker pipeline run pushes to BOTH endpoints — confirmed via HEAD checks on `<reg>:<sha>` for both.
|
||||
- **Bake** (Phase 2): existing daily Forgejo `/api/healthz` external monitor stays green; integrity probe stays green; no `ContainerNearOOM` for forgejo pod.
|
||||
- **Cutover** (Phase 3): `kubectl rollout status deploy/<svc> -n <ns>` succeeds. `kubectl describe pod` shows the image was pulled from `forgejo.viktorbarzin.me`.
|
||||
- **Decommission** (Phase 4): `docker ps` on registry VM no longer shows `registry-private`. Brand-new namespace gets the Secret with only the Forgejo `auths` entry. Pull still works.
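
The dual-push HEAD check can be sketched as below. The helper names `manifest_url` and `head_manifest` are hypothetical; the `/v2/<image>/manifests/<tag>` path and the `Accept` media types follow the OCI registry API used elsewhere in these runbooks. Treat it as a starting point under those assumptions, not as the pipeline's actual check.

```shell
# manifest_url REGISTRY IMAGE TAG -> prints the OCI manifest URL.
manifest_url() {
  printf 'https://%s/v2/%s/manifests/%s\n' "$1" "$2" "$3"
}

# head_manifest REGISTRY IMAGE TAG USER PASS -> prints the HTTP status code
# of a HEAD request for the manifest (200 means the tag resolves).
head_manifest() {
  curl -s -o /dev/null -w '%{http_code}' -I -u "$4:$5" \
    -H 'Accept: application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json' \
    "$(manifest_url "$1" "$2" "$3")"
}
```

Calling `head_manifest` once per registry for the same `<sha>` tag and requiring `200` from both gives the "pushed to BOTH endpoints" confirmation.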
|
||||
126
docs/runbooks/forgejo-registry-breakglass.md
Normal file
|
|
@ -0,0 +1,126 @@
|
|||
# Runbook: Forgejo registry break-glass — recovering infra-ci
|
||||
|
||||
Last updated: 2026-05-07
|
||||
|
||||
## When to use this runbook
|
||||
|
||||
When **all** of the following are true:
|
||||
|
||||
1. Forgejo (`forgejo.viktorbarzin.me`) is unreachable.
|
||||
2. `registry-private` is also gone (post-Phase 4 of the consolidation),
|
||||
so you can't fall back to `registry.viktorbarzin.me:5050/infra-ci`.
|
||||
3. You need to run an infra Woodpecker pipeline (apply, build-cli,
|
||||
drift-detection, etc.) — but those pipelines pull `infra-ci` and
|
||||
crash because the registry is down.
|
||||
|
||||
If only Forgejo is down but `registry-private` is still alive, the
|
||||
pipelines work — `image:` references in `infra/.woodpecker/*.yml`
|
||||
still hit `registry.viktorbarzin.me:5050/infra-ci` until Phase 3
|
||||
flips them. Skip this runbook entirely.
|
||||
|
||||
## What's available
|
||||
|
||||
The `build-ci-image.yml` Woodpecker pipeline saves a tarball after
|
||||
each successful push:
|
||||
|
||||
| Location | Path |
|---|---|
| Registry VM disk (10.0.20.10) | `/opt/registry/data/private/_breakglass/infra-ci-<sha>.tar.gz` |
| Registry VM disk (latest symlink) | `/opt/registry/data/private/_breakglass/infra-ci-latest.tar.gz` |
| Synology NAS (offsite copy via daily-backup sync) | `/volume1/Backup/Viki/pve-backup/_forgejo-breakglass/` |
|
||||
|
||||
The registry VM keeps the last 5 tarballs. Synology mirrors them
through the existing offsite-sync-backup job
(`/usr/local/bin/offsite-sync-backup`).
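
A retention step that keeps the last 5 tarballs and refreshes the `latest` symlink might look like the sketch below. The function name `prune_breakglass` is hypothetical and this is not the code in `build-ci-image.yml`; it assumes tarballs are named `infra-ci-<sha>.tar.gz` and that "last 5" means newest by modification time.

```shell
# prune_breakglass DIR [KEEP]
# Keeps the KEEP newest infra-ci-<sha>.tar.gz files in DIR (default 5),
# deletes the older ones, and repoints infra-ci-latest.tar.gz at the newest.
prune_breakglass() {
  local dir="$1" keep="${2:-5}" files newest f
  # Newest first by mtime; the latest symlink is excluded from the listing.
  files="$(cd "$dir" && ls -1t infra-ci-*.tar.gz 2>/dev/null \
    | grep -v '^infra-ci-latest\.tar\.gz$')"
  [ -n "$files" ] || return 0
  newest="$(printf '%s\n' "$files" | head -n 1)"
  # Everything after the first KEEP entries is stale.
  printf '%s\n' "$files" | tail -n +"$((keep + 1))" | while IFS= read -r f; do
    rm -f -- "$dir/$f"
  done
  ln -sfn "$newest" "$dir/infra-ci-latest.tar.gz"
}
```

The symlink target is relative, so the directory can be mirrored to another host without the link breaking.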
|
||||
|
||||
## Recovery procedure
|
||||
|
||||
The goal is to get a working `infra-ci` image onto a k8s node so
|
||||
Woodpecker pods can run it. Then run a Woodpecker pipeline that
|
||||
restores Forgejo from PVC backup or rebuilds it.
|
||||
|
||||
### Step 1 — copy the tarball to a node
|
||||
|
||||
From your workstation (the registry VM is reachable but Forgejo is
|
||||
not — the rest of the cluster might be in a similar partial state):
|
||||
|
||||
```bash
ssh wizard@10.0.20.103   # any responsive k8s node
sudo mkdir -p /var/breakglass
sudo scp root@10.0.20.10:/opt/registry/data/private/_breakglass/infra-ci-latest.tar.gz \
  /var/breakglass/
```
|
||||
|
||||
If the registry VM is also down, fall back to Synology:
|
||||
|
||||
```bash
sudo scp 192.168.1.13:/volume1/Backup/Viki/pve-backup/_forgejo-breakglass/infra-ci-latest.tar.gz \
  /var/breakglass/
```
|
||||
|
||||
### Step 2 — load into containerd
|
||||
|
||||
`docker load` won't help on a k8s node — it loads into the docker
|
||||
daemon, which kubelet/containerd doesn't see. Use `ctr`:
|
||||
|
||||
```bash
sudo ctr -n k8s.io images import /var/breakglass/infra-ci-latest.tar.gz
sudo ctr -n k8s.io images list | grep infra-ci
```
|
||||
|
||||
Confirm the image is tagged with the original repository name
|
||||
(`registry.viktorbarzin.me:5050/infra-ci:<sha>` — the tarball was
|
||||
saved with that tag, NOT the Forgejo name).
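
If the pipelines already reference the Forgejo name (after Phase 3), the imported image needs that name as well. The sketch below derives it and adds a second tag with `ctr images tag`. The helper names are hypothetical, and the target layout `forgejo.viktorbarzin.me/viktor/<image>:<tag>` is an assumption based on the registry paths used elsewhere in this change set.

```shell
# forgejo_ref OLD_REF -> prints the equivalent Forgejo reference.
# Assumes the layout forgejo.viktorbarzin.me/viktor/<image>:<tag>.
forgejo_ref() {
  local name_tag="${1##*/}"   # strip the "registry-host[:port]/" prefix
  printf 'forgejo.viktorbarzin.me/viktor/%s\n' "$name_tag"
}

# retag_for_forgejo OLD_REF -> adds the Forgejo name to the imported image.
retag_for_forgejo() {
  sudo ctr -n k8s.io images tag "$1" "$(forgejo_ref "$1")"
}
```

A locally imported image is only used when the pod's pull policy allows it (for example `IfNotPresent`); a step that forces a pull will still try to reach the registry.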
|
||||
|
||||
### Step 3 — pin pods to this node
|
||||
|
||||
Add a node selector or taint-toleration to whatever pipeline you
|
||||
need to run. Simplest: cordon the other nodes briefly so Woodpecker
|
||||
schedules onto this one.
|
||||
|
||||
```bash
for n in $(kubectl get nodes -o name | grep -v "$(hostname)"); do
  kubectl cordon "${n#node/}"
done
```
|
||||
|
||||
Run the pipeline. After it completes:
|
||||
|
||||
```bash
for n in $(kubectl get nodes -o name); do
  kubectl uncordon "${n#node/}"
done
```
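
The loops above uncordon every node, including any that were cordoned for unrelated reasons before the incident. A more conservative variant, sketched here with hypothetical helper names, records only the nodes it cordons and later uncordons exactly those. It assumes `kubectl` is usable from wherever it runs and relies on the `spec.unschedulable` field selector for nodes.

```shell
# nodes_except KEEP -> reads `kubectl get nodes -o name` output on stdin and
# prints every node name except KEEP (exact match), without the "node/" prefix.
nodes_except() {
  local keep="$1" n
  while IFS= read -r n; do
    n="${n#node/}"
    [ -n "$n" ] && [ "$n" != "$keep" ] && printf '%s\n' "$n"
  done
  return 0
}

# cordon_others KEEP STATEFILE -> cordons schedulable nodes other than KEEP
# and writes their names to STATEFILE.
cordon_others() {
  local keep="$1" state="$2" n
  kubectl get nodes -o name --field-selector 'spec.unschedulable!=true' \
    | nodes_except "$keep" > "$state"
  while IFS= read -r n; do kubectl cordon "$n"; done < "$state"
}

# uncordon_recorded STATEFILE -> uncordons only the nodes listed in STATEFILE.
uncordon_recorded() {
  local state="$1" n
  while IFS= read -r n; do kubectl uncordon "$n"; done < "$state"
}
```

Usage would be `cordon_others "$(hostname)" /var/breakglass/cordoned.txt` before the pipeline run and `uncordon_recorded /var/breakglass/cordoned.txt` afterwards.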
|
||||
|
||||
### Step 4 — fix the underlying problem
|
||||
|
||||
The pipeline you just ran was meant to restore Forgejo. Common
|
||||
options:
|
||||
|
||||
- **Forgejo PVC corrupt** — `docs/runbooks/forgejo-registry-rebuild-image.md`
|
||||
walks through PVC restore from LVM snapshot or PVE backup.
|
||||
- **Forgejo OOM-loop** — bump memory request+limit in
|
||||
`infra/stacks/forgejo/main.tf` and apply.
|
||||
- **Forgejo unreachable due to network** — check Traefik, MetalLB,
|
||||
pfSense.
|
||||
|
||||
Once Forgejo is back, run `build-ci-image.yml` manually so the
|
||||
tarball regenerates with the latest commit.
|
||||
|
||||
## Why this exists
|
||||
|
||||
The 2026-04-19 post-mortem on the registry-orphan-index incident
|
||||
showed that a single registry going corrupt could block ALL infra
|
||||
pipelines (because every pipeline pulls `infra-ci` from that
|
||||
registry). The dual-push to Forgejo + registry-private removes that
|
||||
single-point-of-failure during the bake. After Phase 4
|
||||
decommissions registry-private, the tarball is the last line of
|
||||
defense.
|
||||
|
||||
## Why on the registry VM and not in-cluster
|
||||
|
||||
The Forgejo pod, like any in-cluster workload, depends on cluster
|
||||
networking + storage. The registry VM is an independent
|
||||
non-clustered VM with local storage. If the cluster is in a bad
|
||||
state, the VM's disk is still readable from any other host on the
|
||||
LAN.
|
||||
128
docs/runbooks/forgejo-registry-rebuild-image.md
Normal file
|
|
@ -0,0 +1,128 @@
|
|||
# Runbook: Rebuild an Image on the Forgejo OCI Registry
|
||||
|
||||
Last updated: 2026-05-07
|
||||
|
||||
## When to use this
|
||||
|
||||
Pipelines pulling from `forgejo.viktorbarzin.me/viktor/<image>` fail with:
|
||||
|
||||
- `failed to resolve reference … : not found`
|
||||
- `manifest unknown`
|
||||
- HEAD on a manifest/blob digest returns 404
|
||||
- `forgejo-integrity-probe` CronJob in `monitoring` reports
|
||||
`registry_manifest_integrity_failures > 0` for
|
||||
`instance="forgejo.viktorbarzin.me"`
|
||||
|
||||
This is the Forgejo equivalent of the registry-private orphan-index
|
||||
failure mode (`docs/post-mortems/2026-04-19-registry-orphan-index.md`).
|
||||
The usual cause is a package-version delete racing an in-flight pull,
or PVC corruption. The fix is to rebuild the image from source and
re-push, so Forgejo receives a complete, fresh upload.
|
||||
|
||||
If the symptom is different (Forgejo unreachable, PVC OOM,
|
||||
authentication failure), use:
|
||||
- `docs/runbooks/forgejo-registry-setup.md` for auth + token issues
|
||||
- `docs/runbooks/forgejo-registry-breakglass.md` if Forgejo + the
|
||||
cluster are both unreachable
|
||||
- `docs/runbooks/restore-pvc-from-backup.md` for PVC corruption
|
||||
|
||||
## Phase 1 — Confirm the diagnosis
|
||||
|
||||
From any host:
|
||||
|
||||
```sh
REG=forgejo.viktorbarzin.me
USER=cluster-puller
PASS="$(vault kv get -field=forgejo_pull_token secret/viktor)"
IMAGE=viktor/payslip-ingest
TAG=latest

# 1. Confirm the manifest exists at all.
curl -sk -u "$USER:$PASS" \
  -H 'Accept: application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json' \
  "https://$REG/v2/$IMAGE/manifests/$TAG" | jq '.mediaType, .manifests[].digest // .config.digest'

# 2. HEAD each child manifest listed in the index. Any non-200 = confirmed.
for d in $(curl -sk -u "$USER:$PASS" -H 'Accept: application/vnd.oci.image.index.v1+json' \
    "https://$REG/v2/$IMAGE/manifests/$TAG" | jq -r '.manifests[].digest // empty'); do
  code=$(curl -sk -u "$USER:$PASS" -o /dev/null -w '%{http_code}' \
    -I "https://$REG/v2/$IMAGE/manifests/$d")
  echo "$d → $code"
done
```
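
The loop above only walks the child manifests of an index. To extend the check to the config and layer blobs of a single image manifest, something like the following sketch could be used. The helper names are hypothetical, and digests are extracted with `grep` rather than `jq` purely so the snippet has no dependency beyond `curl`; blobs are addressed through the standard `/v2/<image>/blobs/<digest>` endpoint.

```shell
# manifest_digests -> reads a manifest or index (JSON) on stdin and prints
# every sha256 digest it references (config, layers, or index children).
manifest_digests() {
  grep -o '"digest"[[:space:]]*:[[:space:]]*"sha256:[0-9a-f]\{64\}"' \
    | grep -o 'sha256:[0-9a-f]\{64\}'
}

# head_blob REG IMAGE DIGEST USER PASS -> prints the HTTP status for the blob.
head_blob() {
  curl -sk -o /dev/null -w '%{http_code}' -I -u "$4:$5" \
    "https://$1/v2/$2/blobs/$3"
}
```

Piping a fetched image manifest through `manifest_digests` and calling `head_blob "$REG" "$IMAGE" "$d" "$USER" "$PASS"` for each digest covers the "each blob" part of the daily smoke check; any non-200 points at a missing blob.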
|
||||
|
||||
The log of the probe's most recent run is also a fast way to see what's affected:
|
||||
|
||||
```sh
# Sort by creation time and take the last entry so we read the newest run.
kubectl -n monitoring logs \
  "$(kubectl -n monitoring get pods -l job-name \
       --sort-by=.metadata.creationTimestamp -o name \
     | grep forgejo-integrity-probe | tail -1)"
```
|
||||
|
||||
## Phase 2 — Rebuild and re-push
|
||||
|
||||
Forgejo lets you delete a specific package version through the API.
|
||||
Doing this **before** the rebuild ensures the new push doesn't
|
||||
collide with the half-broken existing entry.
|
||||
|
||||
```sh
# Delete the broken version (replace TAG with the actual tag).
curl -X DELETE -H "Authorization: token $(vault kv get -field=forgejo_cleanup_token secret/viktor)" \
  "https://$REG/api/v1/packages/viktor/container/$(basename "$IMAGE")/$TAG"
```
|
||||
|
||||
Rebuild via Woodpecker (manual run if the pipeline isn't triggered
|
||||
by a code change):
|
||||
|
||||
1. Open `https://ci.viktorbarzin.me/repos/<repo>/manual` for the
|
||||
project.
|
||||
2. Click **Run pipeline** with `branch=master`.
|
||||
3. Wait for the build-and-push step to complete.
|
||||
4. Confirm the new version is visible in Forgejo Web UI under
|
||||
`viktor/<image>` → Packages → Container.
|
||||
|
||||
## Phase 3 — Restart consumers
|
||||
|
||||
Pods that already cached the broken digest may continue using it.
|
||||
Force a fresh pull:
|
||||
|
||||
```sh
kubectl rollout restart deploy/<service> -n <ns>
```
|
||||
|
||||
If the pod still fails, the new manifest digest may not have
|
||||
propagated through containerd's cache. Drain + restart containerd on
|
||||
the affected node:
|
||||
|
||||
```sh
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
ssh wizard@<node> sudo systemctl restart containerd
kubectl uncordon <node>
```
|
||||
|
||||
## Phase 4 — Verify integrity recovery
|
||||
|
||||
The next probe run (every 15 min) will report:
|
||||
|
||||
```
registry_manifest_integrity_failures{instance="forgejo.viktorbarzin.me"} 0
```
|
||||
|
||||
The `RegistryManifestIntegrityFailure` alert resolves automatically
|
||||
30 minutes after the metric goes back to 0.
|
||||
|
||||
## Why this happens
|
||||
|
||||
Forgejo's OCI registry stores blobs in its own DB+filesystem. Unlike
|
||||
`registry:2` + `distribution`, it doesn't have the
|
||||
[`distribution#3324`](https://github.com/distribution/distribution/issues/3324)
|
||||
GC-vs-tag-delete race. But it can still reach a broken state if:
|
||||
|
||||
- The retention CronJob deletes a version while a pull is in flight
|
||||
on the same digest.
|
||||
- The PVC fills up mid-push (`docs/runbooks/restore-pvc-from-backup.md`).
|
||||
- A Forgejo upgrade migrates the package schema and a row is dropped.
|
||||
|
||||
In all cases the recovery procedure is identical: delete the broken
|
||||
version through the API, rebuild from source, force consumers to
|
||||
re-pull.
|
||||
163
docs/runbooks/forgejo-registry-setup.md
Normal file
|
|
@ -0,0 +1,163 @@
|
|||
# Runbook: Forgejo OCI registry — initial setup
|
||||
|
||||
Last updated: 2026-05-07
|
||||
|
||||
This runbook covers the **one-time** bootstrap of Forgejo's container
|
||||
registry, executed during Phase 0 of the registry consolidation plan
|
||||
(`docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md`).
|
||||
|
||||
After this runbook is complete, the Forgejo OCI registry at
|
||||
`forgejo.viktorbarzin.me` accepts pushes from CI and pulls from the
|
||||
cluster, with retention and integrity monitoring in place.
|
||||
|
||||
## Order of operations
|
||||
|
||||
The Terraform stacks reference Vault keys that don't exist on a fresh
|
||||
cluster. Create the keys **before** running `scripts/tg apply`.
|
||||
|
||||
1. Apply the resource bumps (memory, PVC, ingress body size,
|
||||
packages env vars) — these don't depend on the new Vault keys.
|
||||
2. Create the service-account users + PATs in Forgejo.
|
||||
3. Push the PATs to Vault.
|
||||
4. Apply the rest of Phase 0 (registry-credentials extension,
|
||||
monitoring probe, retention CronJob).
|
||||
|
||||
### Step 1 — apply Forgejo deployment bumps
|
||||
|
||||
```bash
cd infra/stacks/forgejo
scripts/tg apply
```
|
||||
|
||||
Wait for the new pod to come up at the bumped 1Gi memory request and
|
||||
the resized 15Gi PVC. Verify packages are enabled:
|
||||
|
||||
```bash
kubectl exec -n forgejo deploy/forgejo -- forgejo manager flush-queues
# Case-insensitive: the variables are named FORGEJO__packages__*.
kubectl exec -n forgejo deploy/forgejo -- env | grep -i packages
```
|
||||
|
||||
### Step 2 — create service-account users
|
||||
|
||||
`forgejo admin user create` is not idempotent: re-running it on an
existing user errors out. That's fine; skip the command on rerun.
Pass `--must-change-password=false` so the throwaway password can be
used for the Web UI login in Step 3 without a forced reset.
|
||||
|
||||
```bash
# cluster-puller — read:package PAT for in-cluster pulls.
kubectl exec -n forgejo deploy/forgejo -- \
  forgejo admin user create \
    --username cluster-puller \
    --email cluster-puller@viktorbarzin.me \
    --password "$(openssl rand -base64 24)" \
    --must-change-password=false

# ci-pusher — write:package PAT for CI dual-push, also reused as the
# cleanup CronJob credential (write:package includes delete).
kubectl exec -n forgejo deploy/forgejo -- \
  forgejo admin user create \
    --username ci-pusher \
    --email ci-pusher@viktorbarzin.me \
    --password "$(openssl rand -base64 24)" \
    --must-change-password=false
```
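
To make reruns quiet instead of relying on the error, the create call can be wrapped in an existence check. The sketch below uses hypothetical helper names and assumes `forgejo admin user list` prints a header row followed by one row per user with the username in the second column; verify that layout on the running version before relying on it.

```shell
# user_in_list NAME -> reads `forgejo admin user list` output on stdin and
# succeeds if NAME appears in the Username column (assumed to be column 2).
user_in_list() {
  awk -v name="$1" 'NR > 1 && $2 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# create_user_once NAME EMAIL -> creates the user unless it already exists.
create_user_once() {
  if kubectl exec -n forgejo deploy/forgejo -- forgejo admin user list \
      | user_in_list "$1"; then
    echo "user $1 already exists, skipping"
    return 0
  fi
  kubectl exec -n forgejo deploy/forgejo -- \
    forgejo admin user create \
      --username "$1" --email "$2" \
      --password "$(openssl rand -base64 24)" \
      --must-change-password=false
}
```

For example, `create_user_once cluster-puller cluster-puller@viktorbarzin.me` can then be run any number of times.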
|
||||
|
||||
The user passwords are throwaway — we only ever auth via PAT. Forgejo
|
||||
admin can reset them at any time from the Web UI.
|
||||
|
||||
### Step 3 — generate the PATs
|
||||
|
||||
PATs **must** be generated through the Web UI logged in as the
|
||||
respective user (the CLI doesn't expose token creation). To log in
|
||||
without OAuth (registration is disabled for everyone except `viktor`,
|
||||
the admin), use the per-user temporary password from step 2.
|
||||
|
||||
For each of `cluster-puller` and `ci-pusher`:
|
||||
|
||||
1. Sign out of `viktor`.
|
||||
2. Go to `https://forgejo.viktorbarzin.me/user/login` and sign in
|
||||
with the throwaway password.
|
||||
3. Settings → Applications → Generate new token.
|
||||
4. Name: `cluster-pull` / `ci-push`. **Expiration: never.**
|
||||
5. Scopes:
|
||||
- `cluster-puller`: `read:package`
|
||||
- `ci-pusher`: `write:package` (covers read+write+delete)
|
||||
6. Save the token shown on the next page — it is **not** displayed again.
|
||||
|
||||
For the cleanup CronJob, generate a third PAT on `ci-pusher`:
|
||||
|
||||
7. Repeat steps 3-6 with name `cleanup`, scope `write:package`.
|
||||
|
||||
### Step 4 — push PATs to Vault
|
||||
|
||||
```bash
vault login -method=oidc

# Read-only, used by the cluster-wide registry-credentials Secret and
# by the Forgejo integrity probe.
vault kv patch secret/viktor \
  forgejo_pull_token=<paste cluster-puller PAT>

# Write+delete, used by the retention CronJob inside Forgejo's
# namespace.
vault kv patch secret/viktor \
  forgejo_cleanup_token=<paste ci-pusher cleanup PAT>

# Write, propagated by vault-woodpecker-sync to all Woodpecker repos.
vault kv patch secret/ci/global \
  forgejo_user=ci-pusher \
  forgejo_push_token=<paste ci-pusher push PAT>
```
|
||||
|
||||
### Step 5 — apply the rest of Phase 0
|
||||
|
||||
```bash
# Each apply runs in a subshell so every `cd` starts from the same directory.

# Registry credential Secret (now reads forgejo_pull_token).
(cd infra/stacks/kyverno && scripts/tg apply)

# Monitoring probe + retention CronJob.
(cd infra/stacks/monitoring && scripts/tg apply)
(cd infra/stacks/forgejo && scripts/tg apply)

# Containerd hosts.toml on each existing k8s node — VM cloud-init
# only fires on first boot.
infra/scripts/setup-forgejo-containerd-mirror.sh
```
|
||||
|
||||
## Verification
|
||||
|
||||
```bash
# Login from a workstation with docker.
echo "<ci-pusher PAT>" | docker login forgejo.viktorbarzin.me -u ci-pusher --password-stdin

# Push a smoketest image.
docker pull alpine:3.20
docker tag alpine:3.20 forgejo.viktorbarzin.me/viktor/smoketest:1
docker push forgejo.viktorbarzin.me/viktor/smoketest:1

# Pull from a k8s node.
ssh wizard@<node> sudo crictl pull forgejo.viktorbarzin.me/viktor/smoketest:1

# Confirm the cluster-wide Secret was synced into a fresh namespace.
kubectl create namespace forgejo-smoketest
kubectl get secret -n forgejo-smoketest registry-credentials \
  -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq '.auths | keys'
# Expect: ["10.0.20.10:5050", "forgejo.viktorbarzin.me",
#          "registry.viktorbarzin.me", "registry.viktorbarzin.me:5050"]
kubectl delete namespace forgejo-smoketest

# Delete the smoketest package via API.
curl -X DELETE -H "Authorization: token <ci-pusher cleanup PAT>" \
  https://forgejo.viktorbarzin.me/api/v1/packages/viktor/container/smoketest/1
```
|
||||
|
||||
## When to revisit
|
||||
|
||||
- **PAT rotation**: PATs created here have no expiry by design. If a
|
||||
PAT leaks, regenerate via the Web UI and `vault kv patch` the new
|
||||
value into the same key. The next `terragrunt apply` syncs it to
cluster consumers within minutes (the Kyverno ClusterPolicy clones
the Secret); Woodpecker repos pick it up on the next
vault-woodpecker-sync run (every 6h).
|
||||
- **New service account**: if a future workload needs different
|
||||
scopes, add a parallel user/PAT here rather than expanding existing
|
||||
PAT scope. Principle of least privilege.
|
||||
|
|
@ -1,12 +1,30 @@
|
|||
# Runbook: Registry VM (docker-registry, 10.0.20.10)
|
||||
|
||||
Last updated: 2026-04-19
|
||||
Last updated: 2026-05-07
|
||||
|
||||
The registry VM hosts `registry.viktorbarzin.me` (private Docker
|
||||
registry, htpasswd-auth, NGINX → registry:2). It is an Ubuntu 24.04
|
||||
VM on the cluster LAN subnet `10.0.20.0/24`, with a static netplan
|
||||
config (no DHCP). Because it sits on a subnet that only has pfSense
|
||||
as its gateway, its DNS must be statically configured.
|
||||
The registry VM is an Ubuntu 24.04 VM on the cluster LAN subnet
|
||||
`10.0.20.0/24`, with a static netplan config (no DHCP). Because it
|
||||
sits on a subnet that only has pfSense as its gateway, its DNS must
|
||||
be statically configured.
|
||||
|
||||
**As of Phase 4 of forgejo-registry-consolidation 2026-05-07** the VM
|
||||
no longer hosts the private R/W registry. It hosts pull-through
|
||||
caches only:
|
||||
|
||||
| Port | Upstream |
|---|---|
| 5000 | docker.io (Docker Hub) — auth via dockerhub_registry_password |
| 5010 | ghcr.io |
| 5020 | quay.io |
| 5030 | registry.k8s.io |
| 5040 | reg.kyverno.io |
|
||||
|
||||
The decommissioned private registry (port 5050) is now hosted on
|
||||
Forgejo at `forgejo.viktorbarzin.me/viktor/<image>`. See
|
||||
`docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md` for the
|
||||
migration. Break-glass tarballs of `infra-ci` are still produced on
|
||||
each build to `/opt/registry/data/private/_breakglass/` — see
|
||||
`docs/runbooks/forgejo-registry-breakglass.md`.
|
||||
|
||||
## DNS configuration
|
||||
|
||||
|
|
|
|||
73
docs/runbooks/woodpecker-onboard-forgejo-repo.md
Normal file
|
|
@ -0,0 +1,73 @@
|
|||
# Runbook: Onboarding a new Forgejo repo to Woodpecker
|
||||
|
||||
Last updated: 2026-05-07
|
||||
|
||||
When you create a new repo on `forgejo.viktorbarzin.me`, Woodpecker
|
||||
does NOT auto-discover it via the cluster's existing OAuth session.
|
||||
The `forgejo` user inside Woodpecker (Forgejo-OAuth'd) needs to:
|
||||
|
||||
1. Open `https://ci.viktorbarzin.me/` in a browser.
|
||||
2. Log in via Forgejo OAuth (the "Sign in with Forgejo" button).
|
||||
3. Click "Add Repository" — your new repo should appear.
|
||||
4. Click the toggle to activate it. Woodpecker will:
|
||||
- Add a webhook on the Forgejo repo (push, PR, release events).
|
||||
- Register the repo's `forge_remote_id` in its DB so subsequent
|
||||
hooks deserialize correctly.
|
||||
5. Push a commit (or hit "Run pipeline" in Woodpecker UI) — first
|
||||
build fires.
|
||||
|
||||
## Why API-only doesn't work
|
||||
|
||||
The webhook URL contains a JWT signed with a per-repo key that's
stored in the DB and only accessible at OAuth-flow time. POST'ing
`/api/repos` as the admin (`ViktorBarzin` GitHub user) returns 500
because the lookup queries forge-side OAuth state for THAT user,
which doesn't exist for the Forgejo `viktor` user. We confirmed:
|
||||
|
||||
- Direct `POST /api/repos?forge_remote_id=N` → HTTP 500 server-side.
|
||||
- Generating a JWT with the agent secret → "token is unverifiable"
|
||||
on hook delivery (the signing key is repo-specific, not the
|
||||
global agent secret).
|
||||
|
||||
There's no admin endpoint that side-steps the OAuth flow.
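
The "token is unverifiable" failure above is what any HMAC-signed token produces when it is checked against a different key than the one that signed it. The following is a minimal sketch of that behaviour, assuming HS256-style signing and using only the Python standard library. The key values, the claim names, and the `sign`/`verify` helpers are hypothetical and are not taken from Woodpecker's code.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as compact JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign(claims: dict, key: bytes) -> str:
    """Build a compact HS256 JWT for the given claims."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(key, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify(token: str, key: bytes) -> bool:
    """Return True only if the signature was produced with `key`."""
    header, payload, signature = token.split(".")
    signing_input = f"{header}.{payload}".encode("ascii")
    expected = _b64url(hmac.new(key, signing_input, hashlib.sha256).digest())
    return hmac.compare_digest(expected, signature)


# Hypothetical secrets and claims, for illustration only.
repo_key = b"per-repo-secret"
agent_secret = b"global-agent-secret"
claims = {"type": "hook", "repo-id": "42"}

genuine = sign(claims, repo_key)
forged = sign(claims, agent_secret)

print(verify(genuine, repo_key))  # signed and checked with the same key
print(verify(forged, repo_key))   # signed with the agent secret instead
```

The two tokens carry identical claims; only the signing key differs, which is enough for the receiver to reject the second one.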
|
||||
|
||||
## Bootstrap when UI access isn't available
|
||||
|
||||
If you absolutely need to bootstrap a new image without UI access
|
||||
(e.g., during an outage), the workaround is:
|
||||
|
||||
1. Build locally:
|
||||
```bash
|
||||
docker build -t forgejo.viktorbarzin.me/viktor/<name>:<tag> /path/to/source
|
||||
docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
|
||||
```
|
||||
2. Or pull from another already-built source and retag:
|
||||
```bash
|
||||
docker pull viktorbarzin/<name>:<tag> # DockerHub
|
||||
docker tag viktorbarzin/<name>:<tag> forgejo.viktorbarzin.me/viktor/<name>:<tag>
|
||||
docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
|
||||
```
|
||||
3. Flip the cluster `image=` reference and restart deployments.
|
||||
|
||||
Document the bootstrap in the relevant stack so future maintainers
|
||||
know the image was put there by hand. After Woodpecker UI onboarding,
|
||||
the next pipeline run replaces the bootstrap image with a CI-built one.
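
The pull, retag, and push sequence from step 2 can be generated rather than typed by hand. This is a minimal sketch, assuming `docker` is on `PATH` and already logged in to both registries. `retag_commands` and `bootstrap` are hypothetical helper names, and the image name and tag in the last line are placeholders.

```python
import subprocess

FORGEJO_PREFIX = "forgejo.viktorbarzin.me/viktor"


def retag_commands(name: str, tag: str, source_repo: str = "viktorbarzin") -> list[list[str]]:
    """Return the docker commands that copy <source_repo>/<name>:<tag> to Forgejo."""
    src = f"{source_repo}/{name}:{tag}"
    dst = f"{FORGEJO_PREFIX}/{name}:{tag}"
    return [
        ["docker", "pull", src],
        ["docker", "tag", src, dst],
        ["docker", "push", dst],
    ]


def bootstrap(name: str, tag: str, dry_run: bool = True) -> None:
    """Print each command; execute it as well when dry_run is False."""
    for cmd in retag_commands(name, tag):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing docker command.
            subprocess.run(cmd, check=True)


# Dry run: only prints the three commands.
bootstrap("example-image", "0123abcd")
```

Keeping the dry run as the default makes it easy to paste the printed commands into the stack comment that records the manual bootstrap.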
|
||||
|
||||
## Repos awaiting onboarding as of 2026-05-07
|
||||
|
||||
These were created during the forgejo-registry-consolidation but the
|
||||
UI step above hasn't been done yet — their `.woodpecker.yml` /
|
||||
`.woodpecker/build.yml` exists on Forgejo but no pipeline fires:
|
||||
|
||||
- `viktor/broker-sync` — image bootstrapped via DockerHub (see
|
||||
`infra/stacks/wealthfolio/main.tf` comment).
|
||||
- `viktor/fire-planner` — image bootstrapped via local docker build.
|
||||
- `viktor/hmrc-sync`
|
||||
- `viktor/freedify`
|
||||
- `viktor/claude-agent-service`
|
||||
- `viktor/beadboard` — image bootstrapped via local docker build.
|
||||
- `viktor/claude-memory-mcp`
|
||||
|
||||
Walk through each in the Woodpecker UI to enable. Pipelines for
|
||||
already-onboarded repos (payslip-ingest, job-hunter, infra) fired
|
||||
correctly after the v3.13 → v3.14 upgrade.
|
||||
|
|
@ -89,35 +89,26 @@ services:
|
|||
retries: 3
|
||||
start_period: 10s
|
||||
|
||||
registry-private:
|
||||
image: registry:2.8.3
|
||||
container_name: registry-private
|
||||
restart: always
|
||||
volumes:
|
||||
- /opt/registry/data/private:/var/lib/registry
|
||||
- /opt/registry/config-private.yml:/etc/docker/registry/config.yml:ro
|
||||
- /opt/registry/htpasswd:/auth/htpasswd:ro
|
||||
networks:
|
||||
- registry
|
||||
healthcheck:
|
||||
# 401 is expected (auth required) — any HTTP response means the registry is healthy
|
||||
test: ["CMD", "sh", "-c", "wget -qS -O /dev/null http://127.0.0.1:5000/v2/ 2>&1 | grep -q 'HTTP/'"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
start_period: 10s
|
||||
# registry-private decommissioned in Phase 4 of
|
||||
# forgejo-registry-consolidation 2026-05-07 — image migration completed,
|
||||
# cluster flipped to forgejo.viktorbarzin.me/viktor/<image>. The remaining
|
||||
# five services on this VM are pull-through caches for upstream registries.
|
||||
# After 1 week of no incidents, `rm -rf /opt/registry/data/private/` on the
|
||||
# VM frees ~2.6 GB. The tarball break-glass under
|
||||
# /opt/registry/data/private/_breakglass/ stays — it's how we recover
|
||||
# infra-ci if Forgejo ever goes fully down.
|
||||
|
||||
nginx:
|
||||
image: nginx:alpine
|
||||
container_name: registry-nginx
|
||||
restart: always
|
||||
# 5050 dropped Phase 4 of forgejo-registry-consolidation 2026-05-07.
|
||||
ports:
|
||||
- "5000:5000"
|
||||
- "5010:5010"
|
||||
- "5020:5020"
|
||||
- "5030:5030"
|
||||
- "5040:5040"
|
||||
- "5050:5050"
|
||||
volumes:
|
||||
- /opt/registry/nginx.conf:/etc/nginx/nginx.conf:ro
|
||||
- /opt/registry/tls:/etc/nginx/tls:ro
|
||||
|
|
@ -135,8 +126,6 @@ services:
|
|||
condition: service_healthy
|
||||
registry-kyverno:
|
||||
condition: service_healthy
|
||||
registry-private:
|
||||
condition: service_healthy
|
||||
healthcheck:
|
||||
test: ["CMD", "sh", "-c", "wget -qO- http://127.0.0.1:5000/v2/ >/dev/null 2>&1"]
|
||||
interval: 30s
|
||||
|
|
|
|||
|
|
@ -33,10 +33,9 @@ http {
|
|||
keepalive 32;
|
||||
}
|
||||
|
||||
upstream private {
|
||||
server registry-private:5000;
|
||||
keepalive 32;
|
||||
}
|
||||
# `upstream private` removed in Phase 4 of forgejo-registry-consolidation
|
||||
# 2026-05-07. The /v2/ private registry is now Forgejo at
|
||||
# forgejo.viktorbarzin.me/viktor/.
|
||||
|
||||
# --- Docker Hub (port 5000) ---
|
||||
|
||||
|
|
@ -168,37 +167,8 @@ http {
|
|||
}
|
||||
}
|
||||
|
||||
# --- Private R/W Registry (port 5050, TLS) ---
|
||||
|
||||
server {
|
||||
listen 5050 ssl;
|
||||
server_name registry.viktorbarzin.me;
|
||||
|
||||
ssl_certificate /etc/nginx/tls/fullchain.pem;
|
||||
ssl_certificate_key /etc/nginx/tls/privkey.pem;
|
||||
ssl_protocols TLSv1.2 TLSv1.3;
|
||||
|
||||
client_max_body_size 0;
|
||||
proxy_request_buffering off;
|
||||
proxy_buffering off;
|
||||
chunked_transfer_encoding on;
|
||||
|
||||
location /v2/ {
|
||||
proxy_pass http://private;
|
||||
proxy_http_version 1.1;
|
||||
proxy_set_header Host $http_host;
|
||||
proxy_set_header Connection "";
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
|
||||
proxy_read_timeout 900;
|
||||
proxy_send_timeout 900;
|
||||
}
|
||||
|
||||
location / {
|
||||
return 200 'ok';
|
||||
add_header Content-Type text/plain;
|
||||
}
|
||||
}
|
||||
# --- Private R/W Registry (port 5050) decommissioned Phase 4 2026-05-07 ---
|
||||
# The TLS port 5050 server block previously fronted `registry-private`.
|
||||
# Migrated to Forgejo at forgejo.viktorbarzin.me/viktor/. Both
|
||||
# docker-compose.yml and this nginx config no longer reference port 5050.
|
||||
}
|
||||
|
|
|
|||
|
|
@ -40,8 +40,9 @@ variable "ingress_path" {
|
|||
default = ["/"]
|
||||
}
|
||||
variable "max_body_size" {
|
||||
type = string
|
||||
default = "50m"
|
||||
type = string
|
||||
default = null
|
||||
description = "Maximum request body size, e.g. '5g'. null = no limit (Traefik default). When set, a per-ingress Buffering middleware is created and attached."
|
||||
}
|
||||
variable "extra_annotations" {
|
||||
default = {}
|
||||
|
|
@ -203,6 +204,17 @@ locals {
|
|||
"gethomepage.dev/href" = "https://${local.effective_host}"
|
||||
"gethomepage.dev/icon" = "${replace(var.name, "-", "")}.png"
|
||||
} : {}
|
||||
|
||||
# Parse "5g"/"50m"/"1024k"/"42" into bytes. Traefik's Buffering middleware
|
||||
# takes maxRequestBodyBytes as an integer. Empty unit = bytes.
|
||||
body_size_match = var.max_body_size == null ? null : regex("^([0-9]+)([kmgKMG]?)$", var.max_body_size)
|
||||
body_size_unit_multiplier = var.max_body_size == null ? 0 : (
|
||||
lower(local.body_size_match[1]) == "g" ? 1073741824 :
|
||||
lower(local.body_size_match[1]) == "m" ? 1048576 :
|
||||
lower(local.body_size_match[1]) == "k" ? 1024 :
|
||||
1
|
||||
)
|
||||
max_body_size_bytes = var.max_body_size == null ? 0 : tonumber(local.body_size_match[0]) * local.body_size_unit_multiplier
|
||||
}
|
||||
|
||||
|
||||
|
|
@ -245,6 +257,7 @@ resource "kubernetes_ingress_v1" "proxied-ingress" {
|
|||
var.protected ? "traefik-authentik-forward-auth@kubernetescrd" : null,
|
||||
var.allow_local_access_only ? "traefik-local-only@kubernetescrd" : null,
|
||||
var.custom_content_security_policy != null ? "${var.namespace}-custom-csp-${var.name}@kubernetescrd" : null,
|
||||
var.max_body_size != null ? "${var.namespace}-buffering-${var.name}@kubernetescrd" : null,
|
||||
], var.extra_middlewares)))
|
||||
"traefik.ingress.kubernetes.io/router.entrypoints" = "websecure"
|
||||
}, local.homepage_defaults, var.extra_annotations,
|
||||
|
|
@ -302,6 +315,27 @@ resource "kubernetes_manifest" "custom_csp" {
|
|||
}
|
||||
}
|
||||
|
||||
# Buffering middleware - created per service when max_body_size is set.
|
||||
# Traefik default is unlimited; setting maxRequestBodyBytes enforces a limit
|
||||
# (e.g. Forgejo container pushes can ship multi-GB layer blobs).
|
||||
resource "kubernetes_manifest" "buffering" {
|
||||
count = var.max_body_size != null ? 1 : 0
|
||||
|
||||
manifest = {
|
||||
apiVersion = "traefik.io/v1alpha1"
|
||||
kind = "Middleware"
|
||||
metadata = {
|
||||
name = "buffering-${var.name}"
|
||||
namespace = var.namespace
|
||||
}
|
||||
spec = {
|
||||
buffering = {
|
||||
maxRequestBodyBytes = local.max_body_size_bytes
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Cloudflare DNS records — created automatically when dns_type is set.
|
||||
# Proxied: CNAME to Cloudflare tunnel. Non-proxied: A + AAAA to public IP.
|
||||
resource "cloudflare_record" "proxied" {
|
||||
|
|
|
|||
76
scripts/forgejo-migrate-orphan-images.sh
Executable file
|
|
@ -0,0 +1,76 @@
|
|||
#!/usr/bin/env bash
|
||||
# One-shot migration of every private image on registry.viktorbarzin.me to
|
||||
# Forgejo. Used as a stop-gap when the dual-push CI pipelines aren't
|
||||
# producing Forgejo images on their own (Forgejo-Woodpecker forge driver
|
||||
# context-deadline-exceeded issue, see bd code-d3y / 2026-05-07).
|
||||
#
|
||||
# Pulls each image from registry.viktorbarzin.me, retags, pushes to
|
||||
# forgejo.viktorbarzin.me/viktor/<name>:<tag> — preserving the blob bytes
|
||||
# verbatim so the cluster can flip image= without a rebuild.
|
||||
#
|
||||
# Run from any host with docker + network reach to BOTH registries. Auth
|
||||
# from `docker login` (~/.docker/config.json) — make sure both registries
|
||||
# are logged in:
|
||||
# docker login registry.viktorbarzin.me -u viktorbarzin
|
||||
# docker login forgejo.viktorbarzin.me -u viktor # use viktor PAT, not ci-pusher
|
||||
#
|
||||
# (ci-pusher CANNOT push to viktor/<image> — Forgejo container packages
|
||||
# are scoped to the pushing user. Only viktor's PAT can write to viktor/*.)
|
||||
#
|
||||
# After the script, the new image lives at
|
||||
# forgejo.viktorbarzin.me/viktor/<name>:<tag>
|
||||
# Phase 3 of the consolidation flips infra/stacks/<svc>/main.tf image=
|
||||
# to that path.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
OLD_REG=registry.viktorbarzin.me
|
||||
NEW_REG=forgejo.viktorbarzin.me/viktor
|
||||
|
||||
# Image list: <name>:<tag>. Generated 2026-05-07 from `grep -rEn 'image\s*=\s*
|
||||
# "registry\.viktorbarzin\.me'` across infra/stacks/.
|
||||
#
|
||||
# Excluded:
|
||||
# - wealthfolio-sync: registry repo exists but has 0 tags (CronJob has been
|
||||
# broken for 36+ days, separate decision needed). User to triage before
|
||||
# migration.
|
||||
# - fire-planner: registry repo exists but has 0 tags. Dockerfile + CI added
|
||||
# in this session (commit 8b53d99e); rebuild via Woodpecker before flipping.
|
||||
IMAGES=(
|
||||
"chrome-service-novnc:v4"
|
||||
"chrome-service-novnc:latest"
|
||||
"payslip-ingest:latest"
|
||||
"job-hunter:latest"
|
||||
"claude-agent-service:latest"
|
||||
"freedify:latest"
|
||||
"beadboard:latest"
|
||||
"infra-ci:latest"
|
||||
)
|
||||
|
||||
for img in "${IMAGES[@]}"; do
|
||||
echo "=== $img ==="
|
||||
src="$OLD_REG/$img"
|
||||
dst="$NEW_REG/$img"
|
||||
|
||||
# Capture pull output so a missing image can be told apart from a real error.
pull_log=$(mktemp)
if ! docker pull "$src" >"$pull_log" 2>&1; then
  if grep -q 'not found' "$pull_log"; then
    echo "  SKIP: image not present in source registry"
    rm -f "$pull_log"
    continue
  fi
  # Any other pull failure is fatal; show the log before exiting.
  cat "$pull_log" >&2
  rm -f "$pull_log"
  exit 1
fi
rm -f "$pull_log"
|
||||
|
||||
echo " tag → $dst"
|
||||
docker tag "$src" "$dst"
|
||||
|
||||
echo " push $dst"
|
||||
docker push "$dst" 2>&1 | tail -2
|
||||
|
||||
echo " cleanup local copy"
|
||||
docker rmi "$src" "$dst" 2>&1 | tail -1 || true
|
||||
done
|
||||
|
||||
echo ""
|
||||
echo "Done. Verify in Forgejo Web UI: https://forgejo.viktorbarzin.me/viktor/-/packages?type=container"
|
||||
echo "Phase 3 of the plan flips infra/stacks/{wealthfolio,fire-planner}/main.tf image= references."
|
||||
59
scripts/setup-forgejo-containerd-mirror.sh
Executable file
|
|
@ -0,0 +1,59 @@
|
|||
#!/usr/bin/env bash
|
||||
# One-shot deployment of the forgejo.viktorbarzin.me containerd hosts.toml
|
||||
# entry across every k8s node. Cloud-init only fires on VM provision, so
|
||||
# existing nodes need this manual rollout.
|
||||
#
|
||||
# What it does, per node:
|
||||
# 1. drain (ignore-daemonsets, delete-emptydir-data)
|
||||
# 2. ssh in: mkdir + write /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
|
||||
# 3. systemctl restart containerd
|
||||
# 4. uncordon
|
||||
#
|
||||
# hosts.toml is documented as hot-reloaded but the post-2026-04-19
|
||||
# containerd corruption playbook calls for an explicit restart so the
|
||||
# config is unambiguously in effect. Running drain/uncordon around it
|
||||
# avoids pulling against an in-flight containerd restart.
|
||||
#
|
||||
# Re-run is safe: writes are idempotent.
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
CERTS_DIR=/etc/containerd/certs.d/forgejo.viktorbarzin.me
|
||||
HOSTS_TOML='server = "https://forgejo.viktorbarzin.me"
|
||||
|
||||
[host."https://10.0.20.200"]
|
||||
capabilities = ["pull", "resolve"]
|
||||
'
|
||||
|
||||
NODES=$(kubectl get nodes -o name | sed 's|^node/||')
|
||||
if [[ -z "$NODES" ]]; then
|
||||
echo "ERROR: no nodes returned from kubectl get nodes" >&2
|
||||
exit 1
|
||||
fi
|
||||
|
||||
for n in $NODES; do
|
||||
echo "=== $n ==="
|
||||
kubectl drain "$n" --ignore-daemonsets --delete-emptydir-data --force --grace-period=60
|
||||
|
||||
ssh -o StrictHostKeyChecking=accept-new "wizard@$n" sudo bash <<EOF
|
||||
set -euo pipefail
|
||||
mkdir -p "$CERTS_DIR"
|
||||
cat > "$CERTS_DIR/hosts.toml" <<'TOML'
|
||||
$HOSTS_TOML
|
||||
TOML
|
||||
systemctl restart containerd
|
||||
EOF
|
||||
|
||||
kubectl uncordon "$n"
|
||||
|
||||
# Wait (up to 60s) for the node to report Ready before moving to the next
# one. Under `set -e` a timeout aborts the rollout instead of continuing.
kubectl wait --for=condition=Ready "node/$n" --timeout=60s
echo "  node Ready"
|
||||
done
|
||||
|
||||
echo "All nodes updated."
|
||||
|
|
@ -567,7 +567,8 @@ resource "kubernetes_deployment" "beadboard" {
|
|||
|
||||
container {
|
||||
name = "beadboard"
|
||||
image = "registry.viktorbarzin.me:5050/beadboard:${var.beadboard_image_tag}"
|
||||
# Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
|
||||
image = "forgejo.viktorbarzin.me/viktor/beadboard:${var.beadboard_image_tag}"
|
||||
|
||||
port {
|
||||
name = "http"
|
||||
|
|
@ -725,7 +726,8 @@ resource "kubernetes_config_map" "beads_metadata" {
|
|||
}
|
||||
|
||||
locals {
|
||||
claude_agent_service_image = "registry.viktorbarzin.me/claude-agent-service:${var.claude_agent_service_image_tag}"
|
||||
# Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
|
||||
claude_agent_service_image = "forgejo.viktorbarzin.me/viktor/claude-agent-service:${var.claude_agent_service_image_tag}"
|
||||
beadboard_internal_url = "http://${kubernetes_service.beadboard.metadata[0].name}.${kubernetes_namespace.beads.metadata[0].name}.svc.cluster.local"
|
||||
|
||||
beads_script_prelude = <<-EOT
|
||||
|
|
|
|||
90
stacks/chrome-service/README.md
Normal file
|
|
@ -0,0 +1,90 @@
|
|||
# chrome-service
|
||||
|
||||
In-cluster headed Chromium exposed over Playwright's WebSocket protocol.
|
||||
Sibling services drive it instead of running their own in-process browser
|
||||
— useful when the upstream tries to detect headless mode (e.g. hmembeds'
|
||||
`disable-devtool.js` redirect-to-google trap).
|
||||
|
||||
## Connect
|
||||
|
||||
```python
import asyncio
import os
from pathlib import Path

from playwright.async_api import async_playwright

WS_URL = "ws://chrome-service.chrome-service.svc.cluster.local:3000"
WS_TOKEN = os.environ["CHROME_WS_TOKEN"]  # 32-byte URL-safe random
# Vendored copy of files/stealth.js; this relative path is an assumption.
STEALTH_JS = Path("stealth.js").read_text()


async def main() -> None:
    async with async_playwright() as p:
        browser = await p.chromium.connect(f"{WS_URL}/{WS_TOKEN}", timeout=15_000)
        context = await browser.new_context()
        await context.add_init_script(STEALTH_JS)  # see files/stealth.js
        page = await context.new_page()
        ...
        await browser.close()


asyncio.run(main())
```
|
||||
|
||||
The token comes from Vault KV `secret/chrome-service.api_bearer_token`,
|
||||
which ESO syncs into a per-namespace K8s Secret in each caller stack
|
||||
(see f1-stream's `chrome-service-client-secrets`).
|
||||
|
||||
## Add a new caller
|
||||
|
||||
1. **Label the caller's namespace** so the chrome-service NetworkPolicy
|
||||
admits it:
|
||||
```hcl
|
||||
resource "kubernetes_namespace" "<ns>" {
|
||||
metadata {
|
||||
labels = {
|
||||
"chrome-service.viktorbarzin.me/client" = "true"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
2. **Add an ExternalSecret** in the caller stack pulling the token:
|
||||
```hcl
|
||||
resource "kubernetes_manifest" "chrome_token" {
|
||||
manifest = {
|
||||
apiVersion = "external-secrets.io/v1beta1"
|
||||
kind = "ExternalSecret"
|
||||
metadata = { name = "chrome-service-client-secrets", namespace = "<ns>" }
|
||||
spec = {
|
||||
refreshInterval = "15m"
|
||||
secretStoreRef = { name = "vault-kv", kind = "ClusterSecretStore" }
|
||||
target = { name = "chrome-service-client-secrets" }
|
||||
dataFrom = [{ extract = { key = "chrome-service" } }]
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
3. **Inject `CHROME_WS_URL` + `CHROME_WS_TOKEN`** into the caller's pod env.
|
||||
Use `secret_key_ref` for the token; the URL is a plain value.
|
||||
4. **Vendor `stealth.js`** into the caller (or just paste — it's ~40 lines)
|
||||
and apply via `await context.add_init_script(STEALTH_JS)` after every
|
||||
`new_context()`. Without it, hmembeds-class anti-bot still trips.
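
On the caller side, step 3 comes down to reading the two environment variables and joining them into the WebSocket endpoint that `chromium.connect()` receives. This is a minimal sketch: `ws_endpoint` is a hypothetical helper, and the token in the usage lines is a placeholder.

```python
import os


def ws_endpoint(env=None) -> str:
    """Join CHROME_WS_URL and CHROME_WS_TOKEN into the connect endpoint."""
    env = os.environ if env is None else env
    # Tolerate a trailing slash on the URL so the path has exactly one "/".
    url = env["CHROME_WS_URL"].rstrip("/")
    token = env["CHROME_WS_TOKEN"]
    if not token:
        raise ValueError("CHROME_WS_TOKEN is empty")
    return f"{url}/{token}"


example_env = {
    "CHROME_WS_URL": "ws://chrome-service.chrome-service.svc.cluster.local:3000",
    "CHROME_WS_TOKEN": "placeholder-token",
}
print(ws_endpoint(example_env))
```

Failing fast on a missing or empty token gives a clearer error than a connection that the server rejects because the path does not match.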
|
||||
|
||||
## Image pin
|
||||
|
||||
Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
`main.tf`) and the client (`playwright==1.48.0` in callers' requirements)
must be on the same Playwright minor version. Bump them in lockstep,
because the Playwright protocol changes between minors.
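
A lockstep rule like this is easy to check mechanically. Below is a minimal sketch, assuming both pins contain an `X.Y.Z` version string. `minor_version` and `pins_match` are hypothetical helpers and are not part of this stack.

```python
import re


def minor_version(spec: str) -> tuple[int, int]:
    """Extract (major, minor) from the first X.Y.Z found in `spec`."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", spec)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {spec!r}")
    return int(match.group(1)), int(match.group(2))


def pins_match(server_image: str, client_requirement: str) -> bool:
    """True when server image and client pin share the same major.minor."""
    return minor_version(server_image) == minor_version(client_requirement)


SERVER_IMAGE = "mcr.microsoft.com/playwright:v1.48.0-noble"
CLIENT_PIN = "playwright==1.48.0"

print(pins_match(SERVER_IMAGE, CLIENT_PIN))
```

Patch-level differences are deliberately ignored here, since the rule above only concerns the minor version.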
|
||||
|
||||
## Operations
|
||||
|
||||
- **Storage**: encrypted PVC at `/profile` for cookies + npm cache. Ephemeral
|
||||
contexts (`browser.new_context()`) bypass the profile; persistent contexts
|
||||
share it. Backed up tar+gzip every 6h to `/srv/nfs/chrome-service-backup/`,
|
||||
30-day retention.
|
||||
- **Probes**: TCP/3000. Playwright `launch-server` has no HTTP `/health`; a TCP
  open is the only liveness signal available without spinning up a browser.
|
||||
- **Health page**: visit `https://chrome.viktorbarzin.me` (Authentik-gated)
|
||||
to confirm the pod is up. The WS port stays internal-only.
|
||||
- **Token rotation**: `vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
|
||||
Reloader cascades the rotation to both the server pod and any caller
|
||||
whose secret has the `reloader.stakater.com/auto = "true"` annotation.
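
The rotation command relies on `secrets.token_urlsafe(32)`, which matters because the token ends up as a path segment of the WebSocket URL. This short sketch shows the shape of the value it produces. It uses only the standard library and does not talk to Vault.

```python
import re
import secrets

token = secrets.token_urlsafe(32)

# 32 random bytes encode to 43 base64url characters once padding is stripped.
print(len(token))

# Only letters, digits, "-" and "_" appear, so no URL escaping is needed.
print(bool(re.fullmatch(r"[A-Za-z0-9_-]+", token)))
```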
|
||||
|
||||
## Why headed (Xvfb) instead of headless?
|
||||
|
||||
`disable-devtool.js` and similar libraries detect `navigator.webdriver`,
console-clear timing, and the `HeadlessChrome/...` user-agent token.
Running headed inside `Xvfb :99` makes the browser report as a normal
Chromium, and the stealth init script handles the JS-visible giveaways.
|
||||
19
stacks/chrome-service/files/novnc/Dockerfile
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
FROM docker.io/library/ubuntu:24.04
|
||||
|
||||
RUN apt-get update \
|
||||
&& apt-get install -y --no-install-recommends \
|
||||
x11vnc \
|
||||
novnc \
|
||||
websockify \
|
||||
ca-certificates \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
# noVNC ships /usr/share/novnc/vnc.html; alias to index.html so / works.
|
||||
RUN ln -sf /usr/share/novnc/vnc.html /usr/share/novnc/index.html
|
||||
|
||||
EXPOSE 6080
|
||||
|
||||
COPY entrypoint.sh /entrypoint.sh
|
||||
RUN chmod +x /entrypoint.sh
|
||||
|
||||
CMD ["/entrypoint.sh"]
|
||||
39
stacks/chrome-service/files/novnc/entrypoint.sh
Normal file
|
|
@ -0,0 +1,39 @@
|
|||
#!/usr/bin/env bash
|
||||
# Connect to the chrome-service container's Xvfb (shared pod network, TCP)
|
||||
# and serve the noVNC HTML5 client + websockify bridge on :6080.
|
||||
set -e
|
||||
|
||||
for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
|
||||
if echo > /dev/tcp/127.0.0.1/6099 2>/dev/null; then
|
||||
echo "Xvfb TCP up after attempt $i"
|
||||
break
|
||||
fi
|
||||
echo "waiting for Xvfb TCP 6099 attempt=$i"
|
||||
sleep 2
|
||||
done
|
||||
|
||||
# websockify runs as PID 1; x11vnc is a child so its logs land on container stdout.
|
||||
# `-noshm` skips MIT-SHM probes that fail across container boundaries (each
|
||||
# container has its own /dev/shm); `-noxdamage` skips XDAMAGE which Xvfb
|
||||
# doesn't expose; `-quiet` keeps the polling chatter out of pod logs.
|
||||
echo "starting x11vnc -> :5900"
|
||||
x11vnc -display localhost:99 -nopw -listen 0.0.0.0 -rfbport 5900 \
|
||||
-forever -shared -noshm -noxdamage -quiet 2>&1 &
|
||||
X11VNC_PID=$!
|
||||
|
||||
for i in 1 2 3 4 5 6 7 8 9 10; do
|
||||
if echo > /dev/tcp/127.0.0.1/5900 2>/dev/null; then
|
||||
echo "x11vnc bound 5900 after attempt $i"
|
||||
break
|
||||
fi
|
||||
echo "waiting for x11vnc :5900 attempt=$i"
|
||||
sleep 2
|
||||
done
|
||||
|
||||
if ! echo > /dev/tcp/127.0.0.1/5900 2>/dev/null; then
|
||||
echo "ERROR: x11vnc did not bind 5900"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "starting websockify -> :6080"
|
||||
exec websockify --web=/usr/share/novnc 6080 localhost:5900
|
||||
54
stacks/chrome-service/files/stealth.js
Normal file
|
|
@ -0,0 +1,54 @@
|
|||
// Minimal stealth init script for Playwright-driven Chromium.
|
||||
// Vendored from puppeteer-extra-plugin-stealth/evasions/* (MIT) — covers:
|
||||
// webdriver, chrome.runtime, navigator.plugins, navigator.languages,
|
||||
// Permissions.query, WebGL getParameter (vendor + renderer spoof).
|
||||
// Run via context.add_init_script() so it executes before any page script.
|
||||
(() => {
|
||||
// navigator.webdriver: the most common detection; overridden to return undefined.
|
||||
Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => undefined });
|
||||
|
||||
// window.chrome.runtime — many sites check that real Chrome exposes this.
|
||||
if (!window.chrome) window.chrome = {};
|
||||
window.chrome.runtime = window.chrome.runtime || {};
|
||||
|
||||
// navigator.plugins — headless reports zero; spoof a plausible PDF viewer.
|
||||
Object.defineProperty(navigator, 'plugins', {
|
||||
get: () => [{ name: 'Chrome PDF Plugin' }, { name: 'Chrome PDF Viewer' }, { name: 'Native Client' }],
|
||||
});
|
||||
|
||||
// navigator.languages — headless returns empty array.
|
||||
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
|
||||
|
||||
// Permissions.query — headless returns 'denied' for notifications instead of 'default'.
|
||||
const origQuery = window.navigator.permissions && window.navigator.permissions.query;
|
||||
if (origQuery) {
|
||||
window.navigator.permissions.query = (parameters) =>
|
||||
parameters && parameters.name === 'notifications'
|
||||
? Promise.resolve({ state: Notification.permission })
|
||||
: origQuery(parameters);
|
||||
}
|
||||
|
||||
// WebGL getParameter — spoof vendor + renderer strings to a real GPU.
|
||||
const spoofGl = (proto) => {
|
||||
if (!proto) return;
|
||||
const orig = proto.getParameter;
|
||||
proto.getParameter = function (parameter) {
|
||||
if (parameter === 37445) return 'Intel Inc.'; // UNMASKED_VENDOR_WEBGL
|
||||
if (parameter === 37446) return 'Intel Iris OpenGL Engine'; // UNMASKED_RENDERER_WEBGL
|
||||
return orig.apply(this, arguments);
|
||||
};
|
||||
};
|
||||
spoofGl(window.WebGLRenderingContext && window.WebGLRenderingContext.prototype);
|
||||
spoofGl(window.WebGL2RenderingContext && window.WebGL2RenderingContext.prototype);
|
||||
|
||||
// disable-devtool.js (theajack/disable-devtool) auto-inits via a script
|
||||
// tag with `disable-devtool-auto`. Its Performance detector trips under
|
||||
// Playwright (CDP adds console.log latency vs console.table) and the
|
||||
// redirect URL is hard-coded — for hmembeds that's google.com.
|
||||
// Hide the auto-init marker so the library's IIFE exits early.
|
||||
const origQS = Document.prototype.querySelector;
|
||||
Document.prototype.querySelector = function (sel) {
|
||||
if (typeof sel === 'string' && sel.indexOf('disable-devtool-auto') !== -1) return null;
|
||||
return origQS.apply(this, arguments);
|
||||
};
|
||||
})();
|
||||
504
stacks/chrome-service/main.tf
Normal file
|
|
@ -0,0 +1,504 @@
|
|||
variable "tls_secret_name" {
|
||||
type = string
|
||||
sensitive = true
|
||||
}
|
||||
variable "nfs_server" { type = string }
|
||||
|
||||
locals {
|
||||
namespace = "chrome-service"
|
||||
labels = {
|
||||
app = "chrome-service"
|
||||
}
|
||||
# Pin to the same Playwright minor that the Python client requires.
|
||||
# If you bump this image, also bump `playwright==X.Y.Z` in the client
|
||||
# (currently f1-stream) and re-run the connect smoke test.
|
||||
image = "mcr.microsoft.com/playwright:v1.48.0-noble"
|
||||
}
|
||||
|
||||
# --- Namespace ---
|
||||
|
||||
resource "kubernetes_namespace" "chrome_service" {
|
||||
metadata {
|
||||
name = local.namespace
|
||||
labels = {
|
||||
"istio-injection" = "disabled"
|
||||
tier = local.tiers.aux
|
||||
"chrome-service.viktorbarzin.me/server" = "true"
|
||||
}
|
||||
}
|
||||
lifecycle {
|
||||
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
|
||||
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
|
||||
}
|
||||
}
|
||||
|
||||
# --- Secrets (single-key extract: api_bearer_token) ---
|
||||
|
||||
resource "kubernetes_manifest" "external_secret" {
|
||||
manifest = {
|
||||
apiVersion = "external-secrets.io/v1beta1"
|
||||
kind = "ExternalSecret"
|
||||
metadata = {
|
||||
name = "chrome-service-secrets"
|
||||
namespace = local.namespace
|
||||
}
|
||||
spec = {
|
||||
refreshInterval = "15m"
|
||||
secretStoreRef = {
|
||||
name = "vault-kv"
|
||||
kind = "ClusterSecretStore"
|
||||
}
|
||||
target = {
|
||||
name = "chrome-service-secrets"
|
||||
}
|
||||
dataFrom = [{
|
||||
extract = {
|
||||
key = "chrome-service"
|
||||
}
|
||||
}]
|
||||
}
|
||||
}
|
||||
depends_on = [kubernetes_namespace.chrome_service]
|
||||
}
|
||||
|
||||
# tls-secret for the chrome.viktorbarzin.me ingress is auto-cloned into
|
||||
# every namespace by Kyverno's `sync-tls-secret` ClusterPolicy — no local
|
||||
# module call needed.
|
||||
|
||||
# --- Encrypted profile PVC ---
|
||||
# Holds Chromium user data: cookies, localStorage, IndexedDB. Sites we
|
||||
# drive may set auth tokens or session cookies — encrypted is correct.
|
||||
resource "kubernetes_persistent_volume_claim" "profile_encrypted" {
|
||||
wait_until_bound = false
|
||||
metadata {
|
||||
name = "chrome-service-profile-encrypted"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
annotations = {
|
||||
"resize.topolvm.io/threshold" = "80%"
|
||||
"resize.topolvm.io/increase" = "100%"
|
||||
"resize.topolvm.io/storage_limit" = "10Gi"
|
||||
}
|
||||
}
|
||||
spec {
|
||||
access_modes = ["ReadWriteOnce"]
|
||||
storage_class_name = "proxmox-lvm-encrypted"
|
||||
resources {
|
||||
requests = {
|
||||
storage = "2Gi"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# --- NFS backup target ---
|
||||
module "nfs_chrome_service_backup_host" {
|
||||
source = "../../modules/kubernetes/nfs_volume"
|
||||
name = "chrome-service-backup-host"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
nfs_server = "192.168.1.127"
|
||||
nfs_path = "/srv/nfs/chrome-service-backup"
|
||||
}
|
||||
|
||||
# --- Deployment ---
|
||||
|
||||
resource "kubernetes_deployment" "chrome_service" {
|
||||
metadata {
|
||||
name = "chrome-service"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
labels = merge(local.labels, {
|
||||
tier = local.tiers.aux
|
||||
})
|
||||
annotations = {
|
||||
"reloader.stakater.com/auto" = "true"
|
||||
}
|
||||
}
|
||||
spec {
|
||||
replicas = 1
|
||||
strategy {
|
||||
type = "Recreate"
|
||||
}
|
||||
selector {
|
||||
match_labels = local.labels
|
||||
}
|
||||
template {
|
||||
metadata {
|
||||
labels = local.labels
|
||||
}
|
||||
spec {
|
||||
# The noVNC sidecar pulls from registry.viktorbarzin.me which needs
|
||||
# auth. Kyverno's `sync-registry-credentials` ClusterPolicy syncs
|
||||
# the secret into every namespace.
|
||||
image_pull_secrets {
|
||||
name = "registry-credentials"
|
||||
}
|
||||
security_context {
|
||||
run_as_user = 1000
|
||||
run_as_group = 1000
|
||||
fs_group = 1000
|
||||
seccomp_profile {
|
||||
type = "RuntimeDefault"
|
||||
}
|
||||
}
|
||||
|
||||
# Fix profile dir ownership (PVC may have root-owned files from prior run).
|
||||
init_container {
|
||||
name = "fix-perms"
|
||||
image = "busybox:1.37"
|
||||
command = ["sh", "-c", "chown -R 1000:1000 /profile"]
|
||||
security_context {
|
||||
run_as_user = 0
|
||||
}
|
||||
volume_mount {
|
||||
name = "profile"
|
||||
mount_path = "/profile"
|
||||
}
|
||||
resources {
|
||||
requests = { memory = "32Mi" }
|
||||
limits = { memory = "64Mi" }
|
||||
}
|
||||
}
|
||||
|
||||
container {
|
||||
name = "chrome-service"
|
||||
image = local.image
|
||||
image_pull_policy = "IfNotPresent"
|
||||
|
||||
# `launch-server` (not `run-server`) lets us pin headed mode +
# specific args. `run-server` defaults to headless, which trips the
# disable-devtool.js Performance detector under Playwright
# (CDP adds latency to console.log; lib detects + redirects).
|
||||
# The Microsoft image ships only the browsers, not the playwright
|
||||
# npm package itself — `npx -y playwright@<ver>` downloads it on
|
||||
# first start (cached under $HOME/.npm via the PVC) and pins to
|
||||
# the same minor as the Python client. Bump in lockstep.
|
||||
command = ["bash", "-c"]
|
||||
args = [
|
||||
<<-EOT
|
||||
set -e
|
||||
# `-listen tcp` enables localhost:6099 so the noVNC sidecar can
|
||||
# connect over the pod's shared network namespace (Ubuntu 24.04
|
||||
# defaults Xvfb to -nolisten tcp).
|
||||
# `-ac` disables X access control so the noVNC sidecar can
|
||||
# attach without an MIT-MAGIC-COOKIE; safe because Xvfb only
|
||||
# listens on localhost (pod's lo).
|
||||
Xvfb :99 -screen 0 1280x720x24 -listen tcp -ac &
|
||||
sleep 1
|
||||
cat > /tmp/launch.json <<JSON
|
||||
{
|
||||
"headless": false,
|
||||
"port": 3000,
|
||||
"host": "0.0.0.0",
|
||||
"wsPath": "/$${PW_TOKEN}",
|
||||
"args": [
|
||||
"--no-sandbox",
|
||||
"--disable-blink-features=AutomationControlled",
|
||||
"--disable-features=IsolateOrigins,site-per-process",
|
||||
"--autoplay-policy=no-user-gesture-required",
|
||||
"--disable-dev-shm-usage"
|
||||
]
|
||||
}
|
||||
JSON
|
||||
exec npx -y playwright@1.48.0 launch-server --browser chromium --config /tmp/launch.json
|
||||
EOT
|
||||
]
|
||||
|
||||
env {
|
||||
name = "DISPLAY"
|
||||
value = ":99"
|
||||
}
|
||||
env {
|
||||
name = "HOME"
|
||||
value = "/profile"
|
||||
}
|
||||
env {
|
||||
name = "PW_TOKEN"
|
||||
value_from {
|
||||
secret_key_ref {
|
||||
name = "chrome-service-secrets"
|
||||
key = "api_bearer_token"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
port {
|
||||
name = "ws"
|
||||
container_port = 3000
|
||||
protocol = "TCP"
|
||||
}
|
||||
|
||||
# Playwright launch-server exposes only the WS endpoint; no /health.
|
||||
liveness_probe {
|
||||
tcp_socket { port = 3000 }
|
||||
initial_delay_seconds = 30
|
||||
period_seconds = 30
|
||||
failure_threshold = 3
|
||||
}
|
||||
readiness_probe {
|
||||
tcp_socket { port = 3000 }
|
||||
initial_delay_seconds = 10
|
||||
period_seconds = 10
|
||||
}
|
||||
startup_probe {
|
||||
tcp_socket { port = 3000 }
|
||||
period_seconds = 5
|
||||
failure_threshold = 24 # up to 2 minutes
|
||||
}
|
||||
|
||||
volume_mount {
|
||||
name = "profile"
|
||||
mount_path = "/profile"
|
||||
}
|
||||
volume_mount {
|
||||
name = "dshm"
|
||||
mount_path = "/dev/shm"
|
||||
}
|
||||
|
||||
resources {
|
||||
requests = {
|
||||
cpu = "200m"
|
||||
memory = "1500Mi"
|
||||
}
|
||||
limits = {
|
||||
memory = "2Gi"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# noVNC sidecar — exposes a live HTML5 view of the headed Chromium
|
||||
# session via x11vnc + websockify, gated by the Authentik-protected
|
||||
# ingress at chrome.viktorbarzin.me. WS port 3000 (the Playwright
|
||||
# endpoint) stays internal-only.
|
||||
container {
|
||||
name = "novnc"
|
||||
# Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
|
||||
image = "forgejo.viktorbarzin.me/viktor/chrome-service-novnc:v4"
|
||||
image_pull_policy = "IfNotPresent"
|
||||
port {
|
||||
name = "http"
|
||||
container_port = 6080
|
||||
protocol = "TCP"
|
||||
}
|
||||
# x11vnc connects to the chrome-service container's Xvfb over
|
||||
# localhost TCP (shared pod network). Same uid 1000 as chrome
|
||||
# container so we can read MIT-MAGIC-COOKIE if Xvfb adds one.
|
||||
resources {
|
||||
requests = { cpu = "10m", memory = "32Mi" }
|
||||
limits = { memory = "96Mi" }
|
||||
}
|
||||
}
|
||||
|
||||
volume {
|
||||
name = "profile"
|
||||
persistent_volume_claim {
|
||||
claim_name = kubernetes_persistent_volume_claim.profile_encrypted.metadata[0].name
|
||||
}
|
||||
}
|
||||
volume {
|
||||
name = "dshm"
|
||||
empty_dir {
|
||||
medium = "Memory"
|
||||
size_limit = "256Mi"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
lifecycle {
|
||||
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
||||
ignore_changes = [spec[0].template[0].spec[0].dns_config]
|
||||
}
|
||||
}
|
||||
|
||||
# --- Services ---
|
||||
# WS endpoint (internal only, gated by NetworkPolicy + token).
|
||||
resource "kubernetes_service" "chrome_service" {
|
||||
metadata {
|
||||
name = "chrome-service"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
labels = local.labels
|
||||
}
|
||||
|
||||
spec {
|
||||
selector = local.labels
|
||||
port {
|
||||
name = "ws"
|
||||
port = 3000
|
||||
target_port = 3000
|
||||
protocol = "TCP"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# noVNC view (Authentik-gated, exposed via ingress).
|
||||
resource "kubernetes_service" "chrome_novnc" {
|
||||
metadata {
|
||||
name = "chrome"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
labels = local.labels
|
||||
}
|
||||
|
||||
spec {
|
||||
selector = local.labels
|
||||
port {
|
||||
name = "http"
|
||||
port = 80
|
||||
target_port = 6080
|
||||
protocol = "TCP"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
module "ingress" {
|
||||
source = "../../modules/kubernetes/ingress_factory"
|
||||
dns_type = "proxied"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
name = "chrome"
|
||||
tls_secret_name = var.tls_secret_name
|
||||
protected = true
|
||||
# noVNC defaults to /vnc.html — auto-redirect / there.
|
||||
ingress_path = ["/"]
|
||||
extra_annotations = {
|
||||
"gethomepage.dev/enabled" = "true"
|
||||
"gethomepage.dev/name" = "Chrome Service"
|
||||
"gethomepage.dev/description" = "Live noVNC view of headed Chromium"
|
||||
"gethomepage.dev/icon" = "chromium.png"
|
||||
"gethomepage.dev/group" = "Infrastructure"
|
||||
}
|
||||
}
|
||||
|
||||
# --- NetworkPolicy: scoped ingress.
|
||||
# - TCP/3000 (Playwright WS): only from labelled client namespaces.
|
||||
# - TCP/6080 (noVNC HTTP+WS): only from the traefik namespace, since the
|
||||
# public-facing path is `chrome.viktorbarzin.me` ingress → Traefik →
|
||||
# sidecar. Authentik forward-auth still gates external access at the
|
||||
# Traefik layer.
|
||||
# The cluster has no default-deny, so this NP only takes effect inside
|
||||
# chrome-service ns — pods elsewhere remain unaffected.
|
||||
resource "kubernetes_network_policy_v1" "ws_ingress" {
|
||||
metadata {
|
||||
name = "chrome-service-ws-ingress"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
}
|
||||
spec {
|
||||
pod_selector {
|
||||
match_labels = local.labels
|
||||
}
|
||||
policy_types = ["Ingress"]
|
||||
ingress {
|
||||
from {
|
||||
namespace_selector {
|
||||
match_labels = {
|
||||
"chrome-service.viktorbarzin.me/client" = "true"
|
||||
}
|
||||
}
|
||||
}
|
||||
# Explicit fallback list — admit f1-stream by name in case the label
|
||||
# is removed by accident. Keep this in sync with the labels above.
|
||||
from {
|
||||
namespace_selector {
|
||||
match_labels = {
|
||||
"kubernetes.io/metadata.name" = "f1-stream"
|
||||
}
|
||||
}
|
||||
}
|
||||
ports {
|
||||
port = "3000"
|
||||
protocol = "TCP"
|
||||
}
|
||||
}
|
||||
ingress {
|
||||
from {
|
||||
namespace_selector {
|
||||
match_labels = {
|
||||
"kubernetes.io/metadata.name" = "traefik"
|
||||
}
|
||||
}
|
||||
}
|
||||
ports {
|
||||
port = "6080"
|
||||
protocol = "TCP"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# --- Backup CronJob: tar+gzip the profile every 6h, 30-day retention. ---
|
||||
resource "kubernetes_cron_job_v1" "chrome_service_backup" {
|
||||
metadata {
|
||||
name = "chrome-service-backup"
|
||||
namespace = kubernetes_namespace.chrome_service.metadata[0].name
|
||||
}
|
||||
spec {
|
||||
concurrency_policy = "Replace"
|
||||
failed_jobs_history_limit = 3
|
||||
successful_jobs_history_limit = 1
|
||||
schedule = "47 */6 * * *"
|
||||
starting_deadline_seconds = 60
|
||||
job_template {
|
||||
metadata {}
|
||||
spec {
|
||||
backoff_limit = 2
|
||||
ttl_seconds_after_finished = 300
|
||||
template {
|
||||
metadata {}
|
||||
spec {
|
||||
# PVC is RWO — colocate the backup pod with the chrome-service
|
||||
# pod so both can mount the volume on the same node.
|
||||
affinity {
|
||||
pod_affinity {
|
||||
required_during_scheduling_ignored_during_execution {
|
||||
label_selector {
|
||||
match_labels = local.labels
|
||||
}
|
||||
topology_key = "kubernetes.io/hostname"
|
||||
}
|
||||
}
|
||||
}
|
||||
container {
|
||||
name = "backup"
|
||||
image = "docker.io/library/alpine:3.20"
|
||||
command = ["/bin/sh", "-c", <<-EOT
|
||||
set -euxo pipefail
|
||||
ts=$(date +"%Y_%m_%d_%H")
|
||||
tar -czf /backup/$${ts}.tar.gz -C /profile .
|
||||
find /backup -maxdepth 1 -type f -name '*.tar.gz' -mtime +30 -delete
|
||||
echo "Backup complete: $${ts}.tar.gz"
|
||||
EOT
|
||||
]
|
||||
volume_mount {
|
||||
name = "profile"
|
||||
mount_path = "/profile"
|
||||
read_only = true
|
||||
}
|
||||
volume_mount {
|
||||
name = "backup"
|
||||
mount_path = "/backup"
|
||||
}
|
||||
resources {
|
||||
requests = { cpu = "10m", memory = "32Mi" }
|
||||
limits = { memory = "64Mi" }
|
||||
}
|
||||
}
|
||||
volume {
|
||||
name = "profile"
|
||||
persistent_volume_claim {
|
||||
claim_name = kubernetes_persistent_volume_claim.profile_encrypted.metadata[0].name
|
||||
}
|
||||
}
|
||||
volume {
|
||||
name = "backup"
|
||||
persistent_volume_claim {
|
||||
claim_name = module.nfs_chrome_service_backup_host.claim_name
|
||||
}
|
||||
}
|
||||
restart_policy = "OnFailure"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
lifecycle {
|
||||
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
||||
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
|
||||
}
|
||||
}
|
||||
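The backup job's retention step relies on `find -mtime +30`, which compares whole days of age: the fractional part is discarded, so an archive is deleted only once it is at least 31 full days old. A minimal Python sketch of that rule follows; `is_expired` is a hypothetical helper name used for illustration and is not part of the stack.

```python
SECONDS_PER_DAY = 86_400


def is_expired(mtime: float, now: float, days: int = 30) -> bool:
    """Mimic `find -mtime +DAYS`: the whole-day age must exceed DAYS."""
    # Truncate to whole days first, as find does, then compare strictly.
    age_days = int((now - mtime) // SECONDS_PER_DAY)
    return age_days > days


now = 1_800_000_000.0  # arbitrary reference timestamp for the demo
kept = is_expired(now - 30.5 * SECONDS_PER_DAY, now)    # 30 whole days old
pruned = is_expired(now - 31 * SECONDS_PER_DAY, now)    # 31 whole days old
print(kept, pruned)
```

With four runs a day, this means roughly 31 days of archives can sit on the backup volume before the oldest ones are pruned.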
8
stacks/chrome-service/terragrunt.hcl
Normal file
|
|
@ -0,0 +1,8 @@
|
|||
include "root" {
|
||||
path = find_in_parent_folders()
|
||||
}
|
||||
|
||||
dependency "platform" {
|
||||
config_path = "../platform"
|
||||
skip_outputs = true
|
||||
}
|
||||
|
|
@ -10,7 +10,8 @@ data "vault_kv_secret_v2" "viktor_secrets" {
|
|||
|
||||
locals {
|
||||
namespace = "claude-agent"
|
||||
image = "registry.viktorbarzin.me/claude-agent-service"
|
||||
# Phase 3 cutover 2026-05-07 — see infra/docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md.
|
||||
image = "forgejo.viktorbarzin.me/viktor/claude-agent-service"
|
||||
image_tag = "2fd7670d"
|
||||
labels = {
|
||||
app = "claude-agent-service"
|
||||
|
|
|
|||
|
|
@ -175,8 +175,10 @@ resource "kubernetes_deployment" "claude-memory" {
|
|||
}
|
||||
}
|
||||
container {
|
||||
name = "claude-memory"
|
||||
image = "viktorbarzin/claude-memory-mcp:17"
|
||||
name = "claude-memory"
|
||||
# Phase 3 cutover 2026-05-07 — moved off DockerHub to Forgejo as
|
||||
# part of the registry consolidation. Old: viktorbarzin/claude-memory-mcp:17
|
||||
image = "forgejo.viktorbarzin.me/viktor/claude-memory-mcp:17"
|
||||
|
||||
port {
|
||||
container_port = 8000
|
||||
|
|
|
|||
|
|
@ -14,9 +14,26 @@ FROM python:3.13-slim-bookworm
|
|||
|
||||
WORKDIR /app
|
||||
|
||||
# Headless Chromium runtime libs for the playback verifier. Listed inline
|
||||
# (instead of running `playwright install-deps`) so the image build doesn't
|
||||
# need root-network apt fetches at runtime.
|
||||
RUN apt-get update && apt-get install -y --no-install-recommends \
|
||||
ca-certificates \
|
||||
libnss3 libnspr4 \
|
||||
libatk1.0-0 libatk-bridge2.0-0 libcups2 \
|
||||
libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
|
||||
libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 \
|
||||
libasound2 libatspi2.0-0 \
|
||||
fonts-liberation fonts-noto-color-emoji \
|
||||
&& rm -rf /var/lib/apt/lists/*
|
||||
|
||||
COPY backend/requirements.txt .
|
||||
RUN pip install --no-cache-dir -r requirements.txt
|
||||
|
||||
# Install the Chromium browser binary used by the verifier. Skip
|
||||
# --with-deps because we already installed the system libs above.
|
||||
RUN playwright install chromium
|
||||
|
||||
COPY backend/ ./backend/
|
||||
|
||||
# Copy built frontend into the image
|
||||
|
|
|
|||
359
stacks/f1-stream/files/backend/embed_proxy.py
Normal file
|
|
@ -0,0 +1,359 @@
|
|||
"""Embed iframe-stripping reverse proxy.
|
||||
|
||||
Serves third-party embed pages (e.g. https://hmembeds.one/embed/{hash},
|
||||
https://pooembed.eu/embed/{slug}) through our origin so we can:
|
||||
|
||||
1. Strip X-Frame-Options and Content-Security-Policy: frame-ancestors headers,
|
||||
so the embed loads in our <iframe> regardless of upstream policy.
|
||||
2. Inject <base> + a frame-buster-defeat <script> at the top of <head> so
|
||||
the embed's JS sees `window.top === window` and a plausible
|
||||
`document.referrer` pointing at the upstream origin.
|
||||
3. Forward Referer / User-Agent matching the upstream's own pages so
|
||||
the upstream's hotlink / origin-allowlist checks pass.
|
||||
|
||||
Two endpoints:
|
||||
- GET /embed?url=<base64url> — the embed HTML page (rewritten).
|
||||
- GET /embed-asset?url=<base64url> — fallback for any subresource the
|
||||
upstream blocks based on hotlink protection. Most assets load directly
|
||||
via the injected <base> tag and bypass our proxy.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
from typing import AsyncGenerator
|
||||
from urllib.parse import urlparse
|
||||
|
||||
import httpx
|
||||
from fastapi import HTTPException
|
||||
|
||||
from backend.m3u8_rewriter import decode_url
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
EMBED_TIMEOUT = 20.0
|
||||
ASSET_TIMEOUT = 30.0
|
||||
RELAY_CHUNK_SIZE = 65536
|
||||
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
# Response headers we never forward (they break frame embedding or leak upstream policy).
|
||||
STRIP_RESPONSE_HEADERS = {
|
||||
"x-frame-options",
|
||||
"content-security-policy",
|
||||
"content-security-policy-report-only",
|
||||
"set-cookie",
|
||||
"report-to",
|
||||
"nel",
|
||||
"permissions-policy",
|
||||
"cross-origin-opener-policy",
|
||||
"cross-origin-embedder-policy",
|
||||
"cross-origin-resource-policy",
|
||||
# let httpx/uvicorn re-set these
|
||||
"transfer-encoding",
|
||||
"content-encoding",
|
||||
"content-length",
|
||||
"connection",
|
||||
}
|
||||
|
||||
# Inject this <script> at the top of <head> to defeat JS frame-busters.
|
||||
# - Tries to pin window.top and window.parent to the embed window itself
|
||||
# (and frameElement to null), so `self !== window.top` checks pass.
|
||||
# - Forces document.referrer to the upstream origin so allowlist checks
|
||||
# like `document.referrer.includes("timstreams.net")` keep working.
|
||||
# - No-ops anything that would call window.parent.location or attempt to
|
||||
# reload the top frame.
|
||||
_FRAME_BUSTER_DEFEAT_TEMPLATE = """
|
||||
<script>(function(){{
|
||||
// window.top is unforgeable, so defineProperty on it throws; guard each
|
||||
// override separately so one failure does not skip the rest.
|
||||
var fakeWindow = window;
|
||||
var lock = function(o, k, v){{ try {{ Object.defineProperty(o, k, {{get: function(){{return v;}}, configurable: false}}); }} catch (e) {{}} }};
|
||||
lock(window, 'top', fakeWindow);
|
||||
lock(window, 'parent', fakeWindow);
|
||||
lock(window, 'frameElement', null);
|
||||
lock(document, 'referrer', {referrer!r});
|
||||
// Defeat the `disable-devtool.js` redirect trap that hmembeds and similar
|
||||
// embed hosts use. The trap fires `console.clear`/`console.table` in a
|
||||
// tight loop, then if it thinks DevTools is open, calls
|
||||
// `window.location = "https://www.google.com"`. We block those redirect
|
||||
// sinks while leaving normal playback unaffected.
|
||||
try {{
|
||||
var noop = function(){{}};
|
||||
console.clear = noop;
|
||||
console.table = noop;
|
||||
console.dir = noop;
|
||||
var loc = window.location;
|
||||
Object.defineProperty(window, 'location', {{
|
||||
get: function(){{ return loc; }},
|
||||
set: function(v){{ /* swallow assignment */ }},
|
||||
configurable: false,
|
||||
}});
|
||||
var origAssign = loc.assign && loc.assign.bind(loc);
|
||||
var origReplace = loc.replace && loc.replace.bind(loc);
|
||||
loc.assign = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origAssign) origAssign(u); }};
|
||||
loc.replace = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origReplace) origReplace(u); }};
|
||||
}} catch (e) {{}}
|
||||
|
||||
// Route all cross-origin fetch/XHR requests through our /embed-asset
|
||||
// proxy. The hmembeds player calls a token-binding endpoint
|
||||
// (hghndasw.gbgdhdffhf.shop/sec/<JWT>) that CORS-rejects requests from
|
||||
// any origin other than hmembeds.one. By rewriting the URL to
|
||||
// /embed-asset?url=..., the browser fetches our same-origin endpoint
|
||||
// (no CORS issue), and our backend fetches the upstream with the
|
||||
// correct Referer/Origin server-side (no CORS issue there either).
|
||||
try {{
|
||||
var b64url = function(s) {{
|
||||
return btoa(unescape(encodeURIComponent(s)))
|
||||
.replace(/\\+/g, '-').replace(/\\//g, '_').replace(/=+$/, '');
|
||||
}};
|
||||
var sameOrigin = function(u) {{
|
||||
try {{ return (new URL(u, document.baseURI || location.href)).origin === location.origin; }}
|
||||
catch (_) {{ return true; }}
|
||||
}};
|
||||
var toAbsolute = function(u) {{
|
||||
try {{ return (new URL(u, document.baseURI || location.href)).toString(); }}
|
||||
catch (_) {{ return u; }}
|
||||
}};
|
||||
var proxify = function(u) {{
|
||||
var abs = toAbsolute(u);
|
||||
if (sameOrigin(abs)) return u;
|
||||
// Don't double-proxy.
|
||||
if (abs.indexOf('/embed-asset?') !== -1 || abs.indexOf('/embed?') !== -1) return u;
|
||||
return location.origin + '/embed-asset?url=' + b64url(abs);
|
||||
}};
|
||||
|
||||
var _fetch = window.fetch && window.fetch.bind(window);
|
||||
if (_fetch) {{
|
||||
window.fetch = function(input, init) {{
|
||||
try {{
|
||||
if (typeof input === 'string') {{
|
||||
return _fetch(proxify(input), init);
|
||||
}} else if (input && input.url) {{
|
||||
var newUrl = proxify(input.url);
|
||||
if (newUrl !== input.url) {{
|
||||
return _fetch(new Request(newUrl, input), init);
|
||||
}}
|
||||
}}
|
||||
}} catch (e) {{}}
|
||||
return _fetch(input, init);
|
||||
}};
|
||||
}}
|
||||
|
||||
var XHR = window.XMLHttpRequest;
|
||||
if (XHR && XHR.prototype && XHR.prototype.open) {{
|
||||
var _open = XHR.prototype.open;
|
||||
XHR.prototype.open = function(method, url) {{
|
||||
try {{ url = proxify(url); }} catch (e) {{}}
|
||||
var args = Array.prototype.slice.call(arguments);
|
||||
args[1] = url;
|
||||
return _open.apply(this, args);
|
||||
}};
|
||||
}}
|
||||
}} catch (e) {{}}
|
||||
}})();</script>
|
||||
"""
|
||||
|
||||
|
||||
def _decode(encoded_url: str) -> str:
|
||||
try:
|
||||
return decode_url(encoded_url)
|
||||
except Exception as e:
|
||||
raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")
|
||||
|
||||
|
||||
def _filter_headers(upstream_headers: httpx.Headers) -> dict[str, str]:
|
||||
"""Forward upstream headers minus the ones we strip."""
|
||||
out: dict[str, str] = {}
|
||||
for k, v in upstream_headers.items():
|
||||
if k.lower() in STRIP_RESPONSE_HEADERS:
|
||||
continue
|
||||
out[k] = v
|
||||
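# Replace any upstream CORS header (httpx yields lowercase keys) so the
|
||||
# response carries a single Access-Control-Allow-Origin value.
|
||||
out.pop("access-control-allow-origin", None)
|
||||
out["Access-Control-Allow-Origin"] = "*"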
|
||||
out["X-Frame-Options-Stripped"] = "by-f1-embed-proxy"
|
||||
return out
|
||||
|
||||
|
||||
def _make_referer(upstream_url: str) -> str:
|
||||
"""Build a plausible Referer header — the upstream's own root."""
|
||||
parsed = urlparse(upstream_url)
|
||||
return f"{parsed.scheme}://{parsed.netloc}/"
|
||||
|
||||
|
||||
def _make_origin(upstream_url: str) -> str:
|
||||
parsed = urlparse(upstream_url)
|
||||
return f"{parsed.scheme}://{parsed.netloc}"
|
||||
|
||||
|
||||
def _inject_into_head(html: str, upstream_url: str) -> str:
|
||||
"""Inject <base> tag + frame-buster defeat script into the response HTML."""
|
||||
parsed = urlparse(upstream_url)
|
||||
base_href = f"{parsed.scheme}://{parsed.netloc}/"
|
||||
|
||||
# The frame-buster-defeat script. Use the upstream's own URL as the spoofed referrer.
|
||||
busted = _FRAME_BUSTER_DEFEAT_TEMPLATE.format(referrer=upstream_url)
|
||||
|
||||
base_tag = f'<base href="{base_href}">'
|
||||
|
||||
injection = base_tag + busted
|
||||
|
||||
# Drop any inline CSP <meta> tags first so they can't override our header strip.
|
||||
html = re.sub(
|
||||
r'<meta[^>]+http-equiv=[\'"]?Content-Security-Policy[\'"]?[^>]*>',
|
||||
"",
|
||||
html,
|
||||
flags=re.IGNORECASE,
|
||||
)
|
||||
|
||||
# Strip disable-devtool.js script tags. The library runs detection heuristics
|
||||
# and redirects on match. Removing it reduces attack surface even with our
|
||||
# location-setter lockdown — saves redundant work and one fewer thing to
|
||||
# bypass in case the lockdown misses an edge case.
|
||||
html = re.sub(
|
||||
r'<script[^>]+(?:disable-devtool|devtool|disabledevtool)[^<]*</script>',
|
||||
"",
|
||||
html,
|
||||
flags=re.IGNORECASE,
|
||||
)
|
||||
html = re.sub(
|
||||
r'<script[^>]+src=["\'][^"\']*disable-devtool[^"\']*["\'][^>]*></script>',
|
||||
"",
|
||||
html,
|
||||
flags=re.IGNORECASE,
|
||||
)
|
||||
|
||||
# Insert immediately after the opening <head> (case-insensitive).
|
||||
head_match = re.search(r"<head[^>]*>", html, flags=re.IGNORECASE)
|
||||
if head_match:
|
||||
idx = head_match.end()
|
||||
return html[:idx] + injection + html[idx:]
|
||||
|
||||
# No <head> — prepend at the start of the document so the script runs first.
|
||||
return injection + html
|
||||
|
||||
|
||||
def _looks_blocked_by_anti_bot(content: str) -> bool:
|
||||
"""Detect Cloudflare-style challenge interstitials in the upstream body."""
|
||||
sample = content[:4096].lower()
|
||||
markers = (
|
||||
"cf-chl-bypass",
|
||||
"checking your browser",
|
||||
"just a moment",
|
||||
"attention required",
|
||||
"cf-browser-verification",
|
||||
)
|
||||
return any(m in sample for m in markers)
|
||||
|
||||
|
||||
async def fetch_embed(encoded_url: str) -> tuple[bytes, dict[str, str], int]:
|
||||
"""Fetch an upstream embed page, rewrite the HTML, and return the response.
|
||||
|
||||
Returns: (body_bytes, headers_dict, status_code).
|
||||
Raises HTTPException on transport errors.
|
||||
"""
|
||||
url = _decode(encoded_url)
|
||||
logger.info("Embed-proxying: %s", url)
|
||||
|
||||
upstream_headers = {
|
||||
"User-Agent": USER_AGENT,
|
||||
"Referer": _make_referer(url),
|
||||
"Origin": _make_origin(url),
|
||||
"Accept": (
|
||||
"text/html,application/xhtml+xml,application/xml;q=0.9,"
|
||||
"image/avif,image/webp,*/*;q=0.8"
|
||||
),
|
||||
"Accept-Language": "en-US,en;q=0.9",
|
||||
}
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(
|
||||
timeout=EMBED_TIMEOUT,
|
||||
follow_redirects=True,
|
||||
) as client:
|
||||
response = await client.get(url, headers=upstream_headers)
|
||||
except httpx.TimeoutException:
|
||||
raise HTTPException(status_code=504, detail="Upstream embed timeout")
|
||||
except httpx.HTTPError as e:
|
||||
raise HTTPException(status_code=502, detail=f"Upstream embed error: {e}")
|
||||
|
||||
status_code = response.status_code
|
||||
upstream_ct = response.headers.get("content-type", "")
|
||||
headers_out = _filter_headers(response.headers)
|
||||
|
||||
body = response.content
|
||||
|
||||
# Detect Cloudflare-style challenge so the frontend can show a clear error.
|
||||
if "html" in upstream_ct.lower():
|
||||
text = response.text
|
||||
if _looks_blocked_by_anti_bot(text):
|
||||
logger.warning("Upstream returned anti-bot challenge: %s", url)
|
||||
raise HTTPException(
|
||||
status_code=502,
|
||||
detail="Upstream returned anti-bot challenge — proxy cannot bypass",
|
||||
)
|
||||
|
||||
rewritten = _inject_into_head(text, url)
|
||||
body = rewritten.encode("utf-8")
|
||||
headers_out["content-type"] = "text/html; charset=utf-8"  # lowercase key replaces the upstream entry
|
||||
|
||||
return body, headers_out, status_code
|
||||
|
||||
|
||||
async def relay_asset(
|
||||
encoded_url: str, range_header: str | None
|
||||
) -> tuple[AsyncGenerator[bytes, None], dict[str, str], int]:
|
||||
"""Relay an upstream subresource (JS/CSS/image/font) as a chunked stream.
|
||||
|
||||
Used as a fallback when an upstream blocks hotlinked assets via Referer
|
||||
or Origin checks. The injected <base> tag handles most of these cases
|
||||
by letting the browser hit upstream directly — the relay is only for
|
||||
the awkward few that need a proxied origin.
|
||||
"""
|
||||
url = _decode(encoded_url)
|
||||
logger.debug("Embed-asset relay: %s", url)
|
||||
|
||||
headers = {
|
||||
"User-Agent": USER_AGENT,
|
||||
"Referer": _make_referer(url),
|
||||
"Origin": _make_origin(url),
|
||||
"Accept": "*/*",
|
||||
}
|
||||
if range_header:
|
||||
headers["Range"] = range_header
|
||||
|
||||
client = httpx.AsyncClient(timeout=ASSET_TIMEOUT, follow_redirects=True)
|
||||
|
||||
try:
|
||||
response = await client.send(
|
||||
client.build_request("GET", url, headers=headers),
|
||||
stream=True,
|
||||
)
|
||||
except httpx.TimeoutException:
|
||||
await client.aclose()
|
||||
raise HTTPException(status_code=504, detail="Upstream asset timeout")
|
||||
except httpx.HTTPError as e:
|
||||
await client.aclose()
|
||||
raise HTTPException(status_code=502, detail=f"Upstream asset error: {e}")
|
||||
|
||||
if response.status_code >= 400:
|
||||
await response.aclose()
|
||||
await client.aclose()
|
||||
raise HTTPException(
|
||||
status_code=502,
|
||||
detail=f"Upstream asset returned HTTP {response.status_code}",
|
||||
)
|
||||
|
||||
headers_out = _filter_headers(response.headers)
|
||||
|
||||
async def _stream() -> AsyncGenerator[bytes, None]:
|
||||
try:
|
||||
async for chunk in response.aiter_bytes(chunk_size=RELAY_CHUNK_SIZE):
|
||||
yield chunk
|
||||
finally:
|
||||
await response.aclose()
|
||||
await client.aclose()
|
||||
|
||||
return _stream(), headers_out, response.status_code
|
||||
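The injected `b64url` helper and the backend's `decode_url` have to agree on one encoding: UTF-8 bytes, the URL-safe Base64 alphabet, and trailing `=` padding removed. The sketch below shows that contract in Python. `decode_url` lives in `backend.m3u8_rewriter`, which is not part of this diff, so `encode_url_sketch` and `decode_url_sketch` are hypothetical stand-ins rather than the project's actual functions.

```python
import base64


def encode_url_sketch(url: str) -> str:
    """Mirror the JS helper: UTF-8 -> Base64 -> URL-safe alphabet, no padding."""
    raw = base64.urlsafe_b64encode(url.encode("utf-8"))
    return raw.decode("ascii").rstrip("=")


def decode_url_sketch(encoded: str) -> str:
    """Restore the stripped padding, then decode back to the original URL."""
    padding = "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded + padding).decode("utf-8")


upstream = "https://hmembeds.one/embed/abc?x=1&y=2"
token = encode_url_sketch(upstream)
assert decode_url_sketch(token) == upstream
print(f"/embed-asset?url={token}")
```

Stripping the padding keeps the token free of characters that need percent-encoding in a query string, at the cost of the decoder having to re-pad to a multiple of four.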
|
|
@ -12,12 +12,20 @@ Example:
|
|||
"""
|
||||
|
||||
from backend.extractors.aceztrims import AceztrimsExtractor
|
||||
from backend.extractors.chrome_browser import ChromeBrowserExtractor
|
||||
from backend.extractors.curated import CuratedExtractor
|
||||
from backend.extractors.dd12 import DD12Extractor
|
||||
from backend.extractors.stremio import StremioAddonExtractor
|
||||
from backend.extractors.subreddit import SubredditExtractor
|
||||
from backend.extractors.daddylive import DaddyLiveExtractor
|
||||
from backend.extractors.demo import DemoExtractor
|
||||
from backend.extractors.discord_source import DiscordExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
from backend.extractors.pitsport import PitsportExtractor
|
||||
from backend.extractors.ppv import PPVExtractor
|
||||
from backend.extractors.registry import ExtractorRegistry
|
||||
from backend.extractors.service import ExtractionService
|
||||
from backend.extractors.streamed import StreamedExtractor
|
||||
from backend.extractors.timstreams import TimStreamsExtractor
|
||||
|
||||
__all__ = [
|
||||
"ExtractedStream",
|
||||
|
|
@ -36,10 +44,36 @@ def create_registry() -> ExtractorRegistry:
|
|||
registry = ExtractorRegistry()
|
||||
|
||||
# --- Register extractors below ---
|
||||
registry.register(DemoExtractor())
|
||||
# CuratedExtractor previously surfaced two hmembeds 24/7 channels (Sky
|
||||
# Sports F1, DAZN F1) but their JW Player decoder produces an empty
|
||||
# playlist in our environment (error 102630) regardless of headed mode,
|
||||
# IP, or fingerprint we tried. The streams loaded the upstream's ad
|
||||
# overlay but never produced a video element, so they confused users —
|
||||
# disabled until/unless we find a working bypass.
|
||||
# registry.register(CuratedExtractor())
|
||||
registry.register(StreamedExtractor())
|
||||
# ChromeBrowserExtractor drives the in-cluster chrome-service via the
|
||||
# CHROME_WS_URL / CHROME_WS_TOKEN env vars to scrape JS-rendered
|
||||
# pages whose m3u8 is computed at runtime.
|
||||
registry.register(ChromeBrowserExtractor())
|
||||
# SubredditExtractor pulls live-stream posts from motorsport subreddits.
|
||||
# Returns embed-type streams; the verifier will visit each via
|
||||
# chrome-service to confirm playability.
|
||||
registry.register(SubredditExtractor())
|
||||
# DD12Extractor scrapes DD12Streams' per-channel pages for the inline
|
||||
# JW Player file URL. The site embeds the m3u8 in HTML so curl-based
|
||||
# parsing is enough — no browser needed.
|
||||
registry.register(DD12Extractor())
|
||||
# StremioAddonExtractor calls Stremio addon HTTP APIs (TvVoo, StremVerse)
|
||||
# which already index Sky F1 / DAZN F1 / Vavoo IPTV channels. No
|
||||
# Stremio client needed — just /stream/<type>/<id>.json calls.
|
||||
registry.register(StremioAddonExtractor())
|
||||
registry.register(DaddyLiveExtractor())
|
||||
registry.register(AceztrimsExtractor())
|
||||
registry.register(PitsportExtractor())
|
||||
registry.register(PPVExtractor())
|
||||
registry.register(TimStreamsExtractor())
|
||||
registry.register(DiscordExtractor())
|
||||
|
||||
return registry
|
||||
|
||||
|
|
|
|||
243
stacks/f1-stream/files/backend/extractors/chrome_browser.py
Normal file
|
|
@ -0,0 +1,243 @@
|
|||
"""Generic chrome-service-driven extractor.
|
||||
|
||||
Drives the in-cluster headed Chromium pool (chrome-service) to load a list
|
||||
of stream/aggregator pages, captures any HLS playlist URL the page fetches
|
||||
at runtime, and returns one ExtractedStream per discovered playlist.
|
||||
|
||||
Unlike the API-based extractors (pitsport/streamed/ppv) this one handles
|
||||
sites where the m3u8 is computed by JavaScript at page load time — the
|
||||
URL only exists after the page evaluates an obfuscated decoder, fetches a
|
||||
token, etc. Curl can't see it; a real browser can.
|
||||
|
||||
Add new targets via the `TARGETS` constant below. Each entry is a `_Target`
|
||||
(label, title, url, optional settle). The extractor visits each URL with a stealthed
|
||||
context, waits for the JS to settle, and yields any captured HLS URL.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
import urllib.parse
|
||||
from dataclasses import dataclass
|
||||
|
||||
from backend.extractors.base import BaseExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Best-effort pause between navigation and capture. The decoder usually
|
||||
# fires within 5s; 12s gives slow JS time to settle without dragging the
|
||||
# extraction round.
|
||||
DEFAULT_SETTLE_SECONDS = 12
|
||||
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
|
||||
"AppleWebKit/605.1.15 (KHTML, like Gecko) "
|
||||
"Version/17.4 Safari/605.1.15"
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class _Target:
|
||||
label: str # site_name (homepage label in the UI)
|
||||
title: str # human-readable stream title
|
||||
url: str # page to navigate
|
||||
settle: int = DEFAULT_SETTLE_SECONDS
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Target list. F1-relevant 24/7 channels and motorsport aggregator pages
|
||||
# whose m3u8 is JS-computed. Add freely — each one takes ~12s to scrape.
|
||||
# ---------------------------------------------------------------------------
|
||||
TARGETS: tuple[_Target, ...] = (
|
||||
# MotoMundo embed pages — the community-curated WordPress site for
|
||||
# MotoGP. Each /e/<id> URL is one of the iframes their "Watch Online"
|
||||
# post lists for the active session (FP/Q/Race). The m3u8 is
|
||||
# JS-computed at load time so a real browser is required to capture
|
||||
# it. Update IDs each weekend to match the current race; subreddit.py
|
||||
# discovers them from the Reddit "[Watch / Download]" thread.
|
||||
_Target(
|
||||
label="MotoMundo",
|
||||
title="MotoGP Live (MotoMundo) — French GP / Le Mans",
|
||||
url="https://motomundo.top/e/9yzn08jk9py4",
|
||||
settle=15,
|
||||
),
|
||||
_Target(
|
||||
label="MotoMundo",
|
||||
title="MotoGP Live (MotoMundo upns) — French GP / Le Mans",
|
||||
url="https://motomundo.upns.xyz/#kqasde",
|
||||
settle=15,
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
# Heuristic to recognise an HLS playlist URL from network capture. Most CDNs
|
||||
# use `.m3u8`; some (pushembdz/oe1.ossfeed) disguise the playlist as `.css`
|
||||
# under a /out/v… or /hls/ path. Filter out obvious junk (.css for actual
|
||||
# stylesheets, .ts segments — we only want the playlist).
|
||||
_HLS_URL_RE = re.compile(r"\.m3u8(\?|$)|/out/v[0-9]+/.+\.css(\?|$)|/hls/.+/master\.css(\?|$)")
|
||||
_SEGMENT_EXT_RE = re.compile(r"\.(ts|m4s|aac|key)(\?|$)")
|
||||
|
||||
|
||||
def _looks_like_hls_playlist(url: str) -> bool:
|
||||
if _SEGMENT_EXT_RE.search(url):
|
||||
return False
|
||||
return bool(_HLS_URL_RE.search(url))
|
||||
|
||||
|
||||
def _resolve_chrome_ws() -> str | None:
|
||||
base = os.getenv("CHROME_WS_URL")
|
||||
token = os.getenv("CHROME_WS_TOKEN")
|
||||
if not base or not token:
|
||||
return None
|
||||
return f"{base.rstrip('/')}/{token}"
|
||||
|
||||
|
||||
class ChromeBrowserExtractor(BaseExtractor):
|
||||
"""Drive chrome-service to capture m3u8 URLs from JS-heavy pages."""
|
||||
|
||||
@property
|
||||
def site_key(self) -> str:
|
||||
return "chrome-browser"
|
||||
|
||||
@property
|
||||
def site_name(self) -> str:
|
||||
return "Chrome Browser"
|
||||
|
||||
async def extract(self) -> list[ExtractedStream]:
|
||||
ws_url = _resolve_chrome_ws()
|
||||
if not ws_url:
|
||||
logger.warning(
|
||||
"[chrome-browser] CHROME_WS_URL/TOKEN not set — extractor disabled"
|
||||
)
|
||||
return []
|
||||
|
||||
try:
|
||||
from playwright.async_api import async_playwright
|
||||
except ImportError:
|
||||
logger.warning("[chrome-browser] playwright not installed — disabled")
|
||||
return []
|
||||
|
||||
# One Playwright instance + one browser connection per extraction
|
||||
# round. Contexts are cheap; the browser is shared.
|
||||
async with async_playwright() as p:
|
||||
try:
|
||||
browser = await p.chromium.connect(ws_url, timeout=15_000)
|
||||
except Exception:
|
||||
logger.exception("[chrome-browser] connect to chrome-service failed")
|
||||
return []
|
||||
|
||||
results: list[ExtractedStream] = []
|
||||
for target in TARGETS:
|
||||
try:
|
||||
stream = await self._scrape(browser, target)
|
||||
if stream:
|
||||
results.append(stream)
|
||||
except Exception:
|
||||
logger.exception(
|
||||
"[chrome-browser] failed to scrape %s", target.url
|
||||
)
|
||||
|
||||
try:
|
||||
await browser.close()
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
logger.info("[chrome-browser] returned %d stream(s)", len(results))
|
||||
return results
|
||||
|
||||
async def _scrape(self, browser, target: _Target) -> ExtractedStream | None:
|
||||
ctx = await browser.new_context(
|
||||
user_agent=USER_AGENT,
|
||||
viewport={"width": 1280, "height": 720},
|
||||
bypass_csp=True,
|
||||
)
|
||||
# Inject the same stealth script the verifier uses so anti-bot
|
||||
# checks don't trip the page before its decoder runs.
|
||||
try:
|
||||
from backend.stealth import STEALTH_JS
|
||||
await ctx.add_init_script(STEALTH_JS)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
page = await ctx.new_page()
|
||||
captured: list[str] = []
|
||||
|
||||
def on_response(resp):
|
||||
try:
|
||||
if _looks_like_hls_playlist(resp.url):
|
||||
captured.append(resp.url)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
page.on("response", on_response)
|
||||
# Some pages (DD12 variants) load the player in a child iframe;
|
||||
# frame events catch nested navigations.
|
||||
page.on(
|
||||
"framenavigated",
|
||||
lambda fr: captured.append(fr.url) if _looks_like_hls_playlist(fr.url) else None,
|
||||
)
|
||||
|
||||
try:
|
||||
await page.goto(target.url, wait_until="domcontentloaded", timeout=20_000)
|
||||
except Exception as e:
|
||||
logger.debug("[chrome-browser] %s goto failed: %s", target.url, e)
|
||||
await ctx.close()
|
||||
return None
|
||||
|
||||
# Let the page's JS settle.
|
||||
await asyncio.sleep(target.settle)
|
||||
|
||||
# Also probe child iframes — `pushembdz`, `pooembed`, `embedsports`
|
||||
# all live behind one. Collect any HLS URL the iframes loaded.
|
||||
for fr in page.frames:
|
||||
if fr is page.main_frame:
|
||||
continue
|
||||
try:
|
||||
# JW Player and Clappr both expose the playing source via
|
||||
# a `<video>`/`<source>` element after setup completes.
|
||||
sources = await fr.evaluate(
|
||||
"() => Array.from(document.querySelectorAll('video, source')).map(e => e.currentSrc || e.src || '').filter(s => s.includes('.m3u8') || s.includes('.css'))"
|
||||
)
|
||||
for s in sources:
|
||||
if _looks_like_hls_playlist(s):
|
||||
captured.append(s)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
await ctx.close()
|
||||
|
||||
# De-duplicate while preserving capture order. Later captures are usually
# variant playlists referenced from the master.
|
||||
unique = list(dict.fromkeys(captured))
|
||||
if not unique:
|
||||
logger.debug("[chrome-browser] %s yielded no HLS URL", target.url)
|
||||
return None
|
||||
|
||||
# Prefer URLs that look like a master/index playlist over variant
|
||||
# playlists when both are captured.
|
||||
master = next(
|
||||
(u for u in unique if "master" in u.lower() or "index" in u.lower()),
|
||||
unique[0],
|
||||
)
|
||||
# The query string is kept as captured: some CDNs require it. Short-lived
# tokens in it are re-resolved per request by the verifier and frontend.
|
||||
m3u8 = master
|
||||
# Decode URL-encoded characters so the proxy gets a clean URL.
|
||||
m3u8 = urllib.parse.unquote(m3u8)
|
||||
|
||||
logger.info(
|
||||
"[chrome-browser] %s -> %s",
|
||||
target.url, m3u8[:120],
|
||||
)
|
||||
return ExtractedStream(
|
||||
url=m3u8,
|
||||
site_key=self.site_key,
|
||||
site_name=target.label,
|
||||
quality="",
|
||||
title=target.title,
|
||||
stream_type="m3u8",
|
||||
)
|
||||
61
stacks/f1-stream/files/backend/extractors/curated.py
Normal file
|
|
@ -0,0 +1,61 @@
|
|||
"""Curated extractor: known-good 24/7 F1 channels via direct embed URLs.

Returns a small, hand-picked list of embed URLs that are reliable enough to
be served as fallback "always-on" streams when the dynamic extractors find
nothing (e.g. between race weekends, when API providers are down).

These are direct embed URLs. The frontend routes them through /embed so the
iframe-stripping proxy bypasses any frame-buster JS in the upstream player.
"""

import logging

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)


# Curated list. Each entry is a known direct embed URL. These were sourced
# from the timstreams.py ALWAYS_INCLUDE_HASHES list (Sky Sports F1, DAZN F1)
# and are documented as 24/7 channels that play F1 content year-round.
_CURATED_STREAMS = [
    {
        "url": "https://hmembeds.one/embed/888520f36cd94c5da4c71fddc1a5fc9b",
        "title": "Sky Sports F1 (24/7)",
        "quality": "HD",
    },
    {
        "url": "https://hmembeds.one/embed/fc3a54634d0867b0c02ee3223292e7c6",
        "title": "DAZN F1 (24/7)",
        "quality": "HD",
    },
]


class CuratedExtractor(BaseExtractor):
    """Returns curated known-good 24/7 F1 channel embed URLs."""

    @property
    def site_key(self) -> str:
        return "curated"

    @property
    def site_name(self) -> str:
        return "Curated 24/7 Channels"

    async def extract(self) -> list[ExtractedStream]:
        streams = [
            ExtractedStream(
                url=entry["url"],
                site_key=self.site_key,
                site_name=self.site_name,
                quality=entry["quality"],
                title=entry["title"],
                stream_type="embed",
                embed_url=entry["url"],
            )
            for entry in _CURATED_STREAMS
        ]
        logger.info("[curated] Returning %d curated stream(s)", len(streams))
        return streams
111
stacks/f1-stream/files/backend/extractors/dd12.py
Normal file
|
|
@ -0,0 +1,111 @@
|
|||
"""DD12Streams extractor: scrapes inline m3u8 URLs from per-channel pages.

Each DD12 sport page (`/nas`, `/f1`, `/sky`, etc.) renders an iframe to
`/<channel>c1` which 302-redirects to `/new-<channel>/jwplayer`. That
page contains a JW Player setup with the m3u8 URL hard-coded inline:

    playerInstance.setup({
        file: "https://...b-cdn.net/.../master.m3u8",
        ...
    });

The JW Player runtime fails in our cluster (same fingerprint trap as
hmembeds), but we don't need it: the file URL is in the HTML and any
browser with H.264 codecs can play it directly via hls.js.

Channel discovery: probe a known list. New ones can be added by checking
DD12's own homepage / nav.
"""

import logging
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

BASE = "https://dd12streams.com"
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)

# (path, channel_label, title). Add as DD12 surfaces new channels.
CHANNELS = (
    ("nas", "DD12Streams", "NASCAR Cup Series (24/7) — DD12"),
)

_FILE_URL_RE = re.compile(r"""file\s*:\s*["']([^"']+\.m3u8[^"']*)["']""")


class DD12Extractor(BaseExtractor):
    @property
    def site_key(self) -> str:
        return "dd12"

    @property
    def site_name(self) -> str:
        return "DD12Streams"

    async def extract(self) -> list[ExtractedStream]:
        results: list[ExtractedStream] = []
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            for path, label, title in CHANNELS:
                try:
                    page_url = f"{BASE}/{path}"
                    resp = await client.get(page_url)
                    if resp.status_code != 200:
                        continue
                    iframe_path = self._extract_iframe(resp.text)
                    if not iframe_path:
                        continue
                    iframe_url = (
                        iframe_path
                        if iframe_path.startswith("http")
                        else f"{BASE}{iframe_path}"
                    )
                    iframe_resp = await client.get(
                        iframe_url, headers={"Referer": page_url}
                    )
                    if iframe_resp.status_code != 200:
                        continue
                    m3u8 = self._find_m3u8(iframe_resp.text)
                    if not m3u8:
                        continue
                    results.append(
                        ExtractedStream(
                            url=m3u8,
                            site_key=self.site_key,
                            site_name=label,
                            quality="",
                            title=title,
                            stream_type="m3u8",
                        )
                    )
                except Exception:
                    logger.debug(
                        "[dd12] /%s extraction failed", path, exc_info=True
                    )
        logger.info("[dd12] Extracted %d stream(s)", len(results))
        return results

    @staticmethod
    def _extract_iframe(html: str) -> str | None:
        m = re.search(
            r'<iframe[^>]+id=["\']vplayer["\'][^>]+src=["\']([^"\']+)["\']',
            html,
        )
        return m.group(1) if m else None

    @staticmethod
    def _find_m3u8(html: str) -> str | None:
        m = _FILE_URL_RE.search(html)
        return m.group(1) if m else None
203
stacks/f1-stream/files/backend/extractors/discord_source.py
Normal file
|
|
@ -0,0 +1,203 @@
|
|||
"""Discord extractor - monitors Discord channels for F1 stream links.

Reads recent messages from configured Discord channels using a user token,
extracts URLs that look like stream links, and returns them as embed streams.
"""

import logging
import os
import re
from urllib.parse import urlparse

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

DISCORD_API = "https://discord.com/api/v9"
DISCORD_TOKEN = os.getenv("DISCORD_TOKEN", "")
# Comma-separated channel IDs to monitor
DISCORD_CHANNELS = os.getenv("DISCORD_CHANNELS", "").split(",")
# How many messages to fetch per channel
MESSAGE_LIMIT = 50

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# Matches any http(s) URL in message text. Non-stream links (Discord CDN,
# images, news sites) are filtered out afterwards by _is_stream_url.
URL_PATTERN = re.compile(r"https?://[^\s<>\)\]\"']+", re.IGNORECASE)

# Domains that publish news/articles, not playable streams. Discord users share
# these links during race weekends; they are NOT streams and pollute the list.
EXCLUDED_DOMAINS = {
    "discord.com", "discord.gg", "cdn.discordapp.com",
    "tenor.com", "giphy.com", "imgur.com",
    "youtube.com", "youtu.be", "twitter.com", "x.com",
    "reddit.com", "instagram.com", "tiktok.com",
    "fmhy.net", "github.com", "freemotorsports.com",
    # News / official sites: never playable embeds
    "formula1.com", "fia.com", "skysports.com", "motorsport.com",
    "driverdb.com", "autosport.com", "the-race.com", "racefans.net",
    "wikipedia.org", "fantasy.formula1.com",
}

# File extensions of static assets and media files that are never embeds.
_STATIC_EXTENSIONS = (
    ".png", ".jpg", ".jpeg", ".gif", ".webp",
    ".mp4", ".webm", ".svg", ".css", ".js",
)

# A URL is treated as a candidate stream embed only if its path looks like
# a *direct* player/embed page: `/embed/{id}`, `/player/{...}`, `*.m3u8`,
# `*.php` (legacy iframe1.php style). Aggregator landing pages
# (`/event/...`, `/watch?session=...`, etc.) are rejected because they
# show a list of links instead of playing automatically; those produce
# verifier-passing UI without actual playback.
_PATH_KEYWORDS = (
    "/embed/", "/player/", ".m3u8", ".php",
)


def _is_stream_url(url: str) -> bool:
    """Heuristic: does this URL look like an actual stream/embed/player link?

    Discord users share lots of news links during race weekends. The old
    filter only blocked specific domains and let everything else through,
    which produced a stream list dominated by formula1.com news articles.
    The new filter is positive-match: a URL must contain at least one
    stream-shaped path keyword to be included.
    """
    try:
        parsed = urlparse(url)
        domain = parsed.netloc.lower()
        path = parsed.path.lower()
    except Exception:
        return False

    if not domain:
        return False

    # Match the excluded domain itself or any subdomain of it. A plain
    # substring test would also reject unrelated hosts, e.g. "x.com" is a
    # substring of "netflix.com".
    for excluded in EXCLUDED_DOMAINS:
        if domain == excluded or domain.endswith("." + excluded):
            return False

    if path.endswith(_STATIC_EXTENSIONS):
        return False

    full = path + ("?" + parsed.query if parsed.query else "")
    if not any(kw in full for kw in _PATH_KEYWORDS):
        return False

    return True


class DiscordExtractor(BaseExtractor):
    """Extracts stream links from Discord channel messages.

    Monitors configured Discord channels for URLs shared by users,
    filters to likely stream links, and returns them as embed streams.
    """

    @property
    def site_key(self) -> str:
        return "discord"

    @property
    def site_name(self) -> str:
        return "Discord Community"

    async def extract(self) -> list[ExtractedStream]:
        """Fetch recent messages from Discord channels and extract URLs."""
        if not DISCORD_TOKEN:
            logger.info("[discord] No DISCORD_TOKEN set, skipping")
            return []

        channels = [c.strip() for c in DISCORD_CHANNELS if c.strip()]
        if not channels:
            logger.info("[discord] No DISCORD_CHANNELS configured, skipping")
            return []

        streams: list[ExtractedStream] = []
        seen_urls: set[str] = set()

        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={
                    "Authorization": DISCORD_TOKEN,
                    "User-Agent": USER_AGENT,
                },
            ) as client:
                for channel_id in channels:
                    try:
                        channel_streams = await self._fetch_channel(
                            client, channel_id, seen_urls
                        )
                        streams.extend(channel_streams)
                    except Exception:
                        logger.debug(
                            "[discord] Failed to fetch channel %s",
                            channel_id,
                            exc_info=True,
                        )
        except Exception:
            logger.exception("[discord] Failed to connect to Discord API")

        logger.info("[discord] Extracted %d stream(s) from %d channel(s)", len(streams), len(channels))
        return streams

    async def _fetch_channel(
        self,
        client: httpx.AsyncClient,
        channel_id: str,
        seen_urls: set[str],
    ) -> list[ExtractedStream]:
        """Fetch messages from a single channel and extract stream URLs."""
        resp = await client.get(
            f"{DISCORD_API}/channels/{channel_id}/messages",
            params={"limit": MESSAGE_LIMIT},
        )
        if resp.status_code != 200:
            logger.warning(
                "[discord] Channel %s returned HTTP %d", channel_id, resp.status_code
            )
            return []

        messages = resp.json()
        if not isinstance(messages, list):
            return []

        streams: list[ExtractedStream] = []

        for msg in messages:
            content = msg.get("content", "")
            author = msg.get("author", {}).get("username", "unknown")

            # Extract URLs from message content
            urls = URL_PATTERN.findall(content)

            # Also check embeds
            for embed in msg.get("embeds", []):
                if embed.get("url"):
                    urls.append(embed["url"])

            for url in urls:
                # Clean trailing punctuation
                url = url.rstrip(".,;:!?)")

                if url in seen_urls:
                    continue
                if not _is_stream_url(url):
                    continue

                seen_urls.add(url)
                streams.append(
                    ExtractedStream(
                        url=url,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=f"Shared by {author}",
                        stream_type="embed",
                        embed_url=url,
                    )
                )

        return streams
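The positive-match filter described in `discord_source.py` combines a domain exclusion with a path-keyword test. The sketch below is a reduced, standalone version for experimenting with that logic: the exclusion set is a small subset of the real one, the static-asset extension check is omitted, the function names are hypothetical, and the sample hosts are invented. It matches excluded domains on label boundaries; a plain substring test would also reject hosts such as `netflix.com` for the `x.com` entry.

```python
from urllib.parse import urlparse

# Subset of the real exclusion list, enough to illustrate the behaviour.
EXCLUDED_DOMAINS = {"x.com", "youtube.com", "formula1.com"}
PATH_KEYWORDS = ("/embed/", "/player/", ".m3u8", ".php")


def is_excluded(host: str) -> bool:
    """True for an excluded domain or any of its subdomains."""
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def is_stream_url(url: str) -> bool:
    """Accept only URLs on non-excluded hosts with a stream-shaped path."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not host or is_excluded(host):
        return False
    full = parsed.path.lower() + ("?" + parsed.query if parsed.query else "")
    return any(kw in full for kw in PATH_KEYWORDS)


# Invented sample links of the kind shared in a race-weekend channel.
samples = [
    "https://streams.example.invalid/embed/abc123",
    "https://www.formula1.com/en/latest/article.html",
    "https://www.youtube.com/embed/xyz",
    "https://streams.example.invalid/event/monaco",
]
accepted = [u for u in samples if is_stream_url(u)]
```

Only the first sample survives: the second and third are on excluded domains, and the fourth is an aggregator landing page with no stream-shaped path keyword.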
544
stacks/f1-stream/files/backend/extractors/pitsport.py
Normal file
|
|
@ -0,0 +1,544 @@
|
|||
"""Pitsport.xyz extractor - fetches F1 streams from the Next.js RSC payload.
|
||||
|
||||
Architecture:
|
||||
- Main page (pitsport.xyz) has a "Live Now" section with event cards containing
|
||||
category, title, time, imageUrl props and /watch/{UUID} links.
|
||||
- Schedule page (pitsport.xyz/schedule) lists all events grouped by category
|
||||
(h2 headings) with /watch/{UUID} links and event titles.
|
||||
- Watch pages (/watch/{UUID}) embed iframes from pushembdz.store/embed/{EMBED_UUID}.
|
||||
- Embed pages contain an RSC payload with a stream config: {title, link, method}.
|
||||
- When method is "player" or "hls", the link field points to a serveplay.site
|
||||
m3u8 playlist. Otherwise we return the embed URL for iframe playback.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
|
||||
import httpx
|
||||
|
||||
from backend.extractors.base import BaseExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
PITSPORT_BASE = "https://pitsport.xyz"
|
||||
EMBED_BASE = "https://pushembdz.store"
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
# Categories to include (case-insensitive match). Broadened beyond F1
|
||||
# to also surface MotoGP and adjacent motorsports — keeps the f1-stream
|
||||
# UI useful between race weekends and during the off-season.
|
||||
MOTORSPORT_CATEGORIES = {
|
||||
"formula 1", "formula 2", "formula 3",
|
||||
"motogp", "moto gp", "moto2", "moto3", "motoe",
|
||||
"world rally championship", "wrc",
|
||||
"world endurance championship", "wec",
|
||||
"indycar series", "indycar", "indynxt",
|
||||
"nascar cup series", "nascar truck series", "nascar o'reilly auto parts series",
|
||||
"nascar xfinity series", "nascar",
|
||||
}
|
||||
|
||||
# Title keywords that are strong positives even when the category text
|
||||
# is missing (live-now cards sometimes elide it).
|
||||
MOTORSPORT_KEYWORDS = {
|
||||
"formula 1", "formula one", "f1",
|
||||
"motogp", "moto gp", "moto2", "moto3",
|
||||
"rally", "wrc",
|
||||
"indycar", "indy car",
|
||||
"nascar",
|
||||
"le mans", "lemans", "wec", "endurance",
|
||||
}
|
||||
GP_KEYWORD = "grand prix"
|
||||
|
||||
|
||||
@dataclass
|
||||
class _PitsportEvent:
|
||||
"""An event discovered from the Pitsport site."""
|
||||
|
||||
category: str
|
||||
title: str
|
||||
watch_uuid: str
|
||||
|
||||
|
||||
def _is_motorsport_category(category: str) -> bool:
|
||||
"""Check if a category string matches an included motorsport series."""
|
||||
return category.strip().lower() in MOTORSPORT_CATEGORIES
|
||||
|
||||
|
||||
def _is_motorsport_event(category: str, title: str) -> bool:
|
||||
"""Accept anything pitsport.xyz lists. Pitsport curates sports
|
||||
broadcasts (WRC, MotoGP, IndyCar, NASCAR, Premier League Darts,
|
||||
Premier League football, etc.) — the site's own selection is the
|
||||
filter we want. Empty/garbage events still get filtered downstream
|
||||
when `_resolve_event_streams` produces no playable URL."""
|
||||
return bool(category or title)
|
||||
|
||||
|
||||
# Aliases kept so older call sites keep working. Both now point at the
# broadened motorsport filter.
|
||||
_is_f1_category = _is_motorsport_category
|
||||
_is_f1_event = _is_motorsport_event
|
||||
|
||||
|
||||
def _parse_live_events(html: str) -> list[_PitsportEvent]:
|
||||
"""Parse live events from the main page RSC payload.
|
||||
|
||||
The main page contains event cards with props:
|
||||
category, title, time, imageUrl
|
||||
wrapped in <a href="/watch/{UUID}"> links.
|
||||
"""
|
||||
events: list[_PitsportEvent] = []
|
||||
|
||||
# Match event cards in the RSC payload - they appear as JSON-like structures
|
||||
# Pattern: href="/watch/UUID" ... category":"...", "title":"..."
|
||||
# In the RSC payload, the data is in the format:
|
||||
# ["$","$L2","/watch/UUID",{"href":"/watch/UUID","children":["$","$L10",null,
|
||||
# {"category":"...","title":"...","time":...,"imageUrl":"..."}]}]
|
||||
pattern = re.compile(
|
||||
r'"href":"(/watch/([0-9a-f-]{36}))"[^}]*?"category":"([^"]+)","title":"([^"]+)"',
|
||||
)
|
||||
for match in pattern.finditer(html):
|
||||
_, uuid, category, title = match.groups()
|
||||
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
|
||||
|
||||
return events
|
||||
|
||||
|
||||
def _parse_schedule_events(html: str) -> list[_PitsportEvent]:
|
||||
"""Parse events from the schedule page.
|
||||
|
||||
The schedule page groups events under category headers (h2 elements).
|
||||
In the rendered HTML:
|
||||
<h2 ...>Formula 1</h2>
|
||||
<div ...>
|
||||
<a href="/watch/UUID">...</a>
|
||||
...
|
||||
</div>
|
||||
|
||||
In the RSC payload, similar structure with section divs containing
|
||||
a category h2 and child event links with titles.
|
||||
"""
|
||||
events: list[_PitsportEvent] = []
|
||||
|
||||
# Strategy 1: Parse from rendered HTML
|
||||
# Find category sections: >CategoryName</h2> followed by watch links
|
||||
# Split HTML at each category header
|
||||
section_pattern = re.compile(
|
||||
r'>([^<]+)</h2>\s*<div[^>]*class="flex flex-wrap gap-6">(.*?)(?=</div>\s*</div>\s*(?:<div|</div>|$))',
|
||||
re.DOTALL,
|
||||
)
|
||||
for section_match in section_pattern.finditer(html):
|
||||
category = section_match.group(1).strip()
|
||||
section_html = section_match.group(2)
|
||||
|
||||
# Find all watch links in this section
|
||||
link_pattern = re.compile(
|
||||
r'href="/watch/([0-9a-f-]{36})".*?<h1[^>]*>([^<]+)</h1>',
|
||||
re.DOTALL,
|
||||
)
|
||||
for link_match in link_pattern.finditer(section_html):
|
||||
uuid = link_match.group(1)
|
||||
title = link_match.group(2).strip()
|
||||
events.append(
|
||||
_PitsportEvent(category=category, title=title, watch_uuid=uuid)
|
||||
)
|
||||
|
||||
# Strategy 2: Parse from RSC payload if rendered HTML didn't yield results
|
||||
# The RSC payload has patterns like:
|
||||
# "children":"Formula 1"}] ... "/watch/UUID" ... "title":"EventTitle"
|
||||
if not events:
|
||||
events = _parse_schedule_rsc(html)
|
||||
|
||||
return events
|
||||
|
||||
|
||||
def _parse_schedule_rsc(html: str) -> list[_PitsportEvent]:
|
||||
"""Parse events from schedule page RSC payload as fallback.
|
||||
|
||||
Extracts category section divs from the RSC JSON structure.
|
||||
"""
|
||||
events: list[_PitsportEvent] = []
|
||||
|
||||
# Find the RSC payload chunks
|
||||
rsc_chunks = re.findall(
|
||||
r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', html, re.DOTALL
|
||||
)
|
||||
if not rsc_chunks:
|
||||
return events
|
||||
|
||||
# Concatenate and unescape
|
||||
full_payload = ""
|
||||
for chunk in rsc_chunks:
|
||||
try:
|
||||
full_payload += chunk.encode().decode("unicode_escape")
|
||||
except Exception:
|
||||
full_payload += chunk
|
||||
|
||||
# Find category sections in the RSC data
|
||||
# Pattern: "children":"CategoryName"}],["$","div",...watch links...
|
||||
# Each section div contains an h2 with the category name and watch links
|
||||
cat_pattern = re.compile(
|
||||
r'border-gray-700 pb-2","children":"([^"]+)"\}.*?'
|
||||
r'(?=border-gray-700 pb-2","children"|$)',
|
||||
re.DOTALL,
|
||||
)
|
||||
for cat_match in cat_pattern.finditer(full_payload):
|
||||
category = cat_match.group(1)
|
||||
section_text = cat_match.group(0)
|
||||
|
||||
# Find watch UUIDs and titles in this section
|
||||
# Pattern: "/watch/UUID" ... "title":"EventTitle"
|
||||
event_pattern = re.compile(
|
||||
r'/watch/([0-9a-f-]{36}).*?"title":"([^"]+)"',
|
||||
)
|
||||
for ev_match in event_pattern.finditer(section_text):
|
||||
uuid = ev_match.group(1)
|
||||
title = ev_match.group(2)
|
||||
events.append(
|
||||
_PitsportEvent(category=category, title=title, watch_uuid=uuid)
|
||||
)
|
||||
|
||||
return events
|
||||
|
||||
|
||||
def _parse_embed_uuids(html: str) -> list[str]:
|
||||
"""Extract embed UUIDs from a watch page.
|
||||
|
||||
Watch pages contain iframes like:
|
||||
<iframe src="https://pushembdz.store/embed/{EMBED_UUID}" ...>
|
||||
|
||||
And in the RSC payload:
|
||||
"iframe":"https://pushembdz.store/embed/{EMBED_UUID}"
|
||||
"""
|
||||
uuids: list[str] = []
|
||||
|
||||
# From rendered HTML
|
||||
iframe_pattern = re.compile(
|
||||
r'pushembdz\.store/embed/([0-9a-f-]{36})',
|
||||
)
|
||||
for match in iframe_pattern.finditer(html):
|
||||
uuid = match.group(1)
|
||||
if uuid not in uuids:
|
||||
uuids.append(uuid)
|
||||
|
||||
return uuids
|
||||
|
||||
|
||||
@dataclass
|
||||
class _StreamConfig:
|
||||
"""Stream configuration extracted from an embed page."""
|
||||
|
||||
title: str
|
||||
link: str
|
||||
method: str
|
||||
|
||||
|
||||
def _parse_stream_config(html: str) -> _StreamConfig | None:
|
||||
"""Extract stream config from an embed page RSC payload.
|
||||
|
||||
The embed page now uses a `safeStream` payload that elides the link:
|
||||
4:["$","$Ld",null,{"safeStream":{"title":"Rally TV","method":"jwp"},
|
||||
"error":null,"slug":"..."}]
|
||||
The actual stream URL is fetched at runtime via
|
||||
pushembdz.store/api/stream/<slug>. Older payloads used "stream" with
|
||||
inline title+link+method — kept as fallback.
|
||||
"""
|
||||
# Current format: safeStream with title + method only (link via API).
|
||||
pattern_safe = re.compile(
|
||||
r'\\?"safeStream\\?"\s*:\s*\{'
|
||||
r'\\?"title\\?"\s*:\s*\\?"([^"\\]+)\\?"\s*,\s*'
|
||||
r'\\?"method\\?"\s*:\s*\\?"([^"\\]+)\\?"',
|
||||
)
|
||||
match = pattern_safe.search(html)
|
||||
if match:
|
||||
return _StreamConfig(
|
||||
title=match.group(1),
|
||||
link="", # filled in by the caller via the api/stream endpoint
|
||||
method=match.group(2),
|
||||
)
|
||||
|
||||
# Legacy: escaped RSC payload with inline link.
|
||||
pattern = re.compile(
|
||||
r'"stream":\{["\']?\\?"title\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
|
||||
r'["\']?\\?"link\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
|
||||
r'["\']?\\?"method\\?"["\']?:["\']?\\?"([^"\\]+)\\?"',
|
||||
)
|
||||
match = pattern.search(html)
|
||||
if match:
|
||||
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
|
||||
|
||||
pattern2 = re.compile(
|
||||
r'\\?"stream\\?":\{\\?"title\\?":\\?"([^\\]+)\\?",'
|
||||
r'\\?"link\\?":\\?"([^\\]+)\\?",'
|
||||
r'\\?"method\\?":\\?"([^\\]+)\\?"',
|
||||
)
|
||||
match = pattern2.search(html)
|
||||
if match:
|
||||
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
|
||||
|
||||
pattern3 = re.compile(
|
||||
r'"stream"\s*:\s*\{\s*"title"\s*:\s*"([^"]+)"\s*,'
|
||||
r'\s*"link"\s*:\s*"([^"]+)"\s*,'
|
||||
r'\s*"method"\s*:\s*"([^"]+)"',
|
||||
)
|
||||
match = pattern3.search(html)
|
||||
if match:
|
||||
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _is_m3u8_method(method: str) -> bool:
|
||||
"""Check if the stream method indicates a direct HLS stream."""
|
||||
# `jwp` (current pushembdz format) returns an m3u8 from the api/stream
|
||||
# endpoint regardless of player UI; treat it as HLS.
|
||||
return method.lower() in ("player", "hls", "jwp")
|
||||
|
||||
|
||||
def _extract_m3u8_url(link: str) -> str:
|
||||
"""Return the m3u8 playlist URL for a serveplay.site player URL (currently the same URL).
|
||||
|
||||
Input: https://dash.serveplay.site/{channel}/index.html
|
||||
Output: https://dash.serveplay.site/{channel}/index.html
|
||||
|
||||
The index.html IS the m3u8 playlist (served with proper content-type
|
||||
when fetched with the correct Referer header).
|
||||
"""
|
||||
return link
|
||||
|
||||
|
||||
class PitsportExtractor(BaseExtractor):
|
||||
"""Extracts F1 streams from Pitsport.xyz.
|
||||
|
||||
Scrapes the Next.js RSC payload from the main page and schedule page
|
||||
to find F1 events, then resolves embed UUIDs to stream configurations.
|
||||
"""
|
||||
|
||||
@property
|
||||
def site_key(self) -> str:
|
||||
return "pitsport"
|
||||
|
||||
@property
|
||||
def site_name(self) -> str:
|
||||
return "Pitsport"
|
||||
|
||||
async def extract(self) -> list[ExtractedStream]:
|
||||
"""Fetch F1 events and return stream URLs or embed URLs."""
|
||||
streams: list[ExtractedStream] = []
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(
|
||||
timeout=20.0,
|
||||
follow_redirects=True,
|
||||
headers={"User-Agent": USER_AGENT},
|
||||
) as client:
|
||||
# Fetch both pages to get comprehensive event data
|
||||
events = await self._discover_events(client)
|
||||
logger.info(
|
||||
"[pitsport] Found %d F1 event(s) to process", len(events)
|
||||
)
|
||||
|
||||
# Deduplicate by watch UUID
|
||||
seen_uuids: set[str] = set()
|
||||
unique_events: list[_PitsportEvent] = []
|
||||
for ev in events:
|
||||
if ev.watch_uuid not in seen_uuids:
|
||||
seen_uuids.add(ev.watch_uuid)
|
||||
unique_events.append(ev)
|
||||
|
||||
# For each event, resolve streams
|
||||
for event in unique_events:
|
||||
event_streams = await self._resolve_event_streams(
|
||||
client, event
|
||||
)
|
||||
streams.extend(event_streams)
|
||||
|
||||
except Exception:
|
||||
logger.exception("[pitsport] Failed to extract streams")
|
||||
|
||||
logger.info("[pitsport] Extracted %d stream(s)", len(streams))
|
||||
return streams
|
||||
|
||||
async def _discover_events(
|
||||
self, client: httpx.AsyncClient
|
||||
) -> list[_PitsportEvent]:
|
||||
"""Discover F1 events from both main page and schedule page."""
|
||||
all_events: list[_PitsportEvent] = []
|
||||
|
||||
# Fetch main page for live events
|
||||
try:
|
||||
resp = await client.get(PITSPORT_BASE)
|
||||
if resp.status_code == 200:
|
||||
live_events = _parse_live_events(resp.text)
|
||||
logger.info(
|
||||
"[pitsport] Main page: %d live event(s)", len(live_events)
|
||||
)
|
||||
for ev in live_events:
|
||||
if _is_f1_event(ev.category, ev.title):
|
||||
all_events.append(ev)
|
||||
else:
|
||||
logger.warning(
|
||||
"[pitsport] Main page returned HTTP %d", resp.status_code
|
||||
)
|
||||
except Exception:
|
||||
logger.exception("[pitsport] Failed to fetch main page")
|
||||
|
||||
# Fetch schedule page for upcoming events
|
||||
try:
|
||||
resp = await client.get(f"{PITSPORT_BASE}/schedule")
|
||||
if resp.status_code == 200:
|
||||
schedule_events = _parse_schedule_events(resp.text)
|
||||
logger.info(
|
||||
"[pitsport] Schedule page: %d total event(s)",
|
||||
len(schedule_events),
|
||||
)
|
||||
for ev in schedule_events:
|
||||
if _is_f1_event(ev.category, ev.title):
|
||||
all_events.append(ev)
|
||||
else:
|
||||
logger.warning(
|
||||
"[pitsport] Schedule page returned HTTP %d",
|
||||
resp.status_code,
|
||||
)
|
||||
except Exception:
|
||||
logger.exception("[pitsport] Failed to fetch schedule page")
|
||||
|
||||
return all_events
|
||||
|
||||
async def _resolve_event_streams(
|
||||
self, client: httpx.AsyncClient, event: _PitsportEvent
|
||||
) -> list[ExtractedStream]:
|
||||
"""Resolve an event's watch page to actual stream URLs."""
|
||||
streams: list[ExtractedStream] = []
|
||||
|
||||
try:
|
||||
# Fetch the watch page to get embed UUIDs
|
||||
watch_url = f"{PITSPORT_BASE}/watch/{event.watch_uuid}"
|
||||
resp = await client.get(watch_url)
|
||||
if resp.status_code != 200:
|
||||
logger.debug(
|
||||
"[pitsport] Watch page %s returned HTTP %d",
|
||||
event.watch_uuid,
|
||||
resp.status_code,
|
||||
)
|
||||
return []
|
||||
|
||||
embed_uuids = _parse_embed_uuids(resp.text)
|
||||
if not embed_uuids:
|
||||
logger.debug(
|
||||
"[pitsport] No embed UUIDs found for %s", event.watch_uuid
|
||||
)
|
||||
return []
|
||||
|
||||
logger.debug(
|
||||
"[pitsport] Event '%s' has %d embed(s)",
|
||||
event.title,
|
||||
len(embed_uuids),
|
||||
)
|
||||
|
||||
# Resolve each embed to a stream config
|
||||
for i, embed_uuid in enumerate(embed_uuids):
|
||||
stream = await self._resolve_embed(
|
||||
client, embed_uuid, event, stream_num=i + 1
|
||||
)
|
||||
if stream:
|
||||
streams.append(stream)
|
||||
|
||||
except Exception:
|
||||
logger.debug(
|
||||
"[pitsport] Failed to resolve event %s",
|
||||
event.watch_uuid,
|
||||
exc_info=True,
|
||||
)
|
||||
|
||||
return streams
|
||||
|
||||
async def _resolve_embed(
|
||||
self,
|
||||
client: httpx.AsyncClient,
|
||||
embed_uuid: str,
|
||||
event: _PitsportEvent,
|
||||
stream_num: int,
|
||||
) -> ExtractedStream | None:
|
||||
"""Resolve an embed UUID to a stream configuration."""
|
||||
try:
|
||||
embed_url = f"{EMBED_BASE}/embed/{embed_uuid}"
|
||||
resp = await client.get(embed_url)
|
||||
if resp.status_code != 200:
|
||||
logger.debug(
|
||||
"[pitsport] Embed page %s returned HTTP %d",
|
||||
embed_uuid,
|
||||
resp.status_code,
|
||||
)
|
||||
return None
|
||||
|
||||
config = _parse_stream_config(resp.text)
|
||||
if not config:
|
||||
logger.debug(
|
||||
"[pitsport] No stream config found in embed %s",
|
||||
embed_uuid,
|
||||
)
|
||||
return None
|
||||
|
||||
# Build the stream title
|
||||
stream_title = f"{event.category} - {event.title}"
|
||||
if config.title:
|
||||
stream_title += f" ({config.title})"
|
||||
if stream_num > 1:
|
||||
stream_title += f" #{stream_num}"
|
||||
|
||||
# `safeStream` payload elides the link — fetch it from the
|
||||
# pushembdz.store/api/stream/<slug> endpoint. Older `stream`
|
||||
# payloads provided the link inline.
|
||||
link = config.link
|
||||
if not link and _is_m3u8_method(config.method):
|
||||
api_url = f"{EMBED_BASE}/api/stream/{embed_uuid}"
|
||||
try:
|
||||
api_resp = await client.get(
|
||||
api_url,
|
||||
headers={"Referer": embed_url, "Accept": "application/json"},
|
||||
)
|
||||
if api_resp.status_code == 200:
|
||||
link = (api_resp.json() or {}).get("link", "")
|
||||
except Exception:
|
||||
logger.debug(
|
||||
"[pitsport] api/stream lookup failed for %s",
|
||||
embed_uuid,
|
||||
exc_info=True,
|
||||
)
|
||||
|
||||
# Treat any HLS-ish URL (m3u8, or pushembdz's .css disguise) as m3u8.
|
||||
looks_hls = link and (".m3u8" in link or link.endswith(".css") or "serveplay.site" in link)
|
||||
if _is_m3u8_method(config.method) and looks_hls:
|
||||
return ExtractedStream(
|
||||
url=link,
|
||||
site_key=self.site_key,
|
||||
site_name=self.site_name,
|
||||
quality="",
|
||||
title=stream_title,
|
||||
stream_type="m3u8",
|
||||
)
|
||||
else:
|
||||
# Iframe embed fallback
|
||||
return ExtractedStream(
|
||||
url=embed_url,
|
||||
site_key=self.site_key,
|
||||
site_name=self.site_name,
|
||||
quality="",
|
||||
title=stream_title,
|
||||
stream_type="embed",
|
||||
embed_url=embed_url,
|
||||
)
|
||||
|
||||
except Exception:
|
||||
logger.debug(
|
||||
"[pitsport] Failed to resolve embed %s",
|
||||
embed_uuid,
|
||||
exc_info=True,
|
||||
)
|
||||
return None
|
||||
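Review note: the last branch of `_resolve_embed` chooses between a direct HLS entry and the iframe fallback. A minimal sketch of that decision as a pure function, assuming the same method names and URL heuristics as the code above. `classify_stream` and `HLS_METHODS` are hypothetical names used only for illustration; they are not part of the extractor.

```python
# Player methods the extractor treats as HLS (mirrors _is_m3u8_method).
HLS_METHODS = ("player", "hls", "jwp")


def classify_stream(method: str, link: str) -> str:
    """Return "m3u8" when the link looks like HLS, otherwise "embed"."""
    is_hls_method = method.lower() in HLS_METHODS
    looks_hls = bool(link) and (
        ".m3u8" in link
        or link.endswith(".css")  # playlist served under a .css name
        or "serveplay.site" in link
    )
    return "m3u8" if is_hls_method and looks_hls else "embed"
```

Keeping the decision in one side-effect-free function would make the heuristics testable without fetching an embed page.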
270
stacks/f1-stream/files/backend/extractors/ppv.py
Normal file
|
|
@ -0,0 +1,270 @@
|
|||
"""PPV.to extractor - fetches F1 streams via the public PPV API.
|
||||
|
||||
Returns embed URLs (pooembed.eu) for iframe playback.
|
||||
The API at api.ppv.to/api/streams requires no authentication.
|
||||
Falls back to api.ppv.st if the primary API is unreachable.
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
||||
import httpx
|
||||
|
||||
from backend.extractors.base import BaseExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
PRIMARY_API = "https://api.ppv.to/api/streams"
|
||||
FALLBACK_API = "https://api.ppv.st/api/streams"
|
||||
EMBED_BASE = "https://pooembed.eu/embed"
|
||||
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
# Category name for motorsport on PPV.to
|
||||
MOTORSPORT_CATEGORY = "motorsports"
|
||||
|
||||
# Only include events matching these keywords (case-insensitive)
|
||||
F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1"}
|
||||
# Grand Prix is shared with MotoGP/IndyCar — only match if no other series keywords
|
||||
GP_KEYWORD = "grand prix"
|
||||
NON_F1_KEYWORDS = {
|
||||
"motogp", "moto gp", "moto2", "moto3", "motoe",
|
||||
"indycar", "indy car", "firestone", "nascar",
|
||||
"rally", "wrc", "wec", "lemans", "le mans",
|
||||
"superbike", "dtm", "supercars",
|
||||
}
|
||||
|
||||
|
||||
def _is_f1_stream(name: str, category_name: str = "") -> bool:
|
||||
"""Check if a stream is Formula 1 related.
|
||||
|
||||
Checks both the stream name and the category name.
|
||||
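A stream is rejected outright if its name contains a non-F1 series keyword.
Otherwise it qualifies if:
- its name matches an F1 keyword (regardless of category), OR
- its name contains "grand prix" and it is in the motorsports category, OR
- it is in a motorsports category whose name itself matches an F1 keyword.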
|
||||
"""
|
||||
lower_name = name.lower()
|
||||
lower_cat = category_name.lower()
|
||||
|
||||
# Reject if it contains non-F1 motorsport keywords
|
||||
if any(kw in lower_name for kw in NON_F1_KEYWORDS):
|
||||
return False
|
||||
|
||||
# Direct F1 keyword match in the stream name
|
||||
if any(kw in lower_name for kw in F1_KEYWORDS):
|
||||
return True
|
||||
|
||||
# "grand prix" in the name, only if in motorsports category and no non-F1 keywords
|
||||
if GP_KEYWORD in lower_name and MOTORSPORT_CATEGORY in lower_cat:
|
||||
return True
|
||||
|
||||
# If the category is motorsport, also check category-level keywords
|
||||
if MOTORSPORT_CATEGORY in lower_cat and any(kw in lower_cat for kw in F1_KEYWORDS):
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
|
||||
class PPVExtractor(BaseExtractor):
|
||||
"""Extracts embed URLs from PPV.to's public JSON API.
|
||||
|
||||
Uses the endpoint:
|
||||
- GET https://api.ppv.to/api/streams -> all streams grouped by category
|
||||
- Fallback: https://api.ppv.st/api/streams
|
||||
|
||||
Each stream object contains an `iframe` field with the embed URL,
|
||||
or a `uri_name` from which the embed URL can be constructed.
|
||||
"""
|
||||
|
||||
@property
|
||||
def site_key(self) -> str:
|
||||
return "ppv"
|
||||
|
||||
@property
|
||||
def site_name(self) -> str:
|
||||
return "PPV.to"
|
||||
|
||||
async def _fetch_streams(self, client: httpx.AsyncClient) -> dict | None:
|
||||
"""Try primary and fallback APIs, return parsed JSON or None."""
|
||||
for api_url in (PRIMARY_API, FALLBACK_API):
|
||||
try:
|
||||
resp = await client.get(api_url)
|
||||
if resp.status_code == 200:
|
||||
data = resp.json()
|
||||
logger.info("[ppv] Fetched streams from %s", api_url)
|
||||
return data
|
||||
logger.warning(
|
||||
"[ppv] %s returned HTTP %d", api_url, resp.status_code
|
||||
)
|
||||
except Exception:
|
||||
logger.debug(
|
||||
"[ppv] Failed to reach %s", api_url, exc_info=True
|
||||
)
|
||||
return None
|
||||
|
||||
async def extract(self) -> list[ExtractedStream]:
|
||||
"""Fetch F1 streams and return embed URLs for iframe playback."""
|
||||
streams: list[ExtractedStream] = []
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(
|
||||
timeout=15.0,
|
||||
follow_redirects=True,
|
||||
headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
|
||||
) as client:
|
||||
data = await self._fetch_streams(client)
|
||||
if data is None:
|
||||
logger.warning("[ppv] Could not fetch streams from any API")
|
||||
return []
|
||||
|
||||
# The API returns:
|
||||
# { "streams": [ { "category": "Name", "id": N, "streams": [...] }, ... ] }
|
||||
# Flatten into (category_name, stream_obj) tuples.
|
||||
all_streams = self._normalize_streams(data)
|
||||
|
||||
logger.info(
|
||||
"[ppv] Found %d total stream(s) across all categories",
|
||||
len(all_streams),
|
||||
)
|
||||
|
||||
for category_name, stream_obj in all_streams:
|
||||
name = stream_obj.get("name", "") or stream_obj.get("title", "")
|
||||
|
||||
if not _is_f1_stream(name, category_name):
|
||||
continue
|
||||
|
||||
# Build the embed URL
|
||||
embed_url = self._get_embed_url(stream_obj)
|
||||
if not embed_url:
|
||||
logger.debug("[ppv] No embed URL for stream: %s", name)
|
||||
continue
|
||||
|
||||
# Extract quality from tag if present
|
||||
tag = stream_obj.get("tag", "")
|
||||
quality = tag if tag else ""
|
||||
|
||||
# Build descriptive title
|
||||
title = name
|
||||
viewers = stream_obj.get("viewers")
|
||||
if str(viewers or "").isdecimal() and int(viewers) > 0:
|
||||
title += f" ({viewers} viewers)"
|
||||
|
||||
# Check for substreams (multiple quality/language options)
|
||||
substreams = stream_obj.get("substreams")
|
||||
if isinstance(substreams, list) and substreams:
|
||||
for i, sub in enumerate(substreams):
|
||||
sub_embed = sub.get("iframe", "") or sub.get("embed_url", "")
|
||||
if not sub_embed:
|
||||
# Fall back to the parent embed URL
|
||||
sub_embed = embed_url
|
||||
sub_name = sub.get("name", "") or sub.get("label", "")
|
||||
sub_quality = sub.get("tag", "") or sub.get("quality", "") or quality
|
||||
sub_title = name
|
||||
if sub_name:
|
||||
sub_title += f" - {sub_name}"
|
||||
elif i > 0:
|
||||
sub_title += f" #{i + 1}"
|
||||
|
||||
streams.append(
|
||||
ExtractedStream(
|
||||
url=sub_embed,
|
||||
site_key=self.site_key,
|
||||
site_name=self.site_name,
|
||||
quality=sub_quality,
|
||||
title=sub_title,
|
||||
stream_type="embed",
|
||||
embed_url=sub_embed,
|
||||
)
|
||||
)
|
||||
else:
|
||||
# Single stream, no substreams
|
||||
streams.append(
|
||||
ExtractedStream(
|
||||
url=embed_url,
|
||||
site_key=self.site_key,
|
||||
site_name=self.site_name,
|
||||
quality=quality,
|
||||
title=title,
|
||||
stream_type="embed",
|
||||
embed_url=embed_url,
|
||||
)
|
||||
)
|
||||
|
||||
except Exception:
|
||||
logger.exception("[ppv] Failed to extract streams")
|
||||
|
||||
logger.info("[ppv] Extracted %d F1 stream(s)", len(streams))
|
||||
return streams
|
||||
|
||||
@staticmethod
|
||||
def _normalize_streams(data: dict | list) -> list[tuple[str, dict]]:
|
||||
"""Normalize the API response into a flat list of (category_name, stream_dict) tuples.
|
||||
|
||||
The PPV API returns data in this shape:
|
||||
{
|
||||
"streams": [
|
||||
{
|
||||
"category": "Motorsports",
|
||||
"id": 35,
|
||||
"streams": [ { stream objects... } ]
|
||||
},
|
||||
...
|
||||
]
|
||||
}
|
||||
|
||||
Each category group has a "category" string and a nested "streams" list.
|
||||
"""
|
||||
result: list[tuple[str, dict]] = []
|
||||
|
||||
# Handle the top-level wrapper
|
||||
if isinstance(data, dict):
|
||||
categories = data.get("streams", [])
|
||||
elif isinstance(data, list):
|
||||
categories = data
|
||||
else:
|
||||
return result
|
||||
|
||||
for category_group in categories:
|
||||
if not isinstance(category_group, dict):
|
||||
continue
|
||||
|
||||
category_name = category_group.get("category", "")
|
||||
|
||||
# The nested streams within this category
|
||||
inner_streams = category_group.get("streams")  # None when absent, so the flat-format fallback below is reachable
|
||||
if isinstance(inner_streams, list):
|
||||
for stream_obj in inner_streams:
|
||||
if isinstance(stream_obj, dict):
|
||||
# Attach category_name to each stream for filtering
|
||||
result.append((category_name, stream_obj))
|
||||
elif isinstance(category_group, dict) and "name" in category_group:
|
||||
# Fallback: the item itself is a stream (flat list format)
|
||||
result.append((category_name, category_group))
|
||||
|
||||
return result
|
||||
|
||||
@staticmethod
|
||||
def _get_embed_url(stream: dict) -> str:
|
||||
"""Extract or construct the embed URL for a stream."""
|
||||
# Prefer the iframe field directly
|
||||
iframe = stream.get("iframe", "")
|
||||
if iframe:
|
||||
return iframe
|
||||
|
||||
# Construct from uri_name
|
||||
uri_name = stream.get("uri_name", "") or stream.get("uri", "")
|
||||
if uri_name:
|
||||
# Strip leading slash if present
|
||||
uri_name = uri_name.lstrip("/")
|
||||
return f"{EMBED_BASE}/{uri_name}"
|
||||
|
||||
# Last resort: use the stream id
|
||||
stream_id = stream.get("id")
|
||||
if stream_id:
|
||||
return f"{EMBED_BASE}/{stream_id}"
|
||||
|
||||
return ""
|
||||
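Review note: the order of checks in `_is_f1_stream` matters, because the rejection list is consulted before any F1 keyword. A minimal sketch of that precedence, assuming trimmed keyword sets and omitting the category-level keyword check. `is_f1` and the shortened sets are hypothetical, for illustration only.

```python
# Shortened keyword sets; the real module lists more entries.
F1_KEYWORDS = {"formula 1", "formula one", "f1"}
NON_F1_KEYWORDS = {"motogp", "indycar", "nascar", "wrc"}


def is_f1(name: str, category: str = "") -> bool:
    """Reject other series first, then accept F1 names or motorsport GPs."""
    lower_name, lower_cat = name.lower(), category.lower()
    # 1. Any other series keyword in the name vetoes the stream.
    if any(kw in lower_name for kw in NON_F1_KEYWORDS):
        return False
    # 2. A direct F1 keyword in the name is sufficient on its own.
    if any(kw in lower_name for kw in F1_KEYWORDS):
        return True
    # 3. A bare "grand prix" only counts inside the motorsports category.
    return "grand prix" in lower_name and "motorsports" in lower_cat
```

Because the veto runs first, a title such as "MotoGP: Italian Grand Prix" is dropped even though it contains "grand prix".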
|
|
@ -6,6 +6,7 @@ from datetime import datetime, timezone
|
|||
from backend.extractors.models import ExtractedStream
|
||||
from backend.extractors.registry import ExtractorRegistry
|
||||
from backend.health import StreamHealthChecker
|
||||
from backend.playback_verifier import PlaybackVerifier
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
|
@ -29,6 +30,11 @@ class ExtractionService:
|
|||
self._last_run: str | None = None
|
||||
self._last_run_stream_count: int = 0
|
||||
self._health_checker = StreamHealthChecker()
|
||||
self._playback_verifier = PlaybackVerifier()
|
||||
|
||||
async def shutdown(self) -> None:
|
||||
"""Release the headless browser instance owned by the verifier."""
|
||||
await self._playback_verifier.shutdown()
|
||||
|
||||
async def run_extraction(self) -> None:
|
||||
"""Run all extractors, health-check results, and cache them.
|
||||
|
|
@ -43,31 +49,93 @@ class ExtractionService:
|
|||
|
||||
streams = await self._registry.extract_all()
|
||||
|
||||
# Run health checks on all extracted streams
|
||||
# Dedupe by canonical URL — pitsport surfaces every WRC stage as a
|
||||
# separate event but they all point at the same RallyTV master.m3u8
|
||||
# (and similar for MotoGP weekend sessions). Keep the first
|
||||
# occurrence so the user sees one entry per actual stream.
|
||||
deduped: list[ExtractedStream] = []
|
||||
seen_urls: set[str] = set()
|
||||
for stream in streams:
|
||||
key = (stream.embed_url or "").strip() or (stream.url or "").strip()
|
||||
if not key or key in seen_urls:
|
||||
continue
|
||||
seen_urls.add(key)
|
||||
deduped.append(stream)
|
||||
if len(deduped) < len(streams):
|
||||
logger.info(
|
||||
"Deduped streams: %d -> %d (collapsed %d duplicate URL(s))",
|
||||
len(streams), len(deduped), len(streams) - len(deduped),
|
||||
)
|
||||
streams = deduped
|
||||
|
||||
# Run health checks + headless-browser playback verification.
|
||||
# Both stream types are now verified end-to-end so the user only
|
||||
# ever sees streams that actually play in a browser.
|
||||
if streams:
|
||||
# Separate m3u8 streams (need health check) from embed streams (skip)
|
||||
m3u8_streams = [s for s in streams if s.stream_type != "embed"]
|
||||
embed_streams = [s for s in streams if s.stream_type == "embed"]
|
||||
|
||||
# Mark embed streams as live (no health check possible for iframes)
|
||||
for stream in embed_streams:
|
||||
stream.is_live = True
|
||||
stream.response_time_ms = 0
|
||||
stream.checked_at = start.isoformat()
|
||||
|
||||
# Health-check only m3u8 streams
|
||||
# m3u8 streams: cheap structural health check (validates manifest,
|
||||
# checks first variant playlist), then a headless-browser test
|
||||
# to confirm hls.js can decode and render frames.
|
||||
if m3u8_streams:
|
||||
stream_dicts = [s.to_dict() for s in m3u8_streams]
|
||||
health_map = await self._health_checker.check_all(stream_dicts)
|
||||
|
||||
for stream in m3u8_streams:
|
||||
health = health_map.get(stream.url)
|
||||
if health:
|
||||
stream.is_live = health.is_live
|
||||
stream.response_time_ms = health.response_time_ms
|
||||
stream.checked_at = health.checked_at
|
||||
if health.bitrate > 0:
|
||||
stream.bitrate = health.bitrate
|
||||
# tentatively mark live; final word comes from the verifier
|
||||
stream.is_live = health.is_live
|
||||
|
||||
# Browser verification: applies to both m3u8 (only those that
|
||||
# passed structural health) and embed (always — they have no
|
||||
# other way to verify).
|
||||
verify_items: list[tuple[str, str]] = []
|
||||
for stream in m3u8_streams:
|
||||
if stream.is_live:
|
||||
verify_items.append((stream.url, "m3u8"))
|
||||
for stream in embed_streams:
|
||||
verify_items.append((stream.embed_url or stream.url, "embed"))
|
||||
|
||||
verdicts = await self._playback_verifier.verify_many(verify_items)
|
||||
|
||||
now_iso = datetime.now(timezone.utc).isoformat()
|
||||
for stream in m3u8_streams:
|
||||
if not stream.is_live:
|
||||
continue # already failed health check
|
||||
verdict = verdicts.get(stream.url)
|
||||
if verdict is None:
|
||||
continue # verifier disabled or unavailable
|
||||
stream.is_live = verdict.is_playable
|
||||
stream.checked_at = now_iso
|
||||
|
||||
# Curated streams skip the verifier — they are hand-picked
|
||||
# 24/7 channels whose embed pages aggressively detect headless
|
||||
# automation. We can't reliably confirm playback server-side,
|
||||
# but we trust the curator. The user's real browser does NOT
|
||||
# trigger the same anti-bot heuristics (real plugins, real
|
||||
# mouse movements, etc.).
|
||||
CURATED_BYPASS = {"curated"}
|
||||
for stream in embed_streams:
|
||||
stream.checked_at = now_iso
|
||||
if stream.site_key in CURATED_BYPASS:
|
||||
stream.is_live = True
|
||||
stream.response_time_ms = 0
|
||||
continue
|
||||
key = stream.embed_url or stream.url
|
||||
verdict = verdicts.get(key)
|
||||
if verdict is None:
|
||||
# Verifier unavailable — fall back to "trust extractor".
|
||||
# This keeps the service usable even without playwright.
|
||||
stream.is_live = True
|
||||
stream.response_time_ms = 0
|
||||
else:
|
||||
stream.is_live = verdict.is_playable
|
||||
stream.response_time_ms = verdict.elapsed_ms
|
||||
|
||||
# Group streams by site_key and update cache
|
||||
new_cache: dict[str, list[ExtractedStream]] = {}
|
||||
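Review note: the canonical-URL dedupe added to `run_extraction` can be checked in isolation. A minimal sketch, assuming only the `url` and `embed_url` fields matter. `Stream` is a hypothetical stand-in for `ExtractedStream`, and `dedupe_streams` is an illustrative name, not a function in the service.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    # Hypothetical stand-in for ExtractedStream with only the fields used here.
    url: str
    embed_url: str = ""


def dedupe_streams(streams: list[Stream]) -> list[Stream]:
    """Keep the first stream per canonical URL, preserving input order."""
    seen: set[str] = set()
    unique: list[Stream] = []
    for stream in streams:
        # Prefer the embed URL as the key; fall back to the stream URL.
        key = (stream.embed_url or "").strip() or (stream.url or "").strip()
        if not key or key in seen:
            continue  # skip entries with no usable URL and repeats
        seen.add(key)
        unique.append(stream)
    return unique
```

Note that entries with neither URL are dropped rather than kept, which matches the `if not key or key in seen` guard in the hunk above.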
|
|
|
|||
|
|
@ -9,7 +9,9 @@ from backend.extractors.models import ExtractedStream
|
|||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
BASE_URL = "https://streamed.su"
|
||||
# Site renamed from streamed.su → streamed.pk in 2026; the .su domain
|
||||
# stopped resolving the API host (only the marketing page is left).
|
||||
BASE_URL = "https://streamed.pk"
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
|
|
|
|||
161
stacks/f1-stream/files/backend/extractors/stremio.py
Normal file
|
|
@ -0,0 +1,161 @@
|
|||
"""Stremio-addon-driven extractor.
|
||||
|
||||
Stremio addons expose a public HTTP API: each addon has a manifest at
|
||||
`<base>/manifest.json` and per-resource endpoints like
|
||||
`<base>/stream/<type>/<id>.json` returning `{streams:[{url,name,...}]}`.
|
||||
|
||||
This extractor calls a curated set of live-TV addons that surface F1
|
||||
and Sky-Sports-class motorsport channels. We treat each returned URL as
|
||||
an ExtractedStream and let the playback verifier confirm playability.
|
||||
We don't need a Stremio client — we just call the documented HTTP API.
|
||||
|
||||
Findings from initial research (2026-05-07):
|
||||
- **TvVoo** (`tvvoo.hayd.uk`) — wraps the Vavoo IPTV network, lists
|
||||
Sky Sports F1 (UK + IT + DE), DAZN F1, Movistar F1, Canal+ F1,
|
||||
Viaplay F1. The returned m3u8 URLs are IP-bound at the Vavoo CDN
|
||||
(`*.ngolpdkyoctjcddxshli469r.org/sunshine/...`); they're tokenised
|
||||
to whichever IP fetched the manifest. Currently their SSL certs have
|
||||
expired which fails most clients — the addon framework is right but
|
||||
delivery is degraded today.
|
||||
- **StremVerse** (`stremverse.onrender.com`) — returns 11+ streams per
|
||||
catalog id (`stremevent_591`=F1, `stremevent_866`=MotoGP). Mix of
|
||||
DRM-walled DASH, JW-Player-broken-chain JWT, and apar151 HuggingFace
|
||||
proxy URLs. Master playlists parse; variant URLs sometimes return 404
|
||||
if they're meant to be resolved by the addon's player rather than
|
||||
directly.
|
||||
|
||||
Adding a new addon = one entry in `_ADDONS`. Each addon's resolver only
|
||||
needs the manifest + stream endpoints; the addon does the heavy lifting.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
from typing import Iterable
|
||||
|
||||
import httpx
|
||||
|
||||
from backend.extractors.base import BaseExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
|
||||
"AppleWebKit/605.1.15 (KHTML, like Gecko) "
|
||||
"Version/17.4 Safari/605.1.15"
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class _Addon:
|
||||
name: str
|
||||
base: str # e.g. "https://tvvoo.hayd.uk"
|
||||
stream_ids: tuple[tuple[str, str, str], ...]
|
||||
"""(stream_type, stream_id, label) per F1/motorsport entry."""
|
||||
|
||||
|
||||
# Curated addon list — see module docstring. These IDs are documented in
|
||||
# the addons' manifests / channel lists. Update when channel names/IDs
|
||||
# rotate.
|
||||
_ADDONS: tuple[_Addon, ...] = (
|
||||
_Addon(
|
||||
name="TvVoo",
|
||||
base="https://tvvoo.hayd.uk",
|
||||
stream_ids=(
|
||||
("tv", "vavoo_SKY%20SPORTS%20F1|group:uk", "Sky Sports F1 UK (Vavoo)"),
|
||||
("tv", "vavoo_SKY%20SPORTS%20F1%20HD|group:uk", "Sky Sports F1 HD UK (Vavoo)"),
|
||||
("tv", "vavoo_SKY%20SPORT%20F1|group:it", "Sky Sport F1 IT (Vavoo)"),
|
||||
("tv", "vavoo_SKY%20SPORT%20F1%20HD|group:de", "Sky Sport F1 DE (Vavoo)"),
|
||||
("tv", "vavoo_DAZN%20F1|group:es", "DAZN F1 ES (Vavoo)"),
|
||||
),
|
||||
),
|
||||
_Addon(
|
||||
name="StremVerse",
|
||||
base="https://stremverse.onrender.com",
|
||||
stream_ids=(
|
||||
("tv", "stremevent_591", "Formula 1 (StremVerse)"),
|
||||
("tv", "stremevent_866", "MotoGP (StremVerse)"),
|
||||
),
|
||||
),
|
||||
)
|
||||
|
||||
|
||||
class StremioAddonExtractor(BaseExtractor):
|
||||
"""Pull F1 + Sky-class motorsport URLs from public Stremio addons."""
|
||||
|
||||
@property
|
||||
def site_key(self) -> str:
|
||||
return "stremio"
|
||||
|
||||
@property
|
||||
def site_name(self) -> str:
|
||||
return "Stremio Addon"
|
||||
|
||||
async def extract(self) -> list[ExtractedStream]:
|
||||
async with httpx.AsyncClient(
|
||||
timeout=15.0,
|
||||
follow_redirects=True,
|
||||
headers={"User-Agent": USER_AGENT},
|
||||
# Some addons (TvVoo→Vavoo) hand back URLs whose origin certs
|
||||
# are expired; honest-default verify=True is preserved here so
|
||||
# the verifier sees the same TLS errors a browser would.
|
||||
) as client:
|
||||
tasks = []
|
||||
for addon in _ADDONS:
|
||||
for stype, sid, label in addon.stream_ids:
|
||||
tasks.append(self._resolve(client, addon, stype, sid, label))
|
||||
results = await asyncio.gather(*tasks, return_exceptions=True)
|
||||
|
||||
streams: list[ExtractedStream] = []
|
||||
for r in results:
|
||||
if isinstance(r, Exception):
|
||||
logger.debug("[stremio] resolve failed: %s", r)
|
||||
continue
|
||||
streams.extend(r)
|
||||
|
||||
logger.info("[stremio] surfaced %d candidate stream URL(s) across %d addon(s)",
|
||||
len(streams), len(_ADDONS))
|
||||
return streams
|
||||
|
||||
async def _resolve(
|
||||
self, client: httpx.AsyncClient, addon: _Addon,
|
||||
stype: str, sid: str, label: str,
|
||||
) -> list[ExtractedStream]:
|
||||
url = f"{addon.base}/stream/{stype}/{sid}.json"
|
||||
try:
|
||||
resp = await client.get(url)
|
||||
except Exception as e:
|
||||
logger.debug("[stremio] %s fetch failed: %s", url, e)
|
||||
return []
|
||||
if resp.status_code != 200:
|
||||
logger.debug("[stremio] %s -> HTTP %d", url, resp.status_code)
|
||||
return []
|
||||
try:
|
||||
data = resp.json()
|
||||
except Exception:
|
||||
return []
|
||||
|
||||
out: list[ExtractedStream] = []
|
||||
for idx, s in enumerate(data.get("streams") or []):
|
||||
stream_url = (s.get("url") or "").strip()
|
||||
if not stream_url:
|
||||
continue
|
||||
# Skip DRM-tagged entries — they need Widevine which neither
|
||||
# our verifier nor a clean hls.js path can play.
|
||||
if "DRM" in (s.get("name") or "").upper():
|
||||
continue
|
||||
title = label
|
||||
if idx > 0:
|
||||
title = f"{label} #{idx + 1}"
|
||||
out.append(
|
||||
ExtractedStream(
|
||||
url=stream_url,
|
||||
site_key=self.site_key,
|
||||
site_name=addon.name,
|
||||
quality="",
|
||||
title=title,
|
||||
stream_type="m3u8",
|
||||
)
|
||||
)
|
||||
return out
|
||||
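Review note: the two pure steps in `_resolve` (building the stream endpoint and filtering the response) can be tested without a network call. A minimal sketch, assuming the `{streams: [{url, name, ...}]}` response shape described in the module docstring. `stream_endpoint` and `playable_urls` are hypothetical helper names, not part of the extractor.

```python
def stream_endpoint(base: str, stream_type: str, stream_id: str) -> str:
    """Build the per-resource stream URL for an addon."""
    return f"{base}/stream/{stream_type}/{stream_id}.json"


def playable_urls(payload: dict) -> list[str]:
    """Return non-empty stream URLs, skipping entries whose name mentions DRM."""
    urls: list[str] = []
    for entry in payload.get("streams") or []:
        url = (entry.get("url") or "").strip()
        # Same heuristic as the extractor: a "DRM" tag in the name means skip.
        if url and "DRM" not in (entry.get("name") or "").upper():
            urls.append(url)
    return urls
```

The `or []` and `or ""` fallbacks guard against keys that are present but set to `null` in the JSON.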
249
stacks/f1-stream/files/backend/extractors/subreddit.py
Normal file
|
|
@ -0,0 +1,249 @@
|
|||
"""Subreddit extractor — pulls community-curated live-stream URLs from
|
||||
the *MotorsportsReplays* subreddit (and a few siblings).
|
||||
|
||||
The community follows a stable pattern: a single mod-curated post titled
|
||||
`[Watch / Download] <Series> <Year> - <Round> | <Event>` goes up on or
|
||||
near each race weekend with a `**Watch Online:**` link in the selftext,
|
||||
pointing at an admin-run WordPress site (motomundo.net for MotoGP, the
|
||||
F1 equivalent has rotated over the years). That WordPress page hosts
|
||||
iframe embeds whose m3u8 is JS-computed at load time — ideal target for
|
||||
the chrome-service pipeline downstream.
|
||||
|
||||
This extractor:
|
||||
- Hits Reddit with a real-browser User-Agent (httpx default UA + cluster
|
||||
IP combo gets HTTP 403'd on r/motogp; a Safari UA does not).
|
||||
- Searches for the `[Watch` thread pattern AND scans `/new.json` for
|
||||
any flair set to LIVE.
|
||||
- Pulls selftext URLs and returns each candidate as an `embed`-type
|
||||
ExtractedStream. The verifier already drives chrome-service for embed
|
||||
streams, so the m3u8 capture happens there.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import re
|
||||
import urllib.parse
|
||||
from typing import NamedTuple
|
||||
|
||||
import httpx
|
||||
|
||||
from backend.extractors.base import BaseExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
|
||||
"AppleWebKit/605.1.15 (KHTML, like Gecko) "
|
||||
"Version/17.4 Safari/605.1.15"
|
||||
)
|
||||
|
||||
# Subreddits to scan.
|
||||
# - r/motorsportsstreams2 is the active 12.5k-sub successor to the banned
|
||||
# r/motorsportstreams; race-weekend "[F1 STREAM]" posts include
|
||||
# `boxboxbox.pro/stream-1` URLs and similar fresh aggregator links.
|
||||
# - r/MotorsportsReplays runs the [Watch / Download] mod-post pattern
|
||||
# linking to motomundo.net (MotoGP) and sister sites.
|
||||
# - The rest are low-yield but cost nothing.
|
||||
SUBREDDITS: tuple[str, ...] = (
|
||||
"motorsportsstreams2",
|
||||
"MotorsportsReplays",
|
||||
"f1streams",
|
||||
"motorsports",
|
||||
"formula1",
|
||||
"motogp",
|
||||
)
|
||||
|
||||
# Search queries fired against r/motorsportsstreams2 + r/MotorsportsReplays.
|
||||
# The first set captures the [Watch / Download] mod posts; the second set
|
||||
# catches race-weekend live discussion threads.
|
||||
SEARCH_QUERIES: tuple[str, ...] = (
|
||||
"Watch Download F1 2026",
|
||||
"Watch Download MotoGP 2026",
|
||||
"Watch Online F1 2026",
|
||||
"F1 STREAM live",
|
||||
"Sky Sports F1 live",
|
||||
"Sky F1 stream",
|
||||
)
|
||||
|
||||
# Hosts we accept as "interesting" stream-page URLs. These are the
|
||||
# admin-curated WordPress / aggregator sites the community links to.
|
||||
# Anchored to what r/motorsportsstreams2 currently posts (May 2026 sweep).
|
||||
_INTERESTING_HOSTS = (
|
||||
# WordPress wrappers / community-run sites
|
||||
"motomundo.net", # MotoGP — admin-curated WP
|
||||
"motomundo.top", # MotoMundo embed host
|
||||
"motomundo.upns.xyz", # MotoMundo embed host (newer)
|
||||
"freemotorsports.com", # WAC successor curated link list
|
||||
"boxboxbox.pro", # F1 race-weekend aggregator (community fav)
|
||||
"boxboxbox.live", # boxboxbox sister
|
||||
"boxboxbox.lol",
|
||||
# Aggregators we already have direct extractors for, but Reddit may
|
||||
# surface event-specific deeplinks (e.g. /watch/<UUID>) we'd miss
|
||||
# otherwise.
|
||||
"pitsport.xyz",
|
||||
"pitsport.live",
|
||||
"rerace.io",
|
||||
"dd12streams.com",
|
||||
"ppv.to",
|
||||
"streamed.pk",
|
||||
"acestrlms.pages.dev",
|
||||
"aceztrims.pages.dev",
|
||||
# Sport-specific direct CDNs that occasionally appear in posts
|
||||
"racelive.jp", # Super Formula
|
||||
"cdn.sfgo.jp", # Super Formula CDN
|
||||
# Speculative F1 sister sites, guessed by analogy with motomundo (MotoGP)
|
||||
"f1mundo.net",
|
||||
"f1.live",
|
||||
"f1live",
|
||||
"skystreams",
|
||||
"raceon",
|
||||
"watchf1",
|
||||
)
|
||||
|
||||
# URLs we actively never try to scrape (auth-walled, social media,
|
||||
# direct downloads with no live stream).
|
||||
_REJECT_HOSTS = (
|
||||
"discord.gg", "discord.com",
|
||||
"twitter.com", "x.com",
|
||||
"youtube.com", "youtu.be",
|
||||
"instagram.com", "tiktok.com",
|
||||
"f1tv.formula1.com",
|
||||
"viktorbarzin.me",
|
||||
"gofile.io",
|
||||
"mega.nz", "drive.google.com",
|
||||
"1fichier.com", "rapidgator", "uploaded.net",
|
||||
"magnet:",
|
||||
)
|
||||
|
||||
_URL_RE = re.compile(r"https?://[^\s\)\]\>\"']+")
|
||||
|
||||
|
||||
class _Candidate(NamedTuple):
|
||||
title: str
|
||||
url: str
|
||||
subreddit: str
|
||||
flair: str
|
||||
|
||||
|
||||
def _is_interesting(url: str) -> bool:
|
||||
low = url.lower()
|
||||
if any(host in low for host in _REJECT_HOSTS):
|
||||
return False
|
||||
return any(host in low for host in _INTERESTING_HOSTS)
|
||||
|
||||
|
||||
def _has_live_marker(post: dict) -> bool:
|
||||
title = (post.get("title") or "").lower()
|
||||
flair = (post.get("link_flair_text") or "").lower()
|
||||
if "[watch" in title or "watch online" in title or "live" in flair:
|
||||
return True
|
||||
return False
|
||||
|
||||
|
||||
class SubredditExtractor(BaseExtractor):
|
||||
"""Scan motorsport subreddits for community-curated live-stream URLs."""
|
||||
|
||||
@property
|
||||
def site_key(self) -> str:
|
||||
return "subreddit"
|
||||
|
||||
@property
|
||||
def site_name(self) -> str:
|
||||
return "Subreddit"
|
||||
|
||||
async def extract(self) -> list[ExtractedStream]:
|
||||
# NB: do NOT send `Accept: application/json` — Reddit's anti-bot
|
||||
# fingerprint flags that header from datacenter IPs and returns
|
||||
# HTTP 403 with HTML. Default Accept (`*/*`) gets through fine
|
||||
# and `.json` URLs always return JSON regardless.
|
||||
async with httpx.AsyncClient(
|
||||
timeout=15.0,
|
||||
follow_redirects=True,
|
||||
headers={"User-Agent": USER_AGENT},
|
||||
) as client:
|
||||
tasks = [self._fetch_new(client, sub) for sub in SUBREDDITS]
|
||||
tasks.extend(self._search(client, q) for q in SEARCH_QUERIES)
|
||||
results = await asyncio.gather(*tasks, return_exceptions=True)
|
||||
|
||||
candidates: list[_Candidate] = []
|
||||
for r in results:
|
||||
if isinstance(r, Exception):
|
||||
logger.debug("[subreddit] fetch failed: %s", r)
|
||||
continue
|
||||
candidates.extend(r)
|
||||
|
||||
# Dedupe by URL, keep first occurrence.
|
||||
seen: set[str] = set()
|
||||
picks: list[_Candidate] = []
|
||||
for c in candidates:
|
||||
if c.url in seen:
|
||||
continue
|
||||
seen.add(c.url)
|
||||
picks.append(c)
|
||||
|
||||
logger.info(
|
||||
"[subreddit] scanned %d source(s) — %d unique candidate URL(s)",
|
||||
len(SUBREDDITS) + len(SEARCH_QUERIES), len(picks),
|
||||
)
|
||||
return [
|
||||
ExtractedStream(
|
||||
url=c.url,
|
||||
site_key=self.site_key,
|
||||
site_name=f"r/{c.subreddit}",
|
||||
quality="",
|
||||
title=c.title[:100],
|
||||
stream_type="embed",
|
||||
embed_url=c.url,
|
||||
)
|
||||
for c in picks
|
||||
]
|
||||
|
||||
async def _fetch_new(self, client: httpx.AsyncClient, sub: str) -> list[_Candidate]:
|
||||
return await self._collect(
|
||||
client,
|
||||
f"https://www.reddit.com/r/{sub}/new.json?limit=25",
|
||||
sub,
|
||||
)
|
||||
|
||||
async def _search(self, client: httpx.AsyncClient, query: str) -> list[_Candidate]:
|
||||
q = urllib.parse.quote_plus(query)
|
||||
return await self._collect(
|
||||
client,
|
||||
f"https://www.reddit.com/r/MotorsportsReplays/search.json?q={q}&restrict_sr=on&sort=new&limit=10",
|
||||
"MotorsportsReplays",
|
||||
)
|
||||
|
||||
async def _collect(
|
||||
self, client: httpx.AsyncClient, url: str, sub: str
|
||||
) -> list[_Candidate]:
|
||||
try:
|
||||
resp = await client.get(url)
|
||||
except Exception as e:
|
||||
logger.debug("[subreddit] fetch %s failed: %s", url, e)
|
||||
return []
|
||||
if resp.status_code != 200:
|
||||
logger.debug("[subreddit] %s -> HTTP %d", url, resp.status_code)
|
||||
return []
|
||||
try:
|
||||
data = resp.json()
|
||||
except Exception:
|
||||
return []
|
||||
out: list[_Candidate] = []
|
||||
for child in (data.get("data", {}) or {}).get("children", []):
|
||||
d = child.get("data", {}) or {}
|
||||
if not _has_live_marker(d):
|
||||
continue
|
||||
text = (d.get("selftext") or "")
|
||||
title = d.get("title") or ""
|
||||
flair = d.get("link_flair_text") or ""
|
||||
# First, the linked URL itself (if it's a recognised live site).
|
||||
top = d.get("url") or ""
|
||||
if top and _is_interesting(top):
|
||||
out.append(_Candidate(title, top, sub, flair))
|
||||
# Then any URL embedded in the selftext that points at a
|
||||
# community-curated live page.
|
||||
for u in _URL_RE.findall(text):
|
||||
if _is_interesting(u):
|
||||
out.append(_Candidate(title, u, sub, flair))
|
||||
return out
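`_is_interesting` matches host names as substrings of the whole URL, so a reject entry such as `x.com` also fires when it appears in a path or query string. A minimal sketch of a stricter alternative, comparing against the parsed hostname, is shown below. The shortened lists and the helper names (`host_matches`, `is_interesting`) are hypothetical; this is not the extractor's current behaviour.

```python
from urllib.parse import urlsplit

# Hypothetical, shortened copies of the allow/deny lists, for illustration only.
INTERESTING = ("boxboxbox.pro", "pitsport.xyz", "f1.live")
REJECT = ("x.com", "youtube.com", "discord.gg")


def host_matches(host: str, domain: str) -> bool:
    """True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def is_interesting(url: str) -> bool:
    """Classify a URL by its parsed hostname rather than by substring."""
    host = (urlsplit(url).hostname or "").lower()
    if any(host_matches(host, d) for d in REJECT):
        return False
    return any(host_matches(host, d) for d in INTERESTING)


print(is_interesting("https://boxboxbox.pro/?ref=x.com"))
print(is_interesting("https://x.com/someone/status/1"))
```

Note that the real lists also contain bare tokens (`f1live`, `rapidgator`, `magnet:`) that are not full host names; those would still need substring or scheme checks under this approach.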
|
||||
190
stacks/f1-stream/files/backend/extractors/timstreams.py
Normal file
|
|
@@ -0,0 +1,190 @@
|
|||
"""TimStreams extractor - fetches F1 streams from the TimStreams JSON API.
|
||||
|
||||
Returns embed URLs from hmembeds.one for iframe playback.
|
||||
The public API at stra.viaplus.site/main requires no authentication
|
||||
and returns all events/channels across Events, Replays, and 24/7 categories.
|
||||
"""
|
||||
|
||||
import logging
|
||||
|
||||
import httpx
|
||||
|
||||
from backend.extractors.base import BaseExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
API_URL = "https://stra.viaplus.site/main"
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
# Direct F1 keyword matches (case-insensitive)
|
||||
F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1", "dazn f1"}
|
||||
# "Grand prix" is F1-related only if non-F1 motorsport keywords are absent
|
||||
GP_KEYWORD = "grand prix"
|
||||
# Exclude these motorsport series when matching on "grand prix"
|
||||
NON_F1_KEYWORDS = {
|
||||
"motogp", "moto gp", "moto2", "moto3", "motoe",
|
||||
"indycar", "indy car", "nascar",
|
||||
"rally", "wrc", "wec", "lemans", "le mans",
|
||||
"superbike", "dtm", "supercars",
|
||||
}
|
||||
|
||||
# 24/7 channels that should always be included (embed hashes on hmembeds.one)
|
||||
ALWAYS_INCLUDE_HASHES = {
|
||||
"888520f36cd94c5da4c71fddc1a5fc9b", # Sky Sports F1
|
||||
"fc3a54634d0867b0c02ee3223292e7c6", # DAZN F1
|
||||
}
|
||||
|
||||
|
||||
def _is_f1_event(name: str) -> bool:
|
||||
"""Check if an event/channel is Formula 1 related by name.
|
||||
|
||||
Returns True when the name contains a direct F1 keyword, or contains
|
||||
"grand prix" without non-F1 series keywords.
|
||||
|
||||
Note: The TimStreams API genre field (genre=2) covers ALL sports channels,
|
||||
not just motorsport, so we rely solely on name-based matching.
|
||||
"""
|
||||
lower = name.lower()
|
||||
|
||||
# Direct F1 keyword match
|
||||
if any(kw in lower for kw in F1_KEYWORDS):
|
||||
return True
|
||||
|
||||
# Grand prix without competing series
|
||||
if GP_KEYWORD in lower and not any(kw in lower for kw in NON_F1_KEYWORDS):
|
||||
return True
|
||||
|
||||
return False
|
||||
|
||||
|
||||
def _extract_embed_hash(url: str) -> str | None:
|
||||
"""Extract the hash from an hmembeds.one embed URL.
|
||||
|
||||
Expected format: https://hmembeds.one/embed/{hash}
|
||||
Returns the hash string, or None if the URL is not in the expected format.
|
||||
"""
|
||||
if not url:
|
||||
return None
|
||||
# Handle both with and without trailing slash
|
||||
url = url.rstrip("/")
|
||||
prefix = "https://hmembeds.one/embed/"
|
||||
alt_prefix = "http://hmembeds.one/embed/"
|
||||
if url.startswith(prefix):
|
||||
return url[len(prefix):] or None
|
||||
if url.startswith(alt_prefix):
|
||||
return url[len(alt_prefix):] or None
|
||||
return None
|
||||
|
||||
|
||||
def _is_always_include(url: str) -> bool:
|
||||
"""Check if a stream URL is one of the always-include 24/7 channels."""
|
||||
embed_hash = _extract_embed_hash(url)
|
||||
return embed_hash in ALWAYS_INCLUDE_HASHES if embed_hash else False
|
||||
|
||||
|
||||
class TimStreamsExtractor(BaseExtractor):
|
||||
"""Extracts embed URLs from TimStreams' public JSON API.
|
||||
|
||||
The API at stra.viaplus.site/main returns a JSON array of categories,
|
||||
each containing events with stream URLs pointing to hmembeds.one embeds.
|
||||
"""
|
||||
|
||||
@property
|
||||
def site_key(self) -> str:
|
||||
return "timstreams"
|
||||
|
||||
@property
|
||||
def site_name(self) -> str:
|
||||
return "TimStreams"
|
||||
|
||||
async def extract(self) -> list[ExtractedStream]:
|
||||
"""Fetch F1 events/channels and return embed URLs for iframe playback."""
|
||||
streams: list[ExtractedStream] = []
|
||||
seen_urls: set[str] = set()
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(
|
||||
timeout=15.0,
|
||||
follow_redirects=True,
|
||||
headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
|
||||
) as client:
|
||||
resp = await client.get(API_URL)
|
||||
if resp.status_code != 200:
|
||||
logger.warning(
|
||||
"[timstreams] API returned HTTP %d", resp.status_code
|
||||
)
|
||||
return []
|
||||
|
||||
data = resp.json()
|
||||
if not isinstance(data, list):
|
||||
logger.warning("[timstreams] Unexpected API response type: %s", type(data).__name__)
|
||||
return []
|
||||
|
||||
logger.info("[timstreams] API returned %d category(ies)", len(data))
|
||||
|
||||
for category in data:
|
||||
category_name = category.get("category", "Unknown")
|
||||
events = category.get("events", [])
|
||||
if not isinstance(events, list):
|
||||
continue
|
||||
|
||||
for event in events:
|
||||
event_name = event.get("name", "Unknown")
|
||||
event_streams = event.get("streams", [])
|
||||
|
||||
if not isinstance(event_streams, list) or not event_streams:
|
||||
continue
|
||||
|
||||
# Check if any stream URL matches an always-include channel
|
||||
always_include = any(
|
||||
_is_always_include(s.get("url", ""))
|
||||
for s in event_streams
|
||||
)
|
||||
|
||||
# Filter: must be F1-related or an always-include channel
|
||||
if not always_include and not _is_f1_event(event_name):
|
||||
continue
|
||||
|
||||
for stream_info in event_streams:
|
||||
stream_name = stream_info.get("name", "")
|
||||
stream_url = stream_info.get("url", "")
|
||||
|
||||
if not stream_url:
|
||||
continue
|
||||
|
||||
# Deduplicate by URL
|
||||
if stream_url in seen_urls:
|
||||
continue
|
||||
seen_urls.add(stream_url)
|
||||
|
||||
# Build a descriptive title
|
||||
title = event_name
|
||||
if stream_name and stream_name.lower() != event_name.lower():
|
||||
title = f"{event_name} - {stream_name}"
|
||||
if category_name:
|
||||
title = f"[{category_name}] {title}"
|
||||
|
||||
streams.append(
|
||||
ExtractedStream(
|
||||
url=stream_url,
|
||||
site_key=self.site_key,
|
||||
site_name=self.site_name,
|
||||
quality="",
|
||||
title=title,
|
||||
stream_type="embed",
|
||||
embed_url=stream_url,
|
||||
)
|
||||
)
|
||||
|
||||
except httpx.TimeoutException:
|
||||
logger.warning("[timstreams] API request timed out")
|
||||
except Exception:
|
||||
logger.exception("[timstreams] Failed to fetch from API")
|
||||
|
||||
logger.info("[timstreams] Extracted %d stream(s)", len(streams))
|
||||
return streams
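The name filter used by this extractor can be exercised in isolation. The sketch below copies the keyword sets and the decision order of `_is_f1_event` into a standalone function (renamed `is_f1_event` here) and runs it on a few made-up event names; the sample names are assumptions, not real API output.

```python
# Keyword sets copied from the extractor for a self-contained demo.
F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1", "dazn f1"}
GP_KEYWORD = "grand prix"
NON_F1_KEYWORDS = {
    "motogp", "moto gp", "moto2", "moto3", "motoe",
    "indycar", "indy car", "nascar",
    "rally", "wrc", "wec", "lemans", "le mans",
    "superbike", "dtm", "supercars",
}


def is_f1_event(name: str) -> bool:
    """Direct F1 keyword wins; otherwise 'grand prix' without a rival series."""
    lower = name.lower()
    if any(kw in lower for kw in F1_KEYWORDS):
        return True
    return GP_KEYWORD in lower and not any(kw in lower for kw in NON_F1_KEYWORDS)


# Hypothetical event names.
for sample in (
    "Monaco Grand Prix",
    "MotoGP Italian Grand Prix",
    "Sky Sports F1",
    "IndyCar Grand Prix of Long Beach",
    "Premier League: Arsenal vs Chelsea",
):
    print(f"{sample!r}: {is_f1_event(sample)}")
```

Because `"f1"` is matched as a plain substring, any name that happens to contain those two characters passes the first check; a word-boundary match would be one way to tighten that if false positives appear.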
|
||||
|
|
@@ -3,6 +3,7 @@
|
|||
import logging
|
||||
import os
|
||||
from contextlib import asynccontextmanager
|
||||
from datetime import datetime, timedelta, timezone
|
||||
|
||||
from apscheduler.schedulers.asyncio import AsyncIOScheduler
|
||||
from apscheduler.triggers.cron import CronTrigger
|
||||
|
|
@@ -13,6 +14,7 @@ from fastapi.staticfiles import StaticFiles
|
|||
from pydantic import BaseModel
|
||||
from starlette.responses import Response, StreamingResponse
|
||||
|
||||
from backend.embed_proxy import fetch_embed, relay_asset
|
||||
from backend.extractors import create_extraction_service
|
||||
from backend.proxy import proxy_playlist, relay_stream
|
||||
from backend.schedule import ScheduleService
|
||||
|
|
@@ -117,10 +119,6 @@ async def lifespan(app: FastAPI):
|
|||
# Startup: load schedule and start background scheduler
|
||||
await schedule_service.initialize()
|
||||
|
||||
# Run initial extraction
|
||||
logger.info("Running initial stream extraction...")
|
||||
await extraction_service.run_extraction()
|
||||
|
||||
# Schedule daily schedule refresh
|
||||
scheduler.add_job(
|
||||
_scheduled_refresh,
|
||||
|
|
@@ -130,13 +128,18 @@ async def lifespan(app: FastAPI):
|
|||
replace_existing=True,
|
||||
)
|
||||
|
||||
# Schedule periodic stream extraction (default: every 30 minutes)
|
||||
# Schedule periodic stream extraction (default: every 30 minutes).
|
||||
# next_run_time fires the first run 8s after startup. We don't run
|
||||
# extraction inline here because it calls the playback verifier,
|
||||
# which hits http://127.0.0.1:8000/embed for embed streams — uvicorn
|
||||
# isn't listening yet inside the lifespan startup phase.
|
||||
scheduler.add_job(
|
||||
_scheduled_extraction,
|
||||
trigger=IntervalTrigger(minutes=30),
|
||||
id="stream_extraction",
|
||||
name="Extract streams from all registered sites",
|
||||
replace_existing=True,
|
||||
next_run_time=datetime.now(timezone.utc) + timedelta(seconds=8),
|
||||
)
|
||||
|
||||
# Schedule token refresh every 4 minutes (safe margin for 5-min CDN tokens).
|
||||
|
|
@@ -159,6 +162,10 @@ async def lifespan(app: FastAPI):
|
|||
# Shutdown
|
||||
scheduler.shutdown(wait=False)
|
||||
logger.info("APScheduler shut down")
|
||||
try:
|
||||
await extraction_service.shutdown()
|
||||
except Exception:
|
||||
logger.exception("extraction_service shutdown failed")
|
||||
|
||||
|
||||
app = FastAPI(title="F1 Streams", lifespan=lifespan)
|
||||
|
|
@@ -409,6 +416,37 @@ async def relay_endpoint(
|
|||
)
|
||||
|
||||
|
||||
# --- Embed iframe-stripping proxy ---
|
||||
|
||||
|
||||
@app.get("/embed")
|
||||
async def embed_proxy(url: str = Query(..., description="Base64url-encoded embed URL")):
|
||||
"""Proxy a third-party embed page so it can be iframed in our origin.
|
||||
|
||||
Strips X-Frame-Options and CSP frame-ancestors from the upstream
|
||||
response, injects a base href + frame-buster-defeat script, and
|
||||
forwards a plausible Referer/Origin to bypass upstream allowlists.
|
||||
"""
|
||||
body, headers, status_code = await fetch_embed(url)
|
||||
return Response(content=body, headers=headers, status_code=status_code)
|
||||
|
||||
|
||||
@app.get("/embed-asset")
|
||||
async def embed_asset(
|
||||
request: Request,
|
||||
url: str = Query(..., description="Base64url-encoded subresource URL"),
|
||||
):
|
||||
"""Relay an upstream subresource (JS/CSS/image/etc.) for the embed proxy.
|
||||
|
||||
Used as a fallback when an upstream blocks hotlinked assets via Origin
|
||||
or Referer checks. Most assets load directly via the injected <base>
|
||||
tag without going through this endpoint.
|
||||
"""
|
||||
range_header = request.headers.get("range")
|
||||
stream_gen, headers, status_code = await relay_asset(url, range_header)
|
||||
return StreamingResponse(stream_gen, headers=headers, status_code=status_code)
|
||||
|
||||
|
||||
# --- Frontend Static Files ---
|
||||
# Mount the SvelteKit static build AFTER all API routes so API endpoints take priority.
|
||||
# SvelteKit adapter-static with ssr=false produces {page}.html files and a fallback index.html.
|
||||
|
|
|
|||
449
stacks/f1-stream/files/backend/playback_verifier.py
Normal file
|
|
@@ -0,0 +1,449 @@
|
|||
"""Headless-browser playback verification for extracted streams.
|
||||
|
||||
The basic health checker (backend/health.py) only validates m3u8 syntax.
|
||||
For embed/iframe streams it has nothing to check — the previous code blindly
|
||||
marked every embed `is_live=True`, which meant the stream list was full of
|
||||
news articles and aggregator landing pages that never actually played.
|
||||
|
||||
This module loads each candidate stream URL in headless Chromium (via
|
||||
Playwright) and looks for *codec-independent* signals that the upstream
|
||||
serves a playable stream:
|
||||
|
||||
- For m3u8: hls.js receives MANIFEST_PARSED + at least one FRAG_LOADED
|
||||
event. We don't wait for `<video>` to gain dimensions, because Playwright's
|
||||
chromium build doesn't include the H.264/AAC codecs. The user's real
|
||||
browser does, so confirming "manifest + segment fetch succeed" is the
|
||||
right server-side signal.
|
||||
- For embed: a `<video>` element appears in the embed page, which the
verifier loads directly through the embed proxy (the proxy strips
X-Frame-Options and defeats frame-buster JS so the page renders under
our origin).
|
||||
|
||||
Designed to be called from the extraction service's run_extraction()
|
||||
hook, with bounded concurrency. Each verification typically takes
|
||||
4-12 seconds.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import base64
|
||||
import logging
|
||||
import os
|
||||
import time
|
||||
from dataclasses import dataclass
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Toggle off in development by setting PLAYBACK_VERIFY_ENABLED=false.
|
||||
VERIFY_ENABLED = os.getenv("PLAYBACK_VERIFY_ENABLED", "true").lower() in ("true", "1", "yes")
|
||||
|
||||
# Maximum number of concurrent browser pages.
|
||||
MAX_CONCURRENCY = int(os.getenv("PLAYBACK_VERIFY_CONCURRENCY", "2"))
|
||||
|
||||
# Per-stream verification budget (seconds). Beyond this we declare unplayable.
|
||||
PER_STREAM_TIMEOUT = float(os.getenv("PLAYBACK_VERIFY_TIMEOUT", "20"))
|
||||
|
||||
# Where the embed proxy lives, used to wrap embed URLs so they bypass
|
||||
# X-Frame-Options/CSP/JS frame-busters during verification. Defaults to
|
||||
# loopback because verification runs inside the same FastAPI process.
|
||||
PROXY_BASE = os.getenv("PLAYBACK_VERIFY_PROXY_BASE", "http://127.0.0.1:8000")
|
||||
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class PlaybackVerdict:
|
||||
is_playable: bool
|
||||
signal: str = "" # which check triggered the positive verdict
|
||||
elapsed_ms: int = 0
|
||||
error: str = ""
|
||||
|
||||
|
||||
def _b64url(s: str) -> str:
|
||||
"""URL-safe base64 with padding stripped — matches m3u8_rewriter.encode_url."""
|
||||
return base64.urlsafe_b64encode(s.encode()).decode().rstrip("=")
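Because `_b64url` strips the `=` padding, whatever decodes the `url` query parameter has to restore it before calling the standard decoder. The decoder below is a hypothetical counterpart written for illustration; the real one lives in a module not shown in this diff and may differ.

```python
import base64


def b64url_encode(s: str) -> str:
    """URL-safe base64 without padding (same logic as _b64url)."""
    return base64.urlsafe_b64encode(s.encode()).decode().rstrip("=")


def b64url_decode(token: str) -> str:
    """Hypothetical inverse: re-add the stripped padding, then decode."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode()


original = "https://hmembeds.one/embed/888520f36cd94c5da4c71fddc1a5fc9b"
token = b64url_encode(original)
print(token)
print(b64url_decode(token) == original)
```

The expression `-len(token) % 4` yields the number of `=` characters needed to bring the length back to a multiple of four.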
|
||||
|
||||
|
||||
def _hls_test_html(m3u8_url: str) -> str:
|
||||
"""A self-contained HTML page that loads an m3u8 via hls.js into a <video>.
|
||||
|
||||
The page exposes window._verifier with manifest_parsed / frag_loaded
|
||||
booleans the verifier polls. It also marks media-error or fatal-error
|
||||
so we can distinguish 'upstream is unreachable' from 'codec missing'.
|
||||
"""
|
||||
return f"""<!doctype html>
|
||||
<html><head><meta charset="utf-8"><title>verify</title>
|
||||
<script src="https://cdn.jsdelivr.net/npm/hls.js@1.5/dist/hls.min.js"></script>
|
||||
</head><body>
|
||||
<video id="v" muted playsinline width="640" height="360"></video>
|
||||
<script>
|
||||
window._verifier = {{
|
||||
manifest_parsed: false,
|
||||
frag_loaded: false,
|
||||
media_loaded: false, // true when MSE has appended any buffer
|
||||
fatal_network_error: false, // upstream truly unreachable
|
||||
manifest_incompatible: false, // codec missing — separate from network reachability
|
||||
hls_error_details: ""
|
||||
}};
|
||||
const v = document.getElementById('v');
|
||||
const url = {m3u8_url!r};
|
||||
function start() {{
|
||||
if (window.Hls && Hls.isSupported()) {{
|
||||
const hls = new Hls({{enableWorker: true}});
|
||||
hls.on(Hls.Events.MANIFEST_PARSED, () => {{ window._verifier.manifest_parsed = true; }});
|
||||
hls.on(Hls.Events.FRAG_LOADED, () => {{ window._verifier.frag_loaded = true; }});
|
||||
hls.on(Hls.Events.BUFFER_APPENDED, () => {{ window._verifier.media_loaded = true; }});
|
||||
hls.on(Hls.Events.ERROR, (_, d) => {{
|
||||
window._verifier.hls_error_details = d.details || "";
|
||||
if (d.fatal && d.type === Hls.ErrorTypes.NETWORK_ERROR) {{
|
||||
window._verifier.fatal_network_error = true;
|
||||
}}
|
||||
if (d.details === Hls.ErrorDetails.MANIFEST_INCOMPATIBLE_CODECS_ERROR) {{
|
||||
window._verifier.manifest_incompatible = true;
|
||||
}}
|
||||
}});
|
||||
hls.loadSource(url);
|
||||
hls.attachMedia(v);
|
||||
}} else if (v.canPlayType('application/vnd.apple.mpegurl')) {{
|
||||
v.src = url;
|
||||
v.addEventListener('loadedmetadata', () => {{ window._verifier.manifest_parsed = true; window._verifier.frag_loaded = true; }});
|
||||
v.addEventListener('error', () => {{ window._verifier.fatal_network_error = true; }});
|
||||
}} else {{
|
||||
window._verifier.hls_error_details = "no hls support";
|
||||
}}
|
||||
}}
|
||||
window.addEventListener('load', start);
|
||||
</script></body></html>"""
|
||||
|
||||
|
||||
def _embed_test_html(_proxied_embed_url: str) -> str:
|
||||
"""No longer used — verifier navigates the page directly to the proxy URL.
|
||||
|
||||
The earlier iframe-wrapper approach hit same-origin policy when inspecting
|
||||
the iframe's contentDocument (the wrapper page was a data: URL, the iframe
|
||||
was http://127.0.0.1:8000), so we couldn't read the embed's DOM.
|
||||
"""
|
||||
return ""
|
||||
|
||||
|
||||
_M3U8_POLL_JS = """
|
||||
() => {
|
||||
const v = window._verifier || {};
|
||||
const vid = document.querySelector('video');
|
||||
return {
|
||||
manifest_parsed: !!v.manifest_parsed,
|
||||
frag_loaded: !!v.frag_loaded,
|
||||
media_loaded: !!v.media_loaded,
|
||||
fatal_network_error: !!v.fatal_network_error,
|
||||
manifest_incompatible: !!v.manifest_incompatible,
|
||||
hls_error_details: v.hls_error_details || "",
|
||||
video_width: vid ? vid.videoWidth : 0,
|
||||
video_ready: vid ? vid.readyState : 0,
|
||||
};
|
||||
}
|
||||
"""
|
||||
|
||||
|
||||
_EMBED_POLL_JS = """
|
||||
() => {
|
||||
try {
|
||||
const vids = document.querySelectorAll('video');
|
||||
if (vids.length > 0) {
|
||||
const v = vids[0];
|
||||
return {
|
||||
has_video: true,
|
||||
src: v.currentSrc || v.src || "",
|
||||
width: v.videoWidth,
|
||||
ready: v.readyState,
|
||||
duration: isFinite(v.duration) ? v.duration : 0,
|
||||
media_keys: !!v.mediaKeys,
|
||||
sources: v.querySelectorAll('source').length,
|
||||
};
|
||||
}
|
||||
return {has_video: false};
|
||||
} catch (e) {
|
||||
return {has_video: false, err: String(e)};
|
||||
}
|
||||
}
|
||||
"""
|
||||
|
||||
|
||||
async def _verify_m3u8(page, m3u8_url: str, deadline: float) -> PlaybackVerdict:
|
||||
"""Confirm an m3u8 URL is fetchable via hls.js end-to-end.
|
||||
|
||||
Positive signal hierarchy:
|
||||
1. media_loaded (MSE buffer appended) — strongest, codec-supported.
|
||||
2. frag_loaded (hls.js fetched at least one segment) — upstream is OK
|
||||
even if the local browser lacks codecs.
|
||||
3. manifest_parsed without media_loaded but with manifest_incompatible
|
||||
— indicates upstream playlist is valid; player can't decode here
|
||||
but a real user's browser will.
|
||||
Negative signal:
|
||||
- fatal_network_error: upstream is unreachable.
|
||||
- timeout with no manifest_parsed: upstream did not respond.
|
||||
"""
|
||||
start = time.monotonic()
|
||||
html = _hls_test_html(m3u8_url)
|
||||
data_url = "data:text/html;base64," + base64.b64encode(html.encode()).decode()
|
||||
|
||||
try:
|
||||
await page.goto(data_url, wait_until="domcontentloaded", timeout=10_000)
|
||||
except Exception as e:
|
||||
return PlaybackVerdict(
|
||||
is_playable=False, error=f"goto failed: {e}",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
|
||||
last_state: dict = {}
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
state = await page.evaluate(_M3U8_POLL_JS)
|
||||
except Exception as e:
|
||||
return PlaybackVerdict(
|
||||
is_playable=False, error=f"evaluate failed: {e}",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
last_state = state
|
||||
if state.get("media_loaded"):
|
||||
return PlaybackVerdict(
|
||||
is_playable=True, signal="media_loaded",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
if state.get("frag_loaded"):
|
||||
return PlaybackVerdict(
|
||||
is_playable=True, signal="frag_loaded",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
# MANIFEST_INCOMPATIBLE_CODECS_ERROR fires after hls.js successfully
|
||||
# fetched and parsed the manifest — the failure is purely local
|
||||
# (chromium lacks H.264). The user's real browser has codecs, so
|
||||
# this URL is playable from the user's perspective.
|
||||
if state.get("manifest_incompatible"):
|
||||
return PlaybackVerdict(
|
||||
is_playable=True, signal="manifest_parsed_codec_missing_in_verifier",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
if state.get("manifest_parsed"):
|
||||
return PlaybackVerdict(
|
||||
is_playable=True, signal="manifest_parsed",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
if state.get("fatal_network_error"):
|
||||
return PlaybackVerdict(
|
||||
is_playable=False, error="upstream network error",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
await asyncio.sleep(0.25)
|
||||
|
||||
err = "no playback signal"
|
||||
if last_state.get("hls_error_details"):
|
||||
err = f"hls.js error: {last_state['hls_error_details']}"
|
||||
return PlaybackVerdict(
|
||||
is_playable=False, error=err,
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
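The polling loop in `_verify_m3u8` checks the state flags in a fixed order, so a positive signal wins over a network error reported in the same poll. The order can be sketched as a pure function that is easy to unit-test. The function name `classify` and its tuple return type are assumptions made for this illustration; the module itself returns `PlaybackVerdict` objects.

```python
from typing import Optional, Tuple


def classify(state: dict) -> Optional[Tuple[bool, str]]:
    """Return (is_playable, reason) once a decisive flag is set, else None.

    The checks run in the same order as the loop in _verify_m3u8.
    """
    for key in ("media_loaded", "frag_loaded"):
        if state.get(key):
            return True, key
    if state.get("manifest_incompatible"):
        # Manifest was fetched and parsed; only the local codec is missing.
        return True, "manifest_parsed_codec_missing_in_verifier"
    if state.get("manifest_parsed"):
        return True, "manifest_parsed"
    if state.get("fatal_network_error"):
        return False, "upstream network error"
    return None  # keep polling


print(classify({"frag_loaded": True, "fatal_network_error": True}))
print(classify({"fatal_network_error": True}))
print(classify({}))
```

Returning `None` for the undecided case leaves the timeout handling to the caller, as in the original loop.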
|
||||
|
||||
|
||||
async def _verify_embed(page, proxied_url: str, deadline: float) -> PlaybackVerdict:
|
||||
"""Navigate directly to the proxied embed and confirm a player rendered.
|
||||
|
||||
Positive signals (in priority order):
- <video> with src/sources/mediaKeys set (player wired up).
- <video> element exists with any state (script ran, player attaching).

A player container div (jwplayer, video-js, [id*=player], etc.) on its own
is not accepted as a signal; an actual <video> element is required.
|
||||
|
||||
Loading the embed page directly (not via iframe wrapper) avoids the
|
||||
same-origin policy that prevented earlier iframe-introspection runs
|
||||
from seeing the embed DOM.
|
||||
"""
|
||||
start = time.monotonic()
|
||||
try:
|
||||
await page.goto(proxied_url, wait_until="domcontentloaded", timeout=15_000)
|
||||
except Exception as e:
|
||||
return PlaybackVerdict(
|
||||
is_playable=False, error=f"goto failed: {e}",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
|
||||
# Track the best state seen across all polls. Some embeds load a player
|
||||
# briefly then anti-bot JS tears the DOM down (hmembeds redirects to
|
||||
# google.com if its devtool-detection trips). We accept any positive
|
||||
# signal observed during the window, even if it's gone by timeout.
|
||||
#
|
||||
# We require an actual <video> element — a "player container div"
|
||||
# is too weak (sportsurge has player-class divs but no real player).
|
||||
seen_video_wired = False
|
||||
seen_video_tag = False
|
||||
last_err = ""
|
||||
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
r = await page.evaluate(_EMBED_POLL_JS)
|
||||
except Exception as e:
|
||||
return PlaybackVerdict(
|
||||
is_playable=False, error=f"evaluate failed: {e}",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
if r.get("has_video"):
|
||||
seen_video_tag = True
|
||||
if r.get("src") or r.get("width", 0) > 0 or r.get("media_keys") or r.get("sources", 0) > 0:
|
||||
seen_video_wired = True
|
||||
return PlaybackVerdict(
|
||||
is_playable=True, signal="video.wired",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000),
|
||||
)
|
||||
last_err = r.get("err", "")
|
||||
await asyncio.sleep(0.5)
|
||||
|
||||
if seen_video_wired:
|
||||
return PlaybackVerdict(is_playable=True, signal="video.wired",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000))
|
||||
if seen_video_tag:
|
||||
return PlaybackVerdict(is_playable=True, signal="video.tag_only",
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000))
|
||||
|
||||
err = "no <video> element rendered"
|
||||
if last_err:
|
||||
err += f"; last_err: {last_err}"
|
||||
return PlaybackVerdict(is_playable=False, error=err,
|
||||
elapsed_ms=int((time.monotonic() - start) * 1000))
|
||||
|
||||
|
||||
class PlaybackVerifier:
|
||||
"""Verifies playability of m3u8 and embed URLs via headless Chromium.
|
||||
|
||||
Manages a single browser instance for the process lifetime (cheap per-page
|
||||
contexts) and bounds concurrency with a semaphore.
|
||||
"""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._browser = None
|
||||
self._playwright = None
|
||||
self._sem = asyncio.Semaphore(MAX_CONCURRENCY)
|
||||
self._lock = asyncio.Lock()
|
||||
|
||||
async def _ensure_browser(self):
|
||||
if self._browser is not None:
|
||||
return self._browser
|
||||
async with self._lock:
|
||||
if self._browser is not None:
|
||||
return self._browser
|
||||
try:
|
||||
from playwright.async_api import async_playwright
|
||||
except ImportError:
|
||||
logger.error("playwright not installed — playback verification disabled")
|
||||
return None
|
||||
self._playwright = await async_playwright().start()
|
||||
ws_base = os.getenv("CHROME_WS_URL")
|
||||
ws_token = os.getenv("CHROME_WS_TOKEN")
|
||||
if ws_base and ws_token:
|
||||
self._browser = await self._playwright.chromium.connect(
|
||||
f"{ws_base.rstrip('/')}/{ws_token}", timeout=15_000,
|
||||
)
|
||||
logger.info("connected to remote chrome-service (concurrency=%d)", MAX_CONCURRENCY)
|
||||
else:
|
||||
self._browser = await self._playwright.chromium.launch(
|
||||
headless=True,
|
||||
args=[
|
||||
"--disable-dev-shm-usage",
|
||||
"--disable-web-security",
|
||||
"--no-sandbox",
|
||||
"--disable-setuid-sandbox",
|
||||
"--disable-features=IsolateOrigins,site-per-process",
|
||||
"--autoplay-policy=no-user-gesture-required",
|
||||
],
|
||||
)
|
||||
logger.warning("CHROME_WS_URL not set — using in-process Chromium (concurrency=%d)", MAX_CONCURRENCY)
|
||||
return self._browser
|
||||
|
||||
async def shutdown(self) -> None:
|
||||
if self._browser is not None:
|
||||
try:
|
||||
await self._browser.close()
|
||||
except Exception:
|
||||
logger.exception("error closing browser")
|
||||
if self._playwright is not None:
|
||||
try:
|
||||
await self._playwright.stop()
|
||||
except Exception:
|
||||
logger.exception("error stopping playwright")
|
||||
self._browser = None
|
||||
self._playwright = None
|
||||
|
||||
async def verify(self, url: str, stream_type: str) -> PlaybackVerdict:
|
||||
if not VERIFY_ENABLED:
|
||||
return PlaybackVerdict(is_playable=True, error="disabled")
|
||||
|
||||
browser = await self._ensure_browser()
|
||||
if browser is None:
|
||||
return PlaybackVerdict(is_playable=False, error="playwright unavailable")
|
||||
|
||||
is_m3u8 = stream_type == "m3u8"
|
||||
if not is_m3u8:
|
||||
url = f"{PROXY_BASE}/embed?url={_b64url(url)}"
|
||||
|
||||
async with self._sem:
|
||||
# Set the per-stream deadline AFTER acquiring the semaphore.
|
||||
# Otherwise queued streams that wait behind earlier ones
|
||||
# would have already-expired deadlines when they start.
|
||||
deadline = time.monotonic() + PER_STREAM_TIMEOUT
|
||||
try:
|
||||
context = await browser.new_context(
|
||||
user_agent=USER_AGENT,
|
||||
viewport={"width": 1280, "height": 720},
|
||||
bypass_csp=True,
|
||||
)
|
||||
from backend.stealth import STEALTH_JS
|
||||
await context.add_init_script(STEALTH_JS)
|
||||
page = await context.new_page()
|
||||
except Exception as e:
|
||||
return PlaybackVerdict(
|
||||
is_playable=False, error=f"context create failed: {e}",
|
||||
)
|
||||
try:
|
||||
if is_m3u8:
|
||||
verdict = await _verify_m3u8(page, url, deadline)
|
||||
else:
|
||||
verdict = await _verify_embed(page, url, deadline)
|
||||
except asyncio.TimeoutError:
|
||||
verdict = PlaybackVerdict(is_playable=False, error="overall timeout")
|
||||
except Exception as e:
|
||||
verdict = PlaybackVerdict(
|
||||
is_playable=False, error=f"verify exception: {e}",
|
||||
)
|
||||
finally:
|
||||
try:
|
||||
await page.close()
|
||||
await context.close()
|
||||
except Exception:
|
||||
pass
|
||||
logger.info(
|
||||
"[verify] %s -> playable=%s signal=%s err=%s elapsed=%dms",
|
||||
url[:120], verdict.is_playable, verdict.signal,
|
||||
verdict.error, verdict.elapsed_ms,
|
||||
)
|
||||
return verdict
|
||||
|
||||
async def verify_many(self, items: list[tuple[str, str]]) -> dict[str, PlaybackVerdict]:
|
||||
if not items:
|
||||
return {}
|
||||
if not VERIFY_ENABLED:
|
||||
return {url: PlaybackVerdict(is_playable=True, error="disabled") for url, _ in items}
|
||||
|
||||
async def _run(url: str, stream_type: str):
|
||||
verdict = await self.verify(url, stream_type)
|
||||
return url, verdict
|
||||
|
||||
results = await asyncio.gather(
|
||||
*[_run(url, st) for url, st in items], return_exceptions=True
|
||||
)
|
||||
out: dict[str, PlaybackVerdict] = {}
|
||||
for r in results:
|
||||
if isinstance(r, BaseException):
logger.error("verify task crashed: %s", r, exc_info=r)
|
||||
continue
|
||||
url, verdict = r
|
||||
out[url] = verdict
|
||||
return out
|
||||
|
|
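The comment in `verify` about starting the per-stream deadline only after the semaphore is acquired can be illustrated with a minimal sketch. The names `bounded_check` and `PER_TASK_TIMEOUT`, and the sleep-based workload, are hypothetical stand-ins and not the verifier's actual code.

```python
import asyncio
import time

PER_TASK_TIMEOUT = 0.5  # hypothetical per-task budget, in seconds


async def bounded_check(sem: asyncio.Semaphore, work_seconds: float) -> float:
    """Return the budget left once the simulated work has finished."""
    async with sem:
        # The clock starts only once a slot is held, so time spent queued
        # behind other tasks does not eat into this task's budget.
        deadline = time.monotonic() + PER_TASK_TIMEOUT
        await asyncio.sleep(work_seconds)
        return deadline - time.monotonic()


async def run_demo() -> list[float]:
    sem = asyncio.Semaphore(1)  # a single slot forces the tasks to queue
    return await asyncio.gather(*(bounded_check(sem, 0.2) for _ in range(3)))


remaining = asyncio.run(run_demo())
```

With one slot and three tasks, the last task waits roughly 0.4 s before it starts. Had the deadline been computed before `async with sem`, that task would already be past its 0.5 s budget by the time its 0.2 s of work completed.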
@ -3,3 +3,4 @@ uvicorn[standard]
|
|||
httpx>=0.27.0
|
||||
apscheduler>=3.10.0,<4.0
|
||||
pydantic>=2.0.0
|
||||
playwright==1.48.0
|
||||
|
|
|
|||
43
stacks/f1-stream/files/backend/stealth.py
Normal file
|
|
@ -0,0 +1,43 @@
"""Vendored Playwright stealth init script.

Mirror of `stacks/chrome-service/files/stealth.js`. Kept in sync by hand
— update both files together if the JS is changed.
"""

STEALTH_JS = r"""
(() => {
  Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => undefined });
  if (!window.chrome) window.chrome = {};
  window.chrome.runtime = window.chrome.runtime || {};
  Object.defineProperty(navigator, 'plugins', {
    get: () => [{ name: 'Chrome PDF Plugin' }, { name: 'Chrome PDF Viewer' }, { name: 'Native Client' }],
  });
  Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
  const origQuery = window.navigator.permissions && window.navigator.permissions.query;
  if (origQuery) {
    window.navigator.permissions.query = (parameters) =>
      parameters && parameters.name === 'notifications'
        ? Promise.resolve({ state: Notification.permission })
        : origQuery(parameters);
  }
  const spoofGl = (proto) => {
    if (!proto) return;
    const orig = proto.getParameter;
    proto.getParameter = function (parameter) {
      if (parameter === 37445) return 'Intel Inc.';
      if (parameter === 37446) return 'Intel Iris OpenGL Engine';
      return orig.apply(this, arguments);
    };
  };
  spoofGl(window.WebGLRenderingContext && window.WebGLRenderingContext.prototype);
  spoofGl(window.WebGL2RenderingContext && window.WebGL2RenderingContext.prototype);
  // disable-devtool.js auto-init evasion: hide the marker attribute so the
  // library's IIFE exits early. Without this, hmembeds-class players redirect
  // to google.com when the Performance detector trips under Playwright.
  const origQS = Document.prototype.querySelector;
  Document.prototype.querySelector = function (sel) {
    if (typeof sel === 'string' && sel.indexOf('disable-devtool-auto') !== -1) return null;
    return origQS.apply(this, arguments);
  };
})();
"""
|
|
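Since the docstring says the Python literal and `stealth.js` are kept in sync by hand, a small drift check could back that up. The sketch below is only one possible approach: `normalize` and `scripts_match` are hypothetical helpers, and the short strings stand in for `STEALTH_JS` and the contents of the JS file.

```python
def normalize(script: str) -> list[str]:
    """Strip indentation and blank lines so formatting-only drift is ignored."""
    return [line.strip() for line in script.splitlines() if line.strip()]


def scripts_match(vendored: str, upstream: str) -> bool:
    """True when both copies contain the same non-blank lines in the same order."""
    return normalize(vendored) == normalize(upstream)


# Tiny stand-ins for STEALTH_JS and the text of stealth.js.
vendored_copy = """
(() => {
  Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
})();
"""
upstream_copy = (
    "(() => {\n"
    "Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });\n"
    "})();\n"
)
drifted_copy = upstream_copy.replace("en-US", "en-GB")
```

In a real test the second argument would presumably be read from `stacks/chrome-service/files/stealth.js`; how that path is resolved from the test runner is left open here.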
@ -44,6 +44,20 @@ export function getProxyUrl(m3u8Url) {
|
|||
return `${API_BASE}/proxy?url=${encoded}`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get the embed-proxy URL for an upstream iframe embed page.
|
||||
*
|
||||
* The proxy strips X-Frame-Options / CSP frame-ancestors and injects a
|
||||
* frame-buster-defeat script so the embed renders inside our iframe even
|
||||
* when the upstream tries to block it.
|
||||
* @param {string} embedUrl - The original embed page URL
|
||||
* @returns {string} URL pointing at our /embed proxy
|
||||
*/
|
||||
export function getEmbedProxyUrl(embedUrl) {
|
||||
const encoded = toBase64Url(embedUrl);
|
||||
return `${API_BASE}/embed?url=${encoded}`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Mark a stream as actively being watched (enables token refresh).
|
||||
* @param {string} url - The stream URL
|
||||
|
|
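Both the frontend `getEmbedProxyUrl` and the verifier's `_b64url` build an `/embed?url=<token>` URL, but neither encoder is shown in this diff. The sketch below assumes URL-safe base64 with the padding stripped; the function names and the `/api` base are hypothetical.

```python
import base64


def b64url_encode(url: str) -> str:
    """URL-safe base64 without '=' padding (assumed encoding)."""
    raw = base64.urlsafe_b64encode(url.encode("utf-8"))
    return raw.rstrip(b"=").decode("ascii")


def b64url_decode(token: str) -> str:
    """Restore the padding that b64url_encode stripped, then decode."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


def embed_proxy_url(api_base: str, embed_url: str) -> str:
    """Build the proxy URL for an upstream embed page."""
    return f"{api_base}/embed?url={b64url_encode(embed_url)}"


example = embed_proxy_url("/api", "https://example.com/player?id=1&x=y")
```

Encoding the upstream URL this way keeps its own `?` and `&` characters from being parsed as part of the proxy's query string.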
|
|||
|
|
@ -1,5 +1,5 @@
|
|||
<script>
|
||||
-import { fetchStreams, fetchSchedule, getProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
+import { fetchStreams, fetchSchedule, getProxyUrl, getEmbedProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
|
||||
import { onMount, onDestroy } from 'svelte';
|
||||
import { page } from '$app/state';
|
||||
|
||||
|
|
@ -107,12 +107,14 @@
|
|||
}
|
||||
|
||||
if (stream.stream_type === 'embed') {
|
||||
-// Embed/iframe player — no hls.js needed
+// Embed/iframe player — route through our /embed proxy so the
+// upstream's X-Frame-Options / CSP / JS frame-busters can't
+// block the iframe.
|
||||
const newPlayer = {
|
||||
id: Date.now(),
|
||||
proxyUrl: '',
|
||||
originalUrl: stream.embed_url,
|
||||
-embedUrl: stream.embed_url,
+embedUrl: getEmbedProxyUrl(stream.embed_url),
|
||||
streamType: 'embed',
|
||||
siteKey: stream.site_key || '',
|
||||
siteName: stream.site_name || stream.site_key || 'Unknown',
|
||||
|
|
@ -173,9 +175,13 @@
|
|||
if (!player || !player.videoEl) return;
|
||||
|
||||
if (Hls.isSupported()) {
|
||||
// `lowLatencyMode` previously broke playback on regular (non-LL-HLS)
|
||||
// providers like RallyTV — they don't ship the LL-HLS extensions
|
||||
// hls.js needs in that mode. Default off; explicit per-stream flag
|
||||
// can re-enable later.
|
||||
const hlsInstance = new Hls({
|
||||
enableWorker: true,
|
||||
-lowLatencyMode: true,
+lowLatencyMode: false,
|
||||
backBufferLength: 90
|
||||
});
|
||||
|
||||
|
|
|
|||
|
|
@ -11,7 +11,8 @@ resource "kubernetes_namespace" "f1-stream" {
|
|||
name = "f1-stream"
|
||||
labels = {
|
||||
"istio-injection" : "disabled"
|
||||
tier = local.tiers.aux
|
||||
tier = local.tiers.aux
|
||||
"chrome-service.viktorbarzin.me/client" = "true"
|
||||
}
|
||||
}
|
||||
lifecycle {
|
||||
|
|
@ -47,6 +48,35 @@ resource "kubernetes_manifest" "external_secret" {
|
|||
depends_on = [kubernetes_namespace.f1-stream]
|
||||
}
|
||||
|
||||
# Pull the chrome-service bearer token into this namespace as a separate
|
||||
# Secret so the verifier can reach the in-cluster Playwright pool.
|
||||
resource "kubernetes_manifest" "chrome_service_client_secret" {
|
||||
manifest = {
|
||||
apiVersion = "external-secrets.io/v1beta1"
|
||||
kind = "ExternalSecret"
|
||||
metadata = {
|
||||
name = "chrome-service-client-secrets"
|
||||
namespace = "f1-stream"
|
||||
}
|
||||
spec = {
|
||||
refreshInterval = "15m"
|
||||
secretStoreRef = {
|
||||
name = "vault-kv"
|
||||
kind = "ClusterSecretStore"
|
||||
}
|
||||
target = {
|
||||
name = "chrome-service-client-secrets"
|
||||
}
|
||||
dataFrom = [{
|
||||
extract = {
|
||||
key = "chrome-service"
|
||||
}
|
||||
}]
|
||||
}
|
||||
}
|
||||
depends_on = [kubernetes_namespace.f1-stream]
|
||||
}
|
||||
|
||||
resource "kubernetes_persistent_volume_claim" "data_proxmox" {
|
||||
wait_until_bound = false
|
||||
metadata {
|
||||
|
|
@ -104,11 +134,11 @@ resource "kubernetes_deployment" "f1-stream" {
|
|||
name = "f1-stream"
|
||||
resources {
|
||||
limits = {
|
||||
-memory = "256Mi"
+memory = "1Gi"
|
||||
}
|
||||
requests = {
|
||||
-cpu = "25m"
-memory = "256Mi"
+cpu = "100m"
+memory = "1Gi"
|
||||
}
|
||||
}
|
||||
port {
|
||||
|
|
@ -127,6 +157,29 @@ resource "kubernetes_deployment" "f1-stream" {
|
|||
name = "DISCORD_CHANNELS"
|
||||
value = var.discord_f1_channel_ids
|
||||
}
|
||||
# Verifier connects to in-cluster headed Chromium pool — see
|
||||
# stacks/chrome-service/. Falls back to in-process headless if unset.
|
||||
env {
|
||||
name = "CHROME_WS_URL"
|
||||
value = "ws://chrome-service.chrome-service.svc.cluster.local:3000"
|
||||
}
|
||||
env {
|
||||
name = "CHROME_WS_TOKEN"
|
||||
value_from {
|
||||
secret_key_ref {
|
||||
name = "chrome-service-client-secrets"
|
||||
key = "api_bearer_token"
|
||||
}
|
||||
}
|
||||
}
|
||||
# The embed proxy (this pod's /embed?url=…) must be reachable from
|
||||
# the remote chrome-service pod. Default 127.0.0.1 only works for
|
||||
# in-process Chromium — for the remote browser we point it at our
|
||||
# own ClusterIP service.
|
||||
env {
|
||||
name = "PLAYBACK_VERIFY_PROXY_BASE"
|
||||
value = "http://f1.f1-stream.svc.cluster.local"
|
||||
}
|
||||
volume_mount {
|
||||
name = "data"
|
||||
mount_path = "/data"
|
||||
|
|
|
|||
|
|
@ -8,7 +8,11 @@ variable "postgresql_host" { type = string }
|
|||
|
||||
locals {
|
||||
namespace = "fire-planner"
|
||||
-image = "registry.viktorbarzin.me/fire-planner:${var.image_tag}"
+# Phase 3 cutover 2026-05-07. NOTE: the registry-private repo for
+# fire-planner has 0 tags — first build via Woodpecker on the new Forgejo
+# repo (viktor/fire-planner, Dockerfile + .woodpecker.yml added 2026-05-07)
+# must succeed BEFORE the next pod restart, otherwise pulls will 404.
+image = "forgejo.viktorbarzin.me/viktor/fire-planner:${var.image_tag}"
|
||||
labels = {
|
||||
app = "fire-planner"
|
||||
}
|
||||
|
|
|
|||
123
stacks/forgejo/cleanup.tf
Normal file
|
|
@ -0,0 +1,123 @@
|
|||
# Forgejo container-package retention CronJob.
|
||||
#
|
||||
# Forgejo's per-package "Cleanup Rules" UI is not exposed via Terraform —
|
||||
# it's per-user runtime state inside the Forgejo DB. Driving retention from
|
||||
# a CronJob hitting the public API keeps the policy versioned in this repo.
|
||||
#
|
||||
# Auth: a write:package PAT belonging to ci-pusher (same user that pushes
|
||||
# from CI). DELETE on packages requires write:package scope. PAT lives in
|
||||
# Vault at secret/viktor/forgejo_cleanup_token.
|
||||
|
||||
data "vault_kv_secret_v2" "forgejo_viktor" {
|
||||
mount = "secret"
|
||||
name = "viktor"
|
||||
}
|
||||
|
||||
locals {
|
||||
# Flip to false after first 7 days of dry-run logs look correct.
|
||||
forgejo_cleanup_dry_run = true
|
||||
}
|
||||
|
||||
resource "kubernetes_config_map" "forgejo_cleanup_script" {
|
||||
metadata {
|
||||
name = "forgejo-cleanup-script"
|
||||
namespace = kubernetes_namespace.forgejo.metadata[0].name
|
||||
}
|
||||
data = {
|
||||
"cleanup.sh" = file("${path.module}/files/cleanup.sh")
|
||||
}
|
||||
}
|
||||
|
||||
resource "kubernetes_secret" "forgejo_cleanup_token" {
|
||||
metadata {
|
||||
name = "forgejo-cleanup-token"
|
||||
namespace = kubernetes_namespace.forgejo.metadata[0].name
|
||||
}
|
||||
type = "Opaque"
|
||||
data = {
|
||||
# try() so the apply succeeds before the Vault key is populated during
|
||||
# Phase 0 bootstrap (see docs/runbooks/forgejo-registry-setup.md). Empty
|
||||
# token causes the cleanup CronJob to fail visibly — that's intended.
|
||||
FORGEJO_TOKEN = try(data.vault_kv_secret_v2.forgejo_viktor.data["forgejo_cleanup_token"], "")
|
||||
}
|
||||
}
|
||||
|
||||
resource "kubernetes_cron_job_v1" "forgejo_cleanup" {
|
||||
metadata {
|
||||
name = "forgejo-cleanup"
|
||||
namespace = kubernetes_namespace.forgejo.metadata[0].name
|
||||
}
|
||||
spec {
|
||||
concurrency_policy = "Forbid"
|
||||
schedule = "0 4 * * *"
|
||||
failed_jobs_history_limit = 3
|
||||
successful_jobs_history_limit = 3
|
||||
job_template {
|
||||
metadata {}
|
||||
spec {
|
||||
backoff_limit = 1
|
||||
ttl_seconds_after_finished = 3600
|
||||
template {
|
||||
metadata {}
|
||||
spec {
|
||||
container {
|
||||
name = "cleanup"
|
||||
image = "docker.io/library/alpine:3.20"
|
||||
command = ["/bin/sh", "/scripts/cleanup.sh"]
|
||||
env {
|
||||
name = "FORGEJO_TOKEN"
|
||||
value_from {
|
||||
secret_key_ref {
|
||||
name = kubernetes_secret.forgejo_cleanup_token.metadata[0].name
|
||||
key = "FORGEJO_TOKEN"
|
||||
}
|
||||
}
|
||||
}
|
||||
env {
|
||||
name = "FORGEJO_HOST"
|
||||
value = "http://forgejo.forgejo.svc.cluster.local"
|
||||
}
|
||||
env {
|
||||
name = "FORGEJO_OWNER"
|
||||
value = "viktor"
|
||||
}
|
||||
env {
|
||||
name = "KEEP_LAST_N"
|
||||
value = "10"
|
||||
}
|
||||
env {
|
||||
name = "DRY_RUN"
|
||||
value = local.forgejo_cleanup_dry_run ? "true" : "false"
|
||||
}
|
||||
volume_mount {
|
||||
name = "scripts"
|
||||
mount_path = "/scripts"
|
||||
}
|
||||
resources {
|
||||
requests = {
|
||||
cpu = "10m"
|
||||
memory = "32Mi"
|
||||
}
|
||||
limits = {
|
||||
memory = "96Mi"
|
||||
}
|
||||
}
|
||||
}
|
||||
volume {
|
||||
name = "scripts"
|
||||
config_map {
|
||||
name = kubernetes_config_map.forgejo_cleanup_script.metadata[0].name
|
||||
default_mode = "0755"
|
||||
}
|
||||
}
|
||||
restart_policy = "OnFailure"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
lifecycle {
|
||||
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
||||
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
|
||||
}
|
||||
}
|
||||
109
stacks/forgejo/files/cleanup.sh
Normal file
|
|
@ -0,0 +1,109 @@
#!/bin/sh
# Forgejo container-package retention.
#
# For each container package owned by ${FORGEJO_OWNER}, keep newest
# ${KEEP_LAST_N} versions + always keep tag "latest". Deletes the rest via
# DELETE /api/v1/packages/{owner}/container/{name}/{version}.
#
# DRY_RUN=true logs what would be deleted but issues no DELETE calls.
#
# Required env:
#   FORGEJO_HOST   e.g. http://forgejo.forgejo.svc.cluster.local
#   FORGEJO_OWNER  e.g. viktor
#   FORGEJO_USER   PAT owner (write:package scope); not read by this script
#   FORGEJO_TOKEN  PAT
#   KEEP_LAST_N    integer (default 10)
#   DRY_RUN        true|false (default true)

set -eu

apk add --no-cache curl jq >/dev/null

OWNER="${FORGEJO_OWNER}"
KEEP="${KEEP_LAST_N:-10}"
DRY="${DRY_RUN:-true}"
BASE="${FORGEJO_HOST%/}/api/v1"

AUTH_HEADER="Authorization: token $FORGEJO_TOKEN"

echo "Forgejo cleanup: owner=$OWNER keep_last=$KEEP dry_run=$DRY"
echo "API base: $BASE"

# Page through ALL container packages.
TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT
ALL="$TMPDIR/all.json"
echo "[]" > "$ALL"

PAGE=1
while :; do
  RESP=$(curl -sf -H "$AUTH_HEADER" \
    "$BASE/packages/$OWNER?type=container&limit=50&page=$PAGE")
  COUNT=$(echo "$RESP" | jq 'length')
  if [ "$COUNT" = "0" ]; then break; fi
  echo "$RESP" | jq -s '.[0] + .[1]' "$ALL" /dev/stdin > "$TMPDIR/merged.json"
  mv "$TMPDIR/merged.json" "$ALL"
  PAGE=$((PAGE + 1))
  # Safety: never run away.
  if [ "$PAGE" -gt 100 ]; then break; fi
done

TOTAL=$(jq 'length' "$ALL")
echo "Found $TOTAL package version(s)."

if [ "$TOTAL" = "0" ]; then
  echo "Nothing to do."
  exit 0
fi

# Group by name and process each group.
NAMES=$(jq -r '.[].name' "$ALL" | sort -u)

DEL=0
KEPT=0

for NAME in $NAMES; do
  # All versions of this name, sorted by created_at descending.
  jq --arg n "$NAME" '
    [.[] | select(.name == $n)]
    | sort_by(.created_at) | reverse
  ' "$ALL" > "$TMPDIR/$NAME.json"

  N_VERSIONS=$(jq 'length' "$TMPDIR/$NAME.json")
  echo "[$NAME] $N_VERSIONS version(s)"

  # Build the keep set: top $KEEP + anything tagged 'latest'.
  jq -r --argjson keep "$KEEP" '
    [.[0:$keep][].version] + [.[] | select(.version == "latest") | .version]
    | unique
    | .[]
  ' "$TMPDIR/$NAME.json" > "$TMPDIR/$NAME.keep"

  # Build the delete set.
  jq -r '.[].version' "$TMPDIR/$NAME.json" \
    | grep -vxFf "$TMPDIR/$NAME.keep" > "$TMPDIR/$NAME.delete" || true

  D_COUNT=$(wc -l < "$TMPDIR/$NAME.delete" | tr -d ' ')
  K_COUNT=$(wc -l < "$TMPDIR/$NAME.keep" | tr -d ' ')
  echo "  keep=$K_COUNT delete=$D_COUNT"
  KEPT=$((KEPT + K_COUNT))

  while IFS= read -r VER; do
    [ -z "$VER" ] && continue
    URL="$BASE/packages/$OWNER/container/$NAME/$VER"
    if [ "$DRY" = "true" ]; then
      echo "  DRY_RUN would DELETE $URL"
    else
      HTTP=$(curl -s -o /dev/null -w '%{http_code}' \
        -X DELETE -H "$AUTH_HEADER" "$URL" || echo "000")
      if [ "$HTTP" = "204" ] || [ "$HTTP" = "200" ]; then
        echo "  deleted $NAME:$VER"
      else
        echo "  FAIL $NAME:$VER HTTP $HTTP"
      fi
    fi
    DEL=$((DEL + 1))
  done < "$TMPDIR/$NAME.delete"
done

echo "Summary: kept=$KEPT to_delete=$DEL dry_run=$DRY"
|
|
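The retention rule that `cleanup.sh` applies per package (keep the newest `KEEP_LAST_N` versions plus anything tagged `latest`, delete the rest) can be restated as a small sketch. `PackageVersion` and `plan_retention` are hypothetical names, and the sample data is made up; only the rule itself comes from the script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageVersion:
    version: str
    created_at: str  # ISO-8601 string; assumed to share one format so it sorts lexically


def plan_retention(
    versions: list[PackageVersion], keep_last_n: int
) -> tuple[set[str], list[str]]:
    """Return (keep set, delete list) for one package name."""
    newest_first = sorted(versions, key=lambda v: v.created_at, reverse=True)
    keep = {v.version for v in newest_first[:keep_last_n]}
    # "latest" is always kept, however old it is.
    keep |= {v.version for v in versions if v.version == "latest"}
    delete = [v.version for v in newest_first if v.version not in keep]
    return keep, delete


sample = [
    PackageVersion("latest", "2026-01-01T00:00:00Z"),
    PackageVersion("aaaa1111", "2026-01-02T00:00:00Z"),
    PackageVersion("bbbb2222", "2026-01-03T00:00:00Z"),
    PackageVersion("cccc3333", "2026-01-04T00:00:00Z"),
]
keep, delete = plan_retention(sample, keep_last_n=2)
```

Here `latest` is the oldest entry and falls outside the newest two, yet it stays in the keep set, so only `aaaa1111` is scheduled for deletion.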
@ -32,7 +32,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
|
|||
annotations = {
|
||||
"resize.topolvm.io/threshold" = "80%"
|
||||
"resize.topolvm.io/increase" = "50%"
|
||||
-"resize.topolvm.io/storage_limit" = "20Gi"
+"resize.topolvm.io/storage_limit" = "50Gi"
|
||||
}
|
||||
}
|
||||
spec {
|
||||
|
|
@ -40,7 +40,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
|
|||
storage_class_name = "proxmox-lvm-encrypted"
|
||||
resources {
|
||||
requests = {
|
||||
-storage = "5Gi"
+storage = "15Gi"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -72,6 +72,14 @@ resource "kubernetes_deployment" "forgejo" {
|
|||
}
|
||||
}
|
||||
spec {
|
||||
# fsGroup chowns the mounted PVC to GID 1000 (the forgejo user) on
|
||||
# mount. Without this, /data is owned by root and the
|
||||
# `[packages].CHUNKED_UPLOAD_PATH` default at /data/tmp is not
|
||||
# writable, crashlooping the pod when packages is enabled. Pre-23-day
|
||||
# Forgejo ran without packages on so this never surfaced.
|
||||
security_context {
|
||||
fs_group = 1000
|
||||
}
|
||||
container {
|
||||
name = "forgejo"
|
||||
image = "codeberg.org/forgejo/forgejo:11"
|
||||
|
|
@ -101,10 +109,30 @@ resource "kubernetes_deployment" "forgejo" {
|
|||
name = "FORGEJO__openid__ENABLE_OPENID_SIGNIN"
|
||||
value = "false"
|
||||
}
|
||||
-# Allow webhook delivery to internal k8s services
+# Allow webhook delivery to internal k8s services AND to the public
+# ingress hostnames Forgejo's own webhooks point to (ci.viktorbarzin.me
+# for Woodpecker pipelines).
|
||||
env {
|
||||
name = "FORGEJO__webhook__ALLOWED_HOST_LIST"
|
||||
-value = "*.svc.cluster.local"
+value = "*.svc.cluster.local,ci.viktorbarzin.me,*.viktorbarzin.me"
|
||||
}
|
||||
# Default DELIVER_TIMEOUT is 5s — too tight for the Cloudflare-tunnel
|
||||
# round-trip on first request after pod restart (cold TLS handshake
|
||||
# can hit 6-8s). 30s comfortably covers retries.
|
||||
env {
|
||||
name = "FORGEJO__webhook__DELIVER_TIMEOUT"
|
||||
value = "30"
|
||||
}
|
||||
# OCI registry (container packages). Default-on in Forgejo v11 but
|
||||
# explicit so it can't be silently disabled by an upstream config
|
||||
# change. CHUNKED_UPLOAD_PATH defaults to `data/tmp/package-upload`
|
||||
# under Forgejo's AppDataPath (resolves to a writable subdir of
|
||||
# /data/gitea/) — overriding to /data/tmp directly hits a perms
|
||||
# issue because /data is the volume mount root and is not chowned
|
||||
# to the forgejo user.
|
||||
env {
|
||||
name = "FORGEJO__packages__ENABLED"
|
||||
value = "true"
|
||||
}
|
||||
volume_mount {
|
||||
name = "data"
|
||||
|
|
@ -113,10 +141,10 @@ resource "kubernetes_deployment" "forgejo" {
|
|||
resources {
|
||||
requests = {
|
||||
cpu = "15m"
|
||||
-memory = "384Mi"
+memory = "1Gi"
|
||||
}
|
||||
limits = {
|
||||
-memory = "384Mi"
+memory = "1Gi"
|
||||
}
|
||||
}
|
||||
port {
|
||||
|
|
@ -165,6 +193,9 @@ module "ingress" {
|
|||
namespace = kubernetes_namespace.forgejo.metadata[0].name
|
||||
name = "forgejo"
|
||||
tls_secret_name = var.tls_secret_name
|
||||
# OCI registry pushes ship full image layer blobs in one request; default
|
||||
# Traefik buffering chokes on anything past a few hundred MB.
|
||||
max_body_size = "5g"
|
||||
extra_annotations = {
|
||||
"gethomepage.dev/enabled" = "true"
|
||||
"gethomepage.dev/name" = "Forgejo"
|
||||
|
|
|
|||
|
|
@ -105,7 +105,8 @@ resource "kubernetes_deployment" "freedify" {
|
|||
name = "registry-credentials"
|
||||
}
|
||||
container {
|
||||
-image = "registry.viktorbarzin.me/freedify:${var.tag}"
+# Phase 3 cutover 2026-05-07 — Forgejo registry consolidation.
+image = "forgejo.viktorbarzin.me/viktor/freedify:${var.tag}"
|
||||
name = "freedify"
|
||||
|
||||
port {
|
||||
|
|
|
|||
|
|
@ -75,13 +75,13 @@ module "k8s-node-template" {
|
|||
mkdir -p /etc/containerd/certs.d/ghcr.io
|
||||
printf 'server = "https://ghcr.io"\n\n[host."http://10.0.20.10:5010"]\n capabilities = ["pull", "resolve"]\n\n[host."https://ghcr.io"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/ghcr.io/hosts.toml
|
||||
|
||||
-# Create hosts.toml for private registry — both IP and hostname entries
-# IP-based (10.0.20.10:5050): direct access, skip TLS verify (wildcard cert, no IP SAN)
-mkdir -p /etc/containerd/certs.d/10.0.20.10:5050
-printf 'server = "https://10.0.20.10:5050"\n\n[host."https://10.0.20.10:5050"]\n capabilities = ["pull", "resolve", "push"]\n skip_verify = true\n' > /etc/containerd/certs.d/10.0.20.10:5050/hosts.toml
-# Hostname-based (registry.viktorbarzin.me): redirects to LAN IP to avoid Traefik round-trip
-mkdir -p /etc/containerd/certs.d/registry.viktorbarzin.me
-printf 'server = "https://registry.viktorbarzin.me"\n\n[host."https://10.0.20.10:5050"]\n capabilities = ["pull", "resolve", "push"]\n skip_verify = true\n' > /etc/containerd/certs.d/registry.viktorbarzin.me/hosts.toml
+# Forgejo OCI registry: redirect to in-cluster Traefik LB (10.0.20.200) so
+# pulls don't hairpin out through the WAN gateway. Traefik serves the
+# *.viktorbarzin.me wildcard so SNI verification still passes.
+# registry.viktorbarzin.me / 10.0.20.10:5050 entries removed in Phase 4 of
+# the forgejo-registry-consolidation 2026-05-07 — registry-private is gone.
+mkdir -p /etc/containerd/certs.d/forgejo.viktorbarzin.me
+printf 'server = "https://forgejo.viktorbarzin.me"\n\n[host."https://10.0.20.200"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
|
||||
|
||||
# Low-traffic registries (registry.k8s.io, quay.io, reg.kyverno.io) pull directly.
|
||||
# Pull-through cache removed: caused corrupted images (truncated downloads)
|
||||
|
|
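The `printf` one-liner above packs a whole containerd `hosts.toml` into a single escaped string, which is hard to read in a diff. As a rough illustration, the same file content can be rendered by a small helper; `render_hosts_toml` is a hypothetical function, and the two-space indent before `capabilities` is an assumption because whitespace was collapsed in this view.

```python
def render_hosts_toml(server: str, mirror: str, capabilities: list[str]) -> str:
    """Render a containerd hosts.toml that redirects `server` pulls to `mirror`."""
    caps = ", ".join(f'"{c}"' for c in capabilities)
    return (
        f'server = "{server}"\n\n'
        f'[host."{mirror}"]\n'
        f"  capabilities = [{caps}]\n"
    )


forgejo_hosts = render_hosts_toml(
    server="https://forgejo.viktorbarzin.me",
    mirror="https://10.0.20.200",
    capabilities=["pull", "resolve"],
)
print(forgejo_hosts)
```

Unlike the removed `10.0.20.10:5050` entries, the rendered file carries neither `push` nor `skip_verify`, which matches the new line in the hunk.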
|
|||
|
|
@ -8,7 +8,8 @@ variable "postgresql_host" { type = string }
|
|||
|
||||
locals {
|
||||
namespace = "job-hunter"
|
||||
-image = "registry.viktorbarzin.me/job-hunter:${var.image_tag}"
+# Phase 3 cutover 2026-05-07 — see infra/docs/plans/2026-05-07-forgejo-registry-consolidation-plan.md.
+image = "forgejo.viktorbarzin.me/viktor/job-hunter:${var.image_tag}"
|
||||
labels = {
|
||||
app = "job-hunter"
|
||||
}
|
||||
|
|
|
|||
8
stacks/kms/.terraform.lock.hcl
generated
|
|
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
|
|||
]
|
||||
}
|
||||
|
||||
provider "registry.terraform.io/goauthentik/authentik" {
|
||||
version = "2024.12.1"
|
||||
constraints = "~> 2024.10"
|
||||
hashes = [
|
||||
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
|
||||
]
|
||||
}
|
||||
|
||||
provider "registry.terraform.io/hashicorp/helm" {
|
||||
version = "3.1.1"
|
||||
hashes = [
|
||||
|
|
|
|||
|
|
@ -24,16 +24,6 @@ module "tls_secret" {
|
|||
tls_secret_name = var.tls_secret_name
|
||||
}
|
||||
|
||||
resource "kubernetes_config_map" "kms-web-page" {
|
||||
metadata {
|
||||
name = "kms-web-page-config"
|
||||
namespace = kubernetes_namespace.kms.metadata[0].name
|
||||
}
|
||||
data = {
|
||||
"index.html" = var.index_html
|
||||
}
|
||||
}
|
||||
|
||||
resource "kubernetes_deployment" "kms-web-page" {
|
||||
metadata {
|
||||
name = "kms-web-page"
|
||||
|
|
@ -59,8 +49,11 @@ resource "kubernetes_deployment" "kms-web-page" {
|
|||
}
|
||||
}
|
||||
spec {
|
||||
image_pull_secrets {
|
||||
name = "registry-credentials"
|
||||
}
|
||||
container {
|
||||
-image = "nginx"
+image = "forgejo.viktorbarzin.me/viktor/kms-website:${var.image_tag}"
|
||||
name = "kms-web-page"
|
||||
image_pull_policy = "IfNotPresent"
|
||||
resources {
|
||||
|
|
@ -76,29 +69,17 @@ resource "kubernetes_deployment" "kms-web-page" {
|
|||
container_port = 80
|
||||
protocol = "TCP"
|
||||
}
|
||||
volume_mount {
|
||||
name = "config"
|
||||
mount_path = "/usr/share/nginx/html/"
|
||||
}
|
||||
}
|
||||
|
||||
volume {
|
||||
name = "config"
|
||||
config_map {
|
||||
name = "kms-web-page-config"
|
||||
items {
|
||||
key = "index.html"
|
||||
path = "index.html"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
depends_on = [kubernetes_config_map.kms-web-page]
|
||||
lifecycle {
|
||||
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
||||
ignore_changes = [spec[0].template[0].spec[0].dns_config]
|
||||
ignore_changes = [
|
||||
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
||||
spec[0].template[0].spec[0].dns_config,
|
||||
# CI (Woodpecker) manages the live image tag via `kubectl set image`
|
||||
spec[0].template[0].spec[0].container[0].image,
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
|
|
|
|||
|
|
@ -1,68 +1,5 @@
|
|||
variable "index_html" {
|
||||
|
||||
default = <<EOT
|
||||
<h1>How to activate windows</h1>
|
||||
Open the following link and find a key for you version of windows: </br>
|
||||
<b><a href="https://goo.gl/BcrPjW" target="_blank">https://goo.gl/BcrPjW</a></b>
|
||||
</br>
|
||||
</br>
|
||||
Open cmd as <b>Administrator</b> and run the following: </br>
|
||||
</br>
|
||||
<b>slmgr.vbs /ipk key_for_your_windows</b>
|
||||
</br>
|
||||
<b>slmgr.vbs /skms kms.viktorbarzin.me </b>
|
||||
<br>
|
||||
<b>
|
||||
slmgr /ato
|
||||
</b>
|
||||
<br>
|
||||
<p>
|
||||
<h3> If you have an evaluation windows, you need to change it to retail one. This is how:</h3>
|
||||
<br>
|
||||
From an elevated command prompt, determine the current edition name with the command <br>
|
||||
<strong>DISM /online /Get-CurrentEdition</strong>.
|
||||
<br>Make note of the edition ID, an abbreviated form of the edition name. Then run
|
||||
<br>
|
||||
<strong>DISM /online /Set-Edition:<edition ID> /ProductKey:XXXXX-XXXXX-XXXXX-XXXXX-XXXXX /AcceptEula</strong>
|
||||
<br> providing the edition ID and a retail product key. The server will restart
|
||||
</p>
|
||||
<hr>
|
||||
|
||||
|
||||
<h1>How to activate Microsoft Office</h1>
|
||||
<br>
|
||||
<b>
|
||||
CD \Program Files\Microsoft Office\Office16 </b> OR <b>CD \Program Files (x86)\Microsoft Office\Office16
|
||||
</b>
|
||||
<br>
|
||||
<b>
|
||||
cscript ospp.vbs /sethst:kms.viktorbarzin.me
|
||||
</b>
|
||||
<br>
|
||||
<b>
|
||||
cscript ospp.vbs /inpkey:xxxxx-xxxxx-xxxxx-xxxxx-xxxxx
|
||||
</b>
|
||||
<br>
|
||||
where 'xxxx' is a key for your office. Some examples for office 2016 - <a
|
||||
href="https://www.techdee.com/microsoft-office-2016-product-key/">https://www.techdee.com/microsoft-office-2016-product-key/</a>
|
||||
<br>
|
||||
<b>
|
||||
cscript ospp.vbs /act
|
||||
</b>
|
||||
|
||||
<br>
|
||||
<br>
|
||||
If you messed up activation settings reset them using
|
||||
<br>
|
||||
slmgr /upk
|
||||
|
||||
<br>
|
||||
slmgr /cpky
|
||||
<br>
|
||||
and
|
||||
<br>
|
||||
slmgr /rearm
|
||||
|
||||
<h3>Buy me a beer :P</h3>
|
||||
EOT
|
||||
variable "image_tag" {
|
||||
type = string
|
||||
default = "latest"
|
||||
description = "kms-website image tag pushed to forgejo.viktorbarzin.me/viktor/kms-website. Use 8-char git SHA in CI."
|
||||
}
|
||||
|
|
|
|||
|
|
@ -20,14 +20,14 @@ resource "kubernetes_secret" "registry_credentials" {
|
|||
data = {
|
||||
".dockerconfigjson" = jsonencode({
|
||||
auths = {
|
||||
-"registry.viktorbarzin.me" = {
-auth = base64encode("${data.vault_kv_secret_v2.viktor.data["registry_user"]}:${data.vault_kv_secret_v2.viktor.data["registry_password"]}")
-}
-"registry.viktorbarzin.me:5050" = {
-auth = base64encode("${data.vault_kv_secret_v2.viktor.data["registry_user"]}:${data.vault_kv_secret_v2.viktor.data["registry_password"]}")
-}
-"10.0.20.10:5050" = {
-auth = base64encode("${data.vault_kv_secret_v2.viktor.data["registry_user"]}:${data.vault_kv_secret_v2.viktor.data["registry_password"]}")
+# Phase 4 of forgejo-registry-consolidation 2026-05-07 — registry-
+# private decommissioned. Old auths entries (registry.viktorbarzin.me,
+# registry.viktorbarzin.me:5050, 10.0.20.10:5050) removed to prevent
+# silent fallback. If a pod somehow references the old hostname now,
+# it will visibly fail with auth missing rather than silently pulling
+# potentially-stale blobs.
+"forgejo.viktorbarzin.me" = {
+auth = base64encode("cluster-puller:${try(data.vault_kv_secret_v2.viktor.data["forgejo_pull_token"], "")}")
|
||||
}
|
||||
}
|
||||
})
|
||||
|
|
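The `.dockerconfigjson` payload built by `jsonencode` and `base64encode` above has a simple shape that is easy to check outside Terraform. The sketch below mirrors it under stated assumptions: `docker_config_json` is a hypothetical helper and `example-token` is a placeholder, not a real credential.

```python
import base64
import json


def docker_config_json(registry: str, username: str, token: str) -> str:
    """Build a single-registry .dockerconfigjson document."""
    auth = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": auth}}})


config = docker_config_json("forgejo.viktorbarzin.me", "cluster-puller", "example-token")

# With the empty-string fallback from try(), the auth value encodes only
# "cluster-puller:", so pulls fail on authentication rather than silently working.
empty_config = docker_config_json("forgejo.viktorbarzin.me", "cluster-puller", "")
```

Because only one `auths` key is present, a pod that still references one of the removed hostnames finds no matching credential at all.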
|
|||
|
|
@ -33,5 +33,10 @@ module "monitoring" {
|
|||
kube_config_path = var.kube_config_path
|
||||
registry_user = data.vault_kv_secret_v2.viktor.data["registry_user"]
|
||||
registry_password = data.vault_kv_secret_v2.viktor.data["registry_password"]
|
||||
tier = local.tiers.cluster
|
||||
# try() so apply succeeds before the Vault key is populated during Phase 0
|
||||
# bootstrap (see docs/runbooks/forgejo-registry-setup.md). Empty token =
|
||||
# probe will report an auth failure and fire RegistryCatalogInaccessible —
|
||||
# that's the intended visible-broken state until the PAT is created.
|
||||
forgejo_pull_token = try(data.vault_kv_secret_v2.viktor.data["forgejo_pull_token"], "")
|
||||
tier = local.tiers.cluster
|
||||
}
|
||||
|
|
|
|||
476
stacks/monitoring/modules/monitoring/dashboards/openclaw.json
Normal file
|
|
@ -0,0 +1,476 @@
|
|||
{
|
||||
"annotations": {"list": []},
|
||||
"editable": true,
|
||||
"fiscalYearStartMonth": 0,
|
||||
"graphTooltip": 0,
|
||||
"id": null,
|
||||
"links": [],
|
||||
"liveNow": false,
|
||||
"refresh": "30s",
|
||||
"schemaVersion": 38,
|
||||
"tags": ["openclaw", "ai", "codex"],
|
||||
"time": {"from": "now-6h", "to": "now"},
|
||||
"timepicker": {},
|
||||
"timezone": "",
|
||||
"title": "OpenClaw — Codex Usage",
|
||||
"uid": "openclaw-codex",
|
||||
"version": 1,
|
||||
"panels": [
|
||||
{
|
||||
"type": "row",
|
||||
"id": 100,
|
||||
"title": "Now",
|
||||
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 0},
|
||||
"collapsed": false,
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": 1,
|
||||
"title": "Messages last 5h — gpt-5.4-mini",
|
||||
"description": "Plus rate-card lower bound: 1,200 / 5h. Hard cap at the upper bound: 7,000 / 5h.",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 0, "y": 1},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "auto",
|
||||
"orientation": "auto",
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false},
|
||||
"textMode": "auto"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "green", "value": null},
|
||||
{"color": "yellow", "value": 960},
|
||||
{"color": "orange", "value": 1500},
|
||||
{"color": "red", "value": 5600}
|
||||
]
|
||||
},
|
||||
"unit": "short"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum(increase(openclaw_codex_messages_total{provider=\"openai-codex\",model=\"gpt-5.4-mini\"}[5h]))",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "gauge",
|
||||
"id": 2,
|
||||
"title": "% of Plus 5h floor (1,200 cap)",
|
||||
"description": "Conservative gauge against the lower bound of the published rate-card. Real ceiling depends on dynamic allocation (1,200–7,000). Re-baseline if you observe throttling at <80%.",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 6, "y": 1},
|
||||
"options": {
|
||||
"orientation": "auto",
|
||||
"showThresholdLabels": false,
|
||||
"showThresholdMarkers": true,
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"min": 0,
|
||||
"max": 100,
|
||||
"decimals": 1,
|
||||
"unit": "percent",
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "green", "value": null},
|
||||
{"color": "yellow", "value": 60},
|
||||
{"color": "orange", "value": 80},
|
||||
{"color": "red", "value": 95}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "100 * sum(increase(openclaw_codex_messages_total{provider=\"openai-codex\",model=\"gpt-5.4-mini\"}[5h])) / 1200",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": 3,
|
||||
"title": "Tokens last 5h (input + output, codex)",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 12, "y": 1},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 0,
|
||||
"unit": "short",
|
||||
"thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum(increase(openclaw_codex_input_tokens_total{provider=\"openai-codex\"}[5h])) + sum(increase(openclaw_codex_output_tokens_total{provider=\"openai-codex\"}[5h]))",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": 4,
|
||||
"title": "Cache hit ratio (codex, 5h)",
|
||||
"description": "cacheRead / (cacheRead + input). Higher is better — caching cuts effective Plus quota burn.",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 18, "y": 1},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"min": 0,
|
||||
"max": 100,
|
||||
"decimals": 1,
|
||||
"unit": "percent",
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "red", "value": null},
|
||||
{"color": "yellow", "value": 30},
|
||||
{"color": "green", "value": 60}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "100 * sum(increase(openclaw_codex_cache_read_tokens_total{provider=\"openai-codex\"}[5h])) / clamp_min(sum(increase(openclaw_codex_input_tokens_total{provider=\"openai-codex\"}[5h])) + sum(increase(openclaw_codex_cache_read_tokens_total{provider=\"openai-codex\"}[5h])), 1)",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": 5,
|
||||
"title": "OAuth token expiry",
|
||||
"description": "Days until the openai-codex OAuth token expires. Re-run `openclaw models auth login --provider openai-codex` before this hits 0.",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 0, "y": 6},
|
||||
"options": {
|
||||
"colorMode": "background",
|
||||
"graphMode": "none",
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 1,
|
||||
"unit": "d",
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "red", "value": null},
|
||||
{"color": "orange", "value": 1},
|
||||
{"color": "yellow", "value": 3},
|
||||
{"color": "green", "value": 5}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "max(openclaw_codex_oauth_expiry_seconds{provider=\"openai-codex\"}) / 86400",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": 6,
|
||||
"title": "Active sessions",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 6, "y": 6},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "none",
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": true},
|
||||
"textMode": "value_and_name"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "short",
|
||||
"thresholds": {"mode": "absolute", "steps": [{"color": "blue", "value": null}]}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "openclaw_codex_active_sessions",
|
||||
"legendFormat": "{{kind}}",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": 7,
|
||||
"title": "Last assistant turn",
|
||||
"description": "Time since the latest assistant message landed in any session.",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 12, "y": 6},
|
||||
"options": {
|
||||
"colorMode": "background",
|
||||
"graphMode": "none",
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "s",
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "green", "value": null},
|
||||
{"color": "yellow", "value": 1800},
|
||||
{"color": "orange", "value": 7200},
|
||||
{"color": "red", "value": 86400}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "time() - openclaw_codex_last_run_timestamp",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "stat",
|
||||
"id": 8,
|
||||
"title": "Errors last 24h",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 5, "w": 6, "x": 18, "y": 6},
|
||||
"options": {
|
||||
"colorMode": "background",
|
||||
"graphMode": "area",
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 0,
|
||||
"unit": "short",
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "green", "value": null},
|
||||
{"color": "yellow", "value": 1},
|
||||
{"color": "red", "value": 10}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum(increase(openclaw_codex_message_errors_total[24h]))",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "row",
|
||||
"id": 200,
|
||||
"title": "Over time",
|
||||
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 11},
|
||||
"collapsed": false,
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": 10,
|
||||
"title": "Messages / min by model",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 12},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {"mode": "palette-classic"},
|
||||
"custom": {
|
||||
"drawStyle": "bars",
|
||||
"fillOpacity": 60,
|
||||
"lineWidth": 1,
|
||||
"stacking": {"mode": "normal"}
|
||||
},
|
||||
"unit": "short"
|
||||
}
|
||||
},
|
||||
"options": {
|
||||
"legend": {"displayMode": "table", "placement": "right", "showLegend": true, "calcs": ["sum"]},
|
||||
"tooltip": {"mode": "multi", "sort": "desc"}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum by (provider, model) (rate(openclaw_codex_messages_total[1m])) * 60",
|
||||
"legendFormat": "{{provider}}/{{model}}",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "timeseries",
|
||||
"id": 11,
|
||||
"title": "Tokens / min by type (codex)",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 8, "w": 12, "x": 0, "y": 20},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {"mode": "palette-classic"},
|
||||
"custom": {
|
||||
"drawStyle": "line",
|
||||
"fillOpacity": 25,
|
||||
"lineWidth": 2,
|
||||
"stacking": {"mode": "none"}
|
||||
},
|
||||
"unit": "short"
|
||||
}
|
||||
},
|
||||
"options": {
|
||||
"legend": {"displayMode": "list", "placement": "bottom", "showLegend": true},
|
||||
"tooltip": {"mode": "multi", "sort": "desc"}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum(rate(openclaw_codex_input_tokens_total{provider=\"openai-codex\"}[5m])) * 60",
|
||||
"legendFormat": "input",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum(rate(openclaw_codex_output_tokens_total{provider=\"openai-codex\"}[5m])) * 60",
|
||||
"legendFormat": "output",
|
||||
"refId": "B"
|
||||
},
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum(rate(openclaw_codex_cache_read_tokens_total{provider=\"openai-codex\"}[5m])) * 60",
|
||||
"legendFormat": "cache_read",
|
||||
"refId": "C"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "bargauge",
|
||||
"id": 12,
|
||||
"title": "Messages / 5h by model",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 8, "w": 12, "x": 12, "y": 20},
|
||||
"options": {
|
||||
"displayMode": "gradient",
|
||||
"orientation": "horizontal",
|
||||
"showUnfilled": true,
|
||||
"reduceOptions": {"calcs": ["lastNotNull"], "fields": "", "values": false}
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"min": 0,
|
||||
"decimals": 0,
|
||||
"unit": "short",
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "green", "value": null},
|
||||
{"color": "yellow", "value": 100},
|
||||
{"color": "orange", "value": 500},
|
||||
{"color": "red", "value": 1000}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum by (provider, model) (increase(openclaw_codex_messages_total[5h]))",
|
||||
"legendFormat": "{{provider}}/{{model}}",
|
||||
"refId": "A"
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"type": "row",
|
||||
"id": 300,
|
||||
"title": "Errors",
|
||||
"gridPos": {"h": 1, "w": 24, "x": 0, "y": 28},
|
||||
"collapsed": false,
|
||||
"panels": []
|
||||
},
|
||||
{
|
||||
"type": "table",
|
||||
"id": 20,
|
||||
"title": "Recent errors by model and reason",
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"gridPos": {"h": 8, "w": 24, "x": 0, "y": 29},
|
||||
"options": {
|
||||
"showHeader": true
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {"align": "auto", "displayMode": "auto"}
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {"id": "byName", "options": "Value"},
|
||||
"properties": [
|
||||
{"id": "displayName", "value": "Errors (24h)"},
|
||||
{"id": "custom.displayMode", "value": "color-background"},
|
||||
{
|
||||
"id": "thresholds",
|
||||
"value": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{"color": "green", "value": null},
|
||||
{"color": "yellow", "value": 1},
|
||||
{"color": "red", "value": 10}
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {"type": "prometheus", "uid": "PBFA97CFB590B2093"},
|
||||
"expr": "sum by (provider, model, reason) (increase(openclaw_codex_message_errors_total[24h])) > 0",
|
||||
"format": "table",
|
||||
"instant": true,
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"transformations": [
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {
|
||||
"excludeByName": {"Time": true, "__name__": true, "instance": true, "job": true, "namespace": true, "pod": true, "app": true},
|
||||
"indexByName": {"provider": 0, "model": 1, "reason": 2, "Value": 3},
|
||||
"renameByName": {}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
}
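The two ratio panels above (percent of the 1,200-message floor and cache hit ratio) reduce to simple arithmetic over 5h counter increases. A minimal sketch of that arithmetic, assuming the increases have already been fetched as plain numbers; the function names are hypothetical and not part of the dashboard or exporter:

```python
def percent_of_floor(messages_5h: float, floor: float = 1200) -> float:
    """Mirror of the gauge expression: 100 * messages / 1200."""
    return 100 * messages_5h / floor


def cache_hit_ratio_percent(cache_read_tokens: float, input_tokens: float) -> float:
    """Mirror of the stat expression: 100 * cacheRead / clamp_min(input + cacheRead, 1)."""
    # clamp_min(..., 1) keeps the denominator at 1 or more, so an idle
    # window yields 0 instead of a division by zero.
    denominator = max(input_tokens + cache_read_tokens, 1)
    return 100 * cache_read_tokens / denominator


if __name__ == "__main__":
    print(percent_of_floor(600))                # 50.0
    print(cache_hit_ratio_percent(3000, 1000))  # 75.0
    print(cache_hit_ratio_percent(0, 0))        # 0.0
```

With the gauge thresholds in the JSON, 600 messages in 5h (50%) would stay green, and a 75% cache hit ratio would land in the green band (60 and above).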
|
||||
File diff suppressed because it is too large
|
|
@ -134,6 +134,7 @@ locals {
|
|||
# Applications
|
||||
"qbittorrent.json" = "Applications"
|
||||
"realestate-crawler.json" = "Applications"
|
||||
"openclaw.json" = "Applications"
|
||||
"uk-payslip.json" = "Finance (Personal)"
|
||||
"wealth.json" = "Finance (Personal)"
|
||||
"job-hunter.json" = "Finance"
|
||||
|
|
|
|||
|
|
@ -41,6 +41,11 @@ variable "registry_password" {
|
|||
type = string
|
||||
sensitive = true
|
||||
}
|
||||
variable "forgejo_pull_token" {
|
||||
type = string
|
||||
sensitive = true
|
||||
description = "PAT for the cluster-puller user, used by the Forgejo registry integrity probe."
|
||||
}
|
||||
|
||||
resource "kubernetes_namespace" "monitoring" {
|
||||
metadata {
|
||||
|
|
@ -238,27 +243,42 @@ resource "kubernetes_cron_job_v1" "dns_anomaly_monitor" {
|
|||
}
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Registry manifest-integrity probe — HEADs every tag in the private R/W
|
||||
# registry's catalog, walks multi-platform image indexes, and reports blob
|
||||
# availability. Catches the orphan-index failure mode seen 2026-04-13 and
|
||||
# 2026-04-19 before downstream pipelines hit it.
|
||||
# Phase 4 of forgejo-registry-consolidation 2026-05-07: registry-private
|
||||
# decommissioned. The integrity probe below caught the orphan-index failure
|
||||
# mode in `registry:2.8.3` (post-mortem 2026-04-19). With that engine
|
||||
# retired, the probe is replaced by `forgejo_integrity_probe` below.
|
||||
#
|
||||
# Resource definitions stripped wholesale — terragrunt apply destroys the
|
||||
# in-cluster CronJob + Secret on the next run.
|
||||
# See: docs/post-mortems/2026-04-19-registry-orphan-index.md
|
||||
# -----------------------------------------------------------------------------
|
||||
resource "kubernetes_secret" "registry_probe_credentials" {
|
||||
|
||||
# -----------------------------------------------------------------------------
|
||||
# Forgejo registry integrity probe — same algorithm as registry-integrity-probe
|
||||
# above, but targets the Forgejo OCI registry instead of registry-private. Runs
|
||||
# in parallel with the existing probe during the dual-push bake; once Phase 4
|
||||
# decommissions registry-private, the registry-integrity-probe CronJob is
|
||||
# deleted and only this one remains.
|
||||
#
|
||||
# Auth: HTTP Basic with cluster-puller PAT (read:package scope is enough to
|
||||
# walk catalog + manifests). Reaches Forgejo via the in-cluster service so we
|
||||
# don't hairpin out through Traefik for every probe run.
|
||||
# -----------------------------------------------------------------------------
|
||||
resource "kubernetes_secret" "forgejo_probe_credentials" {
|
||||
metadata {
|
||||
name = "registry-probe-credentials"
|
||||
name = "forgejo-probe-credentials"
|
||||
namespace = kubernetes_namespace.monitoring.metadata[0].name
|
||||
}
|
||||
type = "Opaque"
|
||||
data = {
|
||||
REG_USER = var.registry_user
|
||||
REG_PASS = var.registry_password
|
||||
REG_USER = "cluster-puller"
|
||||
REG_PASS = var.forgejo_pull_token
|
||||
}
|
||||
}
|
||||
|
||||
resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
|
||||
resource "kubernetes_cron_job_v1" "forgejo_integrity_probe" {
|
||||
metadata {
|
||||
name = "registry-integrity-probe"
|
||||
name = "forgejo-integrity-probe"
|
||||
namespace = kubernetes_namespace.monitoring.metadata[0].name
|
||||
}
|
||||
spec {
|
||||
|
|
@ -275,13 +295,13 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
|
|||
metadata {}
|
||||
spec {
|
||||
container {
|
||||
name = "registry-integrity-probe"
|
||||
name = "forgejo-integrity-probe"
|
||||
image = "docker.io/library/alpine:3.20"
|
||||
env {
|
||||
name = "REG_USER"
|
||||
value_from {
|
||||
secret_key_ref {
|
||||
name = kubernetes_secret.registry_probe_credentials.metadata[0].name
|
||||
name = kubernetes_secret.forgejo_probe_credentials.metadata[0].name
|
||||
key = "REG_USER"
|
||||
}
|
||||
}
|
||||
|
|
@ -290,22 +310,26 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
|
|||
name = "REG_PASS"
|
||||
value_from {
|
||||
secret_key_ref {
|
||||
name = kubernetes_secret.registry_probe_credentials.metadata[0].name
|
||||
name = kubernetes_secret.forgejo_probe_credentials.metadata[0].name
|
||||
key = "REG_PASS"
|
||||
}
|
||||
}
|
||||
}
|
||||
env {
|
||||
name = "REGISTRY_HOST"
|
||||
value = "10.0.20.10:5050"
|
||||
value = "forgejo.forgejo.svc.cluster.local"
|
||||
}
|
||||
env {
|
||||
name = "REGISTRY_SCHEME"
|
||||
value = "http"
|
||||
}
|
||||
env {
|
||||
name = "REGISTRY_INSTANCE"
|
||||
value = "registry.viktorbarzin.me:5050"
|
||||
value = "forgejo.viktorbarzin.me"
|
||||
}
|
||||
env {
|
||||
name = "PUSHGATEWAY"
|
||||
value = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/registry-integrity-probe"
|
||||
value = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/forgejo-integrity-probe"
|
||||
}
|
||||
env {
|
||||
name = "TAGS_PER_REPO"
|
||||
|
|
@ -316,16 +340,16 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
|
|||
apk add --no-cache curl jq >/dev/null
|
||||
|
||||
REG="$REGISTRY_HOST"
|
||||
SCHEME="$${REGISTRY_SCHEME:-https}"
|
||||
INSTANCE="$REGISTRY_INSTANCE"
|
||||
AUTH="$REG_USER:$REG_PASS"
|
||||
ACCEPT='application/vnd.oci.image.index.v1+json,application/vnd.oci.image.manifest.v1+json,application/vnd.docker.distribution.manifest.list.v2+json,application/vnd.docker.distribution.manifest.v2+json'
|
||||
|
||||
push() {
|
||||
# Prometheus pushgateway — body ends with blank line. Ignore push errors.
|
||||
curl -sf --max-time 10 --data-binary @- "$PUSHGATEWAY" >/dev/null 2>&1 || true
|
||||
}
|
||||
|
||||
CATALOG=$(curl -sk -u "$AUTH" --max-time 30 "https://$REG/v2/_catalog?n=1000" || echo "")
|
||||
CATALOG=$(curl -sk -u "$AUTH" --max-time 30 "$SCHEME://$REG/v2/_catalog?n=1000" || echo "")
|
||||
REPOS=$(echo "$CATALOG" | jq -r '.repositories[]?' 2>/dev/null || echo "")
|
||||
|
||||
if [ -z "$REPOS" ]; then
|
||||
|
|
@ -350,7 +374,7 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
|
|||
[ -z "$repo" ] && continue
|
||||
REPOS_N=$((REPOS_N + 1))
|
||||
|
||||
TAGS_JSON=$(curl -sk -u "$AUTH" --max-time 15 "https://$REG/v2/$repo/tags/list" || echo "")
|
||||
TAGS_JSON=$(curl -sk -u "$AUTH" --max-time 15 "$SCHEME://$REG/v2/$repo/tags/list" || echo "")
|
||||
echo "$TAGS_JSON" | jq -r '.tags[]?' 2>/dev/null | tail -n "$TAGS_PER_REPO" > /tmp/tags.txt || true
|
||||
|
||||
while IFS= read -r tag; do
|
||||
|
|
@ -359,7 +383,7 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
|
|||
|
||||
HTTP=$(curl -sk -u "$AUTH" -o /tmp/m.json -w '%%{http_code}' \
|
||||
-H "Accept: $ACCEPT" --max-time 15 \
|
||||
"https://$REG/v2/$repo/manifests/$tag")
|
||||
"$SCHEME://$REG/v2/$repo/manifests/$tag")
|
||||
if [ "$HTTP" != "200" ]; then
|
||||
echo "FAIL: $repo:$tag manifest HTTP $HTTP"
|
||||
FAIL=$((FAIL + 1))
|
||||
|
|
@ -374,7 +398,7 @@ resource "kubernetes_cron_job_v1" "registry_integrity_probe" {
|
|||
[ -z "$d" ] && continue
|
||||
CH=$(curl -sk -u "$AUTH" -o /dev/null -w '%%{http_code}' \
|
||||
-H "Accept: $ACCEPT" --max-time 10 -I \
|
||||
"https://$REG/v2/$repo/manifests/$d")
|
||||
"$SCHEME://$REG/v2/$repo/manifests/$d")
|
||||
if [ "$CH" != "200" ]; then
|
||||
echo "FAIL: $repo:$tag index child $d HTTP $CH"
|
||||
FAIL=$((FAIL + 1))
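The probe's index walk (fetch a tag's manifest, then HEAD each child digest when the manifest is a multi-platform index) can be sketched in Python. This is an illustration only: the diff does not show how the shell script extracts child digests, so reading them from the `manifests` array is an assumption, and `index_children` and `count_failures` are hypothetical names.

```python
import json


def index_children(manifest_body: str) -> list[str]:
    """Return child digests if the body looks like an image index, else []."""
    try:
        doc = json.loads(manifest_body)
    except json.JSONDecodeError:
        return []
    if not isinstance(doc, dict):
        return []
    # Assumption: an index / manifest list carries a "manifests" array,
    # while a single-platform manifest carries "config" and "layers".
    return [
        entry["digest"]
        for entry in doc.get("manifests") or []
        if isinstance(entry, dict) and entry.get("digest")
    ]


def count_failures(child_status: dict[str, int]) -> int:
    """Count children whose HEAD request returned anything other than 200."""
    return sum(1 for code in child_status.values() if code != 200)


if __name__ == "__main__":
    index = json.dumps({
        "schemaVersion": 2,
        "mediaType": "application/vnd.oci.image.index.v1+json",
        "manifests": [{"digest": "sha256:aaa"}, {"digest": "sha256:bbb"}],
    })
    print(index_children(index))
    print(count_failures({"sha256:aaa": 200, "sha256:bbb": 404}))
```

A child that returns 404 while its parent index returns 200 is the orphan-index case the comments above describe.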
|
||||
|
|
|
|||
|
|
@ -1656,22 +1656,22 @@ serverFiles:
|
|||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Registry has {{ $value }} broken manifest reference(s) — orphan index or missing blob"
|
||||
description: "The registry-integrity-probe CronJob in the monitoring namespace found {{ $value }} manifest/blob references that return non-200 on the private registry. Almost certainly an orphan OCI-index child from the cleanup-tags.sh+GC race. Rebuild the affected image per docs/runbooks/registry-rebuild-image.md and investigate which tag(s) the probe logs flagged."
|
||||
summary: "{{ $labels.instance }}: {{ $value }} broken manifest reference(s) — orphan index or missing blob"
|
||||
description: "The forgejo-integrity-probe CronJob found {{ $value }} manifest/blob references that return non-200 on {{ $labels.instance }}. Rebuild the affected image per docs/runbooks/forgejo-registry-rebuild-image.md. (registry.viktorbarzin.me retired Phase 4 of forgejo-registry-consolidation 2026-05-07 — only forgejo.viktorbarzin.me remains.)"
|
||||
- alert: RegistryIntegrityProbeStale
|
||||
expr: time() - registry_manifest_integrity_last_run_timestamp > 3600
|
||||
for: 15m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Registry integrity probe has not reported in >1h — CronJob may be broken"
|
||||
summary: "{{ $labels.instance }} integrity probe has not reported in >1h — CronJob may be broken"
|
||||
- alert: RegistryCatalogInaccessible
|
||||
expr: registry_manifest_integrity_catalog_accessible == 0
|
||||
for: 15m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Registry probe cannot fetch /v2/_catalog — auth failure or registry down"
|
||||
summary: "{{ $labels.instance }} probe cannot fetch /v2/_catalog — auth failure or registry down"
|
||||
- alert: NodeHighCPUUsage
|
||||
expr: pve_cpu_usage_ratio * 100 > 60
|
||||
for: 6h
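For `RegistryIntegrityProbeStale`, the expression threshold (3600 s) and the `for: 15m` clause combine: the condition must hold continuously for 15 minutes before the alert fires. A rough sketch of that timing, ignoring rule evaluation interval granularity; the helper names are hypothetical:

```python
STALE_AFTER_SECONDS = 3600  # threshold from the alert expression
FOR_SECONDS = 15 * 60       # the rule's `for: 15m`


def is_stale(now: float, last_run: float) -> bool:
    """Mirror of: time() - registry_manifest_integrity_last_run_timestamp > 3600."""
    return now - last_run > STALE_AFTER_SECONDS


def earliest_firing_time(last_run: float) -> float:
    """Approximate earliest time the alert can fire after the last probe run."""
    return last_run + STALE_AFTER_SECONDS + FOR_SECONDS


if __name__ == "__main__":
    print(is_stale(now=10_000, last_run=6_400))  # exactly 3600 s old: not yet stale
    print(is_stale(now=10_001, last_run=6_400))
    print(earliest_firing_time(0))
```

Under these assumptions the alert needs roughly 75 minutes of silence from the probe before it fires.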
|
||||
|
|
|
|||
264
stacks/openclaw/files/exporter.py
Normal file
|
|
@ -0,0 +1,264 @@
|
|||
#!/usr/bin/env python3
|
||||
"""OpenClaw / Codex usage exporter.
|
||||
|
||||
Reads ~/.openclaw/agents/*/sessions/*.jsonl (assistant messages with usage)
|
||||
and ~/.openclaw/agents/*/agent/auth-profiles.json (OAuth profiles), then exposes
|
||||
Prometheus text-format metrics on :9099/metrics. Stdlib only — no pip install
|
||||
needed at startup.
|
||||
|
||||
Metrics (all cumulative-since-session-start; use Prometheus increase()/rate()
|
||||
for windowed views):
|
||||
|
||||
openclaw_codex_messages_total{provider,model,session_kind} counter
|
||||
openclaw_codex_input_tokens_total{provider,model} counter
|
||||
openclaw_codex_output_tokens_total{provider,model} counter
|
||||
openclaw_codex_cache_read_tokens_total{provider,model} counter
|
||||
openclaw_codex_cache_write_tokens_total{provider,model} counter
|
||||
openclaw_codex_message_errors_total{provider,model,reason} counter
|
||||
openclaw_codex_active_sessions{kind} gauge
|
||||
openclaw_codex_oauth_expiry_seconds{provider,account,plan} gauge
|
||||
openclaw_codex_last_run_timestamp gauge
|
||||
openclaw_codex_exporter_scrape_duration_ms gauge
|
||||
"""
|
||||
import glob
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import time
|
||||
from datetime import datetime
|
||||
from http.server import BaseHTTPRequestHandler, HTTPServer
|
||||
from threading import Lock
|
||||
|
||||
OPENCLAW_HOME = os.environ.get("OPENCLAW_HOME", "/home/node/.openclaw")
|
||||
PORT = int(os.environ.get("METRICS_PORT", "9099"))
|
||||
CACHE_SEC = float(os.environ.get("CACHE_SEC", "5"))
|
||||
SKIP_FRAGMENTS = (".broken.", ".reset.", ".deleted.", ".bak.")
|
||||
SESSION_RE = re.compile(r"^([0-9a-f-]{36})\.jsonl$")
|
||||
|
||||
_lock = Lock()
|
||||
_cache = {"text": "", "ts": 0.0}
|
||||
|
||||
|
||||
def _esc(value: str) -> str:
|
||||
return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
|
||||
|
||||
|
||||
def _line(name: str, labels: dict, value) -> str:
|
||||
if labels:
|
||||
rendered = ",".join(f'{k}="{_esc(v)}"' for k, v in sorted(labels.items()))
|
||||
return f"{name}{{{rendered}}} {value}"
|
||||
return f"{name} {value}"
|
||||
|
||||
|
||||
def _kind_for(session_id: str, sessions_index: dict) -> str:
|
||||
for key, val in sessions_index.items():
|
||||
if val.get("sessionId") != session_id:
|
||||
continue
|
||||
if key.startswith("agent:main:cron:"):
|
||||
return "cron"
|
||||
if key.startswith("telegram:slash:"):
|
||||
return "telegram-slash"
|
||||
if key.startswith("agent:main:"):
|
||||
return "main"
|
||||
surface = (val.get("origin") or {}).get("surface")
|
||||
if surface:
|
||||
return surface
|
||||
return key.split(":", 1)[0]
|
||||
return "unknown"
|
||||
|
||||
|
||||
def _parse_ts(value):
|
||||
if isinstance(value, (int, float)):
|
||||
return float(value)
|
||||
if isinstance(value, str):
|
||||
try:
|
||||
return datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp()
|
||||
except ValueError:
|
||||
return 0.0
|
||||
return 0.0
|
||||
|
||||
|
||||
def _build_text() -> str:
|
||||
start = time.monotonic()
|
||||
out = []
|
||||
|
||||
sessions_index: dict = {}
|
||||
for sp in glob.glob(os.path.join(OPENCLAW_HOME, "agents/*/sessions/sessions.json")):
|
||||
try:
|
||||
with open(sp) as f:
|
||||
sessions_index.update(json.load(f))
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
msg_count: dict = {}
|
||||
in_tok: dict = {}
|
||||
out_tok: dict = {}
|
||||
cr_tok: dict = {}
|
||||
cw_tok: dict = {}
|
||||
err_count: dict = {}
|
||||
latest_ts = 0.0
|
||||
|
||||
for jsonl in glob.glob(os.path.join(OPENCLAW_HOME, "agents/*/sessions/*.jsonl")):
|
||||
bn = os.path.basename(jsonl)
|
||||
if any(s in bn for s in SKIP_FRAGMENTS):
|
||||
continue
|
||||
m = SESSION_RE.match(bn)
|
||||
if not m:
|
||||
continue
|
||||
sid = m.group(1)
|
||||
kind = _kind_for(sid, sessions_index)
|
||||
try:
|
||||
with open(jsonl) as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if not line:
|
||||
continue
|
||||
try:
|
||||
obj = json.loads(line)
|
||||
except Exception:
|
||||
continue
|
||||
if obj.get("type") != "message":
|
||||
continue
|
||||
msg = obj.get("message") or {}
|
||||
if msg.get("role") != "assistant":
|
||||
continue
|
||||
provider = msg.get("provider") or "unknown"
|
||||
model = msg.get("model") or "unknown"
|
||||
usage = msg.get("usage") or {}
|
||||
ts = _parse_ts(obj.get("timestamp"))
|
||||
if ts > latest_ts:
|
||||
latest_ts = ts
|
||||
if msg.get("stopReason") == "error":
|
||||
reason = (msg.get("errorMessage") or "unknown")[:80]
|
||||
ek = (provider, model, reason)
|
||||
err_count[ek] = err_count.get(ek, 0) + 1
|
||||
continue
|
||||
mk = (provider, model, kind)
|
||||
msg_count[mk] = msg_count.get(mk, 0) + 1
|
||||
pm = (provider, model)
|
||||
in_tok[pm] = in_tok.get(pm, 0) + (usage.get("input") or 0)
|
||||
out_tok[pm] = out_tok.get(pm, 0) + (usage.get("output") or 0)
|
||||
cr_tok[pm] = cr_tok.get(pm, 0) + (usage.get("cacheRead") or 0)
|
||||
cw_tok[pm] = cw_tok.get(pm, 0) + (usage.get("cacheWrite") or 0)
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
out.append("# HELP openclaw_codex_messages_total Cumulative assistant messages")
|
||||
out.append("# TYPE openclaw_codex_messages_total counter")
|
||||
for (p, mdl, k), c in msg_count.items():
|
||||
out.append(_line("openclaw_codex_messages_total",
|
||||
{"provider": p, "model": mdl, "session_kind": k}, c))
|
||||
|
||||
for name, src, hlp in [
|
||||
("openclaw_codex_input_tokens_total", in_tok, "Cumulative input tokens"),
|
||||
("openclaw_codex_output_tokens_total", out_tok, "Cumulative output tokens"),
|
||||
("openclaw_codex_cache_read_tokens_total", cr_tok, "Cumulative cache-read tokens"),
|
||||
("openclaw_codex_cache_write_tokens_total", cw_tok, "Cumulative cache-write tokens"),
|
||||
]:
|
||||
out.append(f"# HELP {name} {hlp}")
|
||||
out.append(f"# TYPE {name} counter")
|
||||
for (p, mdl), c in src.items():
|
||||
out.append(_line(name, {"provider": p, "model": mdl}, c))
|
||||
|
||||
out.append("# HELP openclaw_codex_message_errors_total Cumulative assistant errors")
|
||||
out.append("# TYPE openclaw_codex_message_errors_total counter")
|
||||
for (p, mdl, r), c in err_count.items():
|
||||
out.append(_line("openclaw_codex_message_errors_total",
|
||||
{"provider": p, "model": mdl, "reason": r}, c))
|
||||
|
||||
out.append("# HELP openclaw_codex_active_sessions Active sessions in sessions.json")
|
||||
out.append("# TYPE openclaw_codex_active_sessions gauge")
|
||||
kc: dict = {}
|
||||
for k in sessions_index:
|
||||
if k.startswith("agent:main:cron:"):
|
||||
kk = "cron"
|
||||
elif k.startswith("telegram:slash:"):
|
||||
kk = "telegram-slash"
|
||||
elif k.startswith("agent:main:"):
|
||||
kk = "main"
|
||||
else:
|
||||
kk = k.split(":", 1)[0]
|
||||
kc[kk] = kc.get(kk, 0) + 1
|
||||
for k, c in kc.items():
|
||||
out.append(_line("openclaw_codex_active_sessions", {"kind": k}, c))
|
||||
|
||||
if latest_ts:
|
||||
out.append("# HELP openclaw_codex_last_run_timestamp Unix ts of newest assistant message")
|
||||
out.append("# TYPE openclaw_codex_last_run_timestamp gauge")
|
||||
out.append(_line("openclaw_codex_last_run_timestamp", {}, latest_ts))
|
||||
|
||||
out.append("# HELP openclaw_codex_oauth_expiry_seconds Seconds until OAuth token expires")
|
||||
out.append("# TYPE openclaw_codex_oauth_expiry_seconds gauge")
|
||||
now = time.time()
|
||||
    for af in glob.glob(os.path.join(OPENCLAW_HOME, "agents/*/agent/auth-profiles.json")):
        try:
            with open(af) as f:
                data = json.load(f)
        except Exception:
            continue
        # Schema: {"version": 1, "profiles": {"<id>": {...}}}.
        # `expires` is Unix milliseconds.
        for profile in (data.get("profiles") or {}).values():
            exp_ms = profile.get("expires")
            if not isinstance(exp_ms, (int, float)):
                continue
            exp_ts = exp_ms / 1000.0
            out.append(_line(
                "openclaw_codex_oauth_expiry_seconds",
                {
                    "provider": profile.get("provider", "unknown"),
                    "account": profile.get("email") or profile.get("account") or "unknown",
                    "plan": profile.get("chatgptPlanType") or "unknown",
                },
                max(0, exp_ts - now),
            ))

    out.append("# HELP openclaw_codex_exporter_scrape_duration_ms Last scrape duration ms")
    out.append("# TYPE openclaw_codex_exporter_scrape_duration_ms gauge")
    out.append(_line("openclaw_codex_exporter_scrape_duration_ms", {},
                     (time.monotonic() - start) * 1000))

    return "\n".join(out) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok\n")
            return
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        with _lock:
            now = time.time()
            if now - _cache["ts"] > CACHE_SEC:
                try:
                    _cache["text"] = _build_text()
                except Exception as exc:  # noqa: BLE001
                    _cache["text"] = (
                        f'openclaw_codex_exporter_errors_total{{kind="scrape"}} 1\n'
                        f'# scrape error: {_esc(str(exc))[:200]}\n'
                    )
                _cache["ts"] = now
            body = _cache["text"].encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args, **kwargs):
        pass


def main():
    print(f"openclaw exporter listening on :{PORT}", flush=True)
    HTTPServer(("0.0.0.0", PORT), Handler).serve_forever()


if __name__ == "__main__":
    main()
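The expiry gauge at the end of `_build_text()` reduces to a small conversion: `expires` is Unix milliseconds, and the exported value is the number of seconds left, clamped at zero. A minimal standalone sketch of that step follows. `remaining_seconds`, `expiry_samples` and `format_sample` are hypothetical names (the last one stands in for the exporter's `_line` helper, whose body is not shown in this part of the diff), and the label escaping is an assumption based on the Prometheus text format.

```python
import json


def remaining_seconds(exp_ms, now):
    """Seconds until expiry, clamped at 0; None when `expires` is not numeric."""
    if not isinstance(exp_ms, (int, float)):
        return None
    return max(0, exp_ms / 1000.0 - now)


def format_sample(name, labels, value):
    """Hypothetical stand-in for `_line`: one Prometheus text-format sample."""
    def esc(v):
        # Escape backslash, double quote and newline inside label values.
        return str(v).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    if not labels:
        return f"{name} {value}"
    body = ",".join(f'{k}="{esc(v)}"' for k, v in labels.items())
    return f"{name}{{{body}}} {value}"


def expiry_samples(raw_json, now):
    """Turn the text of one auth-profiles.json into expiry gauge lines."""
    data = json.loads(raw_json)
    lines = []
    for profile in (data.get("profiles") or {}).values():
        left = remaining_seconds(profile.get("expires"), now)
        if left is None:
            continue  # same skip rule as the exporter: non-numeric `expires`
        labels = {
            "provider": profile.get("provider", "unknown"),
            "account": profile.get("email") or profile.get("account") or "unknown",
            "plan": profile.get("chatgptPlanType") or "unknown",
        }
        lines.append(format_sample("openclaw_codex_oauth_expiry_seconds", labels, left))
    return lines
```

Clamping at zero means an already-expired token reports `0` rather than a negative number, so an alert on "less than N seconds left" also covers the expired case.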
@ -131,8 +131,12 @@ resource "kubernetes_config_map" "openclaw_config" {
  mode = "off"
}
model = {
  primary = "nim/qwen/qwen3.5-397b-a17b"
  fallbacks = ["nim/mistralai/mistral-large-3-675b-instruct-2512", "nim/nvidia/llama-3.1-nemotron-ultra-253b-v1", "modelrelay/auto-fastest"]
  # ChatGPT Plus OAuth via openai-codex plugin (account: ancaelena98@gmail.com).
  # gpt-5.4-mini is the only mini variant the Codex backend accepts for Plus tier;
  # gpt-5-mini / gpt-5.1-codex-mini return model_not_found / "not supported with
  # ChatGPT account". Plus rate-card: 1,200–7,000 local msgs / 5h on gpt-5.4-mini.
  primary = "openai-codex/gpt-5.4-mini"
  fallbacks = ["openai-codex/gpt-5.5", "nim/qwen/qwen3-coder-480b-a35b-instruct", "modelrelay/auto-fastest"]
}
models = {
  "modelrelay/auto-fastest" = {}
@ -146,6 +150,8 @@ resource "kubernetes_config_map" "openclaw_config" {
      "llama-as-openai/Llama-4-Scout-17B-16E-Instruct-FP8" = {}
      "openrouter/stepfun/step-3.5-flash:free" = {}
      "openrouter/arcee-ai/trinity-large-preview:free" = {}
      "openai-codex/gpt-5.4-mini" = {}
      "openai-codex/gpt-5.5" = {}
    }
  }
}
@ -255,6 +261,19 @@ resource "random_password" "gateway_token" {
  special = false
}

# Prometheus exporter script — read by the openclaw-exporter sidecar.
# Stdlib-only Python so no pip install at startup. Reads sessions JSONL +
# auth-profiles.json from the NFS-backed openclaw home volume (mounted ro).
resource "kubernetes_config_map" "openclaw_exporter" {
  metadata {
    name = "openclaw-exporter"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  data = {
    "exporter.py" = file("${path.module}/files/exporter.py")
  }
}

module "nfs_tools_host" {
  source = "../../modules/kubernetes/nfs_volume"
  name = "openclaw-tools-host"
@ -344,6 +363,11 @@ resource "kubernetes_deployment" "openclaw" {
    }
    annotations = {
      "reloader.stakater.com/search" = "true"
      # Prometheus auto-discovers pods with these annotations.
      # Scraped by the openclaw-exporter sidecar — exposes /metrics on :9099.
      "prometheus.io/scrape" = "true"
      "prometheus.io/port" = "9099"
      "prometheus.io/path" = "/metrics"
    }
  }
  spec {
@ -383,8 +407,10 @@ resource "kubernetes_deployment" "openclaw" {
# Main container: OpenClaw
container {
  name = "openclaw"
  image = "ghcr.io/openclaw/openclaw:2026.2.26"
  command = ["sh", "-c", "node openclaw.mjs doctor --fix 2>/dev/null; exec node openclaw.mjs gateway --allow-unconfigured --bind lan"]
  image = "ghcr.io/openclaw/openclaw:2026.5.4"
  # Doctor --fix auto-promotes the highest-tier codex model (gpt-5-pro) after
  # auth-profile-based model discovery; pin gpt-5.4-mini back to default after it.
  command = ["sh", "-c", "node openclaw.mjs doctor --fix 2>/dev/null; node openclaw.mjs models set openai-codex/gpt-5.4-mini 2>/dev/null; exec node openclaw.mjs gateway --allow-unconfigured --bind lan"]
  port {
    container_port = 18789
  }
@ -510,6 +536,54 @@ resource "kubernetes_deployment" "openclaw" {
  }
}

# Sidecar: openclaw-exporter — Prometheus exporter for Codex/OAuth usage.
# Reads sessions JSONL files + auth-profiles.json, exposes /metrics on :9099.
# Stdlib-only Python; no pip install at startup.
container {
  name = "openclaw-exporter"
  image = "docker.io/library/python:3.12-slim"
  command = ["python3", "/scripts/exporter.py"]
  port {
    container_port = 9099
    name = "metrics"
  }
  env {
    name = "OPENCLAW_HOME"
    value = "/home/node/.openclaw"
  }
  env {
    name = "METRICS_PORT"
    value = "9099"
  }
  volume_mount {
    name = "openclaw-exporter-script"
    mount_path = "/scripts"
    read_only = true
  }
  volume_mount {
    name = "openclaw-home"
    mount_path = "/home/node/.openclaw"
    read_only = true
  }
  readiness_probe {
    http_get {
      path = "/healthz"
      port = 9099
    }
    initial_delay_seconds = 5
    period_seconds = 30
  }
  resources {
    requests = {
      cpu = "10m"
      memory = "64Mi"
    }
    limits = {
      memory = "128Mi"
    }
  }
}

# Sidecar: modelrelay — auto-routes to fastest healthy free model
container {
  name = "modelrelay"
@ -598,6 +672,13 @@ resource "kubernetes_deployment" "openclaw" {
          name = kubernetes_config_map.openclaw_config.metadata[0].name
        }
      }
      volume {
        name = "openclaw-exporter-script"
        config_map {
          name = kubernetes_config_map.openclaw_exporter.metadata[0].name
          default_mode = "0555"
        }
      }
    }
  }
}
@ -8,7 +8,10 @@ variable "postgresql_host" { type = string }
locals {
  namespace = "payslip-ingest"
  image = "registry.viktorbarzin.me/payslip-ingest:${var.image_tag}"
  # Phase 3 of forgejo-registry-consolidation — image= flipped to Forgejo
  # 2026-05-07. registry-private kept image at the same path, so the new
  # Forgejo URL is `viktor/<name>` under forgejo.viktorbarzin.me.
  image = "forgejo.viktorbarzin.me/viktor/payslip-ingest:${var.image_tag}"
  labels = {
    app = "payslip-ingest"
  }
@ -307,6 +307,12 @@ resource "kubernetes_config_map" "bot_block_proxy_config" {
server {
  listen 8080;
  location /auth {
    access_by_lua_block {
      ngx.req.clear_header("If-Match")
      ngx.req.clear_header("If-None-Match")
      ngx.req.clear_header("If-Modified-Since")
      ngx.req.clear_header("If-Unmodified-Since")
    }
    proxy_pass http://poison_fountain;
    proxy_connect_timeout 3s;
    proxy_read_timeout 5s;
@ -373,7 +379,7 @@ resource "kubernetes_deployment" "bot_block_proxy" {
}
container {
  name = "nginx"
  image = "nginx:1-alpine"
  image = "openresty/openresty:alpine"

  port {
    container_port = 8080
@ -515,7 +515,12 @@ resource "kubernetes_cron_job_v1" "wealthfolio_sync" {
}
container {
  name = "sync"
  image = "registry.viktorbarzin.me/wealthfolio-sync:latest"
  # Phase 4 of forgejo-registry-consolidation 2026-05-07 +
  # post-cutover wealthfolio-sync rebuild: image is now
  # produced by /home/wizard/code/broker-sync (Forgejo
  # viktor/broker-sync, DockerHub viktorbarzin/broker-sync,
  # Forgejo viktor/wealthfolio-sync as the cluster pull path).
  image = "forgejo.viktorbarzin.me/viktor/wealthfolio-sync:latest"
  env {
    name = "IMAP_HOST"
    value_from {
@ -172,6 +172,31 @@ resource "helm_release" "woodpecker" {
  depends_on = [kubernetes_manifest.db_external_secret]
}

# Patch hostAliases onto the woodpecker-server StatefulSet — the chart 3.5.1
# does NOT expose this field, so we have to do it after the helm release.
# Keeps the OAuth/forge-API path off the WAN gateway (forgejo.viktorbarzin.me
# resolves to the public IP via DNS, which round-trips through Cloudflare
# and routinely tripped 30s context-deadline timeouts when fetching pipeline
# config). 10.0.20.200 is the Traefik LB that fronts forgejo internally;
# Traefik serves the *.viktorbarzin.me wildcard so SNI verification still
# passes.
resource "null_resource" "woodpecker_server_host_alias" {
  triggers = {
    helm_revision = helm_release.woodpecker.metadata[0].revision
  }

  provisioner "local-exec" {
    command = <<-BASH
      set -euo pipefail
      kubectl -n woodpecker patch statefulset/woodpecker-server --type=strategic --patch '{"spec":{"template":{"spec":{"hostAliases":[{"ip":"10.0.20.200","hostnames":["forgejo.viktorbarzin.me"]}]}}}}'
      kubectl -n woodpecker rollout status statefulset/woodpecker-server --timeout=120s
    BASH
    interpreter = ["/bin/bash", "-c"]
  }

  depends_on = [helm_release.woodpecker]
}

# ClusterRoleBinding - build pods need cluster-admin to PATCH deployments across namespaces
resource "kubernetes_cluster_role_binding" "woodpecker" {
  metadata {
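The inline JSON in the `kubectl patch` call is easy to mistype. As an illustration only, the same strategic-merge patch can be built from a dictionary and serialized. `host_alias_patch` is a hypothetical helper, and the argument list mirrors the first command in the provisioner rather than anything the repository ships.

```python
import json


def host_alias_patch(ip, hostnames):
    """Build a pod-template patch that adds one hostAliases entry."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "hostAliases": [{"ip": ip, "hostnames": list(hostnames)}]
                }
            }
        }
    }


patch = host_alias_patch("10.0.20.200", ["forgejo.viktorbarzin.me"])

# Same arguments as the provisioner's first kubectl call; printed, not executed.
argv = [
    "kubectl", "-n", "woodpecker", "patch", "statefulset/woodpecker-server",
    "--type=strategic", "--patch", json.dumps(patch, separators=(",", ":")),
]
print(" ".join(argv[:-1]), "'" + argv[-1] + "'")
```

Building the payload from a dictionary keeps the nesting checkable by eye and makes it straightforward to add a second hostname later.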
@ -4,10 +4,19 @@ server:
    reloader.stakater.com/search: "true"
  statefulSet:
    replicaCount: 1
  # NOTE: hostAliases is NOT exposed by the woodpecker Helm chart (3.5.1 verified) —
  # see main.tf null_resource.woodpecker_server_host_alias which applies the same
  # via `kubectl patch` post-helm. Pinned to the in-cluster Traefik LB
  # (10.0.20.200) so the forge-API fetch path never round-trips through
  # Cloudflare ("context deadline exceeded" was failing every Forgejo
  # pipeline trigger).
  image:
    registry: docker.io
    repository: woodpeckerci/woodpecker-server
    tag: "v3.13.0"
    # Bumped 2026-05-07 from v3.13.0 → v3.14.0 to fix the
    # "could not load config from forge: context deadline exceeded"
    # issue when fetching .woodpecker.yml from Forgejo.
    tag: "v3.14.0"
  extraSecretNamesForEnvFrom:
    - woodpecker-db-creds
  env:
@ -27,6 +36,14 @@ server:
    WOODPECKER_FORGEJO_CLIENT: "${forgejo_client_id}"
    WOODPECKER_FORGEJO_SECRET: "${forgejo_client_secret}"
    WOODPECKER_FORGEJO_URL: "${forgejo_url}"
    # Default is 3s (cmd/server/flags.go @ default `--forge-timeout`).
    # Forgejo responses on this cluster spike to 1-2s under load and the
    # config-loader makes 4-6 sequential calls (.woodpecker dir, .woodpecker.yaml,
    # .woodpecker.yml, raw .woodpecker/build.yml, etc.); occasionally the cumulative
    # overhead trips the 3s deadline → "could not load config from forge: context
    # deadline exceeded" on every pipeline. 30s removes the false-positive timeouts
    # without regressing the legitimate-failure detection window meaningfully.
    WOODPECKER_FORGE_TIMEOUT: "30s"
  service:
    type: ClusterIP
    port: 80
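The `WOODPECKER_FORGE_TIMEOUT` comment is essentially an arithmetic argument: several sequential forge calls share one deadline. A rough sketch of that budget check follows, using only the figures quoted in the comment (4-6 calls, 1-2s each under load, 3s default versus 30s). `cumulative_latency` and `exceeds_deadline` are illustrative names, and real per-call latencies will not be uniform.

```python
def cumulative_latency(calls, per_call_s):
    """Total time for `calls` sequential requests that each take `per_call_s` seconds."""
    return calls * per_call_s


def exceeds_deadline(calls, per_call_s, timeout_s):
    """True when the sequential calls would overrun a shared deadline."""
    return cumulative_latency(calls, per_call_s) > timeout_s


# Figures taken from the comment: 4-6 calls, 1-2 s each under load.
best_case = cumulative_latency(4, 1.0)   # 4 calls at 1 s each
worst_case = cumulative_latency(6, 2.0)  # 6 calls at 2 s each

print(exceeds_deadline(4, 1.0, 3.0))   # default 3 s deadline, best case under load
print(exceeds_deadline(6, 2.0, 30.0))  # 30 s deadline, worst case under load
```

Under these assumptions even the best loaded case (4 s) is past the 3 s default, while the worst case (12 s) still leaves headroom under 30 s.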
@ -46,7 +63,7 @@ agent:
  image:
    registry: docker.io
    repository: woodpeckerci/woodpecker-agent
    tag: "v3.13.0"
    tag: "v3.14.0"
  env:
    WOODPECKER_BACKEND: "kubernetes"
    WOODPECKER_BACKEND_K8S_NAMESPACE: "woodpecker"
File diff suppressed because one or more lines are too long