Restore f1-stream stack — undo accidental bundling into 63fe7d2b
Commit 63fe7d2b (fan-control) was made with a bare `git commit` in the shared infra working tree and inadvertently swept in a parallel session's staged f1-stream-extraction work (main.tf repoint, ~48 `files/` removals, ci-cd.md + .claude docs, two extraction plan docs).

This returns every f1-stream-related path to its pre-63fe7d2b state (3493c347) so that the extraction can be committed cleanly by its own session. The fan-control files added in 63fe7d2b are untouched.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 90ad6b9125
commit 147a8cff40
54 changed files with 9563 additions and 163 deletions
@@ -104,15 +104,13 @@ have `ignore_changes` on `…container[0].image` (KEEL_IGNORE_IMAGE) so CI
 `:latest` + `imagePullPolicy: Always` (fresh pod each run) instead of a deploy
 step. **Never** `set image`/`rollout restart` operator-managed StatefulSets
 (memory id=740). Reference impls: `tuya_bridge/.woodpecker.yml`,
-`job-hunter`, `f1-stream` (viktor/f1-stream, extracted from this monorepo
-2026-06-04). This reverses decision #12 of
+`job-hunter`. This reverses decision #12 of
 `docs/plans/2026-05-16-auto-upgrade-apps-design.md` for owned (not upstream)
 images.
 
 **Flow (GHA-migrated apps)**: `git push → GHA build+push DockerHub (8-char SHA) → POST Woodpecker API → kubectl set image`
 
-**Migrated to GHA** (9): Website, k8s-portal, claude-memory-mcp, apple-health-data, audiblez-web, plotting-book, insta2spotify, audiobook-search, council-complaints
-**Woodpecker-native owned-app build** (Forgejo registry, build->deploy in one `.woodpecker.yml`): tuya_bridge, job-hunter, f1-stream (extracted to viktor/f1-stream 2026-06-04; Woodpecker repo id 166)
+**Migrated to GHA** (10): Website, k8s-portal, f1-stream, claude-memory-mcp, apple-health-data, audiblez-web, plotting-book, insta2spotify, audiobook-search, council-complaints
 **Woodpecker-only**: travel_blog (1.4GB content too large for GHA), infra pipelines (terragrunt apply, certbot, build-cli — need cluster access)
 
 **Per-project files**:
@@ -121,7 +119,7 @@ images.
 - `.woodpecker/build-fallback.yml` — Old full build pipeline preserved (event: `deployment` — never auto-fires)
 
 **Woodpecker API**: Uses **numeric repo IDs** (`/api/repos/2/pipelines`), NOT owner/name paths (those return HTML).
-Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handler=6, audiblez-web=9, plotting-book=43, claude-memory-mcp=78, infra-onboarding=79, council-complaints=TBD (f1-stream's old GHA-era id 10 is defunct; it's now a Woodpecker-native build at repo id 166)
+Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handler=6, audiblez-web=9, f1-stream=10, plotting-book=43, claude-memory-mcp=78, infra-onboarding=79, council-complaints=TBD
 
 **Woodpecker YAML gotchas**:
 - Commands with `${VAR}:${VAR}` must be **quoted** — unquoted `:` triggers YAML map parsing when vars are empty
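The notes above route API calls through numeric repo IDs and tag GHA builds with an 8-char SHA. Below is a minimal Python sketch of both. It assumes the `https://ci.viktorbarzin.me` base URL that appears in this commit's plan doc and a bearer token. `REPO_IDS`, `short_sha` and `pipelines_request` are hypothetical helpers, and the request is only built, not sent:

```python
import urllib.request

# Numeric Woodpecker repo IDs copied from the notes above. f1-stream is left
# out on purpose, since its ID is exactly what this commit changes.
REPO_IDS = {
    "infra": 1,
    "Website": 2,
    "finance": 3,
    "health": 4,
    "travel_blog": 5,
    "webhook-handler": 6,
    "audiblez-web": 9,
    "plotting-book": 43,
    "claude-memory-mcp": 78,
    "infra-onboarding": 79,
}

# Assumed base URL, taken from the plan doc's verification commands.
WOODPECKER_API = "https://ci.viktorbarzin.me/api"


def short_sha(commit_sha: str) -> str:
    """Python equivalent of the shell expansion ${CI_COMMIT_SHA:0:8}."""
    return commit_sha[:8]


def pipelines_request(repo: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET for a repo's pipelines by numeric ID.

    Owner/name paths return HTML, so unknown names raise instead of
    falling back to them.
    """
    try:
        repo_id = REPO_IDS[repo]
    except KeyError:
        raise ValueError(f"no numeric Woodpecker ID known for {repo!r}") from None
    return urllib.request.Request(
        f"{WOODPECKER_API}/repos/{repo_id}/pipelines",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

To send the request, call `urllib.request.urlopen(pipelines_request("Website", token))`. The sketch raises on unknown names rather than guessing an owner/name path, because those paths return HTML instead of JSON.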
@@ -46,7 +46,7 @@
 | nextcloud | File sync/share | nextcloud |
 | calibre | E-book management (may be merged into ebooks stack) | calibre |
 | onlyoffice | Document editing | onlyoffice |
-| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier); source in own repo `viktor/f1-stream` (extracted 2026-06-04), Woodpecker-native build->deploy | f1-stream |
+| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier) | f1-stream |
 | chrome-service | Headed Chromium WebSocket pool (`ws://chrome-service.chrome-service.svc:3000/<token>`) for sibling services driving anti-bot embeds | chrome-service |
 | rybbit | Analytics | rybbit |
 | isponsorblocktv | SponsorBlock for TV | isponsorblocktv |
@@ -58,9 +58,10 @@ graph LR
 
 ### Project Migration Status
 
-**Migrated to GHA (8 projects)**:
+**Migrated to GHA (9 projects)**:
 - Website
 - k8s-portal
+- f1-stream
 - claude-memory-mcp
 - apple-health-data
 - audiblez-web
@@ -68,14 +69,6 @@ graph LR
 - insta2spotify
 - book-search (audiobook-search)
 
-**Woodpecker-native owned-app builds** (build + push to the Forgejo private
-registry + `kubectl set image` rollout, all in one `.woodpecker.yml`; Keel
-stays enrolled as a redundant net):
-- `tuya_bridge`, `job-hunter`, `f1-stream`
-- `f1-stream` was extracted from this monorepo into its own repo
-(`viktor/f1-stream`) on 2026-06-04; its Woodpecker repo id is 166 (the old
-GHA-era id 10 is defunct).
-
 **Woodpecker-only (infra + large apps)**:
 - `travel_blog`: 5.7GB content directory exceeds GHA limits
 - Infra pipelines: require cluster access (terragrunt apply, certbot, build-cli)
@@ -99,6 +92,7 @@ Woodpecker API uses numeric IDs (not owner/name):
 | travel_blog | 5 |
 | webhook-handler | 6 |
 | audiblez-web | 9 |
+| f1-stream | 10 |
 | plotting-book | 43 |
 | claude-memory-mcp | 78 |
 | infra-onboarding | 79 |
@@ -1,78 +0,0 @@
# f1-stream extraction + productionization — design (2026-06-04)

## Problem

`f1-stream` (FastAPI backend serving a SvelteKit SPA; ~15 pluggable stream
extractors + a Playwright/chrome-service playback verifier) lived **inside**
the infra monorepo at `infra/stacks/f1-stream/files/`. It had:

- no standalone repo — source coupled to the Terraform stack;
- **no real CI** — only a manual `redeploy.sh` doing a local `docker buildx`
push to DockerHub (`viktorbarzin/f1-stream`) + `kubectl rollout restart`;
- no README, no tests, a loose unpinned `requirements.txt`, no semver tags;
- a stale CI claim in docs ("migrated to GHA, Woodpecker repo id 10") that did
not match reality (no GHA workflow ever existed for it).

## Goal

Extract the app into its own Forgejo repo `viktor/f1-stream` and productionize
it, mirroring the established owned-app pattern (`tuya_bridge`, `job-hunter`,
`tripit`, `travel-agent`).

## Decisions (with rationale)

- **Registry → Forgejo private** (`forgejo.viktorbarzin.me/viktor/f1-stream`),
matching the fleet standard. Needs the `registry-credentials` pull secret
(Kyverno-synced to every namespace) on the deployment.
- **Packaging → Poetry + ruff + mypy** (replaces the loose pip
`requirements.txt`). Python **package stays `backend`** — imports are
`from backend.x` and the entrypoint is `uvicorn backend.main:app`; renaming
would churn every module + the Dockerfile + the staticfiles path. Python
**3.13 kept** (the live image already runs it; tripit's 3.12 pin is for
zxing-cpp/pymupdf, which f1-stream lacks).
- **Tests → pragmatic pure-logic only**. The extractors + verifier are
network/browser-bound; full coverage is brittle. Unit-test the deterministic
core: `m3u8_rewriter` (incl. the EXT-X tag rewriters), the `proxy` HLS
parsers, `schedule` parsing/status, the extractor `registry`. 63 tests.
- **CI → single `.woodpecker.yml`**: `lint-and-test` (ruff + mypy + pytest on
`python:3.13-slim`) → `build-and-push` (buildx → Forgejo, tags `latest` +
`${CI_COMMIT_SHA:0:8}`) → `deploy` (`kubectl set image` + `rollout status`).
**Keel stays enrolled** as a redundant net. This is the `tuya_bridge`
"build drives the rollout" model + a `travel-agent`-style test gate.
- A Slack-notify step was prototyped but **dropped**: the
`environment: { from_secret }` form is rejected by this Woodpecker
version's pipeline-struct decoder (`yaml: did not find expected key`), and
the canonical owned-app refs (`tuya_bridge`, `job-hunter`) have no Slack
step. Deploy success is confirmed by `rollout status`.
- **Versioning → first git tag `v2.0.1`** (continuity with the existing image
lineage; a fresh `v0.1.0` on a production 2.x app would mislead
monitoring/homepage). Deviates deliberately from the `v0.1.0` precedent of
tripit/travel-agent.
- **Runtime stays root** (matching the prior working image) to avoid a
non-root regression on the `/data` NFS write path and the Playwright browser
cache. Non-root is a possible future hardening.

## Terraform delta (the only infra change)

`infra/stacks/f1-stream/main.tf`:

- image `viktorbarzin/f1-stream:latest` (DockerHub) →
`forgejo.viktorbarzin.me/viktor/f1-stream:${var.image_tag}` (new
`var.image_tag`, default `latest`);
- add `image_pull_secrets { name = "registry-credentials" }` to the pod spec;
- delete `files/` (source now lives in the standalone repo) and `redeploy.sh`.

The image field is in the deployment's `ignore_changes` (KEEL_IGNORE_IMAGE), so
the live tag is managed by CI/Keel, not Terraform. Everything else — namespace,
ExternalSecrets (`f1-stream-secrets`, `chrome-service-client-secrets`), NFS data
volume, Anubis PoW policy, `ingress_factory`, homepage + x402 annotations,
Discord + chrome-service env — is unchanged.

## Blast radius

- The `f1-stream` K8s service is the only consumer; no other stack references
`viktorbarzin/f1-stream` or the `files/` dir (verified: no `path.module` /
`archive_file` / `null_resource` references the dir).
- Adding `imagePullSecrets` triggers one Recreate rollout that pulls the
*current* (still-DockerHub, public) image — safe; CI then switches it to the
Forgejo image.
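The removed design pins the Forgejo image path and a CI tag scheme of `latest` plus `${CI_COMMIT_SHA:0:8}`, with Terraform's `var.image_tag` defaulting to `latest`. Below is a short Python sketch of that naming. It assumes nothing beyond the registry path quoted in the design, and `forgejo_image` and `ci_tags` are hypothetical names:

```python
# Registry path taken verbatim from the design doc's Terraform delta.
REGISTRY_REPO = "forgejo.viktorbarzin.me/viktor/f1-stream"


def forgejo_image(image_tag: str = "latest") -> str:
    """Python mirror of "...f1-stream:${var.image_tag}" with its "latest" default."""
    return f"{REGISTRY_REPO}:{image_tag}"


def ci_tags(commit_sha: str) -> list[str]:
    """Refs the build-and-push step is described as pushing: latest + 8-char SHA."""
    if len(commit_sha) < 8:
        raise ValueError("expected a full commit SHA")
    return [forgejo_image("latest"), forgejo_image(commit_sha[:8])]
```

The image field sits in `ignore_changes`, so the Terraform-rendered `latest` ref only matters at creation. After that, the live tag is whichever of these refs CI or Keel rolls out.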
@@ -1,54 +0,0 @@
# f1-stream extraction + productionization — plan (2026-06-04)

Companion to `2026-06-04-f1-stream-extraction-design.md`.

## Steps

1. **Scaffold** `/home/wizard/code/f1-stream/` — copy `backend/`, `frontend/`,
`Dockerfile`, `.dockerignore` from `infra/stacks/f1-stream/files/` by name
(exclude the `.claude/` marker + `redeploy.sh`); add `README.md`,
`.gitignore`. ✅
2. **Poetry conversion** — `pyproject.toml` (dist `f1-stream` v2.0.1,
`packages=[{include="backend"}]`, pinned deps), `poetry.lock`, ruff/mypy/
pytest config (E501 per-file-ignored on the embedded-JS/scraper modules).
Rewrite the Dockerfile to a Poetry multi-stage build (Poetry 2.1.3 to match
the lock; python:3.13; keep Chromium libs + `playwright install chromium`;
keep `backend/` + `frontend/build/` siblings under `/app`). ✅
3. **Tests** — 63 pytest unit tests over the pure-logic core. ✅
4. **CI** — single `.woodpecker.yml` (lint+test → buildx push to Forgejo →
`kubectl set image` + rollout). ✅
5. **Create + push** — Forgejo repo `viktor/f1-stream` (private), commit, push
`master`, tag `v2.0.1`. ✅
6. **Enable in Woodpecker** — activate via
`scripts/woodpecker-register-forgejo-repo.sh` (Woodpecker repo id 166);
org-level `forgejo_user`/`forgejo_push_token` secrets apply. ✅
7. **Repoint Terraform** — `main.tf` image → Forgejo + `var.image_tag` +
`image_pull_secrets`; `tg apply`. ✅
8. **Untrack from infra** — `git rm -r stacks/f1-stream/files`; add
`/f1-stream/` to the monorepo root `.gitignore`. ✅
9. **Docs** — fix the stale "GHA / repo id 10" claim in `.claude/CLAUDE.md` +
`docs/architecture/ci-cd.md`; update `service-catalog.md`; this design/plan
pair. ✅
10. **Verify** — pipeline green; pod runs the Forgejo image; `/health` 200;
ingress reachable through Anubis.

## Verification commands

```bash
# pipeline
curl -s https://ci.viktorbarzin.me/api/repos/166/pipelines/<n> -H "Authorization: Bearer <jwt>"
# running image is the Forgejo one
kubectl get deploy f1-stream -n f1-stream \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl get pods -n f1-stream -l app=f1-stream
# health
kubectl exec -n f1-stream deploy/f1-stream -- \
  python -c "import urllib.request;print(urllib.request.urlopen('http://localhost:8000/health').read())"
```

## Rollback

The DockerHub image `viktorbarzin/f1-stream` and its tags still exist. To
revert: `kubectl -n f1-stream set image deployment/f1-stream
f1-stream=viktorbarzin/f1-stream:<tag>` and restore the `main.tf` image string.
The standalone repo + Forgejo image are additive; nothing is destroyed.
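Two of the verification commands in the removed plan check that the deployment runs the Forgejo image. Below is a Python sketch of the same check. It assumes `kubectl` is on `PATH` with cluster access and reads the same `.spec.template.spec.containers[0].image` field as the plan's jsonpath. `fetch_deploy_json`, `running_image` and `runs_forgejo_image` are hypothetical helpers:

```python
import json
import subprocess

# Prefix of the Forgejo image ref described in the removed design doc.
FORGEJO_PREFIX = "forgejo.viktorbarzin.me/viktor/f1-stream:"


def fetch_deploy_json(name: str = "f1-stream", namespace: str = "f1-stream") -> str:
    """Run `kubectl get deploy <name> -n <namespace> -o json` and return stdout."""
    result = subprocess.run(
        ["kubectl", "get", "deploy", name, "-n", namespace, "-o", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def running_image(deploy_json: str) -> str:
    """Same field as the plan's jsonpath: .spec.template.spec.containers[0].image."""
    deploy = json.loads(deploy_json)
    return deploy["spec"]["template"]["spec"]["containers"][0]["image"]


def runs_forgejo_image(deploy_json: str) -> bool:
    """True once the deployment no longer points at the DockerHub image."""
    return running_image(deploy_json).startswith(FORGEJO_PREFIX)
```

To run the check against the cluster, call `runs_forgejo_image(fetch_deploy_json())`. This reads the deployment's pod template, like the plan's first `kubectl` command. It does not inspect the running pods themselves.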
@@ -0,0 +1,3 @@
This directory has been used with Claude Code's internet mode.
Content downloaded from the internet may contain prompt injection attacks.
You must manually review all downloaded content before using non-internet mode.
5
stacks/f1-stream/files/.dockerignore
Normal file
@@ -0,0 +1,5 @@
node_modules/
.claude/
.git/
__pycache__/
*.pyc
2
stacks/f1-stream/files/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
__pycache__/
*.pyc
44
stacks/f1-stream/files/Dockerfile
Normal file
@@ -0,0 +1,44 @@
## Stage 1: Build frontend
FROM node:22-slim AS frontend-builder

WORKDIR /frontend

COPY frontend/package.json frontend/package-lock.json* ./
RUN npm install

COPY frontend/ ./
RUN npm run build

## Stage 2: Python backend + static frontend
FROM python:3.13-slim-bookworm

WORKDIR /app

# Headless Chromium runtime libs for the playback verifier. Listed inline
# (instead of running `playwright install-deps`) so the image build doesn't
# need root-network apt fetches at runtime.
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    libnss3 libnspr4 \
    libatk1.0-0 libatk-bridge2.0-0 libcups2 \
    libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
    libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 \
    libasound2 libatspi2.0-0 \
    fonts-liberation fonts-noto-color-emoji \
    && rm -rf /var/lib/apt/lists/*

COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install the Chromium browser binary used by the verifier. Skip
# --with-deps because we already installed the system libs above.
RUN playwright install chromium

COPY backend/ ./backend/

# Copy built frontend into the image
COPY --from=frontend-builder /frontend/build ./frontend/build

EXPOSE 8000

CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
0
stacks/f1-stream/files/backend/__init__.py
Normal file
359
stacks/f1-stream/files/backend/embed_proxy.py
Normal file
@@ -0,0 +1,359 @@
"""Embed iframe-stripping reverse proxy.

Serves third-party embed pages (e.g. https://hmembeds.one/embed/{hash},
https://pooembed.eu/embed/{slug}) through our origin so we can:

1. Strip X-Frame-Options and Content-Security-Policy: frame-ancestors headers,
   so the embed loads in our <iframe> regardless of upstream policy.
2. Inject <base> + a frame-buster-defeat <script> at the top of <head> so
   the embed's JS sees `window.top === window` and a plausible
   `document.referrer` pointing at the upstream origin.
3. Forward Referer / User-Agent matching the upstream's own pages so
   the upstream's hotlink / origin-allowlist checks pass.

Two endpoints:
  - GET /embed?url=<base64url> — the embed HTML page (rewritten).
  - GET /embed-asset?url=<base64url> — fallback for any subresource the
    upstream blocks based on hotlink protection. Most assets load directly
    via the injected <base> tag and bypass our proxy.
"""

import logging
import re
from typing import AsyncGenerator
from urllib.parse import urlparse

import httpx
from fastapi import HTTPException

from backend.m3u8_rewriter import decode_url

logger = logging.getLogger(__name__)

EMBED_TIMEOUT = 20.0
ASSET_TIMEOUT = 30.0
RELAY_CHUNK_SIZE = 65536

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# Response headers we never forward (they break frame embedding or leak upstream policy).
STRIP_RESPONSE_HEADERS = {
    "x-frame-options",
    "content-security-policy",
    "content-security-policy-report-only",
    "set-cookie",
    "report-to",
    "nel",
    "permissions-policy",
    "cross-origin-opener-policy",
    "cross-origin-embedder-policy",
    "cross-origin-resource-policy",
    # let httpx/uvicorn re-set these
    "transfer-encoding",
    "content-encoding",
    "content-length",
    "connection",
}

# Inject this <script> at the top of <head> to defeat JS frame-busters.
# - Locks window.top, window.parent, and window.self to the embed window
#   itself, so `self !== window.top` checks pass.
# - Forces document.referrer to the upstream origin so allowlist checks
#   like `document.referrer.includes("timstreams.net")` keep working.
# - No-ops anything that would call window.parent.location or attempt to
#   reload the top frame.
_FRAME_BUSTER_DEFEAT_TEMPLATE = """
<script>(function(){{
  try {{
    var fakeWindow = window;
    Object.defineProperty(window, 'top', {{get: function(){{return fakeWindow;}}, configurable: false}});
    Object.defineProperty(window, 'parent', {{get: function(){{return fakeWindow;}}, configurable: false}});
    Object.defineProperty(window, 'frameElement', {{get: function(){{return null;}}, configurable: false}});
    Object.defineProperty(document, 'referrer', {{get: function(){{return {referrer!r};}}, configurable: false}});
  }} catch (e) {{}}
  // Defeat the `disable-devtool.js` redirect trap that hmembeds and similar
  // embed hosts use. The trap fires `console.clear`/`console.table` in a
  // tight loop, then if it thinks DevTools is open, calls
  // `window.location = "https://www.google.com"`. We block those redirect
  // sinks while leaving normal playback unaffected.
  try {{
    var noop = function(){{}};
    console.clear = noop;
    console.table = noop;
    console.dir = noop;
    var loc = window.location;
    Object.defineProperty(window, 'location', {{
      get: function(){{ return loc; }},
      set: function(v){{ /* swallow assignment */ }},
      configurable: false,
    }});
    var origAssign = loc.assign && loc.assign.bind(loc);
    var origReplace = loc.replace && loc.replace.bind(loc);
    loc.assign = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origAssign) origAssign(u); }};
    loc.replace = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origReplace) origReplace(u); }};
  }} catch (e) {{}}

  // Route all cross-origin fetch/XHR requests through our /embed-asset
  // proxy. The hmembeds player calls a token-binding endpoint
  // (hghndasw.gbgdhdffhf.shop/sec/<JWT>) that CORS-rejects requests from
  // any origin other than hmembeds.one. By rewriting the URL to
  // /embed-asset?url=..., the browser fetches our same-origin endpoint
  // (no CORS issue), and our backend fetches the upstream with the
  // correct Referer/Origin server-side (no CORS issue there either).
  try {{
    var b64url = function(s) {{
      return btoa(unescape(encodeURIComponent(s)))
        .replace(/\\+/g, '-').replace(/\\//g, '_').replace(/=+$/, '');
    }};
    var sameOrigin = function(u) {{
      try {{ return (new URL(u, document.baseURI || location.href)).origin === location.origin; }}
      catch (_) {{ return true; }}
    }};
    var toAbsolute = function(u) {{
      try {{ return (new URL(u, document.baseURI || location.href)).toString(); }}
      catch (_) {{ return u; }}
    }};
    var proxify = function(u) {{
      var abs = toAbsolute(u);
      if (sameOrigin(abs)) return u;
      // Don't double-proxy.
      if (abs.indexOf('/embed-asset?') !== -1 || abs.indexOf('/embed?') !== -1) return u;
      return location.origin + '/embed-asset?url=' + b64url(abs);
    }};

    var _fetch = window.fetch && window.fetch.bind(window);
    if (_fetch) {{
      window.fetch = function(input, init) {{
        try {{
          if (typeof input === 'string') {{
            return _fetch(proxify(input), init);
          }} else if (input && input.url) {{
            var newUrl = proxify(input.url);
            if (newUrl !== input.url) {{
              return _fetch(new Request(newUrl, input), init);
            }}
          }}
        }} catch (e) {{}}
        return _fetch(input, init);
      }};
    }}

    var XHR = window.XMLHttpRequest;
    if (XHR && XHR.prototype && XHR.prototype.open) {{
      var _open = XHR.prototype.open;
      XHR.prototype.open = function(method, url) {{
        try {{ url = proxify(url); }} catch (e) {{}}
        var args = Array.prototype.slice.call(arguments);
        args[1] = url;
        return _open.apply(this, args);
      }};
    }}
  }} catch (e) {{}}
}})();</script>
"""


def _decode(encoded_url: str) -> str:
    try:
        return decode_url(encoded_url)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")


def _filter_headers(upstream_headers: httpx.Headers) -> dict[str, str]:
    """Forward upstream headers minus the ones we strip."""
    out: dict[str, str] = {}
    for k, v in upstream_headers.items():
        if k.lower() in STRIP_RESPONSE_HEADERS:
            continue
        out[k] = v
    # Always allow our domain to embed and load cross-origin
    out["Access-Control-Allow-Origin"] = "*"
    out["X-Frame-Options-Stripped"] = "by-f1-embed-proxy"
    return out


def _make_referer(upstream_url: str) -> str:
    """Build a plausible Referer header — the upstream's own root."""
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}/"


def _make_origin(upstream_url: str) -> str:
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}"


def _inject_into_head(html: str, upstream_url: str) -> str:
    """Inject <base> tag + frame-buster defeat script into the response HTML."""
    parsed = urlparse(upstream_url)
    base_href = f"{parsed.scheme}://{parsed.netloc}/"

    # The frame-buster-defeat script. Use the upstream's own URL as the spoofed referrer.
    busted = _FRAME_BUSTER_DEFEAT_TEMPLATE.format(referrer=upstream_url)

    base_tag = f'<base href="{base_href}">'

    injection = base_tag + busted

    # Drop any inline CSP <meta> tags first so they can't override our header strip.
    html = re.sub(
        r'<meta[^>]+http-equiv=[\'"]?Content-Security-Policy[\'"]?[^>]*>',
        "",
        html,
        flags=re.IGNORECASE,
    )

    # Strip disable-devtool.js script tags. The library runs detection heuristics
    # and redirects on match. Removing it reduces attack surface even with our
    # location-setter lockdown — saves redundant work and one fewer thing to
    # bypass in case the lockdown misses an edge case.
    html = re.sub(
        r'<script[^>]+(?:disable-devtool|devtool|disabledevtool)[^<]*</script>',
        "",
        html,
        flags=re.IGNORECASE,
    )
    html = re.sub(
        r'<script[^>]+src=["\'][^"\']*disable-devtool[^"\']*["\'][^>]*></script>',
        "",
        html,
        flags=re.IGNORECASE,
    )

    # Insert immediately after the opening <head> (case-insensitive).
    head_match = re.search(r"<head[^>]*>", html, flags=re.IGNORECASE)
    if head_match:
        idx = head_match.end()
        return html[:idx] + injection + html[idx:]

    # No <head> — prepend at the start of the document so the script runs first.
    return injection + html


def _looks_blocked_by_anti_bot(content: str) -> bool:
    """Detect Cloudflare-style challenge interstitials in the upstream body."""
    sample = content[:4096].lower()
    markers = (
        "cf-chl-bypass",
        "checking your browser",
        "just a moment",
        "attention required",
        "cf-browser-verification",
    )
    return any(m in sample for m in markers)


async def fetch_embed(encoded_url: str) -> tuple[bytes, dict[str, str], int]:
    """Fetch an upstream embed page, rewrite the HTML, and return the response.

    Returns: (body_bytes, headers_dict, status_code).
    Raises HTTPException on transport errors.
    """
    url = _decode(encoded_url)
    logger.info("Embed-proxying: %s", url)

    upstream_headers = {
        "User-Agent": USER_AGENT,
        "Referer": _make_referer(url),
        "Origin": _make_origin(url),
        "Accept": (
            "text/html,application/xhtml+xml,application/xml;q=0.9,"
            "image/avif,image/webp,*/*;q=0.8"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }

    try:
        async with httpx.AsyncClient(
            timeout=EMBED_TIMEOUT,
            follow_redirects=True,
        ) as client:
            response = await client.get(url, headers=upstream_headers)
    except httpx.TimeoutException:
        raise HTTPException(status_code=504, detail="Upstream embed timeout")
    except httpx.HTTPError as e:
        raise HTTPException(status_code=502, detail=f"Upstream embed error: {e}")

    status_code = response.status_code
    upstream_ct = response.headers.get("content-type", "")
    headers_out = _filter_headers(response.headers)

    body = response.content

    # Detect Cloudflare-style challenge so the frontend can show a clear error.
    if "html" in upstream_ct.lower():
        text = response.text
        if _looks_blocked_by_anti_bot(text):
            logger.warning("Upstream returned anti-bot challenge: %s", url)
            raise HTTPException(
                status_code=502,
                detail="Upstream returned anti-bot challenge — proxy cannot bypass",
            )

        rewritten = _inject_into_head(text, url)
        body = rewritten.encode("utf-8")
        headers_out["Content-Type"] = "text/html; charset=utf-8"

    return body, headers_out, status_code


async def relay_asset(
    encoded_url: str, range_header: str | None
) -> tuple[AsyncGenerator[bytes, None], dict[str, str], int]:
    """Relay an upstream subresource (JS/CSS/image/font) as a chunked stream.

    Used as a fallback when an upstream blocks hotlinked assets via Referer
    or Origin checks. The injected <base> tag handles most of these cases
    by letting the browser hit upstream directly — the relay is only for
    the awkward few that need a proxied origin.
    """
    url = _decode(encoded_url)
    logger.debug("Embed-asset relay: %s", url)

    headers = {
        "User-Agent": USER_AGENT,
        "Referer": _make_referer(url),
        "Origin": _make_origin(url),
        "Accept": "*/*",
    }
    if range_header:
        headers["Range"] = range_header

    client = httpx.AsyncClient(timeout=ASSET_TIMEOUT, follow_redirects=True)

    try:
        response = await client.send(
            client.build_request("GET", url, headers=headers),
            stream=True,
        )
    except httpx.TimeoutException:
        await client.aclose()
        raise HTTPException(status_code=504, detail="Upstream asset timeout")
    except httpx.HTTPError as e:
        await client.aclose()
        raise HTTPException(status_code=502, detail=f"Upstream asset error: {e}")

    if response.status_code >= 400:
        await response.aclose()
        await client.aclose()
        raise HTTPException(
            status_code=502,
            detail=f"Upstream asset returned HTTP {response.status_code}",
        )

    headers_out = _filter_headers(response.headers)

    async def _stream() -> AsyncGenerator[bytes, None]:
        try:
            async for chunk in response.aiter_bytes(chunk_size=RELAY_CHUNK_SIZE):
                yield chunk
        finally:
            await response.aclose()
            await client.aclose()

    return _stream(), headers_out, response.status_code
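`embed_proxy.py` imports `decode_url` from `backend.m3u8_rewriter`, which is not part of this diff. Its injected `b64url` JS helper encodes URLs as UTF-8, URL-safe base64 with the `=` padding stripped. Assuming `decode_url` is simply the inverse of that helper, a matching Python pair could look like this (`encode_url` and `decode_url_sketch` are hypothetical names):

```python
import base64


def encode_url(url: str) -> str:
    """Python twin of the injected JS b64url(): UTF-8 bytes, URL-safe alphabet, no '='."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")


def decode_url_sketch(encoded: str) -> str:
    """Inverse of encode_url: restore the '=' padding b64url() strips, then decode."""
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

With this pair, a link into the proxy would be `f"/embed?url={encode_url(upstream)}"`. Some input cannot be decoded, for example a length that no padding can fix or bytes that are not valid UTF-8. That input raises a `ValueError` subclass, which `_decode` above would turn into an HTTP 400.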
|
||||
93
stacks/f1-stream/files/backend/extractors/__init__.py
Normal file
93
stacks/f1-stream/files/backend/extractors/__init__.py
Normal file
|
|
@ -0,0 +1,93 @@
|
|||
"""Stream extraction framework.
|
||||
|
||||
To add a new extractor:
|
||||
1. Create a new file in this package (e.g., my_site.py)
|
||||
2. Subclass BaseExtractor from backend.extractors.base
|
||||
3. Implement site_key, site_name, and extract()
|
||||
4. Import and register it in this file's create_registry() function
|
||||
|
||||
Example:
|
||||
from backend.extractors.my_site import MySiteExtractor
|
||||
registry.register(MySiteExtractor())
|
||||
"""
|
||||
|
||||
from backend.extractors.aceztrims import AceztrimsExtractor
|
||||
from backend.extractors.chrome_browser import ChromeBrowserExtractor
|
||||
from backend.extractors.curated import CuratedExtractor
|
||||
from backend.extractors.dd12 import DD12Extractor
|
||||
from backend.extractors.hmembeds import HmembedsExtractor
|
||||
from backend.extractors.stremio import StremioAddonExtractor
|
||||
from backend.extractors.subreddit import SubredditExtractor
|
||||
from backend.extractors.daddylive import DaddyLiveExtractor
|
||||
from backend.extractors.discord_source import DiscordExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
from backend.extractors.pitsport import PitsportExtractor
|
||||
from backend.extractors.ppv import PPVExtractor
|
||||
from backend.extractors.registry import ExtractorRegistry
|
||||
from backend.extractors.service import ExtractionService
|
||||
from backend.extractors.streamed import StreamedExtractor
|
||||
from backend.extractors.timstreams import TimStreamsExtractor
|
||||
|
||||
__all__ = [
|
||||
"ExtractedStream",
|
||||
"ExtractorRegistry",
|
||||
"ExtractionService",
|
||||
"create_registry",
|
||||
"create_extraction_service",
|
||||
]
|
||||
|
||||
|
||||
def create_registry() -> ExtractorRegistry:
    """Create and populate the extractor registry with all known extractors.

    Add new extractors here by importing and registering them.
    """
    registry = ExtractorRegistry()

    # --- Register extractors below ---
    # CuratedExtractor previously surfaced two hmembeds 24/7 channels (Sky
    # Sports F1, DAZN F1) but their JW Player decoder produces an empty
    # playlist in our environment (error 102630) regardless of the headed
    # mode, IP, or fingerprint we tried. The streams loaded the upstream's ad
    # overlay but never produced a video element, so they confused users —
    # disabled until/unless we find a working bypass.
    # registry.register(CuratedExtractor())
    registry.register(StreamedExtractor())
    # ChromeBrowserExtractor drives the in-cluster chrome-service over CDP
    # (CHROME_CDP_URL env var) to scrape JS-rendered pages whose m3u8 is
    # computed at runtime.
    registry.register(ChromeBrowserExtractor())
    # SubredditExtractor pulls live-stream posts from motorsport subreddits.
    # Returns embed-type streams; the verifier will visit each via
    # chrome-service to confirm playability.
    registry.register(SubredditExtractor())
    # DD12Extractor scrapes DD12Streams' per-channel pages for the inline
    # JW Player file URL. The site embeds the m3u8 in HTML so curl-based
    # parsing is enough — no browser needed.
    registry.register(DD12Extractor())
    # HmembedsExtractor offline-decodes hmembeds.one JWT m3u8 URLs
    # (base64+XOR with hardcoded key per page; reverse-engineered
    # 2026-05-07). Verifier filters dead origins.
    registry.register(HmembedsExtractor())
    # StremioAddonExtractor calls Stremio addon HTTP APIs (TvVoo, StremVerse)
    # which already index Sky F1 / DAZN F1 / Vavoo IPTV channels. No
    # Stremio client needed — just /stream/<type>/<id>.json calls.
    registry.register(StremioAddonExtractor())
    registry.register(DaddyLiveExtractor())
    registry.register(AceztrimsExtractor())
    registry.register(PitsportExtractor())
    registry.register(PPVExtractor())
    registry.register(TimStreamsExtractor())
    registry.register(DiscordExtractor())

    return registry


def create_extraction_service() -> ExtractionService:
    """Create an ExtractionService with all extractors registered.

    This is the main entry point for the extraction framework.
    Call this once during app startup.
    """
    registry = create_registry()
    return ExtractionService(registry)
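create_registry() depends on ExtractorRegistry and ExtractionService, but their source (registry.py, service.py) is not part of this excerpt. The self-contained sketch below illustrates the general register-then-fan-out pattern. MiniRegistry, run_all and FakeExtractor are hypothetical names. Rejecting duplicate keys and isolating per-extractor failures are assumptions about how such a registry might behave, not a description of the project's classes.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeStream:
    url: str
    site_key: str


class FakeExtractor:
    """Hypothetical stand-in for a BaseExtractor subclass."""

    def __init__(self, site_key: str, urls: list[str], fail: bool = False):
        self.site_key = site_key
        self._urls = urls
        self._fail = fail

    async def extract(self) -> list[FakeStream]:
        if self._fail:
            raise RuntimeError(f"{self.site_key} is down")
        return [FakeStream(url=u, site_key=self.site_key) for u in self._urls]


class MiniRegistry:
    """Keeps extractors keyed by site_key (assumed to be unique)."""

    def __init__(self) -> None:
        self._extractors: dict[str, FakeExtractor] = {}

    def register(self, extractor: FakeExtractor) -> None:
        if extractor.site_key in self._extractors:
            raise ValueError(f"duplicate site_key: {extractor.site_key}")
        self._extractors[extractor.site_key] = extractor

    async def run_all(self) -> dict[str, list[FakeStream]]:
        # Run every extractor concurrently; one failure must not sink the round.
        keys = list(self._extractors)
        results = await asyncio.gather(
            *(self._extractors[k].extract() for k in keys),
            return_exceptions=True,
        )
        return {
            key: ([] if isinstance(res, BaseException) else res)
            for key, res in zip(keys, results)
        }


registry = MiniRegistry()
registry.register(FakeExtractor("streamed", ["https://example.com/a.m3u8"]))
registry.register(FakeExtractor("ppv", [], fail=True))
by_site = asyncio.run(registry.run_all())
print({k: len(v) for k, v in by_site.items()})  # {'streamed': 1, 'ppv': 0}
```

`asyncio.gather(..., return_exceptions=True)` is the design choice that lets one broken site return an empty list instead of cancelling the whole round. This mirrors how each real extractor is expected to "log and return empty list" on its own errors.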
122
stacks/f1-stream/files/backend/extractors/aceztrims.py
Normal file
@@ -0,0 +1,122 @@
"""Aceztrims extractor — scrapes embed URLs from acestrlms.pages.dev/f11/.

The page (Cloudflare Pages, no anti-bot) hosts an iframe + a strip of
onclick channel-switcher buttons. Each button rewrites the iframe via
`document.getElementById('iframe').src = '<embed_url>'`. The initial
channel is hard-coded as `<iframe id='iframe' src='...'>`.

We strip HTML comments first because the page keeps ~20 legacy channel
buttons inside `<!-- ... -->` blocks for easy re-enablement; the previous
loose regex picked them up as false positives.

All channels are iframe embeds (no direct m3u8) — `stream_type='embed'`.

Site naming note: the extractor key stays `aceztrims` (the previous
domain) so registry/cache identifiers don't churn. The current domain
is `acestrlms.pages.dev` and the F1 path is `/f11/` (two ones — `/f1/`
is the cross-sport schedule page and has no stream buttons).
"""

import logging
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

BASE_URL = "https://acestrlms.pages.dev"
F1_PAGES = [
    ("/f11/", "Formula 1"),
]

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# `document.getElementById('iframe').src = '<URL>'` — current channel-switcher format.
_ONCLICK_IFRAME_SRC = re.compile(
    r"""document\.getElementById\(['"]iframe['"]\)\.src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
# `<iframe id='iframe' src='<URL>'>` — the default/initial channel.
_DEFAULT_IFRAME = re.compile(
    r"""<iframe[^>]*id\s*=\s*['"]iframe['"][^>]*src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


class AceztrimsExtractor(BaseExtractor):
    """Pulls iframe embed URLs out of the acestrlms.pages.dev F1 page."""

    @property
    def site_key(self) -> str:
        return "aceztrims"

    @property
    def site_name(self) -> str:
        return "Aceztrims"

    async def extract(self) -> list[ExtractedStream]:
        streams: list[ExtractedStream] = []

        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            for path, category in F1_PAGES:
                try:
                    streams.extend(await self._scrape_page(client, path, category))
                except Exception:
                    logger.exception("[aceztrims] Failed to scrape %s", path)

        logger.info("[aceztrims] Extracted %d stream(s)", len(streams))
        return streams

    async def _scrape_page(
        self, client: httpx.AsyncClient, path: str, category: str
    ) -> list[ExtractedStream]:
        url = f"{BASE_URL}{path}"
        resp = await client.get(url)
        if resp.status_code != 200:
            logger.warning(
                "[aceztrims] %s returned HTTP %d", path, resp.status_code
            )
            return []

        # The page keeps a block of legacy channel buttons inside
        # `<!-- ... -->` for quick re-enablement. Strip comments first so
        # the regex only sees live buttons.
        html = _HTML_COMMENT.sub("", resp.text)

        seen: set[str] = set()
        streams: list[ExtractedStream] = []

        for pattern in (_DEFAULT_IFRAME, _ONCLICK_IFRAME_SRC):
            for match in pattern.finditer(html):
                embed_url = match.group(1).strip()
                if not embed_url or embed_url in seen:
                    continue
                seen.add(embed_url)
                streams.append(
                    ExtractedStream(
                        url=embed_url,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=f"{category} Stream",
                        stream_type="embed",
                        embed_url=embed_url,
                    )
                )

        logger.info(
            "[aceztrims] Found %d stream(s) on %s", len(streams), path
        )
        return streams
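The sketch below shows why comments are stripped before matching. It runs the two regexes from aceztrims.py, copied verbatim, against a small HTML snippet. The snippet, its example.com URLs and the find_embeds helper are invented for illustration. The helper reproduces the de-duplication loop in `_scrape_page` but returns plain URLs instead of ExtractedStream objects.

```python
import re

_ONCLICK_IFRAME_SRC = re.compile(
    r"""document\.getElementById\(['"]iframe['"]\)\.src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
_DEFAULT_IFRAME = re.compile(
    r"""<iframe[^>]*id\s*=\s*['"]iframe['"][^>]*src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical page: a default iframe, two live buttons, one commented-out button.
SAMPLE_HTML = """
<iframe id='iframe' src='https://embed.example/ch1'></iframe>
<button onclick="document.getElementById('iframe').src = 'https://embed.example/ch1'">1</button>
<button onclick="document.getElementById('iframe').src = 'https://embed.example/ch2'">2</button>
<!--
<button onclick="document.getElementById('iframe').src = 'https://embed.example/legacy'">old</button>
-->
"""


def find_embeds(html: str, strip_comments: bool = True) -> list[str]:
    """Return unique embed URLs in discovery order (default iframe first)."""
    if strip_comments:
        html = _HTML_COMMENT.sub("", html)
    seen: set[str] = set()
    urls: list[str] = []
    for pattern in (_DEFAULT_IFRAME, _ONCLICK_IFRAME_SRC):
        for match in pattern.finditer(html):
            url = match.group(1).strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


print(find_embeds(SAMPLE_HTML))
# ['https://embed.example/ch1', 'https://embed.example/ch2']
print(find_embeds(SAMPLE_HTML, strip_comments=False))
# ['https://embed.example/ch1', 'https://embed.example/ch2', 'https://embed.example/legacy']
```

Without the comment pass, the commented-out legacy button leaks through as a false positive. That is the failure mode the module docstring describes.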
118
stacks/f1-stream/files/backend/extractors/base.py
Normal file
@@ -0,0 +1,118 @@
"""Base class for all site-specific stream extractors."""

import logging
from abc import ABC, abstractmethod

import httpx

from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)


class BaseExtractor(ABC):
    """Abstract base class for site-specific stream extractors.

    To create a new extractor:
    1. Create a new file in backend/extractors/
    2. Subclass BaseExtractor
    3. Implement site_key, site_name, and extract()
    4. Register it in backend/extractors/__init__.py
    """

    @property
    @abstractmethod
    def site_key(self) -> str:
        """Unique identifier for this site (e.g., 'sportsurge').

        Must be lowercase, alphanumeric with hyphens/underscores only.
        Used as the cache key and in API responses.
        """

    @property
    @abstractmethod
    def site_name(self) -> str:
        """Human-readable name (e.g., 'SportSurge').

        Displayed in the UI and API responses.
        """

    @abstractmethod
    async def extract(self) -> list[ExtractedStream]:
        """Extract stream URLs from this site.

        Returns a list of ExtractedStream objects. Each represents a
        discovered stream URL. The extractor should set url, quality,
        and title fields; site_key, site_name, and extracted_at are
        auto-populated if left empty.

        Implementations should:
        - Use httpx for HTTP requests
        - Handle their own errors gracefully (log and return empty list)
        - Set quality when detectable from the source
        - Set title to something descriptive
        """

    async def health_check(self, url: str) -> bool:
        """Verify a URL is live (HEAD request, check for m3u8 content).

        Sends a HEAD request and checks:
        1. HTTP 200 response
        2. Content-Type suggests HLS/media content (if available)

        Returns True if the URL appears to be a live stream.
        """
        try:
            async with httpx.AsyncClient(
                timeout=10.0,
                follow_redirects=True,
                headers={"User-Agent": "Mozilla/5.0"},
            ) as client:
                response = await client.head(url)

            if response.status_code != 200:
                logger.debug(
                    "[%s] Health check failed for %s: HTTP %d",
                    self.site_key,
                    url,
                    response.status_code,
                )
                return False

            content_type = response.headers.get("content-type", "").lower()
            # m3u8 streams typically have these content types
            live_indicators = [
                "application/vnd.apple.mpegurl",
                "application/x-mpegurl",
                "video/",
                "audio/",
                "octet-stream",
            ]

            # If content-type is present and doesn't look like media,
            # the URL might not be a stream. But some servers don't set
            # content-type properly for HEAD, so we still return True
            # if content-type is missing or generic.
            if content_type and not any(ind in content_type for ind in live_indicators):
                # Content type present but doesn't look like media. Only text/*
                # and HTML types are rejected, even though some servers serve
                # m3u8 as text/plain; any other type gets the benefit of the doubt.
                if "text/" in content_type or "html" in content_type:
                    logger.debug(
                        "[%s] Health check suspect for %s: content-type=%s",
                        self.site_key,
                        url,
                        content_type,
                    )
                    return False

            return True

        except httpx.TimeoutException:
            logger.debug("[%s] Health check timed out for %s", self.site_key, url)
            return False
        except httpx.HTTPError as e:
            logger.debug("[%s] Health check error for %s: %s", self.site_key, url, e)
            return False
        except Exception:
            logger.exception("[%s] Unexpected error during health check for %s", self.site_key, url)
            return False
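The content-type branch of `health_check()` works like a small decision table. The sketch below pulls it out into a pure function, so the behaviour can be checked without network access. The function name classify_content_type is hypothetical. It assumes the HTTP status was already 200, and it mirrors the logic above, including the rejection of text/* responses.

```python
LIVE_INDICATORS = (
    "application/vnd.apple.mpegurl",
    "application/x-mpegurl",
    "video/",
    "audio/",
    "octet-stream",
)


def classify_content_type(content_type: str) -> bool:
    """Return True if a 200 response with this Content-Type passes the check."""
    content_type = content_type.lower()
    if not content_type:
        return True  # header missing: benefit of the doubt
    if any(ind in content_type for ind in LIVE_INDICATORS):
        return True  # looks like HLS or other media
    if "text/" in content_type or "html" in content_type:
        return False  # text or HTML is treated as "not a stream"
    return True  # any other unrecognised type is let through


for ct in ("application/vnd.apple.mpegurl", "", "text/html; charset=utf-8",
           "text/plain", "application/json"):
    print(repr(ct), classify_content_type(ct))
```

Keeping the rule as a pure function like this would also make it straightforward to revisit the text/plain case, which the inline comment flags as a possible false negative.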
247
stacks/f1-stream/files/backend/extractors/chrome_browser.py
Normal file
@@ -0,0 +1,247 @@
"""Generic chrome-service-driven extractor.

Drives the in-cluster headed Chromium pool (chrome-service) to load a list
of stream/aggregator pages, captures any HLS playlist URL the page fetches
at runtime, and returns at most one ExtractedStream per page, preferring a
master/index playlist when several are captured.

Unlike the API-based extractors (pitsport/streamed/ppv) this one handles
sites where the m3u8 is computed by JavaScript at page load time — the
URL only exists after the page evaluates an obfuscated decoder, fetches a
token, etc. Curl can't see it; a real browser can.

Add new targets via the `TARGETS` constant below. Each entry is a `_Target`
with a label, title, page URL, and optional settle time. The extractor
visits each URL with a stealthed context, waits for the JS to settle, and
yields any captured HLS URL.
"""

import asyncio
import logging
import os
import re
import urllib.parse
from dataclasses import dataclass

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# Best-effort pause between navigation and capture. The decoder usually
# fires within 5s; 12s gives slow JS time to settle without dragging the
# extraction round.
DEFAULT_SETTLE_SECONDS = 12

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)


@dataclass(frozen=True)
class _Target:
    label: str  # site_name (homepage label in the UI)
    title: str  # human-readable stream title
    url: str  # page to navigate
    settle: int = DEFAULT_SETTLE_SECONDS


# ---------------------------------------------------------------------------
# Target list. F1-relevant 24/7 channels and motorsport aggregator pages
# whose m3u8 is JS-computed. Add freely — each one takes ~12s to scrape.
# ---------------------------------------------------------------------------
TARGETS: tuple[_Target, ...] = (
    # MotoMundo embed pages — the community-curated WordPress site for
    # MotoGP. Each /e/<id> URL is one of the iframes their "Watch Online"
    # post lists for the active session (FP/Q/Race). The m3u8 is
    # JS-computed at load time so a real browser is required to capture
    # it. Update IDs each weekend to match the current race; subreddit.py
    # discovers them from the Reddit "[Watch / Download]" thread.
    _Target(
        label="MotoMundo",
        title="MotoGP Live (MotoMundo) — French GP / Le Mans",
        url="https://motomundo.top/e/9yzn08jk9py4",
        settle=15,
    ),
    _Target(
        label="MotoMundo",
        title="MotoGP Live (MotoMundo upns) — French GP / Le Mans",
        url="https://motomundo.upns.xyz/#kqasde",
        settle=15,
    ),
)


# Heuristic to recognise an HLS playlist URL from network capture. Most CDNs
# use `.m3u8`; some (pushembdz/oe1.ossfeed) disguise the playlist as `.css`
# under a /out/v… or /hls/ path. Filter out obvious junk (.css for actual
# stylesheets, .ts segments — we only want the playlist).
_HLS_URL_RE = re.compile(r"\.m3u8(\?|$)|/out/v[0-9]+/.+\.css(\?|$)|/hls/.+/master\.css(\?|$)")
_SEGMENT_EXT_RE = re.compile(r"\.(ts|m4s|aac|key)(\?|$)")


def _looks_like_hls_playlist(url: str) -> bool:
    if _SEGMENT_EXT_RE.search(url):
        return False
    return bool(_HLS_URL_RE.search(url))


def _resolve_chrome_cdp() -> str | None:
    """Resolve the CHROME_CDP_URL env var (set by f1-stream's TF stack).

    Migrated 2026-06-04 from CHROME_WS_URL/CHROME_WS_TOKEN. chrome-service
    now runs chromium directly with CDP exposed on :9222 so its persistent
    user-data-dir actually persists cookies (the old playwright launch-server
    pattern created ephemeral contexts per `connect()`). NetworkPolicy
    (labelled client namespaces only) is the only gate — no path token.
    """
    return os.getenv("CHROME_CDP_URL")


class ChromeBrowserExtractor(BaseExtractor):
    """Drive chrome-service to capture m3u8 URLs from JS-heavy pages."""

    @property
    def site_key(self) -> str:
        return "chrome-browser"

    @property
    def site_name(self) -> str:
        return "Chrome Browser"

    async def extract(self) -> list[ExtractedStream]:
        cdp_url = _resolve_chrome_cdp()
        if not cdp_url:
            logger.warning(
                "[chrome-browser] CHROME_CDP_URL not set — extractor disabled"
            )
            return []

        try:
            from playwright.async_api import async_playwright
        except ImportError:
            logger.warning("[chrome-browser] playwright not installed — disabled")
            return []

        # One Playwright instance + one browser connection per extraction
        # round. Contexts are cheap; the browser is shared.
        async with async_playwright() as p:
            try:
                browser = await p.chromium.connect_over_cdp(cdp_url, timeout=15_000)
            except Exception:
                logger.exception("[chrome-browser] CDP connect to chrome-service failed")
                return []

            results: list[ExtractedStream] = []
            for target in TARGETS:
                try:
                    stream = await self._scrape(browser, target)
                    if stream:
                        results.append(stream)
                except Exception:
                    logger.exception(
                        "[chrome-browser] failed to scrape %s", target.url
                    )

            try:
                await browser.close()
            except Exception:
                pass

        logger.info("[chrome-browser] returned %d stream(s)", len(results))
        return results

    async def _scrape(self, browser, target: _Target) -> ExtractedStream | None:
        ctx = await browser.new_context(
            user_agent=USER_AGENT,
            viewport={"width": 1280, "height": 720},
            bypass_csp=True,
        )
        # Inject the same stealth script the verifier uses so anti-bot
        # checks don't trip the page before its decoder runs.
        try:
            from backend.stealth import STEALTH_JS
            await ctx.add_init_script(STEALTH_JS)
        except Exception:
            pass

        page = await ctx.new_page()
        captured: list[str] = []

        def on_response(resp):
            try:
                if _looks_like_hls_playlist(resp.url):
                    captured.append(resp.url)
            except Exception:
                pass

        page.on("response", on_response)
        # Some pages (DD12 variants) load the player in a child iframe;
        # frame events catch nested navigations.
        page.on(
            "framenavigated",
            lambda fr: captured.append(fr.url) if _looks_like_hls_playlist(fr.url) else None,
        )

        try:
            await page.goto(target.url, wait_until="domcontentloaded", timeout=20_000)
        except Exception as e:
            logger.debug("[chrome-browser] %s goto failed: %s", target.url, e)
            await ctx.close()
            return None

        # Let the page's JS settle.
        await asyncio.sleep(target.settle)

        # Also probe child iframes — `pushembdz`, `pooembed`, `embedsports`
        # all live behind one. Collect any HLS URL the iframes loaded.
        for fr in page.frames:
            if fr is page.main_frame:
                continue
            try:
                # JW Player and Clappr both expose the playing source via
                # a <video>/`<source>` element after setup completes.
                sources = await fr.evaluate(
                    "() => Array.from(document.querySelectorAll('video, source')).map(e => e.currentSrc || e.src || '').filter(s => s.includes('.m3u8') || s.includes('.css'))"
                )
                for s in sources:
                    if _looks_like_hls_playlist(s):
                        captured.append(s)
            except Exception:
                pass

        await ctx.close()

        # De-duplicate while keeping capture order. The first plausible URL
        # is usually the master; later ones are typically the variant
        # playlists it references.
        unique = list(dict.fromkeys(captured))
        if not unique:
            logger.debug("[chrome-browser] %s yielded no HLS URL", target.url)
            return None

        # Prefer URLs that look like a master/index playlist over variant
        # playlists when both are captured.
        master = next(
            (u for u in unique if "master" in u.lower() or "index" in u.lower()),
            unique[0],
        )
        # Query strings are kept as-is: some carry short-lived tokens (which
        # the verifier and frontend re-resolve per request), but some CDNs
        # reject requests that arrive without them.
        m3u8 = master
        # Decode URL-encoded characters so the proxy gets a clean URL.
        m3u8 = urllib.parse.unquote(m3u8)

        logger.info(
            "[chrome-browser] %s -> %s",
            target.url, m3u8[:120],
        )
        return ExtractedStream(
            url=m3u8,
            site_key=self.site_key,
            site_name=target.label,
            quality="",
            title=target.title,
            stream_type="m3u8",
        )
61
stacks/f1-stream/files/backend/extractors/curated.py
Normal file
@@ -0,0 +1,61 @@
"""Curated extractor — known-good 24/7 F1 channels via direct embed URLs.

Returns a small, hand-picked list of embed URLs that are reliable enough to
be served as fallback "always-on" streams when the dynamic extractors find
nothing (e.g. between race weekends or when API providers are down).

These are direct embed URLs. The frontend routes them through /embed so the
iframe-stripping proxy bypasses any frame-buster JS in the upstream player.
"""

import logging

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)


# Curated list. Each entry is a known direct embed URL. These were sourced
# from the timstreams.py ALWAYS_INCLUDE_HASHES list (Sky Sports F1, DAZN F1)
# and are documented as 24/7 channels that play F1 content year-round.
_CURATED_STREAMS = [
    {
        "url": "https://hmembeds.one/embed/888520f36cd94c5da4c71fddc1a5fc9b",
        "title": "Sky Sports F1 (24/7)",
        "quality": "HD",
    },
    {
        "url": "https://hmembeds.one/embed/fc3a54634d0867b0c02ee3223292e7c6",
        "title": "DAZN F1 (24/7)",
        "quality": "HD",
    },
]


class CuratedExtractor(BaseExtractor):
    """Returns curated known-good 24/7 F1 channel embed URLs."""

    @property
    def site_key(self) -> str:
        return "curated"

    @property
    def site_name(self) -> str:
        return "Curated 24/7 Channels"

    async def extract(self) -> list[ExtractedStream]:
        streams = [
            ExtractedStream(
                url=entry["url"],
                site_key=self.site_key,
                site_name=self.site_name,
                quality=entry["quality"],
                title=entry["title"],
                stream_type="embed",
                embed_url=entry["url"],
            )
            for entry in _CURATED_STREAMS
        ]
        logger.info("[curated] Returning %d curated stream(s)", len(streams))
        return streams
181
stacks/f1-stream/files/backend/extractors/daddylive.py
Normal file
@@ -0,0 +1,181 @@
"""DaddyLive extractor - extracts m3u8 streams from DaddyLive for F1 channels.

Extraction chain:
    1. Fetch stream page → parse iframe src
    2. Fetch player page → XOR-decode auth params (key=109)
    3. Call server lookup API → get server_key
    4. Construct m3u8 URL from server_key + channel key
"""

import logging
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# F1-relevant channel IDs on DaddyLive
F1_CHANNELS = {
    60: "Sky Sports F1 UK",
}

DLHD_BASE = "https://dlhd.link"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)
XOR_KEY = 109


def _xor_decode(encoded: str) -> str:
    """XOR-decode a string using key 109."""
    return "".join(chr(ord(c) ^ XOR_KEY) for c in encoded)


class DaddyLiveExtractor(BaseExtractor):
    """Extracts m3u8 streams from DaddyLive for Sky Sports F1.

    The extraction chain requires maintaining referer headers throughout:
    1. Fetch stream page at dlhd.link
    2. Parse iframe src pointing to the player page
    3. XOR-decode auth params from the player page to get channelKey
    4. Call server lookup API to get server_key
    5. Construct the final m3u8 URL
    """

    @property
    def site_key(self) -> str:
        return "daddylive"

    @property
    def site_name(self) -> str:
        return "DaddyLive"

    async def extract(self) -> list[ExtractedStream]:
        """Extract m3u8 URLs for all configured F1 channels."""
        streams: list[ExtractedStream] = []

        for channel_id, channel_name in F1_CHANNELS.items():
            try:
                stream = await self._extract_channel(channel_id, channel_name)
                if stream:
                    streams.append(stream)
            except Exception:
                logger.exception(
                    "[daddylive] Failed to extract channel %d (%s)",
                    channel_id,
                    channel_name,
                )

        logger.info("[daddylive] Extracted %d stream(s)", len(streams))
        return streams

    async def _extract_channel(
        self, channel_id: int, channel_name: str
    ) -> ExtractedStream | None:
        """Extract a single channel's m3u8 URL through the full chain."""
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            # Step 1: Fetch stream page and parse iframe src
            stream_page_url = f"{DLHD_BASE}/stream/stream-{channel_id}.php"
            resp = await client.get(
                stream_page_url,
                headers={"Referer": f"{DLHD_BASE}/"},
            )
            if resp.status_code != 200:
                logger.warning(
                    "[daddylive] Stream page returned HTTP %d for channel %d",
                    resp.status_code,
                    channel_id,
                )
                return None

            # Parse iframe src from the stream page
            iframe_match = re.search(
                r'<iframe[^>]+src=["\']([^"\']+)["\']', resp.text, re.IGNORECASE
            )
            if not iframe_match:
                logger.warning(
                    "[daddylive] No iframe found on stream page for channel %d",
                    channel_id,
                )
                return None

            player_url = iframe_match.group(1)
            if player_url.startswith("//"):
                player_url = "https:" + player_url

            logger.debug("[daddylive] Player URL for channel %d: %s", channel_id, player_url)

            # Step 2: Fetch the player page. Only its HTTP status is checked;
            # nothing is parsed from the response body.
            resp = await client.get(
                player_url,
                headers={"Referer": stream_page_url},
            )
            if resp.status_code != 200:
                logger.warning(
                    "[daddylive] Player page returned HTTP %d for channel %d",
                    resp.status_code,
                    channel_id,
                )
                return None

            # The player page's XOR-encoded auth params decode to the channel
            # key premium{id}. Because that pattern is predictable, the key is
            # built directly here instead (_xor_decode is currently unused).
            channel_key = f"premium{channel_id}"

            # Step 3: Call server lookup API
            lookup_url = f"https://chevy.vovlacosa.sbs/server_lookup?channel_id={channel_key}"
            resp = await client.get(
                lookup_url,
                headers={"Referer": player_url},
            )
            if resp.status_code != 200:
                logger.warning(
                    "[daddylive] Server lookup returned HTTP %d for channel %d",
                    resp.status_code,
                    channel_id,
                )
                return None

            try:
                lookup_data = resp.json()
                server_key = lookup_data.get("server_key", "")
            except Exception:
                logger.warning(
                    "[daddylive] Failed to parse server lookup response for channel %d",
                    channel_id,
                )
                return None

            if not server_key:
                logger.warning(
                    "[daddylive] No server_key in lookup response for channel %d",
                    channel_id,
                )
                return None

            # Step 4: Construct m3u8 URL
            m3u8_url = (
                f"https://chevy.adsfadfds.cfd/proxy/{server_key}/{channel_key}/mono.css"
            )

            logger.info(
                "[daddylive] Constructed m3u8 for channel %d: %s", channel_id, m3u8_url
            )

            return ExtractedStream(
                url=m3u8_url,
                site_key=self.site_key,
                site_name=self.site_name,
                quality="HD",
                title=channel_name,
                stream_type="m3u8",
            )
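XOR with a fixed single-byte key is its own inverse, which is why `_xor_decode` could serve for both encoding and decoding. The round trip below applies the same per-character XOR (key 109) to the `premium{id}` channel-key pattern. The encoded value is generated locally, not captured from DaddyLive. build_m3u8_url is a hypothetical helper that restates step 4, and the server key "wind" is a made-up placeholder.

```python
XOR_KEY = 109


def xor_codec(text: str, key: int = XOR_KEY) -> str:
    """XOR each character's code point with key; applying it twice is a no-op."""
    return "".join(chr(ord(c) ^ key) for c in text)


def build_m3u8_url(server_key: str, channel_id: int) -> str:
    # Restates step 4: host and the disguised "mono.css" playlist name come from the module.
    return f"https://chevy.adsfadfds.cfd/proxy/{server_key}/premium{channel_id}/mono.css"


encoded = xor_codec("premium60")
print(xor_codec(encoded))  # premium60
print(build_m3u8_url("wind", 60))  # https://chevy.adsfadfds.cfd/proxy/wind/premium60/mono.css
```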
111
stacks/f1-stream/files/backend/extractors/dd12.py
Normal file
@@ -0,0 +1,111 @@
"""DD12Streams extractor — scrapes inline m3u8 URLs from per-channel pages.

Each DD12 sport page (`/nas`, `/f1`, `/sky`, etc.) renders an iframe to
`/<channel>c1` which 302-redirects to `/new-<channel>/jwplayer`. That
page contains a JW Player setup with the m3u8 URL hard-coded inline:

    playerInstance.setup({
        file: "https://...b-cdn.net/.../master.m3u8",
        ...
    });

The JW Player runtime fails in our cluster (same fingerprint trap as
hmembeds), but we don't need it — the file URL is in the HTML and any
browser with H.264 codecs can play it directly via hls.js.

Channel discovery: probe a known list. New ones can be added by checking
DD12's own homepage / nav.
"""

import logging
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

BASE = "https://dd12streams.com"
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)

# (path, channel_label, title). Add as DD12 surfaces new channels.
CHANNELS = (
    ("nas", "DD12Streams", "NASCAR Cup Series (24/7) — DD12"),
)

_FILE_URL_RE = re.compile(r"""file\s*:\s*["']([^"']+\.m3u8[^"']*)["']""")


class DD12Extractor(BaseExtractor):
    @property
    def site_key(self) -> str:
        return "dd12"

    @property
    def site_name(self) -> str:
        return "DD12Streams"

    async def extract(self) -> list[ExtractedStream]:
        results: list[ExtractedStream] = []
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            for path, label, title in CHANNELS:
                try:
                    page_url = f"{BASE}/{path}"
                    resp = await client.get(page_url)
                    if resp.status_code != 200:
                        continue
                    iframe_path = self._extract_iframe(resp.text)
                    if not iframe_path:
                        continue
                    iframe_url = (
                        iframe_path
                        if iframe_path.startswith("http")
                        else f"{BASE}{iframe_path}"
                    )
                    iframe_resp = await client.get(
                        iframe_url, headers={"Referer": page_url}
                    )
                    if iframe_resp.status_code != 200:
                        continue
                    m3u8 = self._find_m3u8(iframe_resp.text)
                    if not m3u8:
                        continue
                    results.append(
                        ExtractedStream(
                            url=m3u8,
                            site_key=self.site_key,
                            site_name=label,
                            quality="",
                            title=title,
                            stream_type="m3u8",
                        )
                    )
                except Exception:
                    logger.debug(
                        "[dd12] /%s extraction failed", path, exc_info=True
                    )
        logger.info("[dd12] Extracted %d stream(s)", len(results))
        return results

    @staticmethod
    def _extract_iframe(html: str) -> str | None:
        m = re.search(
            r'<iframe[^>]+id=["\']vplayer["\'][^>]+src=["\']([^"\']+)["\']',
            html,
        )
        return m.group(1) if m else None

    @staticmethod
    def _find_m3u8(html: str) -> str | None:
        m = _FILE_URL_RE.search(html)
        return m.group(1) if m else None
75
stacks/f1-stream/files/backend/extractors/demo.py
Normal file
75
stacks/f1-stream/files/backend/extractors/demo.py
Normal file
|
|
@ -0,0 +1,75 @@
"""Demo extractor - returns hardcoded test streams for framework testing.

This extractor exists purely for testing the extraction pipeline end-to-end.
It does NOT connect to any real streaming site. Disable it in production by
removing its registration from __init__.py or setting DEMO_EXTRACTOR_ENABLED=false.
"""

import logging
import os

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# Set DEMO_EXTRACTOR_ENABLED=false to disable this extractor
DEMO_ENABLED = os.getenv("DEMO_EXTRACTOR_ENABLED", "true").lower() in ("true", "1", "yes")


class DemoExtractor(BaseExtractor):
    """Demo extractor that returns hardcoded test streams.

    Use this to verify the extraction framework works end-to-end without
    needing a real streaming site. The streams are publicly available HLS
    test streams from Apple and others.
    """

    @property
    def site_key(self) -> str:
        return "demo"

    @property
    def site_name(self) -> str:
        return "Demo (Test Streams)"

    async def extract(self) -> list[ExtractedStream]:
        """Return hardcoded test streams for framework testing."""
        if not DEMO_ENABLED:
            logger.info("[demo] Demo extractor is disabled via DEMO_EXTRACTOR_ENABLED")
            return []

        logger.info("[demo] Returning demo test streams")

        streams = [
            ExtractedStream(
                url="https://test-streams.mux.dev/x36xhzz/x36xhzz.m3u8",
                site_key=self.site_key,
                site_name=self.site_name,
                quality="720p",
                title="Big Buck Bunny (Test Stream)",
                is_live=False,
            ),
            ExtractedStream(
                url="https://devstreaming-cdn.apple.com/videos/streaming/examples/bipbop_16x9/bipbop_16x9_variant.m3u8",
                site_key=self.site_key,
                site_name=self.site_name,
                quality="1080p",
                title="Apple Bipbop (Test Stream)",
                is_live=False,
            ),
            ExtractedStream(
                url="https://demo.unified-streaming.com/k8s/features/stable/video/tears-of-steel/tears-of-steel.ism/.m3u8",
                site_key=self.site_key,
                site_name=self.site_name,
                quality="1080p",
                title="Tears of Steel (Test Stream)",
                is_live=False,
            ),
        ]

        # Run every demo stream through the base-class health check;
        # is_live records whether it passed.
        for stream in streams:
            stream.is_live = await self.health_check(stream.url)

        return streams
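`DemoExtractor.extract()` awaits each health check in turn, so its latency is the sum of the individual checks. If that ever matters, the checks can run concurrently with `asyncio.gather`. A sketch, assuming only that `health_check(url)` is a coroutine returning a bool; `fake_health_check` and the `Stream` dataclass are hypothetical stand-ins for `BaseExtractor.health_check` and `ExtractedStream`, whose real implementations are not shown here.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Stream:
    """Stripped-down stand-in for ExtractedStream: only the fields used here."""
    url: str
    is_live: bool = False


async def fake_health_check(url: str) -> bool:
    """Hypothetical stub: pretend every .m3u8 URL is reachable after a short delay."""
    await asyncio.sleep(0.01)
    return url.endswith(".m3u8")


async def check_all(streams: list[Stream]) -> list[Stream]:
    # Start all checks at once. gather() returns results in input order,
    # so zip() pairs each result with the stream it belongs to.
    results = await asyncio.gather(*(fake_health_check(s.url) for s in streams))
    for stream, live in zip(streams, results):
        stream.is_live = live
    return streams


streams = asyncio.run(check_all([
    Stream("https://test-streams.mux.dev/x36xhzz/x36xhzz.m3u8"),
    Stream("https://example.invalid/not-a-playlist"),
]))
print([s.is_live for s in streams])  # [True, False]
```

As with the sequential loop, one check raising an exception makes the whole call raise; `gather(..., return_exceptions=True)` would change that, at the cost of handling exception objects in `results`.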
203
stacks/f1-stream/files/backend/extractors/discord_source.py
Normal file
@@ -0,0 +1,203 @@
"""Discord extractor - monitors Discord channels for F1 stream links.

Reads recent messages from configured Discord channels using a user token,
extracts URLs that look like stream links, and returns them as embed streams.
"""

import logging
import os
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

DISCORD_API = "https://discord.com/api/v9"
DISCORD_TOKEN = os.getenv("DISCORD_TOKEN", "")
# Comma-separated channel IDs to monitor
DISCORD_CHANNELS = os.getenv("DISCORD_CHANNELS", "").split(",")
# How many messages to fetch per channel
MESSAGE_LIMIT = 50

USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# Match any http(s) URL in message text; filtering (Discord CDN, images,
# news sites, etc.) happens afterwards in _is_stream_url().
URL_PATTERN = re.compile(r"https?://[^\s<>\)\]\"']+", re.IGNORECASE)

# Domains whose links are never playable streams: Discord itself, image/GIF
# hosts, social networks, link directories, and news/official sites. Discord
# users share these links during race weekends and they pollute the list.
EXCLUDED_DOMAINS = {
    "discord.com", "discord.gg", "cdn.discordapp.com",
    "tenor.com", "giphy.com", "imgur.com",
    "youtube.com", "youtu.be", "twitter.com", "x.com",
    "reddit.com", "instagram.com", "tiktok.com",
    "fmhy.net", "github.com", "freemotorsports.com",
    # News / official sites — never playable embeds
    "formula1.com", "fia.com", "skysports.com", "motorsport.com",
    "driverdb.com", "autosport.com", "the-race.com", "racefans.net",
    "wikipedia.org", "fantasy.formula1.com",
}

# A URL is treated as a candidate stream embed only if its path looks like
# a *direct* player/embed page — `/embed/{id}`, `/player/{...}`, `*.m3u8`,
# `*.php` (legacy iframe1.php style). Aggregator landing pages
# (`/event/...`, `/watch?session=...`, etc.) are rejected because they
# show a list of links instead of playing automatically — those produce
# verifier-passing UI without actual playback.
_PATH_KEYWORDS = (
    "/embed/", "/player/", ".m3u8", ".php",
)


def _is_stream_url(url: str) -> bool:
    """Heuristic: does this URL look like an actual stream/embed/player link?

    Discord users share lots of news links during race weekends. The old
    filter only blocked specific domains and let everything else through,
    which produced a stream list dominated by formula1.com news articles.
    The new filter is positive-match: a URL must contain at least one
    stream-shaped path keyword to be included.
    """
    from urllib.parse import urlparse

    try:
        parsed = urlparse(url)
        # hostname is already lowercased and has any ":port" stripped.
        domain = (parsed.hostname or "").lower()
        path = parsed.path.lower()
    except Exception:
        return False

    if not domain:
        return False

    for excluded in EXCLUDED_DOMAINS:
        # Match the excluded domain itself or any subdomain of it. A plain
        # substring test would also reject unrelated hosts ("x.com" is a
        # substring of "box.com", "fia.com" of "sofia.com").
        if domain == excluded or domain.endswith("." + excluded):
            return False

    if any(path.endswith(ext) for ext in (".png", ".jpg", ".jpeg", ".gif", ".webp", ".mp4", ".webm", ".svg", ".css", ".js")):
        return False

    full = path + ("?" + parsed.query if parsed.query else "")
    if not any(kw in full for kw in _PATH_KEYWORDS):
        return False

    return True


class DiscordExtractor(BaseExtractor):
    """Extracts stream links from Discord channel messages.

    Monitors configured Discord channels for URLs shared by users,
    filters to likely stream links, and returns them as embed streams.
    """

    @property
    def site_key(self) -> str:
        return "discord"

    @property
    def site_name(self) -> str:
        return "Discord Community"

    async def extract(self) -> list[ExtractedStream]:
        """Fetch recent messages from Discord channels and extract URLs."""
        if not DISCORD_TOKEN:
            logger.info("[discord] No DISCORD_TOKEN set, skipping")
            return []

        channels = [c.strip() for c in DISCORD_CHANNELS if c.strip()]
        if not channels:
            logger.info("[discord] No DISCORD_CHANNELS configured, skipping")
            return []

        streams: list[ExtractedStream] = []
        seen_urls: set[str] = set()

        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={
                    "Authorization": DISCORD_TOKEN,
                    "User-Agent": USER_AGENT,
                },
            ) as client:
                for channel_id in channels:
                    try:
                        channel_streams = await self._fetch_channel(
                            client, channel_id, seen_urls
                        )
                        streams.extend(channel_streams)
                    except Exception:
                        logger.debug(
                            "[discord] Failed to fetch channel %s",
                            channel_id,
                            exc_info=True,
                        )
        except Exception:
            logger.exception("[discord] Failed to connect to Discord API")

        logger.info("[discord] Extracted %d stream(s) from %d channel(s)", len(streams), len(channels))
        return streams

    async def _fetch_channel(
        self,
        client: httpx.AsyncClient,
        channel_id: str,
        seen_urls: set[str],
    ) -> list[ExtractedStream]:
        """Fetch messages from a single channel and extract stream URLs."""
        resp = await client.get(
            f"{DISCORD_API}/channels/{channel_id}/messages",
            params={"limit": MESSAGE_LIMIT},
        )
        if resp.status_code != 200:
            logger.warning(
                "[discord] Channel %s returned HTTP %d", channel_id, resp.status_code
            )
            return []

        messages = resp.json()
        if not isinstance(messages, list):
            return []

        streams: list[ExtractedStream] = []

        for msg in messages:
            content = msg.get("content", "")
            author = msg.get("author", {}).get("username", "unknown")

            # Extract URLs from message content
            urls = URL_PATTERN.findall(content)

            # Also check embeds
            for embed in msg.get("embeds", []):
                if embed.get("url"):
                    urls.append(embed["url"])

            for url in urls:
                # Clean trailing punctuation
                url = url.rstrip(".,;:!?)")

                if url in seen_urls:
                    continue
                if not _is_stream_url(url):
                    continue

                seen_urls.add(url)
                streams.append(
                    ExtractedStream(
                        url=url,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=f"Shared by {author}",
                        stream_type="embed",
                        embed_url=url,
                    )
                )

        return streams
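The positive-match filter in `_is_stream_url` can be exercised offline. A condensed sketch of the same three rules (reject excluded hosts, reject media-file extensions, then require a stream-shaped path keyword), using a small illustrative subset of `EXCLUDED_DOMAINS` and matching excluded domains by exact host or subdomain suffix rather than by substring. The test URLs are made up.

```python
from urllib.parse import urlparse

# Small illustrative subset of EXCLUDED_DOMAINS, plus the keyword list above.
EXCLUDED = {"discord.com", "formula1.com", "x.com", "youtube.com"}
PATH_KEYWORDS = ("/embed/", "/player/", ".m3u8", ".php")
MEDIA_EXTS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".mp4", ".webm", ".svg", ".css", ".js")


def is_excluded(host: str) -> bool:
    # Exact host or any subdomain ("www.formula1.com"), but not unrelated
    # hosts that merely contain the string ("box.com" vs "x.com").
    return any(host == d or host.endswith("." + d) for d in EXCLUDED)


def is_stream_url(url: str) -> bool:
    try:
        parsed = urlparse(url)
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    host = parsed.hostname or ""  # lowercased, port stripped
    path = parsed.path.lower()
    if not host or is_excluded(host):
        return False
    if path.endswith(MEDIA_EXTS):
        return False
    full = path + ("?" + parsed.query if parsed.query else "")
    return any(kw in full for kw in PATH_KEYWORDS)


for u in (
    "https://streamer.example/embed/abc123",              # player page: kept
    "https://www.formula1.com/en/latest/article.html",    # news: dropped
    "https://site.example/event/monaco-gp",               # aggregator page: dropped
    "https://host.example:8443/iframe1.php?id=3",         # legacy .php embed: kept
):
    print(u, is_stream_url(u))
```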
131
stacks/f1-stream/files/backend/extractors/hmembeds.py
Normal file
@@ -0,0 +1,131 @@
"""hmembeds.one decoder + extractor.

Reverse-engineered 2026-05-07 (4-agent parallel session). The hmembeds
embed page contains an inline `<script>` block of the form:

    var k = "<16-char ASCII key>";
    var b = atob("<URI-encoded XOR-encrypted blob>");
    var c = decodeURIComponent(escape(b));
    var d = "";
    for (var i = 0; i < c.length; i++)
        d += String.fromCharCode(c.charCodeAt(i) ^ k.charCodeAt(i % k.length));
    (new Function(d))();

The decoded `d` is plain JavaScript that calls `jwplayer('player').setup({
file: <m3u8_url>, ... })`. The `<m3u8_url>` is a JWT-bound URL on
`amsterdam-0183.zulo-0084.online/sec/<JWT>/<embed_id>.m3u8` where the
JWT pins the request to a /24 of the requestor's IP.

So: pure client-side decoding. No fingerprint check, no canvas hash, no
browser-derived input. We can produce the m3u8 URL with curl + Python
faster than launching Chromium.

**Caveat (2026-05-07 reality)**: the hmembeds backend issues JWT URLs
for the curated `888520f3...` (Sky Sports F1 24/7) and `fc3a5463...`
(DAZN F1 24/7) embeds, but the origin (`amsterdam-0183.zulo-0084.online`)
returns 404/403 on the m3u8 fetch from any IP we tested (cluster IPv4
176.12.22.x, dev VM IPv6 2001:470:6f:43d::). Both legacy embed IDs
appear to be offline upstream. This extractor will produce JWT URLs
that the verifier marks unplayable for those specific embeds; if the
upstream broadcasts come back online or fresh IDs are added, the same
extractor logic just works.
"""

import base64
import logging
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)

# Curated hmembeds embed IDs that the community treats as 24/7 channels.
# `_CHANNELS` mirrors the legacy `CuratedExtractor` list — keeping them
# here means the resolver can attempt offline-decoded JWT URLs and the
# verifier filters out the ones that are upstream-offline.
_CHANNELS = (
    ("888520f36cd94c5da4c71fddc1a5fc9b", "Sky Sports F1 (24/7) — hmembeds"),
    ("fc3a54634d0867b0c02ee3223292e7c6", "DAZN F1 (24/7) — hmembeds"),
)

_KEY_RE = re.compile(r'k\s*=\s*"([a-z0-9]+)"')
_BLOB_RE = re.compile(r'b\s*=\s*atob\("([^"]+)"\)')
_URL_RE = re.compile(r'streamUrl\s*=\s*"([^"]+)"')


def decode_embed(html: str) -> str | None:
    """Pull the m3u8 URL out of an hmembeds embed HTML.

    Returns the JWT-bound m3u8 URL the page would tell JW Player to
    play, or None if the page doesn't match the expected shape.
    """
    km = _KEY_RE.search(html)
    bm = _BLOB_RE.search(html)
    if not km or not bm:
        return None
    key = km.group(1)
    blob = bm.group(1)
    try:
        # b = atob(blob)                     -> base64-decode to raw bytes
        # c = decodeURIComponent(escape(b))  -> the JS idiom for decoding
        #     those bytes as UTF-8, so a plain UTF-8 decode is the exact
        #     equivalent (urllib.parse.unquote on a Latin-1 string would
        #     also rewrite any literal "%XX" sequence in the ciphertext)
        # d[i] = c[i] ^ k[i % len(k)]        -> XOR with rotating key
        deuri = base64.b64decode(blob).decode("utf-8")
        decoded = "".join(
            chr(ord(c) ^ ord(key[i % len(key)])) for i, c in enumerate(deuri)
        )
    except Exception:
        return None
    m = _URL_RE.search(decoded)
    return m.group(1) if m else None


class HmembedsExtractor(BaseExtractor):
    @property
    def site_key(self) -> str:
        return "hmembeds"

    @property
    def site_name(self) -> str:
        return "hmembeds.one"

    async def extract(self) -> list[ExtractedStream]:
        results: list[ExtractedStream] = []
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT, "Referer": "https://hmembeds.one/"},
        ) as client:
            for embed_id, label in _CHANNELS:
                try:
                    page = await client.get(f"https://hmembeds.one/embed/{embed_id}")
                except Exception:
                    logger.debug("[hmembeds] embed %s fetch failed", embed_id, exc_info=True)
                    continue
                if page.status_code != 200:
                    continue
                m3u8 = decode_embed(page.text)
                if not m3u8:
                    continue
                results.append(
                    ExtractedStream(
                        url=m3u8,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=label,
                        stream_type="m3u8",
                    )
                )
        logger.info("[hmembeds] resolved %d JWT URL(s) (verifier filters dead origins)", len(results))
        return results
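The decoding steps in the hmembeds docstring can be checked with a round trip: encode a small script the way the page's inline JavaScript expects (XOR with the rotating key, UTF-8 encode, base64), embed it in a page fragment, then pull it back out with the extractor's three regexes. A self-contained sketch; the key, the script (written to contain a `streamUrl = "..."` assignment so `_URL_RE` has something to match) and the URL are all made up. It treats `atob` followed by `decodeURIComponent(escape(...))` as "base64-decode, then interpret the bytes as UTF-8", which is what that JavaScript idiom does.

```python
import base64
import re


def xor_with_key(text: str, key: str) -> str:
    # d[i] = c[i] ^ k[i % len(k)]. XOR is its own inverse, so the same
    # function both encrypts and decrypts.
    return "".join(chr(ord(c) ^ ord(key[i % len(key)])) for i, c in enumerate(text))


def encode_blob(script: str, key: str) -> str:
    """Inverse of the page's decoder: XOR, UTF-8 encode, base64."""
    return base64.b64encode(xor_with_key(script, key).encode("utf-8")).decode("ascii")


def decode_blob(blob: str, key: str) -> str:
    """atob(blob) -> bytes; decodeURIComponent(escape(b)) -> UTF-8 text; then XOR."""
    return xor_with_key(base64.b64decode(blob).decode("utf-8"), key)


key = "a1b2c3d4e5f6g7h8"  # hypothetical 16-char key
script = 'var streamUrl = "https://origin.example/sec/JWT/abc.m3u8"; jwplayer("player").setup({file: streamUrl});'
blob = encode_blob(script, key)

# Rebuild the inline-script shape that _KEY_RE and _BLOB_RE look for.
page = f'var k = "{key}"; var b = atob("{blob}");'
k = re.search(r'k\s*=\s*"([a-z0-9]+)"', page).group(1)
b = re.search(r'b\s*=\s*atob\("([^"]+)"\)', page).group(1)
decoded = decode_blob(b, k)
url = re.search(r'streamUrl\s*=\s*"([^"]+)"', decoded).group(1)
print(url)  # https://origin.example/sec/JWT/abc.m3u8
```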
39
stacks/f1-stream/files/backend/extractors/models.py
Normal file
@@ -0,0 +1,39 @@
"""Data models for the stream extraction framework."""

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ExtractedStream:
    """Represents a single stream URL discovered by an extractor."""

    url: str  # The HLS/m3u8 URL (for embed streams, the embed page URL)
    site_key: str  # Which extractor found it
    site_name: str  # Human-readable name
    quality: str = ""  # e.g., "720p", "1080p", or empty
    title: str = ""  # e.g., "F1 Race Live"
    extracted_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    is_live: bool = False  # Whether it passed health check
    response_time_ms: int = 0  # Health check response time (lower = better)
    checked_at: str = ""  # ISO timestamp of last health check
    bitrate: int = 0  # Bitrate in bps if detectable from m3u8 playlist
    stream_type: str = "m3u8"  # "m3u8" for direct HLS, "embed" for iframe embed URL
    embed_url: str = ""  # The iframe-embeddable URL (when stream_type is "embed")

    def to_dict(self) -> dict:
        """Serialize to a plain dictionary for JSON responses."""
        return {
            "url": self.url,
            "site_key": self.site_key,
            "site_name": self.site_name,
            "quality": self.quality,
            "title": self.title,
            "extracted_at": self.extracted_at,
            "is_live": self.is_live,
            "response_time_ms": self.response_time_ms,
            "checked_at": self.checked_at,
            "bitrate": self.bitrate,
            "stream_type": self.stream_type,
            "embed_url": self.embed_url,
        }
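`bitrate` is documented as "Bitrate in bps if detectable from m3u8 playlist", and the Pitsport notes say a response should be recognised as HLS by sniffing the body for `#EXTM3U` rather than trusting its Content-Type. A minimal sketch of both checks on a playlist body. `looks_like_hls` and `playlist_bitrate` are hypothetical helpers, not the code that fills `bitrate` in this repo; the sketch takes the highest `BANDWIDTH` attribute (bits per second, per the HLS spec) from a master playlist's `#EXT-X-STREAM-INF` tags and ignores `AVERAGE-BANDWIDTH`.

```python
import re

# BANDWIDTH on #EXT-X-STREAM-INF is a decimal bits-per-second value.
# The lookbehind stops AVERAGE-BANDWIDTH from matching.
_BANDWIDTH_RE = re.compile(r"#EXT-X-STREAM-INF:.*?(?<![A-Z-])BANDWIDTH=(\d+)")


def looks_like_hls(body: str) -> bool:
    """Sniff the body; Content-Type may be text/css or application/json."""
    return body.lstrip("\ufeff \t\r\n").startswith("#EXTM3U")


def playlist_bitrate(body: str) -> int:
    """Highest advertised BANDWIDTH in bps, or 0 if none is present."""
    if not looks_like_hls(body):
        return 0
    rates = [int(m) for m in _BANDWIDTH_RE.findall(body)]
    return max(rates, default=0)


master = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=1280000,RESOLUTION=1280x720
720p.m3u8
#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=2000000,BANDWIDTH=2560000,RESOLUTION=1920x1080
1080p.m3u8
"""
print(playlist_bitrate(master))               # 2560000
print(looks_like_hls("body { color: red }"))  # False
```

A media playlist (segments only, no `#EXT-X-STREAM-INF`) yields 0, which matches the field's default of "not detectable".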
595
stacks/f1-stream/files/backend/extractors/pitsport.py
Normal file
@@ -0,0 +1,595 @@
|
|||
"""Pitsport.xyz extractor - fetches F1 streams from the Next.js RSC payload.
|
||||
|
||||
Architecture:
|
||||
- Main page (pitsport.xyz) has a "Live Now" section with event cards containing
|
||||
category, title, time, imageUrl props and /watch/{UUID} links.
|
||||
- Schedule page (pitsport.xyz/schedule) lists all events grouped by category
|
||||
(h2 headings) with /watch/{UUID} links and event titles.
|
||||
- Watch pages (/watch/{UUID}) embed iframes from pushembdz.store/embed/{EMBED_UUID}.
|
||||
- Embed pages contain an RSC payload with a stream config: {title, link, method}.
|
||||
- When method is "player" or "hls", the link field points to a serveplay.site
|
||||
m3u8 playlist. Otherwise we return the embed URL for iframe playback.
|
||||
"""
|
||||
|
||||
import logging
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
|
||||
import httpx
|
||||
|
||||
from backend.extractors.base import BaseExtractor
|
||||
from backend.extractors.models import ExtractedStream
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
PITSPORT_BASE = "https://pitsport.xyz"
|
||||
EMBED_BASE = "https://pushembdz.store"
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
# Categories to include (case-insensitive match). Broadened beyond F1
|
||||
# to also surface MotoGP and adjacent motorsports — keeps the f1-stream
|
||||
# UI useful between race weekends and during the off-season.
|
||||
MOTORSPORT_CATEGORIES = {
|
||||
"f1", "formula 1", "formula 2", "formula 3",
|
||||
"motogp", "moto gp", "moto2", "moto3", "motoe",
|
||||
"world rally championship", "wrc",
|
||||
"world endurance championship", "wec",
|
||||
"indycar series", "indycar", "indynxt",
|
||||
"nascar cup series", "nascar truck series", "nascar o'reilly auto parts series",
|
||||
"nascar xfinity series", "nascar",
|
||||
}
|
||||
|
||||
# Title keywords that are strong positives even when the category text
|
||||
# is missing (live-now cards sometimes elide it).
|
||||
MOTORSPORT_KEYWORDS = {
|
||||
"formula 1", "formula one", "f1",
|
||||
"motogp", "moto gp", "moto2", "moto3",
|
||||
"rally", "wrc",
|
||||
"indycar", "indy car",
|
||||
"nascar",
|
||||
"le mans", "lemans", "wec", "endurance",
|
||||
}
|
||||
GP_KEYWORD = "grand prix"
|
||||
|
||||
|
||||
@dataclass
|
||||
class _PitsportEvent:
|
||||
"""An event discovered from the Pitsport site."""
|
||||
|
||||
category: str
|
||||
title: str
|
||||
watch_uuid: str
|
||||
|
||||
|
||||
def _is_motorsport_category(category: str) -> bool:
|
||||
"""Check if a category string matches an included motorsport series."""
|
||||
return category.strip().lower() in MOTORSPORT_CATEGORIES
|
||||
|
||||
|
||||
def _is_motorsport_event(category: str, title: str) -> bool:
|
||||
"""Accept anything pitsport.xyz lists. Pitsport curates sports
|
||||
broadcasts (WRC, MotoGP, IndyCar, NASCAR, Premier League Darts,
|
||||
Premier League football, etc.) — the site's own selection is the
|
||||
filter we want. Empty/garbage events still get filtered downstream
|
||||
when `_resolve_event_streams` produces no playable URL."""
|
||||
return bool(category or title)
|
||||
|
||||
|
||||
# Aliases kept so older call-sites stay compiling. Both now point at the
|
||||
# broadened motorsport filter.
|
||||
_is_f1_category = _is_motorsport_category
|
||||
_is_f1_event = _is_motorsport_event
|
||||
|
||||
|
||||
def _decode_rsc_payload(html: str) -> str:
|
||||
"""Concatenate and unescape all `self.__next_f.push([1, "..."])` chunks.
|
||||
|
||||
Next.js RSC ships its tree as escape-encoded strings inside repeated
|
||||
`self.__next_f.push` calls. Regex over the raw HTML misses everything
|
||||
interesting; we have to decode unicode escapes first.
|
||||
"""
|
||||
chunks = re.findall(r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', html, re.DOTALL)
|
||||
if not chunks:
|
||||
return ""
|
||||
payload = ""
|
||||
for chunk in chunks:
|
||||
try:
|
||||
payload += chunk.encode().decode("unicode_escape")
|
||||
except Exception:
|
||||
payload += chunk
|
||||
return payload
|
||||
|
||||
|
||||
def _parse_live_events(html: str) -> list[_PitsportEvent]:
|
||||
"""Parse live events from the main page (or `/live-now`) RSC payload.
|
||||
|
||||
The pages embed event cards inside the Next.js RSC payload; the raw
|
||||
HTML keeps it escape-encoded so we decode first, then match.
|
||||
Two shapes are common:
|
||||
1) Older card props: "category":"...","title":"..." next to
|
||||
"href":"/watch/UUID".
|
||||
2) Newer `event` prop: an `event` object with `uri:"/watch/UUID"`
|
||||
carrying `category` and `title`.
|
||||
"""
|
||||
payload = _decode_rsc_payload(html) or html
|
||||
|
||||
events: list[_PitsportEvent] = []
|
||||
|
||||
href_pattern = re.compile(
|
||||
r'"href":"(/watch/([0-9a-f-]{36}))"[^}]*?"category":"([^"]+)","title":"([^"]+)"',
|
||||
)
|
||||
for match in href_pattern.finditer(payload):
|
||||
_, uuid, category, title = match.groups()
|
||||
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
|
||||
|
||||
event_pattern = re.compile(
|
||||
r'"event":\{[^{}]*?"title":"([^"]+)"[^{}]*?"uri":"/watch/([0-9a-f-]{36})"[^{}]*?"category":"([^"]+)"',
|
||||
)
|
||||
for match in event_pattern.finditer(payload):
|
||||
title, uuid, category = match.groups()
|
||||
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
|
||||
|
||||
event_pattern_alt = re.compile(
|
||||
r'"event":\{[^{}]*?"category":"([^"]+)"[^{}]*?"title":"([^"]+)"[^{}]*?"uri":"/watch/([0-9a-f-]{36})"',
|
||||
)
|
||||
for match in event_pattern_alt.finditer(payload):
|
||||
category, title, uuid = match.groups()
|
||||
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
|
||||
|
||||
return events
|
||||
|
||||
|
||||
def _parse_schedule_events(html: str) -> list[_PitsportEvent]:
|
||||
"""Parse events from the schedule page.
|
||||
|
||||
The schedule page groups events under category headers (h2 elements).
|
||||
In the rendered HTML:
|
||||
<h2 ...>Formula 1</h2>
|
||||
<div ...>
|
||||
<a href="/watch/UUID">...</a>
|
||||
...
|
||||
</div>
|
||||
|
||||
In the RSC payload, similar structure with section divs containing
|
||||
a category h2 and child event links with titles.
|
||||
"""
|
||||
events: list[_PitsportEvent] = []
|
||||
|
||||
# Strategy 1: Parse from rendered HTML
|
||||
# Find category sections: >CategoryName</h2> followed by watch links
|
||||
# Split HTML at each category header
|
||||
section_pattern = re.compile(
|
||||
r'>([^<]+)</h2>\s*<div[^>]*class="flex flex-wrap gap-6">(.*?)(?=</div>\s*</div>\s*(?:<div|</div>|$))',
|
||||
re.DOTALL,
|
||||
)
|
||||
for section_match in section_pattern.finditer(html):
|
||||
category = section_match.group(1).strip()
|
||||
section_html = section_match.group(2)
|
||||
|
||||
# Find all watch links in this section
|
||||
link_pattern = re.compile(
|
||||
r'href="/watch/([0-9a-f-]{36})".*?<h1[^>]*>([^<]+)</h1>',
|
||||
re.DOTALL,
|
||||
)
|
||||
for link_match in link_pattern.finditer(section_html):
|
||||
uuid = link_match.group(1)
|
||||
title = link_match.group(2).strip()
|
||||
events.append(
|
||||
_PitsportEvent(category=category, title=title, watch_uuid=uuid)
|
||||
)
|
||||
|
||||
# Strategy 2: Parse from RSC payload if rendered HTML didn't yield results
|
||||
# The RSC payload has patterns like:
|
||||
# "children":"Formula 1"}] ... "/watch/UUID" ... "title":"EventTitle"
|
||||
if not events:
|
||||
events = _parse_schedule_rsc(html)
|
||||
|
||||
return events
|
||||
|
||||
|
||||
def _parse_schedule_rsc(html: str) -> list[_PitsportEvent]:
|
||||
"""Parse events from schedule page RSC payload as fallback.
|
||||
|
||||
Extracts category section divs from the RSC JSON structure.
|
||||
"""
|
||||
events: list[_PitsportEvent] = []
|
||||
|
||||
# Find the RSC payload chunks
|
||||
rsc_chunks = re.findall(
|
||||
r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', html, re.DOTALL
|
||||
)
|
||||
if not rsc_chunks:
|
||||
return events
|
||||
|
||||
# Concatenate and unescape
|
||||
full_payload = ""
|
||||
for chunk in rsc_chunks:
|
||||
try:
|
||||
full_payload += chunk.encode().decode("unicode_escape")
|
||||
except Exception:
|
||||
full_payload += chunk
|
||||
|
||||
# Find category sections in the RSC data
|
||||
# Pattern: "children":"CategoryName"}],["$","div",...watch links...
|
||||
# Each section div contains an h2 with the category name and watch links
|
||||
cat_pattern = re.compile(
|
||||
r'border-gray-700 pb-2","children":"([^"]+)"\}.*?'
|
||||
r'(?=border-gray-700 pb-2","children"|$)',
|
||||
re.DOTALL,
|
||||
)
|
||||
for cat_match in cat_pattern.finditer(full_payload):
|
||||
category = cat_match.group(1)
|
||||
section_text = cat_match.group(0)
|
||||
|
||||
# Find watch UUIDs and titles in this section
|
||||
# Pattern: "/watch/UUID" ... "title":"EventTitle"
|
||||
event_pattern = re.compile(
|
||||
r'/watch/([0-9a-f-]{36}).*?"title":"([^"]+)"',
|
||||
)
|
||||
for ev_match in event_pattern.finditer(section_text):
|
||||
uuid = ev_match.group(1)
|
||||
title = ev_match.group(2)
|
||||
events.append(
|
||||
_PitsportEvent(category=category, title=title, watch_uuid=uuid)
|
||||
)
|
||||
|
||||
return events
|
||||
|
||||
|
||||
def _parse_embed_uuids(html: str) -> list[str]:
|
||||
"""Extract embed UUIDs from a watch page.
|
||||
|
||||
Watch pages contain iframes like:
|
||||
<iframe src="https://pushembdz.store/embed/{EMBED_UUID}" ...>
|
||||
|
||||
And in the RSC payload:
|
||||
"iframe":"https://pushembdz.store/embed/{EMBED_UUID}"
|
||||
"""
|
||||
uuids: list[str] = []
|
||||
|
||||
# From rendered HTML
|
||||
iframe_pattern = re.compile(
|
||||
r'pushembdz\.store/embed/([0-9a-f-]{36})',
|
||||
)
|
||||
for match in iframe_pattern.finditer(html):
|
||||
uuid = match.group(1)
|
||||
if uuid not in uuids:
|
||||
uuids.append(uuid)
|
||||
|
||||
return uuids
|
||||
|
||||
|
||||
@dataclass
|
||||
class _StreamConfig:
|
||||
"""Stream configuration extracted from an embed page."""
|
||||
|
||||
title: str
|
||||
link: str
|
||||
method: str
|
||||
|
||||
|
||||
def _parse_stream_config(html: str) -> _StreamConfig | None:
|
||||
"""Extract stream config from an embed page RSC payload.
|
||||
|
||||
The embed page now uses a `safeStream` payload that elides the link:
|
||||
4:["$","$Ld",null,{"safeStream":{"title":"Rally TV","method":"jwp"},
|
||||
"error":null,"slug":"..."}]
|
||||
The actual stream URL is fetched at runtime via
|
||||
pushembdz.store/api/stream/<slug>. Older payloads used "stream" with
|
||||
inline title+link+method — kept as fallback.
|
||||
"""
|
||||
# Current format: safeStream with title + method only (link via API).
|
||||
pattern_safe = re.compile(
|
||||
r'\\?"safeStream\\?"\s*:\s*\{'
|
||||
r'\\?"title\\?"\s*:\s*\\?"([^"\\]+)\\?"\s*,\s*'
|
||||
r'\\?"method\\?"\s*:\s*\\?"([^"\\]+)\\?"',
|
||||
)
|
||||
match = pattern_safe.search(html)
|
||||
if match:
|
||||
return _StreamConfig(
|
||||
title=match.group(1),
|
||||
link="", # filled in by the caller via the api/stream endpoint
|
||||
method=match.group(2),
|
||||
)
|
||||
|
||||
# Legacy: escaped RSC payload with inline link.
|
||||
pattern = re.compile(
|
||||
r'"stream":\{["\']?\\?"title\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
|
||||
r'["\']?\\?"link\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
|
||||
r'["\']?\\?"method\\?"["\']?:["\']?\\?"([^"\\]+)\\?"',
|
||||
)
|
||||
match = pattern.search(html)
|
||||
if match:
|
||||
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
|
||||
|
||||
pattern2 = re.compile(
|
||||
r'\\?"stream\\?":\{\\?"title\\?":\\?"([^\\]+)\\?",'
|
||||
r'\\?"link\\?":\\?"([^\\]+)\\?",'
|
||||
r'\\?"method\\?":\\?"([^\\]+)\\?"',
|
||||
)
|
||||
match = pattern2.search(html)
|
||||
if match:
|
||||
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
|
||||
|
||||
pattern3 = re.compile(
|
||||
r'"stream"\s*:\s*\{\s*"title"\s*:\s*"([^"]+)"\s*,'
|
||||
r'\s*"link"\s*:\s*"([^"]+)"\s*,'
|
||||
r'\s*"method"\s*:\s*"([^"]+)"',
|
||||
)
|
||||
match = pattern3.search(html)
|
||||
if match:
|
||||
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _is_m3u8_method(method: str) -> bool:
|
||||
"""Check if the stream method indicates a direct HLS stream."""
|
||||
# `jwp` (current pushembdz format) returns an m3u8 from the api/stream
|
||||
# endpoint regardless of player UI; treat it as HLS.
|
||||
return method.lower() in ("player", "hls", "jwp")
|
||||
|
||||
|
||||
def _extract_m3u8_url(link: str) -> str:
|
||||
"""Pass through the link from pushembdz's `api/stream/<slug>` response.
|
||||
|
||||
The host has rotated over time (serveplay.site → oe1.ossfeed.store →
|
||||
…); the response is always a master playlist URL we hand to the
|
||||
player as-is. Content-Type may be `text/css` or `application/json` —
|
||||
treat as HLS based on body sniffing (`#EXTM3U`), not MIME.
|
||||
"""
|
||||
return link
|
||||
|
||||
|
||||
class PitsportExtractor(BaseExtractor):
|
||||
"""Extracts F1 streams from Pitsport.xyz.
|
||||
|
||||
Scrapes the Next.js RSC payload from the main page and schedule page
|
||||
to find F1 events, then resolves embed UUIDs to stream configurations.
|
||||
"""
|
||||
|
||||
@property
|
||||
def site_key(self) -> str:
|
||||
return "pitsport"
|
||||
|
||||
@property
|
||||
def site_name(self) -> str:
|
||||
return "Pitsport"
|
||||
|
||||
async def extract(self) -> list[ExtractedStream]:
|
||||
"""Fetch F1 events and return stream URLs or embed URLs."""
|
||||
streams: list[ExtractedStream] = []
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(
|
||||
timeout=20.0,
|
||||
follow_redirects=True,
|
||||
headers={"User-Agent": USER_AGENT},
|
||||
) as client:
|
||||
# Fetch both pages to get comprehensive event data
|
||||
events = await self._discover_events(client)
|
||||
logger.info(
|
||||
"[pitsport] Found %d F1 event(s) to process", len(events)
|
||||
)
|
||||
|
||||
# Deduplicate by watch UUID
|
||||
seen_uuids: set[str] = set()
|
||||
unique_events: list[_PitsportEvent] = []
|
||||
for ev in events:
|
||||
if ev.watch_uuid not in seen_uuids:
|
||||
seen_uuids.add(ev.watch_uuid)
|
||||
unique_events.append(ev)
|
||||
|
||||
# For each event, resolve streams
|
||||
for event in unique_events:
|
||||
event_streams = await self._resolve_event_streams(
|
||||
client, event
|
||||
)
|
||||
streams.extend(event_streams)
|
||||
|
||||
except Exception:
|
||||
logger.exception("[pitsport] Failed to extract streams")
|
||||
|
||||
logger.info("[pitsport] Extracted %d stream(s)", len(streams))
|
||||
return streams
|
||||
|
||||
    async def _discover_events(
        self, client: httpx.AsyncClient
    ) -> list[_PitsportEvent]:
        """Discover F1 events from the main, live-now and schedule pages."""
        all_events: list[_PitsportEvent] = []

        # Fetch main page for live events
        try:
            resp = await client.get(PITSPORT_BASE)
            if resp.status_code == 200:
                live_events = _parse_live_events(resp.text)
                logger.info(
                    "[pitsport] Main page: %d live event(s)", len(live_events)
                )
                for ev in live_events:
                    if _is_f1_event(ev.category, ev.title):
                        all_events.append(ev)
            else:
                logger.warning(
                    "[pitsport] Main page returned HTTP %d", resp.status_code
                )
        except Exception:
            logger.exception("[pitsport] Failed to fetch main page")

        # Fetch /live-now — canonical "currently live" list, added 2026.
        try:
            resp = await client.get(f"{PITSPORT_BASE}/live-now")
            if resp.status_code == 200:
                live_now_events = _parse_live_events(resp.text)
                logger.info(
                    "[pitsport] Live-now page: %d event(s)", len(live_now_events)
                )
                for ev in live_now_events:
                    if _is_f1_event(ev.category, ev.title):
                        all_events.append(ev)
            else:
                logger.warning(
                    "[pitsport] Live-now page returned HTTP %d", resp.status_code
                )
        except Exception:
            logger.exception("[pitsport] Failed to fetch live-now page")

        # Fetch schedule page for upcoming events
        try:
            resp = await client.get(f"{PITSPORT_BASE}/schedule")
            if resp.status_code == 200:
                schedule_events = _parse_schedule_events(resp.text)
                logger.info(
                    "[pitsport] Schedule page: %d total event(s)",
                    len(schedule_events),
                )
                for ev in schedule_events:
                    if _is_f1_event(ev.category, ev.title):
                        all_events.append(ev)
            else:
                logger.warning(
                    "[pitsport] Schedule page returned HTTP %d",
                    resp.status_code,
                )
        except Exception:
            logger.exception("[pitsport] Failed to fetch schedule page")

        return all_events

    async def _resolve_event_streams(
        self, client: httpx.AsyncClient, event: _PitsportEvent
    ) -> list[ExtractedStream]:
        """Resolve an event's watch page to actual stream URLs."""
        streams: list[ExtractedStream] = []

        try:
            # Fetch the watch page to get embed UUIDs
            watch_url = f"{PITSPORT_BASE}/watch/{event.watch_uuid}"
            resp = await client.get(watch_url)
            if resp.status_code != 200:
                logger.debug(
                    "[pitsport] Watch page %s returned HTTP %d",
                    event.watch_uuid,
                    resp.status_code,
                )
                return []

            embed_uuids = _parse_embed_uuids(resp.text)
            if not embed_uuids:
                logger.debug(
                    "[pitsport] No embed UUIDs found for %s", event.watch_uuid
                )
                return []

            logger.debug(
                "[pitsport] Event '%s' has %d embed(s)",
                event.title,
                len(embed_uuids),
            )

            # Resolve each embed to a stream config
            for i, embed_uuid in enumerate(embed_uuids):
                stream = await self._resolve_embed(
                    client, embed_uuid, event, stream_num=i + 1
                )
                if stream:
                    streams.append(stream)

        except Exception:
            logger.debug(
                "[pitsport] Failed to resolve event %s",
                event.watch_uuid,
                exc_info=True,
            )

        return streams

    async def _resolve_embed(
        self,
        client: httpx.AsyncClient,
        embed_uuid: str,
        event: _PitsportEvent,
        stream_num: int,
    ) -> ExtractedStream | None:
        """Resolve an embed UUID to a stream configuration."""
        try:
            embed_url = f"{EMBED_BASE}/embed/{embed_uuid}"
            resp = await client.get(embed_url)
            if resp.status_code != 200:
                logger.debug(
                    "[pitsport] Embed page %s returned HTTP %d",
                    embed_uuid,
                    resp.status_code,
                )
                return None

            config = _parse_stream_config(resp.text)
            if not config:
                logger.debug(
                    "[pitsport] No stream config found in embed %s",
                    embed_uuid,
                )
                return None

            # Build the stream title
            stream_title = f"{event.category} - {event.title}"
            if config.title:
                stream_title += f" ({config.title})"
            if stream_num > 1:
                stream_title += f" #{stream_num}"

            # `safeStream` payload elides the link — fetch it from the
            # pushembdz.store/api/stream/<slug> endpoint. Older `stream`
            # payloads provided the link inline.
            link = config.link
            if not link and _is_m3u8_method(config.method):
                api_url = f"{EMBED_BASE}/api/stream/{embed_uuid}"
                try:
                    api_resp = await client.get(
                        api_url,
                        headers={"Referer": embed_url, "Accept": "application/json"},
                    )
                    if api_resp.status_code == 200:
                        link = (api_resp.json() or {}).get("link", "")
                except Exception:
                    logger.debug(
                        "[pitsport] api/stream lookup failed for %s",
                        embed_uuid,
                        exc_info=True,
                    )

            # Treat any HLS-ish URL (m3u8, or pushembdz's .css disguise) as m3u8.
            looks_hls = link and (
                ".m3u8" in link or link.endswith(".css") or "serveplay.site" in link
            )
            if _is_m3u8_method(config.method) and looks_hls:
                return ExtractedStream(
                    url=link,
                    site_key=self.site_key,
                    site_name=self.site_name,
                    quality="",
                    title=stream_title,
                    stream_type="m3u8",
                )
            else:
                # Iframe embed fallback
                return ExtractedStream(
                    url=embed_url,
                    site_key=self.site_key,
                    site_name=self.site_name,
                    quality="",
                    title=stream_title,
                    stream_type="embed",
                    embed_url=embed_url,
                )

        except Exception:
            logger.debug(
                "[pitsport] Failed to resolve embed %s",
                embed_uuid,
                exc_info=True,
            )
            return None

273
stacks/f1-stream/files/backend/extractors/ppv.py
Normal file
@ -0,0 +1,273 @@
"""PPV.to extractor - fetches F1 streams via the public PPV API.

Returns embed URLs (pooembed.eu) for iframe playback.
The API at api.ppv.to/api/streams requires no authentication.
Falls back to api.ppv.st if the primary API is unreachable.
"""

import logging

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

PRIMARY_API = "https://api.ppv.to/api/streams"
FALLBACK_API = "https://api.ppv.st/api/streams"
EMBED_BASE = "https://pooembed.eu/embed"

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# Category name for motorsport on PPV.to
MOTORSPORT_CATEGORY = "motorsports"

# Only include events matching these keywords (case-insensitive)
F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1"}
# Grand Prix is shared with MotoGP/IndyCar — only match if no other series keywords
GP_KEYWORD = "grand prix"
NON_F1_KEYWORDS = {
    "motogp", "moto gp", "moto2", "moto3", "motoe",
    "indycar", "indy car", "firestone", "nascar",
    "rally", "wrc", "wec", "lemans", "le mans",
    "superbike", "dtm", "supercars",
}


def _is_f1_stream(name: str, category_name: str = "") -> bool:
    """Check if a stream is Formula 1 related.

    Checks both the stream name and the category name. A stream is rejected
    if its name contains a non-F1 series keyword; otherwise it qualifies if:
    - its name matches an F1 keyword, OR
    - its name contains "grand prix" and it is in the motorsport category, OR
    - it is in the motorsport category and the category name matches an F1 keyword.
    """
    lower_name = name.lower()
    lower_cat = category_name.lower()

    # Reject if it contains non-F1 motorsport keywords
    if any(kw in lower_name for kw in NON_F1_KEYWORDS):
        return False

    # Direct F1 keyword match in the stream name
    if any(kw in lower_name for kw in F1_KEYWORDS):
        return True

    # "grand prix" in the name, only if in motorsports category and no non-F1 keywords
    if GP_KEYWORD in lower_name and MOTORSPORT_CATEGORY in lower_cat:
        return True

    # If the category is motorsport, also check category-level keywords
    if MOTORSPORT_CATEGORY in lower_cat and any(kw in lower_cat for kw in F1_KEYWORDS):
        return True

    return False


class PPVExtractor(BaseExtractor):
    """Extracts embed URLs from PPV.to's public JSON API.

    Uses the endpoint:
    - GET https://api.ppv.to/api/streams -> all streams grouped by category
    - Fallback: https://api.ppv.st/api/streams

    Each stream object contains an `iframe` field with the embed URL,
    or a `uri_name` from which the embed URL can be constructed.
    """

    @property
    def site_key(self) -> str:
        return "ppv"

    @property
    def site_name(self) -> str:
        return "PPV.to"

    async def _fetch_streams(self, client: httpx.AsyncClient) -> dict | None:
        """Try primary and fallback APIs, return parsed JSON or None."""
        for api_url in (PRIMARY_API, FALLBACK_API):
            try:
                resp = await client.get(api_url)
                if resp.status_code == 200:
                    data = resp.json()
                    logger.info("[ppv] Fetched streams from %s", api_url)
                    return data
                logger.warning(
                    "[ppv] %s returned HTTP %d", api_url, resp.status_code
                )
            except Exception:
                logger.debug(
                    "[ppv] Failed to reach %s", api_url, exc_info=True
                )
        return None

    async def extract(self) -> list[ExtractedStream]:
        """Fetch F1 streams and return embed URLs for iframe playback."""
        streams: list[ExtractedStream] = []

        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
            ) as client:
                data = await self._fetch_streams(client)
                if data is None:
                    logger.warning("[ppv] Could not fetch streams from any API")
                    return []

                # The API returns:
                # { "streams": [ { "category": "Name", "id": N, "streams": [...] }, ... ] }
                # Flatten into (category_name, stream_obj) tuples.
                all_streams = self._normalize_streams(data)

                logger.info(
                    "[ppv] Found %d total stream(s) across all categories",
                    len(all_streams),
                )

                for category_name, stream_obj in all_streams:
                    name = stream_obj.get("name", "") or stream_obj.get("title", "")

                    if not _is_f1_stream(name, category_name):
                        continue

                    # Build the embed URL
                    embed_url = self._get_embed_url(stream_obj)
                    if not embed_url:
                        logger.debug("[ppv] No embed URL for stream: %s", name)
                        continue

                    # Extract quality from tag if present
                    tag = stream_obj.get("tag", "")
                    quality = tag if tag else ""

                    # Build descriptive title. A non-numeric viewer count is
                    # ignored instead of aborting the whole extraction run.
                    title = name
                    try:
                        viewers = int(stream_obj.get("viewers") or 0)
                    except (TypeError, ValueError):
                        viewers = 0
                    if viewers > 0:
                        title += f" ({viewers} viewers)"

                    # Always emit the parent stream — substreams are
                    # additional language/source variants, not replacements.
                    streams.append(
                        ExtractedStream(
                            url=embed_url,
                            site_key=self.site_key,
                            site_name=self.site_name,
                            quality=quality,
                            title=title,
                            stream_type="embed",
                            embed_url=embed_url,
                        )
                    )

                    substreams = stream_obj.get("substreams")
                    if isinstance(substreams, list):
                        for i, sub in enumerate(substreams):
                            if not isinstance(sub, dict):
                                continue  # skip malformed entries
                            sub_embed = sub.get("iframe", "") or sub.get("embed_url", "")
                            if not sub_embed:
                                sub_embed = embed_url
                            sub_name = (
                                sub.get("source_tag", "")
                                or sub.get("name", "")
                                or sub.get("label", "")
                            )
                            sub_quality = sub.get("tag", "") or sub.get("quality", "") or quality
                            sub_title = f"{name}"
                            if sub_name:
                                sub_title += f" - {sub_name}"
                            else:
                                sub_title += f" #{i + 2}"

                            streams.append(
                                ExtractedStream(
                                    url=sub_embed,
                                    site_key=self.site_key,
                                    site_name=self.site_name,
                                    quality=sub_quality,
                                    title=sub_title,
                                    stream_type="embed",
                                    embed_url=sub_embed,
                                )
                            )

        except Exception:
            logger.exception("[ppv] Failed to extract streams")

        logger.info("[ppv] Extracted %d F1 stream(s)", len(streams))
        return streams

    @staticmethod
    def _normalize_streams(data: dict | list) -> list[tuple[str, dict]]:
        """Normalize the API response into a flat list of (category_name, stream_dict) tuples.

        The PPV API returns data in this shape:
        {
            "streams": [
                {
                    "category": "Motorsports",
                    "id": 35,
                    "streams": [ { stream objects... } ]
                },
                ...
            ]
        }

        Each category group has a "category" string and a nested "streams" list.
        An item with no nested "streams" list but with a "name" is treated as a
        stream itself (flat list format).
        """
        result: list[tuple[str, dict]] = []

        # Handle the top-level wrapper
        if isinstance(data, dict):
            categories = data.get("streams", [])
        elif isinstance(data, list):
            categories = data
        else:
            return result

        for category_group in categories:
            if not isinstance(category_group, dict):
                continue

            category_name = category_group.get("category", "")

            # The nested streams within this category. No default here, so a
            # flat-list item (which has no "streams" key) reaches the fallback.
            inner_streams = category_group.get("streams")
            if isinstance(inner_streams, list):
                for stream_obj in inner_streams:
                    if isinstance(stream_obj, dict):
                        # Attach category_name to each stream for filtering
                        result.append((category_name, stream_obj))
            elif "name" in category_group:
                # Fallback: the item itself is a stream (flat list format)
                result.append((category_name, category_group))

        return result

    @staticmethod
    def _get_embed_url(stream: dict) -> str:
        """Extract or construct the embed URL for a stream."""
        # Prefer the iframe field directly
        iframe = stream.get("iframe", "")
        if iframe:
            return iframe

        # Construct from uri_name
        uri_name = stream.get("uri_name", "") or stream.get("uri", "")
        if uri_name:
            # Strip leading slash if present
            uri_name = uri_name.lstrip("/")
            return f"{EMBED_BASE}/{uri_name}"

        # Last resort: use the stream id
        stream_id = stream.get("id")
        if stream_id:
            return f"{EMBED_BASE}/{stream_id}"

        return ""

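The flattening and keyword filtering in `ppv.py` can be tried offline. The sketch below re-implements `_normalize_streams` and `_is_f1_stream` as standalone functions (`flatten` and `is_f1_stream` are hypothetical names, and the keyword sets are trimmed for brevity) and runs them on an invented payload shaped like the one in the `_normalize_streams` docstring; the stream names and URIs are made up. `flatten` treats an item that has no nested `streams` list but has a `name` as a stream in its own right, i.e. the flat-list shape.

```python
MOTORSPORT_CATEGORY = "motorsports"
F1_KEYWORDS = {"formula 1", "formula one", "f1"}          # trimmed for the sketch
NON_F1_KEYWORDS = {"motogp", "indycar", "nascar"}         # trimmed for the sketch
GP_KEYWORD = "grand prix"


def is_f1_stream(name: str, category_name: str = "") -> bool:
    lower_name, lower_cat = name.lower(), category_name.lower()
    if any(kw in lower_name for kw in NON_F1_KEYWORDS):
        return False
    if any(kw in lower_name for kw in F1_KEYWORDS):
        return True
    in_motorsport = MOTORSPORT_CATEGORY in lower_cat
    if GP_KEYWORD in lower_name and in_motorsport:
        return True
    return in_motorsport and any(kw in lower_cat for kw in F1_KEYWORDS)


def flatten(data) -> list[tuple[str, dict]]:
    """Turn grouped (or flat) API data into (category, stream) pairs."""
    groups = data.get("streams", []) if isinstance(data, dict) else data
    pairs: list[tuple[str, dict]] = []
    if not isinstance(groups, list):
        return pairs
    for group in groups:
        if not isinstance(group, dict):
            continue
        category = group.get("category", "")
        inner = group.get("streams")
        if isinstance(inner, list):
            pairs.extend((category, s) for s in inner if isinstance(s, dict))
        elif "name" in group:  # flat list: the item itself is a stream
            pairs.append((category, group))
    return pairs


SAMPLE = {
    "streams": [
        {"category": "Motorsports", "id": 35, "streams": [
            {"name": "Spanish Grand Prix", "uri_name": "f1/spain"},
            {"name": "MotoGP Grand Prix of Qatar", "uri_name": "motogp/qatar"},
        ]},
        {"category": "Football", "id": 1, "streams": [
            {"name": "Team A vs Team B", "uri_name": "football/a-b"},
        ]},
    ]
}

f1_names = [s["name"] for cat, s in flatten(SAMPLE) if is_f1_stream(s["name"], cat)]
```

Only "Spanish Grand Prix" survives: it has no F1 keyword of its own, but "grand prix" inside the motorsport category is enough, while the MotoGP entry is rejected by the series blocklist before the Grand Prix rule is considered.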
116
stacks/f1-stream/files/backend/extractors/registry.py
Normal file
@ -0,0 +1,116 @@
"""Central registry for stream extractors."""

import asyncio
import logging
from datetime import datetime, timezone

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)


class ExtractorRegistry:
    """Central registry for all site extractors.

    Manages extractor instances and provides fan-out extraction across
    all registered extractors with independent error handling.
    """

    def __init__(self) -> None:
        self._extractors: dict[str, BaseExtractor] = {}

    def register(self, extractor: BaseExtractor) -> None:
        """Register an extractor instance.

        Args:
            extractor: A BaseExtractor subclass instance.

        Raises:
            ValueError: If an extractor with the same site_key is already registered.
        """
        key = extractor.site_key
        if key in self._extractors:
            raise ValueError(
                f"Extractor with site_key '{key}' is already registered "
                f"(existing: {self._extractors[key].site_name}, "
                f"new: {extractor.site_name})"
            )
        self._extractors[key] = extractor
        logger.info("Registered extractor: %s (%s)", extractor.site_name, key)

    def get(self, site_key: str) -> BaseExtractor | None:
        """Get an extractor by its site_key.

        Args:
            site_key: The unique identifier of the extractor.

        Returns:
            The extractor instance, or None if not found.
        """
        return self._extractors.get(site_key)

    def list_extractors(self) -> list[dict]:
        """List all registered extractors.

        Returns:
            A list of dicts with site_key and site_name for each extractor.
        """
        return [
            {"site_key": ext.site_key, "site_name": ext.site_name}
            for ext in self._extractors.values()
        ]

    async def extract_all(self) -> list[ExtractedStream]:
        """Fan-out extraction to all registered extractors concurrently.

        Each extractor runs independently. If one fails, the others
        continue and their results are still collected.

        Returns:
            Combined list of ExtractedStream from all extractors.
        """
        if not self._extractors:
            logger.warning("No extractors registered, nothing to extract")
            return []

        logger.info(
            "Running extraction across %d extractor(s): %s",
            len(self._extractors),
            ", ".join(self._extractors.keys()),
        )

        async def _safe_extract(extractor: BaseExtractor) -> list[ExtractedStream]:
            """Run a single extractor with error isolation."""
            try:
                streams = await extractor.extract()
                # Fill in site_key/site_name/extracted_at if the extractor didn't set them
                now = datetime.now(timezone.utc).isoformat()
                for stream in streams:
                    if not stream.site_key:
                        stream.site_key = extractor.site_key
                    if not stream.site_name:
                        stream.site_name = extractor.site_name
                    if not stream.extracted_at:
                        stream.extracted_at = now
                logger.info(
                    "[%s] Extracted %d stream(s)", extractor.site_key, len(streams)
                )
                return streams
            except Exception:
                logger.exception(
                    "[%s] Extractor failed during extraction", extractor.site_key
                )
                return []

        # Run all extractors concurrently
        tasks = [_safe_extract(ext) for ext in self._extractors.values()]
        results = await asyncio.gather(*tasks)

        # Flatten results
        all_streams: list[ExtractedStream] = []
        for stream_list in results:
            all_streams.extend(stream_list)

        logger.info("Extraction complete: %d total stream(s) found", len(all_streams))
        return all_streams

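The fan-out in `extract_all` wraps each extractor so that an exception is logged and turned into an empty list, then runs the wrappers with `asyncio.gather`. That shape can be exercised without any network access. In this sketch `FakeExtractor`, its delays and the string "streams" are invented stand-ins; only the wrap-then-gather pattern is taken from the registry.

```python
import asyncio
import logging

logger = logging.getLogger("registry-sketch")


class FakeExtractor:
    """Hypothetical stand-in for a BaseExtractor subclass."""

    def __init__(self, site_key: str, streams: list[str],
                 fail: bool = False, delay: float = 0.0) -> None:
        self.site_key = site_key
        self._streams = streams
        self._fail = fail
        self._delay = delay

    async def extract(self) -> list[str]:
        await asyncio.sleep(self._delay)  # simulate network latency
        if self._fail:
            raise RuntimeError(f"{self.site_key} is down")
        return list(self._streams)


async def extract_all(extractors: list[FakeExtractor]) -> list[str]:
    async def safe_extract(ext: FakeExtractor) -> list[str]:
        try:
            return await ext.extract()
        except Exception:
            # Same isolation as the registry: log it and contribute nothing.
            logger.exception("[%s] extractor failed", ext.site_key)
            return []

    # gather() returns results in input order, regardless of completion order.
    results = await asyncio.gather(*(safe_extract(e) for e in extractors))
    return [s for batch in results for s in batch]


streams = asyncio.run(extract_all([
    FakeExtractor("a", ["a1", "a2"], delay=0.02),
    FakeExtractor("b", [], fail=True),
    FakeExtractor("c", ["c1"], delay=0.01),
]))
```

Extractor `c` finishes before `a`, yet the combined list keeps `a`'s streams first because `gather` preserves argument order; the failing `b` only costs its own results.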
270
stacks/f1-stream/files/backend/extractors/service.py
Normal file
@ -0,0 +1,270 @@
"""Extraction service - manages the extraction lifecycle: polling, caching, health checking, serving."""

import logging
from datetime import datetime, timezone

from backend.extractors.models import ExtractedStream
from backend.extractors.registry import ExtractorRegistry
from backend.health import StreamHealthChecker
from backend.playback_verifier import PlaybackVerifier

logger = logging.getLogger(__name__)


class ExtractionService:
    """Manages the extraction lifecycle: polling, caching, health checking, and serving.

    Extraction runs on a background schedule (via APScheduler), never on
    the client request path. After extraction, health checks verify each
    stream is live. Results are cached in memory, keyed by site_key.

    GET /streams only returns streams that passed health checks, sorted by:
    1. is_live (live streams first)
    2. response_time_ms (fastest first)
    """

    def __init__(self, registry: ExtractorRegistry) -> None:
        self._registry = registry
        # Cache: site_key -> list of ExtractedStream
        self._cache: dict[str, list[ExtractedStream]] = {}
        self._last_run: str | None = None
        self._last_run_stream_count: int = 0
        self._health_checker = StreamHealthChecker()
        self._playback_verifier = PlaybackVerifier()

    async def shutdown(self) -> None:
        """Release the headless browser instance owned by the verifier."""
        await self._playback_verifier.shutdown()

    async def run_extraction(self) -> None:
        """Run all extractors, health-check results, and cache them.

        This is called by the background scheduler. Each extractor's
        results replace its previous cache entry entirely. After extraction,
        health checks are run to verify streams are live and measure
        response times.
        """
        logger.info("Starting extraction run...")
        start = datetime.now(timezone.utc)

        streams = await self._registry.extract_all()

        # Dedupe by canonical URL — pitsport surfaces every WRC stage as a
        # separate event but they all point at the same RallyTV master.m3u8
        # (and similar for MotoGP weekend sessions). Keep the first
        # occurrence so the user sees one entry per actual stream.
        deduped: list[ExtractedStream] = []
        seen_urls: set[str] = set()
        for stream in streams:
            key = (stream.embed_url or "").strip() or (stream.url or "").strip()
            if not key or key in seen_urls:
                continue
            seen_urls.add(key)
            deduped.append(stream)
        if len(deduped) < len(streams):
            logger.info(
                "Deduped streams: %d -> %d (collapsed %d duplicate URL(s))",
                len(streams), len(deduped), len(streams) - len(deduped),
            )
        streams = deduped

        # Run health checks + headless-browser playback verification.
        # Both stream types are now verified end-to-end so the user only
        # ever sees streams that actually play in a browser.
        if streams:
            m3u8_streams = [s for s in streams if s.stream_type != "embed"]
            embed_streams = [s for s in streams if s.stream_type == "embed"]

            # m3u8 streams: cheap structural health check (validates manifest,
            # checks first variant playlist), then a headless-browser test
            # to confirm hls.js can decode and render frames.
            if m3u8_streams:
                stream_dicts = [s.to_dict() for s in m3u8_streams]
                health_map = await self._health_checker.check_all(stream_dicts)
                for stream in m3u8_streams:
                    health = health_map.get(stream.url)
                    if health:
                        stream.response_time_ms = health.response_time_ms
                        stream.checked_at = health.checked_at
                        if health.bitrate > 0:
                            stream.bitrate = health.bitrate
                        # tentatively mark live; final word comes from the verifier
                        stream.is_live = health.is_live

            # Curated streams skip the verifier — they are hand-picked
            # 24/7 channels whose embed pages aggressively detect headless
            # automation. We can't reliably confirm playback server-side,
            # but we trust the curator. The user's real browser does NOT
            # trigger the same anti-bot heuristics (real plugins, real
            # mouse movements, etc.).
            CURATED_BYPASS = {"curated"}

            # Browser verification: applies to m3u8 streams that passed the
            # structural health check and to every non-curated embed (embeds
            # have no other way to be verified).
            verify_items: list[tuple[str, str]] = []
            for stream in m3u8_streams:
                if stream.is_live:
                    verify_items.append((stream.url, "m3u8"))
            for stream in embed_streams:
                if stream.site_key not in CURATED_BYPASS:
                    verify_items.append((stream.embed_url or stream.url, "embed"))

            verdicts = await self._playback_verifier.verify_many(verify_items)

            now_iso = datetime.now(timezone.utc).isoformat()
            for stream in m3u8_streams:
                if not stream.is_live:
                    continue  # already failed health check
                verdict = verdicts.get(stream.url)
                if verdict is None:
                    continue  # verifier disabled or unavailable
                stream.is_live = verdict.is_playable
                stream.checked_at = now_iso

            for stream in embed_streams:
                stream.checked_at = now_iso
                if stream.site_key in CURATED_BYPASS:
                    # Trusted without verification; never sent to the verifier.
                    stream.is_live = True
                    stream.response_time_ms = 0
                    continue
                key = stream.embed_url or stream.url
                verdict = verdicts.get(key)
                if verdict is None:
                    # Verifier unavailable — fall back to "trust extractor".
                    # This keeps the service usable even without playwright.
                    stream.is_live = True
                    stream.response_time_ms = 0
                else:
                    stream.is_live = verdict.is_playable
                    stream.response_time_ms = verdict.elapsed_ms

        # Group streams by site_key and update cache
        new_cache: dict[str, list[ExtractedStream]] = {}
        for stream in streams:
            new_cache.setdefault(stream.site_key, []).append(stream)

        # Replace cache for extractors that returned results.
        # Clear cache for extractors that returned nothing (site went down, etc.)
        for extractor_info in self._registry.list_extractors():
            key = extractor_info["site_key"]
            if key in new_cache:
                self._cache[key] = new_cache[key]
            else:
                # Extractor returned nothing - clear its cache
                self._cache.pop(key, None)

        self._last_run = start.isoformat()
        self._last_run_stream_count = len(streams)

        live_count = sum(
            1 for streams_list in self._cache.values()
            for s in streams_list if s.is_live
        )
        elapsed = (datetime.now(timezone.utc) - start).total_seconds()
        logger.info(
            "Extraction run complete: %d stream(s) from %d extractor(s) in %.1fs (%d live)",
            len(streams),
            len(new_cache),
            elapsed,
            live_count,
        )

    def get_streams(self) -> list[dict]:
        """Return all cached streams as a sorted list of dicts.

        Only returns streams that passed health checks (is_live=True).
        Sorted by fallback priority:
        1. is_live (live streams first) - filters to live only
        2. response_time_ms (fastest first)

        Returns:
            List of serialized ExtractedStream dicts from all extractors,
            filtered to live-only and sorted by response time.
        """
        all_streams: list[ExtractedStream] = []
        for streams in self._cache.values():
            all_streams.extend(streams)

        # Sort by fallback priority: live first, then fastest response
        all_streams.sort(
            key=lambda s: (not s.is_live, s.response_time_ms)
        )

        # Only return live streams to clients
        live_streams = [s for s in all_streams if s.is_live]
        return [s.to_dict() for s in live_streams]

    def get_all_streams_unfiltered(self) -> list[dict]:
        """Return ALL cached streams including unhealthy ones.

        Used for debugging and status endpoints. Sorted by fallback priority
        but includes streams that failed health checks.

        Returns:
            List of all serialized ExtractedStream dicts.
        """
        all_streams: list[ExtractedStream] = []
        for streams in self._cache.values():
            all_streams.extend(streams)

        # Sort by fallback priority: live first, then fastest response
        all_streams.sort(
            key=lambda s: (not s.is_live, s.response_time_ms)
        )

        return [s.to_dict() for s in all_streams]

    def get_streams_for_session(self, session_type: str) -> list[dict]:
        """Return cached streams filtered/annotated for a specific session type.

        Currently returns all live streams (extractors don't yet differentiate by
        session type). This method exists as a hook for future filtering,
        e.g., some extractors might only have race streams but not FP streams.

        Args:
            session_type: The F1 session type (e.g., "race", "qualifying", "fp1").

        Returns:
            List of serialized ExtractedStream dicts (live only, sorted).
        """
        # For now, all streams are potentially relevant to any session.
        # Future extractors may tag streams with session types, at which
        # point this method will filter accordingly.
        streams = self.get_streams()
        logger.debug(
            "Returning %d stream(s) for session type '%s'",
            len(streams),
            session_type,
        )
        return streams

    def get_status(self) -> dict:
        """Return extraction service status for the /extractors endpoint."""
        extractor_list = self._registry.list_extractors()
        extractor_statuses = []

        for info in extractor_list:
            key = info["site_key"]
            cached = self._cache.get(key, [])
            live_count = sum(1 for s in cached if s.is_live)
            extractor_statuses.append(
                {
                    "site_key": key,
                    "site_name": info["site_name"],
                    "cached_streams": len(cached),
                    "live_streams": live_count,
                }
            )

        total_cached = sum(len(streams) for streams in self._cache.values())
        total_live = sum(
            1 for streams in self._cache.values()
            for s in streams if s.is_live
        )

        return {
            "extractors": extractor_statuses,
            "total_cached_streams": total_cached,
            "total_live_streams": total_live,
            "last_run": self._last_run,
            "last_run_stream_count": self._last_run_stream_count,
        }

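`run_extraction` collapses duplicates by canonical URL (embed URL first, then stream URL; the first occurrence wins and URL-less entries are dropped), and `get_streams` serves only live entries ordered by response time. The sketch below reproduces that two-step pipeline on a trimmed, hypothetical `Stream` dataclass; its field names follow how `ExtractedStream` is used in the service, but the class itself is an assumption, not the real model.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """Hypothetical subset of ExtractedStream's fields."""
    url: str
    embed_url: str = ""
    is_live: bool = False
    response_time_ms: int = 0


def dedupe(streams: list[Stream]) -> list[Stream]:
    """Keep the first stream per canonical URL (embed_url, else url)."""
    seen: set[str] = set()
    out: list[Stream] = []
    for s in streams:
        key = (s.embed_url or "").strip() or (s.url or "").strip()
        if not key or key in seen:
            continue
        seen.add(key)
        out.append(s)
    return out


def live_sorted(streams: list[Stream]) -> list[Stream]:
    """Live streams only, fastest first (sorted() is stable for ties)."""
    ranked = sorted(streams, key=lambda s: (not s.is_live, s.response_time_ms))
    return [s for s in ranked if s.is_live]


raw = [
    Stream("https://rally.example/master.m3u8", is_live=True, response_time_ms=420),
    Stream("https://rally.example/master.m3u8", is_live=True, response_time_ms=90),
    Stream("https://a.example/x", embed_url="https://e.example/1",
           is_live=True, response_time_ms=150),
    Stream("https://b.example/y", is_live=False, response_time_ms=10),
    Stream(""),  # no URL at all: dropped by dedupe
]
served = live_sorted(dedupe(raw))
```

Because the first occurrence wins, the 420 ms copy of the duplicated manifest survives here. In the real service that cannot be a regression, since response times are only measured after deduplication; the pre-filled timings above are purely illustrative.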
125
stacks/f1-stream/files/backend/extractors/streamed.py
Normal file
@@ -0,0 +1,125 @@
"""Streamed.pk extractor - fetches F1/motorsport streams via public JSON API."""

import logging

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# Site renamed from streamed.su → streamed.pk in 2026; the .su domain
# stopped resolving the API host (only the marketing page is left).
BASE_URL = "https://streamed.pk"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


class StreamedExtractor(BaseExtractor):
    """Extracts streams from Streamed.pk's public JSON API.

    Uses two endpoints:
    - GET /api/matches/motor-sports → list of events with sources
    - GET /api/stream/{source}/{id} → embed URL for a specific source
    """

    @property
    def site_key(self) -> str:
        return "streamed"

    @property
    def site_name(self) -> str:
        return "Streamed"

    async def extract(self) -> list[ExtractedStream]:
        """Fetch motorsport events and resolve embed URLs for each source."""
        streams: list[ExtractedStream] = []

        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
            ) as client:
                # Get motorsport events
                resp = await client.get(f"{BASE_URL}/api/matches/motor-sports")
                if resp.status_code != 200:
                    logger.warning(
                        "[streamed] Events API returned HTTP %d", resp.status_code
                    )
                    return []

                events = resp.json()
                if not isinstance(events, list):
                    logger.warning("[streamed] Unexpected events response type")
                    return []

                logger.info("[streamed] Found %d motorsport event(s)", len(events))

                for event in events:
                    title = event.get("title", "Unknown Event")
                    sources = event.get("sources", [])
                    if not sources:
                        continue

                    for source_info in sources:
                        source_name = source_info.get("source", "")
                        source_id = source_info.get("id", "")
                        if not source_name or not source_id:
                            continue

                        try:
                            stream_resp = await client.get(
                                f"{BASE_URL}/api/stream/{source_name}/{source_id}"
                            )
                            if stream_resp.status_code != 200:
                                continue

                            stream_data = stream_resp.json()
                            if not isinstance(stream_data, list):
                                stream_data = [stream_data]

                            for item in stream_data:
                                embed_url = item.get("embedUrl", "")
                                if not embed_url:
                                    continue

                                language = item.get("language", "")
                                hd = item.get("hd", False)
                                stream_no = item.get("streamNo", 1)

                                quality = "HD" if hd else "SD"
                                stream_title = f"{title}"
                                if language:
                                    stream_title += f" ({language})"
                                if stream_no > 1:
                                    stream_title += f" #{stream_no}"

                                streams.append(
                                    ExtractedStream(
                                        url=embed_url,
                                        site_key=self.site_key,
                                        site_name=self.site_name,
                                        quality=quality,
                                        title=stream_title,
                                        stream_type="embed",
                                        embed_url=embed_url,
                                    )
                                )
                        except Exception:
                            logger.debug(
                                "[streamed] Failed to fetch stream for %s/%s",
                                source_name,
                                source_id,
                                exc_info=True,
                            )

        except Exception:
            logger.exception("[streamed] Failed to fetch events")

        logger.info("[streamed] Extracted %d stream(s)", len(streams))
        return streams
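The per-item rules in `StreamedExtractor.extract` (wrap a non-list payload, skip items without `embedUrl`, derive `HD`/`SD` from `hd`, append language and stream number to the title) can be pulled into a pure function and exercised without network access. This is a minimal sketch that assumes the `/api/stream/{source}/{id}` payload shape the extractor reads. `StreamRow` and `describe_streams` are hypothetical names that do not exist in the commit, and the `embed.example` URLs are placeholders.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StreamRow:
    """Hypothetical stand-in for the fields passed to ExtractedStream."""

    url: str
    quality: str
    title: str


def describe_streams(event_title: str, payload: Any) -> list[StreamRow]:
    """Apply the extractor's per-item rules to one source's API payload."""
    # The stream endpoint may return one object or a list; normalise to a list.
    items = payload if isinstance(payload, list) else [payload]
    rows: list[StreamRow] = []
    for item in items:
        embed_url = item.get("embedUrl", "")
        if not embed_url:
            continue  # nothing playable for this entry
        language = item.get("language", "")
        stream_no = item.get("streamNo", 1)
        title = event_title
        if language:
            title += f" ({language})"
        if stream_no > 1:
            title += f" #{stream_no}"
        quality = "HD" if item.get("hd", False) else "SD"
        rows.append(StreamRow(url=embed_url, quality=quality, title=title))
    return rows
```

For example, `describe_streams("Monaco GP", {"embedUrl": "https://embed.example/b"})` yields a single `SD` row titled `Monaco GP`. The bare object is wrapped in a list, and both the language and the stream-number suffix are omitted.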
161
stacks/f1-stream/files/backend/extractors/stremio.py
Normal file
@@ -0,0 +1,161 @@
"""Stremio-addon-driven extractor.

Stremio addons expose a public HTTP API: each addon has a manifest at
`<base>/manifest.json` and per-resource endpoints like
`<base>/stream/<type>/<id>.json` returning `{streams:[{url,name,...}]}`.

This extractor calls a curated set of live-TV addons that surface F1
and Sky-Sports-class motorsport channels. We treat each returned URL as
an ExtractedStream and let the playback verifier confirm playability.
We don't need a Stremio client — we just call the documented HTTP API.

Findings from initial research (2026-05-07):
- **TvVoo** (`tvvoo.hayd.uk`) — wraps the Vavoo IPTV network, lists
  Sky Sports F1 (UK + IT + DE), DAZN F1, Movistar F1, Canal+ F1,
  Viaplay F1. The returned m3u8 URLs are IP-bound at the Vavoo CDN
  (`*.ngolpdkyoctjcddxshli469r.org/sunshine/...`); they're tokenised
  to whichever IP fetched the manifest. Currently their SSL certs have
  expired which fails most clients — the addon framework is right but
  delivery is degraded today.
- **StremVerse** (`stremverse.onrender.com`) — returns 11+ streams per
  catalog id (`stremevent_591`=F1, `stremevent_866`=MotoGP). Mix of
  DRM-walled DASH, JW-Player-broken-chain JWT, and apar151 HuggingFace
  proxy URLs. Master playlists parse; variant URLs sometimes return 404
  if they're meant to be resolved by the addon's player rather than
  directly.

Adding a new addon = one entry in `_ADDONS`. Each addon's resolver only
needs the manifest + stream endpoints; the addon does the heavy lifting.
"""

import asyncio
import logging
from dataclasses import dataclass
from typing import Iterable

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)


@dataclass(frozen=True)
class _Addon:
    name: str
    base: str  # e.g. "https://tvvoo.hayd.uk"
    stream_ids: tuple[tuple[str, str, str], ...]
    """(stream_type, stream_id, label) per F1/motorsport entry."""


# Curated addon list — see module docstring. These IDs are documented in
# the addons' manifests / channel lists. Update when channel names/IDs
# rotate.
_ADDONS: tuple[_Addon, ...] = (
    _Addon(
        name="TvVoo",
        base="https://tvvoo.hayd.uk",
        stream_ids=(
            ("tv", "vavoo_SKY%20SPORTS%20F1|group:uk", "Sky Sports F1 UK (Vavoo)"),
            ("tv", "vavoo_SKY%20SPORTS%20F1%20HD|group:uk", "Sky Sports F1 HD UK (Vavoo)"),
            ("tv", "vavoo_SKY%20SPORT%20F1|group:it", "Sky Sport F1 IT (Vavoo)"),
            ("tv", "vavoo_SKY%20SPORT%20F1%20HD|group:de", "Sky Sport F1 DE (Vavoo)"),
            ("tv", "vavoo_DAZN%20F1|group:es", "DAZN F1 ES (Vavoo)"),
        ),
    ),
    _Addon(
        name="StremVerse",
        base="https://stremverse.onrender.com",
        stream_ids=(
            ("tv", "stremevent_591", "Formula 1 (StremVerse)"),
            ("tv", "stremevent_866", "MotoGP (StremVerse)"),
        ),
    ),
)


class StremioAddonExtractor(BaseExtractor):
    """Pull F1 + Sky-class motorsport URLs from public Stremio addons."""

    @property
    def site_key(self) -> str:
        return "stremio"

    @property
    def site_name(self) -> str:
        return "Stremio Addon"

    async def extract(self) -> list[ExtractedStream]:
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
            # Some addons (TvVoo→Vavoo) hand back URLs whose origin certs
            # are expired; honest-default verify=True is preserved here so
            # the verifier sees the same TLS errors a browser would.
        ) as client:
            tasks = []
            for addon in _ADDONS:
                for stype, sid, label in addon.stream_ids:
                    tasks.append(self._resolve(client, addon, stype, sid, label))
            results = await asyncio.gather(*tasks, return_exceptions=True)

        streams: list[ExtractedStream] = []
        for r in results:
            if isinstance(r, Exception):
                logger.debug("[stremio] resolve failed: %s", r)
                continue
            streams.extend(r)

        logger.info("[stremio] surfaced %d candidate stream URL(s) across %d addon(s)",
                    len(streams), len(_ADDONS))
        return streams

    async def _resolve(
        self, client: httpx.AsyncClient, addon: _Addon,
        stype: str, sid: str, label: str,
    ) -> list[ExtractedStream]:
        url = f"{addon.base}/stream/{stype}/{sid}.json"
        try:
            resp = await client.get(url)
        except Exception as e:
            logger.debug("[stremio] %s fetch failed: %s", url, e)
            return []
        if resp.status_code != 200:
            logger.debug("[stremio] %s -> HTTP %d", url, resp.status_code)
            return []
        try:
            data = resp.json()
        except Exception:
            return []

        out: list[ExtractedStream] = []
        for idx, s in enumerate(data.get("streams") or []):
            stream_url = (s.get("url") or "").strip()
            if not stream_url:
                continue
            # Skip DRM-tagged entries — they need Widevine which neither
            # our verifier nor a clean hls.js path can play.
            if "DRM" in (s.get("name") or "").upper():
                continue
            title = label
            if idx > 0:
                title = f"{label} #{idx + 1}"
            out.append(
                ExtractedStream(
                    url=stream_url,
                    site_key=self.site_key,
                    site_name=f"{addon.name}",
                    quality="",
                    title=title,
                    stream_type="m3u8",
                )
            )
        return out
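Each addon call in `_resolve` does two things. It builds the Stremio stream-resource URL `<base>/stream/<type>/<id>.json`, and it keeps the usable `url` fields from the `{streams: [...]}` body. Both steps are sketched below as pure functions, assuming the response shape given in the module docstring. `stream_endpoint` and `pick_stream_urls` are hypothetical helpers. The non-dict guard is an added assumption: on a non-object body, the committed `_resolve` raises and leaves the error to `asyncio.gather`. The `cdn.example` URLs are placeholders. Ids are interpolated as-is, so pre-encoded ids such as `vavoo_SKY%20SPORTS%20F1|group:uk` pass through unchanged.

```python
from typing import Any


def stream_endpoint(base: str, stype: str, sid: str) -> str:
    """Stremio stream resource: <base>/stream/<type>/<id>.json (ids used as-is)."""
    return f"{base}/stream/{stype}/{sid}.json"


def pick_stream_urls(data: Any, label: str) -> list[tuple[str, str]]:
    """(title, url) pairs from a {"streams": [...]} body, minus empty and DRM entries."""
    if not isinstance(data, dict):  # assumption: treat non-object bodies as empty
        return []
    out: list[tuple[str, str]] = []
    for idx, s in enumerate(data.get("streams") or []):
        url = (s.get("url") or "").strip()
        if not url:
            continue
        if "DRM" in (s.get("name") or "").upper():
            continue  # Widevine-protected; not playable via plain HLS
        # Numbering follows the raw list index, exactly like _resolve.
        title = label if idx == 0 else f"{label} #{idx + 1}"
        out.append((title, url))
    return out
```

Numbering uses the index from `enumerate` over the raw list, so a skipped DRM entry leaves a gap. If the second raw entry is dropped, the third is still titled `#3`. The committed code behaves the same way.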
249
stacks/f1-stream/files/backend/extractors/subreddit.py
Normal file
@@ -0,0 +1,249 @@
"""Subreddit extractor — pulls community-curated live-stream URLs from
the *MotorsportsReplays* subreddit (and a few siblings).

The community follows a stable pattern: a single mod-curated post titled
`[Watch / Download] <Series> <Year> - <Round> | <Event>` goes up on or
near each race weekend with a `**Watch Online:**` link in the selftext,
pointing at an admin-run WordPress site (motomundo.net for MotoGP, the
F1 equivalent has rotated over the years). That WordPress page hosts
iframe embeds whose m3u8 is JS-computed at load time — ideal target for
the chrome-service pipeline downstream.

This extractor:
- Hits Reddit with a real-browser User-Agent (httpx default UA + cluster
  IP combo gets HTTP 403'd on r/motogp; a Safari UA does not).
- Searches for the `[Watch` thread pattern AND scans `/new.json` for
  any flair set to LIVE.
- Pulls selftext URLs and returns each candidate as an `embed`-type
  ExtractedStream. The verifier already drives chrome-service for embed
  streams, so the m3u8 capture happens there.
"""

import asyncio
import logging
import re
import urllib.parse
from typing import NamedTuple

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)

# Subreddits to scan.
# - r/motorsportsstreams2 is the active 12.5k-sub successor to the banned
#   r/motorsportstreams; race-weekend "[F1 STREAM]" posts include
#   `boxboxbox.pro/stream-1` URLs and similar fresh aggregator links.
# - r/MotorsportsReplays runs the [Watch / Download] mod-post pattern
#   linking to motomundo.net (MotoGP) and sister sites.
# - The rest are low-yield but cost nothing.
SUBREDDITS: tuple[str, ...] = (
    "motorsportsstreams2",
    "MotorsportsReplays",
    "f1streams",
    "motorsports",
    "formula1",
    "motogp",
)

# Search queries fired against r/motorsportsstreams2 + r/MotorsportsReplays.
# The first set captures the [Watch / Download] mod posts; the second set
# catches race-weekend live discussion threads.
SEARCH_QUERIES: tuple[str, ...] = (
    "Watch Download F1 2026",
    "Watch Download MotoGP 2026",
    "Watch Online F1 2026",
    "F1 STREAM live",
    "Sky Sports F1 live",
    "Sky F1 stream",
)

# Hosts we accept as "interesting" stream-page URLs. These are the
# admin-curated WordPress / aggregator sites the community links to.
# Anchored to what r/motorsportsstreams2 currently posts (May 2026 sweep).
_INTERESTING_HOSTS = (
    # WordPress wrappers / community-run sites
    "motomundo.net",  # MotoGP — admin-curated WP
    "motomundo.top",  # MotoMundo embed host
    "motomundo.upns.xyz",  # MotoMundo embed host (newer)
    "freemotorsports.com",  # WAC successor curated link list
    "boxboxbox.pro",  # F1 race-weekend aggregator (community fav)
    "boxboxbox.live",  # boxboxbox sister
    "boxboxbox.lol",
    # Aggregators we already have direct extractors for, but Reddit may
    # surface event-specific deeplinks (e.g. /watch/<UUID>) we'd miss
    # otherwise.
    "pitsport.xyz",
    "pitsport.live",
    "rerace.io",
    "dd12streams.com",
    "ppv.to",
    "streamed.pk",
    "acestrlms.pages.dev",
    "aceztrims.pages.dev",
    # Sport-specific direct CDNs that occasionally appear in posts
    "racelive.jp",  # Super Formula
    "cdn.sfgo.jp",  # Super Formula CDN
    # Speculative F1 sister sites — pattern likely if motomundo for MotoGP
    "f1mundo.net",
    "f1.live",
    "f1live",
    "skystreams",
    "raceon",
    "watchf1",
)

# URLs we actively never try to scrape (auth-walled, social media,
# direct downloads with no live stream).
_REJECT_HOSTS = (
    "discord.gg", "discord.com",
    "twitter.com", "x.com",
    "youtube.com", "youtu.be",
    "instagram.com", "tiktok.com",
    "f1tv.formula1.com",
    "viktorbarzin.me",
    "gofile.io",
    "mega.nz", "drive.google.com",
    "1fichier.com", "rapidgator", "uploaded.net",
    "magnet:",
)

_URL_RE = re.compile(r"https?://[^\s\)\]\>\"']+")


class _Candidate(NamedTuple):
    title: str
    url: str
    subreddit: str
    flair: str


def _is_interesting(url: str) -> bool:
    low = url.lower()
    if any(host in low for host in _REJECT_HOSTS):
        return False
    return any(host in low for host in _INTERESTING_HOSTS)


def _has_live_marker(post: dict) -> bool:
    title = (post.get("title") or "").lower()
    flair = (post.get("link_flair_text") or "").lower()
    if "[watch" in title or "watch online" in title or "live" in flair:
        return True
    return False


class SubredditExtractor(BaseExtractor):
    """Scan motorsport subreddits for community-curated live-stream URLs."""

    @property
    def site_key(self) -> str:
        return "subreddit"

    @property
    def site_name(self) -> str:
        return "Subreddit"

    async def extract(self) -> list[ExtractedStream]:
        # NB: do NOT send `Accept: application/json` — Reddit's anti-bot
        # fingerprint flags that header from datacenter IPs and returns
        # HTTP 403 with HTML. Default Accept (`*/*`) gets through fine
        # and `.json` URLs always return JSON regardless.
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            tasks = [self._fetch_new(client, sub) for sub in SUBREDDITS]
            tasks.extend(self._search(client, q) for q in SEARCH_QUERIES)
            results = await asyncio.gather(*tasks, return_exceptions=True)

        candidates: list[_Candidate] = []
        for r in results:
            if isinstance(r, Exception):
                logger.debug("[subreddit] fetch failed: %s", r)
                continue
            candidates.extend(r)

        # Dedupe by URL, keep first occurrence.
        seen: set[str] = set()
        picks: list[_Candidate] = []
        for c in candidates:
            if c.url in seen:
                continue
            seen.add(c.url)
            picks.append(c)

        logger.info(
            "[subreddit] scanned %d source(s) — %d unique candidate URL(s)",
            len(SUBREDDITS) + len(SEARCH_QUERIES), len(picks),
        )
        return [
            ExtractedStream(
                url=c.url,
                site_key=self.site_key,
                site_name=f"r/{c.subreddit}",
                quality="",
                title=c.title[:100],
                stream_type="embed",
                embed_url=c.url,
            )
            for c in picks
        ]

    async def _fetch_new(self, client: httpx.AsyncClient, sub: str) -> list[_Candidate]:
        return await self._collect(
            client,
            f"https://www.reddit.com/r/{sub}/new.json?limit=25",
            sub,
        )

    async def _search(self, client: httpx.AsyncClient, query: str) -> list[_Candidate]:
        q = urllib.parse.quote_plus(query)
        return await self._collect(
            client,
            f"https://www.reddit.com/r/MotorsportsReplays/search.json?q={q}&restrict_sr=on&sort=new&limit=10",
            "MotorsportsReplays",
        )

    async def _collect(
        self, client: httpx.AsyncClient, url: str, sub: str
    ) -> list[_Candidate]:
        try:
            resp = await client.get(url)
        except Exception as e:
            logger.debug("[subreddit] fetch %s failed: %s", url, e)
            return []
        if resp.status_code != 200:
            logger.debug("[subreddit] %s -> HTTP %d", url, resp.status_code)
            return []
        try:
            data = resp.json()
        except Exception:
            return []
        out: list[_Candidate] = []
        for child in (data.get("data", {}) or {}).get("children", []):
            d = child.get("data", {}) or {}
            if not _has_live_marker(d):
                continue
            text = (d.get("selftext") or "")
            title = d.get("title") or ""
            flair = d.get("link_flair_text") or ""
            # First, the linked URL itself (if it's a recognised live site).
            top = d.get("url") or ""
            if top and _is_interesting(top):
                out.append(_Candidate(title, top, sub, flair))
            # Then any URL embedded in the selftext that points at a
            # community-curated live page.
            for u in _URL_RE.findall(text):
                if _is_interesting(u):
                    out.append(_Candidate(title, u, sub, flair))
        return out
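The subreddit scan reduces to four steps:

1. Keep only posts that carry a live marker.
2. Pull URLs out of the selftext with `_URL_RE`.
3. Filter them by host substring.
4. Dedupe in first-seen order.

The sketch below runs that pipeline on post dicts that have already been fetched. It assumes Reddit's listing fields (`title`, `link_flair_text`, `selftext`). The host tuples are trimmed, illustrative subsets of `_INTERESTING_HOSTS` / `_REJECT_HOSTS`, and `candidate_urls` is a hypothetical name. Unlike the committed `_collect`, it ignores each post's own `url` field.

```python
import re

# Same pattern as the extractor's _URL_RE: a URL ends at whitespace, a closing
# paren/bracket, '>' or a quote, which is where Markdown links terminate.
URL_RE = re.compile(r"https?://[^\s\)\]\>\"']+")

# Illustrative subsets of _INTERESTING_HOSTS / _REJECT_HOSTS (assumption).
INTERESTING_HOSTS = ("boxboxbox.pro", "motomundo.net", "streamed.pk")
REJECT_HOSTS = ("discord.gg", "youtube.com")


def has_live_marker(post: dict) -> bool:
    """Same test as _has_live_marker: a [Watch title or a LIVE-ish flair."""
    title = (post.get("title") or "").lower()
    flair = (post.get("link_flair_text") or "").lower()
    return "[watch" in title or "watch online" in title or "live" in flair


def is_interesting(url: str) -> bool:
    """Reject hosts win; otherwise accept on any interesting-host substring."""
    low = url.lower()
    if any(host in low for host in REJECT_HOSTS):
        return False
    return any(host in low for host in INTERESTING_HOSTS)


def candidate_urls(posts: list[dict]) -> list[str]:
    """Selftext URLs from live-marked posts, deduped in first-seen order."""
    seen: set[str] = set()
    picks: list[str] = []
    for post in posts:
        if not has_live_marker(post):
            continue
        for url in URL_RE.findall(post.get("selftext") or ""):
            if is_interesting(url) and url not in seen:
                seen.add(url)
                picks.append(url)
    return picks
```

Two properties of the committed approach show up here. The pattern stops at `)`, so Markdown links come out clean, but it keeps sentence-final punctuation such as a trailing `.`. Host matching is also plain substring search, so short entries like `"f1live"` or `"raceon"` match any URL that merely contains them.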
190
stacks/f1-stream/files/backend/extractors/timstreams.py
Normal file
@@ -0,0 +1,190 @@
"""TimStreams extractor - fetches F1 streams from the TimStreams JSON API.

Returns embed URLs from hmembeds.one for iframe playback.
The public API at stra.viaplus.site/main requires no authentication
and returns all events/channels across Events, Replays, and 24/7 categories.
"""

import logging

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

API_URL = "https://stra.viaplus.site/main"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

# Direct F1 keyword matches (case-insensitive)
F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1", "dazn f1"}
# "Grand prix" is F1-related only if non-F1 motorsport keywords are absent
GP_KEYWORD = "grand prix"
# Exclude these motorsport series when matching on "grand prix"
NON_F1_KEYWORDS = {
    "motogp", "moto gp", "moto2", "moto3", "motoe",
    "indycar", "indy car", "nascar",
    "rally", "wrc", "wec", "lemans", "le mans",
    "superbike", "dtm", "supercars",
}

# 24/7 channels that should always be included (embed hashes on hmembeds.one)
ALWAYS_INCLUDE_HASHES = {
    "888520f36cd94c5da4c71fddc1a5fc9b",  # Sky Sports F1
    "fc3a54634d0867b0c02ee3223292e7c6",  # DAZN F1
}


def _is_f1_event(name: str) -> bool:
    """Check if an event/channel is Formula 1 related by name.

    Returns True when the name contains a direct F1 keyword, or contains
    "grand prix" without non-F1 series keywords.

    Note: The TimStreams API genre field (genre=2) covers ALL sports channels,
    not just motorsport, so we rely solely on name-based matching.
    """
    lower = name.lower()

    # Direct F1 keyword match
    if any(kw in lower for kw in F1_KEYWORDS):
        return True

    # Grand prix without competing series
    if GP_KEYWORD in lower and not any(kw in lower for kw in NON_F1_KEYWORDS):
        return True

    return False


def _extract_embed_hash(url: str) -> str | None:
    """Extract the hash from an hmembeds.one embed URL.

    Expected format: https://hmembeds.one/embed/{hash}
    Returns the hash string, or None if the URL is not in the expected format.
    """
    if not url:
        return None
    # Handle both with and without trailing slash
    url = url.rstrip("/")
    prefix = "https://hmembeds.one/embed/"
    alt_prefix = "http://hmembeds.one/embed/"
    if url.startswith(prefix):
        return url[len(prefix):] or None
    if url.startswith(alt_prefix):
        return url[len(alt_prefix):] or None
    return None


def _is_always_include(url: str) -> bool:
    """Check if a stream URL is one of the always-include 24/7 channels."""
    embed_hash = _extract_embed_hash(url)
    return embed_hash in ALWAYS_INCLUDE_HASHES if embed_hash else False


class TimStreamsExtractor(BaseExtractor):
    """Extracts embed URLs from TimStreams' public JSON API.

    The API at stra.viaplus.site/main returns a JSON array of categories,
    each containing events with stream URLs pointing to hmembeds.one embeds.
    """

    @property
    def site_key(self) -> str:
        return "timstreams"

    @property
    def site_name(self) -> str:
        return "TimStreams"

    async def extract(self) -> list[ExtractedStream]:
        """Fetch F1 events/channels and return embed URLs for iframe playback."""
        streams: list[ExtractedStream] = []
        seen_urls: set[str] = set()

        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
            ) as client:
                resp = await client.get(API_URL)
                if resp.status_code != 200:
                    logger.warning(
                        "[timstreams] API returned HTTP %d", resp.status_code
                    )
                    return []

                data = resp.json()
                if not isinstance(data, list):
                    logger.warning("[timstreams] Unexpected API response type: %s", type(data).__name__)
                    return []

                logger.info("[timstreams] API returned %d categorie(s)", len(data))

                for category in data:
                    category_name = category.get("category", "Unknown")
                    events = category.get("events", [])
                    if not isinstance(events, list):
                        continue

                    for event in events:
                        event_name = event.get("name", "Unknown")
                        event_streams = event.get("streams", [])

                        if not isinstance(event_streams, list) or not event_streams:
                            continue

                        # Check if any stream URL matches an always-include channel
                        always_include = any(
                            _is_always_include(s.get("url", ""))
                            for s in event_streams
                        )

                        # Filter: must be F1-related or an always-include channel
                        if not always_include and not _is_f1_event(event_name):
                            continue

                        for stream_info in event_streams:
                            stream_name = stream_info.get("name", "")
                            stream_url = stream_info.get("url", "")

                            if not stream_url:
                                continue

                            # Deduplicate by URL
                            if stream_url in seen_urls:
                                continue
                            seen_urls.add(stream_url)

                            # Build a descriptive title
                            title = event_name
                            if stream_name and stream_name.lower() != event_name.lower():
                                title = f"{event_name} - {stream_name}"
                            if category_name:
                                title = f"[{category_name}] {title}"

                            streams.append(
                                ExtractedStream(
                                    url=stream_url,
                                    site_key=self.site_key,
                                    site_name=self.site_name,
                                    quality="",
                                    title=title,
                                    stream_type="embed",
                                    embed_url=stream_url,
                                )
                            )

        except httpx.TimeoutException:
            logger.warning("[timstreams] API request timed out")
        except Exception:
            logger.exception("[timstreams] Failed to fetch from API")

        logger.info("[timstreams] Extracted %d stream(s)", len(streams))
        return streams
301
stacks/f1-stream/files/backend/health.py
Normal file
301
stacks/f1-stream/files/backend/health.py
Normal file
|
|
@ -0,0 +1,301 @@
|
|||
"""Stream health checker - verifies extracted streams are live and responsive.
|
||||
|
||||
Performs GET requests against m3u8 URLs to verify they contain valid HLS
|
||||
playlists (#EXTM3U header), measures response times for quality ranking,
|
||||
and supports concurrent checking of multiple streams.
|
||||
"""
|
||||
|
||||
import asyncio
|
||||
import logging
|
||||
import time
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime, timezone
|
||||
from urllib.parse import urljoin
|
||||
|
||||
import httpx
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# How long to wait for a single health check (seconds)
|
||||
HEALTH_CHECK_TIMEOUT = 10.0
|
||||
|
||||
# Maximum bytes to read when verifying m3u8 content
|
||||
# We only need to see the #EXTM3U header and a few lines
|
||||
MAX_CONTENT_BYTES = 8192
|
||||
|
||||
# User-Agent to send with health check requests
|
||||
USER_AGENT = (
|
||||
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
|
||||
"AppleWebKit/537.36 (KHTML, like Gecko) "
|
||||
"Chrome/120.0.0.0 Safari/537.36"
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class StreamHealth:
|
||||
"""Result of a single stream health check."""
|
||||
|
||||
url: str
|
||||
is_live: bool
|
||||
response_time_ms: int # Lower = better quality indicator
|
||||
checked_at: str = field(
|
||||
default_factory=lambda: datetime.now(timezone.utc).isoformat()
|
||||
)
|
||||
error: str = "" # Error message if not live
|
||||
bitrate: int = 0 # Bitrate in bps if detectable from playlist
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
"""Serialize to a plain dictionary for JSON responses."""
|
||||
return {
|
||||
"url": self.url,
|
||||
"is_live": self.is_live,
|
||||
"response_time_ms": self.response_time_ms,
|
||||
"checked_at": self.checked_at,
|
||||
"error": self.error,
|
||||
"bitrate": self.bitrate,
|
||||
}
|
||||
|
||||
|
||||
def _extract_bitrate(content: str) -> int:
|
||||
"""Try to extract bitrate from m3u8 playlist content.
|
||||
|
||||
Looks for BANDWIDTH= in #EXT-X-STREAM-INF tags. Returns the highest
|
||||
bitrate found, or 0 if none detected.
|
||||
"""
|
||||
max_bitrate = 0
|
||||
for line in content.splitlines():
|
||||
if "BANDWIDTH=" in line:
|
||||
try:
|
||||
# Parse BANDWIDTH=<number> from the tag
|
||||
for part in line.split(","):
|
||||
part = part.strip()
|
||||
if part.startswith("BANDWIDTH="):
|
||||
bw = int(part.split("=", 1)[1])
|
||||
max_bitrate = max(max_bitrate, bw)
|
||||
except (ValueError, IndexError):
|
||||
continue
|
||||
return max_bitrate
|
||||
|
||||
|
||||
class StreamHealthChecker:
|
||||
"""Background health checker for extracted streams.
|
||||
|
||||
Verifies streams are live by performing a partial GET on the m3u8 URL,
|
||||
checking for valid HLS content (#EXTM3U header), and measuring response
|
||||
time as a quality indicator.
|
||||
"""
|
||||
|
||||
def __init__(self, timeout: float = HEALTH_CHECK_TIMEOUT) -> None:
|
||||
self._timeout = timeout
|
||||
|
||||
async def check_stream(self, url: str) -> StreamHealth:
|
||||
"""Check if a stream URL is live by doing a partial GET on the m3u8.
|
||||
|
||||
Verification steps:
|
||||
1. GET the m3u8 URL (not just HEAD - need to verify playlist content)
|
||||
2. Check if response contains #EXTM3U header
|
||||
3. Measure response time as a quality indicator
|
||||
4. Extract bitrate info if available
|
||||
|
||||
Args:
|
||||
url: The m3u8 stream URL to check.
|
||||
|
||||
Returns:
|
||||
StreamHealth with is_live, response_time_ms, checked_at, and
|
||||
optional bitrate and error information.
|
||||
"""
|
||||
start_time = time.monotonic()
|
||||
checked_at = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
try:
|
||||
async with httpx.AsyncClient(
|
||||
timeout=self._timeout,
|
||||
follow_redirects=True,
|
||||
headers={
|
||||
"User-Agent": USER_AGENT,
|
||||
"Accept": "*/*",
|
||||
},
|
||||
) as client:
|
||||
# Use a partial GET with Range header to limit download
|
||||
# but fall back to reading limited bytes if Range not supported
|
||||
response = await client.get(
|
||||
url,
|
||||
headers={"Range": f"bytes=0-{MAX_CONTENT_BYTES - 1}"},
|
||||
)
|
||||
|
||||
elapsed_ms = int((time.monotonic() - start_time) * 1000)
|
||||
|
||||
# Accept 200 (full content) or 206 (partial content)
|
||||
if response.status_code not in (200, 206):
|
||||
return StreamHealth(
|
||||
url=url,
|
||||
is_live=False,
|
||||
response_time_ms=elapsed_ms,
|
||||
checked_at=checked_at,
|
||||
error=f"HTTP {response.status_code}",
|
||||
)
|
||||
|
||||
content = response.text[:MAX_CONTENT_BYTES]
|
||||
|
||||
# Verify it's a valid HLS playlist
|
||||
if "#EXTM3U" not in content:
|
||||
return StreamHealth(
|
||||
url=url,
|
||||
is_live=False,
|
||||
response_time_ms=elapsed_ms,
|
||||
checked_at=checked_at,
|
||||
error="Response does not contain #EXTM3U header",
|
||||
)
|
||||
|
||||
# Extract bitrate info if available
|
||||
bitrate = _extract_bitrate(content)
|
||||
|
||||
# If this is a master playlist, validate at least one variant
|
||||
if "#EXT-X-STREAM-INF:" in content:
|
||||
variant_ok = await self._check_first_variant(
|
||||
content, url, client
|
||||
)
|
||||
if not variant_ok:
|
||||
return StreamHealth(
|
||||
url=url,
|
||||
is_live=False,
|
||||
response_time_ms=elapsed_ms,
|
||||
checked_at=checked_at,
|
||||
bitrate=bitrate,
|
||||
error="Master playlist OK but variant playlists are unreachable",
|
||||
)
|
||||
|
||||
return StreamHealth(
|
||||
url=url,
|
||||
is_live=True,
|
||||
response_time_ms=elapsed_ms,
|
||||
checked_at=checked_at,
|
||||
bitrate=bitrate,
|
||||
)
|
||||
|
||||
except httpx.TimeoutException:
|
||||
elapsed_ms = int((time.monotonic() - start_time) * 1000)
|
||||
logger.debug("Health check timed out for %s", url)
|
||||
return StreamHealth(
|
||||
url=url,
|
||||
is_live=False,
|
||||
response_time_ms=elapsed_ms,
|
||||
checked_at=checked_at,
|
||||
error="Timeout",
|
||||
)
|
||||
except httpx.HTTPError as e:
|
||||
elapsed_ms = int((time.monotonic() - start_time) * 1000)
|
||||
logger.debug("Health check HTTP error for %s: %s", url, e)
|
||||
return StreamHealth(
|
||||
url=url,
|
||||
is_live=False,
|
||||
response_time_ms=elapsed_ms,
|
||||
checked_at=checked_at,
|
||||
error=f"HTTP error: {e}",
|
||||
)
|
||||
except Exception as e:
|
||||
elapsed_ms = int((time.monotonic() - start_time) * 1000)
|
||||
logger.exception("Unexpected error during health check for %s", url)
|
||||
return StreamHealth(
|
||||
url=url,
|
||||
is_live=False,
|
||||
response_time_ms=elapsed_ms,
|
||||
checked_at=checked_at,
|
||||
error=f"Unexpected error: {e}",
|
||||
)
|
||||
|
||||
    async def _check_first_variant(
        self, content: str, base_url: str, client: httpx.AsyncClient
    ) -> bool:
        """Check that at least one variant playlist in a master playlist is reachable.

        Extracts the first variant URI from a master playlist and does a HEAD
        request to verify it returns 200/206. This catches streams where the
        master playlist is valid but all variant playlists are 404.

        Args:
            content: The master playlist text content.
            base_url: The URL of the master playlist (for resolving relative URIs).
            client: An existing httpx client to reuse.

        Returns:
            True if at least one variant is reachable, False otherwise.
        """
        lines = content.splitlines()
        for i, line in enumerate(lines):
            if not line.strip().startswith("#EXT-X-STREAM-INF:"):
                continue
            # Next non-empty, non-comment line is the variant URI
            for j in range(i + 1, len(lines)):
                variant_uri = lines[j].strip()
                if variant_uri and not variant_uri.startswith("#"):
                    # Resolve relative URI
                    if not variant_uri.startswith(("http://", "https://")):
                        variant_uri = urljoin(base_url, variant_uri)
                    try:
                        resp = await client.head(variant_uri)
                        if resp.status_code in (200, 206):
                            return True
                        # HEAD might not be supported, try GET
                        resp = await client.get(
                            variant_uri,
                            headers={"Range": f"bytes=0-{MAX_CONTENT_BYTES - 1}"},
                        )
                        if resp.status_code in (200, 206):
                            return True
                        logger.debug(
                            "Variant playlist %s returned HTTP %d",
                            variant_uri, resp.status_code,
                        )
                    except Exception as e:
                        logger.debug(
                            "Variant check failed for %s: %s", variant_uri, e
                        )
                    # Only check the first variant
                    return False
        # No variants found (shouldn't happen if #EXT-X-STREAM-INF was detected)
        return True

    async def check_all(
        self, streams: list[dict],
    ) -> dict[str, StreamHealth]:
        """Check all streams concurrently, return health map keyed by URL.

        Args:
            streams: List of stream dicts (must have a "url" key).

        Returns:
            Dictionary mapping stream URL to its StreamHealth result.
        """
        urls = [s["url"] for s in streams if "url" in s]

        if not urls:
            return {}

        logger.info("Running health checks on %d stream(s)...", len(urls))

        # Run all checks concurrently
        tasks = [self.check_stream(url) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)

        health_map: dict[str, StreamHealth] = {}
        for url, result in zip(urls, results):
            if isinstance(result, Exception):
                logger.error("Health check task failed for %s: %s", url, result)
                health_map[url] = StreamHealth(
                    url=url,
                    is_live=False,
                    response_time_ms=0,
                    error=f"Task error: {result}",
                )
            else:
                health_map[url] = result

        live_count = sum(1 for h in health_map.values() if h.is_live)
        logger.info(
            "Health checks complete: %d/%d streams are live",
            live_count,
            len(health_map),
        )

        return health_map
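The variant probe in `_check_first_variant` rests on a simple line scan: find the first `#EXT-X-STREAM-INF` tag, take the next non-empty line that is not a tag or comment, and resolve it against the master playlist URL. Below is a minimal, network-free sketch of just that scan. `first_variant_uri` and the sample playlist are hypothetical illustrations, not part of the f1-stream backend.

```python
from typing import Optional
from urllib.parse import urljoin


def first_variant_uri(content: str, base_url: str) -> Optional[str]:
    """Return the absolute URI of the first variant in an HLS master playlist.

    Hypothetical helper mirroring the scan _check_first_variant performs before
    it probes the URI with HEAD/GET. Returns None for media playlists, which
    contain no #EXT-X-STREAM-INF tags.
    """
    lines = content.splitlines()
    for i, line in enumerate(lines):
        if not line.strip().startswith("#EXT-X-STREAM-INF:"):
            continue
        # The variant URI is the next non-empty line that is not a tag/comment.
        for following in lines[i + 1:]:
            uri = following.strip()
            if uri and not uri.startswith("#"):
                # urljoin leaves absolute URIs unchanged and resolves relative ones.
                return urljoin(base_url, uri)
    return None


MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
hd/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
sd/index.m3u8
"""

print(first_variant_uri(MASTER, "https://cdn.example.com/live/master.m3u8"))
# prints https://cdn.example.com/live/hd/index.m3u8
```

Keeping the parsing apart from the HEAD/GET probe like this would let the selection logic be unit-tested without an HTTP client; that split is a suggestion, not how the committed method is structured.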
264
stacks/f1-stream/files/backend/m3u8_rewriter.py
Normal file
@ -0,0 +1,264 @@
"""m3u8 playlist rewriter - rewrites URIs in HLS playlists to go through the proxy.

Handles both master playlists (containing variant stream references) and
media playlists (containing segment URLs). Resolves relative URIs to
absolute before encoding, and routes .m3u8 references through /proxy
while routing segments (.ts, .m4s, etc.) through /relay.
"""

import base64
import logging
import re
from urllib.parse import urljoin

logger = logging.getLogger(__name__)


def encode_url(url: str) -> str:
    """Base64url-encode a URL for safe transport as a query parameter.

    Uses URL-safe base64 encoding with padding stripped to avoid
    double-encoding issues when the URL contains special characters.

    Args:
        url: The raw URL to encode.

    Returns:
        Base64url-encoded string with padding removed.
    """
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def decode_url(encoded: str) -> str:
    """Decode a base64url-encoded URL.

    Re-adds padding that was stripped during encoding.

    Args:
        encoded: Base64url-encoded string (padding may be stripped).

    Returns:
        The original URL string.

    Raises:
        ValueError: If the encoded string is not valid base64url.
    """
    # Add padding back - base64 requires length to be multiple of 4
    padding = 4 - len(encoded) % 4
    if padding != 4:
        encoded += "=" * padding
    return base64.urlsafe_b64decode(encoded).decode()


def _resolve_uri(uri: str, base_url: str) -> str:
    """Resolve a potentially relative URI against a base URL.

    Args:
        uri: The URI from the m3u8 playlist (may be relative or absolute).
        base_url: The URL of the playlist itself (used as base for relative URIs).

    Returns:
        Absolute URL.
    """
    if uri.startswith("http://") or uri.startswith("https://"):
        return uri
    return urljoin(base_url, uri)


def _is_playlist_uri(uri: str) -> bool:
    """Determine if a URI likely points to another playlist (vs a segment).

    Playlist URIs end in .m3u8 or .m3u. Everything else is treated as a
    segment (TS, fMP4, init segment, etc.).

    Args:
        uri: The URI to classify.

    Returns:
        True if the URI appears to be a playlist reference.
    """
    # Strip query string for extension check
    path = uri.split("?")[0].split("#")[0].lower()
    return path.endswith(".m3u8") or path.endswith(".m3u")


def _build_proxy_url(absolute_uri: str, proxy_base: str) -> str:
    """Build a /proxy URL for a playlist reference.

    Args:
        absolute_uri: The absolute URL of the upstream playlist.
        proxy_base: The base URL of our proxy service.

    Returns:
        Rewritten URL pointing to our /proxy endpoint.
    """
    encoded = encode_url(absolute_uri)
    return f"{proxy_base}/proxy?url={encoded}"


def _build_relay_url(absolute_uri: str, proxy_base: str) -> str:
    """Build a /relay URL for a segment reference.

    Args:
        absolute_uri: The absolute URL of the upstream segment.
        proxy_base: The base URL of our proxy service.

    Returns:
        Rewritten URL pointing to our /relay endpoint.
    """
    encoded = encode_url(absolute_uri)
    return f"{proxy_base}/relay?url={encoded}"


def _rewrite_uri(uri: str, base_url: str, proxy_base: str) -> str:
    """Rewrite a single URI from an m3u8 playlist.

    Resolves relative URIs, then routes playlists through /proxy and
    segments through /relay.

    Args:
        uri: The raw URI from the playlist.
        base_url: The URL of the playlist containing this URI.
        proxy_base: The base URL of our proxy service.

    Returns:
        Rewritten URI pointing to our proxy.
    """
    absolute = _resolve_uri(uri, base_url)
    if _is_playlist_uri(uri):
        return _build_proxy_url(absolute, proxy_base)
    return _build_relay_url(absolute, proxy_base)


def rewrite_playlist(content: str, base_url: str, proxy_base: str) -> str:
    """Rewrite all URIs in an m3u8 playlist to go through the proxy.

    Handles both master playlists (with #EXT-X-STREAM-INF variant
    references) and media playlists (with segment URIs). Also handles
    #EXT-X-MAP:URI= init segment references.

    Args:
        content: The raw m3u8 playlist text.
        base_url: The original URL of this playlist (for resolving relative URIs).
        proxy_base: The base URL of our proxy (e.g., "https://f1.viktorbarzin.me").

    Returns:
        The rewritten m3u8 playlist text with all URIs proxied.
    """
    proxy_base = proxy_base.rstrip("/")
    lines = content.splitlines()
    output_lines: list[str] = []

    # Track if the previous line was #EXT-X-STREAM-INF (next line is a variant URI)
    next_is_variant = False

    for line in lines:
        stripped = line.strip()

        # Handle #EXT-X-MAP:URI="..." (init segment)
        if stripped.startswith("#EXT-X-MAP:"):
            output_lines.append(_rewrite_ext_x_map(stripped, base_url, proxy_base))
            continue

        # Handle #EXT-X-STREAM-INF (marks next line as variant playlist URI)
        if stripped.startswith("#EXT-X-STREAM-INF:"):
            output_lines.append(line)
            next_is_variant = True
            continue

        # Handle #EXT-X-MEDIA with URI= attribute
        if stripped.startswith("#EXT-X-MEDIA:") and "URI=" in stripped:
            output_lines.append(_rewrite_ext_x_media(stripped, base_url, proxy_base))
            continue

        # Handle #EXT-X-I-FRAME-STREAM-INF with URI= attribute
        if stripped.startswith("#EXT-X-I-FRAME-STREAM-INF:") and "URI=" in stripped:
            output_lines.append(
                _rewrite_tag_with_uri(stripped, base_url, proxy_base, is_playlist=True)
            )
            continue

        # If previous line was #EXT-X-STREAM-INF, this line is a variant playlist URI
        if next_is_variant and stripped and not stripped.startswith("#"):
            absolute = _resolve_uri(stripped, base_url)
            output_lines.append(_build_proxy_url(absolute, proxy_base))
            next_is_variant = False
            continue

        # Regular URI line (non-comment, non-empty, not a tag)
        if stripped and not stripped.startswith("#"):
            # This is a segment URI (TS, fMP4, etc.)
            absolute = _resolve_uri(stripped, base_url)
            output_lines.append(_build_relay_url(absolute, proxy_base))
            continue

        # Tags and comments pass through unchanged
        output_lines.append(line)
        # Reset variant flag if we hit another tag
        if stripped.startswith("#") and not stripped.startswith("#EXT-X-STREAM-INF:"):
            next_is_variant = False

    return "\n".join(output_lines)


def _rewrite_ext_x_map(line: str, base_url: str, proxy_base: str) -> str:
    """Rewrite the URI in an #EXT-X-MAP tag.

    #EXT-X-MAP:URI="init.mp4" -> #EXT-X-MAP:URI="<relay_url>"
    The init segment goes through /relay since it's binary data.
    """
    # Match URI="..." or URI=... (with or without quotes)
    match = re.search(r'URI="([^"]+)"', line)
    if not match:
        match = re.search(r"URI=([^,\s]+)", line)

    if not match:
        return line

    original_uri = match.group(1)
    absolute = _resolve_uri(original_uri, base_url)
    relay_url = _build_relay_url(absolute, proxy_base)

    return line[:match.start(1)] + relay_url + line[match.end(1):]


def _rewrite_ext_x_media(line: str, base_url: str, proxy_base: str) -> str:
    """Rewrite the URI in an #EXT-X-MEDIA tag.

    #EXT-X-MEDIA:TYPE=AUDIO,...,URI="audio.m3u8" -> rewrite URI to /proxy
    """
    return _rewrite_tag_with_uri(line, base_url, proxy_base, is_playlist=True)


def _rewrite_tag_with_uri(
    line: str, base_url: str, proxy_base: str, is_playlist: bool = False,
) -> str:
    """Rewrite the URI attribute within an HLS tag line.

    Generic handler for any tag that contains a URI="..." attribute.

    Args:
        line: The full tag line.
        base_url: Base URL for resolving relative URIs.
        proxy_base: Our proxy base URL.
        is_playlist: If True, route through /proxy; otherwise /relay.

    Returns:
        The tag line with the URI rewritten.
    """
    match = re.search(r'URI="([^"]+)"', line)
    if not match:
        match = re.search(r"URI=([^,\s]+)", line)

    if not match:
        return line

    original_uri = match.group(1)
    absolute = _resolve_uri(original_uri, base_url)

    if is_playlist:
        new_url = _build_proxy_url(absolute, proxy_base)
    else:
        new_url = _build_relay_url(absolute, proxy_base)

    return line[:match.start(1)] + new_url + line[match.end(1):]
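The rewriter's routing rule is easiest to see on a single media-playlist line. The sketch below is condensed from `encode_url`, `decode_url`, `_resolve_uri` and `_is_playlist_uri` above; `rewrite_line` and the example hosts are hypothetical. It deliberately skips the `URI=` attribute tags (`#EXT-X-MAP`, `#EXT-X-MEDIA`, `#EXT-X-I-FRAME-STREAM-INF`) and the `#EXT-X-STREAM-INF` variant tracking that `rewrite_playlist` handles, so it classifies purely by file extension, as `_rewrite_uri` does.

```python
import base64
from urllib.parse import urljoin


def encode_url(url: str) -> str:
    # URL-safe base64 with "=" padding stripped, as in m3u8_rewriter.encode_url.
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def decode_url(encoded: str) -> str:
    # Restore the stripped padding so the length is a multiple of 4, then decode.
    return base64.urlsafe_b64decode(encoded + "=" * (-len(encoded) % 4)).decode()


def rewrite_line(line: str, base_url: str, proxy_base: str) -> str:
    """Hypothetical one-line version of the rewriter's routing rule.

    Tags, comments and blank lines pass through; any other line is treated as a
    URI, resolved against base_url, and sent to /proxy if it names a playlist
    (.m3u8/.m3u) or to /relay otherwise (TS, fMP4, init segments).
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line
    absolute = urljoin(base_url, stripped)
    # Ignore query string and fragment when checking the extension.
    path = stripped.split("?")[0].split("#")[0].lower()
    endpoint = "proxy" if path.endswith((".m3u8", ".m3u")) else "relay"
    return f"{proxy_base.rstrip('/')}/{endpoint}?url={encode_url(absolute)}"


playlist_url = "https://cdn.example.com/live/720p/index.m3u8"
media_playlist = "#EXTM3U\n#EXTINF:4.0,\nseg001.ts?token=abc\n"
for line in media_playlist.splitlines():
    print(rewrite_line(line, playlist_url, "https://f1.example.org"))
```

Because URL-safe base64 with padding stripped only emits `A-Z`, `a-z`, `0-9`, `-` and `_`, the encoded value can sit in the `url=` query parameter without further percent-encoding, which is the double-encoding problem the `encode_url` docstring refers to.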
488
stacks/f1-stream/files/backend/main.py
Normal file
@ -0,0 +1,488 @@
"""F1 Streams - FastAPI backend with schedule, stream extraction, health checking, HLS proxy, and token refresh."""

import logging
import os
from contextlib import asynccontextmanager
from datetime import datetime, timedelta, timezone

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
from apscheduler.triggers.interval import IntervalTrigger
from fastapi import FastAPI, Query, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from starlette.responses import Response, StreamingResponse

from backend.embed_proxy import fetch_embed, relay_asset
from backend.extractors import create_extraction_service
from backend.proxy import proxy_playlist, relay_stream
from backend.schedule import ScheduleService
from backend.token_refresh import TokenRefreshManager

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)

schedule_service = ScheduleService()
extraction_service = create_extraction_service()
token_refresh_manager = TokenRefreshManager(extraction_service)
scheduler = AsyncIOScheduler()


# --- Pydantic models for request bodies ---


class ActivateStreamRequest(BaseModel):
    """Request body for POST /streams/activate."""

    url: str
    site_key: str = ""


class DeactivateStreamRequest(BaseModel):
    """Request body for POST /streams/deactivate."""

    url: str


# --- Scheduled callbacks ---


async def _scheduled_refresh() -> None:
    """Callback for APScheduler daily schedule refresh."""
    logger.info("Running scheduled schedule refresh...")
    await schedule_service.refresh()


async def _scheduled_extraction() -> None:
    """Callback for APScheduler stream extraction.

    Adjusts its own interval based on whether a session is currently live:
    - During a live session: reschedule to every 5 minutes
    - Otherwise: reschedule to every 30 minutes
    """
    logger.info("Running scheduled extraction...")
    await extraction_service.run_extraction()

    # Check if any session is currently live and adjust polling interval
    schedule_data = schedule_service.get_schedule()
    is_live = False
    for race in schedule_data.get("races", []):
        for session in race.get("sessions", []):
            if session.get("status") == "live":
                is_live = True
                break
        if is_live:
            break

    # Update the extraction job interval based on live status
    job = scheduler.get_job("stream_extraction")
    if job:
        current_interval = getattr(job.trigger, "interval_length", None)
        desired_interval = 300 if is_live else 1800  # 5 min or 30 min

        if current_interval != desired_interval:
            interval_minutes = 5 if is_live else 30
            scheduler.reschedule_job(
                "stream_extraction",
                trigger=IntervalTrigger(minutes=interval_minutes),
            )
            logger.info(
                "Extraction interval adjusted to %d minutes (live=%s)",
                interval_minutes,
                is_live,
            )


async def _scheduled_token_refresh() -> None:
    """Callback for APScheduler token refresh.

    Only performs work when there are active streams. Re-runs extractors
    to get fresh CDN tokens for streams being actively watched.
    """
    if not token_refresh_manager.has_active_streams:
        return

    logger.info("Running scheduled token refresh...")
    try:
        await token_refresh_manager.refresh_active_streams()
    except Exception:
        logger.exception("Token refresh failed (non-fatal)")


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Startup and shutdown lifecycle handler."""
    # Startup: load schedule and start background scheduler
    await schedule_service.initialize()

    # Schedule daily schedule refresh
    scheduler.add_job(
        _scheduled_refresh,
        trigger=CronTrigger(hour=3, minute=0, timezone="UTC"),
        id="daily_schedule_refresh",
        name="Refresh F1 schedule daily at 03:00 UTC",
        replace_existing=True,
    )

    # Schedule periodic stream extraction (default: every 30 minutes).
    # next_run_time fires the first run 8s after startup. We don't run
    # extraction inline here because it calls the playback verifier,
    # which hits http://127.0.0.1:8000/embed for embed streams — uvicorn
    # isn't listening yet inside the lifespan startup phase.
    scheduler.add_job(
        _scheduled_extraction,
        trigger=IntervalTrigger(minutes=30),
        id="stream_extraction",
        name="Extract streams from all registered sites",
        replace_existing=True,
        next_run_time=datetime.now(timezone.utc) + timedelta(seconds=8),
    )

    # Schedule token refresh every 4 minutes (safe margin for 5-min CDN tokens).
    # The callback is a no-op when there are no active streams.
    scheduler.add_job(
        _scheduled_token_refresh,
        trigger=IntervalTrigger(minutes=4),
        id="token_refresh",
        name="Refresh CDN tokens for active streams",
        replace_existing=True,
    )

    scheduler.start()
    logger.info(
        "APScheduler started - schedule refresh at 03:00 UTC, extraction every 30m, token refresh every 4m"
    )

    yield

    # Shutdown
    scheduler.shutdown(wait=False)
    logger.info("APScheduler shut down")
    try:
        await extraction_service.shutdown()
    except Exception:
        logger.exception("extraction_service shutdown failed")


app = FastAPI(title="F1 Streams", lifespan=lifespan)

# --- CORS Middleware ---
# Required for browser-based HLS players to access proxy/relay endpoints
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["GET", "POST", "OPTIONS"],
    allow_headers=["Range", "Content-Type"],
    expose_headers=["Content-Range", "Content-Length", "Content-Type"],
)


# --- Health & Info ---


@app.get("/health")
async def health():
    return {"status": "ok"}


# --- Schedule ---


@app.get("/schedule")
async def get_schedule():
    """Return the F1 race schedule for the current season with session statuses."""
    return schedule_service.get_schedule()


@app.post("/schedule/refresh")
async def refresh_schedule():
    """Manually trigger a schedule refresh from the jolpica API."""
    await schedule_service.refresh()
    return {"status": "refreshed"}


# --- Streams & Extraction ---


@app.get("/streams")
async def get_streams():
    """Return all currently cached streams that passed health checks.

    Streams are sorted by fallback priority:
    1. Live streams only (is_live=True)
    2. Fastest response time first (lowest response_time_ms)
    """
    streams = extraction_service.get_streams()
    return {
        "streams": streams,
        "count": len(streams),
    }


@app.get("/streams/all")
async def get_all_streams():
    """Return ALL cached streams including unhealthy ones (for debugging).

    Unlike GET /streams, this endpoint includes streams that failed health
    checks. Useful for diagnosing extraction or health check issues.
    """
    streams = extraction_service.get_all_streams_unfiltered()
    return {
        "streams": streams,
        "count": len(streams),
    }


@app.post("/streams/activate")
async def activate_stream(body: ActivateStreamRequest):
    """Mark a stream as actively being watched.

    When a stream is active, the token refresh manager will periodically
    re-run the extractor that found it to get fresh CDN tokens before
    they expire.

    If site_key is not provided, attempts to look it up from the cached
    streams.

    Body:
        {"url": "https://...", "site_key": "optional-site-key"}
    """
    url = body.url
    site_key = body.site_key

    # If site_key not provided, try to look it up from cached streams
    if not site_key:
        for streams in extraction_service._cache.values():
            for stream in streams:
                if stream.url == url:
                    site_key = stream.site_key
                    break
            if site_key:
                break

    if not site_key:
        return {
            "status": "error",
            "detail": "Could not determine site_key for this URL. Provide it explicitly.",
        }

    token_refresh_manager.mark_stream_active(url, site_key)
    return {
        "status": "activated",
        "url": url,
        "site_key": site_key,
        "active_count": len(token_refresh_manager.get_active_streams()),
    }


@app.post("/streams/deactivate")
async def deactivate_stream(body: DeactivateStreamRequest):
    """Mark a stream as no longer being watched.

    Stops the token refresh manager from refreshing CDN tokens for this stream.

    Body:
        {"url": "https://..."}
    """
    token_refresh_manager.mark_stream_inactive(body.url)
    return {
        "status": "deactivated",
        "url": body.url,
        "active_count": len(token_refresh_manager.get_active_streams()),
    }


@app.get("/streams/active")
async def get_active_streams():
    """List currently active streams with their refresh status.

    Returns all streams that are being actively watched, including
    their current (potentially refreshed) URLs and refresh counts.
    """
    active = token_refresh_manager.get_active_streams()
    return {
        "streams": active,
        "count": len(active),
    }


@app.get("/extractors")
async def get_extractors():
    """List registered extractors and their current status."""
    return extraction_service.get_status()


@app.post("/extract")
async def trigger_extraction():
    """Manually trigger an extraction run across all registered extractors."""
    await extraction_service.run_extraction()
    status = extraction_service.get_status()
    return {
        "status": "extraction_complete",
        "streams_found": status["total_cached_streams"],
        "live_streams": status["total_live_streams"],
        "extractors_run": len(status["extractors"]),
    }


# --- HLS Proxy ---


def _get_proxy_base(request: Request) -> str:
    """Derive the proxy base URL from the incoming request.

    Uses X-Forwarded-Proto and X-Forwarded-Host headers if present
    (behind a reverse proxy), otherwise falls back to request URL.
    """
    proto = request.headers.get("x-forwarded-proto", request.url.scheme)
    host = request.headers.get("x-forwarded-host", request.url.netloc)
    return f"{proto}://{host}"


@app.get("/proxy")
async def proxy_endpoint(
    request: Request,
    url: str = Query(..., description="Base64url-encoded m3u8 playlist URL"),
    quality: int | None = Query(
        None,
        description="0-based quality variant index (0=highest bandwidth). "
        "Only applies to master playlists.",
    ),
):
    """Proxy an upstream m3u8 playlist with URI rewriting.

    Fetches the upstream m3u8 playlist, rewrites all URIs to route through
    our /proxy (for sub-playlists) and /relay (for segments) endpoints,
    and returns the rewritten playlist.

    The `url` parameter must be base64url-encoded to avoid URL encoding issues.

    If `quality` is specified and the upstream is a master playlist (with
    multiple quality variants), the proxy will fetch the selected variant's
    media playlist directly instead of returning the master playlist.
    Quality index 0 = highest bandwidth, 1 = second highest, etc.

    Examples:
        GET /proxy?url=aHR0cHM6Ly9leGFtcGxlLmNvbS9zdHJlYW0ubTN1OA
        GET /proxy?url=aHR0cHM6Ly9leGFtcGxlLmNvbS9zdHJlYW0ubTN1OA&quality=0
    """
    # Check if we have a fresher URL from token refresh
    fresh_url = token_refresh_manager.get_fresh_url(url)
    if fresh_url != url:
        logger.info("Using refreshed URL from token manager")

    proxy_base = _get_proxy_base(request)
    rewritten = await proxy_playlist(fresh_url, proxy_base, quality=quality)

    return Response(
        content=rewritten,
        media_type="application/vnd.apple.mpegurl",
        headers={
            "Cache-Control": "no-cache, no-store, must-revalidate",
        },
    )


@app.get("/relay")
async def relay_endpoint(
    request: Request,
    url: str = Query(..., description="Base64url-encoded segment URL"),
):
    """Relay an upstream media segment as a chunked byte stream.

    Fetches the upstream segment (TS, fMP4, init segment, etc.) and streams
    it to the client using chunked transfer encoding. Never buffers the
    full segment in memory.

    The `url` parameter must be base64url-encoded to avoid URL encoding issues.

    Supports HTTP Range requests for seeking.

    Example:
        GET /relay?url=aHR0cHM6Ly9leGFtcGxlLmNvbS9zZWdtZW50LnRz
    """
    range_header = request.headers.get("range")

    stream_gen, headers, status_code = await relay_stream(url, range_header)

    return StreamingResponse(
        stream_gen,
        status_code=status_code,
        headers=headers,
    )


# --- Embed iframe-stripping proxy ---


@app.get("/embed")
async def embed_proxy(url: str = Query(..., description="Base64url-encoded embed URL")):
    """Proxy a third-party embed page so it can be iframed in our origin.

    Strips X-Frame-Options and CSP frame-ancestors from the upstream
    response, injects a base href + frame-buster-defeat script, and
    forwards a plausible Referer/Origin to bypass upstream allowlists.
    """
    body, headers, status_code = await fetch_embed(url)
    return Response(content=body, headers=headers, status_code=status_code)


@app.get("/embed-asset")
async def embed_asset(
    request: Request,
    url: str = Query(..., description="Base64url-encoded subresource URL"),
):
    """Relay an upstream subresource (JS/CSS/image/etc.) for the embed proxy.

    Used as a fallback when an upstream blocks hotlinked assets via Origin
    or Referer checks. Most assets load directly via the injected <base>
    tag without going through this endpoint.
    """
    range_header = request.headers.get("range")
    stream_gen, headers, status_code = await relay_asset(url, range_header)
    return StreamingResponse(stream_gen, headers=headers, status_code=status_code)


# --- Frontend Static Files ---
# Mount the SvelteKit static build AFTER all API routes so API endpoints take priority.
# SvelteKit adapter-static with ssr=false produces {page}.html files and a fallback index.html.
# Starlette StaticFiles(html=True) only checks {path}/index.html, not {path}.html.
# We use a catch-all route to handle both patterns and the SPA fallback.
_frontend_dir = os.path.realpath(os.path.join(os.path.dirname(__file__), "..", "frontend", "build"))
if os.path.exists(_frontend_dir):
    from starlette.responses import FileResponse, HTMLResponse

    _fallback_path = os.path.join(_frontend_dir, "index.html")

    @app.get("/{path:path}")
    async def serve_frontend(path: str):
        """Serve SvelteKit frontend files with SPA fallback."""
        for candidate in [
            os.path.join(_frontend_dir, path),
            os.path.join(_frontend_dir, f"{path}.html"),
            os.path.join(_frontend_dir, path, "index.html"),
        ]:
            real = os.path.realpath(candidate)
            if real.startswith(_frontend_dir) and os.path.isfile(real):
                return FileResponse(real)
        # SPA fallback for client-side routing
        if os.path.isfile(_fallback_path):
            return FileResponse(_fallback_path)
        return Response(content="Not Found", status_code=404)

    logger.info("Serving frontend from %s", _frontend_dir)
else:
    # Fallback root when no frontend build exists
    @app.get("/")
    async def root():
        return {"service": "f1-streams", "version": "5.0.0"}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
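`serve_frontend` resolves a request path against three candidates (`{path}`, `{path}.html`, `{path}/index.html`) and only serves files that stay under the build directory. A sketch of that lookup as a standalone function follows; the name `resolve_frontend_path` is hypothetical. It swaps the `real.startswith(_frontend_dir)` test for `os.path.commonpath`, because a bare string-prefix test also accepts a sibling directory that merely shares the prefix (for example `build-old` next to `build`).

```python
import os
from typing import Optional


def resolve_frontend_path(root: str, path: str) -> Optional[str]:
    """Return the first existing file for `path` under `root`, or None.

    Hypothetical helper that tries the same candidates as serve_frontend:
    the path itself, `{path}.html`, and `{path}/index.html`. Containment is
    checked with os.path.commonpath, so "../build-old/x" is rejected even
    though its resolved path starts with the same characters as root.
    """
    root = os.path.realpath(root)
    for candidate in (
        os.path.join(root, path),
        os.path.join(root, f"{path}.html"),
        os.path.join(root, path, "index.html"),
    ):
        # realpath collapses ".." and symlinks before the containment check.
        real = os.path.realpath(candidate)
        if os.path.commonpath([root, real]) == root and os.path.isfile(real):
            return real
    return None
```

A route built on this would return `FileResponse(resolved)` when a path comes back and fall back to `index.html` otherwise, as the committed handler already does for client-side routes.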
478
stacks/f1-stream/files/backend/playback_verifier.py
Normal file
@ -0,0 +1,478 @@
"""Headless-browser playback verification for extracted streams.

The basic health checker (backend/health.py) only validates m3u8 syntax.
For embed/iframe streams it has nothing to check — the previous code blindly
marked every embed `is_live=True`, which meant the stream list was full of
news articles and aggregator landing pages that never actually played.

This module loads each candidate stream URL in headless Chromium (via
Playwright) and looks for *codec-independent* signals that the upstream
serves a playable stream:

- For m3u8: hls.js receives MANIFEST_PARSED + at least one FRAG_LOADED
  event. We don't wait for `<video>` to gain dimensions, because Playwright's
  chromium build doesn't include the H.264/AAC codecs. The user's real
  browser does, so confirming "manifest + segment fetch succeed" is the
  right server-side signal.
- For embed: a `<video>` element appears at top level OR inside the iframe
  (the embed proxy strips X-Frame-Options + frame-buster JS so we can
  introspect the iframe content), OR the player has set up a MediaSource.

Designed to be called from the extraction service's run_extraction()
hook, with bounded concurrency. Each verification typically takes
4-12 seconds.
"""

import asyncio
import base64
import logging
import os
import time
from dataclasses import dataclass

logger = logging.getLogger(__name__)

# Toggle off in development by setting PLAYBACK_VERIFY_ENABLED=false.
VERIFY_ENABLED = os.getenv("PLAYBACK_VERIFY_ENABLED", "true").lower() in ("true", "1", "yes")

# Maximum number of concurrent browser pages.
MAX_CONCURRENCY = int(os.getenv("PLAYBACK_VERIFY_CONCURRENCY", "2"))

# Per-stream verification budget (seconds). Beyond this we declare unplayable.
PER_STREAM_TIMEOUT = float(os.getenv("PLAYBACK_VERIFY_TIMEOUT", "20"))

# Where the embed proxy lives, used to wrap embed URLs so they bypass
# X-Frame-Options/CSP/JS frame-busters during verification. Defaults to
# loopback because verification runs inside the same FastAPI process.
PROXY_BASE = os.getenv("PLAYBACK_VERIFY_PROXY_BASE", "http://127.0.0.1:8000")

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


@dataclass
class PlaybackVerdict:
    is_playable: bool
    signal: str = ""  # which check triggered the positive verdict
    elapsed_ms: int = 0
    error: str = ""


def _b64url(s: str) -> str:
    """URL-safe base64 with padding stripped; matches m3u8_rewriter.encode_url."""
    return base64.urlsafe_b64encode(s.encode()).decode().rstrip("=")


def _hls_test_html(m3u8_url: str) -> str:
    """A self-contained HTML page that loads an m3u8 via hls.js into a <video>.

    The page exposes window._verifier with manifest_parsed / frag_loaded
    booleans the verifier polls. It also records fatal network errors and
    incompatible-codec errors in separate flags, so we can distinguish
    'upstream is unreachable' from 'codec missing'.
    """
    return f"""<!doctype html>
<html><head><meta charset="utf-8"><title>verify</title>
<script src="https://cdn.jsdelivr.net/npm/hls.js@1.5/dist/hls.min.js"></script>
</head><body>
<video id="v" muted playsinline width="640" height="360"></video>
<script>
window._verifier = {{
  manifest_parsed: false,
  frag_loaded: false,
  media_loaded: false,           // true when MSE has appended any buffer
  fatal_network_error: false,    // upstream truly unreachable
  manifest_incompatible: false,  // codec missing; separate from network reachability
  hls_error_details: ""
}};
const v = document.getElementById('v');
const url = {m3u8_url!r};
function start() {{
  if (window.Hls && Hls.isSupported()) {{
    const hls = new Hls({{enableWorker: true}});
    hls.on(Hls.Events.MANIFEST_PARSED, () => {{ window._verifier.manifest_parsed = true; }});
    hls.on(Hls.Events.FRAG_LOADED, () => {{ window._verifier.frag_loaded = true; }});
    hls.on(Hls.Events.BUFFER_APPENDED, () => {{ window._verifier.media_loaded = true; }});
    hls.on(Hls.Events.ERROR, (_, d) => {{
      window._verifier.hls_error_details = d.details || "";
      if (d.fatal && d.type === Hls.ErrorTypes.NETWORK_ERROR) {{
        window._verifier.fatal_network_error = true;
      }}
      if (d.details === Hls.ErrorDetails.MANIFEST_INCOMPATIBLE_CODECS_ERROR) {{
        window._verifier.manifest_incompatible = true;
      }}
    }});
    hls.loadSource(url);
    hls.attachMedia(v);
  }} else if (v.canPlayType('application/vnd.apple.mpegurl')) {{
    v.src = url;
    v.addEventListener('loadedmetadata', () => {{ window._verifier.manifest_parsed = true; window._verifier.frag_loaded = true; }});
    v.addEventListener('error', () => {{ window._verifier.fatal_network_error = true; }});
  }} else {{
    window._verifier.hls_error_details = "no hls support";
  }}
}}
window.addEventListener('load', start);
</script></body></html>"""


def _embed_test_html(_proxied_embed_url: str) -> str:
    """No longer used; the verifier navigates the page directly to the proxy URL.

    The earlier iframe-wrapper approach hit same-origin policy when inspecting
    the iframe's contentDocument (the wrapper page was a data: URL, the iframe
    was http://127.0.0.1:8000), so we couldn't read the embed's DOM.
    """
    return ""


_M3U8_POLL_JS = """
() => {
  const v = window._verifier || {};
  const vid = document.querySelector('video');
  return {
    manifest_parsed: !!v.manifest_parsed,
    frag_loaded: !!v.frag_loaded,
    media_loaded: !!v.media_loaded,
    fatal_network_error: !!v.fatal_network_error,
    manifest_incompatible: !!v.manifest_incompatible,
    hls_error_details: v.hls_error_details || "",
    video_width: vid ? vid.videoWidth : 0,
    video_ready: vid ? vid.readyState : 0,
  };
}
"""


_EMBED_POLL_JS = """
() => {
  try {
    const vids = document.querySelectorAll('video');
    if (vids.length > 0) {
      const v = vids[0];
      return {
        has_video: true,
        src: v.currentSrc || v.src || "",
        width: v.videoWidth,
        ready: v.readyState,
        duration: isFinite(v.duration) ? v.duration : 0,
        media_keys: !!v.mediaKeys,
        sources: v.querySelectorAll('source').length,
      };
    }
    return {has_video: false};
  } catch (e) {
    return {has_video: false, err: String(e)};
  }
}
"""


async def _verify_m3u8(page, m3u8_url: str, deadline: float) -> PlaybackVerdict:
    """Confirm an m3u8 URL is fetchable via hls.js end-to-end.

    Positive signal hierarchy (checked in this order on every poll):
      1. media_loaded (MSE buffer appended): strongest, codec-supported.
      2. frag_loaded (hls.js fetched at least one segment): upstream is OK
         even if the local browser lacks codecs.
      3. manifest_incompatible: hls.js fetched and parsed the playlist but
         can't decode it here; a real user's browser will.
      4. manifest_parsed alone: the playlist was fetched and parsed, even
         though no segment has loaded yet.
    Negative signals:
      - fatal_network_error: upstream is unreachable.
      - timeout with no manifest_parsed: upstream did not respond.
    """
    start = time.monotonic()
    html = _hls_test_html(m3u8_url)
    data_url = "data:text/html;base64," + base64.b64encode(html.encode()).decode()

    try:
        await page.goto(data_url, wait_until="domcontentloaded", timeout=10_000)
    except Exception as e:
        return PlaybackVerdict(
            is_playable=False, error=f"goto failed: {e}",
            elapsed_ms=int((time.monotonic() - start) * 1000),
        )

    last_state: dict = {}
    while time.monotonic() < deadline:
        try:
            state = await page.evaluate(_M3U8_POLL_JS)
        except Exception as e:
            return PlaybackVerdict(
                is_playable=False, error=f"evaluate failed: {e}",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        last_state = state
        if state.get("media_loaded"):
            return PlaybackVerdict(
                is_playable=True, signal="media_loaded",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if state.get("frag_loaded"):
            return PlaybackVerdict(
                is_playable=True, signal="frag_loaded",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        # MANIFEST_INCOMPATIBLE_CODECS_ERROR fires after hls.js successfully
        # fetched and parsed the manifest; the failure is purely local
        # (chromium lacks H.264). The user's real browser has codecs, so
        # this URL is playable from the user's perspective.
        if state.get("manifest_incompatible"):
            return PlaybackVerdict(
                is_playable=True, signal="manifest_parsed_codec_missing_in_verifier",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if state.get("manifest_parsed"):
            return PlaybackVerdict(
                is_playable=True, signal="manifest_parsed",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if state.get("fatal_network_error"):
            return PlaybackVerdict(
                is_playable=False, error="upstream network error",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        await asyncio.sleep(0.25)

    err = "no playback signal"
    if last_state.get("hls_error_details"):
        err = f"hls.js error: {last_state['hls_error_details']}"
    return PlaybackVerdict(
        is_playable=False, error=err,
        elapsed_ms=int((time.monotonic() - start) * 1000),
    )


async def _verify_embed(page, proxied_url: str, deadline: float) -> PlaybackVerdict:
    """Navigate directly to the proxied embed and confirm a player rendered.

    Positive signals (in priority order):
      - <video> with src/sources/mediaKeys set or a non-zero videoWidth
        (player wired up).
      - <video> element exists with any state (script ran, player attaching).

    A player container div alone (jwplayer, video-js, [id*=player], etc.) is
    not accepted as a positive signal.

    Loading the embed page directly (not via iframe wrapper) avoids the
    same-origin policy that prevented earlier iframe-introspection runs
    from seeing the embed DOM.
    """
    start = time.monotonic()
    try:
        await page.goto(proxied_url, wait_until="domcontentloaded", timeout=15_000)
    except Exception as e:
        return PlaybackVerdict(
            is_playable=False, error=f"goto failed: {e}",
            elapsed_ms=int((time.monotonic() - start) * 1000),
        )

    # Track the best state seen across all polls. Some embeds load a player
    # briefly then anti-bot JS tears the DOM down (hmembeds redirects to
    # google.com if its devtool-detection trips). We accept any positive
    # signal observed during the window, even if it's gone by timeout.
    #
    # We require an actual <video> element; a "player container div"
    # is too weak (sportsurge has player-class divs but no real player).
    seen_video_wired = False
    seen_video_tag = False
    last_err = ""

    while time.monotonic() < deadline:
        try:
            r = await page.evaluate(_EMBED_POLL_JS)
        except Exception as e:
            return PlaybackVerdict(
                is_playable=False, error=f"evaluate failed: {e}",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if r.get("has_video"):
            seen_video_tag = True
            if r.get("src") or r.get("width", 0) > 0 or r.get("media_keys") or r.get("sources", 0) > 0:
                seen_video_wired = True
                return PlaybackVerdict(
                    is_playable=True, signal="video.wired",
                    elapsed_ms=int((time.monotonic() - start) * 1000),
                )
        last_err = r.get("err", "")
        await asyncio.sleep(0.5)

    if seen_video_wired:
        return PlaybackVerdict(is_playable=True, signal="video.wired",
                               elapsed_ms=int((time.monotonic() - start) * 1000))
    if seen_video_tag:
        return PlaybackVerdict(is_playable=True, signal="video.tag_only",
                               elapsed_ms=int((time.monotonic() - start) * 1000))

    err = "no <video> element rendered"
    if last_err:
        err += f"; last_err: {last_err}"
    return PlaybackVerdict(is_playable=False, error=err,
                           elapsed_ms=int((time.monotonic() - start) * 1000))


class PlaybackVerifier:
    """Verifies playability of m3u8 and embed URLs via headless Chromium.

    Manages a single browser instance for the process lifetime (cheap per-page
    contexts) and bounds concurrency with a semaphore.
    """

    def __init__(self) -> None:
        self._browser = None
        self._playwright = None
        self._sem = asyncio.Semaphore(MAX_CONCURRENCY)
        self._lock = asyncio.Lock()

    async def _ensure_browser(self):
        if self._browser is not None:
            return self._browser
        async with self._lock:
            if self._browser is not None:
                return self._browser
            try:
                from playwright.async_api import async_playwright
            except ImportError:
                logger.error("playwright not installed — playback verification disabled")
                return None
            self._playwright = await async_playwright().start()
            # CHROME_CDP_URL points to chrome-service's CDP endpoint
            # (http://chrome-service.chrome-service.svc:9222 by default).
            # Migrated 2026-06-04 from `chromium.connect(ws_url)` because
            # chrome-service now runs chromium directly with persistent
            # user-data-dir for cookie warming; launch-server couldn't
            # persist. The CDP `Browser` exposes the persistent default
            # context via `browser.contexts[0]`; here we just call
            # `new_context()` for incognito-style isolation per verify
            # round, matching the previous behaviour.
            cdp_url = os.getenv("CHROME_CDP_URL")
            if cdp_url:
                try:
                    self._browser = await self._playwright.chromium.connect_over_cdp(
                        cdp_url, timeout=15_000,
                    )
                    logger.info("connected to remote chrome-service via CDP (concurrency=%d)", MAX_CONCURRENCY)
                except Exception:
                    logger.exception(
                        "CDP connect failed (%s) — falling back to in-process Chromium", cdp_url,
                    )
                    self._browser = None
            if self._browser is None:
                # Either CHROME_CDP_URL was unset, or CDP connect failed.
                # Fall back to in-process headless so the verifier still
                # returns playable/unplayable verdicts (degraded but
                # functional; anti-bot pages may bypass).
                self._browser = await self._playwright.chromium.launch(
                    headless=True,
                    args=[
                        "--disable-dev-shm-usage",
                        "--disable-web-security",
                        "--no-sandbox",
                        "--disable-setuid-sandbox",
                        "--disable-features=IsolateOrigins,site-per-process",
                        "--autoplay-policy=no-user-gesture-required",
                    ],
                )
                logger.warning(
                    "using in-process Chromium (CHROME_CDP_URL unset or CDP connect failed) (concurrency=%d)",
                    MAX_CONCURRENCY,
                )
            return self._browser

    async def shutdown(self) -> None:
        if self._browser is not None:
            try:
                await self._browser.close()
            except Exception:
                logger.exception("error closing browser")
        if self._playwright is not None:
            try:
                await self._playwright.stop()
            except Exception:
                logger.exception("error stopping playwright")
        self._browser = None
        self._playwright = None

    async def verify(self, url: str, stream_type: str) -> PlaybackVerdict:
        if not VERIFY_ENABLED:
            return PlaybackVerdict(is_playable=True, error="disabled")

        browser = await self._ensure_browser()
        if browser is None:
            return PlaybackVerdict(is_playable=False, error="playwright unavailable")

        is_m3u8 = stream_type == "m3u8"
        if is_m3u8:
            # Route m3u8 fetches through our own /proxy so the verifier gets a
            # same-origin response with ACAO:*, matching what the frontend does
            # (frontend `getProxyUrl` wraps every m3u8 via /proxy anyway). Without
            # this, hosts like oe1.ossfeed.store that only return CORS headers
            # for specific Origins (e.g. pushembdz.store) trigger an immediate
            # `fatal_network_error` in hls.js and the stream is marked dead.
            url = f"{PROXY_BASE}/proxy?url={_b64url(url)}"
        else:
            url = f"{PROXY_BASE}/embed?url={_b64url(url)}"

        async with self._sem:
            # Set the per-stream deadline AFTER acquiring the semaphore.
            # Otherwise queued streams that wait behind earlier ones
            # would have already-expired deadlines when they start.
            deadline = time.monotonic() + PER_STREAM_TIMEOUT
            context = None
            try:
                context = await browser.new_context(
                    user_agent=USER_AGENT,
                    viewport={"width": 1280, "height": 720},
                    bypass_csp=True,
                )
                from backend.stealth import STEALTH_JS
                await context.add_init_script(STEALTH_JS)
                page = await context.new_page()
            except Exception as e:
                # Close a half-initialised context so a failed init script or
                # page creation doesn't leak it on the shared browser.
                if context is not None:
                    try:
                        await context.close()
                    except Exception:
                        pass
                return PlaybackVerdict(
                    is_playable=False, error=f"context create failed: {e}",
                )
            try:
                if is_m3u8:
                    verdict = await _verify_m3u8(page, url, deadline)
                else:
                    verdict = await _verify_embed(page, url, deadline)
            except asyncio.TimeoutError:
                verdict = PlaybackVerdict(is_playable=False, error="overall timeout")
            except Exception as e:
                verdict = PlaybackVerdict(
                    is_playable=False, error=f"verify exception: {e}",
                )
            finally:
                try:
                    await page.close()
                    await context.close()
                except Exception:
                    pass
        logger.info(
            "[verify] %s -> playable=%s signal=%s err=%s elapsed=%dms",
            url[:120], verdict.is_playable, verdict.signal,
            verdict.error, verdict.elapsed_ms,
        )
        return verdict

    async def verify_many(self, items: list[tuple[str, str]]) -> dict[str, PlaybackVerdict]:
        if not items:
            return {}
        if not VERIFY_ENABLED:
            return {url: PlaybackVerdict(is_playable=True, error="disabled") for url, _ in items}

        async def _run(url: str, stream_type: str):
            verdict = await self.verify(url, stream_type)
            return url, verdict

        results = await asyncio.gather(
            *[_run(url, st) for url, st in items], return_exceptions=True
        )
        out: dict[str, PlaybackVerdict] = {}
        for r in results:
            # gather(return_exceptions=True) can also return BaseException
            # subclasses such as CancelledError.
            if isinstance(r, BaseException):
                # logger.exception() only has a traceback inside an except
                # block; pass the captured exception explicitly instead.
                logger.error("verify task crashed: %s", r, exc_info=r)
                continue
            url, verdict = r
            out[url] = verdict
        return out
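The comment in `verify()` about setting the per-stream deadline only after acquiring the semaphore is easy to regress. The sketch below isolates that pattern with plain asyncio. It is an illustration, not part of the module: `run_tasks` and its parameters are hypothetical names, and `asyncio.sleep` stands in for one Playwright verification. It reports how much budget each task still had when it actually started work.

```python
import asyncio
import time


async def run_tasks(n: int, concurrency: int, budget: float, work: float,
                    deadline_after_acquire: bool) -> list[float]:
    """Hypothetical harness: budget (seconds) each task had left when it began working."""
    sem = asyncio.Semaphore(concurrency)

    async def task() -> float:
        # Deadline fixed at enqueue time (the pattern verify() avoids).
        queued_deadline = time.monotonic() + budget
        async with sem:
            if deadline_after_acquire:
                # verify()'s pattern: start the clock only once we hold a slot.
                deadline = time.monotonic() + budget
            else:
                deadline = queued_deadline
            remaining = deadline - time.monotonic()
            await asyncio.sleep(work)  # stand-in for one page verification
            return remaining

    return list(await asyncio.gather(*(task() for _ in range(n))))


if __name__ == "__main__":
    # One slot, three tasks: the third waits behind two 0.05 s jobs.
    print(asyncio.run(run_tasks(3, 1, 0.2, 0.05, deadline_after_acquire=True)))
    print(asyncio.run(run_tasks(3, 1, 0.2, 0.05, deadline_after_acquire=False)))
```

With `deadline_after_acquire=False` the last task starts with roughly half of its 0.2 s budget already spent in the queue; with `True` every task starts with close to the full budget, which is what `PER_STREAM_TIMEOUT` is meant to guarantee.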
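`proxy.py`, the next file in this commit, reads `#EXT-X-STREAM-INF` attributes with `re.search`. RFC 8216 also defines `AVERAGE-BANDWIDTH`, and later drafts of the HLS spec add `SUPPLEMENTAL-CODECS`. If either appears before the shorter name, an unanchored pattern such as `BANDWIDTH=(\d+)` or `CODECS="([^"]*)"` can match the tail of the longer attribute. Here is a minimal sketch of one fix, assuming a comma-separated attribute list whose quoted values don't themselves contain `,NAME=`: anchor the name to the start of the list or to a preceding comma. `attr_int`, `attr_quoted`, and `ATTRS` are illustrative names, not the module's own.

```python
import re


def attr_int(attrs: str, name: str) -> int:
    """Return an integer HLS attribute, or 0 if absent (illustrative helper)."""
    # (?:^|,) stops "BANDWIDTH" from matching inside "AVERAGE-BANDWIDTH".
    match = re.search(rf"(?:^|,)\s*{re.escape(name)}=(\d+)", attrs)
    return int(match.group(1)) if match else 0


def attr_quoted(attrs: str, name: str) -> str:
    """Return a quoted-string HLS attribute without its quotes, or "" if absent."""
    # Same anchoring, so "CODECS" does not match inside "SUPPLEMENTAL-CODECS".
    match = re.search(rf'(?:^|,)\s*{re.escape(name)}="([^"]*)"', attrs)
    return match.group(1) if match else ""


# Attribute list as it appears after "#EXT-X-STREAM-INF:".
ATTRS = ('AVERAGE-BANDWIDTH=800000,BANDWIDTH=1000000,RESOLUTION=1280x720,'
         'SUPPLEMENTAL-CODECS="dvh1.08.07/db4h",CODECS="hvc1.2.4.L123.B0"')

if __name__ == "__main__":
    naive = re.search(r"BANDWIDTH=(\d+)", ATTRS).group(1)
    print(naive)                         # 800000 (the AVERAGE-BANDWIDTH value)
    print(attr_int(ATTRS, "BANDWIDTH"))  # 1000000
    print(attr_quoted(ATTRS, "CODECS"))  # hvc1.2.4.L123.B0
```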
501
stacks/f1-stream/files/backend/proxy.py
Normal file
@@ -0,0 +1,501 @@
"""HLS proxy - fetches upstream m3u8 playlists and relays media segments.

Three core functions:
1. Playlist proxy: fetches an upstream m3u8 playlist, rewrites all URIs
   to route through our /proxy and /relay endpoints, returns the rewritten
   playlist to the client.
2. Quality selection: when the upstream m3u8 is a master playlist containing
   multiple quality variants, allows selecting a specific variant by index.
3. Segment relay: fetches an upstream media segment (TS, fMP4, init) and
   streams it to the client using chunked transfer encoding, never buffering
   the full segment in memory.

All responses include CORS headers for browser playback.
"""

import logging
import re
from dataclasses import dataclass
from typing import AsyncGenerator
from urllib.parse import urljoin

import httpx
from fastapi import HTTPException

from backend.m3u8_rewriter import decode_url, rewrite_playlist

logger = logging.getLogger(__name__)

# Chunk size for relay streaming (64 KB)
RELAY_CHUNK_SIZE = 65536

# Timeout for upstream playlist fetches (seconds)
PLAYLIST_TIMEOUT = 15.0

# Timeout for upstream segment relay (seconds) - longer because segments are bigger
RELAY_TIMEOUT = 30.0

# User-Agent for upstream requests
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


@dataclass
class QualityVariant:
    """A single quality variant parsed from a master HLS playlist."""

    index: int  # 0-based position after sorting by bandwidth (0 = highest)
    bandwidth: int  # BANDWIDTH value in bits/sec
    resolution: str  # e.g., "1920x1080" or "" if not specified
    codecs: str  # e.g., "avc1.640028,mp4a.40.2" or "" if not specified
    name: str  # e.g., "720p" or "" if not specified
    uri: str  # The variant playlist URI (absolute)

    def to_dict(self) -> dict:
        """Serialize to a plain dictionary for JSON responses."""
        return {
            "index": self.index,
            "bandwidth": self.bandwidth,
            "resolution": self.resolution,
            "codecs": self.codecs,
            "name": self.name,
            "uri": self.uri,
        }


def _is_master_playlist(content: str) -> bool:
    """Check if an m3u8 playlist is a master playlist (contains variant streams).

    A master playlist contains #EXT-X-STREAM-INF tags pointing to variant
    playlists. A media playlist contains #EXTINF tags pointing to segments.

    Args:
        content: The raw m3u8 playlist text.

    Returns:
        True if this is a master playlist.
    """
    return "#EXT-X-STREAM-INF:" in content


def parse_quality_variants(content: str, base_url: str) -> list[QualityVariant]:
    """Parse quality variants from a master HLS playlist.

    Extracts all #EXT-X-STREAM-INF entries and their associated URIs.

    Args:
        content: The raw m3u8 master playlist text.
        base_url: The URL of the playlist (for resolving relative URIs).

    Returns:
        List of QualityVariant objects sorted by bandwidth (highest first).
    """
    variants: list[QualityVariant] = []
    lines = content.splitlines()
    index = 0

    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("#EXT-X-STREAM-INF:"):
            continue

        # Parse attributes from the STREAM-INF tag
        attrs = stripped[len("#EXT-X-STREAM-INF:"):]

        bandwidth = _parse_attr_int(attrs, "BANDWIDTH")
        resolution = _parse_attr_str(attrs, "RESOLUTION")
        codecs = _parse_attr_quoted(attrs, "CODECS")
        name = _parse_attr_quoted(attrs, "NAME")

        # The next non-empty, non-comment line is the variant URI
        uri = ""
        for j in range(i + 1, len(lines)):
            next_line = lines[j].strip()
            if next_line and not next_line.startswith("#"):
                uri = next_line
                break

        if not uri:
            continue

        # Resolve relative URI
        if not uri.startswith("http://") and not uri.startswith("https://"):
            uri = urljoin(base_url, uri)

        # Generate a human-readable name if not provided
        if not name and resolution:
            # Extract height from resolution (e.g., "1920x1080" -> "1080p")
            parts = resolution.split("x")
            if len(parts) == 2:
                name = f"{parts[1]}p"

        variants.append(QualityVariant(
            index=index,
            bandwidth=bandwidth,
            resolution=resolution,
            codecs=codecs,
            name=name,
            uri=uri,
        ))
        index += 1

    # Sort by bandwidth descending (highest quality first)
    variants.sort(key=lambda v: v.bandwidth, reverse=True)
    # Re-index after sorting
    for i, v in enumerate(variants):
        v.index = i

    return variants


def _select_variant_playlist(
    content: str, base_url: str, variant_index: int
) -> str:
    """Extract a single variant from a master playlist by index.

    Instead of returning the full master playlist, returns just the selected
    variant's media playlist URL. The caller should then fetch and proxy that
    URL instead.

    Args:
        content: The raw m3u8 master playlist text.
        base_url: The URL of the playlist (for resolving relative URIs).
        variant_index: 0-based index of the desired variant (sorted by bandwidth desc).

    Returns:
        The absolute URL of the selected variant's media playlist.

    Raises:
        HTTPException: If the playlist has no variants or the variant index
            is out of range.
    """
    variants = parse_quality_variants(content, base_url)

    if not variants:
        raise HTTPException(
            status_code=400,
            detail="Playlist has no quality variants to select from",
        )

    if variant_index < 0 or variant_index >= len(variants):
        raise HTTPException(
            status_code=400,
            detail=f"Quality index {variant_index} out of range (0-{len(variants) - 1})",
        )

    selected = variants[variant_index]
    logger.info(
        "Selected quality variant %d: %s (%d bps, %s)",
        variant_index,
        selected.name or "unknown",
        selected.bandwidth,
        selected.resolution or "no resolution",
    )

    return selected.uri


def _parse_attr_int(attrs: str, name: str) -> int:
    """Parse an integer attribute from an HLS tag attribute string.

    Args:
        attrs: The attribute string (e.g., 'BANDWIDTH=1280000,RESOLUTION=720x480').
        name: The attribute name to extract.

    Returns:
        The integer value, or 0 if not found.
    """
    # Anchor the name to the start of the list or a preceding comma so that
    # e.g. "BANDWIDTH" does not match inside "AVERAGE-BANDWIDTH".
    match = re.search(rf"(?:^|,)\s*{re.escape(name)}=(\d+)", attrs)
    return int(match.group(1)) if match else 0


def _parse_attr_str(attrs: str, name: str) -> str:
    """Parse a bare (unquoted) string attribute from an HLS tag attribute string.

    Args:
        attrs: The attribute string.
        name: The attribute name to extract.

    Returns:
        The string value, or empty string if not found.
    """
    match = re.search(rf"(?:^|,)\s*{re.escape(name)}=([^,\s\"]+)", attrs)
    return match.group(1) if match else ""


def _parse_attr_quoted(attrs: str, name: str) -> str:
    """Parse a quoted string attribute from an HLS tag attribute string.

    Args:
        attrs: The attribute string.
        name: The attribute name to extract.

    Returns:
        The string value (without quotes), or empty string if not found.
    """
    # Anchored for the same reason as _parse_attr_int ("CODECS" vs "SUPPLEMENTAL-CODECS").
    match = re.search(rf'(?:^|,)\s*{re.escape(name)}="([^"]*)"', attrs)
    return match.group(1) if match else ""


async def proxy_playlist(
    encoded_url: str, proxy_base: str, quality: int | None = None
) -> str:
    """Fetch an upstream m3u8 playlist and rewrite all URIs through our proxy.

    If the upstream playlist is a master playlist (containing multiple quality
    variants) and a quality index is specified, fetches the selected variant's
    media playlist instead and rewrites that.

    Args:
        encoded_url: Base64url-encoded URL of the upstream m3u8 playlist.
        proxy_base: The base URL of our proxy service for rewriting URIs
            (e.g., "https://f1.viktorbarzin.me").
        quality: Optional 0-based index of the desired quality variant.
            Only applies when the upstream is a master playlist.
            Variants are sorted by bandwidth descending (0 = highest).

    Returns:
        The rewritten m3u8 playlist text.

    Raises:
        HTTPException: If the URL can't be decoded, upstream fails, or
            content is not a valid HLS playlist.
    """
    # Decode the URL
    try:
        url = decode_url(encoded_url)
    except Exception as e:
        logger.error("Failed to decode proxy URL: %s", e)
        raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")

    logger.info("Proxying playlist: %s", url)

    # Fetch the upstream playlist
    try:
        async with httpx.AsyncClient(
            timeout=PLAYLIST_TIMEOUT,
            follow_redirects=True,
            headers={
                "User-Agent": USER_AGENT,
                "Accept": "*/*",
            },
        ) as client:
            response = await client.get(url)

            if response.status_code != 200:
                logger.warning(
                    "Upstream playlist returned HTTP %d for %s",
                    response.status_code,
                    url,
                )
                raise HTTPException(
                    status_code=502,
                    detail=f"Upstream returned HTTP {response.status_code}",
                )

            content = response.text

    except httpx.TimeoutException:
        logger.error("Timeout fetching upstream playlist: %s", url)
        raise HTTPException(status_code=504, detail="Upstream playlist timeout")
    except httpx.HTTPError as e:
        logger.error("HTTP error fetching upstream playlist: %s - %s", url, e)
        raise HTTPException(status_code=502, detail=f"Upstream error: {e}")
    except HTTPException:
        raise
    except Exception as e:
        logger.exception("Unexpected error fetching playlist: %s", url)
        raise HTTPException(status_code=500, detail=f"Internal error: {e}")

    # Validate it looks like an m3u8 playlist
    if "#EXTM3U" not in content:
        logger.warning("Upstream response is not a valid m3u8 playlist: %s", url)
        raise HTTPException(
            status_code=502,
            detail="Upstream response is not a valid HLS playlist",
        )

    # If this is a master playlist and a quality variant was requested,
    # fetch the selected variant's media playlist instead
    if quality is not None and _is_master_playlist(content):
        variant_url = _select_variant_playlist(content, url, quality)
        logger.info("Fetching selected variant playlist: %s", variant_url)

        try:
            async with httpx.AsyncClient(
                timeout=PLAYLIST_TIMEOUT,
                follow_redirects=True,
                headers={
                    "User-Agent": USER_AGENT,
                    "Accept": "*/*",
                },
            ) as client:
                variant_response = await client.get(variant_url)

                if variant_response.status_code != 200:
                    logger.warning(
                        "Variant playlist returned HTTP %d for %s",
                        variant_response.status_code,
                        variant_url,
                    )
                    raise HTTPException(
                        status_code=502,
                        detail=f"Variant playlist returned HTTP {variant_response.status_code}",
                    )

                content = variant_response.text
                url = variant_url  # Use variant URL as base for relative URI resolution

                if "#EXTM3U" not in content:
                    logger.warning(
                        "Variant playlist is not valid m3u8: %s", variant_url
                    )
                    raise HTTPException(
                        status_code=502,
                        detail="Variant playlist is not a valid HLS playlist",
                    )

        except httpx.TimeoutException:
            logger.error("Timeout fetching variant playlist: %s", variant_url)
            raise HTTPException(
                status_code=504, detail="Variant playlist timeout"
            )
        except httpx.HTTPError as e:
            logger.error(
                "HTTP error fetching variant playlist: %s - %s", variant_url, e
            )
            raise HTTPException(
                status_code=502, detail=f"Variant playlist error: {e}"
            )
        except HTTPException:
            raise
        except Exception as e:
            logger.exception(
                "Unexpected error fetching variant playlist: %s", variant_url
            )
            raise HTTPException(
                status_code=500, detail=f"Internal error: {e}"
            )

    # Rewrite all URIs to go through our proxy
    rewritten = rewrite_playlist(content, url, proxy_base)

    logger.debug(
        "Proxied playlist from %s: %d bytes -> %d bytes",
        url,
        len(content),
        len(rewritten),
    )

    return rewritten


async def relay_stream(
    encoded_url: str,
    range_header: str | None = None,
) -> tuple[AsyncGenerator[bytes, None], dict[str, str], int]:
    """Relay an upstream media segment as a chunked byte stream.

    Never buffers the full segment in memory. Streams chunks as they
    arrive from the upstream server.

    Args:
        encoded_url: Base64url-encoded URL of the upstream segment.
        range_header: Optional HTTP Range header from the client to
            forward to upstream.

    Returns:
        A tuple of (async_generator, headers_dict, status_code) where:
        - async_generator yields bytes chunks
        - headers_dict contains content-type and other relevant headers
        - status_code is the HTTP status (200 or 206)

    Raises:
        HTTPException: If the URL can't be decoded or upstream fails.
    """
    # Decode the URL
    try:
        url = decode_url(encoded_url)
    except Exception as e:
        logger.error("Failed to decode relay URL: %s", e)
        raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")

    logger.debug("Relaying segment: %s", url)

    # Build upstream request headers
    headers = {
        "User-Agent": USER_AGENT,
        "Accept": "*/*",
    }
    if range_header:
        headers["Range"] = range_header

    # Create the client and stream - caller is responsible for cleanup
    # via the async generator protocol
    client = httpx.AsyncClient(
        timeout=RELAY_TIMEOUT,
        follow_redirects=True,
    )

    try:
        response = await client.send(
            client.build_request("GET", url, headers=headers),
            stream=True,
        )

        if response.status_code not in (200, 206):
            await response.aclose()
            await client.aclose()
            logger.warning(
                "Upstream segment returned HTTP %d for %s",
                response.status_code,
                url,
            )
            raise HTTPException(
                status_code=502,
                detail=f"Upstream returned HTTP {response.status_code}",
            )

        # Collect relevant response headers to forward
        response_headers: dict[str, str] = {}

        content_type = response.headers.get("content-type", "video/mp2t")
        response_headers["Content-Type"] = content_type

        if "content-length" in response.headers:
            response_headers["Content-Length"] = response.headers["content-length"]

        if "content-range" in response.headers:
            response_headers["Content-Range"] = response.headers["content-range"]

        status_code = response.status_code

        async def _stream_chunks() -> AsyncGenerator[bytes, None]:
            """Yield chunks from the upstream response, then clean up."""
            try:
                async for chunk in response.aiter_bytes(chunk_size=RELAY_CHUNK_SIZE):
                    yield chunk
            except Exception as e:
                logger.error("Error streaming segment from %s: %s", url, e)
            finally:
                await response.aclose()
                await client.aclose()

        return _stream_chunks(), response_headers, status_code

    except HTTPException:
        raise
    except httpx.TimeoutException:
        await client.aclose()
        logger.error("Timeout relaying segment: %s", url)
        raise HTTPException(status_code=504, detail="Upstream segment timeout")
    except httpx.HTTPError as e:
        await client.aclose()
        logger.error("HTTP error relaying segment: %s - %s", url, e)
        raise HTTPException(status_code=502, detail=f"Upstream error: {e}")
    except Exception as e:
        await client.aclose()
        logger.exception("Unexpected error relaying segment: %s", url)
|
||||
raise HTTPException(status_code=500, detail=f"Internal error: {e}")
|
||||
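`relay_stream` hands the caller an async generator that owns the open upstream response and client, and it relies on that generator's `finally` block to release them. The sketch below reproduces the same ownership pattern using only stdlib `asyncio`. `FakeResponse` and `make_relay` are hypothetical stand-ins (not part of this codebase) for the streamed `httpx` response and the wrapper, so the example runs without network access. It shows that cleanup runs both when the consumer drains the stream and when it stops early and closes the generator, for example when the player disconnects mid-segment.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncGenerator


class FakeResponse:
    """Hypothetical stand-in for a streamed upstream response."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload
        self.closed = False

    async def aiter_bytes(self, chunk_size: int) -> AsyncGenerator[bytes, None]:
        # Yield the payload in fixed-size chunks, like a streamed body.
        for i in range(0, len(self._payload), chunk_size):
            await asyncio.sleep(0)  # simulate waiting on the network
            yield self._payload[i : i + chunk_size]

    async def aclose(self) -> None:
        self.closed = True


def make_relay(response: FakeResponse, chunk_size: int = 4) -> AsyncGenerator[bytes, None]:
    """Wrap a response in a generator that closes it once iteration ends."""

    async def _stream_chunks() -> AsyncGenerator[bytes, None]:
        try:
            async for chunk in response.aiter_bytes(chunk_size=chunk_size):
                yield chunk
        finally:
            # Runs on exhaustion, on error, and when the consumer calls aclose().
            await response.aclose()

    return _stream_chunks()


async def drain(payload: bytes) -> tuple[bytes, bool]:
    """Consume the whole relay and report whether the response was closed."""
    resp = FakeResponse(payload)
    body = b"".join([chunk async for chunk in make_relay(resp)])
    return body, resp.closed


async def stop_early(payload: bytes) -> tuple[bytes, bool]:
    """Read one chunk, then close the generator as a disconnecting client would."""
    resp = FakeResponse(payload)
    gen = make_relay(resp)
    first = await gen.__anext__()
    await gen.aclose()
    return first, resp.closed
```

Running `asyncio.run(drain(b"0123456789"))` yields the full body with the response closed, and `stop_early` yields only the first 4-byte chunk but still closes the response. Because cleanup lives in the generator rather than in `relay_stream` itself, the function can return immediately after reading the upstream headers.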
6
stacks/f1-stream/files/backend/requirements.txt
Normal file
@@ -0,0 +1,6 @@
fastapi==0.115.0
uvicorn[standard]
httpx>=0.27.0
apscheduler>=3.10.0,<4.0
pydantic>=2.0.0
playwright==1.48.0
240
stacks/f1-stream/files/backend/schedule.py
Normal file
@@ -0,0 +1,240 @@
"""F1 Schedule Service - fetches, caches, and serves the F1 race calendar."""

import json
import logging
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any

import httpx

logger = logging.getLogger(__name__)

JOLPICA_API_URL = "https://api.jolpi.ca/ergast/f1/current.json"
SCHEDULE_PATH = Path(os.getenv("SCHEDULE_PATH", "/data/schedule.json"))
STALE_THRESHOLD = timedelta(hours=24)

# Typical session durations in minutes
SESSION_DURATIONS = {
    "fp1": 60,
    "fp2": 60,
    "fp3": 60,
    "qualifying": 60,
    "sprint_qualifying": 30,
    "sprint": 30,
    "race": 120,
}


def _parse_session_datetime(session: dict[str, str] | None) -> str | None:
    """Parse a session dict with 'date' and 'time' fields into an ISO 8601 UTC string."""
    if not session or "date" not in session or "time" not in session:
        return None
    # Time format from API: "14:30:00Z"
    time_str = session["time"].rstrip("Z")
    return f"{session['date']}T{time_str}+00:00"


def _parse_race(race: dict[str, Any]) -> dict[str, Any]:
    """Transform a raw jolpica/Ergast race object into our internal format."""
    circuit = race.get("Circuit", {})
    location = circuit.get("Location", {})

    # Build session list
    sessions = []

    # Map API keys to our session types, in chronological order for a race weekend
    session_map = [
        ("FirstPractice", "fp1", "FP1"),
        ("SecondPractice", "fp2", "FP2"),
        ("ThirdPractice", "fp3", "FP3"),
        ("SprintQualifying", "sprint_qualifying", "Sprint Qualifying"),
        ("SprintShootout", "sprint_qualifying", "Sprint Qualifying"),
        ("Sprint", "sprint", "Sprint"),
        ("Qualifying", "qualifying", "Qualifying"),
    ]

    seen_types = set()
    for api_key, session_type, display_name in session_map:
        if api_key in race and session_type not in seen_types:
            dt_str = _parse_session_datetime(race[api_key])
            if dt_str:
                sessions.append(
                    {
                        "type": session_type,
                        "name": display_name,
                        "start_utc": dt_str,
                        "duration_minutes": SESSION_DURATIONS.get(session_type, 60),
                    }
                )
                seen_types.add(session_type)

    # Race session itself (date and time are top-level)
    race_dt = _parse_session_datetime({"date": race.get("date", ""), "time": race.get("time", "")})
    if race_dt:
        sessions.append(
            {
                "type": "race",
                "name": "Race",
                "start_utc": race_dt,
                "duration_minutes": SESSION_DURATIONS["race"],
            }
        )

    # Sort sessions chronologically
    sessions.sort(key=lambda s: s["start_utc"])

    return {
        "round": int(race.get("round", 0)),
        "race_name": race.get("raceName", ""),
        "circuit": circuit.get("circuitName", ""),
        "circuit_id": circuit.get("circuitId", ""),
        "country": location.get("country", ""),
        "locality": location.get("locality", ""),
        "date": race.get("date", ""),
        "url": race.get("url", ""),
        "sessions": sessions,
    }


def _compute_session_status(session: dict[str, Any], now: datetime) -> str:
    """Determine if a session is 'past', 'live', or 'upcoming'."""
    try:
        start = datetime.fromisoformat(session["start_utc"])
    except (ValueError, KeyError):
        return "upcoming"

    duration = timedelta(minutes=session.get("duration_minutes", 60))
    end = start + duration

    if now >= end:
        return "past"
    elif now >= start:
        return "live"
    else:
        return "upcoming"


class ScheduleService:
    """Manages the F1 schedule: fetching, caching, and serving."""

    def __init__(self) -> None:
        self._schedule: dict[str, Any] | None = None

    async def fetch_schedule(self) -> dict[str, Any]:
        """Fetch the current season schedule from the jolpica API."""
        logger.info("Fetching F1 schedule from jolpica API...")
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.get(JOLPICA_API_URL)
            response.raise_for_status()
            data = response.json()

        race_table = data.get("MRData", {}).get("RaceTable", {})
        season = race_table.get("season", "")
        raw_races = race_table.get("Races", [])

        races = [_parse_race(r) for r in raw_races]

        schedule = {
            "season": season,
            "fetched_at": datetime.now(timezone.utc).isoformat(),
            "races": races,
        }

        self._schedule = schedule
        logger.info("Fetched schedule for %s season: %d races", season, len(races))
        return schedule

    def load_from_disk(self) -> bool:
        """Load schedule from NFS-backed JSON file. Returns True if loaded successfully."""
        if not SCHEDULE_PATH.exists():
            logger.info("No cached schedule found at %s", SCHEDULE_PATH)
            return False

        try:
            with open(SCHEDULE_PATH, "r") as f:
                self._schedule = json.load(f)
            logger.info("Loaded cached schedule from %s", SCHEDULE_PATH)
            return True
        except (json.JSONDecodeError, OSError) as e:
            logger.warning("Failed to load cached schedule: %s", e)
            return False

    def save_to_disk(self) -> None:
        """Persist current schedule to NFS-backed JSON file."""
        if not self._schedule:
            logger.warning("No schedule data to save")
            return

        try:
            SCHEDULE_PATH.parent.mkdir(parents=True, exist_ok=True)
            with open(SCHEDULE_PATH, "w") as f:
                json.dump(self._schedule, f, indent=2)
            logger.info("Saved schedule to %s", SCHEDULE_PATH)
        except OSError as e:
            logger.error("Failed to save schedule to disk: %s", e)

    def is_stale(self) -> bool:
        """Check if the cached schedule data is older than the stale threshold."""
        if not self._schedule:
            return True

        fetched_at_str = self._schedule.get("fetched_at")
        if not fetched_at_str:
            return True

        try:
            fetched_at = datetime.fromisoformat(fetched_at_str)
            return datetime.now(timezone.utc) - fetched_at > STALE_THRESHOLD
        except ValueError:
            return True

    def get_schedule(self) -> dict[str, Any]:
        """Return the current schedule with computed session statuses."""
        if not self._schedule:
            return {"season": "", "races": [], "error": "No schedule data available"}

        now = datetime.now(timezone.utc)
        races = []

        for race in self._schedule.get("races", []):
            sessions = []
            for session in race.get("sessions", []):
                sessions.append(
                    {
                        **session,
                        "status": _compute_session_status(session, now),
                    }
                )

            races.append(
                {
                    **race,
                    "sessions": sessions,
                }
            )

        return {
            "season": self._schedule.get("season", ""),
            "fetched_at": self._schedule.get("fetched_at", ""),
            "races": races,
        }

    async def refresh(self) -> None:
        """Fetch fresh schedule and persist to disk. Falls back to cached data on error."""
        try:
            await self.fetch_schedule()
            self.save_to_disk()
        except httpx.HTTPError as e:
            logger.error("Failed to refresh schedule from API: %s", e)
            if not self._schedule:
                logger.warning("No cached data available either - schedule will be empty")
        except Exception:
            logger.exception("Unexpected error during schedule refresh")

    async def initialize(self) -> None:
        """Load from disk on startup and refresh if stale."""
        self.load_from_disk()
        if self.is_stale():
            await self.refresh()
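The schedule service turns each jolpica/Ergast session's `date` and `time` pair into an ISO 8601 UTC string, then classifies it against the current time using a per-type duration. The sketch below restates that logic as two standalone functions on a fixed clock so the boundaries are explicit: a session is `live` from its start up to, but not including, `start + duration`, and `past` from that instant on. The names `to_iso_utc` and `session_status` are hypothetical, and the sample date is made up. One deliberate difference: this sketch treats empty strings as missing, whereas `_parse_session_datetime` only checks that the keys exist.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def to_iso_utc(date: str, time: str) -> str | None:
    """Join an Ergast-style date ("2025-03-16") and time ("04:00:00Z") into ISO 8601 UTC."""
    if not date or not time:
        return None
    return f"{date}T{time.rstrip('Z')}+00:00"


def session_status(start_utc: str, duration_minutes: int, now: datetime) -> str:
    """Classify a session as 'past', 'live', or 'upcoming' relative to an aware `now`."""
    start = datetime.fromisoformat(start_utc)
    end = start + timedelta(minutes=duration_minutes)
    if now >= end:
        return "past"
    if now >= start:
        return "live"
    return "upcoming"
```

Because every `start_utc` value shares the same fixed-width format and `+00:00` offset, sorting the strings lexically, as `sessions.sort(key=lambda s: s["start_utc"])` does, also sorts them chronologically.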
43
stacks/f1-stream/files/backend/stealth.py
Normal file
@@ -0,0 +1,43 @@
"""Vendored Playwright stealth init script.

Mirror of `stacks/chrome-service/files/stealth.js`. Kept in sync by hand
— update both files together if the JS is changed.
"""

STEALTH_JS = r"""
(() => {
  Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => undefined });
  if (!window.chrome) window.chrome = {};
  window.chrome.runtime = window.chrome.runtime || {};
  Object.defineProperty(navigator, 'plugins', {
    get: () => [{ name: 'Chrome PDF Plugin' }, { name: 'Chrome PDF Viewer' }, { name: 'Native Client' }],
  });
  Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
  const origQuery = window.navigator.permissions && window.navigator.permissions.query;
  if (origQuery) {
    window.navigator.permissions.query = (parameters) =>
      parameters && parameters.name === 'notifications'
        ? Promise.resolve({ state: Notification.permission })
        : origQuery(parameters);
  }
  const spoofGl = (proto) => {
    if (!proto) return;
    const orig = proto.getParameter;
    proto.getParameter = function (parameter) {
      if (parameter === 37445) return 'Intel Inc.';
      if (parameter === 37446) return 'Intel Iris OpenGL Engine';
      return orig.apply(this, arguments);
    };
  };
  spoofGl(window.WebGLRenderingContext && window.WebGLRenderingContext.prototype);
  spoofGl(window.WebGL2RenderingContext && window.WebGL2RenderingContext.prototype);
  // disable-devtool.js auto-init evasion: hide the marker attribute so the
  // library's IIFE exits early. Without this, hmembeds-class players redirect
  // to google.com when the Performance detector trips under Playwright.
  const origQS = Document.prototype.querySelector;
  Document.prototype.querySelector = function (sel) {
    if (typeof sel === 'string' && sel.indexOf('disable-devtool-auto') !== -1) return null;
    return origQS.apply(this, arguments);
  };
})();
"""
362
stacks/f1-stream/files/backend/token_refresh.py
Normal file
@@ -0,0 +1,362 @@
"""Token refresh manager - keeps CDN tokens fresh for active streams.

CDN tokens embedded in stream URLs expire after 5-30 minutes. During a 2+ hour
F1 session, URLs must be refreshed before they expire. This manager periodically
re-runs the extractor that found each active stream to get a fresh URL with a
new CDN token.

Usage:
    1. When a user starts watching, call mark_stream_active(url, site_key)
    2. The background scheduler calls refresh_active_streams() every 4 minutes
    3. The proxy calls get_fresh_url(url) to resolve the latest URL
    4. When the user stops watching, call mark_stream_inactive(url)
"""

import logging
from dataclasses import dataclass
from datetime import datetime, timezone

logger = logging.getLogger(__name__)


@dataclass
class ActiveStream:
    """Tracks a stream that a user is currently watching.

    The original_url is the URL the user initially activated. After a token
    refresh, current_url may differ (new CDN token, different edge server, etc.)
    but the original_url remains the key for lookups.
    """

    original_url: str
    current_url: str  # May differ from original after refresh
    site_key: str
    last_refreshed: str
    refresh_count: int = 0
    last_error: str = ""

    def to_dict(self) -> dict:
        """Serialize to a plain dictionary for JSON responses."""
        return {
            "original_url": self.original_url,
            "current_url": self.current_url,
            "site_key": self.site_key,
            "last_refreshed": self.last_refreshed,
            "refresh_count": self.refresh_count,
            "last_error": self.last_error,
        }


class TokenRefreshManager:
    """Manages background token refresh for active streams.

    When a user is watching a stream, the manager periodically re-runs
    the extractor that found it to get a fresh URL with a new token.
    The fresh URL is stored so the /proxy endpoint can use it on the
    next playlist fetch.
    """

    def __init__(self, extraction_service) -> None:
        """Initialize the token refresh manager.

        Args:
            extraction_service: The ExtractionService instance used to
                re-run extractors and look up streams by site_key.
        """
        # Import here to avoid circular imports at module level
        from backend.extractors.service import ExtractionService

        self._extraction_service: ExtractionService = extraction_service
        self._active_streams: dict[str, ActiveStream] = {}
        self._refresh_interval = 240  # 4 minutes (safe margin for 5-min tokens)

    @property
    def refresh_interval(self) -> int:
        """Refresh interval in seconds."""
        return self._refresh_interval

    @property
    def has_active_streams(self) -> bool:
        """Whether there are any active streams being watched."""
        return len(self._active_streams) > 0

    def mark_stream_active(self, url: str, site_key: str) -> None:
        """Mark a stream as being actively watched.

        If the stream is already active, this is a no-op (idempotent).

        Args:
            url: The stream URL the user is watching.
            site_key: The extractor site_key that found this stream.
        """
        if url in self._active_streams:
            logger.debug("Stream already active: %s", url)
            return

        now = datetime.now(timezone.utc).isoformat()
        self._active_streams[url] = ActiveStream(
            original_url=url,
            current_url=url,
            site_key=site_key,
            last_refreshed=now,
        )
        logger.info(
            "Stream marked active: %s (site_key=%s, total_active=%d)",
            url,
            site_key,
            len(self._active_streams),
        )

    def mark_stream_inactive(self, url: str) -> None:
        """Mark a stream as no longer watched.

        If the stream is not active, this is a no-op.

        Args:
            url: The original stream URL to deactivate.
        """
        removed = self._active_streams.pop(url, None)
        if removed:
            logger.info(
                "Stream marked inactive: %s (was refreshed %d times, total_active=%d)",
                url,
                removed.refresh_count,
                len(self._active_streams),
            )
        else:
            logger.debug("Stream was not active, nothing to deactivate: %s", url)

    async def refresh_active_streams(self) -> None:
        """Re-run extractors for all active streams to get fresh URLs.

        For each active stream, re-runs the extractor that originally found it
        and tries to match the stream in the new results. If a match is found,
        updates the current_url. If not, the previous URL is kept (it may still
        work until its token expires).

        This method is called by the background scheduler every 4 minutes.
        Token refresh failures are logged but never crash the process.
        """
        if not self._active_streams:
            logger.debug("No active streams to refresh")
            return

        logger.info(
            "Refreshing tokens for %d active stream(s)...",
            len(self._active_streams),
        )

        # Group active streams by site_key to avoid re-running the same
        # extractor multiple times
        streams_by_site: dict[str, list[ActiveStream]] = {}
        for stream in self._active_streams.values():
            streams_by_site.setdefault(stream.site_key, []).append(stream)

        now = datetime.now(timezone.utc).isoformat()

        for site_key, active_list in streams_by_site.items():
            try:
                await self._refresh_site(site_key, active_list, now)
            except Exception:
                logger.exception(
                    "Failed to refresh tokens for site_key=%s", site_key
                )
                # Mark the error on all streams from this site
                for stream in active_list:
                    stream.last_error = f"Refresh failed at {now}"

    async def _refresh_site(
        self, site_key: str, active_list: list[ActiveStream], now: str
    ) -> None:
        """Re-run a single extractor and update active streams from its results.

        Args:
            site_key: The extractor's site_key.
            active_list: List of ActiveStream objects from this extractor.
            now: ISO timestamp for this refresh cycle.
        """
        registry = self._extraction_service._registry
        extractor = registry.get(site_key)

        if extractor is None:
            logger.warning(
                "Extractor '%s' not found in registry, skipping refresh",
                site_key,
            )
            for stream in active_list:
                stream.last_error = f"Extractor '{site_key}' not found"
            return

        logger.info(
            "Re-running extractor '%s' for token refresh (%d active stream(s))",
            site_key,
            len(active_list),
        )

        # Re-run the extractor to get fresh URLs
        try:
            fresh_streams = await extractor.extract()
        except Exception as e:
            logger.error(
                "Extractor '%s' failed during token refresh: %s", site_key, e
            )
            for stream in active_list:
                stream.last_error = f"Extraction failed: {e}"
            return

        if not fresh_streams:
            logger.warning(
                "Extractor '%s' returned no streams during token refresh",
                site_key,
            )
            for stream in active_list:
                stream.last_error = "Extractor returned no streams"
            return

        # Build a lookup of fresh URLs by quality+title for matching
        # Since the URL itself changes (new token), we match by metadata
        fresh_by_key: dict[str, str] = {}
        for fs in fresh_streams:
            # Use quality+title as a matching key (these stay the same across refreshes)
            match_key = f"{fs.quality}|{fs.title}"
            fresh_by_key[match_key] = fs.url

        # Also keep all fresh URLs for fallback matching
        all_fresh_urls = [fs.url for fs in fresh_streams]

        for stream in active_list:
            # Try to find the matching stream in fresh results
            # Strategy 1: Match by quality+title
            match_key = self._build_match_key(stream)
            if match_key and match_key in fresh_by_key:
                new_url = fresh_by_key[match_key]
                if new_url != stream.current_url:
                    logger.info(
                        "Token refreshed for stream (quality+title match): %s -> %s",
                        stream.current_url[:80],
                        new_url[:80],
                    )
                stream.current_url = new_url
                stream.last_refreshed = now
                stream.refresh_count += 1
                stream.last_error = ""
                continue

            # Strategy 2: Match by URL path similarity (ignoring query params / tokens)
            matched_url = self._find_url_by_path(stream.current_url, all_fresh_urls)
            if matched_url:
                if matched_url != stream.current_url:
                    logger.info(
                        "Token refreshed for stream (path match): %s -> %s",
                        stream.current_url[:80],
                        matched_url[:80],
                    )
                stream.current_url = matched_url
                stream.last_refreshed = now
                stream.refresh_count += 1
                stream.last_error = ""
                continue

            # Strategy 3: If only one fresh stream, assume it's the same
            if len(all_fresh_urls) == 1:
                new_url = all_fresh_urls[0]
                if new_url != stream.current_url:
                    logger.info(
                        "Token refreshed for stream (single result fallback): %s -> %s",
                        stream.current_url[:80],
                        new_url[:80],
                    )
                stream.current_url = new_url
                stream.last_refreshed = now
                stream.refresh_count += 1
                stream.last_error = ""
                continue

            # No match found - keep the old URL and log
            logger.warning(
                "Could not match active stream to fresh results: %s",
                stream.original_url[:80],
            )
            stream.last_error = "No matching stream in fresh results"

    def _build_match_key(self, stream: ActiveStream) -> str:
        """Build a match key from cached stream metadata.

        Looks up the stream in the extraction service cache to get
        quality and title metadata for matching.

        Returns:
            A match key string, or empty string if metadata not found.
        """
        # Look up the stream in the extraction cache
        cached_streams = self._extraction_service._cache.get(stream.site_key, [])
        for cs in cached_streams:
            if cs.url == stream.current_url or cs.url == stream.original_url:
                return f"{cs.quality}|{cs.title}"
        return ""

    @staticmethod
    def _find_url_by_path(current_url: str, fresh_urls: list[str]) -> str | None:
        """Find a fresh URL that matches the current URL by path (ignoring query params).

        CDN token refreshes typically change query parameters but keep the
        same path structure. This matcher strips query params and compares
        the path component.

        Args:
            current_url: The current (possibly expired) URL.
            fresh_urls: List of fresh URLs to match against.

        Returns:
            The matching fresh URL, or None if no match.
        """
        from urllib.parse import urlparse

        current_parsed = urlparse(current_url)
        current_path = current_parsed.path

        for fresh_url in fresh_urls:
            fresh_parsed = urlparse(fresh_url)
            # Match on host + path (token is typically in query string)
            if (
                fresh_parsed.netloc == current_parsed.netloc
                and fresh_parsed.path == current_path
            ):
                return fresh_url

        return None

    def get_fresh_url(self, original_url: str) -> str:
        """Get the latest URL for a stream (may have changed due to token refresh).

        If the stream is not active or has not been refreshed, returns the
        original URL unchanged.

        Args:
            original_url: The URL to look up (can be the original or any
                previous current_url).

        Returns:
            The most recent URL for this stream.
        """
        # Direct lookup by original URL
        stream = self._active_streams.get(original_url)
        if stream:
            return stream.current_url

        # Also check if the URL matches any current_url (in case the caller
        # is using an intermediate refreshed URL)
        for stream in self._active_streams.values():
            if stream.current_url == original_url:
                return stream.current_url

        # Not an active stream - return as-is
        return original_url

    def get_active_streams(self) -> list[dict]:
        """Return all active streams with their refresh status.

        Returns:
            List of serialized ActiveStream dicts.
        """
        return [stream.to_dict() for stream in self._active_streams.values()]
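`_refresh_site` tries three ways to map an active stream onto the extractor's fresh results, in this order: identical quality+title metadata, the same host and path with only the query string (the token) changed, and finally the lone result when exactly one comes back. The sketch below condenses that cascade into one pure function over plain tuples so it can be exercised without a live extractor. `FreshStream` and `pick_fresh_url` are hypothetical names, and the metadata key is passed in directly rather than looked up in the extraction cache as `_build_match_key` does.

```python
from __future__ import annotations

from typing import NamedTuple
from urllib.parse import urlparse


class FreshStream(NamedTuple):
    """Hypothetical minimal view of one extractor result."""

    url: str
    quality: str
    title: str


def pick_fresh_url(current_url: str, meta_key: str, fresh: list[FreshStream]) -> str | None:
    """Return the refreshed URL for one active stream, or None if nothing matches."""
    # Strategy 1: identical "quality|title" key (later duplicates win, as in the dict build above).
    by_key = {f"{fs.quality}|{fs.title}": fs.url for fs in fresh}
    if meta_key and meta_key in by_key:
        return by_key[meta_key]

    # Strategy 2: same host and path; only the query string (token) differs.
    cur = urlparse(current_url)
    for fs in fresh:
        parsed = urlparse(fs.url)
        if parsed.netloc == cur.netloc and parsed.path == cur.path:
            return fs.url

    # Strategy 3: a single fresh result is assumed to be the same stream.
    if len(fresh) == 1:
        return fresh[0].url

    return None
```

The ordering matters. The metadata match survives an edge-server change, the path match survives a pure token rotation, and the single-result fallback is the weakest guess, so it only runs when the first two have failed.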
3
stacks/f1-stream/files/frontend/.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
node_modules/
build/
.svelte-kit/
2140
stacks/f1-stream/files/frontend/package-lock.json
generated
Normal file
File diff suppressed because it is too large
23
stacks/f1-stream/files/frontend/package.json
Normal file
@@ -0,0 +1,23 @@
{
  "name": "f1-stream-frontend",
  "version": "1.0.0",
  "private": true,
  "type": "module",
  "scripts": {
    "dev": "vite dev",
    "build": "vite build",
    "preview": "vite preview"
  },
  "devDependencies": {
    "@sveltejs/adapter-static": "^3.0.0",
    "@sveltejs/kit": "^2.0.0",
    "@sveltejs/vite-plugin-svelte": "^5.0.0",
    "@tailwindcss/vite": "^4.0.0",
    "svelte": "^5.0.0",
    "tailwindcss": "^4.0.0",
    "vite": "^6.0.0"
  },
  "dependencies": {
    "hls.js": "^1.5.0"
  }
}
35
stacks/f1-stream/files/frontend/src/app.css
Normal file
@@ -0,0 +1,35 @@
@import "tailwindcss";

@theme {
  --color-f1-red: #e10600;
  --color-f1-red-dark: #b50500;
  --color-f1-bg: #111111;
  --color-f1-surface: #1a1a1a;
  --color-f1-surface-hover: #242424;
  --color-f1-border: #2a2a2a;
  --color-f1-text: #e0e0e0;
  --color-f1-text-muted: #888888;
}

body {
  background-color: var(--color-f1-bg);
  color: var(--color-f1-text);
  font-family: system-ui, -apple-system, sans-serif;
}

/* Scrollbar styling */
::-webkit-scrollbar {
  width: 6px;
}
::-webkit-scrollbar-track {
  background: var(--color-f1-bg);
}
::-webkit-scrollbar-thumb {
  background: var(--color-f1-border);
  border-radius: 3px;
}

/* HLS video player */
video::-webkit-media-controls {
  display: none !important;
}
13
stacks/f1-stream/files/frontend/src/app.html
Normal file
@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>🏎</text></svg>" />
    <title>F1 Stream</title>
    %sveltekit.head%
  </head>
  <body data-sveltekit-preload-data="hover">
    <div style="display: contents">%sveltekit.body%</div>
  </body>
</html>
88
stacks/f1-stream/files/frontend/src/lib/api.js
Normal file
88
stacks/f1-stream/files/frontend/src/lib/api.js
Normal file
|
|
@ -0,0 +1,88 @@
|
|||
/**
|
||||
* API client for the F1 Streams backend.
|
||||
* All endpoints are on the same origin, so no CORS issues.
|
||||
*/
|
||||
|
||||
const API_BASE = '';
|
||||
|
||||
/**
|
||||
* Fetch the F1 race schedule with session statuses.
|
||||
* @returns {Promise<{season: string, fetched_at: string, races: Array}>}
|
||||
*/
|
||||
export async function fetchSchedule() {
|
||||
const res = await fetch(`${API_BASE}/schedule`);
|
||||
if (!res.ok) throw new Error(`Schedule fetch failed: ${res.status}`);
|
||||
return res.json();
|
||||
}
|
||||
|
||||
/**
|
||||
* Fetch available live streams.
|
||||
* @returns {Promise<{streams: Array, count: number}>}
|
||||
*/
|
||||
export async function fetchStreams() {
|
||||
const res = await fetch(`${API_BASE}/streams`);
|
||||
if (!res.ok) throw new Error(`Streams fetch failed: ${res.status}`);
|
||||
return res.json();
|
||||
}
|
||||
|
||||
/**
|
||||
* Encode a URL to base64url for the proxy endpoint.
 * @param {string} rawUrl - The original m3u8 URL
 * @returns {string} base64url-encoded string
 */
function toBase64Url(rawUrl) {
  return btoa(rawUrl).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

/**
 * Get the proxied m3u8 URL for HLS playback.
 * @param {string} m3u8Url - The original m3u8 URL
 * @returns {string} The proxy URL
 */
export function getProxyUrl(m3u8Url) {
  const encoded = toBase64Url(m3u8Url);
  return `${API_BASE}/proxy?url=${encoded}`;
}

/**
 * Get the embed-proxy URL for an upstream iframe embed page.
 *
 * The proxy strips X-Frame-Options / CSP frame-ancestors and injects a
 * frame-buster-defeat script so the embed renders inside our iframe even
 * when the upstream tries to block it.
 * @param {string} embedUrl - The original embed page URL
 * @returns {string} URL pointing at our /embed proxy
 */
export function getEmbedProxyUrl(embedUrl) {
  const encoded = toBase64Url(embedUrl);
  return `${API_BASE}/embed?url=${encoded}`;
}

/**
 * Mark a stream as actively being watched (enables token refresh).
 * @param {string} url - The stream URL
 * @param {string} [siteKey] - Optional site key
 */
export async function activateStream(url, siteKey = '') {
  const res = await fetch(`${API_BASE}/streams/activate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, site_key: siteKey })
  });
  if (!res.ok) throw new Error(`Activate failed: ${res.status}`);
  return res.json();
}

/**
 * Mark a stream as no longer being watched.
 * @param {string} url - The stream URL
 */
export async function deactivateStream(url) {
  const res = await fetch(`${API_BASE}/streams/deactivate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url })
  });
  if (!res.ok) throw new Error(`Deactivate failed: ${res.status}`);
  return res.json();
}
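`getProxyUrl` and `getEmbedProxyUrl` hand the upstream URL to the backend as a base64url query parameter. The backend decoder is not part of this diff; the sketch below assumes it simply reverses `toBase64Url`, and `fromBase64Url` is a hypothetical helper written only to show the round trip and the `=` padding that `toBase64Url` strips. Note that `btoa` only accepts Latin-1 strings, so a URL with raw non-ASCII characters would throw before reaching the proxy.

```javascript
// Same transform as toBase64Url in api.js: standard base64, URL-safe alphabet, no padding.
function toBase64Url(rawUrl) {
  return btoa(rawUrl).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Hypothetical inverse (not part of this commit): restore the standard
// alphabet and the stripped '=' padding, then decode.
function fromBase64Url(encoded) {
  const b64 = encoded.replace(/-/g, '+').replace(/_/g, '/');
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4);
  return atob(padded);
}

const url = 'https://example.com/live/index.m3u8?token=a+b/c';
const encoded = toBase64Url(url);
console.log(/^[A-Za-z0-9_-]*$/.test(encoded)); // true: safe inside ?url=...
console.log(fromBase64Url(encoded) === url);   // true
```

Dropping the padding keeps the query value free of `=`, at the cost of the decoder having to re-derive it from the string length.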
13
stacks/f1-stream/files/frontend/src/lib/stores.js
Normal file
@@ -0,0 +1,13 @@
import { writable } from 'svelte/store';

/** Schedule data store */
export const schedule = writable(null);

/** Streams data store */
export const streams = writable(null);

/** Loading state */
export const loading = writable(false);

/** Error state */
export const error = writable(null);
3
stacks/f1-stream/files/frontend/src/routes/+layout.js
Normal file
@@ -0,0 +1,3 @@
export const prerender = true;
export const ssr = false;
export const trailingSlash = 'always';
28
stacks/f1-stream/files/frontend/src/routes/+layout.svelte
Normal file
@@ -0,0 +1,28 @@
<script>
  import '../app.css';

  let { children } = $props();
</script>

<div class="min-h-screen flex flex-col">
  <header class="border-b border-f1-border bg-f1-surface">
    <nav class="max-w-6xl mx-auto px-4 py-3 flex items-center gap-6">
      <a href="/" class="flex items-center gap-2 text-lg font-bold text-white hover:text-f1-red transition-colors">
        <span class="text-f1-red font-black text-xl">F1</span>
        <span>Stream</span>
      </a>
      <div class="flex gap-4 text-sm">
        <a href="/" class="text-f1-text-muted hover:text-white transition-colors">Schedule</a>
        <a href="/watch" class="text-f1-text-muted hover:text-white transition-colors">Watch</a>
      </div>
    </nav>
  </header>

  <main class="flex-1">
    {@render children()}
  </main>

  <footer class="border-t border-f1-border py-3 text-center text-xs text-f1-text-muted">
    F1 Stream
  </footer>
</div>
232
stacks/f1-stream/files/frontend/src/routes/+page.svelte
Normal file
@@ -0,0 +1,232 @@
<script>
  import { fetchSchedule } from '$lib/api.js';
  import { onMount } from 'svelte';

  let scheduleData = $state(null);
  let loading = $state(true);
  let errorMsg = $state(null);
  let now = $state(new Date());

  // Update "now" every 30 seconds for live countdown
  let timer;
  onMount(() => {
    loadSchedule();
    timer = setInterval(() => { now = new Date(); }, 30000);
    return () => clearInterval(timer);
  });

  async function loadSchedule() {
    loading = true;
    errorMsg = null;
    try {
      scheduleData = await fetchSchedule();
    } catch (e) {
      errorMsg = e.message;
    } finally {
      loading = false;
    }
  }

  /**
   * Find the next upcoming session across all races.
   */
  let nextSession = $derived.by(() => {
    if (!scheduleData?.races) return null;
    for (const race of scheduleData.races) {
      for (const session of race.sessions) {
        if (session.status === 'upcoming') {
          return { race, session };
        }
        if (session.status === 'live') {
          return { race, session };
        }
      }
    }
    return null;
  });

  /**
   * Format an ISO date string to the user's local timezone.
   */
  function formatLocalTime(isoStr) {
    const d = new Date(isoStr);
    return d.toLocaleString(undefined, {
      weekday: 'short',
      month: 'short',
      day: 'numeric',
      hour: '2-digit',
      minute: '2-digit'
    });
  }

  /**
   * Format a short date (day + month).
   */
  function formatShortDate(isoStr) {
    const d = new Date(isoStr);
    return d.toLocaleDateString(undefined, { month: 'short', day: 'numeric' });
  }

  /**
   * Format a time only.
   */
  function formatTime(isoStr) {
    const d = new Date(isoStr);
    return d.toLocaleTimeString(undefined, { hour: '2-digit', minute: '2-digit' });
  }

  /**
   * Compute countdown string to a future ISO date.
   */
  function countdown(isoStr) {
    const target = new Date(isoStr);
    const diff = target - now;
    if (diff <= 0) return 'Now';

    const days = Math.floor(diff / (1000 * 60 * 60 * 24));
    const hours = Math.floor((diff % (1000 * 60 * 60 * 24)) / (1000 * 60 * 60));
    const mins = Math.floor((diff % (1000 * 60 * 60)) / (1000 * 60));

    if (days > 0) return `${days}d ${hours}h ${mins}m`;
    if (hours > 0) return `${hours}h ${mins}m`;
    return `${mins}m`;
  }

  /**
   * Get status badge classes.
   */
  function statusClasses(status) {
    switch (status) {
      case 'live': return 'bg-f1-red text-white';
      case 'upcoming': return 'bg-blue-600 text-white';
      case 'past': return 'bg-neutral-700 text-neutral-400';
      default: return 'bg-neutral-700 text-neutral-400';
    }
  }

  /**
   * Determine if a race has any live or upcoming sessions (to highlight it).
   */
  function raceIsActive(race) {
    return race.sessions.some(s => s.status === 'live' || s.status === 'upcoming');
  }

  /**
   * Determine if a race is entirely in the past.
   */
  function raceIsPast(race) {
    return race.sessions.every(s => s.status === 'past');
  }
</script>

<svelte:head>
  <title>F1 Stream - Schedule</title>
</svelte:head>

<div class="max-w-6xl mx-auto px-4 py-6">
  {#if loading}
    <div class="flex items-center justify-center py-20">
      <div class="w-8 h-8 border-2 border-f1-red border-t-transparent rounded-full animate-spin"></div>
      <span class="ml-3 text-f1-text-muted">Loading schedule...</span>
    </div>
  {:else if errorMsg}
    <div class="bg-red-900/30 border border-red-700 rounded-lg p-4 text-center">
      <p class="text-red-300">Failed to load schedule: {errorMsg}</p>
      <button onclick={loadSchedule} class="mt-2 px-4 py-1 bg-f1-red text-white rounded text-sm hover:bg-f1-red-dark transition-colors">
        Retry
      </button>
    </div>
  {:else if scheduleData}
    <!-- Next Session Countdown -->
    {#if nextSession}
      <div class="mb-8 bg-f1-surface border border-f1-border rounded-lg p-6">
        <div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2">
          <div>
            <p class="text-f1-text-muted text-sm uppercase tracking-wider">
              {nextSession.session.status === 'live' ? 'Live Now' : 'Next Session'}
            </p>
            <h2 class="text-xl font-bold text-white mt-1">
              {nextSession.race.race_name} - {nextSession.session.name}
            </h2>
            <p class="text-f1-text-muted text-sm mt-1">
              {nextSession.race.circuit} · {nextSession.race.country}
            </p>
          </div>
          <div class="text-right">
            {#if nextSession.session.status === 'live'}
              <a href="/watch" class="inline-flex items-center gap-2 px-5 py-2 bg-f1-red text-white font-semibold rounded-lg hover:bg-f1-red-dark transition-colors">
                <span class="w-2 h-2 rounded-full bg-white animate-pulse"></span>
                Watch Live
              </a>
            {:else}
              <p class="text-2xl font-mono font-bold text-white">{countdown(nextSession.session.start_utc)}</p>
              <p class="text-f1-text-muted text-sm">{formatLocalTime(nextSession.session.start_utc)}</p>
            {/if}
          </div>
        </div>
      </div>
    {/if}

    <!-- Season Header -->
    <div class="flex items-center justify-between mb-6">
      <h1 class="text-2xl font-bold text-white">{scheduleData.season} Season</h1>
      <span class="text-xs text-f1-text-muted">{scheduleData.races.length} races</span>
    </div>

    <!-- Race List -->
    <div class="space-y-4">
      {#each scheduleData.races as race (race.round)}
        {@const isPast = raceIsPast(race)}
        <div class="bg-f1-surface border border-f1-border rounded-lg overflow-hidden {isPast ? 'opacity-50' : ''}">
          <!-- Race Header -->
          <div class="px-4 py-3 flex items-center justify-between">
            <div class="flex items-center gap-3">
              <span class="text-f1-text-muted text-sm font-mono w-8">R{race.round}</span>
              <div>
                <h3 class="font-semibold text-white">{race.race_name}</h3>
                <p class="text-xs text-f1-text-muted">{race.circuit} · {race.locality}, {race.country}</p>
              </div>
            </div>
            <span class="text-sm text-f1-text-muted">{formatShortDate(race.date)}</span>
          </div>

          <!-- Sessions -->
          <div class="border-t border-f1-border">
            <div class="grid grid-cols-1 sm:grid-cols-2 md:grid-cols-3 lg:grid-cols-4 gap-px bg-f1-border">
              {#each race.sessions as session}
                {@const isLive = session.status === 'live'}
                {@const isClickable = isLive}
                <div class="bg-f1-surface px-3 py-2 {isLive ? 'bg-f1-red/10' : ''} {isClickable ? 'hover:bg-f1-surface-hover cursor-pointer' : ''}">
                  {#if isClickable}
                    <a href="/watch?session={session.type}&round={race.round}" class="block">
                      <div class="flex items-center justify-between">
                        <span class="text-sm font-medium text-white">{session.name}</span>
                        <span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded {statusClasses(session.status)}">
                          {session.status}
                        </span>
                      </div>
                      <p class="text-xs text-f1-text-muted mt-0.5">{formatTime(session.start_utc)}</p>
                    </a>
                  {:else}
                    <div class="flex items-center justify-between">
                      <span class="text-sm font-medium {session.status === 'past' ? 'text-f1-text-muted' : 'text-white'}">{session.name}</span>
                      <span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded {statusClasses(session.status)}">
                        {session.status}
                      </span>
                    </div>
                    <p class="text-xs text-f1-text-muted mt-0.5">
                      {formatTime(session.start_utc)}
                      {#if session.status === 'upcoming'}
                        · {countdown(session.start_utc)}
                      {/if}
                    </p>
                  {/if}
                </div>
              {/each}
            </div>
          </div>
        </div>
      {/each}
    </div>
  {/if}
</div>
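The schedule page's countdown badge is plain millisecond arithmetic on the gap between a session's `start_utc` and the component's `now` state, which `onMount` refreshes every 30 seconds. The sketch below restates that arithmetic as a pure function; `countdownFrom` is a hypothetical name, and taking `now` as a parameter (instead of reading Svelte `$state`) is an adaptation so it can run outside a component.

```javascript
const MS_PER_MINUTE = 1000 * 60;
const MS_PER_HOUR = MS_PER_MINUTE * 60;
const MS_PER_DAY = MS_PER_HOUR * 24;

// Same arithmetic as countdown() in +page.svelte, with `now` passed in.
function countdownFrom(isoStr, now) {
  const diff = new Date(isoStr) - now; // Date - Date coerces both to epoch ms
  if (diff <= 0) return 'Now';

  const days = Math.floor(diff / MS_PER_DAY);
  const hours = Math.floor((diff % MS_PER_DAY) / MS_PER_HOUR);
  const mins = Math.floor((diff % MS_PER_HOUR) / MS_PER_MINUTE);

  if (days > 0) return `${days}d ${hours}h ${mins}m`;
  if (hours > 0) return `${hours}h ${mins}m`;
  return `${mins}m`;
}

const now = new Date('2026-03-01T12:00:00Z');
console.log(countdownFrom('2026-03-02T14:03:30Z', now)); // "1d 2h 3m"
console.log(countdownFrom('2026-03-01T12:00:59Z', now)); // "0m"
console.log(countdownFrom('2026-03-01T11:00:00Z', now)); // "Now"
```

Because every unit is floored, seconds are dropped: the last minute before a session starts reads `0m` until the difference reaches zero.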
484
stacks/f1-stream/files/frontend/src/routes/watch/+page.svelte
Normal file
@@ -0,0 +1,484 @@
<script>
  import { fetchStreams, fetchSchedule, getProxyUrl, getEmbedProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
  import { onMount, onDestroy } from 'svelte';
  import { page } from '$app/state';

  // Lazy-load hls.js to code-split it into a separate chunk
  let Hls = $state(null);

  // Query params
  let sessionType = $derived(page.url?.searchParams?.get('session') || '');
  let roundNumber = $derived(page.url?.searchParams?.get('round') || '');

  // State
  let streamsData = $state(null);
  let scheduleData = $state(null);
  let loading = $state(true);
  let errorMsg = $state(null);

  // Multi-stream player state: array of active player slots
  let players = $state([]);
  const MAX_PLAYERS = 4;

  // Current session info from schedule
  let currentRace = $derived.by(() => {
    if (!scheduleData?.races || !roundNumber) return null;
    return scheduleData.races.find(r => r.round === parseInt(roundNumber));
  });

  let currentSession = $derived.by(() => {
    if (!currentRace || !sessionType) return null;
    return currentRace.sessions.find(s => s.type === sessionType);
  });

  // Layout class based on player count
  let layoutClass = $derived.by(() => {
    const count = players.length;
    if (count <= 1) return 'grid-cols-1';
    if (count === 2) return 'grid-cols-2';
    return 'grid-cols-2'; // 3-4 players: 2x2 grid
  });

  onMount(async () => {
    const hlsModule = await import('hls.js');
    Hls = hlsModule.default;
    loadData();
    document.addEventListener('fullscreenchange', onFullscreenChange);
  });

  onDestroy(() => {
    // Clean up all players
    for (const player of players) {
      cleanupPlayer(player);
    }
    if (typeof document !== 'undefined') {
      document.removeEventListener('fullscreenchange', onFullscreenChange);
    }
  });

  async function loadData() {
    loading = true;
    errorMsg = null;
    try {
      const [streamsResult, scheduleResult] = await Promise.all([
        fetchStreams(),
        fetchSchedule()
      ]);
      streamsData = streamsResult;
      scheduleData = scheduleResult;
    } catch (e) {
      errorMsg = e.message;
    } finally {
      loading = false;
    }
  }

  function cleanupPlayer(player) {
    if (player.hls) {
      player.hls.destroy();
      player.hls = null;
    }
    if (player.originalUrl) {
      deactivateStream(player.originalUrl).catch(() => {});
    }
    if (player.controlsTimer) {
      clearTimeout(player.controlsTimer);
    }
  }

  function removePlayer(index) {
    const player = players[index];
    cleanupPlayer(player);
    players = players.filter((_, i) => i !== index);
  }

  function isStreamActive(url) {
    return players.some(p => p.originalUrl === url);
  }

  function playStream(stream) {
    // If already playing this stream, don't add a duplicate
    const streamUrl = stream.stream_type === 'embed' ? stream.embed_url : stream.url;
    if (isStreamActive(streamUrl)) return;

    // If at max players, replace the last one
    if (players.length >= MAX_PLAYERS) {
      removePlayer(players.length - 1);
    }

    if (stream.stream_type === 'embed') {
      // Embed/iframe player — route through our /embed proxy so the
      // upstream's X-Frame-Options / CSP / JS frame-busters can't
      // block the iframe.
      const newPlayer = {
        id: Date.now(),
        proxyUrl: '',
        originalUrl: stream.embed_url,
        embedUrl: getEmbedProxyUrl(stream.embed_url),
        streamType: 'embed',
        siteKey: stream.site_key || '',
        siteName: stream.site_name || stream.site_key || 'Unknown',
        quality: stream.quality || '',
        isPlaying: true,
        isMuted: false,
        volume: 1,
        showControls: true,
        error: null,
        videoEl: null,
        containerEl: null,
        hls: null,
        controlsTimer: null,
      };
      players = [...players, newPlayer];
      return;
    }

    // m3u8 player — use hls.js
    if (!Hls) return;

    const proxyUrl = getProxyUrl(stream.url);
    const newPlayer = {
      id: Date.now(),
      proxyUrl,
      originalUrl: stream.url,
      embedUrl: '',
      streamType: 'm3u8',
      siteKey: stream.site_key || '',
      siteName: stream.site_name || stream.site_key || 'Unknown',
      quality: stream.quality || '',
      isPlaying: false,
      isMuted: false,
      volume: 1,
      showControls: true,
      error: null,
      videoEl: null,
      containerEl: null,
      hls: null,
      controlsTimer: null,
    };

    players = [...players, newPlayer];

    // Activate stream for token refresh
    activateStream(stream.url, stream.site_key || '').catch(() => {});

    // Wait for DOM to update then initialize player
    requestAnimationFrame(() => {
      requestAnimationFrame(() => {
        initPlayer(players.length - 1);
      });
    });
  }

  function initPlayer(index) {
    const player = players[index];
    if (!player || !player.videoEl) return;

    if (Hls.isSupported()) {
      // `lowLatencyMode` previously broke playback on regular (non-LL-HLS)
      // providers like RallyTV — they don't ship the LL-HLS extensions
      // hls.js needs in that mode. Default off; explicit per-stream flag
      // can re-enable later.
      const hlsInstance = new Hls({
        enableWorker: true,
        lowLatencyMode: false,
        backBufferLength: 90
      });

      hlsInstance.loadSource(player.proxyUrl);
      hlsInstance.attachMedia(player.videoEl);

      hlsInstance.on(Hls.Events.MANIFEST_PARSED, () => {
        player.videoEl.play().catch(() => {});
        players[index] = { ...player, isPlaying: true, hls: hlsInstance };
      });

      hlsInstance.on(Hls.Events.ERROR, (event, data) => {
        if (data.fatal) {
          switch (data.type) {
            case Hls.ErrorTypes.NETWORK_ERROR:
              players[index] = { ...players[index], error: `Network error: ${data.details}` };
              hlsInstance.startLoad();
              break;
            case Hls.ErrorTypes.MEDIA_ERROR:
              players[index] = { ...players[index], error: `Media error: ${data.details}` };
              hlsInstance.recoverMediaError();
              break;
            default:
              players[index] = { ...players[index], error: `Fatal error: ${data.details}` };
              removePlayer(index);
              break;
          }
        }
      });

      player.hls = hlsInstance;
    } else if (player.videoEl.canPlayType('application/vnd.apple.mpegurl')) {
      // Native HLS (Safari)
      player.videoEl.src = player.proxyUrl;
      player.videoEl.addEventListener('loadedmetadata', () => {
        player.videoEl.play().catch(() => {});
        players[index] = { ...player, isPlaying: true };
      });
    }
  }

  function togglePlay(index) {
    const player = players[index];
    if (!player?.videoEl) return;
    if (player.videoEl.paused) {
      player.videoEl.play().catch(() => {});
      players[index] = { ...player, isPlaying: true };
    } else {
      player.videoEl.pause();
      players[index] = { ...player, isPlaying: false };
    }
  }

  function toggleMute(index) {
    const player = players[index];
    if (!player?.videoEl) return;
    const newMuted = !player.isMuted;
    player.videoEl.muted = newMuted;
    players[index] = { ...player, isMuted: newMuted };
  }

  function setVolume(index, e) {
    const player = players[index];
    if (!player?.videoEl) return;
    const vol = parseFloat(e.target.value);
    player.videoEl.volume = vol;
    const muted = vol === 0;
    player.videoEl.muted = muted;
    players[index] = { ...player, volume: vol, isMuted: muted };
  }

  function toggleFullscreen(index) {
    const player = players[index];
    if (!player?.containerEl) return;
    if (!document.fullscreenElement) {
      player.containerEl.requestFullscreen().catch(() => {});
    } else {
      document.exitFullscreen().catch(() => {});
    }
  }

  let isFullscreen = $state(false);
  function onFullscreenChange() {
    isFullscreen = !!document.fullscreenElement;
  }

  function onPlayerMouseMove(index) {
    const player = players[index];
    if (!player) return;
    if (player.controlsTimer) clearTimeout(player.controlsTimer);
    players[index] = { ...player, showControls: true };
    const timer = setTimeout(() => {
      if (players[index]?.isPlaying) {
        players[index] = { ...players[index], showControls: false };
      }
    }, 3000);
    players[index] = { ...players[index], controlsTimer: timer };
  }

  function responseTimeColor(ms) {
    if (ms < 500) return 'text-green-400';
    if (ms < 1500) return 'text-yellow-400';
    return 'text-red-400';
  }
</script>

<svelte:head>
  <title>F1 Stream - Watch{currentRace ? ` - ${currentRace.race_name}` : ''}</title>
</svelte:head>

<div class="max-w-7xl mx-auto px-4 py-6">
  <!-- Session Info Header -->
  {#if currentRace && currentSession}
    <div class="mb-6">
      <p class="text-f1-text-muted text-sm uppercase tracking-wider">
        Round {currentRace.round} · {currentSession.name}
      </p>
      <h1 class="text-2xl font-bold text-white">{currentRace.race_name}</h1>
      <p class="text-f1-text-muted text-sm">{currentRace.circuit} · {currentRace.country}</p>
    </div>
  {:else}
    <h1 class="text-2xl font-bold text-white mb-6">Watch</h1>
  {/if}

  <!-- Multi-Stream Players Grid -->
  {#if players.length > 0}
    <div class="grid {layoutClass} gap-2 mb-6">
      {#each players as player, i (player.id)}
        <div
          class="bg-black rounded-lg overflow-hidden relative group"
          bind:this={player.containerEl}
          onmousemove={() => onPlayerMouseMove(i)}
          role="region"
          aria-label="Video player {i + 1}"
        >
          <!-- Stream label -->
          <div class="absolute top-2 left-2 z-10 bg-black/60 rounded px-2 py-0.5 text-xs text-white">
            {player.siteName}{#if player.quality} · {player.quality}{/if}
          </div>

          <!-- Close button -->
          <button
            onclick={() => removePlayer(i)}
            class="absolute top-2 right-2 z-10 bg-black/60 rounded-full w-6 h-6 flex items-center justify-center text-white hover:text-f1-red hover:bg-black/80 transition-colors"
            aria-label="Close stream"
          >
            <svg class="w-3.5 h-3.5" fill="currentColor" viewBox="0 0 24 24"><path d="M19 6.41L17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12z"/></svg>
          </button>

          <!-- Video or Iframe -->
          {#if player.streamType === 'embed'}
            <iframe
              src={player.embedUrl}
              class="w-full aspect-video bg-black"
              allow="autoplay; encrypted-media; fullscreen; picture-in-picture"
              allowfullscreen
              frameborder="0"
              title="{player.siteName} stream"
            ></iframe>
          {:else}
            <video
              bind:this={player.videoEl}
              class="w-full aspect-video bg-black"
              playsinline
            ></video>
          {/if}

          <!-- Controls Overlay -->
          <div class="absolute bottom-0 left-0 right-0 bg-gradient-to-t from-black/80 to-transparent px-3 py-2 transition-opacity duration-300 {player.showControls ? 'opacity-100' : 'opacity-0'}">
            <div class="flex items-center gap-2">
              <button onclick={() => togglePlay(i)} class="text-white hover:text-f1-red transition-colors" aria-label={player.isPlaying ? 'Pause' : 'Play'}>
                {#if player.isPlaying}
                  <svg class="w-5 h-5" fill="currentColor" viewBox="0 0 24 24"><path d="M6 4h4v16H6V4zm8 0h4v16h-4V4z"/></svg>
                {:else}
                  <svg class="w-5 h-5" fill="currentColor" viewBox="0 0 24 24"><path d="M8 5v14l11-7z"/></svg>
                {/if}
              </button>

              <button onclick={() => toggleMute(i)} class="text-white hover:text-f1-red transition-colors" aria-label={player.isMuted ? 'Unmute' : 'Mute'}>
                {#if player.isMuted || player.volume === 0}
                  <svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><path d="M16.5 12c0-1.77-1.02-3.29-2.5-4.03v2.21l2.45 2.45c.03-.2.05-.41.05-.63zm2.5 0c0 .94-.2 1.82-.54 2.64l1.51 1.51C20.63 14.91 21 13.5 21 12c0-4.28-2.99-7.86-7-8.77v2.06c2.89.86 5 3.54 5 6.71zM4.27 3L3 4.27 7.73 9H3v6h4l5 5v-6.73l4.25 4.25c-.67.52-1.42.93-2.25 1.18v2.06c1.38-.31 2.63-.95 3.69-1.81L19.73 21 21 19.73l-9-9L4.27 3zM12 4L9.91 6.09 12 8.18V4z"/></svg>
                {:else}
                  <svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><path d="M3 9v6h4l5 5V4L7 9H3zm13.5 3c0-1.77-1.02-3.29-2.5-4.03v8.05c1.48-.73 2.5-2.25 2.5-4.02z"/></svg>
                {/if}
              </button>
              <input
                type="range" min="0" max="1" step="0.05"
                value={player.volume}
                oninput={(e) => setVolume(i, e)}
                class="w-16 h-1 accent-f1-red"
                aria-label="Volume"
              />

              <div class="flex-1"></div>

              <button onclick={() => toggleFullscreen(i)} class="text-white hover:text-f1-red transition-colors" aria-label="Fullscreen">
                <svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><path d="M7 14H5v5h5v-2H7v-3zm-2-4h2V7h3V5H5v5zm12 7h-3v2h5v-5h-2v3zM14 5v2h3v3h2V5h-5z"/></svg>
              </button>
            </div>
          </div>

          <!-- Error overlay -->
          {#if player.error}
            <div class="absolute bottom-12 left-2 right-2 bg-red-900/80 rounded px-2 py-1 text-xs text-red-300">
              {player.error}
            </div>
          {/if}
        </div>
      {/each}
    </div>
  {/if}

  <!-- Stream List -->
  {#if loading}
    <div class="flex items-center justify-center py-20">
      <div class="w-8 h-8 border-2 border-f1-red border-t-transparent rounded-full animate-spin"></div>
      <span class="ml-3 text-f1-text-muted">Loading streams...</span>
    </div>
  {:else if errorMsg}
    <div class="bg-red-900/30 border border-red-700 rounded-lg p-4 text-center">
      <p class="text-red-300">Failed to load streams: {errorMsg}</p>
      <button onclick={loadData} class="mt-2 px-4 py-1 bg-f1-red text-white rounded text-sm hover:bg-f1-red-dark transition-colors">
        Retry
      </button>
    </div>
  {:else if streamsData}
    <div class="flex items-center justify-between mb-4">
      <h2 class="text-lg font-semibold text-white">
        Available Streams
        <span class="text-f1-text-muted font-normal text-sm ml-2">({streamsData.count})</span>
      </h2>
      <div class="flex items-center gap-4">
        {#if players.length > 0}
          <span class="text-xs text-f1-text-muted">{players.length}/{MAX_PLAYERS} streams active</span>
        {/if}
        <button onclick={loadData} class="text-xs text-f1-text-muted hover:text-white transition-colors uppercase tracking-wider">
          Refresh
        </button>
      </div>
    </div>

    {#if streamsData.streams.length === 0}
      <div class="bg-f1-surface border border-f1-border rounded-lg p-8 text-center">
        <p class="text-f1-text-muted">No streams available right now.</p>
        <p class="text-f1-text-muted text-sm mt-2">Streams appear when a session is live. Check the schedule for upcoming sessions.</p>
        <a href="/" class="inline-block mt-4 px-4 py-2 bg-f1-surface-hover border border-f1-border rounded text-sm text-white hover:border-f1-red transition-colors">
          View Schedule
        </a>
      </div>
    {:else}
      <div class="space-y-2">
        {#each streamsData.streams as stream, i}
          {@const active = isStreamActive(stream.stream_type === 'embed' ? stream.embed_url : stream.url)}
          <div class="bg-f1-surface border rounded-lg px-4 py-3 flex items-center gap-4 {active ? 'border-f1-red' : 'border-f1-border hover:border-f1-border'}">
            <div class="flex-1 min-w-0">
              <div class="flex items-center gap-2">
                <span class="text-sm font-medium text-white truncate">{stream.site_name || stream.site_key || 'Unknown'}</span>
                {#if stream.is_live}
                  <span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded bg-f1-red text-white">Live</span>
                {/if}
                {#if stream.stream_type === 'embed'}
                  <span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded bg-blue-600 text-white">Embed</span>
                {/if}
                {#if active}
                  <span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded bg-green-600 text-white">Playing</span>
                {/if}
              </div>
              <div class="flex items-center gap-3 mt-1 text-xs text-f1-text-muted">
                {#if stream.title}
                  <span class="truncate">{stream.title}</span>
                {/if}
                {#if stream.quality}
                  <span>{stream.quality}</span>
                {/if}
                {#if stream.response_time_ms != null}
                  <span class={responseTimeColor(stream.response_time_ms)}>
                    {stream.response_time_ms}ms
                  </span>
                {/if}
              </div>
            </div>

            <div class="flex items-center gap-2">
              {#if !active}
                <button
                  onclick={() => playStream(stream)}
                  class="px-4 py-1.5 rounded text-sm font-medium bg-f1-red text-white hover:bg-f1-red-dark transition-colors"
                >
                  {players.length > 0 ? 'Add' : 'Watch'}
                </button>
              {:else}
                <span class="text-xs text-green-400">Active</span>
              {/if}
            </div>
          </div>
        {/each}
      </div>
    {/if}
  {/if}
</div>
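`playStream` applies a small admission policy to the `players` array: it skips a stream whose URL (the `embed_url` for embeds, otherwise `url`) is already playing, and when all `MAX_PLAYERS` (4) slots are full it evicts the last slot before appending. The sketch below restates only that policy as a pure function over plain objects; `streamKey`, `admitStream` and the `{ players, evicted }` return shape are hypothetical, and the hls.js setup, `cleanupPlayer` and `activateStream` calls are deliberately left out.

```javascript
const MAX_PLAYERS = 4;

// Same URL choice as playStream(): embeds are keyed by embed_url, m3u8 by url.
function streamKey(stream) {
  return stream.stream_type === 'embed' ? stream.embed_url : stream.url;
}

// Hypothetical pure restatement of playStream()'s slot policy.
// Returns the next players array plus the evicted slot (or null),
// so a caller can run its own cleanup on it.
function admitStream(players, stream) {
  const key = streamKey(stream);
  if (players.some(p => p.originalUrl === key)) {
    return { players, evicted: null }; // already playing: no duplicate
  }
  let next = players;
  let evicted = null;
  if (next.length >= MAX_PLAYERS) {
    evicted = next[next.length - 1]; // replace the most recently added slot
    next = next.slice(0, -1);
  }
  return { players: [...next, { originalUrl: key }], evicted };
}

let state = [];
for (const url of ['a', 'b', 'c', 'd', 'e']) {
  state = admitStream(state, { stream_type: 'm3u8', url }).players;
}
console.log(state.map(p => p.originalUrl)); // [ 'a', 'b', 'c', 'e' ]
```

Returning the evicted slot instead of tearing it down keeps the function side-effect free; in the component, `removePlayer` performs that teardown through `cleanupPlayer` (hls.js `destroy`, `deactivateStream`, timer clear).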
19
stacks/f1-stream/files/frontend/svelte.config.js
Normal file
@@ -0,0 +1,19 @@
import adapter from '@sveltejs/adapter-static';

/** @type {import('@sveltejs/kit').Config} */
const config = {
  kit: {
    adapter: adapter({
      pages: 'build',
      assets: 'build',
      fallback: 'index.html',
      precompress: false,
      strict: true
    }),
    paths: {
      base: ''
    }
  }
};

export default config;
10
stacks/f1-stream/files/frontend/vite.config.js
Normal file
@@ -0,0 +1,10 @@
import { sveltekit } from '@sveltejs/kit/vite';
import tailwindcss from '@tailwindcss/vite';
import { defineConfig } from 'vite';

export default defineConfig({
  plugins: [
    tailwindcss(),
    sveltekit()
  ]
});
7
stacks/f1-stream/files/redeploy.sh
Executable file
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
set -e

docker buildx build --platform linux/amd64 --provenance=false \
  -t viktorbarzin/f1-stream:v2.0.1 -t viktorbarzin/f1-stream:latest \
  --push .
kubectl -n f1-stream rollout restart deployment f1-stream
@@ -6,15 +6,6 @@ variable "nfs_server" { type = string }
 variable "discord_f1_guild_id" { type = string }
 variable "discord_f1_channel_ids" { type = string }
 
-# Image tag for the Forgejo-registry image. CI (.woodpecker.yml in
-# viktor/f1-stream) builds + pushes `latest` and `<short-sha>`, then drives the
-# rollout via `kubectl set image`. Keel stays enrolled as a redundant net, so
-# the running tag is managed outside Terraform (see KEEL_IGNORE_IMAGE below).
-variable "image_tag" {
-  type = string
-  default = "latest"
-}
-
 resource "kubernetes_namespace" "f1-stream" {
   metadata {
     name = "f1-stream"
@@ -22,7 +13,7 @@ resource "kubernetes_namespace" "f1-stream" {
       "istio-injection" : "disabled"
       tier = local.tiers.aux
       "chrome-service.viktorbarzin.me/client" = "true"
-      "keel.sh/enrolled" = "true"
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {
@@ -127,7 +118,7 @@ resource "kubernetes_deployment" "f1-stream" {
       }
       spec {
         container {
-          image = "forgejo.viktorbarzin.me/viktor/f1-stream:${var.image_tag}"
+          image = "viktorbarzin/f1-stream:latest"
           image_pull_policy = "Always"
           name = "f1-stream"
           resources {
@@ -185,11 +176,6 @@ resource "kubernetes_deployment" "f1-stream" {
             claim_name = module.nfs_data_host.claim_name
           }
         }
-        # Pull the (private) Forgejo-registry image. Kyverno syncs
-        # registry-credentials into every namespace.
-        image_pull_secrets {
-          name = "registry-credentials"
-        }
       }
     }
   }