Restore f1-stream stack — undo accidental bundling into 63fe7d2b

Commit 63fe7d2b (fan-control) was made with a bare `git commit` in the
shared infra working tree and inadvertently swept in a parallel session's
staged f1-stream-extraction work (main.tf repoint, ~48 files/ removals,
ci-cd.md + .claude docs, two extraction plan docs).

This returns every f1-stream-related path to its pre-63fe7d2b state
(3493c347) so that extraction can be committed cleanly by its own
session. The fan-control files added in 63fe7d2b are untouched.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-04 21:47:20 +00:00
parent 90ad6b9125
commit 147a8cff40
54 changed files with 9563 additions and 163 deletions

@@ -104,15 +104,13 @@ have `ignore_changes` on `…container[0].image` (KEEL_IGNORE_IMAGE) so CI
`:latest` + `imagePullPolicy: Always` (fresh pod each run) instead of a deploy
step. **Never** `set image`/`rollout restart` operator-managed StatefulSets
(memory id=740). Reference impls: `tuya_bridge/.woodpecker.yml`,
-`job-hunter`, `f1-stream` (viktor/f1-stream, extracted from this monorepo
-2026-06-04). This reverses decision #12 of
+`job-hunter`. This reverses decision #12 of
`docs/plans/2026-05-16-auto-upgrade-apps-design.md` for owned (not upstream)
images.
**Flow (GHA-migrated apps)**: `git push → GHA build+push DockerHub (8-char SHA) → POST Woodpecker API → kubectl set image`
-**Migrated to GHA** (9): Website, k8s-portal, claude-memory-mcp, apple-health-data, audiblez-web, plotting-book, insta2spotify, audiobook-search, council-complaints
-**Woodpecker-native owned-app build** (Forgejo registry, build->deploy in one `.woodpecker.yml`): tuya_bridge, job-hunter, f1-stream (extracted to viktor/f1-stream 2026-06-04; Woodpecker repo id 166)
+**Migrated to GHA** (10): Website, k8s-portal, f1-stream, claude-memory-mcp, apple-health-data, audiblez-web, plotting-book, insta2spotify, audiobook-search, council-complaints
**Woodpecker-only**: travel_blog (1.4GB content too large for GHA), infra pipelines (terragrunt apply, certbot, build-cli — need cluster access)
**Per-project files**:
@@ -121,7 +119,7 @@ images.
- `.woodpecker/build-fallback.yml` — Old full build pipeline preserved (event: `deployment` — never auto-fires)
**Woodpecker API**: Uses **numeric repo IDs** (`/api/repos/2/pipelines`), NOT owner/name paths (those return HTML).
-Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handler=6, audiblez-web=9, plotting-book=43, claude-memory-mcp=78, infra-onboarding=79, council-complaints=TBD (f1-stream's old GHA-era id 10 is defunct; it's now a Woodpecker-native build at repo id 166)
+Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handler=6, audiblez-web=9, f1-stream=10, plotting-book=43, claude-memory-mcp=78, infra-onboarding=79, council-complaints=TBD
**Woodpecker YAML gotchas**:
- Commands with `${VAR}:${VAR}` must be **quoted** — unquoted `:` triggers YAML map parsing when vars are empty

@@ -46,7 +46,7 @@
| nextcloud | File sync/share | nextcloud |
| calibre | E-book management (may be merged into ebooks stack) | calibre |
| onlyoffice | Document editing | onlyoffice |
-| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier); source in own repo `viktor/f1-stream` (extracted 2026-06-04), Woodpecker-native build->deploy | f1-stream |
+| f1-stream | F1 streaming (uses chrome-service for hmembeds verifier) | f1-stream |
| chrome-service | Headed Chromium WebSocket pool (`ws://chrome-service.chrome-service.svc:3000/<token>`) for sibling services driving anti-bot embeds | chrome-service |
| rybbit | Analytics | rybbit |
| isponsorblocktv | SponsorBlock for TV | isponsorblocktv |

@@ -58,9 +58,10 @@ graph LR
### Project Migration Status
-**Migrated to GHA (8 projects)**:
+**Migrated to GHA (9 projects)**:
- Website
- k8s-portal
+- f1-stream
- claude-memory-mcp
- apple-health-data
- audiblez-web
@@ -68,14 +69,6 @@ graph LR
- insta2spotify
- book-search (audiobook-search)
-**Woodpecker-native owned-app builds** (build + push to the Forgejo private
-registry + `kubectl set image` rollout, all in one `.woodpecker.yml`; Keel
-stays enrolled as a redundant net):
-- `tuya_bridge`, `job-hunter`, `f1-stream`
-- `f1-stream` was extracted from this monorepo into its own repo
-(`viktor/f1-stream`) on 2026-06-04; its Woodpecker repo id is 166 (the old
-GHA-era id 10 is defunct).
**Woodpecker-only (infra + large apps)**:
- `travel_blog`: 5.7GB content directory exceeds GHA limits
- Infra pipelines: require cluster access (terragrunt apply, certbot, build-cli)
@@ -99,6 +92,7 @@ Woodpecker API uses numeric IDs (not owner/name):
| travel_blog | 5 |
| webhook-handler | 6 |
| audiblez-web | 9 |
+| f1-stream | 10 |
| plotting-book | 43 |
| claude-memory-mcp | 78 |
| infra-onboarding | 79 |
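
The numeric-ID rule can be illustrated with a minimal sketch. The `REPO_IDS` subset is copied from the table above and the host is the one used in the plan doc's verification commands; the helper name `pipelines_url` is hypothetical and not part of any repo tooling.

```python
# Hypothetical helper: build the Woodpecker pipelines URL from a numeric repo ID.
# Owner/name paths are deliberately not supported, since the docs note that
# those return HTML instead of JSON.

WOODPECKER_BASE = "https://ci.viktorbarzin.me"  # host as shown in the plan doc

# Subset of the repo-ID table above.
REPO_IDS = {
    "travel_blog": 5,
    "webhook-handler": 6,
    "audiblez-web": 9,
    "plotting-book": 43,
    "claude-memory-mcp": 78,
    "infra-onboarding": 79,
}


def pipelines_url(repo: str, base: str = WOODPECKER_BASE) -> str:
    """Return the /api/repos/<id>/pipelines URL for a known repo name."""
    if repo not in REPO_IDS:
        raise KeyError(f"no numeric Woodpecker repo id known for {repo!r}")
    return f"{base}/api/repos/{REPO_IDS[repo]}/pipelines"


if __name__ == "__main__":
    print(pipelines_url("plotting-book"))
```

Failing loudly on an unknown name keeps a caller from silently falling back to an owner/name path.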

@@ -1,78 +0,0 @@
# f1-stream extraction + productionization — design (2026-06-04)
## Problem
`f1-stream` (FastAPI backend serving a SvelteKit SPA; ~15 pluggable stream
extractors + a Playwright/chrome-service playback verifier) lived **inside**
the infra monorepo at `infra/stacks/f1-stream/files/`. It had:
- no standalone repo — source coupled to the Terraform stack;
- **no real CI** — only a manual `redeploy.sh` doing a local `docker buildx`
push to DockerHub (`viktorbarzin/f1-stream`) + `kubectl rollout restart`;
- no README, no tests, a loose unpinned `requirements.txt`, no semver tags;
- a stale CI claim in docs ("migrated to GHA, Woodpecker repo id 10") that did
not match reality (no GHA workflow ever existed for it).
## Goal
Extract the app into its own Forgejo repo `viktor/f1-stream` and productionize
it, mirroring the established owned-app pattern (`tuya_bridge`, `job-hunter`,
`tripit`, `travel-agent`).
## Decisions (with rationale)
- **Registry → Forgejo private** (`forgejo.viktorbarzin.me/viktor/f1-stream`),
matching the fleet standard. Needs the `registry-credentials` pull secret
(Kyverno-synced to every namespace) on the deployment.
- **Packaging → Poetry + ruff + mypy** (replaces the loose pip
`requirements.txt`). Python **package stays `backend`** — imports are
`from backend.x` and the entrypoint is `uvicorn backend.main:app`; renaming
would churn every module + the Dockerfile + the staticfiles path. Python
**3.13 kept** (the live image already runs it; tripit's 3.12 pin is for
zxing-cpp/pymupdf, which f1-stream lacks).
- **Tests → pragmatic pure-logic only**. The extractors + verifier are
network/browser-bound; full coverage is brittle. Unit-test the deterministic
core: `m3u8_rewriter` (incl. the EXT-X tag rewriters), the `proxy` HLS
parsers, `schedule` parsing/status, the extractor `registry`. 63 tests.
- **CI → single `.woodpecker.yml`**: `lint-and-test` (ruff + mypy + pytest on
`python:3.13-slim`) → `build-and-push` (buildx → Forgejo, tags `latest` +
`${CI_COMMIT_SHA:0:8}`) → `deploy` (`kubectl set image` + `rollout status`).
**Keel stays enrolled** as a redundant net. This is the `tuya_bridge`
"build drives the rollout" model + a `travel-agent`-style test gate.
- A Slack-notify step was prototyped but **dropped**: the
`environment: { from_secret }` form is rejected by this Woodpecker
version's pipeline-struct decoder (`yaml: did not find expected key`), and
the canonical owned-app refs (`tuya_bridge`, `job-hunter`) have no Slack
step. Deploy success is confirmed by `rollout status`.
- **Versioning → first git tag `v2.0.1`** (continuity with the existing image
lineage; a fresh `v0.1.0` on a production 2.x app would mislead
monitoring/homepage). Deviates deliberately from the `v0.1.0` precedent of
tripit/travel-agent.
- **Runtime stays root** (matching the prior working image) to avoid a
non-root regression on the `/data` NFS write path and the Playwright browser
cache. Non-root is a possible future hardening.
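
The tagging scheme from the CI decision above (`latest` plus `${CI_COMMIT_SHA:0:8}`) can be sketched as follows. The function name `image_refs` is hypothetical; the real pipeline does this in shell inside `.woodpecker.yml`, so this is only an illustration of the resulting image references.

```python
# Hypothetical illustration of the two tags pushed per build:
# `latest` and the first 8 characters of the commit SHA.

REGISTRY_IMAGE = "forgejo.viktorbarzin.me/viktor/f1-stream"


def image_refs(commit_sha: str) -> list[str]:
    """Return the image references for one build, mirroring ${CI_COMMIT_SHA:0:8}."""
    if len(commit_sha) < 8:
        raise ValueError("commit SHA must have at least 8 characters")
    short = commit_sha[:8]  # same substring the shell expansion takes
    return [f"{REGISTRY_IMAGE}:latest", f"{REGISTRY_IMAGE}:{short}"]


if __name__ == "__main__":
    # Made-up SHA, for demonstration only.
    for ref in image_refs("0123456789abcdef0123456789abcdef01234567"):
        print(ref)
```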
## Terraform delta (the only infra change)
`infra/stacks/f1-stream/main.tf`:
- image `viktorbarzin/f1-stream:latest` (DockerHub) →
`forgejo.viktorbarzin.me/viktor/f1-stream:${var.image_tag}` (new
`var.image_tag`, default `latest`);
- add `image_pull_secrets { name = "registry-credentials" }` to the pod spec;
- delete `files/` (source now lives in the standalone repo) and `redeploy.sh`.
The image field is in the deployment's `ignore_changes` (KEEL_IGNORE_IMAGE), so
the live tag is managed by CI/Keel, not Terraform. Everything else — namespace,
ExternalSecrets (`f1-stream-secrets`, `chrome-service-client-secrets`), NFS data
volume, Anubis PoW policy, `ingress_factory`, homepage + x402 annotations,
Discord + chrome-service env — is unchanged.
## Blast radius
- The `f1-stream` K8s service is the only consumer; no other stack references
`viktorbarzin/f1-stream` or the `files/` dir (verified: no `path.module` /
`archive_file` / `null_resource` references the dir).
- Adding `imagePullSecrets` triggers one Recreate rollout that pulls the
*current* (still-DockerHub, public) image — safe; CI then switches it to the
Forgejo image.

@@ -1,54 +0,0 @@
# f1-stream extraction + productionization — plan (2026-06-04)
Companion to `2026-06-04-f1-stream-extraction-design.md`.
## Steps
1. **Scaffold** `/home/wizard/code/f1-stream/` — copy `backend/`, `frontend/`,
`Dockerfile`, `.dockerignore` from `infra/stacks/f1-stream/files/` by name
(exclude the `.claude/` marker + `redeploy.sh`); add `README.md`,
`.gitignore`. ✅
2. **Poetry conversion** `pyproject.toml` (dist `f1-stream` v2.0.1,
`packages=[{include="backend"}]`, pinned deps), `poetry.lock`, ruff/mypy/
pytest config (E501 per-file-ignored on the embedded-JS/scraper modules).
Rewrite the Dockerfile to a Poetry multi-stage build (Poetry 2.1.3 to match
the lock; python:3.13; keep Chromium libs + `playwright install chromium`;
keep `backend/` + `frontend/build/` siblings under `/app`). ✅
3. **Tests** — 63 pytest unit tests over the pure-logic core. ✅
4. **CI** — single `.woodpecker.yml` (lint+test → buildx push to Forgejo →
`kubectl set image` + rollout). ✅
5. **Create + push** — Forgejo repo `viktor/f1-stream` (private), commit, push
`master`, tag `v2.0.1`. ✅
6. **Enable in Woodpecker** — activate via
`scripts/woodpecker-register-forgejo-repo.sh` (Woodpecker repo id 166);
org-level `forgejo_user`/`forgejo_push_token` secrets apply. ✅
7. **Repoint Terraform** `main.tf` image → Forgejo + `var.image_tag` +
`image_pull_secrets`; `tg apply`. ✅
8. **Untrack from infra** `git rm -r stacks/f1-stream/files`; add
`/f1-stream/` to the monorepo root `.gitignore`. ✅
9. **Docs** — fix the stale "GHA / repo id 10" claim in `.claude/CLAUDE.md` +
`docs/architecture/ci-cd.md`; update `service-catalog.md`; this design/plan
pair. ✅
10. **Verify** — pipeline green; pod runs the Forgejo image; `/health` 200;
ingress reachable through Anubis.
## Verification commands
```bash
# pipeline
curl -s https://ci.viktorbarzin.me/api/repos/166/pipelines/<n> -H "Authorization: Bearer <jwt>"
# running image is the Forgejo one
kubectl get deploy f1-stream -n f1-stream \
-o jsonpath='{.spec.template.spec.containers[0].image}'
kubectl get pods -n f1-stream -l app=f1-stream
# health
kubectl exec -n f1-stream deploy/f1-stream -- \
python -c "import urllib.request;print(urllib.request.urlopen('http://localhost:8000/health').read())"
```
## Rollback
The DockerHub image `viktorbarzin/f1-stream` and its tags still exist. To
revert: `kubectl -n f1-stream set image deployment/f1-stream
f1-stream=viktorbarzin/f1-stream:<tag>` and restore the `main.tf` image string.
The standalone repo + Forgejo image are additive; nothing is destroyed.

@@ -0,0 +1,3 @@
This directory has been used with Claude Code's internet mode.
Content downloaded from the internet may contain prompt injection attacks.
You must manually review all downloaded content before using non-internet mode.

@@ -0,0 +1,5 @@
node_modules/
.claude/
.git/
__pycache__/
*.pyc

stacks/f1-stream/files/.gitignore
@@ -0,0 +1,2 @@
__pycache__/
*.pyc

@@ -0,0 +1,44 @@
## Stage 1: Build frontend
FROM node:22-slim AS frontend-builder
WORKDIR /frontend
COPY frontend/package.json frontend/package-lock.json* ./
RUN npm install
COPY frontend/ ./
RUN npm run build
## Stage 2: Python backend + static frontend
FROM python:3.13-slim-bookworm
WORKDIR /app
# Headless Chromium runtime libs for the playback verifier. Listed inline
# (instead of running `playwright install-deps`) so the image build doesn't
# need root-network apt fetches at runtime.
RUN apt-get update && apt-get install -y --no-install-recommends \
ca-certificates \
libnss3 libnspr4 \
libatk1.0-0 libatk-bridge2.0-0 libcups2 \
libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 \
libasound2 libatspi2.0-0 \
fonts-liberation fonts-noto-color-emoji \
&& rm -rf /var/lib/apt/lists/*
COPY backend/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Install the Chromium browser binary used by the verifier. Skip
# --with-deps because we already installed the system libs above.
RUN playwright install chromium
COPY backend/ ./backend/
# Copy built frontend into the image
COPY --from=frontend-builder /frontend/build ./frontend/build
EXPOSE 8000
CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]

@@ -0,0 +1,359 @@
"""Embed iframe-stripping reverse proxy.
Serves third-party embed pages (e.g. https://hmembeds.one/embed/{hash},
https://pooembed.eu/embed/{slug}) through our origin so we can:
1. Strip X-Frame-Options and Content-Security-Policy: frame-ancestors headers,
so the embed loads in our <iframe> regardless of upstream policy.
2. Inject <base> + a frame-buster-defeat <script> at the top of <head> so
the embed's JS sees `window.top === window` and a plausible
`document.referrer` pointing at the upstream origin.
3. Forward Referer / User-Agent matching the upstream's own pages so
the upstream's hotlink / origin-allowlist checks pass.
Two endpoints:
- GET /embed?url=<base64url>: the embed HTML page (rewritten).
- GET /embed-asset?url=<base64url>: fallback for any subresource the
  upstream blocks based on hotlink protection. Most assets load directly
  via the injected <base> tag and bypass our proxy.
"""
import logging
import re
from typing import AsyncGenerator
from urllib.parse import urlparse
import httpx
from fastapi import HTTPException
from backend.m3u8_rewriter import decode_url
logger = logging.getLogger(__name__)
EMBED_TIMEOUT = 20.0
ASSET_TIMEOUT = 30.0
RELAY_CHUNK_SIZE = 65536
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
# Response headers we never forward (they break frame embedding or leak upstream policy).
STRIP_RESPONSE_HEADERS = {
"x-frame-options",
"content-security-policy",
"content-security-policy-report-only",
"set-cookie",
"report-to",
"nel",
"permissions-policy",
"cross-origin-opener-policy",
"cross-origin-embedder-policy",
"cross-origin-resource-policy",
# let httpx/uvicorn re-set these
"transfer-encoding",
"content-encoding",
"content-length",
"connection",
}
# Inject this <script> at the top of <head> to defeat JS frame-busters.
# - Locks window.top, window.parent, and window.self to the embed window
# itself, so `self !== window.top` checks pass.
# - Forces document.referrer to the upstream origin so allowlist checks
# like `document.referrer.includes("timstreams.net")` keep working.
# - No-ops anything that would call window.parent.location or attempt to
# reload the top frame.
_FRAME_BUSTER_DEFEAT_TEMPLATE = """
<script>(function(){{
try {{
var fakeWindow = window;
Object.defineProperty(window, 'top', {{get: function(){{return fakeWindow;}}, configurable: false}});
Object.defineProperty(window, 'parent', {{get: function(){{return fakeWindow;}}, configurable: false}});
Object.defineProperty(window, 'frameElement', {{get: function(){{return null;}}, configurable: false}});
Object.defineProperty(document, 'referrer', {{get: function(){{return {referrer!r};}}, configurable: false}});
}} catch (e) {{}}
// Defeat the `disable-devtool.js` redirect trap that hmembeds and similar
// embed hosts use. The trap fires `console.clear`/`console.table` in a
// tight loop, then if it thinks DevTools is open, calls
// `window.location = "https://www.google.com"`. We block those redirect
// sinks while leaving normal playback unaffected.
try {{
var noop = function(){{}};
console.clear = noop;
console.table = noop;
console.dir = noop;
var loc = window.location;
Object.defineProperty(window, 'location', {{
get: function(){{ return loc; }},
set: function(v){{ /* swallow assignment */ }},
configurable: false,
}});
var origAssign = loc.assign && loc.assign.bind(loc);
var origReplace = loc.replace && loc.replace.bind(loc);
loc.assign = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origAssign) origAssign(u); }};
loc.replace = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origReplace) origReplace(u); }};
}} catch (e) {{}}
// Route all cross-origin fetch/XHR requests through our /embed-asset
// proxy. The hmembeds player calls a token-binding endpoint
// (hghndasw.gbgdhdffhf.shop/sec/<JWT>) that CORS-rejects requests from
// any origin other than hmembeds.one. By rewriting the URL to
// /embed-asset?url=..., the browser fetches our same-origin endpoint
// (no CORS issue), and our backend fetches the upstream with the
// correct Referer/Origin server-side (no CORS issue there either).
try {{
var b64url = function(s) {{
return btoa(unescape(encodeURIComponent(s)))
.replace(/\\+/g, '-').replace(/\\//g, '_').replace(/=+$/, '');
}};
var sameOrigin = function(u) {{
try {{ return (new URL(u, document.baseURI || location.href)).origin === location.origin; }}
catch (_) {{ return true; }}
}};
var toAbsolute = function(u) {{
try {{ return (new URL(u, document.baseURI || location.href)).toString(); }}
catch (_) {{ return u; }}
}};
var proxify = function(u) {{
var abs = toAbsolute(u);
if (sameOrigin(abs)) return u;
// Don't double-proxy.
if (abs.indexOf('/embed-asset?') !== -1 || abs.indexOf('/embed?') !== -1) return u;
return location.origin + '/embed-asset?url=' + b64url(abs);
}};
var _fetch = window.fetch && window.fetch.bind(window);
if (_fetch) {{
window.fetch = function(input, init) {{
try {{
if (typeof input === 'string') {{
return _fetch(proxify(input), init);
}} else if (input && input.url) {{
var newUrl = proxify(input.url);
if (newUrl !== input.url) {{
return _fetch(new Request(newUrl, input), init);
}}
}}
}} catch (e) {{}}
return _fetch(input, init);
}};
}}
var XHR = window.XMLHttpRequest;
if (XHR && XHR.prototype && XHR.prototype.open) {{
var _open = XHR.prototype.open;
XHR.prototype.open = function(method, url) {{
try {{ url = proxify(url); }} catch (e) {{}}
var args = Array.prototype.slice.call(arguments);
args[1] = url;
return _open.apply(this, args);
}};
}}
}} catch (e) {{}}
}})();</script>
"""
def _decode(encoded_url: str) -> str:
    try:
        return decode_url(encoded_url)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")


def _filter_headers(upstream_headers: httpx.Headers) -> dict[str, str]:
    """Forward upstream headers minus the ones we strip."""
    out: dict[str, str] = {}
    for k, v in upstream_headers.items():
        if k.lower() in STRIP_RESPONSE_HEADERS:
            continue
        out[k] = v
    # Always allow our domain to embed and load cross-origin
    out["Access-Control-Allow-Origin"] = "*"
    out["X-Frame-Options-Stripped"] = "by-f1-embed-proxy"
    return out


def _make_referer(upstream_url: str) -> str:
    """Build a plausible Referer header — the upstream's own root."""
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}/"


def _make_origin(upstream_url: str) -> str:
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}"
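
As a worked example of the Referer/Origin spoofing described above, the sketch below restates the two helpers in standalone form and shows what they produce for a sample embed URL. The public names `make_referer` and `make_origin` and the `example.test` URL are illustrative assumptions, not part of the module.

```python
from urllib.parse import urlparse


def make_referer(upstream_url: str) -> str:
    """Referer is the upstream's own root, with a trailing slash."""
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}/"


def make_origin(upstream_url: str) -> str:
    """Origin is scheme + host (and port, if any), with no trailing slash."""
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}"


if __name__ == "__main__":
    url = "https://hmembeds.one/embed/abc123?autoplay=1"
    print(make_referer(url))
    print(make_origin(url))
```

Path and query never leak into either header; a non-default port is kept because it is part of `netloc`.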
def _inject_into_head(html: str, upstream_url: str) -> str:
    """Inject <base> tag + frame-buster defeat script into the response HTML."""
    parsed = urlparse(upstream_url)
    base_href = f"{parsed.scheme}://{parsed.netloc}/"
    # The frame-buster-defeat script. Use the upstream's own URL as the spoofed referrer.
    busted = _FRAME_BUSTER_DEFEAT_TEMPLATE.format(referrer=upstream_url)
    base_tag = f'<base href="{base_href}">'
    injection = base_tag + busted
    # Drop any inline CSP <meta> tags first so they can't override our header strip.
    html = re.sub(
        r'<meta[^>]+http-equiv=[\'"]?Content-Security-Policy[\'"]?[^>]*>',
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Strip disable-devtool.js script tags. The library runs detection heuristics
    # and redirects on match. Removing it reduces attack surface even with our
    # location-setter lockdown — saves redundant work and one fewer thing to
    # bypass in case the lockdown misses an edge case.
    html = re.sub(
        r'<script[^>]+(?:disable-devtool|devtool|disabledevtool)[^<]*</script>',
        "",
        html,
        flags=re.IGNORECASE,
    )
    html = re.sub(
        r'<script[^>]+src=["\'][^"\']*disable-devtool[^"\']*["\'][^>]*></script>',
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Insert immediately after the opening <head> (case-insensitive).
    head_match = re.search(r"<head[^>]*>", html, flags=re.IGNORECASE)
    if head_match:
        idx = head_match.end()
        return html[:idx] + injection + html[idx:]
    # No <head> — prepend at the start of the document so the script runs first.
    return injection + html
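
The head-insertion step can be isolated into a small sketch. Note one deliberate difference, which is an assumption of this example rather than the module's behaviour: the pattern here requires whitespace or `>` right after `<head`, because the looser `<head[^>]*>` would also match a `<header ...>` tag in a document that has no `<head>`. The name `inject_after_head` is hypothetical.

```python
import re

# Stricter variant of the head matcher: `<head>` or `<head attr=...>`,
# but not `<header>`.
_HEAD_OPEN = re.compile(r"<head(?:\s[^>]*)?>", re.IGNORECASE)


def inject_after_head(html: str, injection: str) -> str:
    """Insert `injection` right after the opening <head>, else prepend it."""
    match = _HEAD_OPEN.search(html)
    if match:
        idx = match.end()
        return html[:idx] + injection + html[idx:]
    # No <head>: put the injection first so any script in it runs first.
    return injection + html


if __name__ == "__main__":
    base = '<base href="https://example.test/">'
    print(inject_after_head("<html><HEAD lang='en'><title>x</title></head>", base))
    print(inject_after_head("<header>nav</header><p>body</p>", base))
```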
def _looks_blocked_by_anti_bot(content: str) -> bool:
    """Detect Cloudflare-style challenge interstitials in the upstream body."""
    sample = content[:4096].lower()
    markers = (
        "cf-chl-bypass",
        "checking your browser",
        "just a moment",
        "attention required",
        "cf-browser-verification",
    )
    return any(m in sample for m in markers)
async def fetch_embed(encoded_url: str) -> tuple[bytes, dict[str, str], int]:
    """Fetch an upstream embed page, rewrite the HTML, and return the response.

    Returns: (body_bytes, headers_dict, status_code).
    Raises HTTPException on transport errors.
    """
    url = _decode(encoded_url)
    logger.info("Embed-proxying: %s", url)
    upstream_headers = {
        "User-Agent": USER_AGENT,
        "Referer": _make_referer(url),
        "Origin": _make_origin(url),
        "Accept": (
            "text/html,application/xhtml+xml,application/xml;q=0.9,"
            "image/avif,image/webp,*/*;q=0.8"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }
    try:
        async with httpx.AsyncClient(
            timeout=EMBED_TIMEOUT,
            follow_redirects=True,
        ) as client:
            response = await client.get(url, headers=upstream_headers)
    except httpx.TimeoutException:
        raise HTTPException(status_code=504, detail="Upstream embed timeout")
    except httpx.HTTPError as e:
        raise HTTPException(status_code=502, detail=f"Upstream embed error: {e}")
    status_code = response.status_code
    upstream_ct = response.headers.get("content-type", "")
    headers_out = _filter_headers(response.headers)
    body = response.content
    # Detect Cloudflare-style challenge so the frontend can show a clear error.
    if "html" in upstream_ct.lower():
        text = response.text
        if _looks_blocked_by_anti_bot(text):
            logger.warning("Upstream returned anti-bot challenge: %s", url)
            raise HTTPException(
                status_code=502,
                detail="Upstream returned anti-bot challenge — proxy cannot bypass",
            )
        rewritten = _inject_into_head(text, url)
        body = rewritten.encode("utf-8")
        headers_out["Content-Type"] = "text/html; charset=utf-8"
    return body, headers_out, status_code
async def relay_asset(
    encoded_url: str, range_header: str | None
) -> tuple[AsyncGenerator[bytes, None], dict[str, str], int]:
    """Relay an upstream subresource (JS/CSS/image/font) as a chunked stream.

    Used as a fallback when an upstream blocks hotlinked assets via Referer
    or Origin checks. The injected <base> tag handles most of these cases
    by letting the browser hit upstream directly; the relay is only for
    the awkward few that need a proxied origin.
    """
    url = _decode(encoded_url)
    logger.debug("Embed-asset relay: %s", url)
    headers = {
        "User-Agent": USER_AGENT,
        "Referer": _make_referer(url),
        "Origin": _make_origin(url),
        "Accept": "*/*",
    }
    if range_header:
        headers["Range"] = range_header
    client = httpx.AsyncClient(timeout=ASSET_TIMEOUT, follow_redirects=True)
    try:
        response = await client.send(
            client.build_request("GET", url, headers=headers),
            stream=True,
        )
    except httpx.TimeoutException:
        await client.aclose()
        raise HTTPException(status_code=504, detail="Upstream asset timeout")
    except httpx.HTTPError as e:
        await client.aclose()
        raise HTTPException(status_code=502, detail=f"Upstream asset error: {e}")
    if response.status_code >= 400:
        await response.aclose()
        await client.aclose()
        raise HTTPException(
            status_code=502,
            detail=f"Upstream asset returned HTTP {response.status_code}",
        )
    headers_out = _filter_headers(response.headers)

    async def _stream() -> AsyncGenerator[bytes, None]:
        try:
            async for chunk in response.aiter_bytes(chunk_size=RELAY_CHUNK_SIZE):
                yield chunk
        finally:
            await response.aclose()
            await client.aclose()

    return _stream(), headers_out, response.status_code
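
Both endpoints take the upstream URL as a base64url token. The sketch below is a Python counterpart to the `b64url` helper in the injected script (UTF-8 bytes, URL-safe alphabet, trailing `=` stripped). It assumes that `decode_url` in `backend.m3u8_rewriter`, which is not shown here, accepts unpadded URL-safe base64; the function names are hypothetical.

```python
import base64


def b64url_encode(url: str) -> str:
    """Encode a URL the way the injected JS does: urlsafe base64, no padding."""
    raw = base64.urlsafe_b64encode(url.encode("utf-8"))
    return raw.rstrip(b"=").decode("ascii")


def b64url_decode(token: str) -> str:
    """Inverse of b64url_encode; restores the padding before decoding."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def embed_proxy_path(upstream_url: str) -> str:
    """Build the same-origin path for the embed endpoint."""
    return "/embed?url=" + b64url_encode(upstream_url)


if __name__ == "__main__":
    print(embed_proxy_path("https://hmembeds.one/embed/abc123"))
```

Stripping the padding keeps `=` out of the query string, which is why the decoder has to add it back.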

@@ -0,0 +1,93 @@
"""Stream extraction framework.
To add a new extractor:
1. Create a new file in this package (e.g., my_site.py)
2. Subclass BaseExtractor from backend.extractors.base
3. Implement site_key, site_name, and extract()
4. Import and register it in this file's create_registry() function
Example:
from backend.extractors.my_site import MySiteExtractor
registry.register(MySiteExtractor())
"""
from backend.extractors.aceztrims import AceztrimsExtractor
from backend.extractors.chrome_browser import ChromeBrowserExtractor
from backend.extractors.curated import CuratedExtractor
from backend.extractors.dd12 import DD12Extractor
from backend.extractors.hmembeds import HmembedsExtractor
from backend.extractors.stremio import StremioAddonExtractor
from backend.extractors.subreddit import SubredditExtractor
from backend.extractors.daddylive import DaddyLiveExtractor
from backend.extractors.discord_source import DiscordExtractor
from backend.extractors.models import ExtractedStream
from backend.extractors.pitsport import PitsportExtractor
from backend.extractors.ppv import PPVExtractor
from backend.extractors.registry import ExtractorRegistry
from backend.extractors.service import ExtractionService
from backend.extractors.streamed import StreamedExtractor
from backend.extractors.timstreams import TimStreamsExtractor
__all__ = [
"ExtractedStream",
"ExtractorRegistry",
"ExtractionService",
"create_registry",
"create_extraction_service",
]
def create_registry() -> ExtractorRegistry:
    """Create and populate the extractor registry with all known extractors.

    Add new extractors here by importing and registering them.
    """
    registry = ExtractorRegistry()
    # --- Register extractors below ---
    # CuratedExtractor previously surfaced two hmembeds 24/7 channels (Sky
    # Sports F1, DAZN F1) but their JW Player decoder produces an empty
    # playlist in our environment (error 102630) regardless of headed mode,
    # IP, or fingerprint we tried. The streams loaded the upstream's ad
    # overlay but never produced a video element, so they confused users —
    # disabled until/unless we find a working bypass.
    # registry.register(CuratedExtractor())
    registry.register(StreamedExtractor())
    # ChromeBrowserExtractor drives the in-cluster chrome-service via the
    # CHROME_WS_URL / CHROME_WS_TOKEN env vars to scrape JS-rendered
    # pages whose m3u8 is computed at runtime.
    registry.register(ChromeBrowserExtractor())
    # SubredditExtractor pulls live-stream posts from motorsport subreddits.
    # Returns embed-type streams; the verifier will visit each via
    # chrome-service to confirm playability.
    registry.register(SubredditExtractor())
    # DD12Extractor scrapes DD12Streams' per-channel pages for the inline
    # JW Player file URL. The site embeds the m3u8 in HTML so curl-based
    # parsing is enough — no browser needed.
    registry.register(DD12Extractor())
    # HmembedsExtractor offline-decodes hmembeds.one JWT m3u8 URLs
    # (base64+XOR with hardcoded key per page; reverse-engineered
    # 2026-05-07). Verifier filters dead origins.
    registry.register(HmembedsExtractor())
    # StremioAddonExtractor calls Stremio addon HTTP APIs (TvVoo, StremVerse)
    # which already index Sky F1 / DAZN F1 / Vavoo IPTV channels. No
    # Stremio client needed — just /stream/<type>/<id>.json calls.
    registry.register(StremioAddonExtractor())
    registry.register(DaddyLiveExtractor())
    registry.register(AceztrimsExtractor())
    registry.register(PitsportExtractor())
    registry.register(PPVExtractor())
    registry.register(TimStreamsExtractor())
    registry.register(DiscordExtractor())
    return registry


def create_extraction_service() -> ExtractionService:
    """Create an ExtractionService with all extractors registered.

    This is the main entry point for the extraction framework.
    Call this once during app startup.
    """
    registry = create_registry()
    return ExtractionService(registry)
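
The "add a new extractor" steps from the module docstring can be sketched end to end. To stay self-contained, the example defines minimal stand-ins for `ExtractedStream`, `BaseExtractor`, and `ExtractorRegistry`; the real classes live under `backend.extractors`, and only `register()` is confirmed by the code above. `MySiteExtractor`, the field defaults, the duplicate-key check, the `keys()` method, and the `example.test` URL are all hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ExtractedStream:  # stand-in for backend.extractors.models.ExtractedStream
    url: str
    site_key: str = ""
    site_name: str = ""
    quality: str = ""
    title: str = ""


class BaseExtractor(ABC):  # stand-in for backend.extractors.base.BaseExtractor
    @property
    @abstractmethod
    def site_key(self) -> str: ...

    @property
    @abstractmethod
    def site_name(self) -> str: ...

    @abstractmethod
    async def extract(self) -> list[ExtractedStream]: ...


class ExtractorRegistry:  # stand-in; internals here are assumptions
    def __init__(self) -> None:
        self._by_key: dict[str, BaseExtractor] = {}

    def register(self, extractor: BaseExtractor) -> None:
        if extractor.site_key in self._by_key:
            raise ValueError(f"duplicate site_key: {extractor.site_key}")
        self._by_key[extractor.site_key] = extractor

    def keys(self) -> list[str]:
        return sorted(self._by_key)


class MySiteExtractor(BaseExtractor):
    """Hypothetical extractor returning one hard-coded stream."""

    @property
    def site_key(self) -> str:
        return "my-site"

    @property
    def site_name(self) -> str:
        return "My Site"

    async def extract(self) -> list[ExtractedStream]:
        return [
            ExtractedStream(
                url="https://example.test/live.m3u8",
                site_key=self.site_key,
                site_name=self.site_name,
                title="Example stream",
            )
        ]


if __name__ == "__main__":
    registry = ExtractorRegistry()
    registry.register(MySiteExtractor())
    print(registry.keys(), asyncio.run(MySiteExtractor().extract()))
```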

@@ -0,0 +1,122 @@
"""Aceztrims extractor — scrapes embed URLs from acestrlms.pages.dev/f11/.
The page (Cloudflare Pages, no anti-bot) hosts an iframe + a strip of
onclick channel-switcher buttons. Each button rewrites the iframe via
`document.getElementById('iframe').src = '<embed_url>'`. The initial
channel is hard-coded as `<iframe id='iframe' src='...'>`.
We strip HTML comments first because the page keeps ~20 legacy channel
buttons inside `<!-- ... -->` blocks for easy re-enablement; the previous
loose regex picked them up as false positives.
All channels are iframe embeds (no direct m3u8), so `stream_type='embed'`.
Site naming note: the extractor key stays `aceztrims` (the previous
domain) so registry/cache identifiers don't churn. The current domain
is `acestrlms.pages.dev` and the F1 path is `/f11/` (two ones; `/f1/`
is the cross-sport schedule page and has no stream buttons).
"""
import logging
import re
import httpx
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
BASE_URL = "https://acestrlms.pages.dev"
F1_PAGES = [
("/f11/", "Formula 1"),
]
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
# `document.getElementById('iframe').src = '<URL>'` — current channel-switcher format.
_ONCLICK_IFRAME_SRC = re.compile(
    r"""document\.getElementById\(['"]iframe['"]\)\.src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
# `<iframe id='iframe' src='<URL>'>` — the default/initial channel.
_DEFAULT_IFRAME = re.compile(
    r"""<iframe[^>]*id\s*=\s*['"]iframe['"][^>]*src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


class AceztrimsExtractor(BaseExtractor):
    """Pulls iframe embed URLs out of the acestrlms.pages.dev F1 page."""

    @property
    def site_key(self) -> str:
        return "aceztrims"

    @property
    def site_name(self) -> str:
        return "Aceztrims"

    async def extract(self) -> list[ExtractedStream]:
        streams: list[ExtractedStream] = []
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            for path, category in F1_PAGES:
                try:
                    streams.extend(await self._scrape_page(client, path, category))
                except Exception:
                    logger.exception("[aceztrims] Failed to scrape %s", path)
        logger.info("[aceztrims] Extracted %d stream(s)", len(streams))
        return streams

    async def _scrape_page(
        self, client: httpx.AsyncClient, path: str, category: str
    ) -> list[ExtractedStream]:
        url = f"{BASE_URL}{path}"
        resp = await client.get(url)
        if resp.status_code != 200:
            logger.warning(
                "[aceztrims] %s returned HTTP %d", path, resp.status_code
            )
            return []

        # The page keeps a block of legacy channel buttons inside
        # `<!-- ... -->` for quick re-enablement. Strip comments first so
        # the regex only sees live buttons.
        html = _HTML_COMMENT.sub("", resp.text)

        seen: set[str] = set()
        streams: list[ExtractedStream] = []
        for pattern in (_DEFAULT_IFRAME, _ONCLICK_IFRAME_SRC):
            for match in pattern.finditer(html):
                embed_url = match.group(1).strip()
                if not embed_url or embed_url in seen:
                    continue
                seen.add(embed_url)
                streams.append(
                    ExtractedStream(
                        url=embed_url,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=f"{category} Stream",
                        stream_type="embed",
                        embed_url=embed_url,
                    )
                )
        logger.info(
            "[aceztrims] Found %d stream(s) on %s", len(streams), path
        )
        return streams
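
Review note: the comment-stripping step matters because the two regexes would otherwise also match buttons that are commented out. The sketch below applies the same patterns to a made-up HTML snippet. The sample markup, the `example.invalid` URLs, and the `embed_urls` helper name are assumptions for illustration and are not taken from the real page.

```python
import re

# Same patterns as the extractor above, repeated so the sketch is self-contained.
ONCLICK = re.compile(
    r"""document\.getElementById\(['"]iframe['"]\)\.src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
DEFAULT = re.compile(
    r"""<iframe[^>]*id\s*=\s*['"]iframe['"][^>]*src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical page: one default iframe, two live buttons, one commented-out button.
SAMPLE_HTML = """
<iframe id='iframe' src='https://example.invalid/embed/a'></iframe>
<button onclick="document.getElementById('iframe').src='https://example.invalid/embed/b'">B</button>
<!--
<button onclick="document.getElementById('iframe').src='https://example.invalid/embed/old'">Old</button>
-->
<button onclick="document.getElementById('iframe').src='https://example.invalid/embed/a'">A</button>
"""


def embed_urls(html: str, strip_comments: bool = True) -> list[str]:
    """Return de-duplicated embed URLs, default iframe first."""
    if strip_comments:
        html = COMMENT.sub("", html)
    found: list[str] = []
    for pattern in (DEFAULT, ONCLICK):
        for match in pattern.finditer(html):
            url = match.group(1).strip()
            if url and url not in found:
                found.append(url)
    return found


urls = embed_urls(SAMPLE_HTML)
urls_with_comments = embed_urls(SAMPLE_HTML, strip_comments=False)
```

With stripping enabled the legacy `/embed/old` button is ignored; with it disabled the same button shows up as a third URL.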


@ -0,0 +1,118 @@
"""Base class for all site-specific stream extractors."""

import logging
from abc import ABC, abstractmethod

import httpx

from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)


class BaseExtractor(ABC):
    """Abstract base class for site-specific stream extractors.

    To create a new extractor:
    1. Create a new file in backend/extractors/
    2. Subclass BaseExtractor
    3. Implement site_key, site_name, and extract()
    4. Register it in backend/extractors/__init__.py
    """

    @property
    @abstractmethod
    def site_key(self) -> str:
        """Unique identifier for this site (e.g., 'sportsurge').

        Must be lowercase, alphanumeric with hyphens/underscores only.
        Used as the cache key and in API responses.
        """

    @property
    @abstractmethod
    def site_name(self) -> str:
        """Human-readable name (e.g., 'SportSurge').

        Displayed in the UI and API responses.
        """

    @abstractmethod
    async def extract(self) -> list[ExtractedStream]:
        """Extract stream URLs from this site.

        Returns a list of ExtractedStream objects. Each represents a
        discovered stream URL. The extractor should set url, quality,
        and title fields; site_key, site_name, and extracted_at are
        auto-populated if left empty.

        Implementations should:
        - Use httpx for HTTP requests
        - Handle their own errors gracefully (log and return empty list)
        - Set quality when detectable from the source
        - Set title to something descriptive
        """

    async def health_check(self, url: str) -> bool:
        """Verify a URL is live (HEAD request, check for m3u8 content).

        Sends a HEAD request and checks:
        1. HTTP 200 response
        2. Content-Type suggests HLS/media content (if available)

        Returns True if the URL appears to be a live stream.
        """
        try:
            async with httpx.AsyncClient(
                timeout=10.0,
                follow_redirects=True,
                headers={"User-Agent": "Mozilla/5.0"},
            ) as client:
                response = await client.head(url)

                if response.status_code != 200:
                    logger.debug(
                        "[%s] Health check failed for %s: HTTP %d",
                        self.site_key,
                        url,
                        response.status_code,
                    )
                    return False

                content_type = response.headers.get("content-type", "").lower()
                # m3u8 streams typically have these content types
                live_indicators = [
                    "application/vnd.apple.mpegurl",
                    "application/x-mpegurl",
                    "video/",
                    "audio/",
                    "octet-stream",
                ]
                # Some servers omit or mis-set content-type on HEAD, so a
                # missing or unrecognised type still passes. Only text/* and
                # HTML responses are treated as suspect and rejected, which
                # also rejects servers that return text/plain for m3u8.
                if content_type and not any(ind in content_type for ind in live_indicators):
                    if "text/" in content_type or "html" in content_type:
                        logger.debug(
                            "[%s] Health check suspect for %s: content-type=%s",
                            self.site_key,
                            url,
                            content_type,
                        )
                        return False
                return True
        except httpx.TimeoutException:
            logger.debug("[%s] Health check timed out for %s", self.site_key, url)
            return False
        except httpx.HTTPError as e:
            logger.debug("[%s] Health check error for %s: %s", self.site_key, url, e)
            return False
        except Exception:
            logger.exception("[%s] Unexpected error during health check for %s", self.site_key, url)
            return False
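
Review note: the content-type branch of `health_check` is easy to misread, so here it is isolated as a pure function. This is a minimal sketch; `content_type_looks_live` is a hypothetical helper name that does not exist in the codebase, and it only mirrors the decision made after the HTTP 200 check.

```python
# Media-like content types, copied from health_check above.
LIVE_INDICATORS = (
    "application/vnd.apple.mpegurl",
    "application/x-mpegurl",
    "video/",
    "audio/",
    "octet-stream",
)


def content_type_looks_live(content_type: str) -> bool:
    """Hypothetical helper mirroring the content-type decision in health_check."""
    ct = content_type.lower()
    if not ct:
        # Missing header: the URL passes.
        return True
    if any(indicator in ct for indicator in LIVE_INDICATORS):
        return True
    # Unknown types pass unless they are text/* or HTML.
    return not ("text/" in ct or "html" in ct)


decisions = {
    ct: content_type_looks_live(ct)
    for ct in (
        "",
        "application/vnd.apple.mpegurl",
        "text/html; charset=utf-8",
        "text/plain",
        "application/json",
    )
}
```

Note that `application/json` passes while `text/plain` does not, so the check is a rejection list for text and HTML, not an allow-list for media types.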


@ -0,0 +1,247 @@
"""Generic chrome-service-driven extractor.

Drives the in-cluster headed Chromium pool (chrome-service) to load a list
of stream/aggregator pages, captures any HLS playlist URL the page fetches
at runtime, and returns at most one ExtractedStream per target.

Unlike the API-based extractors (pitsport/streamed/ppv), this one handles
sites where the m3u8 is computed by JavaScript at page load time: the
URL only exists after the page evaluates an obfuscated decoder, fetches a
token, etc. Curl can't see it; a real browser can.

Add new targets via the `TARGETS` constant below. Each entry is a
`_Target` (label, title, url, and an optional settle time). The extractor
visits each URL with a stealthed context, waits for the JS to settle, and
yields any captured HLS URL.
"""

import asyncio
import logging
import os
import re
import urllib.parse
from dataclasses import dataclass

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# Best-effort pause between navigation and capture. The decoder usually
# fires within 5s; 12s gives slow JS time to settle without dragging the
# extraction round.
DEFAULT_SETTLE_SECONDS = 12

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)


@dataclass(frozen=True)
class _Target:
    label: str  # site_name (homepage label in the UI)
    title: str  # human-readable stream title
    url: str  # page to navigate
    settle: int = DEFAULT_SETTLE_SECONDS


# ---------------------------------------------------------------------------
# Target list. F1-relevant 24/7 channels and motorsport aggregator pages
# whose m3u8 is JS-computed. Add freely — each one takes ~12s to scrape.
# ---------------------------------------------------------------------------
TARGETS: tuple[_Target, ...] = (
    # MotoMundo embed pages — the community-curated WordPress site for
    # MotoGP. Each /e/<id> URL is one of the iframes their "Watch Online"
    # post lists for the active session (FP/Q/Race). The m3u8 is
    # JS-computed at load time so a real browser is required to capture
    # it. Update IDs each weekend to match the current race; subreddit.py
    # discovers them from the Reddit "[Watch / Download]" thread.
    _Target(
        label="MotoMundo",
        title="MotoGP Live (MotoMundo) — French GP / Le Mans",
        url="https://motomundo.top/e/9yzn08jk9py4",
        settle=15,
    ),
    _Target(
        label="MotoMundo",
        title="MotoGP Live (MotoMundo upns) — French GP / Le Mans",
        url="https://motomundo.upns.xyz/#kqasde",
        settle=15,
    ),
)

# Heuristic to recognise an HLS playlist URL from network capture. Most CDNs
# use `.m3u8`; some (pushembdz/oe1.ossfeed) disguise the playlist as `.css`
# under a /out/v… or /hls/ path. Filter out obvious junk (.css for actual
# stylesheets, .ts segments — we only want the playlist).
_HLS_URL_RE = re.compile(r"\.m3u8(\?|$)|/out/v[0-9]+/.+\.css(\?|$)|/hls/.+/master\.css(\?|$)")
_SEGMENT_EXT_RE = re.compile(r"\.(ts|m4s|aac|key)(\?|$)")


def _looks_like_hls_playlist(url: str) -> bool:
    if _SEGMENT_EXT_RE.search(url):
        return False
    return bool(_HLS_URL_RE.search(url))


def _resolve_chrome_cdp() -> str | None:
    """Resolve the CHROME_CDP_URL env var (set by f1-stream's TF stack).

    Migrated 2026-06-04 from CHROME_WS_URL/CHROME_WS_TOKEN. chrome-service
    now runs chromium directly with CDP exposed on :9222 so its persistent
    user-data-dir actually persists cookies (the old playwright launch-server
    pattern created ephemeral contexts per `connect()`). NetworkPolicy
    (labelled client namespaces only) is the only gate; there is no path
    token.
    """
    return os.getenv("CHROME_CDP_URL")


class ChromeBrowserExtractor(BaseExtractor):
    """Drive chrome-service to capture m3u8 URLs from JS-heavy pages."""

    @property
    def site_key(self) -> str:
        return "chrome-browser"

    @property
    def site_name(self) -> str:
        return "Chrome Browser"

    async def extract(self) -> list[ExtractedStream]:
        cdp_url = _resolve_chrome_cdp()
        if not cdp_url:
            logger.warning(
                "[chrome-browser] CHROME_CDP_URL not set — extractor disabled"
            )
            return []
        try:
            from playwright.async_api import async_playwright
        except ImportError:
            logger.warning("[chrome-browser] playwright not installed — disabled")
            return []

        # One Playwright instance + one browser connection per extraction
        # round. Contexts are cheap; the browser is shared.
        async with async_playwright() as p:
            try:
                browser = await p.chromium.connect_over_cdp(cdp_url, timeout=15_000)
            except Exception:
                logger.exception("[chrome-browser] CDP connect to chrome-service failed")
                return []

            results: list[ExtractedStream] = []
            for target in TARGETS:
                try:
                    stream = await self._scrape(browser, target)
                    if stream:
                        results.append(stream)
                except Exception:
                    logger.exception(
                        "[chrome-browser] failed to scrape %s", target.url
                    )
            try:
                await browser.close()
            except Exception:
                pass

        logger.info("[chrome-browser] returned %d stream(s)", len(results))
        return results

    async def _scrape(self, browser, target: _Target) -> ExtractedStream | None:
        ctx = await browser.new_context(
            user_agent=USER_AGENT,
            viewport={"width": 1280, "height": 720},
            bypass_csp=True,
        )
        # Inject the same stealth script the verifier uses so anti-bot
        # checks don't trip the page before its decoder runs.
        try:
            from backend.stealth import STEALTH_JS
            await ctx.add_init_script(STEALTH_JS)
        except Exception:
            pass

        page = await ctx.new_page()
        captured: list[str] = []

        def on_response(resp):
            try:
                if _looks_like_hls_playlist(resp.url):
                    captured.append(resp.url)
            except Exception:
                pass

        page.on("response", on_response)
        # Some pages (DD12 variants) load the player in a child iframe;
        # frame events catch nested navigations.
        page.on(
            "framenavigated",
            lambda fr: captured.append(fr.url) if _looks_like_hls_playlist(fr.url) else None,
        )

        try:
            await page.goto(target.url, wait_until="domcontentloaded", timeout=20_000)
        except Exception as e:
            logger.debug("[chrome-browser] %s goto failed: %s", target.url, e)
            await ctx.close()
            return None

        # Let the page's JS settle.
        await asyncio.sleep(target.settle)

        # Also probe child iframes — `pushembdz`, `pooembed`, `embedsports`
        # all live behind one. Collect any HLS URL the iframes loaded.
        for fr in page.frames:
            if fr is page.main_frame:
                continue
            try:
                # JW Player and Clappr both expose the playing source via
                # a <video>/`<source>` element after setup completes.
                sources = await fr.evaluate(
                    "() => Array.from(document.querySelectorAll('video, source')).map(e => e.currentSrc || e.src || '').filter(s => s.includes('.m3u8') || s.includes('.css'))"
                )
                for s in sources:
                    if _looks_like_hls_playlist(s):
                        captured.append(s)
            except Exception:
                pass

        await ctx.close()

        # De-duplicate while preserving capture order.
        unique = list(dict.fromkeys(captured))
        if not unique:
            logger.debug("[chrome-browser] %s yielded no HLS URL", target.url)
            return None

        # Prefer a URL that looks like a master/index playlist; otherwise
        # fall back to the first captured URL (later ones are usually
        # variant playlists referenced from the master).
        master = next(
            (u for u in unique if "master" in u.lower() or "index" in u.lower()),
            unique[0],
        )
        # The query string is kept as-is: some CDNs require it, and the
        # verifier and frontend re-resolve short-lived tokens per request.
        m3u8 = master
        # Decode URL-encoded characters so the proxy gets a clean URL.
        m3u8 = urllib.parse.unquote(m3u8)

        logger.info(
            "[chrome-browser] %s -> %s",
            target.url, m3u8[:120],
        )
        return ExtractedStream(
            url=m3u8,
            site_key=self.site_key,
            site_name=target.label,
            quality="",
            title=target.title,
            stream_type="m3u8",
        )


@ -0,0 +1,61 @@
"""Curated extractor — known-good 24/7 F1 channels via direct embed URLs.

Returns a small, hand-picked list of embed URLs that are reliable enough to
be served as fallback "always-on" streams when the dynamic extractors find
nothing (e.g. between race weekends, when API providers are down).

These are direct embed URLs. The frontend routes them through /embed so the
iframe-stripping proxy bypasses any frame-buster JS in the upstream player.
"""

import logging

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# Curated list. Each entry is a known direct embed URL. These were sourced
# from the timstreams.py ALWAYS_INCLUDE_HASHES list (Sky Sports F1, DAZN F1)
# and are documented as 24/7 channels that play F1 content year-round.
_CURATED_STREAMS = [
    {
        "url": "https://hmembeds.one/embed/888520f36cd94c5da4c71fddc1a5fc9b",
        "title": "Sky Sports F1 (24/7)",
        "quality": "HD",
    },
    {
        "url": "https://hmembeds.one/embed/fc3a54634d0867b0c02ee3223292e7c6",
        "title": "DAZN F1 (24/7)",
        "quality": "HD",
    },
]


class CuratedExtractor(BaseExtractor):
    """Returns curated known-good 24/7 F1 channel embed URLs."""

    @property
    def site_key(self) -> str:
        return "curated"

    @property
    def site_name(self) -> str:
        return "Curated 24/7 Channels"

    async def extract(self) -> list[ExtractedStream]:
        streams = [
            ExtractedStream(
                url=entry["url"],
                site_key=self.site_key,
                site_name=self.site_name,
                quality=entry["quality"],
                title=entry["title"],
                stream_type="embed",
                embed_url=entry["url"],
            )
            for entry in _CURATED_STREAMS
        ]
        logger.info("[curated] Returning %d curated stream(s)", len(streams))
        return streams


@ -0,0 +1,181 @@
"""DaddyLive extractor - extracts m3u8 streams from DaddyLive for F1 channels.

Extraction chain:
1. Fetch stream page -> parse iframe src
2. Fetch player page -> XOR-decode auth params (key=109)
3. Call server lookup API -> get server_key
4. Construct m3u8 URL from server_key + channel key
"""

import logging
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# F1-relevant channel IDs on DaddyLive
F1_CHANNELS = {
    60: "Sky Sports F1 UK",
}

DLHD_BASE = "https://dlhd.link"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)
XOR_KEY = 109


def _xor_decode(encoded: str) -> str:
    """XOR-decode a string using key 109."""
    return "".join(chr(ord(c) ^ XOR_KEY) for c in encoded)


class DaddyLiveExtractor(BaseExtractor):
    """Extracts m3u8 streams from DaddyLive for Sky Sports F1.

    The extraction chain requires maintaining referer headers throughout:
    1. Fetch stream page at dlhd.link
    2. Parse iframe src pointing to the player page
    3. XOR-decode auth params from the player page to get channelKey
    4. Call server lookup API to get server_key
    5. Construct the final m3u8 URL
    """

    @property
    def site_key(self) -> str:
        return "daddylive"

    @property
    def site_name(self) -> str:
        return "DaddyLive"

    async def extract(self) -> list[ExtractedStream]:
        """Extract m3u8 URLs for all configured F1 channels."""
        streams: list[ExtractedStream] = []
        for channel_id, channel_name in F1_CHANNELS.items():
            try:
                stream = await self._extract_channel(channel_id, channel_name)
                if stream:
                    streams.append(stream)
            except Exception:
                logger.exception(
                    "[daddylive] Failed to extract channel %d (%s)",
                    channel_id,
                    channel_name,
                )
        logger.info("[daddylive] Extracted %d stream(s)", len(streams))
        return streams

    async def _extract_channel(
        self, channel_id: int, channel_name: str
    ) -> ExtractedStream | None:
        """Extract a single channel's m3u8 URL through the full chain."""
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            # Step 1: Fetch stream page and parse iframe src
            stream_page_url = f"{DLHD_BASE}/stream/stream-{channel_id}.php"
            resp = await client.get(
                stream_page_url,
                headers={"Referer": f"{DLHD_BASE}/"},
            )
            if resp.status_code != 200:
                logger.warning(
                    "[daddylive] Stream page returned HTTP %d for channel %d",
                    resp.status_code,
                    channel_id,
                )
                return None

            # Parse iframe src from the stream page
            iframe_match = re.search(
                r'<iframe[^>]+src=["\']([^"\']+)["\']', resp.text, re.IGNORECASE
            )
            if not iframe_match:
                logger.warning(
                    "[daddylive] No iframe found on stream page for channel %d",
                    channel_id,
                )
                return None

            player_url = iframe_match.group(1)
            if player_url.startswith("//"):
                player_url = "https:" + player_url
            logger.debug("[daddylive] Player URL for channel %d: %s", channel_id, player_url)

            # Step 2: Fetch the player page. The response body is not parsed
            # here; the request only confirms the page is reachable, and its
            # URL is the Referer for the lookup call.
            resp = await client.get(
                player_url,
                headers={"Referer": stream_page_url},
            )
            if resp.status_code != 200:
                logger.warning(
                    "[daddylive] Player page returned HTTP %d for channel %d",
                    resp.status_code,
                    channel_id,
                )
                return None

            # The XOR-encoded channel key on the player page decodes to
            # premium{id}, so it is built directly instead of being parsed
            # and decoded with _xor_decode (currently unused).
            channel_key = f"premium{channel_id}"

            # Step 3: Call server lookup API
            lookup_url = f"https://chevy.vovlacosa.sbs/server_lookup?channel_id={channel_key}"
            resp = await client.get(
                lookup_url,
                headers={"Referer": player_url},
            )
            if resp.status_code != 200:
                logger.warning(
                    "[daddylive] Server lookup returned HTTP %d for channel %d",
                    resp.status_code,
                    channel_id,
                )
                return None

            try:
                lookup_data = resp.json()
                server_key = lookup_data.get("server_key", "")
            except Exception:
                logger.warning(
                    "[daddylive] Failed to parse server lookup response for channel %d",
                    channel_id,
                )
                return None

            if not server_key:
                logger.warning(
                    "[daddylive] No server_key in lookup response for channel %d",
                    channel_id,
                )
                return None

            # Step 4: Construct m3u8 URL
            m3u8_url = (
                f"https://chevy.adsfadfds.cfd/proxy/{server_key}/{channel_key}/mono.css"
            )
            logger.info(
                "[daddylive] Constructed m3u8 for channel %d: %s", channel_id, m3u8_url
            )
            return ExtractedStream(
                url=m3u8_url,
                site_key=self.site_key,
                site_name=self.site_name,
                quality="HD",
                title=channel_name,
                stream_type="m3u8",
            )
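
Review note: single-byte XOR is its own inverse, so the decoder above can also produce test input for itself. A minimal sketch, assuming only the key value 109 from the module; the `premium60` sample follows the `premium{id}` shape mentioned in the code comments.

```python
XOR_KEY = 109


def xor_decode(encoded: str) -> str:
    """XOR every character with the fixed key; applying it twice is a no-op."""
    return "".join(chr(ord(c) ^ XOR_KEY) for c in encoded)


plain = "premium60"
encoded = xor_decode(plain)      # "encoding" is the same operation
round_trip = xor_decode(encoded)
```

The encoded form contains control characters, so it should be compared with `repr()` or by code point when debugging, not printed raw.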


@ -0,0 +1,111 @@
"""DD12Streams extractor — scrapes inline m3u8 URLs from per-channel pages.

Each DD12 sport page (`/nas`, `/f1`, `/sky`, etc.) renders an iframe to
`/<channel>c1` which 302-redirects to `/new-<channel>/jwplayer`. That
page contains a JW Player setup with the m3u8 URL hard-coded inline:

    playerInstance.setup({
        file: "https://...b-cdn.net/.../master.m3u8",
        ...
    });

The JW Player runtime fails in our cluster (same fingerprint trap as
hmembeds), but we don't need it — the file URL is in the HTML and any
browser with H.264 codecs can play it directly via hls.js.

Channel discovery: probe a known list. New ones can be added by checking
DD12's own homepage / nav.
"""

import logging
import re

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

BASE = "https://dd12streams.com"
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)

# (path, channel_label, title). Add as DD12 surfaces new channels.
CHANNELS = (
    ("nas", "DD12Streams", "NASCAR Cup Series (24/7) — DD12"),
)

_FILE_URL_RE = re.compile(r"""file\s*:\s*["']([^"']+\.m3u8[^"']*)["']""")


class DD12Extractor(BaseExtractor):
    @property
    def site_key(self) -> str:
        return "dd12"

    @property
    def site_name(self) -> str:
        return "DD12Streams"

    async def extract(self) -> list[ExtractedStream]:
        results: list[ExtractedStream] = []
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT},
        ) as client:
            for path, label, title in CHANNELS:
                try:
                    page_url = f"{BASE}/{path}"
                    resp = await client.get(page_url)
                    if resp.status_code != 200:
                        continue
                    iframe_path = self._extract_iframe(resp.text)
                    if not iframe_path:
                        continue
                    iframe_url = (
                        iframe_path
                        if iframe_path.startswith("http")
                        else f"{BASE}{iframe_path}"
                    )
                    iframe_resp = await client.get(
                        iframe_url, headers={"Referer": page_url}
                    )
                    if iframe_resp.status_code != 200:
                        continue
                    m3u8 = self._find_m3u8(iframe_resp.text)
                    if not m3u8:
                        continue
                    results.append(
                        ExtractedStream(
                            url=m3u8,
                            site_key=self.site_key,
                            site_name=label,
                            quality="",
                            title=title,
                            stream_type="m3u8",
                        )
                    )
                except Exception:
                    logger.debug(
                        "[dd12] /%s extraction failed", path, exc_info=True
                    )
        logger.info("[dd12] Extracted %d stream(s)", len(results))
        return results

    @staticmethod
    def _extract_iframe(html: str) -> str | None:
        m = re.search(
            r'<iframe[^>]+id=["\']vplayer["\'][^>]+src=["\']([^"\']+)["\']',
            html,
        )
        return m.group(1) if m else None

    @staticmethod
    def _find_m3u8(html: str) -> str | None:
        m = _FILE_URL_RE.search(html)
        return m.group(1) if m else None


@ -0,0 +1,75 @@
"""Demo extractor - returns hardcoded test streams for framework testing.

This extractor exists purely for testing the extraction pipeline end-to-end.
It does NOT connect to any real streaming site. Disable it in production by
removing its registration from __init__.py or setting DEMO_EXTRACTOR_ENABLED=false.
"""

import logging
import os

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

# Enabled by default; set DEMO_EXTRACTOR_ENABLED=false to disable this extractor
DEMO_ENABLED = os.getenv("DEMO_EXTRACTOR_ENABLED", "true").lower() in ("true", "1", "yes")


class DemoExtractor(BaseExtractor):
    """Demo extractor that returns hardcoded test streams.

    Use this to verify the extraction framework works end-to-end without
    needing a real streaming site. The streams are publicly available HLS
    test streams from Apple and others.
    """

    @property
    def site_key(self) -> str:
        return "demo"

    @property
    def site_name(self) -> str:
        return "Demo (Test Streams)"

    async def extract(self) -> list[ExtractedStream]:
        """Return hardcoded test streams for framework testing."""
        if not DEMO_ENABLED:
            logger.info("[demo] Demo extractor is disabled via DEMO_EXTRACTOR_ENABLED")
            return []

        logger.info("[demo] Returning demo test streams")
        streams = [
            ExtractedStream(
                url="https://test-streams.mux.dev/x36xhzz/x36xhzz.m3u8",
                site_key=self.site_key,
                site_name=self.site_name,
                quality="720p",
                title="Big Buck Bunny (Test Stream)",
                is_live=False,
            ),
            ExtractedStream(
                url="https://devstreaming-cdn.apple.com/videos/streaming/examples/bipbop_16x9/bipbop_16x9_variant.m3u8",
                site_key=self.site_key,
                site_name=self.site_name,
                quality="1080p",
                title="Apple Bipbop (Test Stream)",
                is_live=False,
            ),
            ExtractedStream(
                url="https://demo.unified-streaming.com/k8s/features/stable/video/tears-of-steel/tears-of-steel.ism/.m3u8",
                site_key=self.site_key,
                site_name=self.site_name,
                quality="1080p",
                title="Tears of Steel (Test Stream)",
                is_live=False,
            ),
        ]

        # Always run a health check on each demo stream so is_live reflects
        # whether the test URL is currently reachable.
        for stream in streams:
            stream.is_live = await self.health_check(stream.url)

        return streams


@ -0,0 +1,203 @@
"""Discord extractor - monitors Discord channels for F1 stream links.

Reads recent messages from configured Discord channels using a user token,
extracts URLs that look like stream links, and returns them as embed streams.
"""

import logging
import os
import re
from urllib.parse import urlparse

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

DISCORD_API = "https://discord.com/api/v9"
DISCORD_TOKEN = os.getenv("DISCORD_TOKEN", "")
# Comma-separated channel IDs to monitor
DISCORD_CHANNELS = os.getenv("DISCORD_CHANNELS", "").split(",")
# How many messages to fetch per channel
MESSAGE_LIMIT = 50
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# Matches any http(s) URL in a message. Filtering (Discord CDN, images,
# news sites, non-player paths) happens afterwards in _is_stream_url.
URL_PATTERN = re.compile(r"https?://[^\s<>\)\]\"']+", re.IGNORECASE)

# Domains that publish news/articles, not playable streams. Discord users share
# these links during race weekends; they are NOT streams and pollute the list.
EXCLUDED_DOMAINS = {
    "discord.com", "discord.gg", "cdn.discordapp.com",
    "tenor.com", "giphy.com", "imgur.com",
    "youtube.com", "youtu.be", "twitter.com", "x.com",
    "reddit.com", "instagram.com", "tiktok.com",
    "fmhy.net", "github.com", "freemotorsports.com",
    # News / official sites — never playable embeds
    "formula1.com", "fia.com", "skysports.com", "motorsport.com",
    "driverdb.com", "autosport.com", "the-race.com", "racefans.net",
    "wikipedia.org", "fantasy.formula1.com",
}

# A URL is treated as a candidate stream embed only if its path looks like
# a *direct* player/embed page — `/embed/{id}`, `/player/{...}`, `*.m3u8`,
# `*.php` (legacy iframe1.php style). Aggregator landing pages
# (`/event/...`, `/watch?session=...`, etc.) are rejected because they
# show a list of links instead of playing automatically — those produce
# verifier-passing UI without actual playback.
_PATH_KEYWORDS = (
    "/embed/", "/player/", ".m3u8", ".php",
)


def _is_stream_url(url: str) -> bool:
    """Heuristic: does this URL look like an actual stream/embed/player link?

    Discord users share lots of news links during race weekends. The old
    filter only blocked specific domains and let everything else through,
    which produced a stream list dominated by formula1.com news articles.
    The new filter is positive-match: a URL must contain at least one
    stream-shaped path keyword to be included.
    """
    try:
        parsed = urlparse(url)
        domain = parsed.netloc.lower()
        path = parsed.path.lower()
    except Exception:
        return False
    if not domain:
        return False
    # Match the excluded domain itself or any subdomain of it. A plain
    # substring test would let "x.com" exclude unrelated hosts whose name
    # merely ends in "x.com".
    for excluded in EXCLUDED_DOMAINS:
        if domain == excluded or domain.endswith("." + excluded):
            return False
    if any(path.endswith(ext) for ext in (".png", ".jpg", ".jpeg", ".gif", ".webp", ".mp4", ".webm", ".svg", ".css", ".js")):
        return False
    full = path + ("?" + parsed.query if parsed.query else "")
    if not any(kw in full for kw in _PATH_KEYWORDS):
        return False
    return True


class DiscordExtractor(BaseExtractor):
    """Extracts stream links from Discord channel messages.

    Monitors configured Discord channels for URLs shared by users,
    filters to likely stream links, and returns them as embed streams.
    """

    @property
    def site_key(self) -> str:
        return "discord"

    @property
    def site_name(self) -> str:
        return "Discord Community"

    async def extract(self) -> list[ExtractedStream]:
        """Fetch recent messages from Discord channels and extract URLs."""
        if not DISCORD_TOKEN:
            logger.info("[discord] No DISCORD_TOKEN set, skipping")
            return []

        channels = [c.strip() for c in DISCORD_CHANNELS if c.strip()]
        if not channels:
            logger.info("[discord] No DISCORD_CHANNELS configured, skipping")
            return []

        streams: list[ExtractedStream] = []
        seen_urls: set[str] = set()
        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={
                    "Authorization": DISCORD_TOKEN,
                    "User-Agent": USER_AGENT,
                },
            ) as client:
                for channel_id in channels:
                    try:
                        channel_streams = await self._fetch_channel(
                            client, channel_id, seen_urls
                        )
                        streams.extend(channel_streams)
                    except Exception:
                        logger.debug(
                            "[discord] Failed to fetch channel %s",
                            channel_id,
                            exc_info=True,
                        )
        except Exception:
            logger.exception("[discord] Failed to connect to Discord API")

        logger.info("[discord] Extracted %d stream(s) from %d channel(s)", len(streams), len(channels))
        return streams

    async def _fetch_channel(
        self,
        client: httpx.AsyncClient,
        channel_id: str,
        seen_urls: set[str],
    ) -> list[ExtractedStream]:
        """Fetch messages from a single channel and extract stream URLs."""
        resp = await client.get(
            f"{DISCORD_API}/channels/{channel_id}/messages",
            params={"limit": MESSAGE_LIMIT},
        )
        if resp.status_code != 200:
            logger.warning(
                "[discord] Channel %s returned HTTP %d", channel_id, resp.status_code
            )
            return []

        messages = resp.json()
        if not isinstance(messages, list):
            return []

        streams: list[ExtractedStream] = []
        for msg in messages:
            content = msg.get("content", "")
            author = msg.get("author", {}).get("username", "unknown")
            # Extract URLs from message content
            urls = URL_PATTERN.findall(content)
            # Also check embeds
            for embed in msg.get("embeds", []):
                if embed.get("url"):
                    urls.append(embed["url"])
            for url in urls:
                # Clean trailing punctuation
                url = url.rstrip(".,;:!?)")
                if url in seen_urls:
                    continue
                if not _is_stream_url(url):
                    continue
                seen_urls.add(url)
                streams.append(
                    ExtractedStream(
                        url=url,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=f"Shared by {author}",
                        stream_type="embed",
                        embed_url=url,
                    )
                )
        return streams
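
Review note: the positive-match filter is easiest to judge on a concrete message. The sketch below is a trimmed stand-in, not the module itself: the excluded set is a three-entry subset, hosts are compared by exact name or subdomain suffix (a choice made for this sketch), and the message text and `example.invalid` hosts are invented.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s<>\)\]\"']+", re.IGNORECASE)
EXCLUDED = {"formula1.com", "x.com", "youtube.com"}  # small subset, for illustration
PATH_KEYWORDS = ("/embed/", "/player/", ".m3u8", ".php")


def is_stream_url(url: str) -> bool:
    """Trimmed sketch of the filter: excluded hosts out, player-shaped paths in."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if not host:
        return False
    # Assumption of this sketch: match the domain itself or its subdomains.
    if any(host == d or host.endswith("." + d) for d in EXCLUDED):
        return False
    full = parsed.path.lower() + ("?" + parsed.query if parsed.query else "")
    return any(keyword in full for keyword in PATH_KEYWORDS)


MESSAGE = (
    "Race is live: https://streamx.example.invalid/embed/abc123, "
    "report at https://www.formula1.com/en/latest/article.html "
    "and backup https://cdn.example.invalid/live/f1.m3u8?t=1."
)

# Same clean-up as _fetch_channel: strip trailing punctuation from each match.
found = [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(MESSAGE)]
streams = [u for u in found if is_stream_url(u)]
```

The news article is dropped for its domain, and a landing page such as `/watch?session=1` on any host would be dropped for lacking a player-shaped path.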


@ -0,0 +1,131 @@
"""hmembeds.one decoder + extractor.

Reverse-engineered 2026-05-07 (4-agent parallel session). The hmembeds
embed page contains an inline `<script>` block of the form:

    var k = "<16-char ASCII key>";
    var b = atob("<URI-encoded XOR-encrypted blob>");
    var c = decodeURIComponent(escape(b));
    var d = "";
    for (var i = 0; i < c.length; i++)
        d += String.fromCharCode(c.charCodeAt(i) ^ k.charCodeAt(i % k.length));
    (new Function(d))();

The decoded `d` is plain JavaScript that calls `jwplayer('player').setup({
file: <m3u8_url>, ... })`. The `<m3u8_url>` is a JWT-bound URL on
`amsterdam-0183.zulo-0084.online/sec/<JWT>/<embed_id>.m3u8` where the
JWT pins the request to a /24 of the requestor's IP.

So: pure client-side decoding. No fingerprint check, no canvas hash, no
browser-derived input. We can produce the m3u8 URL with curl + Python
faster than launching Chromium.

**Caveat (2026-05-07 reality)**: the hmembeds backend issues JWT URLs
for the curated `888520f3...` (Sky Sports F1 24/7) and `fc3a5463...`
(DAZN F1 24/7) embeds, but the origin (`amsterdam-0183.zulo-0084.online`)
returns 404/403 on the m3u8 fetch from any IP we tested (cluster IPv4
176.12.22.x, dev VM IPv6 2001:470:6f:43d::). Both legacy embed IDs
appear to be offline upstream. This extractor will produce JWT URLs
that the verifier marks unplayable for those specific embeds; if the
upstream broadcasts come back online or fresh IDs are added, the same
extractor logic just works.
"""

import base64
import logging
import re
import urllib.parse

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)

# Curated hmembeds embed IDs that the community treats as 24/7 channels.
# `_CHANNELS` mirrors the legacy `CuratedExtractor` list — keeping them
# here means the resolver can attempt offline-decoded JWT URLs and the
# verifier filters out the ones that are upstream-offline.
_CHANNELS = (
    ("888520f36cd94c5da4c71fddc1a5fc9b", "Sky Sports F1 (24/7) — hmembeds"),
    ("fc3a54634d0867b0c02ee3223292e7c6", "DAZN F1 (24/7) — hmembeds"),
)

_KEY_RE = re.compile(r'k\s*=\s*"([a-z0-9]+)"')
_BLOB_RE = re.compile(r'b\s*=\s*atob\("([^"]+)"\)')
_URL_RE = re.compile(r'streamUrl\s*=\s*"([^"]+)"')


def decode_embed(html: str) -> str | None:
    """Pull the m3u8 URL out of an hmembeds embed HTML.

    Returns the JWT-bound m3u8 URL the page would tell JW Player to
    play, or None if the page doesn't match the expected shape.
    """
    km = _KEY_RE.search(html)
    bm = _BLOB_RE.search(html)
    if not km or not bm:
        return None
    key = km.group(1)
    blob = bm.group(1)
    try:
        # b = atob(blob) — base64-decode bytes
        # c = decodeURIComponent(escape(b)) — Latin-1 → UTF-8 round-trip
        # d[i] = c[i] ^ k[i % len(k)] — XOR with rotating key
        raw = base64.b64decode(blob).decode("latin-1")
        deuri = urllib.parse.unquote(raw)
        decoded = "".join(
            chr(ord(c) ^ ord(key[i % len(key)])) for i, c in enumerate(deuri)
        )
    except Exception:
        return None
    m = _URL_RE.search(decoded)
    return m.group(1) if m else None


class HmembedsExtractor(BaseExtractor):
    @property
    def site_key(self) -> str:
        return "hmembeds"

    @property
    def site_name(self) -> str:
        return "hmembeds.one"

    async def extract(self) -> list[ExtractedStream]:
        results: list[ExtractedStream] = []
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT, "Referer": "https://hmembeds.one/"},
        ) as client:
            for embed_id, label in _CHANNELS:
                try:
                    page = await client.get(f"https://hmembeds.one/embed/{embed_id}")
                except Exception:
                    logger.debug("[hmembeds] embed %s fetch failed", embed_id, exc_info=True)
                    continue
                if page.status_code != 200:
                    continue
                m3u8 = decode_embed(page.text)
                if not m3u8:
                    continue
                results.append(
                    ExtractedStream(
                        url=m3u8,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=label,
                        stream_type="m3u8",
                    )
                )
        logger.info("[hmembeds] resolved %d JWT URL(s) (verifier filters dead origins)", len(results))
        return results


@ -0,0 +1,39 @@
"""Data models for the stream extraction framework."""
from dataclasses import dataclass, field
from datetime import datetime, timezone
@dataclass
class ExtractedStream:
    """Represents a single stream URL discovered by an extractor."""

    url: str  # The HLS/m3u8 URL
    site_key: str  # Which extractor found it
    site_name: str  # Human-readable name
    quality: str = ""  # e.g., "720p", "1080p", or empty
    title: str = ""  # e.g., "F1 Race Live"
    extracted_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    is_live: bool = False  # Whether it passed health check
    response_time_ms: int = 0  # Health check response time (lower = better)
    checked_at: str = ""  # ISO timestamp of last health check
    bitrate: int = 0  # Bitrate in bps if detectable from m3u8 playlist
    stream_type: str = "m3u8"  # "m3u8" for direct HLS, "embed" for iframe embed URL
    embed_url: str = ""  # The iframe-embeddable URL (when stream_type is "embed")

    def to_dict(self) -> dict:
        """Serialize to a plain dictionary for JSON responses."""
        return {
            "url": self.url,
            "site_key": self.site_key,
            "site_name": self.site_name,
            "quality": self.quality,
            "title": self.title,
            "extracted_at": self.extracted_at,
            "is_live": self.is_live,
            "response_time_ms": self.response_time_ms,
            "checked_at": self.checked_at,
            "bitrate": self.bitrate,
            "stream_type": self.stream_type,
            "embed_url": self.embed_url,
        }
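For a flat dataclass like this one, the hand-written `to_dict` produces the same shape as `dataclasses.asdict`. The sketch below shows that on a trimmed stand-in; `StreamRecord` and its sample values are hypothetical and only mirror a few of the fields above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class StreamRecord:
    """Trimmed, hypothetical stand-in for ExtractedStream."""

    url: str
    site_key: str
    quality: str = ""
    # default_factory runs per instance, so each record gets its own timestamp
    extracted_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    is_live: bool = False


record = StreamRecord(url="https://example.invalid/a.m3u8", site_key="demo")
payload = asdict(record)       # plain dict, field order preserved
encoded = json.dumps(payload)  # every value is already JSON-serializable
```

An explicit `to_dict` still has the advantage of pinning the response schema even if private fields are added to the dataclass later.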


@ -0,0 +1,595 @@
"""Pitsport.xyz extractor - fetches F1 streams from the Next.js RSC payload.
Architecture:
- Main page (pitsport.xyz) has a "Live Now" section with event cards containing
category, title, time, imageUrl props and /watch/{UUID} links.
- Schedule page (pitsport.xyz/schedule) lists all events grouped by category
(h2 headings) with /watch/{UUID} links and event titles.
- Watch pages (/watch/{UUID}) embed iframes from pushembdz.store/embed/{EMBED_UUID}.
- Embed pages contain an RSC payload with a stream config: {title, link, method}.
- When method is "player" or "hls", the link field points to a serveplay.site
m3u8 playlist. Otherwise we return the embed URL for iframe playback.
"""
import logging
import re
from dataclasses import dataclass
import httpx
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
PITSPORT_BASE = "https://pitsport.xyz"
EMBED_BASE = "https://pushembdz.store"
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
# Categories to include (case-insensitive match). Broadened beyond F1
# to also surface MotoGP and adjacent motorsports — keeps the f1-stream
# UI useful between race weekends and during the off-season.
MOTORSPORT_CATEGORIES = {
"f1", "formula 1", "formula 2", "formula 3",
"motogp", "moto gp", "moto2", "moto3", "motoe",
"world rally championship", "wrc",
"world endurance championship", "wec",
"indycar series", "indycar", "indynxt",
"nascar cup series", "nascar truck series", "nascar o'reilly auto parts series",
"nascar xfinity series", "nascar",
}
# Title keywords that are strong positives even when the category text
# is missing (live-now cards sometimes elide it).
MOTORSPORT_KEYWORDS = {
"formula 1", "formula one", "f1",
"motogp", "moto gp", "moto2", "moto3",
"rally", "wrc",
"indycar", "indy car",
"nascar",
"le mans", "lemans", "wec", "endurance",
}
GP_KEYWORD = "grand prix"
@dataclass
class _PitsportEvent:
"""An event discovered from the Pitsport site."""
category: str
title: str
watch_uuid: str
def _is_motorsport_category(category: str) -> bool:
"""Check if a category string matches an included motorsport series."""
return category.strip().lower() in MOTORSPORT_CATEGORIES
def _is_motorsport_event(category: str, title: str) -> bool:
    """Accept anything pitsport.xyz lists.

    Pitsport curates sports broadcasts (WRC, MotoGP, IndyCar, NASCAR,
    Premier League Darts, Premier League football, etc.); the site's own
    selection is the filter we want. Empty or garbage events still get
    filtered downstream when `_resolve_event_streams` produces no playable URL.
    """
    return bool(category or title)
# Aliases kept so older call sites keep working. `_is_f1_category` is the
# strict series match; `_is_f1_event` now accepts any event Pitsport lists.
_is_f1_category = _is_motorsport_category
_is_f1_event = _is_motorsport_event
def _decode_rsc_payload(html: str) -> str:
    """Concatenate and unescape all `self.__next_f.push([1, "..."])` chunks.

    Next.js RSC ships its tree as escape-encoded strings inside repeated
    `self.__next_f.push` calls. Regex over the raw HTML misses everything
    interesting; we have to decode unicode escapes first.
    """
    chunks = re.findall(r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', html, re.DOTALL)
    if not chunks:
        return ""
    payload = ""
    for chunk in chunks:
        try:
            # Encode as latin-1 with backslashreplace so non-ASCII titles
            # survive; a plain UTF-8 encode() turns them into mojibake when
            # decoded through unicode_escape.
            payload += chunk.encode("latin-1", "backslashreplace").decode("unicode_escape")
        except Exception:
            payload += chunk
    return payload
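A small, self-contained sketch of the chunk decoding described above, run against a made-up page. The sample HTML and the `decode_chunks` name are hypothetical, and the latin-1 plus `backslashreplace` encode step is a choice made for this sketch so that non-ASCII text passes through the `unicode_escape` codec unchanged.

```python
import json
import re

# Same chunk pattern as the extractor above.
CHUNK_RE = re.compile(r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', re.DOTALL)


def decode_chunks(html: str) -> str:
    """Join all pushed chunks and resolve their backslash escapes."""
    parts = []
    for chunk in CHUNK_RE.findall(html):
        # latin-1 + backslashreplace keeps non-ASCII characters intact
        # through the unicode_escape codec (an assumption of this sketch).
        raw = chunk.encode("latin-1", "backslashreplace")
        parts.append(raw.decode("unicode_escape"))
    return "".join(parts)


# Made-up sample page: one JSON object split across two pushed chunks.
sample_html = (
    '<script>self.__next_f.push([1,"{\\"category\\":\\"Formula 1\\","])</script>'
    '<script>self.__next_f.push([1,"\\"title\\":\\"S\\u00e3o Paulo GP\\"}"])</script>'
)
decoded = decode_chunks(sample_html)
parsed = json.loads(decoded)
```

Only after this step do patterns such as `"category":"..."` become visible to a regex, since the raw HTML carries every quote as `\"`.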
def _parse_live_events(html: str) -> list[_PitsportEvent]:
"""Parse live events from the main page (or `/live-now`) RSC payload.
The pages embed event cards inside the Next.js RSC payload; the raw
HTML keeps it escape-encoded so we decode first, then match.
Two shapes are common:
1) Older card props: "category":"...","title":"..." next to
"href":"/watch/UUID".
2) Newer `event` prop: an `event` object with `uri:"/watch/UUID"`
carrying `category` and `title`.
"""
payload = _decode_rsc_payload(html) or html
events: list[_PitsportEvent] = []
href_pattern = re.compile(
r'"href":"(/watch/([0-9a-f-]{36}))"[^}]*?"category":"([^"]+)","title":"([^"]+)"',
)
for match in href_pattern.finditer(payload):
_, uuid, category, title = match.groups()
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
event_pattern = re.compile(
r'"event":\{[^{}]*?"title":"([^"]+)"[^{}]*?"uri":"/watch/([0-9a-f-]{36})"[^{}]*?"category":"([^"]+)"',
)
for match in event_pattern.finditer(payload):
title, uuid, category = match.groups()
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
event_pattern_alt = re.compile(
r'"event":\{[^{}]*?"category":"([^"]+)"[^{}]*?"title":"([^"]+)"[^{}]*?"uri":"/watch/([0-9a-f-]{36})"',
)
for match in event_pattern_alt.finditer(payload):
category, title, uuid = match.groups()
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
return events
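To illustrate the first card shape, here is a hedged sketch that applies the same `href` pattern to a made-up decoded payload and then keeps one entry per watch UUID. The sample card and the variable names are hypothetical; the real payload carries many more props per card.

```python
import re

# Same pattern as the "older card props" shape above.
HREF_RE = re.compile(
    r'"href":"(/watch/([0-9a-f-]{36}))"[^}]*?"category":"([^"]+)","title":"([^"]+)"'
)

# Made-up decoded payload: the same card appears twice.
card = (
    '{"href":"/watch/123e4567-e89b-12d3-a456-426614174000",'
    '"category":"MotoGP","title":"Race"}'
)
payload = card + card

matches = [(m.group(2), m.group(3), m.group(4)) for m in HREF_RE.finditer(payload)]

# Keep the first occurrence per UUID; dicts preserve insertion order.
unique: dict[str, tuple[str, str]] = {}
for uuid, category, title in matches:
    unique.setdefault(uuid, (category, title))
```

Deduplicating at this stage is optional, since the extractor removes repeated UUIDs again before resolving streams.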
def _parse_schedule_events(html: str) -> list[_PitsportEvent]:
"""Parse events from the schedule page.
The schedule page groups events under category headers (h2 elements).
In the rendered HTML:
<h2 ...>Formula 1</h2>
<div ...>
<a href="/watch/UUID">...</a>
...
</div>
In the RSC payload, similar structure with section divs containing
a category h2 and child event links with titles.
"""
events: list[_PitsportEvent] = []
# Strategy 1: Parse from rendered HTML
# Find category sections: >CategoryName</h2> followed by watch links
# Split HTML at each category header
section_pattern = re.compile(
r'>([^<]+)</h2>\s*<div[^>]*class="flex flex-wrap gap-6">(.*?)(?=</div>\s*</div>\s*(?:<div|</div>|$))',
re.DOTALL,
)
for section_match in section_pattern.finditer(html):
category = section_match.group(1).strip()
section_html = section_match.group(2)
# Find all watch links in this section
link_pattern = re.compile(
r'href="/watch/([0-9a-f-]{36})".*?<h1[^>]*>([^<]+)</h1>',
re.DOTALL,
)
for link_match in link_pattern.finditer(section_html):
uuid = link_match.group(1)
title = link_match.group(2).strip()
events.append(
_PitsportEvent(category=category, title=title, watch_uuid=uuid)
)
# Strategy 2: Parse from RSC payload if rendered HTML didn't yield results
# The RSC payload has patterns like:
# "children":"Formula 1"}] ... "/watch/UUID" ... "title":"EventTitle"
if not events:
events = _parse_schedule_rsc(html)
return events
def _parse_schedule_rsc(html: str) -> list[_PitsportEvent]:
    """Parse events from the schedule page RSC payload as a fallback.

    Extracts category section divs from the RSC JSON structure.
    """
    events: list[_PitsportEvent] = []
    # Concatenate and unescape the RSC chunks (same decoding as the live pages).
    full_payload = _decode_rsc_payload(html)
    if not full_payload:
        return events
    # Each section starts at an h2 carrying the category name and runs until
    # the next category h2 (or the end of the payload).
    cat_pattern = re.compile(
        r'border-gray-700 pb-2","children":"([^"]+)"\}.*?'
        r'(?=border-gray-700 pb-2","children"|$)',
        re.DOTALL,
    )
    # Within a section: "/watch/UUID" ... "title":"EventTitle"
    event_pattern = re.compile(r'/watch/([0-9a-f-]{36}).*?"title":"([^"]+)"')
    for cat_match in cat_pattern.finditer(full_payload):
        category = cat_match.group(1)
        section_text = cat_match.group(0)
        for ev_match in event_pattern.finditer(section_text):
            events.append(
                _PitsportEvent(
                    category=category,
                    title=ev_match.group(2),
                    watch_uuid=ev_match.group(1),
                )
            )
    return events
def _parse_embed_uuids(html: str) -> list[str]:
"""Extract embed UUIDs from a watch page.
Watch pages contain iframes like:
<iframe src="https://pushembdz.store/embed/{EMBED_UUID}" ...>
And in the RSC payload:
"iframe":"https://pushembdz.store/embed/{EMBED_UUID}"
"""
uuids: list[str] = []
# From rendered HTML
iframe_pattern = re.compile(
r'pushembdz\.store/embed/([0-9a-f-]{36})',
)
for match in iframe_pattern.finditer(html):
uuid = match.group(1)
if uuid not in uuids:
uuids.append(uuid)
return uuids
@dataclass
class _StreamConfig:
"""Stream configuration extracted from an embed page."""
title: str
link: str
method: str
def _parse_stream_config(html: str) -> _StreamConfig | None:
"""Extract stream config from an embed page RSC payload.
The embed page now uses a `safeStream` payload that elides the link:
4:["$","$Ld",null,{"safeStream":{"title":"Rally TV","method":"jwp"},
"error":null,"slug":"..."}]
The actual stream URL is fetched at runtime via
pushembdz.store/api/stream/<slug>. Older payloads used "stream" with
inline title+link+method; those patterns are kept as a fallback.
"""
# Current format: safeStream with title + method only (link via API).
pattern_safe = re.compile(
r'\\?"safeStream\\?"\s*:\s*\{'
r'\\?"title\\?"\s*:\s*\\?"([^"\\]+)\\?"\s*,\s*'
r'\\?"method\\?"\s*:\s*\\?"([^"\\]+)\\?"',
)
match = pattern_safe.search(html)
if match:
return _StreamConfig(
title=match.group(1),
link="", # filled in by the caller via the api/stream endpoint
method=match.group(2),
)
# Legacy: escaped RSC payload with inline link.
pattern = re.compile(
r'"stream":\{["\']?\\?"title\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
r'["\']?\\?"link\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
r'["\']?\\?"method\\?"["\']?:["\']?\\?"([^"\\]+)\\?"',
)
match = pattern.search(html)
if match:
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
pattern2 = re.compile(
r'\\?"stream\\?":\{\\?"title\\?":\\?"([^\\]+)\\?",'
r'\\?"link\\?":\\?"([^\\]+)\\?",'
r'\\?"method\\?":\\?"([^\\]+)\\?"',
)
match = pattern2.search(html)
if match:
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
pattern3 = re.compile(
r'"stream"\s*:\s*\{\s*"title"\s*:\s*"([^"]+)"\s*,'
r'\s*"link"\s*:\s*"([^"]+)"\s*,'
r'\s*"method"\s*:\s*"([^"]+)"',
)
match = pattern3.search(html)
if match:
return _StreamConfig(title=match.group(1), link=match.group(2), method=match.group(3))
return None
def _is_m3u8_method(method: str) -> bool:
"""Check if the stream method indicates a direct HLS stream."""
# `jwp` (current pushembdz format) returns an m3u8 from the api/stream
# endpoint regardless of player UI; treat it as HLS.
return method.lower() in ("player", "hls", "jwp")
def _extract_m3u8_url(link: str) -> str:
    """Pass through the link from pushembdz's `api/stream/<slug>` response.

    The host has rotated over time (serveplay.site, then oe1.ossfeed.store,
    ...); the response is always a master playlist URL we hand to the
    player as-is. Content-Type may be `text/css` or `application/json`;
    treat it as HLS based on body sniffing (`#EXTM3U`), not MIME.
    """
    return link
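The docstring above recommends sniffing the body for `#EXTM3U` instead of trusting the Content-Type. A minimal sketch of such a check follows; `looks_like_hls` is a hypothetical helper that does not exist in this module, and tolerating a BOM and leading whitespace is an extra leniency assumed here.

```python
def looks_like_hls(body: bytes) -> bool:
    """Return True when the body starts with the HLS playlist tag.

    Hypothetical helper: ignores a UTF-8 BOM and leading whitespace,
    and never looks at the Content-Type header.
    """
    text = body.decode("utf-8", errors="replace").lstrip("\ufeff \t\r\n")
    return text.startswith("#EXTM3U")


# Made-up response bodies for illustration.
playlist = b"#EXTM3U\n#EXT-X-VERSION:3\n#EXT-X-STREAM-INF:BANDWIDTH=800000\nlow.m3u8\n"
with_bom = b"\xef\xbb\xbf#EXTM3U\n"
stylesheet = b"body { color: red; }"

results = [looks_like_hls(b) for b in (playlist, with_bom, stylesheet)]
```

A check like this would let a real `.css` response be told apart from a playlist that is merely served with a stylesheet MIME type.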
class PitsportExtractor(BaseExtractor):
"""Extracts motorsport (and any other listed) streams from Pitsport.xyz.
Scrapes the Next.js RSC payload from the main, live-now, and schedule
pages to find events, then resolves embed UUIDs to stream configurations.
"""
@property
def site_key(self) -> str:
return "pitsport"
@property
def site_name(self) -> str:
return "Pitsport"
async def extract(self) -> list[ExtractedStream]:
"""Fetch F1 events and return stream URLs or embed URLs."""
streams: list[ExtractedStream] = []
try:
async with httpx.AsyncClient(
timeout=20.0,
follow_redirects=True,
headers={"User-Agent": USER_AGENT},
) as client:
# Fetch both pages to get comprehensive event data
events = await self._discover_events(client)
logger.info(
"[pitsport] Found %d F1 event(s) to process", len(events)
)
# Deduplicate by watch UUID
seen_uuids: set[str] = set()
unique_events: list[_PitsportEvent] = []
for ev in events:
if ev.watch_uuid not in seen_uuids:
seen_uuids.add(ev.watch_uuid)
unique_events.append(ev)
# For each event, resolve streams
for event in unique_events:
event_streams = await self._resolve_event_streams(
client, event
)
streams.extend(event_streams)
except Exception:
logger.exception("[pitsport] Failed to extract streams")
logger.info("[pitsport] Extracted %d stream(s)", len(streams))
return streams
async def _discover_events(
self, client: httpx.AsyncClient
) -> list[_PitsportEvent]:
"""Discover F1 events from both main page and schedule page."""
all_events: list[_PitsportEvent] = []
# Fetch main page for live events
try:
resp = await client.get(PITSPORT_BASE)
if resp.status_code == 200:
live_events = _parse_live_events(resp.text)
logger.info(
"[pitsport] Main page: %d live event(s)", len(live_events)
)
for ev in live_events:
if _is_f1_event(ev.category, ev.title):
all_events.append(ev)
else:
logger.warning(
"[pitsport] Main page returned HTTP %d", resp.status_code
)
except Exception:
logger.exception("[pitsport] Failed to fetch main page")
# Fetch /live-now — canonical "currently live" list, added 2026.
try:
resp = await client.get(f"{PITSPORT_BASE}/live-now")
if resp.status_code == 200:
live_now_events = _parse_live_events(resp.text)
logger.info(
"[pitsport] Live-now page: %d event(s)", len(live_now_events)
)
for ev in live_now_events:
if _is_f1_event(ev.category, ev.title):
all_events.append(ev)
else:
logger.warning(
"[pitsport] Live-now page returned HTTP %d", resp.status_code
)
except Exception:
logger.exception("[pitsport] Failed to fetch live-now page")
# Fetch schedule page for upcoming events
try:
resp = await client.get(f"{PITSPORT_BASE}/schedule")
if resp.status_code == 200:
schedule_events = _parse_schedule_events(resp.text)
logger.info(
"[pitsport] Schedule page: %d total event(s)",
len(schedule_events),
)
for ev in schedule_events:
if _is_f1_event(ev.category, ev.title):
all_events.append(ev)
else:
logger.warning(
"[pitsport] Schedule page returned HTTP %d",
resp.status_code,
)
except Exception:
logger.exception("[pitsport] Failed to fetch schedule page")
return all_events
async def _resolve_event_streams(
self, client: httpx.AsyncClient, event: _PitsportEvent
) -> list[ExtractedStream]:
"""Resolve an event's watch page to actual stream URLs."""
streams: list[ExtractedStream] = []
try:
# Fetch the watch page to get embed UUIDs
watch_url = f"{PITSPORT_BASE}/watch/{event.watch_uuid}"
resp = await client.get(watch_url)
if resp.status_code != 200:
logger.debug(
"[pitsport] Watch page %s returned HTTP %d",
event.watch_uuid,
resp.status_code,
)
return []
embed_uuids = _parse_embed_uuids(resp.text)
if not embed_uuids:
logger.debug(
"[pitsport] No embed UUIDs found for %s", event.watch_uuid
)
return []
logger.debug(
"[pitsport] Event '%s' has %d embed(s)",
event.title,
len(embed_uuids),
)
# Resolve each embed to a stream config
for i, embed_uuid in enumerate(embed_uuids):
stream = await self._resolve_embed(
client, embed_uuid, event, stream_num=i + 1
)
if stream:
streams.append(stream)
except Exception:
logger.debug(
"[pitsport] Failed to resolve event %s",
event.watch_uuid,
exc_info=True,
)
return streams
async def _resolve_embed(
self,
client: httpx.AsyncClient,
embed_uuid: str,
event: _PitsportEvent,
stream_num: int,
) -> ExtractedStream | None:
"""Resolve an embed UUID to a stream configuration."""
try:
embed_url = f"{EMBED_BASE}/embed/{embed_uuid}"
resp = await client.get(embed_url)
if resp.status_code != 200:
logger.debug(
"[pitsport] Embed page %s returned HTTP %d",
embed_uuid,
resp.status_code,
)
return None
config = _parse_stream_config(resp.text)
if not config:
logger.debug(
"[pitsport] No stream config found in embed %s",
embed_uuid,
)
return None
# Build the stream title
stream_title = f"{event.category} - {event.title}"
if config.title:
stream_title += f" ({config.title})"
if stream_num > 1:
stream_title += f" #{stream_num}"
# `safeStream` payload elides the link — fetch it from the
# pushembdz.store/api/stream/<slug> endpoint. Older `stream`
# payloads provided the link inline.
link = config.link
if not link and _is_m3u8_method(config.method):
api_url = f"{EMBED_BASE}/api/stream/{embed_uuid}"
try:
api_resp = await client.get(
api_url,
headers={"Referer": embed_url, "Accept": "application/json"},
)
if api_resp.status_code == 200:
link = (api_resp.json() or {}).get("link", "")
except Exception:
logger.debug(
"[pitsport] api/stream lookup failed for %s",
embed_uuid,
exc_info=True,
)
# Treat any HLS-ish URL (m3u8, or pushembdz's .css disguise) as m3u8.
looks_hls = link and (".m3u8" in link or link.endswith(".css") or "serveplay.site" in link)
if _is_m3u8_method(config.method) and looks_hls:
return ExtractedStream(
url=link,
site_key=self.site_key,
site_name=self.site_name,
quality="",
title=stream_title,
stream_type="m3u8",
)
else:
# Iframe embed fallback
return ExtractedStream(
url=embed_url,
site_key=self.site_key,
site_name=self.site_name,
quality="",
title=stream_title,
stream_type="embed",
embed_url=embed_url,
)
except Exception:
logger.debug(
"[pitsport] Failed to resolve embed %s",
embed_uuid,
exc_info=True,
)
return None


@ -0,0 +1,273 @@
"""PPV.to extractor - fetches F1 streams via the public PPV API.
Returns embed URLs (pooembed.eu) for iframe playback.
The API at api.ppv.to/api/streams requires no authentication.
Falls back to api.ppv.st if the primary API is unreachable.
"""
import logging
import httpx
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
PRIMARY_API = "https://api.ppv.to/api/streams"
FALLBACK_API = "https://api.ppv.st/api/streams"
EMBED_BASE = "https://pooembed.eu/embed"
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
# Category name for motorsport on PPV.to
MOTORSPORT_CATEGORY = "motorsports"
# Only include events matching these keywords (case-insensitive)
F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1"}
# Grand Prix is shared with MotoGP/IndyCar — only match if no other series keywords
GP_KEYWORD = "grand prix"
NON_F1_KEYWORDS = {
"motogp", "moto gp", "moto2", "moto3", "motoe",
"indycar", "indy car", "firestone", "nascar",
"rally", "wrc", "wec", "lemans", "le mans",
"superbike", "dtm", "supercars",
}
def _is_f1_stream(name: str, category_name: str = "") -> bool:
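"""Check if a stream is Formula 1 related.
Checks both the stream name and the category name. Names containing a
non-F1 series keyword (MotoGP, IndyCar, NASCAR, ...) are rejected first.
Otherwise a stream qualifies if:
- its name matches an F1 keyword, regardless of category, OR
- its name contains "grand prix" and it is in the motorsports category, OR
- it is in the motorsports category and the category name matches an F1 keyword.
"""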
lower_name = name.lower()
lower_cat = category_name.lower()
# Reject if it contains non-F1 motorsport keywords
if any(kw in lower_name for kw in NON_F1_KEYWORDS):
return False
# Direct F1 keyword match in the stream name
if any(kw in lower_name for kw in F1_KEYWORDS):
return True
# "grand prix" in the name, only if in motorsports category and no non-F1 keywords
if GP_KEYWORD in lower_name and MOTORSPORT_CATEGORY in lower_cat:
return True
# If the category is motorsport, also check category-level keywords
if MOTORSPORT_CATEGORY in lower_cat and any(kw in lower_cat for kw in F1_KEYWORDS):
return True
return False
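The order of the checks matters: the reject list runs before the F1 keywords, so a MotoGP Grand Prix never slips through on the shared "grand prix" phrase. A reduced sketch of that ordering, with hypothetical names and shortened keyword sets (it omits the category-level keyword rule):

```python
F1_WORDS = {"formula 1", "formula one", "f1"}
OTHER_SERIES = {"motogp", "indycar", "nascar", "wrc"}


def is_f1(name: str, category: str = "") -> bool:
    """Reduced version of the filter: reject first, then accept."""
    lower_name, lower_cat = name.lower(), category.lower()
    # 1) Any other series keyword in the name rejects outright.
    if any(word in lower_name for word in OTHER_SERIES):
        return False
    # 2) A direct F1 keyword accepts regardless of category.
    if any(word in lower_name for word in F1_WORDS):
        return True
    # 3) A bare "grand prix" only counts inside the motorsports category.
    return "grand prix" in lower_name and "motorsports" in lower_cat


# Made-up listings for illustration.
samples = [
    ("Formula 1: Monaco Grand Prix", "Motorsports"),
    ("MotoGP: Italian Grand Prix", "Motorsports"),
    ("Australian Grand Prix", "Motorsports"),
    ("Australian Grand Prix", "Football"),
]
verdicts = [is_f1(name, category) for name, category in samples]
```

Since the matching is plain substring search, short keywords such as "f1" can also hit unrelated names; that trade-off is the same in the full filter.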
class PPVExtractor(BaseExtractor):
"""Extracts embed URLs from PPV.to's public JSON API.
Uses the endpoint:
- GET https://api.ppv.to/api/streams -> all streams grouped by category
- Fallback: https://api.ppv.st/api/streams
Each stream object contains an `iframe` field with the embed URL,
or a `uri_name` from which the embed URL can be constructed.
"""
@property
def site_key(self) -> str:
return "ppv"
@property
def site_name(self) -> str:
return "PPV.to"
async def _fetch_streams(self, client: httpx.AsyncClient) -> dict | None:
"""Try primary and fallback APIs, return parsed JSON or None."""
for api_url in (PRIMARY_API, FALLBACK_API):
try:
resp = await client.get(api_url)
if resp.status_code == 200:
data = resp.json()
logger.info("[ppv] Fetched streams from %s", api_url)
return data
logger.warning(
"[ppv] %s returned HTTP %d", api_url, resp.status_code
)
except Exception:
logger.debug(
"[ppv] Failed to reach %s", api_url, exc_info=True
)
return None
async def extract(self) -> list[ExtractedStream]:
"""Fetch F1 streams and return embed URLs for iframe playback."""
streams: list[ExtractedStream] = []
try:
async with httpx.AsyncClient(
timeout=15.0,
follow_redirects=True,
headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
) as client:
data = await self._fetch_streams(client)
if data is None:
logger.warning("[ppv] Could not fetch streams from any API")
return []
# The API returns:
# { "streams": [ { "category": "Name", "id": N, "streams": [...] }, ... ] }
# Flatten into (category_name, stream_obj) tuples.
all_streams = self._normalize_streams(data)
logger.info(
"[ppv] Found %d total stream(s) across all categories",
len(all_streams),
)
for category_name, stream_obj in all_streams:
name = stream_obj.get("name", "") or stream_obj.get("title", "")
if not _is_f1_stream(name, category_name):
continue
# Build the embed URL
embed_url = self._get_embed_url(stream_obj)
if not embed_url:
logger.debug("[ppv] No embed URL for stream: %s", name)
continue
# Extract quality from tag if present
quality = stream_obj.get("tag", "") or ""
# Build descriptive title
title = name
viewers = stream_obj.get("viewers")
if str(viewers or "").isdigit() and int(viewers) > 0:
title += f" ({viewers} viewers)"
# Always emit the parent stream — substreams are
# additional language/source variants, not replacements.
streams.append(
ExtractedStream(
url=embed_url,
site_key=self.site_key,
site_name=self.site_name,
quality=quality,
title=title,
stream_type="embed",
embed_url=embed_url,
)
)
substreams = stream_obj.get("substreams")
if isinstance(substreams, list):
for i, sub in enumerate(substreams):
sub_embed = sub.get("iframe", "") or sub.get("embed_url", "")
if not sub_embed:
sub_embed = embed_url
sub_name = (
sub.get("source_tag", "")
or sub.get("name", "")
or sub.get("label", "")
)
sub_quality = sub.get("tag", "") or sub.get("quality", "") or quality
sub_title = f"{name}"
if sub_name:
sub_title += f" - {sub_name}"
else:
sub_title += f" #{i + 2}"
streams.append(
ExtractedStream(
url=sub_embed,
site_key=self.site_key,
site_name=self.site_name,
quality=sub_quality,
title=sub_title,
stream_type="embed",
embed_url=sub_embed,
)
)
except Exception:
logger.exception("[ppv] Failed to extract streams")
logger.info("[ppv] Extracted %d F1 stream(s)", len(streams))
return streams
    @staticmethod
    def _normalize_streams(data: dict | list) -> list[tuple[str, dict]]:
        """Normalize the API response into a flat list of (category_name, stream_dict) tuples.

        The PPV API returns data in this shape:
            {
              "streams": [
                {
                  "category": "Motorsports",
                  "id": 35,
                  "streams": [ { stream objects... } ]
                },
                ...
              ]
            }
        Each category group has a "category" string and a nested "streams" list.
        """
        result: list[tuple[str, dict]] = []
        # Handle the top-level wrapper
        if isinstance(data, dict):
            categories = data.get("streams", [])
        elif isinstance(data, list):
            categories = data
        else:
            return result
        for category_group in categories:
            if not isinstance(category_group, dict):
                continue
            category_name = category_group.get("category", "")
            # No default here: a missing "streams" key must fall through to
            # the flat-list branch below instead of looking like an empty list.
            inner_streams = category_group.get("streams")
            if isinstance(inner_streams, list):
                for stream_obj in inner_streams:
                    if isinstance(stream_obj, dict):
                        # Pair each stream with its category for filtering
                        result.append((category_name, stream_obj))
            elif "name" in category_group:
                # Fallback: the item itself is a stream (flat list format)
                result.append((category_name, category_group))
        return result
@staticmethod
def _get_embed_url(stream: dict) -> str:
"""Extract or construct the embed URL for a stream."""
# Prefer the iframe field directly
iframe = stream.get("iframe", "")
if iframe:
return iframe
# Construct from uri_name
uri_name = stream.get("uri_name", "") or stream.get("uri", "")
if uri_name:
# Strip leading slash if present
uri_name = uri_name.lstrip("/")
return f"{EMBED_BASE}/{uri_name}"
# Last resort: use the stream id
stream_id = stream.get("id")
if stream_id:
return f"{EMBED_BASE}/{stream_id}"
return ""
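The fallback chain above (explicit `iframe`, then `uri_name` or `uri`, then `id`) can be exercised on a few made-up stream objects. `embed_url_for` is a hypothetical standalone copy written for this sketch, and the sample dicts are invented.

```python
EMBED_BASE = "https://pooembed.eu/embed"


def embed_url_for(stream: dict) -> str:
    """Pick the first usable source: iframe, then uri_name/uri, then id."""
    iframe = stream.get("iframe", "")
    if iframe:
        return iframe
    uri_name = stream.get("uri_name", "") or stream.get("uri", "")
    if uri_name:
        # A leading slash would otherwise produce a double slash in the URL.
        return f"{EMBED_BASE}/{uri_name.lstrip('/')}"
    stream_id = stream.get("id")
    if stream_id:
        return f"{EMBED_BASE}/{stream_id}"
    return ""


# Made-up stream objects, one per branch.
urls = [
    embed_url_for({"iframe": "https://example.invalid/e/1", "id": 7}),
    embed_url_for({"uri_name": "/f1/race"}),
    embed_url_for({"id": 42}),
    embed_url_for({}),
]
```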


@ -0,0 +1,116 @@
"""Central registry for stream extractors."""
import asyncio
import logging
from datetime import datetime, timezone
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
class ExtractorRegistry:
"""Central registry for all site extractors.
Manages extractor instances and provides fan-out extraction across
all registered extractors with independent error handling.
"""
def __init__(self) -> None:
self._extractors: dict[str, BaseExtractor] = {}
def register(self, extractor: BaseExtractor) -> None:
"""Register an extractor instance.
Args:
extractor: A BaseExtractor subclass instance.
Raises:
ValueError: If an extractor with the same site_key is already registered.
"""
key = extractor.site_key
if key in self._extractors:
raise ValueError(
f"Extractor with site_key '{key}' is already registered "
f"(existing: {self._extractors[key].site_name}, "
f"new: {extractor.site_name})"
)
self._extractors[key] = extractor
logger.info("Registered extractor: %s (%s)", extractor.site_name, key)
def get(self, site_key: str) -> BaseExtractor | None:
"""Get an extractor by its site_key.
Args:
site_key: The unique identifier of the extractor.
Returns:
The extractor instance, or None if not found.
"""
return self._extractors.get(site_key)
def list_extractors(self) -> list[dict]:
"""List all registered extractors.
Returns:
A list of dicts with site_key and site_name for each extractor.
"""
return [
{"site_key": ext.site_key, "site_name": ext.site_name}
for ext in self._extractors.values()
]
async def extract_all(self) -> list[ExtractedStream]:
"""Fan-out extraction to all registered extractors concurrently.
Each extractor runs independently. If one fails, the others
continue and their results are still collected.
Returns:
Combined list of ExtractedStream from all extractors.
"""
if not self._extractors:
logger.warning("No extractors registered, nothing to extract")
return []
logger.info(
"Running extraction across %d extractor(s): %s",
len(self._extractors),
", ".join(self._extractors.keys()),
)
async def _safe_extract(extractor: BaseExtractor) -> list[ExtractedStream]:
"""Run a single extractor with error isolation."""
try:
streams = await extractor.extract()
# Fill in site_key/site_name if the extractor didn't set them
now = datetime.now(timezone.utc).isoformat()
for stream in streams:
if not stream.site_key:
stream.site_key = extractor.site_key
if not stream.site_name:
stream.site_name = extractor.site_name
if not stream.extracted_at:
stream.extracted_at = now
logger.info(
"[%s] Extracted %d stream(s)", extractor.site_key, len(streams)
)
return streams
except Exception:
logger.exception(
"[%s] Extractor failed during extraction", extractor.site_key
)
return []
# Run all extractors concurrently
tasks = [_safe_extract(ext) for ext in self._extractors.values()]
results = await asyncio.gather(*tasks)
# Flatten results
all_streams: list[ExtractedStream] = []
for stream_list in results:
all_streams.extend(stream_list)
logger.info("Extraction complete: %d total stream(s) found", len(all_streams))
return all_streams
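The fan-out with per-extractor error isolation can be shown with plain coroutines. In this sketch `ok_source`, `failing_source`, `safe`, and `fan_out` are hypothetical stand-ins; the real registry wraps `BaseExtractor.extract()` in the same way.

```python
import asyncio


async def ok_source() -> list[str]:
    return ["a", "b"]


async def failing_source() -> list[str]:
    raise RuntimeError("site down")


async def safe(source) -> list[str]:
    """Run one source; a failure yields an empty list instead of raising."""
    try:
        return await source()
    except Exception:
        return []


async def fan_out(sources) -> list[str]:
    # gather() returns results in the order the awaitables were passed in.
    results = await asyncio.gather(*(safe(s) for s in sources))
    return [item for sub in results for item in sub]


combined = asyncio.run(fan_out([ok_source, failing_source, ok_source]))
```

Catching inside the wrapper, instead of passing `return_exceptions=True` to `gather`, keeps the result list homogeneous so the flattening step needs no type checks.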


@ -0,0 +1,270 @@
"""Extraction service - manages extraction lifecycle: polling, caching, health checking, serving."""
import logging
from datetime import datetime, timezone
from backend.extractors.models import ExtractedStream
from backend.extractors.registry import ExtractorRegistry
from backend.health import StreamHealthChecker
from backend.playback_verifier import PlaybackVerifier
logger = logging.getLogger(__name__)
class ExtractionService:
"""Manages the extraction lifecycle: polling, caching, health checking, and serving.
Extraction runs on a background schedule (via APScheduler), never on the
client request path. After extraction, health checks verify each stream
is live. Results are cached in memory, keyed by site_key.
GET /streams only returns streams that passed health checks, sorted by:
1. is_live (live streams first)
2. response_time_ms (fastest first)
"""
def __init__(self, registry: ExtractorRegistry) -> None:
self._registry = registry
# Cache: site_key -> list of ExtractedStream
self._cache: dict[str, list[ExtractedStream]] = {}
self._last_run: str | None = None
self._last_run_stream_count: int = 0
self._health_checker = StreamHealthChecker()
self._playback_verifier = PlaybackVerifier()
async def shutdown(self) -> None:
"""Release the headless browser instance owned by the verifier."""
await self._playback_verifier.shutdown()
async def run_extraction(self) -> None:
"""Run all extractors, health-check results, and cache them.
This is called by the background scheduler. Each extractor's
results replace its previous cache entry entirely. After extraction,
health checks are run to verify streams are live and measure
response times.
"""
logger.info("Starting extraction run...")
start = datetime.now(timezone.utc)
streams = await self._registry.extract_all()
# Dedupe by canonical URL — pitsport surfaces every WRC stage as a
# separate event but they all point at the same RallyTV master.m3u8
# (and similar for MotoGP weekend sessions). Keep the first
# occurrence so the user sees one entry per actual stream.
deduped: list[ExtractedStream] = []
seen_urls: set[str] = set()
for stream in streams:
key = (stream.embed_url or "").strip() or (stream.url or "").strip()
if not key or key in seen_urls:
continue
seen_urls.add(key)
deduped.append(stream)
if len(deduped) < len(streams):
logger.info(
"Deduped streams: %d -> %d (collapsed %d duplicate URL(s))",
len(streams), len(deduped), len(streams) - len(deduped),
)
streams = deduped
# Run health checks + headless-browser playback verification.
# Both stream types are now verified end-to-end so the user only
# ever sees streams that actually play in a browser.
if streams:
m3u8_streams = [s for s in streams if s.stream_type != "embed"]
embed_streams = [s for s in streams if s.stream_type == "embed"]
# m3u8 streams: cheap structural health check (validates manifest,
# checks first variant playlist), then a headless-browser test
# to confirm hls.js can decode and render frames.
if m3u8_streams:
stream_dicts = [s.to_dict() for s in m3u8_streams]
health_map = await self._health_checker.check_all(stream_dicts)
for stream in m3u8_streams:
health = health_map.get(stream.url)
if health:
stream.response_time_ms = health.response_time_ms
stream.checked_at = health.checked_at
if health.bitrate > 0:
stream.bitrate = health.bitrate
# tentatively mark live; final word comes from the verifier
stream.is_live = health.is_live
# Browser verification: applies to both m3u8 (only those that
# passed structural health) and embed (always — they have no
# other way to verify).
verify_items: list[tuple[str, str]] = []
for stream in m3u8_streams:
if stream.is_live:
verify_items.append((stream.url, "m3u8"))
for stream in embed_streams:
verify_items.append((stream.embed_url or stream.url, "embed"))
verdicts = await self._playback_verifier.verify_many(verify_items)
now_iso = datetime.now(timezone.utc).isoformat()
for stream in m3u8_streams:
if not stream.is_live:
continue # already failed health check
verdict = verdicts.get(stream.url)
if verdict is None:
continue # verifier disabled or unavailable
stream.is_live = verdict.is_playable
stream.checked_at = now_iso
# Curated streams skip the verifier — they are hand-picked
# 24/7 channels whose embed pages aggressively detect headless
# automation. We can't reliably confirm playback server-side,
# but we trust the curator. The user's real browser does NOT
# trigger the same anti-bot heuristics (real plugins, real
# mouse movements, etc.).
CURATED_BYPASS = {"curated"}
for stream in embed_streams:
stream.checked_at = now_iso
if stream.site_key in CURATED_BYPASS:
stream.is_live = True
stream.response_time_ms = 0
continue
key = stream.embed_url or stream.url
verdict = verdicts.get(key)
if verdict is None:
# Verifier unavailable — fall back to "trust extractor".
# This keeps the service usable even without playwright.
stream.is_live = True
stream.response_time_ms = 0
else:
stream.is_live = verdict.is_playable
stream.response_time_ms = verdict.elapsed_ms
# Group streams by site_key and update cache
new_cache: dict[str, list[ExtractedStream]] = {}
for stream in streams:
new_cache.setdefault(stream.site_key, []).append(stream)
# Replace cache for extractors that returned results.
# Clear cache for extractors that returned nothing (site went down, etc.)
for extractor_info in self._registry.list_extractors():
key = extractor_info["site_key"]
if key in new_cache:
self._cache[key] = new_cache[key]
else:
# Extractor returned nothing - clear its cache
self._cache.pop(key, None)
self._last_run = start.isoformat()
self._last_run_stream_count = len(streams)
live_count = sum(
1 for streams_list in self._cache.values()
for s in streams_list if s.is_live
)
elapsed = (datetime.now(timezone.utc) - start).total_seconds()
logger.info(
"Extraction run complete: %d stream(s) from %d extractor(s) in %.1fs (%d live)",
len(streams),
len(new_cache),
elapsed,
live_count,
)
def get_streams(self) -> list[dict]:
"""Return all cached streams as a sorted list of dicts.
Only returns streams that passed health checks (is_live=True).
Sorted by fallback priority:
1. is_live (live streams first) - filters to live only
2. response_time_ms (fastest first)
Returns:
List of serialized ExtractedStream dicts from all extractors,
filtered to live-only and sorted by response time.
"""
all_streams: list[ExtractedStream] = []
for streams in self._cache.values():
all_streams.extend(streams)
# Sort by fallback priority: live first, then fastest response
all_streams.sort(
key=lambda s: (not s.is_live, s.response_time_ms)
)
# Only return live streams to clients
live_streams = [s for s in all_streams if s.is_live]
return [s.to_dict() for s in live_streams]
def get_all_streams_unfiltered(self) -> list[dict]:
"""Return ALL cached streams including unhealthy ones.
Used for debugging and status endpoints. Sorted by fallback priority
but includes streams that failed health checks.
Returns:
List of all serialized ExtractedStream dicts.
"""
all_streams: list[ExtractedStream] = []
for streams in self._cache.values():
all_streams.extend(streams)
# Sort by fallback priority: live first, then fastest response
all_streams.sort(
key=lambda s: (not s.is_live, s.response_time_ms)
)
return [s.to_dict() for s in all_streams]
def get_streams_for_session(self, session_type: str) -> list[dict]:
"""Return cached streams filtered/annotated for a specific session type.
Currently returns all live streams (extractors don't yet differentiate by
session type). This method exists as a hook for future filtering,
e.g., some extractors might only have race streams but not FP streams.
Args:
session_type: The F1 session type (e.g., "race", "qualifying", "fp1").
Returns:
List of serialized ExtractedStream dicts (live only, sorted).
"""
# For now, all streams are potentially relevant to any session.
# Future extractors may tag streams with session types, at which
# point this method will filter accordingly.
streams = self.get_streams()
logger.debug(
"Returning %d stream(s) for session type '%s'",
len(streams),
session_type,
)
return streams
def get_status(self) -> dict:
"""Return extraction service status for the /extractors endpoint."""
extractor_list = self._registry.list_extractors()
extractor_statuses = []
for info in extractor_list:
key = info["site_key"]
cached = self._cache.get(key, [])
live_count = sum(1 for s in cached if s.is_live)
extractor_statuses.append(
{
"site_key": key,
"site_name": info["site_name"],
"cached_streams": len(cached),
"live_streams": live_count,
}
)
total_cached = sum(len(streams) for streams in self._cache.values())
total_live = sum(
1 for streams in self._cache.values()
for s in streams if s.is_live
)
return {
"extractors": extractor_statuses,
"total_cached_streams": total_cached,
"total_live_streams": total_live,
"last_run": self._last_run,
"last_run_stream_count": self._last_run_stream_count,
}
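
The dedup-then-rank flow in this service (the first occurrence of each URL wins, then only live streams are returned, fastest first) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the service code: `Stream`, `dedupe`, and `rank_live` are hypothetical names, and the dataclass carries only the fields the logic reads.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """Hypothetical stand-in for ExtractedStream with only the fields used here."""
    url: str
    embed_url: str = ""
    is_live: bool = False
    response_time_ms: int = 0


def dedupe(streams: list[Stream]) -> list[Stream]:
    """Keep the first stream per URL, preferring embed_url as the dedup key."""
    seen: set[str] = set()
    out: list[Stream] = []
    for s in streams:
        key = (s.embed_url or "").strip() or (s.url or "").strip()
        if not key or key in seen:
            continue
        seen.add(key)
        out.append(s)
    return out


def rank_live(streams: list[Stream]) -> list[Stream]:
    """Return live streams only, ordered by ascending response time."""
    return sorted((s for s in streams if s.is_live), key=lambda s: s.response_time_ms)


# Example URLs are placeholders, not real stream hosts.
sample = [
    Stream(url="https://a.example/1.m3u8", is_live=True, response_time_ms=300),
    Stream(url="https://a.example/1.m3u8", is_live=True, response_time_ms=900),
    Stream(url="https://b.example/e", embed_url="https://b.example/e",
           is_live=True, response_time_ms=120),
    Stream(url="https://c.example/dead.m3u8", is_live=False, response_time_ms=50),
]
ranked = rank_live(dedupe(sample))
```

Filtering before sorting gives the same ordering as sorting on `(not is_live, response_time_ms)` and then dropping non-live entries, since every surviving stream has `is_live=True`.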


@ -0,0 +1,125 @@
"""Streamed.pk extractor - fetches F1/motorsport streams via public JSON API."""
import logging
import httpx
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
# Site renamed from streamed.su → streamed.pk in 2026; the .su domain
# stopped resolving the API host (only the marketing page is left).
BASE_URL = "https://streamed.pk"
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
class StreamedExtractor(BaseExtractor):
"""Extracts streams from Streamed.pk's public JSON API.
Uses two endpoints:
- GET /api/matches/motor-sports: list of events with sources
- GET /api/stream/{source}/{id}: embed URL for a specific source
"""
@property
def site_key(self) -> str:
return "streamed"
@property
def site_name(self) -> str:
return "Streamed"
async def extract(self) -> list[ExtractedStream]:
"""Fetch motorsport events and resolve embed URLs for each source."""
streams: list[ExtractedStream] = []
try:
async with httpx.AsyncClient(
timeout=15.0,
follow_redirects=True,
headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
) as client:
# Get motorsport events
resp = await client.get(f"{BASE_URL}/api/matches/motor-sports")
if resp.status_code != 200:
logger.warning(
"[streamed] Events API returned HTTP %d", resp.status_code
)
return []
events = resp.json()
if not isinstance(events, list):
logger.warning("[streamed] Unexpected events response type")
return []
logger.info("[streamed] Found %d motorsport event(s)", len(events))
for event in events:
title = event.get("title", "Unknown Event")
sources = event.get("sources", [])
if not sources:
continue
for source_info in sources:
source_name = source_info.get("source", "")
source_id = source_info.get("id", "")
if not source_name or not source_id:
continue
try:
stream_resp = await client.get(
f"{BASE_URL}/api/stream/{source_name}/{source_id}"
)
if stream_resp.status_code != 200:
continue
stream_data = stream_resp.json()
if not isinstance(stream_data, list):
stream_data = [stream_data]
for item in stream_data:
embed_url = item.get("embedUrl", "")
if not embed_url:
continue
language = item.get("language", "")
hd = item.get("hd", False)
stream_no = item.get("streamNo", 1)
quality = "HD" if hd else "SD"
stream_title = f"{title}"
if language:
stream_title += f" ({language})"
if stream_no > 1:
stream_title += f" #{stream_no}"
streams.append(
ExtractedStream(
url=embed_url,
site_key=self.site_key,
site_name=self.site_name,
quality=quality,
title=stream_title,
stream_type="embed",
embed_url=embed_url,
)
)
except Exception:
logger.debug(
"[streamed] Failed to fetch stream for %s/%s",
source_name,
source_id,
exc_info=True,
)
except Exception:
logger.exception("[streamed] Failed to fetch events")
logger.info("[streamed] Extracted %d stream(s)", len(streams))
return streams


@ -0,0 +1,161 @@
"""Stremio-addon-driven extractor.
Stremio addons expose a public HTTP API: each addon has a manifest at
`<base>/manifest.json` and per-resource endpoints like
`<base>/stream/<type>/<id>.json` returning `{streams:[{url,name,...}]}`.
This extractor calls a curated set of live-TV addons that surface F1
and Sky-Sports-class motorsport channels. We treat each returned URL as
an ExtractedStream and let the playback verifier confirm playability.
We don't need a Stremio client — we just call the documented HTTP API.
Findings from initial research (2026-05-07):
- **TvVoo** (`tvvoo.hayd.uk`) wraps the Vavoo IPTV network, lists
Sky Sports F1 (UK + IT + DE), DAZN F1, Movistar F1, Canal+ F1,
Viaplay F1. The returned m3u8 URLs are IP-bound at the Vavoo CDN
(`*.ngolpdkyoctjcddxshli469r.org/sunshine/...`); they're tokenised
to whichever IP fetched the manifest. Currently their SSL certs have
expired, which fails most clients; the addon framework is sound, but
delivery is degraded today.
- **StremVerse** (`stremverse.onrender.com`) returns 11+ streams per
catalog id (`stremevent_591`=F1, `stremevent_866`=MotoGP). Mix of
DRM-walled DASH, JW-Player-broken-chain JWT, and apar151 HuggingFace
proxy URLs. Master playlists parse; variant URLs sometimes return 404
if they're meant to be resolved by the addon's player rather than
directly.
Adding a new addon = one entry in `_ADDONS`. Each addon's resolver only
needs the manifest + stream endpoints; the addon does the heavy lifting.
"""
import asyncio
import logging
from dataclasses import dataclass
import httpx
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
USER_AGENT = (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/605.1.15 (KHTML, like Gecko) "
"Version/17.4 Safari/605.1.15"
)
@dataclass(frozen=True)
class _Addon:
name: str
base: str # e.g. "https://tvvoo.hayd.uk"
stream_ids: tuple[tuple[str, str, str], ...]
"""(stream_type, stream_id, label) per F1/motorsport entry."""
# Curated addon list — see module docstring. These IDs are documented in
# the addons' manifests / channel lists. Update when channel names/IDs
# rotate.
_ADDONS: tuple[_Addon, ...] = (
_Addon(
name="TvVoo",
base="https://tvvoo.hayd.uk",
stream_ids=(
("tv", "vavoo_SKY%20SPORTS%20F1|group:uk", "Sky Sports F1 UK (Vavoo)"),
("tv", "vavoo_SKY%20SPORTS%20F1%20HD|group:uk", "Sky Sports F1 HD UK (Vavoo)"),
("tv", "vavoo_SKY%20SPORT%20F1|group:it", "Sky Sport F1 IT (Vavoo)"),
("tv", "vavoo_SKY%20SPORT%20F1%20HD|group:de", "Sky Sport F1 DE (Vavoo)"),
("tv", "vavoo_DAZN%20F1|group:es", "DAZN F1 ES (Vavoo)"),
),
),
_Addon(
name="StremVerse",
base="https://stremverse.onrender.com",
stream_ids=(
("tv", "stremevent_591", "Formula 1 (StremVerse)"),
("tv", "stremevent_866", "MotoGP (StremVerse)"),
),
),
)
class StremioAddonExtractor(BaseExtractor):
"""Pull F1 + Sky-class motorsport URLs from public Stremio addons."""
@property
def site_key(self) -> str:
return "stremio"
@property
def site_name(self) -> str:
return "Stremio Addon"
async def extract(self) -> list[ExtractedStream]:
async with httpx.AsyncClient(
timeout=15.0,
follow_redirects=True,
headers={"User-Agent": USER_AGENT},
# Some addons (TvVoo→Vavoo) hand back URLs whose origin certs
# are expired; honest-default verify=True is preserved here so
# the verifier sees the same TLS errors a browser would.
) as client:
tasks = []
for addon in _ADDONS:
for stype, sid, label in addon.stream_ids:
tasks.append(self._resolve(client, addon, stype, sid, label))
results = await asyncio.gather(*tasks, return_exceptions=True)
streams: list[ExtractedStream] = []
for r in results:
if isinstance(r, BaseException):
logger.debug("[stremio] resolve failed: %s", r)
continue
streams.extend(r)
logger.info("[stremio] surfaced %d candidate stream URL(s) across %d addon(s)",
len(streams), len(_ADDONS))
return streams
async def _resolve(
self, client: httpx.AsyncClient, addon: _Addon,
stype: str, sid: str, label: str,
) -> list[ExtractedStream]:
url = f"{addon.base}/stream/{stype}/{sid}.json"
try:
resp = await client.get(url)
except Exception as e:
logger.debug("[stremio] %s fetch failed: %s", url, e)
return []
if resp.status_code != 200:
logger.debug("[stremio] %s -> HTTP %d", url, resp.status_code)
return []
try:
data = resp.json()
except Exception:
return []
out: list[ExtractedStream] = []
for idx, s in enumerate(data.get("streams") or []):
stream_url = (s.get("url") or "").strip()
if not stream_url:
continue
# Skip DRM-tagged entries — they need Widevine which neither
# our verifier nor a clean hls.js path can play.
if "DRM" in (s.get("name") or "").upper():
continue
title = label
if idx > 0:
title = f"{label} #{idx + 1}"
out.append(
ExtractedStream(
url=stream_url,
site_key=self.site_key,
site_name=addon.name,
quality="",
title=title,
stream_type="m3u8",
)
)
return out
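
The addon call described in the module docstring (build `<base>/stream/<type>/<id>.json`, read the `streams` array, skip DRM-tagged entries) can be exercised without a network round trip. A minimal sketch, assuming the response shape stated in the docstring; `stream_endpoint`, `playable_urls`, the `addon.example` host, and the sample payload are hypothetical and for illustration only.

```python
def stream_endpoint(base: str, stype: str, sid: str) -> str:
    """Build the per-resource stream URL for a Stremio addon."""
    return f"{base}/stream/{stype}/{sid}.json"


def playable_urls(payload: dict) -> list[str]:
    """Return non-empty stream URLs, skipping entries whose name mentions DRM."""
    urls: list[str] = []
    for entry in payload.get("streams") or []:
        url = (entry.get("url") or "").strip()
        if not url:
            continue  # entry carries no direct URL
        if "DRM" in (entry.get("name") or "").upper():
            continue  # DRM-tagged entries are not playable here
        urls.append(url)
    return urls


# Made-up payload in the {"streams": [{"url": ..., "name": ...}]} shape.
payload = {
    "streams": [
        {"name": "HD", "url": " https://cdn.example/a.m3u8 "},
        {"name": "Widevine DRM", "url": "https://cdn.example/b.mpd"},
        {"name": "no url here"},
    ]
}
endpoint = stream_endpoint("https://addon.example", "tv", "stremevent_591")
urls = playable_urls(payload)
```

Keeping the parsing step separate from the HTTP call makes the DRM and empty-URL filters easy to test against captured responses.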


@ -0,0 +1,249 @@
"""Subreddit extractor — pulls community-curated live-stream URLs from
the *MotorsportsReplays* subreddit (and a few siblings).
The community follows a stable pattern: a single mod-curated post titled
`[Watch / Download] <Series> <Year> - <Round> | <Event>` goes up on or
near each race weekend with a `**Watch Online:**` link in the selftext,
pointing at an admin-run WordPress site (motomundo.net for MotoGP, the
F1 equivalent has rotated over the years). That WordPress page hosts
iframe embeds whose m3u8 is JS-computed at load time, an ideal target for
the chrome-service pipeline downstream.
This extractor:
- Hits Reddit with a real-browser User-Agent (httpx default UA + cluster
IP combo gets HTTP 403'd on r/motogp; a Safari UA does not).
- Searches for the `[Watch` thread pattern AND scans `/new.json` for
any flair set to LIVE.
- Pulls selftext URLs and returns each candidate as an `embed`-type
ExtractedStream. The verifier already drives chrome-service for embed
streams, so the m3u8 capture happens there.
"""
import asyncio
import logging
import re
import urllib.parse
from typing import NamedTuple
import httpx
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
USER_AGENT = (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/605.1.15 (KHTML, like Gecko) "
"Version/17.4 Safari/605.1.15"
)
# Subreddits to scan.
# - r/motorsportsstreams2 is the active 12.5k-sub successor to the banned
# r/motorsportstreams; race-weekend "[F1 STREAM]" posts include
# `boxboxbox.pro/stream-1` URLs and similar fresh aggregator links.
# - r/MotorsportsReplays runs the [Watch / Download] mod-post pattern
# linking to motomundo.net (MotoGP) and sister sites.
# - The rest are low-yield but cost nothing.
SUBREDDITS: tuple[str, ...] = (
"motorsportsstreams2",
"MotorsportsReplays",
"f1streams",
"motorsports",
"formula1",
"motogp",
)
# Search queries fired against r/MotorsportsReplays (the only subreddit
# `_search` queries). The first set captures the [Watch / Download] mod
# posts; the second set catches race-weekend live discussion threads.
SEARCH_QUERIES: tuple[str, ...] = (
"Watch Download F1 2026",
"Watch Download MotoGP 2026",
"Watch Online F1 2026",
"F1 STREAM live",
"Sky Sports F1 live",
"Sky F1 stream",
)
# Hosts we accept as "interesting" stream-page URLs. These are the
# admin-curated WordPress / aggregator sites the community links to.
# Anchored to what r/motorsportsstreams2 currently posts (May 2026 sweep).
_INTERESTING_HOSTS = (
# WordPress wrappers / community-run sites
"motomundo.net", # MotoGP — admin-curated WP
"motomundo.top", # MotoMundo embed host
"motomundo.upns.xyz", # MotoMundo embed host (newer)
"freemotorsports.com", # WAC successor, curated link list
"boxboxbox.pro", # F1 race-weekend aggregator (community fav)
"boxboxbox.live", # boxboxbox sister
"boxboxbox.lol",
# Aggregators we already have direct extractors for, but Reddit may
# surface event-specific deeplinks (e.g. /watch/<UUID>) we'd miss
# otherwise.
"pitsport.xyz",
"pitsport.live",
"rerace.io",
"dd12streams.com",
"ppv.to",
"streamed.pk",
"acestrlms.pages.dev",
"aceztrims.pages.dev",
# Sport-specific direct CDNs that occasionally appear in posts
"racelive.jp", # Super Formula
"cdn.sfgo.jp", # Super Formula CDN
# Speculative F1 sister sites — pattern likely if motomundo for MotoGP
"f1mundo.net",
"f1.live",
"f1live",
"skystreams",
"raceon",
"watchf1",
)
# URLs we actively never try to scrape (auth-walled, social media,
# direct downloads with no live stream).
_REJECT_HOSTS = (
"discord.gg", "discord.com",
"twitter.com", "x.com",
"youtube.com", "youtu.be",
"instagram.com", "tiktok.com",
"f1tv.formula1.com",
"viktorbarzin.me",
"gofile.io",
"mega.nz", "drive.google.com",
"1fichier.com", "rapidgator", "uploaded.net",
"magnet:",
)
_URL_RE = re.compile(r"https?://[^\s\)\]\>\"']+")
class _Candidate(NamedTuple):
title: str
url: str
subreddit: str
flair: str
def _is_interesting(url: str) -> bool:
low = url.lower()
if any(host in low for host in _REJECT_HOSTS):
return False
return any(host in low for host in _INTERESTING_HOSTS)
def _has_live_marker(post: dict) -> bool:
title = (post.get("title") or "").lower()
flair = (post.get("link_flair_text") or "").lower()
return "[watch" in title or "watch online" in title or "live" in flair
class SubredditExtractor(BaseExtractor):
"""Scan motorsport subreddits for community-curated live-stream URLs."""
@property
def site_key(self) -> str:
return "subreddit"
@property
def site_name(self) -> str:
return "Subreddit"
async def extract(self) -> list[ExtractedStream]:
# NB: do NOT send `Accept: application/json` — Reddit's anti-bot
# fingerprint flags that header from datacenter IPs and returns
# HTTP 403 with HTML. Default Accept (`*/*`) gets through fine
# and `.json` URLs always return JSON regardless.
async with httpx.AsyncClient(
timeout=15.0,
follow_redirects=True,
headers={"User-Agent": USER_AGENT},
) as client:
tasks = [self._fetch_new(client, sub) for sub in SUBREDDITS]
tasks.extend(self._search(client, q) for q in SEARCH_QUERIES)
results = await asyncio.gather(*tasks, return_exceptions=True)
candidates: list[_Candidate] = []
for r in results:
if isinstance(r, BaseException):
logger.debug("[subreddit] fetch failed: %s", r)
continue
candidates.extend(r)
# Dedupe by URL, keep first occurrence.
seen: set[str] = set()
picks: list[_Candidate] = []
for c in candidates:
if c.url in seen:
continue
seen.add(c.url)
picks.append(c)
logger.info(
"[subreddit] scanned %d source(s) — %d unique candidate URL(s)",
len(SUBREDDITS) + len(SEARCH_QUERIES), len(picks),
)
return [
ExtractedStream(
url=c.url,
site_key=self.site_key,
site_name=f"r/{c.subreddit}",
quality="",
title=c.title[:100],
stream_type="embed",
embed_url=c.url,
)
for c in picks
]
async def _fetch_new(self, client: httpx.AsyncClient, sub: str) -> list[_Candidate]:
return await self._collect(
client,
f"https://www.reddit.com/r/{sub}/new.json?limit=25",
sub,
)
async def _search(self, client: httpx.AsyncClient, query: str) -> list[_Candidate]:
q = urllib.parse.quote_plus(query)
return await self._collect(
client,
f"https://www.reddit.com/r/MotorsportsReplays/search.json?q={q}&restrict_sr=on&sort=new&limit=10",
"MotorsportsReplays",
)
async def _collect(
self, client: httpx.AsyncClient, url: str, sub: str
) -> list[_Candidate]:
try:
resp = await client.get(url)
except Exception as e:
logger.debug("[subreddit] fetch %s failed: %s", url, e)
return []
if resp.status_code != 200:
logger.debug("[subreddit] %s -> HTTP %d", url, resp.status_code)
return []
try:
data = resp.json()
except Exception:
return []
out: list[_Candidate] = []
for child in (data.get("data", {}) or {}).get("children", []):
d = child.get("data", {}) or {}
if not _has_live_marker(d):
continue
text = (d.get("selftext") or "")
title = d.get("title") or ""
flair = d.get("link_flair_text") or ""
# First, the linked URL itself (if it's a recognised live site).
top = d.get("url") or ""
if top and _is_interesting(top):
out.append(_Candidate(title, top, sub, flair))
# Then any URL embedded in the selftext that points at a
# community-curated live page.
for u in _URL_RE.findall(text):
if _is_interesting(u):
out.append(_Candidate(title, u, sub, flair))
return out
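
The selftext scan can be tried on a sample post body. The sketch below reuses the URL regex from this file but swaps the substring test for a hostname comparison. That is a stricter variant chosen for illustration, not what `_is_interesting` does: it avoids accepting a URL that only mentions an allowed host in its query string. `ALLOWED`, `REJECTED`, `host_allowed`, `pick_urls`, and the sample text are hypothetical.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s\)\]\>\"']+")

# Reduced illustrative lists; the real module keeps much longer ones.
ALLOWED = ("motomundo.net", "boxboxbox.pro")
REJECTED = ("discord.gg", "youtube.com")


def _matches(host: str, domains: tuple[str, ...]) -> bool:
    """True when host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)


def host_allowed(url: str) -> bool:
    """Accept a URL only when its hostname is allowed and not rejected."""
    host = (urlsplit(url).hostname or "").lower()
    if not host or _matches(host, REJECTED):
        return False
    return _matches(host, ALLOWED)


def pick_urls(selftext: str) -> list[str]:
    """Extract candidate stream-page URLs from a post's selftext."""
    return [u for u in URL_RE.findall(selftext) if host_allowed(u)]


selftext = (
    "**Watch Online:** https://motomundo.net/motogp-2026-round-5 "
    "and chat at https://discord.gg/abc"
)
picked = pick_urls(selftext)
```

With the substring approach, `https://evil.example/?next=motomundo.net` would count as interesting; the hostname comparison rejects it.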


@ -0,0 +1,190 @@
"""TimStreams extractor - fetches F1 streams from the TimStreams JSON API.
Returns embed URLs from hmembeds.one for iframe playback.
The public API at stra.viaplus.site/main requires no authentication
and returns all events/channels across Events, Replays, and 24/7 categories.
"""
import logging
import httpx
from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
API_URL = "https://stra.viaplus.site/main"
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
# Direct F1 keyword matches (case-insensitive)
F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1", "dazn f1"}
# "Grand prix" is F1-related only if non-F1 motorsport keywords are absent
GP_KEYWORD = "grand prix"
# Exclude these motorsport series when matching on "grand prix"
NON_F1_KEYWORDS = {
"motogp", "moto gp", "moto2", "moto3", "motoe",
"indycar", "indy car", "nascar",
"rally", "wrc", "wec", "lemans", "le mans",
"superbike", "dtm", "supercars",
}
# 24/7 channels that should always be included (embed hashes on hmembeds.one)
ALWAYS_INCLUDE_HASHES = {
"888520f36cd94c5da4c71fddc1a5fc9b", # Sky Sports F1
"fc3a54634d0867b0c02ee3223292e7c6", # DAZN F1
}
def _is_f1_event(name: str) -> bool:
"""Check if an event/channel is Formula 1 related by name.
Returns True when the name contains a direct F1 keyword, or contains
"grand prix" without non-F1 series keywords.
Note: The TimStreams API genre field (genre=2) covers ALL sports channels,
not just motorsport, so we rely solely on name-based matching.
"""
lower = name.lower()
# Direct F1 keyword match
if any(kw in lower for kw in F1_KEYWORDS):
return True
# Grand prix without competing series
if GP_KEYWORD in lower and not any(kw in lower for kw in NON_F1_KEYWORDS):
return True
return False
def _extract_embed_hash(url: str) -> str | None:
"""Extract the hash from an hmembeds.one embed URL.
Expected format: https://hmembeds.one/embed/{hash}
Returns the hash string, or None if the URL is not in the expected format.
"""
if not url:
return None
# Handle both with and without trailing slash
url = url.rstrip("/")
prefix = "https://hmembeds.one/embed/"
alt_prefix = "http://hmembeds.one/embed/"
if url.startswith(prefix):
return url[len(prefix):] or None
if url.startswith(alt_prefix):
return url[len(alt_prefix):] or None
return None
def _is_always_include(url: str) -> bool:
"""Check if a stream URL is one of the always-include 24/7 channels."""
embed_hash = _extract_embed_hash(url)
return embed_hash in ALWAYS_INCLUDE_HASHES if embed_hash else False
class TimStreamsExtractor(BaseExtractor):
"""Extracts embed URLs from TimStreams' public JSON API.
The API at stra.viaplus.site/main returns a JSON array of categories,
each containing events with stream URLs pointing to hmembeds.one embeds.
"""
@property
def site_key(self) -> str:
return "timstreams"
@property
def site_name(self) -> str:
return "TimStreams"
async def extract(self) -> list[ExtractedStream]:
"""Fetch F1 events/channels and return embed URLs for iframe playback."""
streams: list[ExtractedStream] = []
seen_urls: set[str] = set()
try:
async with httpx.AsyncClient(
timeout=15.0,
follow_redirects=True,
headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
) as client:
resp = await client.get(API_URL)
if resp.status_code != 200:
logger.warning(
"[timstreams] API returned HTTP %d", resp.status_code
)
return []
data = resp.json()
if not isinstance(data, list):
logger.warning("[timstreams] Unexpected API response type: %s", type(data).__name__)
return []
logger.info("[timstreams] API returned %d categories", len(data))
for category in data:
category_name = category.get("category", "Unknown")
events = category.get("events", [])
if not isinstance(events, list):
continue
for event in events:
event_name = event.get("name", "Unknown")
event_streams = event.get("streams", [])
if not isinstance(event_streams, list) or not event_streams:
continue
# Check if any stream URL matches an always-include channel
always_include = any(
_is_always_include(s.get("url", ""))
for s in event_streams
)
# Filter: must be F1-related or an always-include channel
if not always_include and not _is_f1_event(event_name):
continue
for stream_info in event_streams:
stream_name = stream_info.get("name", "")
stream_url = stream_info.get("url", "")
if not stream_url:
continue
# Deduplicate by URL
if stream_url in seen_urls:
continue
seen_urls.add(stream_url)
# Build a descriptive title
title = event_name
if stream_name and stream_name.lower() != event_name.lower():
title = f"{event_name} - {stream_name}"
if category_name:
title = f"[{category_name}] {title}"
streams.append(
ExtractedStream(
url=stream_url,
site_key=self.site_key,
site_name=self.site_name,
quality="",
title=title,
stream_type="embed",
embed_url=stream_url,
)
)
except httpx.TimeoutException:
logger.warning("[timstreams] API request timed out")
except Exception:
logger.exception("[timstreams] Failed to fetch from API")
logger.info("[timstreams] Extracted %d stream(s)", len(streams))
return streams


@ -0,0 +1,301 @@
"""Stream health checker - verifies extracted streams are live and responsive.
Performs GET requests against m3u8 URLs to verify they contain valid HLS
playlists (#EXTM3U header), measures response times for quality ranking,
and supports concurrent checking of multiple streams.
"""
import asyncio
import logging
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from urllib.parse import urljoin
import httpx
logger = logging.getLogger(__name__)
# How long to wait for a single health check (seconds)
HEALTH_CHECK_TIMEOUT = 10.0
# Maximum bytes to read when verifying m3u8 content
# We only need to see the #EXTM3U header and a few lines
MAX_CONTENT_BYTES = 8192
# User-Agent to send with health check requests
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
@dataclass
class StreamHealth:
"""Result of a single stream health check."""
url: str
is_live: bool
response_time_ms: int # Lower = better quality indicator
checked_at: str = field(
default_factory=lambda: datetime.now(timezone.utc).isoformat()
)
error: str = "" # Error message if not live
bitrate: int = 0 # Bitrate in bps if detectable from playlist
def to_dict(self) -> dict:
"""Serialize to a plain dictionary for JSON responses."""
return {
"url": self.url,
"is_live": self.is_live,
"response_time_ms": self.response_time_ms,
"checked_at": self.checked_at,
"error": self.error,
"bitrate": self.bitrate,
}
def _extract_bitrate(content: str) -> int:
"""Try to extract bitrate from m3u8 playlist content.
Looks for BANDWIDTH= in #EXT-X-STREAM-INF tags. Returns the highest
bitrate found, or 0 if none detected.
"""
max_bitrate = 0
for line in content.splitlines():
if "BANDWIDTH=" in line:
try:
# Parse BANDWIDTH=<number> from the tag
for part in line.split(","):
part = part.strip()
if part.startswith("BANDWIDTH="):
bw = int(part.split("=", 1)[1])
max_bitrate = max(max_bitrate, bw)
except (ValueError, IndexError):
continue
return max_bitrate
class StreamHealthChecker:
"""Background health checker for extracted streams.
Verifies streams are live by performing a partial GET on the m3u8 URL,
checking for valid HLS content (#EXTM3U header), and measuring response
time as a quality indicator.
"""
def __init__(self, timeout: float = HEALTH_CHECK_TIMEOUT) -> None:
self._timeout = timeout
async def check_stream(self, url: str) -> StreamHealth:
"""Check if a stream URL is live by doing a partial GET on the m3u8.
Verification steps:
1. GET the m3u8 URL (not just HEAD - need to verify playlist content)
2. Check if response contains #EXTM3U header
3. Measure response time as a quality indicator
4. Extract bitrate info if available
Args:
url: The m3u8 stream URL to check.
Returns:
StreamHealth with is_live, response_time_ms, checked_at, and
optional bitrate and error information.
"""
start_time = time.monotonic()
checked_at = datetime.now(timezone.utc).isoformat()
try:
async with httpx.AsyncClient(
timeout=self._timeout,
follow_redirects=True,
headers={
"User-Agent": USER_AGENT,
"Accept": "*/*",
},
) as client:
# Use a partial GET with Range header to limit download
# but fall back to reading limited bytes if Range not supported
response = await client.get(
url,
headers={"Range": f"bytes=0-{MAX_CONTENT_BYTES - 1}"},
)
elapsed_ms = int((time.monotonic() - start_time) * 1000)
# Accept 200 (full content) or 206 (partial content)
if response.status_code not in (200, 206):
return StreamHealth(
url=url,
is_live=False,
response_time_ms=elapsed_ms,
checked_at=checked_at,
error=f"HTTP {response.status_code}",
)
content = response.text[:MAX_CONTENT_BYTES]
# Verify it's a valid HLS playlist
if "#EXTM3U" not in content:
return StreamHealth(
url=url,
is_live=False,
response_time_ms=elapsed_ms,
checked_at=checked_at,
error="Response does not contain #EXTM3U header",
)
# Extract bitrate info if available
bitrate = _extract_bitrate(content)
# If this is a master playlist, validate at least one variant
if "#EXT-X-STREAM-INF:" in content:
variant_ok = await self._check_first_variant(
content, url, client
)
if not variant_ok:
return StreamHealth(
url=url,
is_live=False,
response_time_ms=elapsed_ms,
checked_at=checked_at,
bitrate=bitrate,
error="Master playlist OK but variant playlists are unreachable",
)
return StreamHealth(
url=url,
is_live=True,
response_time_ms=elapsed_ms,
checked_at=checked_at,
bitrate=bitrate,
)
except httpx.TimeoutException:
elapsed_ms = int((time.monotonic() - start_time) * 1000)
logger.debug("Health check timed out for %s", url)
return StreamHealth(
url=url,
is_live=False,
response_time_ms=elapsed_ms,
checked_at=checked_at,
error="Timeout",
)
except httpx.HTTPError as e:
elapsed_ms = int((time.monotonic() - start_time) * 1000)
logger.debug("Health check HTTP error for %s: %s", url, e)
return StreamHealth(
url=url,
is_live=False,
response_time_ms=elapsed_ms,
checked_at=checked_at,
error=f"HTTP error: {e}",
)
except Exception as e:
elapsed_ms = int((time.monotonic() - start_time) * 1000)
logger.exception("Unexpected error during health check for %s", url)
return StreamHealth(
url=url,
is_live=False,
response_time_ms=elapsed_ms,
checked_at=checked_at,
error=f"Unexpected error: {e}",
)
async def _check_first_variant(
self, content: str, base_url: str, client: httpx.AsyncClient
) -> bool:
"""Check that the first variant playlist in a master playlist is reachable.
Extracts the first variant URI from a master playlist and sends a HEAD
request, falling back to a ranged GET when HEAD does not return 200/206.
This catches streams where the master playlist is valid but the variant
playlists return 404.
Args:
content: The master playlist text content.
base_url: The URL of the master playlist (for resolving relative URIs).
client: An existing httpx client to reuse.
Returns:
True if the first variant is reachable (or no variant is listed), False otherwise.
"""
lines = content.splitlines()
for i, line in enumerate(lines):
if not line.strip().startswith("#EXT-X-STREAM-INF:"):
continue
# Next non-empty, non-comment line is the variant URI
for j in range(i + 1, len(lines)):
variant_uri = lines[j].strip()
if variant_uri and not variant_uri.startswith("#"):
# Resolve relative URI
if not variant_uri.startswith(("http://", "https://")):
variant_uri = urljoin(base_url, variant_uri)
try:
resp = await client.head(variant_uri)
if resp.status_code in (200, 206):
return True
# HEAD might not be supported, try GET
resp = await client.get(
variant_uri,
headers={"Range": f"bytes=0-{MAX_CONTENT_BYTES - 1}"},
)
if resp.status_code in (200, 206):
return True
logger.debug(
"Variant playlist %s returned HTTP %d",
variant_uri, resp.status_code,
)
except Exception as e:
logger.debug(
"Variant check failed for %s: %s", variant_uri, e
)
# Only check the first variant
return False
# No variants found (shouldn't happen if #EXT-X-STREAM-INF was detected)
return True
async def check_all(
self, streams: list[dict],
) -> dict[str, StreamHealth]:
"""Check all streams concurrently, return health map keyed by URL.
Args:
streams: List of stream dicts (must have a "url" key).
Returns:
Dictionary mapping stream URL to its StreamHealth result.
"""
urls = [s["url"] for s in streams if "url" in s]
if not urls:
return {}
logger.info("Running health checks on %d stream(s)...", len(urls))
# Run all checks concurrently
tasks = [self.check_stream(url) for url in urls]
results = await asyncio.gather(*tasks, return_exceptions=True)
health_map: dict[str, StreamHealth] = {}
for url, result in zip(urls, results):
if isinstance(result, Exception):
logger.error("Health check task failed for %s: %s", url, result)
health_map[url] = StreamHealth(
url=url,
is_live=False,
response_time_ms=0,
error=f"Task error: {result}",
)
else:
health_map[url] = result
live_count = sum(1 for h in health_map.values() if h.is_live)
logger.info(
"Health checks complete: %d/%d streams are live",
live_count,
len(health_map),
)
return health_map


@ -0,0 +1,264 @@
"""m3u8 playlist rewriter - rewrites URIs in HLS playlists to go through the proxy.
Handles both master playlists (containing variant stream references) and
media playlists (containing segment URLs). Resolves relative URIs to
absolute before encoding, and routes .m3u8 references through /proxy
while routing segments (.ts, .m4s, etc.) through /relay.
"""
import base64
import logging
import re
from urllib.parse import urljoin
logger = logging.getLogger(__name__)
def encode_url(url: str) -> str:
"""Base64url-encode a URL for safe transport as a query parameter.
Uses URL-safe base64 encoding with padding stripped to avoid
double-encoding issues when the URL contains special characters.
Args:
url: The raw URL to encode.
Returns:
Base64url-encoded string with padding removed.
"""
return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")
def decode_url(encoded: str) -> str:
"""Decode a base64url-encoded URL.
Re-adds padding that was stripped during encoding.
Args:
encoded: Base64url-encoded string (padding may be stripped).
Returns:
The original URL string.
Raises:
ValueError: If the encoded string is not valid base64url.
"""
# Add padding back - base64 requires length to be multiple of 4
padding = 4 - len(encoded) % 4
if padding != 4:
encoded += "=" * padding
return base64.urlsafe_b64decode(encoded).decode()
def _resolve_uri(uri: str, base_url: str) -> str:
"""Resolve a potentially relative URI against a base URL.
Args:
uri: The URI from the m3u8 playlist (may be relative or absolute).
base_url: The URL of the playlist itself (used as base for relative URIs).
Returns:
Absolute URL.
"""
if uri.startswith("http://") or uri.startswith("https://"):
return uri
return urljoin(base_url, uri)
def _is_playlist_uri(uri: str) -> bool:
"""Determine if a URI likely points to another playlist (vs a segment).
Playlist URIs end in .m3u8 or .m3u. Everything else is treated as a
segment (TS, fMP4, init segment, etc.).
Args:
uri: The URI to classify.
Returns:
True if the URI appears to be a playlist reference.
"""
# Strip query string for extension check
path = uri.split("?")[0].split("#")[0].lower()
return path.endswith(".m3u8") or path.endswith(".m3u")
def _build_proxy_url(absolute_uri: str, proxy_base: str) -> str:
"""Build a /proxy URL for a playlist reference.
Args:
absolute_uri: The absolute URL of the upstream playlist.
proxy_base: The base URL of our proxy service.
Returns:
Rewritten URL pointing to our /proxy endpoint.
"""
encoded = encode_url(absolute_uri)
return f"{proxy_base}/proxy?url={encoded}"
def _build_relay_url(absolute_uri: str, proxy_base: str) -> str:
"""Build a /relay URL for a segment reference.
Args:
absolute_uri: The absolute URL of the upstream segment.
proxy_base: The base URL of our proxy service.
Returns:
Rewritten URL pointing to our /relay endpoint.
"""
encoded = encode_url(absolute_uri)
return f"{proxy_base}/relay?url={encoded}"
def _rewrite_uri(uri: str, base_url: str, proxy_base: str) -> str:
"""Rewrite a single URI from an m3u8 playlist.
Resolves relative URIs, then routes playlists through /proxy and
segments through /relay.
Args:
uri: The raw URI from the playlist.
base_url: The URL of the playlist containing this URI.
proxy_base: The base URL of our proxy service.
Returns:
Rewritten URI pointing to our proxy.
"""
absolute = _resolve_uri(uri, base_url)
if _is_playlist_uri(uri):
return _build_proxy_url(absolute, proxy_base)
return _build_relay_url(absolute, proxy_base)
def rewrite_playlist(content: str, base_url: str, proxy_base: str) -> str:
"""Rewrite all URIs in an m3u8 playlist to go through the proxy.
Handles both master playlists (with #EXT-X-STREAM-INF variant
references) and media playlists (with segment URIs). Also handles
#EXT-X-MAP:URI= init segment references.
Args:
content: The raw m3u8 playlist text.
base_url: The original URL of this playlist (for resolving relative URIs).
proxy_base: The base URL of our proxy (e.g., "https://f1.viktorbarzin.me").
Returns:
The rewritten m3u8 playlist text with all URIs proxied.
"""
proxy_base = proxy_base.rstrip("/")
lines = content.splitlines()
output_lines: list[str] = []
# Track if the previous line was #EXT-X-STREAM-INF (next line is a variant URI)
next_is_variant = False
for line in lines:
stripped = line.strip()
# Handle #EXT-X-MAP:URI="..." (init segment)
if stripped.startswith("#EXT-X-MAP:"):
output_lines.append(_rewrite_ext_x_map(stripped, base_url, proxy_base))
continue
# Handle #EXT-X-STREAM-INF (marks next line as variant playlist URI)
if stripped.startswith("#EXT-X-STREAM-INF:"):
output_lines.append(line)
next_is_variant = True
continue
# Handle #EXT-X-MEDIA with URI= attribute
if stripped.startswith("#EXT-X-MEDIA:") and "URI=" in stripped:
output_lines.append(_rewrite_ext_x_media(stripped, base_url, proxy_base))
continue
# Handle #EXT-X-I-FRAME-STREAM-INF with URI= attribute
if stripped.startswith("#EXT-X-I-FRAME-STREAM-INF:") and "URI=" in stripped:
output_lines.append(
_rewrite_tag_with_uri(stripped, base_url, proxy_base, is_playlist=True)
)
continue
# If previous line was #EXT-X-STREAM-INF, this line is a variant playlist URI
if next_is_variant and stripped and not stripped.startswith("#"):
absolute = _resolve_uri(stripped, base_url)
output_lines.append(_build_proxy_url(absolute, proxy_base))
next_is_variant = False
continue
# Regular URI line (non-comment, non-empty, not a tag)
if stripped and not stripped.startswith("#"):
# This is a segment URI (TS, fMP4, etc.)
absolute = _resolve_uri(stripped, base_url)
output_lines.append(_build_relay_url(absolute, proxy_base))
continue
# Tags and comments pass through unchanged
output_lines.append(line)
# Reset variant flag if we hit another tag
if stripped.startswith("#") and not stripped.startswith("#EXT-X-STREAM-INF:"):
next_is_variant = False
return "\n".join(output_lines)
def _rewrite_ext_x_map(line: str, base_url: str, proxy_base: str) -> str:
"""Rewrite the URI in an #EXT-X-MAP tag.
#EXT-X-MAP:URI="init.mp4" -> #EXT-X-MAP:URI="<relay_url>"
The init segment goes through /relay since it's binary data.
"""
# Match URI="..." or URI=... (with or without quotes)
match = re.search(r'URI="([^"]+)"', line)
if not match:
match = re.search(r"URI=([^,\s]+)", line)
if not match:
return line
original_uri = match.group(1)
absolute = _resolve_uri(original_uri, base_url)
relay_url = _build_relay_url(absolute, proxy_base)
return line[:match.start(1)] + relay_url + line[match.end(1):]
def _rewrite_ext_x_media(line: str, base_url: str, proxy_base: str) -> str:
"""Rewrite the URI in an #EXT-X-MEDIA tag.
#EXT-X-MEDIA:TYPE=AUDIO,...,URI="audio.m3u8" -> rewrite URI to /proxy
"""
return _rewrite_tag_with_uri(line, base_url, proxy_base, is_playlist=True)
def _rewrite_tag_with_uri(
line: str, base_url: str, proxy_base: str, is_playlist: bool = False,
) -> str:
"""Rewrite the URI attribute within an HLS tag line.
Generic handler for any tag that contains a URI="..." attribute.
Args:
line: The full tag line.
base_url: Base URL for resolving relative URIs.
proxy_base: Our proxy base URL.
is_playlist: If True, route through /proxy; otherwise /relay.
Returns:
The tag line with the URI rewritten.
"""
match = re.search(r'URI="([^"]+)"', line)
if not match:
match = re.search(r"URI=([^,\s]+)", line)
if not match:
return line
original_uri = match.group(1)
absolute = _resolve_uri(original_uri, base_url)
if is_playlist:
new_url = _build_proxy_url(absolute, proxy_base)
else:
new_url = _build_relay_url(absolute, proxy_base)
return line[:match.start(1)] + new_url + line[match.end(1):]
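
A small round-trip sketch of the encoding and rewriting idea in this module. It assumes a simplified media playlist only (no `#EXT-X-MAP`, no variants); `rewrite_media_playlist`, the sample playlist, and `https://proxy.example` are hypothetical names for illustration, and `decode_url` uses an equivalent padding expression, not the module's exact lines.

```python
import base64
from urllib.parse import urljoin


def encode_url(url: str) -> str:
    """URL-safe base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def decode_url(encoded: str) -> str:
    """Restore the stripped padding, then decode."""
    # -len % 4 is the number of '=' needed to reach a multiple of 4.
    return base64.urlsafe_b64decode(encoded + "=" * (-len(encoded) % 4)).decode()


def rewrite_media_playlist(content: str, base_url: str, proxy_base: str) -> str:
    """Simplified sketch: send every URI line through /relay, keep tags as-is."""
    out = []
    for line in content.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            absolute = urljoin(base_url, stripped)
            out.append(f"{proxy_base}/relay?url={encode_url(absolute)}")
        else:
            out.append(line)
    return "\n".join(out)


# Sample media playlist (made up for this example).
MEDIA = "#EXTM3U\n#EXTINF:4.0,\nseg1.ts\n#EXT-X-ENDLIST"
rewritten = rewrite_media_playlist(
    MEDIA, "https://cdn.example.com/live/index.m3u8", "https://proxy.example"
)
print(rewritten)
```

Decoding the `url` parameter of the rewritten segment line yields the absolute upstream URL, which is what the `/relay` endpoint is expected to fetch.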


@ -0,0 +1,488 @@
"""F1 Streams - FastAPI backend with schedule, stream extraction, health checking, HLS proxy, and token refresh."""
import logging
import os
from contextlib import asynccontextmanager
from datetime import datetime, timedelta, timezone
from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger
from apscheduler.triggers.interval import IntervalTrigger
from fastapi import FastAPI, Query, Request
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel
from starlette.responses import Response, StreamingResponse
from backend.embed_proxy import fetch_embed, relay_asset
from backend.extractors import create_extraction_service
from backend.proxy import proxy_playlist, relay_stream
from backend.schedule import ScheduleService
from backend.token_refresh import TokenRefreshManager
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger(__name__)
schedule_service = ScheduleService()
extraction_service = create_extraction_service()
token_refresh_manager = TokenRefreshManager(extraction_service)
scheduler = AsyncIOScheduler()
# --- Pydantic models for request bodies ---
class ActivateStreamRequest(BaseModel):
"""Request body for POST /streams/activate."""
url: str
site_key: str = ""
class DeactivateStreamRequest(BaseModel):
"""Request body for POST /streams/deactivate."""
url: str
# --- Scheduled callbacks ---
async def _scheduled_refresh() -> None:
"""Callback for APScheduler daily schedule refresh."""
logger.info("Running scheduled schedule refresh...")
await schedule_service.refresh()
async def _scheduled_extraction() -> None:
"""Callback for APScheduler stream extraction.
Adjusts its own interval based on whether a session is currently live:
- During a live session: reschedule to every 5 minutes
- Otherwise: reschedule to every 30 minutes
"""
logger.info("Running scheduled extraction...")
await extraction_service.run_extraction()
# Check if any session is currently live and adjust polling interval
schedule_data = schedule_service.get_schedule()
is_live = False
for race in schedule_data.get("races", []):
for session in race.get("sessions", []):
if session.get("status") == "live":
is_live = True
break
if is_live:
break
# Update the extraction job interval based on live status
job = scheduler.get_job("stream_extraction")
if job:
current_interval = getattr(job.trigger, "interval_length", None)
desired_interval = 300 if is_live else 1800 # 5 min or 30 min
if current_interval != desired_interval:
interval_minutes = 5 if is_live else 30
scheduler.reschedule_job(
"stream_extraction",
trigger=IntervalTrigger(minutes=interval_minutes),
)
logger.info(
"Extraction interval adjusted to %d minutes (live=%s)",
interval_minutes,
is_live,
)
async def _scheduled_token_refresh() -> None:
"""Callback for APScheduler token refresh.
Only performs work when there are active streams. Re-runs extractors
to get fresh CDN tokens for streams being actively watched.
"""
if not token_refresh_manager.has_active_streams:
return
logger.info("Running scheduled token refresh...")
try:
await token_refresh_manager.refresh_active_streams()
except Exception:
logger.exception("Token refresh failed (non-fatal)")
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Startup and shutdown lifecycle handler."""
# Startup: load schedule and start background scheduler
await schedule_service.initialize()
# Schedule daily schedule refresh
scheduler.add_job(
_scheduled_refresh,
trigger=CronTrigger(hour=3, minute=0, timezone="UTC"),
id="daily_schedule_refresh",
name="Refresh F1 schedule daily at 03:00 UTC",
replace_existing=True,
)
# Schedule periodic stream extraction (default: every 30 minutes).
# next_run_time fires the first run 8s after startup. We don't run
# extraction inline here because it calls the playback verifier,
# which hits http://127.0.0.1:8000/embed for embed streams — uvicorn
# isn't listening yet inside the lifespan startup phase.
scheduler.add_job(
_scheduled_extraction,
trigger=IntervalTrigger(minutes=30),
id="stream_extraction",
name="Extract streams from all registered sites",
replace_existing=True,
next_run_time=datetime.now(timezone.utc) + timedelta(seconds=8),
)
# Schedule token refresh every 4 minutes (safe margin for 5-min CDN tokens).
# The callback is a no-op when there are no active streams.
scheduler.add_job(
_scheduled_token_refresh,
trigger=IntervalTrigger(minutes=4),
id="token_refresh",
name="Refresh CDN tokens for active streams",
replace_existing=True,
)
scheduler.start()
logger.info(
"APScheduler started - schedule refresh at 03:00 UTC, extraction every 30m, token refresh every 4m"
)
yield
# Shutdown
scheduler.shutdown(wait=False)
logger.info("APScheduler shut down")
try:
await extraction_service.shutdown()
except Exception:
logger.exception("extraction_service shutdown failed")
app = FastAPI(title="F1 Streams", lifespan=lifespan)
# --- CORS Middleware ---
# Required for browser-based HLS players to access proxy/relay endpoints
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_methods=["GET", "POST", "OPTIONS"],
allow_headers=["Range", "Content-Type"],
expose_headers=["Content-Range", "Content-Length", "Content-Type"],
)
# --- Health & Info ---
@app.get("/health")
async def health():
return {"status": "ok"}
# --- Schedule ---
@app.get("/schedule")
async def get_schedule():
"""Return the F1 race schedule for the current season with session statuses."""
return schedule_service.get_schedule()
@app.post("/schedule/refresh")
async def refresh_schedule():
"""Manually trigger a schedule refresh from the jolpica API."""
await schedule_service.refresh()
return {"status": "refreshed"}
# --- Streams & Extraction ---
@app.get("/streams")
async def get_streams():
"""Return all currently cached streams that passed health checks.
Streams are sorted by fallback priority:
1. Live streams only (is_live=True)
2. Fastest response time first (lowest response_time_ms)
"""
streams = extraction_service.get_streams()
return {
"streams": streams,
"count": len(streams),
}
@app.get("/streams/all")
async def get_all_streams():
"""Return ALL cached streams including unhealthy ones (for debugging).
Unlike GET /streams, this endpoint includes streams that failed health
checks. Useful for diagnosing extraction or health check issues.
"""
streams = extraction_service.get_all_streams_unfiltered()
return {
"streams": streams,
"count": len(streams),
}
@app.post("/streams/activate")
async def activate_stream(body: ActivateStreamRequest):
"""Mark a stream as actively being watched.
When a stream is active, the token refresh manager will periodically
re-run the extractor that found it to get fresh CDN tokens before
they expire.
If site_key is not provided, attempts to look it up from the cached
streams.
Body:
{"url": "https://...", "site_key": "optional-site-key"}
"""
url = body.url
site_key = body.site_key
# If site_key not provided, try to look it up from cached streams
if not site_key:
for streams in extraction_service._cache.values():
for stream in streams:
if stream.url == url:
site_key = stream.site_key
break
if site_key:
break
if not site_key:
return {
"status": "error",
"detail": "Could not determine site_key for this URL. Provide it explicitly.",
}
token_refresh_manager.mark_stream_active(url, site_key)
return {
"status": "activated",
"url": url,
"site_key": site_key,
"active_count": len(token_refresh_manager.get_active_streams()),
}
@app.post("/streams/deactivate")
async def deactivate_stream(body: DeactivateStreamRequest):
"""Mark a stream as no longer being watched.
Stops the token refresh manager from refreshing CDN tokens for this stream.
Body:
{"url": "https://..."}
"""
token_refresh_manager.mark_stream_inactive(body.url)
return {
"status": "deactivated",
"url": body.url,
"active_count": len(token_refresh_manager.get_active_streams()),
}
@app.get("/streams/active")
async def get_active_streams():
"""List currently active streams with their refresh status.
Returns all streams that are being actively watched, including
their current (potentially refreshed) URLs and refresh counts.
"""
active = token_refresh_manager.get_active_streams()
return {
"streams": active,
"count": len(active),
}
@app.get("/extractors")
async def get_extractors():
"""List registered extractors and their current status."""
return extraction_service.get_status()
@app.post("/extract")
async def trigger_extraction():
"""Manually trigger an extraction run across all registered extractors."""
await extraction_service.run_extraction()
status = extraction_service.get_status()
return {
"status": "extraction_complete",
"streams_found": status["total_cached_streams"],
"live_streams": status["total_live_streams"],
"extractors_run": len(status["extractors"]),
}
# --- HLS Proxy ---
def _get_proxy_base(request: Request) -> str:
"""Derive the proxy base URL from the incoming request.
Uses X-Forwarded-Proto and X-Forwarded-Host headers if present
(behind a reverse proxy), otherwise falls back to request URL.
"""
proto = request.headers.get("x-forwarded-proto", request.url.scheme)
host = request.headers.get("x-forwarded-host", request.url.netloc)
return f"{proto}://{host}"
@app.get("/proxy")
async def proxy_endpoint(
request: Request,
url: str = Query(..., description="Base64url-encoded m3u8 playlist URL"),
quality: int | None = Query(
None,
description="0-based quality variant index (0=highest bandwidth). "
"Only applies to master playlists.",
),
):
"""Proxy an upstream m3u8 playlist with URI rewriting.
Fetches the upstream m3u8 playlist, rewrites all URIs to route through
our /proxy (for sub-playlists) and /relay (for segments) endpoints,
and returns the rewritten playlist.
The `url` parameter must be base64url-encoded to avoid URL encoding issues.
If `quality` is specified and the upstream is a master playlist (with
multiple quality variants), the proxy will fetch the selected variant's
media playlist directly instead of returning the master playlist.
Quality index 0 = highest bandwidth, 1 = second highest, etc.
Examples:
GET /proxy?url=aHR0cHM6Ly9leGFtcGxlLmNvbS9zdHJlYW0ubTN1OA
GET /proxy?url=aHR0cHM6Ly9leGFtcGxlLmNvbS9zdHJlYW0ubTN1OA&quality=0
"""
# Check if we have a fresher URL from token refresh
fresh_url = token_refresh_manager.get_fresh_url(url)
if fresh_url != url:
logger.info("Using refreshed URL from token manager")
proxy_base = _get_proxy_base(request)
rewritten = await proxy_playlist(fresh_url, proxy_base, quality=quality)
return Response(
content=rewritten,
media_type="application/vnd.apple.mpegurl",
headers={
"Cache-Control": "no-cache, no-store, must-revalidate",
},
)
@app.get("/relay")
async def relay_endpoint(
request: Request,
url: str = Query(..., description="Base64url-encoded segment URL"),
):
"""Relay an upstream media segment as a chunked byte stream.
Fetches the upstream segment (TS, fMP4, init segment, etc.) and streams
it to the client using chunked transfer encoding. Never buffers the
full segment in memory.
The `url` parameter must be base64url-encoded to avoid URL encoding issues.
Supports HTTP Range requests for seeking.
Example:
GET /relay?url=aHR0cHM6Ly9leGFtcGxlLmNvbS9zZWdtZW50LnRz
"""
range_header = request.headers.get("range")
stream_gen, headers, status_code = await relay_stream(url, range_header)
return StreamingResponse(
stream_gen,
status_code=status_code,
headers=headers,
)
# --- Embed iframe-stripping proxy ---
@app.get("/embed")
async def embed_proxy(url: str = Query(..., description="Base64url-encoded embed URL")):
"""Proxy a third-party embed page so it can be iframed in our origin.
Strips X-Frame-Options and CSP frame-ancestors from the upstream
response, injects a base href + frame-buster-defeat script, and
forwards a plausible Referer/Origin to bypass upstream allowlists.
"""
body, headers, status_code = await fetch_embed(url)
return Response(content=body, headers=headers, status_code=status_code)
@app.get("/embed-asset")
async def embed_asset(
request: Request,
url: str = Query(..., description="Base64url-encoded subresource URL"),
):
"""Relay an upstream subresource (JS/CSS/image/etc.) for the embed proxy.
Used as a fallback when an upstream blocks hotlinked assets via Origin
or Referer checks. Most assets load directly via the injected <base>
tag without going through this endpoint.
"""
range_header = request.headers.get("range")
stream_gen, headers, status_code = await relay_asset(url, range_header)
return StreamingResponse(stream_gen, headers=headers, status_code=status_code)
# --- Frontend Static Files ---
# Mount the SvelteKit static build AFTER all API routes so API endpoints take priority.
# SvelteKit adapter-static with ssr=false produces {page}.html files and a fallback index.html.
# Starlette StaticFiles(html=True) only checks {path}/index.html, not {path}.html.
# We use a catch-all route to handle both patterns and the SPA fallback.
_frontend_dir = os.path.realpath(os.path.join(os.path.dirname(__file__), "..", "frontend", "build"))
if os.path.exists(_frontend_dir):
from starlette.responses import FileResponse
_fallback_path = os.path.join(_frontend_dir, "index.html")
@app.get("/{path:path}")
async def serve_frontend(path: str):
"""Serve SvelteKit frontend files with SPA fallback."""
for candidate in [
os.path.join(_frontend_dir, path),
os.path.join(_frontend_dir, f"{path}.html"),
os.path.join(_frontend_dir, path, "index.html"),
]:
real = os.path.realpath(candidate)
if real.startswith(_frontend_dir + os.sep) and os.path.isfile(real):
return FileResponse(real)
# SPA fallback for client-side routing
if os.path.isfile(_fallback_path):
return FileResponse(_fallback_path)
return Response(content="Not Found", status_code=404)
logger.info("Serving frontend from %s", _frontend_dir)
else:
# Fallback root when no frontend build exists
@app.get("/")
async def root():
return {"service": "f1-streams", "version": "5.0.0"}
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
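
The interval selection inside `_scheduled_extraction` can be expressed as two pure functions, which may be easier to test than the nested loops with `break`. This is a sketch only: `any_session_live` and `desired_interval_minutes` are hypothetical names, and the schedule shape (`races` → `sessions` → `status`) is assumed from the callback above.

```python
def any_session_live(schedule_data: dict) -> bool:
    """True if any session of any race has status == "live"."""
    return any(
        session.get("status") == "live"
        for race in schedule_data.get("races", [])
        for session in race.get("sessions", [])
    )


def desired_interval_minutes(schedule_data: dict) -> int:
    """5 minutes while a session is live, 30 minutes otherwise."""
    return 5 if any_session_live(schedule_data) else 30


# Sample schedule (made up for this example).
schedule = {
    "races": [
        {"sessions": [{"status": "finished"}, {"status": "live"}]},
        {"sessions": [{"status": "upcoming"}]},
    ]
}
print(desired_interval_minutes(schedule))
```

The result would then be passed to `IntervalTrigger(minutes=...)` when rescheduling, as the callback already does.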


@ -0,0 +1,478 @@
"""Headless-browser playback verification for extracted streams.
The basic health checker (backend/health.py) only validates m3u8 syntax.
For embed/iframe streams it has nothing to check; the previous code blindly
marked every embed `is_live=True`, which meant the stream list was full of
news articles and aggregator landing pages that never actually played.
This module loads each candidate stream URL in headless Chromium (via
Playwright) and looks for *codec-independent* signals that the upstream
serves a playable stream:
- For m3u8: hls.js receives MANIFEST_PARSED + at least one FRAG_LOADED
event. We don't wait for `<video>` to gain dimensions, because Playwright's
chromium build doesn't include the H.264/AAC codecs. The user's real
browser does, so confirming "manifest + segment fetch succeed" is the
right server-side signal.
- For embed: a `<video>` element appears at top level OR inside the iframe
(the embed proxy strips X-Frame-Options + frame-buster JS so we can
introspect the iframe content), OR the player has set up a MediaSource.
Designed to be called from the extraction service's run_extraction()
hook, with bounded concurrency. Each verification typically takes
4-12 seconds.
"""
import asyncio
import base64
import logging
import os
import time
from dataclasses import dataclass
logger = logging.getLogger(__name__)
# Toggle off in development by setting PLAYBACK_VERIFY_ENABLED=false.
VERIFY_ENABLED = os.getenv("PLAYBACK_VERIFY_ENABLED", "true").lower() in ("true", "1", "yes")
# Maximum number of concurrent browser pages.
MAX_CONCURRENCY = int(os.getenv("PLAYBACK_VERIFY_CONCURRENCY", "2"))
# Per-stream verification budget (seconds). Beyond this we declare unplayable.
PER_STREAM_TIMEOUT = float(os.getenv("PLAYBACK_VERIFY_TIMEOUT", "20"))
# Where the embed proxy lives, used to wrap embed URLs so they bypass
# X-Frame-Options/CSP/JS frame-busters during verification. Defaults to
# loopback because verification runs inside the same FastAPI process.
PROXY_BASE = os.getenv("PLAYBACK_VERIFY_PROXY_BASE", "http://127.0.0.1:8000")
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
@dataclass
class PlaybackVerdict:
is_playable: bool
signal: str = "" # which check triggered the positive verdict
elapsed_ms: int = 0
error: str = ""
def _b64url(s: str) -> str:
"""URL-safe base64 with padding stripped — matches m3u8_rewriter.encode_url."""
return base64.urlsafe_b64encode(s.encode()).decode().rstrip("=")
def _hls_test_html(m3u8_url: str) -> str:
"""A self-contained HTML page that loads an m3u8 via hls.js into a <video>.
The page exposes window._verifier with manifest_parsed / frag_loaded
booleans the verifier polls. It also marks media-error or fatal-error
so we can distinguish 'upstream is unreachable' from 'codec missing'.
"""
return f"""<!doctype html>
<html><head><meta charset="utf-8"><title>verify</title>
<script src="https://cdn.jsdelivr.net/npm/hls.js@1.5/dist/hls.min.js"></script>
</head><body>
<video id="v" muted playsinline width="640" height="360"></video>
<script>
window._verifier = {{
manifest_parsed: false,
frag_loaded: false,
media_loaded: false, // true when MSE has appended any buffer
fatal_network_error: false, // upstream truly unreachable
manifest_incompatible: false, // codec missing; separate from network reachability
hls_error_details: ""
}};
const v = document.getElementById('v');
const url = {m3u8_url!r};
function start() {{
if (window.Hls && Hls.isSupported()) {{
const hls = new Hls({{enableWorker: true}});
hls.on(Hls.Events.MANIFEST_PARSED, () => {{ window._verifier.manifest_parsed = true; }});
hls.on(Hls.Events.FRAG_LOADED, () => {{ window._verifier.frag_loaded = true; }});
hls.on(Hls.Events.BUFFER_APPENDED, () => {{ window._verifier.media_loaded = true; }});
hls.on(Hls.Events.ERROR, (_, d) => {{
window._verifier.hls_error_details = d.details || "";
if (d.fatal && d.type === Hls.ErrorTypes.NETWORK_ERROR) {{
window._verifier.fatal_network_error = true;
}}
if (d.details === Hls.ErrorDetails.MANIFEST_INCOMPATIBLE_CODECS_ERROR) {{
window._verifier.manifest_incompatible = true;
}}
}});
hls.loadSource(url);
hls.attachMedia(v);
}} else if (v.canPlayType('application/vnd.apple.mpegurl')) {{
v.src = url;
v.addEventListener('loadedmetadata', () => {{ window._verifier.manifest_parsed = true; window._verifier.frag_loaded = true; }});
v.addEventListener('error', () => {{ window._verifier.fatal_network_error = true; }});
}} else {{
window._verifier.hls_error_details = "no hls support";
}}
}}
window.addEventListener('load', start);
</script></body></html>"""
def _embed_test_html(_proxied_embed_url: str) -> str:
"""No longer used — verifier navigates the page directly to the proxy URL.
The earlier iframe-wrapper approach hit same-origin policy when inspecting
the iframe's contentDocument (the wrapper page was a data: URL, the iframe
was http://127.0.0.1:8000), so we couldn't read the embed's DOM.
"""
return ""
_M3U8_POLL_JS = """
() => {
const v = window._verifier || {};
const vid = document.querySelector('video');
return {
manifest_parsed: !!v.manifest_parsed,
frag_loaded: !!v.frag_loaded,
media_loaded: !!v.media_loaded,
fatal_network_error: !!v.fatal_network_error,
manifest_incompatible: !!v.manifest_incompatible,
hls_error_details: v.hls_error_details || "",
video_width: vid ? vid.videoWidth : 0,
video_ready: vid ? vid.readyState : 0,
};
}
"""
_EMBED_POLL_JS = """
() => {
try {
const vids = document.querySelectorAll('video');
if (vids.length > 0) {
const v = vids[0];
return {
has_video: true,
src: v.currentSrc || v.src || "",
width: v.videoWidth,
ready: v.readyState,
duration: isFinite(v.duration) ? v.duration : 0,
media_keys: !!v.mediaKeys,
sources: v.querySelectorAll('source').length,
};
}
return {has_video: false};
} catch (e) {
return {has_video: false, err: String(e)};
}
}
"""
async def _verify_m3u8(page, m3u8_url: str, deadline: float) -> PlaybackVerdict:
"""Confirm an m3u8 URL is fetchable via hls.js end-to-end.
Positive signal hierarchy:
1. media_loaded (MSE buffer appended): strongest, codec-supported.
2. frag_loaded (hls.js fetched at least one segment): upstream is OK
even if the local browser lacks codecs.
3. manifest_parsed without media_loaded but with manifest_incompatible:
indicates upstream playlist is valid; player can't decode here
but a real user's browser will.
4. manifest_parsed alone: the upstream playlist was fetched and parsed.
Negative signal:
- fatal_network_error: upstream is unreachable.
- timeout with no manifest_parsed: upstream did not respond.
"""
start = time.monotonic()
html = _hls_test_html(m3u8_url)
data_url = "data:text/html;base64," + base64.b64encode(html.encode()).decode()
try:
await page.goto(data_url, wait_until="domcontentloaded", timeout=10_000)
except Exception as e:
return PlaybackVerdict(
is_playable=False, error=f"goto failed: {e}",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
last_state: dict = {}
while time.monotonic() < deadline:
try:
state = await page.evaluate(_M3U8_POLL_JS)
except Exception as e:
return PlaybackVerdict(
is_playable=False, error=f"evaluate failed: {e}",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
last_state = state
if state.get("media_loaded"):
return PlaybackVerdict(
is_playable=True, signal="media_loaded",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
if state.get("frag_loaded"):
return PlaybackVerdict(
is_playable=True, signal="frag_loaded",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
# MANIFEST_INCOMPATIBLE_CODECS_ERROR fires after hls.js successfully
# fetched and parsed the manifest — the failure is purely local
# (chromium lacks H.264). The user's real browser has codecs, so
# this URL is playable from the user's perspective.
if state.get("manifest_incompatible"):
return PlaybackVerdict(
is_playable=True, signal="manifest_parsed_codec_missing_in_verifier",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
if state.get("manifest_parsed"):
return PlaybackVerdict(
is_playable=True, signal="manifest_parsed",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
if state.get("fatal_network_error"):
return PlaybackVerdict(
is_playable=False, error="upstream network error",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
await asyncio.sleep(0.25)
err = "no playback signal"
if last_state.get("hls_error_details"):
err = f"hls.js error: {last_state['hls_error_details']}"
return PlaybackVerdict(
is_playable=False, error=err,
elapsed_ms=int((time.monotonic() - start) * 1000),
)
async def _verify_embed(page, proxied_url: str, deadline: float) -> PlaybackVerdict:
"""Navigate directly to the proxied embed and confirm a player rendered.
Positive signals (in priority order):
- <video> with src/sources/mediaKeys set (player wired up).
- <video> element exists with any state (script ran, player attaching).
A bare player container div (jwplayer, video-js, [id*=player], etc.) is
not accepted as a signal; it is too weak on its own.
Loading the embed page directly (not via iframe wrapper) avoids the
same-origin policy that prevented earlier iframe-introspection runs
from seeing the embed DOM.
"""
start = time.monotonic()
try:
await page.goto(proxied_url, wait_until="domcontentloaded", timeout=15_000)
except Exception as e:
return PlaybackVerdict(
is_playable=False, error=f"goto failed: {e}",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
# Track the best state seen across all polls. Some embeds load a player
# briefly then anti-bot JS tears the DOM down (hmembeds redirects to
# google.com if its devtool-detection trips). We accept any positive
# signal observed during the window, even if it's gone by timeout.
#
# We require an actual <video> element — a "player container div"
# is too weak (sportsurge has player-class divs but no real player).
seen_video_wired = False
seen_video_tag = False
last_err = ""
while time.monotonic() < deadline:
try:
r = await page.evaluate(_EMBED_POLL_JS)
except Exception as e:
return PlaybackVerdict(
is_playable=False, error=f"evaluate failed: {e}",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
if r.get("has_video"):
seen_video_tag = True
if r.get("src") or r.get("width", 0) > 0 or r.get("media_keys") or r.get("sources", 0) > 0:
seen_video_wired = True
return PlaybackVerdict(
is_playable=True, signal="video.wired",
elapsed_ms=int((time.monotonic() - start) * 1000),
)
last_err = r.get("err", "")
await asyncio.sleep(0.5)
if seen_video_wired:
return PlaybackVerdict(is_playable=True, signal="video.wired",
elapsed_ms=int((time.monotonic() - start) * 1000))
if seen_video_tag:
return PlaybackVerdict(is_playable=True, signal="video.tag_only",
elapsed_ms=int((time.monotonic() - start) * 1000))
err = "no <video> element rendered"
if last_err:
err += f"; last_err: {last_err}"
return PlaybackVerdict(is_playable=False, error=err,
elapsed_ms=int((time.monotonic() - start) * 1000))
class PlaybackVerifier:
"""Verifies playability of m3u8 and embed URLs via headless Chromium.
Manages a single browser instance for the process lifetime (cheap per-page
contexts) and bounds concurrency with a semaphore.
"""
def __init__(self) -> None:
self._browser = None
self._playwright = None
self._sem = asyncio.Semaphore(MAX_CONCURRENCY)
self._lock = asyncio.Lock()
async def _ensure_browser(self):
if self._browser is not None:
return self._browser
async with self._lock:
if self._browser is not None:
return self._browser
try:
from playwright.async_api import async_playwright
except ImportError:
logger.error("playwright not installed — playback verification disabled")
return None
self._playwright = await async_playwright().start()
# CHROME_CDP_URL points to chrome-service's CDP endpoint
# (http://chrome-service.chrome-service.svc:9222 by default).
# Migrated 2026-06-04 from `chromium.connect(ws_url)` because
# chrome-service now runs chromium directly with persistent
# user-data-dir for cookie warming — launch-server couldn't
# persist. The CDP `Browser` exposes the persistent default
# context via `browser.contexts[0]`; here we just call
# `new_context()` for incognito-style isolation per verify
# round, matching the previous behaviour.
cdp_url = os.getenv("CHROME_CDP_URL")
if cdp_url:
try:
self._browser = await self._playwright.chromium.connect_over_cdp(
cdp_url, timeout=15_000,
)
logger.info("connected to remote chrome-service via CDP (concurrency=%d)", MAX_CONCURRENCY)
except Exception:
logger.exception(
"CDP connect failed (%s) — falling back to in-process Chromium", cdp_url,
)
self._browser = None
if self._browser is None:
# Either CHROME_CDP_URL was unset, or CDP connect failed.
# Fall back to in-process headless so the verifier still
# returns playable/unplayable verdicts (degraded but functional:
# anti-bot pages may detect and block the plain headless browser).
self._browser = await self._playwright.chromium.launch(
headless=True,
args=[
"--disable-dev-shm-usage",
"--disable-web-security",
"--no-sandbox",
"--disable-setuid-sandbox",
"--disable-features=IsolateOrigins,site-per-process",
"--autoplay-policy=no-user-gesture-required",
],
)
logger.warning(
"using in-process Chromium (CHROME_CDP_URL unset or CDP connect failed) (concurrency=%d)",
MAX_CONCURRENCY,
)
return self._browser
async def shutdown(self) -> None:
if self._browser is not None:
try:
await self._browser.close()
except Exception:
logger.exception("error closing browser")
if self._playwright is not None:
try:
await self._playwright.stop()
except Exception:
logger.exception("error stopping playwright")
self._browser = None
self._playwright = None
async def verify(self, url: str, stream_type: str) -> PlaybackVerdict:
if not VERIFY_ENABLED:
return PlaybackVerdict(is_playable=True, error="disabled")
browser = await self._ensure_browser()
if browser is None:
return PlaybackVerdict(is_playable=False, error="playwright unavailable")
is_m3u8 = stream_type == "m3u8"
if is_m3u8:
# Route m3u8 fetches through our own /proxy so the verifier gets a
# same-origin response with ACAO:* — matches what the frontend does
# (frontend `getProxyUrl` wraps every m3u8 via /proxy anyway). Without
# this, hosts like oe1.ossfeed.store that only return CORS headers
# for specific Origins (e.g. pushembdz.store) trigger an immediate
# `fatal_network_error` in hls.js and the stream is marked dead.
url = f"{PROXY_BASE}/proxy?url={_b64url(url)}"
else:
url = f"{PROXY_BASE}/embed?url={_b64url(url)}"
async with self._sem:
# Set the per-stream deadline AFTER acquiring the semaphore.
# Otherwise queued streams that wait behind earlier ones
# would have already-expired deadlines when they start.
deadline = time.monotonic() + PER_STREAM_TIMEOUT
try:
context = await browser.new_context(
user_agent=USER_AGENT,
viewport={"width": 1280, "height": 720},
bypass_csp=True,
)
from backend.stealth import STEALTH_JS
await context.add_init_script(STEALTH_JS)
page = await context.new_page()
except Exception as e:
return PlaybackVerdict(
is_playable=False, error=f"context create failed: {e}",
)
try:
if is_m3u8:
verdict = await _verify_m3u8(page, url, deadline)
else:
verdict = await _verify_embed(page, url, deadline)
except asyncio.TimeoutError:
verdict = PlaybackVerdict(is_playable=False, error="overall timeout")
except Exception as e:
verdict = PlaybackVerdict(
is_playable=False, error=f"verify exception: {e}",
)
finally:
try:
await page.close()
await context.close()
except Exception:
pass
logger.info(
"[verify] %s -> playable=%s signal=%s err=%s elapsed=%dms",
url[:120], verdict.is_playable, verdict.signal,
verdict.error, verdict.elapsed_ms,
)
return verdict
async def verify_many(self, items: list[tuple[str, str]]) -> dict[str, PlaybackVerdict]:
if not items:
return {}
if not VERIFY_ENABLED:
return {url: PlaybackVerdict(is_playable=True, error="disabled") for url, _ in items}
async def _run(url: str, stream_type: str):
verdict = await self.verify(url, stream_type)
return url, verdict
results = await asyncio.gather(
*[_run(url, st) for url, st in items], return_exceptions=True
)
out: dict[str, PlaybackVerdict] = {}
for r in results:
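if isinstance(r, BaseException):
# logger.exception() only has a traceback inside an except block;
# pass the exception explicitly so its traceback is logged.
logger.error("verify task crashed: %s", r, exc_info=r)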
continue
url, verdict = r
out[url] = verdict
return out
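
The deadline handling in `verify` can be illustrated in isolation. The following is a minimal sketch, not the verifier itself: `bounded_task`, `run_all`, and `PER_TASK_BUDGET` are hypothetical names, and `asyncio.sleep` stands in for the browser work. It shows why the clock is started only after the semaphore is acquired.

```python
import asyncio
import time

PER_TASK_BUDGET = 0.25  # hypothetical per-task budget, in seconds


async def bounded_task(sem: asyncio.Semaphore, work_seconds: float) -> bool:
    """Return True if the work finished inside the task's own budget."""
    async with sem:
        # Start the clock only once a slot is held, so time spent queued
        # behind other tasks does not eat into this task's budget.
        deadline = time.monotonic() + PER_TASK_BUDGET
        await asyncio.sleep(work_seconds)  # stand-in for the real work
        return time.monotonic() <= deadline


async def run_all(n: int = 4, work_seconds: float = 0.1) -> list[bool]:
    sem = asyncio.Semaphore(1)  # one slot, so the tasks run one after another
    return await asyncio.gather(
        *(bounded_task(sem, work_seconds) for _ in range(n))
    )


results = asyncio.run(run_all())
```

With one slot and four tasks of 0.1 s each, the last task waits roughly 0.3 s in the queue. Had the deadline been computed before `async with sem`, that task would already be past its 0.25 s budget when it started.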


@ -0,0 +1,501 @@
"""HLS proxy - fetches upstream m3u8 playlists and relays media segments.
Three core functions:
1. Playlist proxy: fetches an upstream m3u8 playlist, rewrites all URIs
to route through our /proxy and /relay endpoints, returns the rewritten
playlist to the client.
2. Quality selection: when the upstream m3u8 is a master playlist containing
multiple quality variants, allows selecting a specific variant by index.
3. Segment relay: fetches an upstream media segment (TS, fMP4, init) and
streams it to the client using chunked transfer encoding, never buffering
the full segment in memory.
All responses include CORS headers for browser playback.
"""
import logging
import re
from dataclasses import dataclass
from typing import AsyncGenerator
from urllib.parse import urljoin
import httpx
from fastapi import HTTPException
from backend.m3u8_rewriter import decode_url, rewrite_playlist
logger = logging.getLogger(__name__)
# Chunk size for relay streaming (64 KB)
RELAY_CHUNK_SIZE = 65536
# Timeout for upstream playlist fetches (seconds)
PLAYLIST_TIMEOUT = 15.0
# Timeout for upstream segment relay - longer because segments are bigger
RELAY_TIMEOUT = 30.0
# User-Agent for upstream requests
USER_AGENT = (
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/120.0.0.0 Safari/537.36"
)
@dataclass
class QualityVariant:
"""A single quality variant parsed from a master HLS playlist."""
index: int # 0-based index in the playlist
bandwidth: int # BANDWIDTH value in bits/sec
resolution: str # e.g., "1920x1080" or "" if not specified
codecs: str # e.g., "avc1.640028,mp4a.40.2" or "" if not specified
name: str # e.g., "720p" or "" if not specified
uri: str # The variant playlist URI (absolute)
def to_dict(self) -> dict:
"""Serialize to a plain dictionary for JSON responses."""
return {
"index": self.index,
"bandwidth": self.bandwidth,
"resolution": self.resolution,
"codecs": self.codecs,
"name": self.name,
"uri": self.uri,
}
def _is_master_playlist(content: str) -> bool:
"""Check if an m3u8 playlist is a master playlist (contains variant streams).
A master playlist contains #EXT-X-STREAM-INF tags pointing to variant
playlists. A media playlist contains #EXTINF tags pointing to segments.
Args:
content: The raw m3u8 playlist text.
Returns:
True if this is a master playlist.
"""
return "#EXT-X-STREAM-INF:" in content
def parse_quality_variants(content: str, base_url: str) -> list[QualityVariant]:
"""Parse quality variants from a master HLS playlist.
Extracts all #EXT-X-STREAM-INF entries and their associated URIs.
Args:
content: The raw m3u8 master playlist text.
base_url: The URL of the playlist (for resolving relative URIs).
Returns:
List of QualityVariant objects sorted by bandwidth (highest first).
"""
variants: list[QualityVariant] = []
lines = content.splitlines()
index = 0
for i, line in enumerate(lines):
stripped = line.strip()
if not stripped.startswith("#EXT-X-STREAM-INF:"):
continue
# Parse attributes from the STREAM-INF tag
attrs = stripped[len("#EXT-X-STREAM-INF:"):]
bandwidth = _parse_attr_int(attrs, "BANDWIDTH")
resolution = _parse_attr_str(attrs, "RESOLUTION")
codecs = _parse_attr_quoted(attrs, "CODECS")
name = _parse_attr_quoted(attrs, "NAME")
# The next non-empty, non-comment line is the variant URI
uri = ""
for j in range(i + 1, len(lines)):
next_line = lines[j].strip()
if next_line and not next_line.startswith("#"):
uri = next_line
break
if not uri:
continue
# Resolve relative URI
if not uri.startswith("http://") and not uri.startswith("https://"):
uri = urljoin(base_url, uri)
# Generate a human-readable name if not provided
if not name and resolution:
# Extract height from resolution (e.g., "1920x1080" -> "1080p")
parts = resolution.split("x")
if len(parts) == 2:
name = f"{parts[1]}p"
variants.append(QualityVariant(
index=index,
bandwidth=bandwidth,
resolution=resolution,
codecs=codecs,
name=name,
uri=uri,
))
index += 1
# Sort by bandwidth descending (highest quality first)
variants.sort(key=lambda v: v.bandwidth, reverse=True)
# Re-index after sorting
for i, v in enumerate(variants):
v.index = i
return variants
def _select_variant_playlist(
content: str, base_url: str, variant_index: int
) -> str:
"""Extract a single variant from a master playlist by index.
Instead of returning the full master playlist, returns just the selected
variant's media playlist URL. The caller should then fetch and proxy that
URL instead.
Args:
content: The raw m3u8 master playlist text.
base_url: The URL of the playlist (for resolving relative URIs).
variant_index: 0-based index of the desired variant (sorted by bandwidth desc).
Returns:
The absolute URL of the selected variant's media playlist.
Raises:
HTTPException: If the variant index is out of range.
"""
variants = parse_quality_variants(content, base_url)
if not variants:
raise HTTPException(
status_code=400,
detail="Playlist has no quality variants to select from",
)
if variant_index < 0 or variant_index >= len(variants):
raise HTTPException(
status_code=400,
detail=f"Quality index {variant_index} out of range (0-{len(variants) - 1})",
)
selected = variants[variant_index]
logger.info(
"Selected quality variant %d: %s (%d bps, %s)",
variant_index,
selected.name or "unknown",
selected.bandwidth,
selected.resolution or "no resolution",
)
return selected.uri
def _parse_attr_int(attrs: str, name: str) -> int:
"""Parse an integer attribute from an HLS tag attribute string.
Args:
attrs: The attribute string (e.g., 'BANDWIDTH=1280000,RESOLUTION=720x480').
name: The attribute name to extract.
Returns:
The integer value, or 0 if not found.
"""
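# Anchor on start-of-string or a comma so that, for example, BANDWIDTH
# does not match inside AVERAGE-BANDWIDTH.
match = re.search(rf"(?:^|,)\s*{name}=(\d+)", attrs)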
return int(match.group(1)) if match else 0
def _parse_attr_str(attrs: str, name: str) -> str:
"""Parse a bare (unquoted) string attribute from an HLS tag attribute string.
Args:
attrs: The attribute string.
name: The attribute name to extract.
Returns:
The string value, or empty string if not found.
"""
match = re.search(rf"{name}=([^,\s\"]+)", attrs)
return match.group(1) if match else ""
def _parse_attr_quoted(attrs: str, name: str) -> str:
"""Parse a quoted string attribute from an HLS tag attribute string.
Args:
attrs: The attribute string.
name: The attribute name to extract.
Returns:
The string value (without quotes), or empty string if not found.
"""
match = re.search(rf'{name}="([^"]*)"', attrs)
return match.group(1) if match else ""
async def proxy_playlist(
encoded_url: str, proxy_base: str, quality: int | None = None
) -> str:
"""Fetch an upstream m3u8 playlist and rewrite all URIs through our proxy.
If the upstream playlist is a master playlist (containing multiple quality
variants) and a quality index is specified, fetches the selected variant's
media playlist instead and rewrites that.
Args:
encoded_url: Base64url-encoded URL of the upstream m3u8 playlist.
proxy_base: The base URL of our proxy service for rewriting URIs
(e.g., "https://f1.viktorbarzin.me").
quality: Optional 0-based index of the desired quality variant.
Only applies when the upstream is a master playlist.
Variants are sorted by bandwidth descending (0 = highest).
Returns:
The rewritten m3u8 playlist text.
Raises:
HTTPException: If the URL can't be decoded, upstream fails, or
content is not a valid HLS playlist.
"""
# Decode the URL
try:
url = decode_url(encoded_url)
except Exception as e:
logger.error("Failed to decode proxy URL: %s", e)
raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")
logger.info("Proxying playlist: %s", url)
# Fetch the upstream playlist
try:
async with httpx.AsyncClient(
timeout=PLAYLIST_TIMEOUT,
follow_redirects=True,
headers={
"User-Agent": USER_AGENT,
"Accept": "*/*",
},
) as client:
response = await client.get(url)
if response.status_code != 200:
logger.warning(
"Upstream playlist returned HTTP %d for %s",
response.status_code,
url,
)
raise HTTPException(
status_code=502,
detail=f"Upstream returned HTTP {response.status_code}",
)
content = response.text
except httpx.TimeoutException:
logger.error("Timeout fetching upstream playlist: %s", url)
raise HTTPException(status_code=504, detail="Upstream playlist timeout")
except httpx.HTTPError as e:
logger.error("HTTP error fetching upstream playlist: %s - %s", url, e)
raise HTTPException(status_code=502, detail=f"Upstream error: {e}")
except HTTPException:
raise
except Exception as e:
logger.exception("Unexpected error fetching playlist: %s", url)
raise HTTPException(status_code=500, detail=f"Internal error: {e}")
# Validate it looks like an m3u8 playlist
if "#EXTM3U" not in content:
logger.warning("Upstream response is not a valid m3u8 playlist: %s", url)
raise HTTPException(
status_code=502,
detail="Upstream response is not a valid HLS playlist",
)
# If this is a master playlist and a quality variant was requested,
# fetch the selected variant's media playlist instead
if quality is not None and _is_master_playlist(content):
variant_url = _select_variant_playlist(content, url, quality)
logger.info("Fetching selected variant playlist: %s", variant_url)
try:
async with httpx.AsyncClient(
timeout=PLAYLIST_TIMEOUT,
follow_redirects=True,
headers={
"User-Agent": USER_AGENT,
"Accept": "*/*",
},
) as client:
variant_response = await client.get(variant_url)
if variant_response.status_code != 200:
logger.warning(
"Variant playlist returned HTTP %d for %s",
variant_response.status_code,
variant_url,
)
raise HTTPException(
status_code=502,
detail=f"Variant playlist returned HTTP {variant_response.status_code}",
)
content = variant_response.text
url = variant_url # Use variant URL as base for relative URI resolution
if "#EXTM3U" not in content:
logger.warning(
"Variant playlist is not valid m3u8: %s", variant_url
)
raise HTTPException(
status_code=502,
detail="Variant playlist is not a valid HLS playlist",
)
except httpx.TimeoutException:
logger.error("Timeout fetching variant playlist: %s", variant_url)
raise HTTPException(
status_code=504, detail="Variant playlist timeout"
)
except httpx.HTTPError as e:
logger.error(
"HTTP error fetching variant playlist: %s - %s", variant_url, e
)
raise HTTPException(
status_code=502, detail=f"Variant playlist error: {e}"
)
except HTTPException:
raise
except Exception as e:
logger.exception(
"Unexpected error fetching variant playlist: %s", variant_url
)
raise HTTPException(
status_code=500, detail=f"Internal error: {e}"
)
# Rewrite all URIs to go through our proxy
rewritten = rewrite_playlist(content, url, proxy_base)
logger.debug(
"Proxied playlist from %s: %d bytes -> %d bytes",
url,
len(content),
len(rewritten),
)
return rewritten
async def relay_stream(
encoded_url: str,
range_header: str | None = None,
) -> tuple[AsyncGenerator[bytes, None], dict[str, str], int]:
"""Relay an upstream media segment as a chunked byte stream.
Never buffers the full segment in memory. Streams chunks as they
arrive from the upstream server.
Args:
encoded_url: Base64url-encoded URL of the upstream segment.
range_header: Optional HTTP Range header from the client to
forward to upstream.
Returns:
A tuple of (async_generator, headers_dict, status_code) where:
- async_generator yields bytes chunks
- headers_dict contains content-type and other relevant headers
- status_code is the HTTP status (200 or 206)
Raises:
HTTPException: If the URL can't be decoded or upstream fails.
"""
# Decode the URL
try:
url = decode_url(encoded_url)
except Exception as e:
logger.error("Failed to decode relay URL: %s", e)
raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")
logger.debug("Relaying segment: %s", url)
# Build upstream request headers
headers = {
"User-Agent": USER_AGENT,
"Accept": "*/*",
}
if range_header:
headers["Range"] = range_header
# Create the client and stream - caller is responsible for cleanup
# via the async generator protocol
client = httpx.AsyncClient(
timeout=RELAY_TIMEOUT,
follow_redirects=True,
)
try:
response = await client.send(
client.build_request("GET", url, headers=headers),
stream=True,
)
if response.status_code not in (200, 206):
await response.aclose()
await client.aclose()
logger.warning(
"Upstream segment returned HTTP %d for %s",
response.status_code,
url,
)
raise HTTPException(
status_code=502,
detail=f"Upstream returned HTTP {response.status_code}",
)
# Collect relevant response headers to forward
response_headers: dict[str, str] = {}
content_type = response.headers.get("content-type", "video/mp2t")
response_headers["Content-Type"] = content_type
if "content-length" in response.headers:
response_headers["Content-Length"] = response.headers["content-length"]
if "content-range" in response.headers:
response_headers["Content-Range"] = response.headers["content-range"]
status_code = response.status_code
async def _stream_chunks() -> AsyncGenerator[bytes, None]:
"""Yield chunks from the upstream response, then clean up."""
try:
async for chunk in response.aiter_bytes(chunk_size=RELAY_CHUNK_SIZE):
yield chunk
except Exception as e:
logger.error("Error streaming segment from %s: %s", url, e)
finally:
await response.aclose()
await client.aclose()
return _stream_chunks(), response_headers, status_code
except HTTPException:
raise
except httpx.TimeoutException:
await client.aclose()
logger.error("Timeout relaying segment: %s", url)
raise HTTPException(status_code=504, detail="Upstream segment timeout")
except httpx.HTTPError as e:
await client.aclose()
logger.error("HTTP error relaying segment: %s - %s", url, e)
raise HTTPException(status_code=502, detail=f"Upstream error: {e}")
except Exception as e:
await client.aclose()
logger.exception("Unexpected error relaying segment: %s", url)
raise HTTPException(status_code=500, detail=f"Internal error: {e}")
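
The master-playlist handling above can be tried out on a small input. This is a minimal sketch under stated assumptions: the playlist text, the `example.com` URLs, and the helpers `attr_int` and `list_variants` are made up for illustration and are not the module's own functions. It shows variants being read from `#EXT-X-STREAM-INF` tags, relative URIs being resolved, and the list being sorted by bandwidth, highest first.

```python
import re
from urllib.parse import urljoin

# Made-up master playlist: one relative and one absolute variant URI.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:AVERAGE-BANDWIDTH=4500000,BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
https://cdn.example.com/high/index.m3u8
"""


def attr_int(attrs: str, name: str) -> int:
    # Anchored on start-of-string or a comma, so BANDWIDTH is not
    # picked up from inside AVERAGE-BANDWIDTH.
    m = re.search(rf"(?:^|,){name}=(\d+)", attrs)
    return int(m.group(1)) if m else 0


def list_variants(content: str, base_url: str) -> list[dict]:
    lines = [line.strip() for line in content.splitlines()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        attrs = line.split(":", 1)[1]
        # The next non-empty, non-tag line is the variant URI.
        uri = next((l for l in lines[i + 1:] if l and not l.startswith("#")), "")
        if not uri:
            continue
        res = re.search(r"(?:^|,)RESOLUTION=(\d+)x(\d+)", attrs)
        variants.append({
            "bandwidth": attr_int(attrs, "BANDWIDTH"),
            "name": f"{res.group(2)}p" if res else "",
            "uri": urljoin(base_url, uri),  # absolute URIs pass through unchanged
        })
    variants.sort(key=lambda v: v["bandwidth"], reverse=True)
    return variants


variants = list_variants(SAMPLE_MASTER, "https://example.com/live/master.m3u8")
```

After sorting, index 0 is the 1080p variant, which matches the convention that a `quality` of 0 selects the highest bandwidth.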


@ -0,0 +1,6 @@
fastapi==0.115.0
uvicorn[standard]
httpx>=0.27.0
apscheduler>=3.10.0,<4.0
pydantic>=2.0.0
playwright==1.48.0


@ -0,0 +1,240 @@
"""F1 Schedule Service - fetches, caches, and serves the F1 race calendar."""
import json
import logging
import os
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any
import httpx
logger = logging.getLogger(__name__)
JOLPICA_API_URL = "https://api.jolpi.ca/ergast/f1/current.json"
SCHEDULE_PATH = Path(os.getenv("SCHEDULE_PATH", "/data/schedule.json"))
STALE_THRESHOLD = timedelta(hours=24)
# Typical session durations in minutes
SESSION_DURATIONS = {
"fp1": 60,
"fp2": 60,
"fp3": 60,
"qualifying": 60,
"sprint_qualifying": 30,
"sprint": 30,
"race": 120,
}
def _parse_session_datetime(session: dict[str, str] | None) -> str | None:
"""Parse a session dict with 'date' and 'time' fields into an ISO 8601 UTC string."""
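# Treat empty values like missing keys: the race-level call passes ""
# defaults, which would otherwise produce a string such as "T+00:00".
if not session or not session.get("date") or not session.get("time"):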
return None
# Time format from API: "14:30:00Z"
time_str = session["time"].rstrip("Z")
return f"{session['date']}T{time_str}+00:00"
def _parse_race(race: dict[str, Any]) -> dict[str, Any]:
"""Transform a raw jolpica/Ergast race object into our internal format."""
circuit = race.get("Circuit", {})
location = circuit.get("Location", {})
# Build session list
sessions = []
# Map API keys to our session types, in chronological order for a race weekend
session_map = [
("FirstPractice", "fp1", "FP1"),
("SecondPractice", "fp2", "FP2"),
("ThirdPractice", "fp3", "FP3"),
("SprintQualifying", "sprint_qualifying", "Sprint Qualifying"),
("SprintShootout", "sprint_qualifying", "Sprint Qualifying"),
("Sprint", "sprint", "Sprint"),
("Qualifying", "qualifying", "Qualifying"),
]
seen_types = set()
for api_key, session_type, display_name in session_map:
if api_key in race and session_type not in seen_types:
dt_str = _parse_session_datetime(race[api_key])
if dt_str:
sessions.append(
{
"type": session_type,
"name": display_name,
"start_utc": dt_str,
"duration_minutes": SESSION_DURATIONS.get(session_type, 60),
}
)
seen_types.add(session_type)
# Race session itself (date and time are top-level)
race_dt = _parse_session_datetime({"date": race.get("date", ""), "time": race.get("time", "")})
if race_dt:
sessions.append(
{
"type": "race",
"name": "Race",
"start_utc": race_dt,
"duration_minutes": SESSION_DURATIONS["race"],
}
)
# Sort sessions chronologically
sessions.sort(key=lambda s: s["start_utc"])
return {
"round": int(race.get("round", 0)),
"race_name": race.get("raceName", ""),
"circuit": circuit.get("circuitName", ""),
"circuit_id": circuit.get("circuitId", ""),
"country": location.get("country", ""),
"locality": location.get("locality", ""),
"date": race.get("date", ""),
"url": race.get("url", ""),
"sessions": sessions,
}
def _compute_session_status(session: dict[str, Any], now: datetime) -> str:
"""Determine if a session is 'past', 'live', or 'upcoming'."""
try:
start = datetime.fromisoformat(session["start_utc"])
except (ValueError, KeyError):
return "upcoming"
duration = timedelta(minutes=session.get("duration_minutes", 60))
end = start + duration
if now >= end:
return "past"
elif now >= start:
return "live"
else:
return "upcoming"
class ScheduleService:
"""Manages the F1 schedule: fetching, caching, and serving."""
def __init__(self) -> None:
self._schedule: dict[str, Any] | None = None
async def fetch_schedule(self) -> dict[str, Any]:
"""Fetch the current season schedule from the jolpica API."""
logger.info("Fetching F1 schedule from jolpica API...")
async with httpx.AsyncClient(timeout=30.0) as client:
response = await client.get(JOLPICA_API_URL)
response.raise_for_status()
data = response.json()
race_table = data.get("MRData", {}).get("RaceTable", {})
season = race_table.get("season", "")
raw_races = race_table.get("Races", [])
races = [_parse_race(r) for r in raw_races]
schedule = {
"season": season,
"fetched_at": datetime.now(timezone.utc).isoformat(),
"races": races,
}
self._schedule = schedule
logger.info("Fetched schedule for %s season: %d races", season, len(races))
return schedule
def load_from_disk(self) -> bool:
"""Load schedule from NFS-backed JSON file. Returns True if loaded successfully."""
if not SCHEDULE_PATH.exists():
logger.info("No cached schedule found at %s", SCHEDULE_PATH)
return False
try:
with open(SCHEDULE_PATH, "r") as f:
self._schedule = json.load(f)
logger.info("Loaded cached schedule from %s", SCHEDULE_PATH)
return True
except (json.JSONDecodeError, OSError) as e:
logger.warning("Failed to load cached schedule: %s", e)
return False
def save_to_disk(self) -> None:
"""Persist current schedule to NFS-backed JSON file."""
if not self._schedule:
logger.warning("No schedule data to save")
return
try:
SCHEDULE_PATH.parent.mkdir(parents=True, exist_ok=True)
with open(SCHEDULE_PATH, "w") as f:
json.dump(self._schedule, f, indent=2)
logger.info("Saved schedule to %s", SCHEDULE_PATH)
except OSError as e:
logger.error("Failed to save schedule to disk: %s", e)
def is_stale(self) -> bool:
"""Check if the cached schedule data is older than the stale threshold."""
if not self._schedule:
return True
fetched_at_str = self._schedule.get("fetched_at")
if not fetched_at_str:
return True
try:
fetched_at = datetime.fromisoformat(fetched_at_str)
return datetime.now(timezone.utc) - fetched_at > STALE_THRESHOLD
except ValueError:
return True
def get_schedule(self) -> dict[str, Any]:
"""Return the current schedule with computed session statuses."""
if not self._schedule:
return {"season": "", "races": [], "error": "No schedule data available"}
now = datetime.now(timezone.utc)
races = []
for race in self._schedule.get("races", []):
sessions = []
for session in race.get("sessions", []):
sessions.append(
{
**session,
"status": _compute_session_status(session, now),
}
)
races.append(
{
**race,
"sessions": sessions,
}
)
return {
"season": self._schedule.get("season", ""),
"fetched_at": self._schedule.get("fetched_at", ""),
"races": races,
}
async def refresh(self) -> None:
"""Fetch fresh schedule and persist to disk. Falls back to cached data on error."""
try:
await self.fetch_schedule()
self.save_to_disk()
except httpx.HTTPError as e:
logger.error("Failed to refresh schedule from API: %s", e)
if not self._schedule:
logger.warning("No cached data available either - schedule will be empty")
except Exception:
logger.exception("Unexpected error during schedule refresh")
async def initialize(self) -> None:
"""Load from disk on startup and refresh if stale."""
self.load_from_disk()
if self.is_stale():
await self.refresh()
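
The session timing logic above can be checked by hand with a fixed clock. This is a minimal sketch, assuming a made-up race date; `to_iso_utc` and `session_status` are hypothetical stand-ins that mirror the date/time joining and the past/live/upcoming rule, not the module's own helpers.

```python
from datetime import datetime, timedelta, timezone


def to_iso_utc(date: str, time_z: str) -> str:
    # Join an API-style date and "HH:MM:SSZ" time into an ISO 8601 UTC string.
    return f"{date}T{time_z.rstrip('Z')}+00:00"


def session_status(start_utc: str, duration_minutes: int, now: datetime) -> str:
    start = datetime.fromisoformat(start_utc)
    end = start + timedelta(minutes=duration_minutes)
    if now >= end:
        return "past"
    if now >= start:
        return "live"
    return "upcoming"


start_utc = to_iso_utc("2026-03-08", "04:00:00Z")  # made-up date and time
race_start = datetime(2026, 3, 8, 4, 0, tzinfo=timezone.utc)
statuses = [
    session_status(start_utc, 120, race_start - timedelta(minutes=1)),
    session_status(start_utc, 120, race_start + timedelta(minutes=30)),
    session_status(start_utc, 120, race_start + timedelta(minutes=120)),
]
```

The end of the window is exclusive for "live": a session whose start plus duration equals the current time is already reported as "past".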


@ -0,0 +1,43 @@
"""Vendored Playwright stealth init script.
Mirror of `stacks/chrome-service/files/stealth.js`. Kept in sync by hand:
update both files together if the JS is changed.
"""
STEALTH_JS = r"""
(() => {
Object.defineProperty(Navigator.prototype, 'webdriver', { get: () => undefined });
if (!window.chrome) window.chrome = {};
window.chrome.runtime = window.chrome.runtime || {};
Object.defineProperty(navigator, 'plugins', {
get: () => [{ name: 'Chrome PDF Plugin' }, { name: 'Chrome PDF Viewer' }, { name: 'Native Client' }],
});
Object.defineProperty(navigator, 'languages', { get: () => ['en-US', 'en'] });
const origQuery = window.navigator.permissions && window.navigator.permissions.query;
if (origQuery) {
window.navigator.permissions.query = (parameters) =>
parameters && parameters.name === 'notifications'
? Promise.resolve({ state: Notification.permission })
: origQuery(parameters);
}
const spoofGl = (proto) => {
if (!proto) return;
const orig = proto.getParameter;
proto.getParameter = function (parameter) {
if (parameter === 37445) return 'Intel Inc.';
if (parameter === 37446) return 'Intel Iris OpenGL Engine';
return orig.apply(this, arguments);
};
};
spoofGl(window.WebGLRenderingContext && window.WebGLRenderingContext.prototype);
spoofGl(window.WebGL2RenderingContext && window.WebGL2RenderingContext.prototype);
// disable-devtool.js auto-init evasion: hide the marker attribute so the
// library's IIFE exits early. Without this, hmembeds-class players redirect
// to google.com when the Performance detector trips under Playwright.
const origQS = Document.prototype.querySelector;
Document.prototype.querySelector = function (sel) {
if (typeof sel === 'string' && sel.indexOf('disable-devtool-auto') !== -1) return null;
return origQS.apply(this, arguments);
};
})();
"""


@ -0,0 +1,362 @@
"""Token refresh manager - keeps CDN tokens fresh for active streams.
CDN tokens embedded in stream URLs expire after 5-30 minutes. During a 2+ hour
F1 session, URLs must be refreshed before they expire. This manager periodically
re-runs the extractor that found each active stream to get a fresh URL with a
new CDN token.
Usage:
1. When a user starts watching, call mark_stream_active(url, site_key)
2. The background scheduler calls refresh_active_streams() every 4 minutes
3. The proxy calls get_fresh_url(url) to resolve the latest URL
4. When the user stops watching, call mark_stream_inactive(url)
"""
import logging
from dataclasses import dataclass
from datetime import datetime, timezone
logger = logging.getLogger(__name__)
@dataclass
class ActiveStream:
"""Tracks a stream that a user is currently watching.
The original_url is the URL the user initially activated. After a token
refresh, current_url may differ (new CDN token, different edge server, etc.)
but the original_url remains the key for lookups.
"""
original_url: str
current_url: str # May differ from original after refresh
site_key: str
last_refreshed: str
refresh_count: int = 0
last_error: str = ""
def to_dict(self) -> dict:
"""Serialize to a plain dictionary for JSON responses."""
return {
"original_url": self.original_url,
"current_url": self.current_url,
"site_key": self.site_key,
"last_refreshed": self.last_refreshed,
"refresh_count": self.refresh_count,
"last_error": self.last_error,
}
class TokenRefreshManager:
"""Manages background token refresh for active streams.
When a user is watching a stream, the manager periodically re-runs
the extractor that found it to get a fresh URL with a new token.
The fresh URL is stored so the /proxy endpoint can use it on the
next playlist fetch.
"""
def __init__(self, extraction_service) -> None:
"""Initialize the token refresh manager.
Args:
extraction_service: The ExtractionService instance used to
re-run extractors and look up streams by site_key.
"""
# Import here to avoid circular imports at module level
from backend.extractors.service import ExtractionService
self._extraction_service: ExtractionService = extraction_service
self._active_streams: dict[str, ActiveStream] = {}
self._refresh_interval = 240 # 4 minutes (safe margin for 5-min tokens)
@property
def refresh_interval(self) -> int:
"""Refresh interval in seconds."""
return self._refresh_interval
@property
def has_active_streams(self) -> bool:
"""Whether there are any active streams being watched."""
return len(self._active_streams) > 0

    def mark_stream_active(self, url: str, site_key: str) -> None:
        """Mark a stream as being actively watched.

        If the stream is already active, this is a no-op (idempotent).

        Args:
            url: The stream URL the user is watching.
            site_key: The extractor site_key that found this stream.
        """
        if url in self._active_streams:
            logger.debug("Stream already active: %s", url)
            return

        now = datetime.now(timezone.utc).isoformat()
        self._active_streams[url] = ActiveStream(
            original_url=url,
            current_url=url,
            site_key=site_key,
            last_refreshed=now,
        )
        logger.info(
            "Stream marked active: %s (site_key=%s, total_active=%d)",
            url,
            site_key,
            len(self._active_streams),
        )

    def mark_stream_inactive(self, url: str) -> None:
        """Mark a stream as no longer watched.

        If the stream is not active, this is a no-op.

        Args:
            url: The original stream URL to deactivate.
        """
        removed = self._active_streams.pop(url, None)
        if removed:
            logger.info(
                "Stream marked inactive: %s (was refreshed %d times, total_active=%d)",
                url,
                removed.refresh_count,
                len(self._active_streams),
            )
        else:
            logger.debug("Stream was not active, nothing to deactivate: %s", url)

    async def refresh_active_streams(self) -> None:
        """Re-run extractors for all active streams to get fresh URLs.

        For each active stream, re-runs the extractor that originally found it
        and tries to match the stream in the new results. If a match is found,
        updates the current_url. If not, the previous URL is kept (it may still
        work until its token expires).

        This method is called by the background scheduler every 4 minutes.
        Token refresh failures are logged but never crash the process.
        """
        if not self._active_streams:
            logger.debug("No active streams to refresh")
            return

        logger.info(
            "Refreshing tokens for %d active stream(s)...",
            len(self._active_streams),
        )

        # Group active streams by site_key to avoid re-running the same
        # extractor multiple times
        streams_by_site: dict[str, list[ActiveStream]] = {}
        for stream in self._active_streams.values():
            streams_by_site.setdefault(stream.site_key, []).append(stream)

        now = datetime.now(timezone.utc).isoformat()

        for site_key, active_list in streams_by_site.items():
            try:
                await self._refresh_site(site_key, active_list, now)
            except Exception:
                logger.exception(
                    "Failed to refresh tokens for site_key=%s", site_key
                )
                # Mark the error on all streams from this site
                for stream in active_list:
                    stream.last_error = f"Refresh failed at {now}"

    async def _refresh_site(
        self, site_key: str, active_list: list[ActiveStream], now: str
    ) -> None:
        """Re-run a single extractor and update active streams from its results.

        Args:
            site_key: The extractor's site_key.
            active_list: List of ActiveStream objects from this extractor.
            now: ISO timestamp for this refresh cycle.
        """
        registry = self._extraction_service._registry
        extractor = registry.get(site_key)

        if extractor is None:
            logger.warning(
                "Extractor '%s' not found in registry, skipping refresh",
                site_key,
            )
            for stream in active_list:
                stream.last_error = f"Extractor '{site_key}' not found"
            return

        logger.info(
            "Re-running extractor '%s' for token refresh (%d active stream(s))",
            site_key,
            len(active_list),
        )

        # Re-run the extractor to get fresh URLs
        try:
            fresh_streams = await extractor.extract()
        except Exception as e:
            logger.error(
                "Extractor '%s' failed during token refresh: %s", site_key, e
            )
            for stream in active_list:
                stream.last_error = f"Extraction failed: {e}"
            return

        if not fresh_streams:
            logger.warning(
                "Extractor '%s' returned no streams during token refresh",
                site_key,
            )
            for stream in active_list:
                stream.last_error = "Extractor returned no streams"
            return

        # Build a lookup of fresh URLs by quality+title for matching
        # Since the URL itself changes (new token), we match by metadata
        fresh_by_key: dict[str, str] = {}
        for fs in fresh_streams:
            # Use quality+title as a matching key (these stay the same across refreshes)
            match_key = f"{fs.quality}|{fs.title}"
            fresh_by_key[match_key] = fs.url

        # Also keep all fresh URLs for fallback matching
        all_fresh_urls = [fs.url for fs in fresh_streams]

        for stream in active_list:
            # Try to find the matching stream in fresh results
            # Strategy 1: Match by quality+title
            match_key = self._build_match_key(stream)
            if match_key and match_key in fresh_by_key:
                new_url = fresh_by_key[match_key]
                if new_url != stream.current_url:
                    logger.info(
                        "Token refreshed for stream (quality+title match): %s -> %s",
                        stream.current_url[:80],
                        new_url[:80],
                    )
                stream.current_url = new_url
                stream.last_refreshed = now
                stream.refresh_count += 1
                stream.last_error = ""
                continue

            # Strategy 2: Match by URL path similarity (ignoring query params / tokens)
            matched_url = self._find_url_by_path(stream.current_url, all_fresh_urls)
            if matched_url:
                if matched_url != stream.current_url:
                    logger.info(
                        "Token refreshed for stream (path match): %s -> %s",
                        stream.current_url[:80],
                        matched_url[:80],
                    )
                stream.current_url = matched_url
                stream.last_refreshed = now
                stream.refresh_count += 1
                stream.last_error = ""
                continue

            # Strategy 3: If only one fresh stream, assume it's the same
            if len(all_fresh_urls) == 1:
                new_url = all_fresh_urls[0]
                if new_url != stream.current_url:
                    logger.info(
                        "Token refreshed for stream (single result fallback): %s -> %s",
                        stream.current_url[:80],
                        new_url[:80],
                    )
                stream.current_url = new_url
                stream.last_refreshed = now
                stream.refresh_count += 1
                stream.last_error = ""
                continue

            # No match found - keep the old URL and log
            logger.warning(
                "Could not match active stream to fresh results: %s",
                stream.original_url[:80],
            )
            stream.last_error = "No matching stream in fresh results"

    def _build_match_key(self, stream: ActiveStream) -> str:
        """Build a match key from cached stream metadata.

        Looks up the stream in the extraction service cache to get
        quality and title metadata for matching.

        Returns:
            A match key string, or empty string if metadata not found.
        """
        # Look up the stream in the extraction cache
        cached_streams = self._extraction_service._cache.get(stream.site_key, [])
        for cs in cached_streams:
            if cs.url == stream.current_url or cs.url == stream.original_url:
                return f"{cs.quality}|{cs.title}"
        return ""

    @staticmethod
    def _find_url_by_path(current_url: str, fresh_urls: list[str]) -> str | None:
        """Find a fresh URL that matches the current URL by path (ignoring query params).

        CDN token refreshes typically change query parameters but keep the
        same path structure. This matcher strips query params and compares
        the path component.

        Args:
            current_url: The current (possibly expired) URL.
            fresh_urls: List of fresh URLs to match against.

        Returns:
            The matching fresh URL, or None if no match.
        """
        from urllib.parse import urlparse

        current_parsed = urlparse(current_url)
        current_path = current_parsed.path

        for fresh_url in fresh_urls:
            fresh_parsed = urlparse(fresh_url)
            # Match on host + path (token is typically in query string)
            if (
                fresh_parsed.netloc == current_parsed.netloc
                and fresh_parsed.path == current_path
            ):
                return fresh_url

        return None

    def get_fresh_url(self, original_url: str) -> str:
        """Get the latest URL for a stream (may have changed due to token refresh).

        If the stream is not active or has not been refreshed, returns the
        given URL unchanged.

        Args:
            original_url: The URL to look up (either the original URL the
                stream was activated with, or its latest current_url).
                Superseded intermediate URLs are not tracked.

        Returns:
            The most recent URL for this stream.
        """
        # Direct lookup by original URL
        stream = self._active_streams.get(original_url)
        if stream:
            return stream.current_url

        # Also check whether the URL is already the latest current_url of an
        # active stream (the caller may be holding a refreshed URL)
        for stream in self._active_streams.values():
            if stream.current_url == original_url:
                return stream.current_url

        # Not an active stream - return as-is
        return original_url

    def get_active_streams(self) -> list[dict]:
        """Return all active streams with their refresh status.

        Returns:
            List of serialized ActiveStream dicts.
        """
        return [stream.to_dict() for stream in self._active_streams.values()]
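
The path-based fallback (strategy 2 in `_refresh_site`) can be tried in isolation. The following is a minimal standalone sketch of the same host + path comparison; the `cdn.example.com` URLs and the `token` query parameter are hypothetical and only illustrate a CDN that rotates its token in the query string.

```python
from __future__ import annotations

from urllib.parse import urlparse


def find_url_by_path(current_url: str, fresh_urls: list[str]) -> str | None:
    """Return the first fresh URL whose host and path equal those of current_url."""
    current = urlparse(current_url)
    for fresh_url in fresh_urls:
        fresh = urlparse(fresh_url)
        # The query string (where the token is assumed to live) is ignored.
        if fresh.netloc == current.netloc and fresh.path == current.path:
            return fresh_url
    return None


# Hypothetical URLs, for illustration only.
expired = "https://cdn.example.com/live/f1/index.m3u8?token=old"
fresh = [
    "https://cdn.example.com/live/other/index.m3u8?token=abc",
    "https://cdn.example.com/live/f1/index.m3u8?token=new",
]

match = find_url_by_path(expired, fresh)
print(match)
```

If a provider embeds the token in the path instead of the query string, this comparison finds nothing and the refresh falls through to the single-result fallback.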


@ -0,0 +1,3 @@
node_modules/
build/
.svelte-kit/

File diff suppressed because it is too large


@ -0,0 +1,23 @@
{
  "name": "f1-stream-frontend",
  "version": "1.0.0",
  "private": true,
  "type": "module",
  "scripts": {
    "dev": "vite dev",
    "build": "vite build",
    "preview": "vite preview"
  },
  "devDependencies": {
    "@sveltejs/adapter-static": "^3.0.0",
    "@sveltejs/kit": "^2.0.0",
    "@sveltejs/vite-plugin-svelte": "^5.0.0",
    "@tailwindcss/vite": "^4.0.0",
    "svelte": "^5.0.0",
    "tailwindcss": "^4.0.0",
    "vite": "^6.0.0"
  },
  "dependencies": {
    "hls.js": "^1.5.0"
  }
}


@ -0,0 +1,35 @@
@import "tailwindcss";

@theme {
  --color-f1-red: #e10600;
  --color-f1-red-dark: #b50500;
  --color-f1-bg: #111111;
  --color-f1-surface: #1a1a1a;
  --color-f1-surface-hover: #242424;
  --color-f1-border: #2a2a2a;
  --color-f1-text: #e0e0e0;
  --color-f1-text-muted: #888888;
}

body {
  background-color: var(--color-f1-bg);
  color: var(--color-f1-text);
  font-family: system-ui, -apple-system, sans-serif;
}

/* Scrollbar styling */
::-webkit-scrollbar {
  width: 6px;
}

::-webkit-scrollbar-track {
  background: var(--color-f1-bg);
}

::-webkit-scrollbar-thumb {
  background: var(--color-f1-border);
  border-radius: 3px;
}

/* HLS video player: hide native controls (the page renders its own) */
video::-webkit-media-controls {
  display: none !important;
}


@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>🏎</text></svg>" />
    <title>F1 Stream</title>
    %sveltekit.head%
  </head>
  <body data-sveltekit-preload-data="hover">
    <div style="display: contents">%sveltekit.body%</div>
  </body>
</html>


@ -0,0 +1,88 @@
/**
 * API client for the F1 Streams backend.
 * All endpoints are on the same origin, so no CORS issues.
 */

const API_BASE = '';

/**
 * Fetch the F1 race schedule with session statuses.
 * @returns {Promise<{season: string, fetched_at: string, races: Array}>}
 */
export async function fetchSchedule() {
  const res = await fetch(`${API_BASE}/schedule`);
  if (!res.ok) throw new Error(`Schedule fetch failed: ${res.status}`);
  return res.json();
}

/**
 * Fetch available live streams.
 * @returns {Promise<{streams: Array, count: number}>}
 */
export async function fetchStreams() {
  const res = await fetch(`${API_BASE}/streams`);
  if (!res.ok) throw new Error(`Streams fetch failed: ${res.status}`);
  return res.json();
}
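
/**
 * Encode a URL to base64url (URL-safe alphabet, no padding) for the proxy endpoint.
 * @param {string} rawUrl - The original m3u8 URL
 * @returns {string} base64url-encoded string
 */
function toBase64Url(rawUrl) {
  // btoa() throws on characters outside Latin-1, so encode to UTF-8 bytes
  // first. For plain ASCII URLs the output is identical to btoa(rawUrl).
  const bytes = new TextEncoder().encode(rawUrl);
  const binary = Array.from(bytes, (b) => String.fromCharCode(b)).join('');
  return btoa(binary).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}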

/**
 * Get the proxied m3u8 URL for HLS playback.
 * @param {string} m3u8Url - The original m3u8 URL
 * @returns {string} The proxy URL
 */
export function getProxyUrl(m3u8Url) {
  const encoded = toBase64Url(m3u8Url);
  return `${API_BASE}/proxy?url=${encoded}`;
}

/**
 * Get the embed-proxy URL for an upstream iframe embed page.
 *
 * The proxy strips X-Frame-Options / CSP frame-ancestors and injects a
 * frame-buster-defeat script so the embed renders inside our iframe even
 * when the upstream tries to block it.
 * @param {string} embedUrl - The original embed page URL
 * @returns {string} URL pointing at our /embed proxy
 */
export function getEmbedProxyUrl(embedUrl) {
  const encoded = toBase64Url(embedUrl);
  return `${API_BASE}/embed?url=${encoded}`;
}

/**
 * Mark a stream as actively being watched (enables token refresh).
 * @param {string} url - The stream URL
 * @param {string} [siteKey] - Optional site key
 */
export async function activateStream(url, siteKey = '') {
  const res = await fetch(`${API_BASE}/streams/activate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, site_key: siteKey })
  });
  if (!res.ok) throw new Error(`Activate failed: ${res.status}`);
  return res.json();
}

/**
 * Mark a stream as no longer being watched.
 * @param {string} url - The stream URL
 */
export async function deactivateStream(url) {
  const res = await fetch(`${API_BASE}/streams/deactivate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url })
  });
  if (!res.ok) throw new Error(`Deactivate failed: ${res.status}`);
  return res.json();
}
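
Because `toBase64Url` strips the trailing `=` padding, whatever consumes the `url` query parameter has to restore it before decoding. The sketch below shows one way to do that in Python; `encode_base64url` and `decode_base64url` are hypothetical helper names, not the backend's actual functions, and the example URL is made up.

```python
import base64


def encode_base64url(raw_url: str) -> str:
    """Mirror the client encoding: URL-safe alphabet with '=' padding removed."""
    encoded = base64.urlsafe_b64encode(raw_url.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def decode_base64url(encoded: str) -> str:
    """Re-add the stripped padding, then decode back to the original URL."""
    padding = "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded + padding).decode("utf-8")


# Hypothetical URL, for illustration only.
original = "https://cdn.example.com/live/index.m3u8?token=a+b/c=="
encoded = encode_base64url(original)
decoded = decode_base64url(encoded)
print(encoded)
print(decoded == original)
```

The padding length follows from base64 encoding in 4-character groups: `-len(encoded) % 4` gives the number of `=` characters needed to reach the next multiple of four.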


@ -0,0 +1,13 @@
import { writable } from 'svelte/store';
/** Schedule data store */
export const schedule = writable(null);
/** Streams data store */
export const streams = writable(null);
/** Loading state */
export const loading = writable(false);
/** Error state */
export const error = writable(null);


@ -0,0 +1,3 @@
export const prerender = true;
export const ssr = false;
export const trailingSlash = 'always';


@ -0,0 +1,28 @@
<script>
  import '../app.css';

  let { children } = $props();
</script>

<div class="min-h-screen flex flex-col">
  <header class="border-b border-f1-border bg-f1-surface">
    <nav class="max-w-6xl mx-auto px-4 py-3 flex items-center gap-6">
      <a href="/" class="flex items-center gap-2 text-lg font-bold text-white hover:text-f1-red transition-colors">
        <span class="text-f1-red font-black text-xl">F1</span>
        <span>Stream</span>
      </a>
      <div class="flex gap-4 text-sm">
        <a href="/" class="text-f1-text-muted hover:text-white transition-colors">Schedule</a>
        <a href="/watch" class="text-f1-text-muted hover:text-white transition-colors">Watch</a>
      </div>
    </nav>
  </header>

  <main class="flex-1">
    {@render children()}
  </main>

  <footer class="border-t border-f1-border py-3 text-center text-xs text-f1-text-muted">
    F1 Stream
  </footer>
</div>


@ -0,0 +1,232 @@
<script>
import { fetchSchedule } from '$lib/api.js';
import { onMount } from 'svelte';
let scheduleData = $state(null);
let loading = $state(true);
let errorMsg = $state(null);
let now = $state(new Date());
// Update "now" every 30 seconds for live countdown
let timer;
onMount(() => {
loadSchedule();
timer = setInterval(() => { now = new Date(); }, 30000);
return () => clearInterval(timer);
});
async function loadSchedule() {
loading = true;
errorMsg = null;
try {
scheduleData = await fetchSchedule();
} catch (e) {
errorMsg = e.message;
} finally {
loading = false;
}
}
/**
 * Find the first live or upcoming session across all races.
 * Relies on races and sessions being listed in chronological order.
 */
let nextSession = $derived.by(() => {
  if (!scheduleData?.races) return null;
  for (const race of scheduleData.races) {
    for (const session of race.sessions) {
      if (session.status === 'live' || session.status === 'upcoming') {
        return { race, session };
      }
    }
  }
  return null;
});
/**
* Format an ISO date string to the user's local timezone.
*/
function formatLocalTime(isoStr) {
const d = new Date(isoStr);
return d.toLocaleString(undefined, {
weekday: 'short',
month: 'short',
day: 'numeric',
hour: '2-digit',
minute: '2-digit'
});
}
/**
* Format a short date (day + month).
*/
function formatShortDate(isoStr) {
const d = new Date(isoStr);
return d.toLocaleDateString(undefined, { month: 'short', day: 'numeric' });
}
/**
* Format a time only.
*/
function formatTime(isoStr) {
const d = new Date(isoStr);
return d.toLocaleTimeString(undefined, { hour: '2-digit', minute: '2-digit' });
}
/**
* Compute countdown string to a future ISO date.
*/
function countdown(isoStr) {
const target = new Date(isoStr);
const diff = target - now;
if (diff <= 0) return 'Now';
const days = Math.floor(diff / (1000 * 60 * 60 * 24));
const hours = Math.floor((diff % (1000 * 60 * 60 * 24)) / (1000 * 60 * 60));
const mins = Math.floor((diff % (1000 * 60 * 60)) / (1000 * 60));
if (days > 0) return `${days}d ${hours}h ${mins}m`;
if (hours > 0) return `${hours}h ${mins}m`;
return `${mins}m`;
}
/**
* Get status badge classes.
*/
function statusClasses(status) {
switch (status) {
case 'live': return 'bg-f1-red text-white';
case 'upcoming': return 'bg-blue-600 text-white';
case 'past': return 'bg-neutral-700 text-neutral-400';
default: return 'bg-neutral-700 text-neutral-400';
}
}
/**
* Determine if a race has any live or upcoming sessions (to highlight it).
*/
function raceIsActive(race) {
return race.sessions.some(s => s.status === 'live' || s.status === 'upcoming');
}
/**
* Determine if a race is entirely in the past.
*/
function raceIsPast(race) {
return race.sessions.every(s => s.status === 'past');
}
</script>
<svelte:head>
<title>F1 Stream - Schedule</title>
</svelte:head>
<div class="max-w-6xl mx-auto px-4 py-6">
{#if loading}
<div class="flex items-center justify-center py-20">
<div class="w-8 h-8 border-2 border-f1-red border-t-transparent rounded-full animate-spin"></div>
<span class="ml-3 text-f1-text-muted">Loading schedule...</span>
</div>
{:else if errorMsg}
<div class="bg-red-900/30 border border-red-700 rounded-lg p-4 text-center">
<p class="text-red-300">Failed to load schedule: {errorMsg}</p>
<button onclick={loadSchedule} class="mt-2 px-4 py-1 bg-f1-red text-white rounded text-sm hover:bg-f1-red-dark transition-colors">
Retry
</button>
</div>
{:else if scheduleData}
<!-- Next Session Countdown -->
{#if nextSession}
<div class="mb-8 bg-f1-surface border border-f1-border rounded-lg p-6">
<div class="flex flex-col sm:flex-row sm:items-center sm:justify-between gap-2">
<div>
<p class="text-f1-text-muted text-sm uppercase tracking-wider">
{nextSession.session.status === 'live' ? 'Live Now' : 'Next Session'}
</p>
<h2 class="text-xl font-bold text-white mt-1">
{nextSession.race.race_name} - {nextSession.session.name}
</h2>
<p class="text-f1-text-muted text-sm mt-1">
{nextSession.race.circuit} &middot; {nextSession.race.country}
</p>
</div>
<div class="text-right">
{#if nextSession.session.status === 'live'}
<a href="/watch" class="inline-flex items-center gap-2 px-5 py-2 bg-f1-red text-white font-semibold rounded-lg hover:bg-f1-red-dark transition-colors">
<span class="w-2 h-2 rounded-full bg-white animate-pulse"></span>
Watch Live
</a>
{:else}
<p class="text-2xl font-mono font-bold text-white">{countdown(nextSession.session.start_utc)}</p>
<p class="text-f1-text-muted text-sm">{formatLocalTime(nextSession.session.start_utc)}</p>
{/if}
</div>
</div>
</div>
{/if}
<!-- Season Header -->
<div class="flex items-center justify-between mb-6">
<h1 class="text-2xl font-bold text-white">{scheduleData.season} Season</h1>
<span class="text-xs text-f1-text-muted">{scheduleData.races.length} races</span>
</div>
<!-- Race List -->
<div class="space-y-4">
{#each scheduleData.races as race (race.round)}
{@const isPast = raceIsPast(race)}
<div class="bg-f1-surface border border-f1-border rounded-lg overflow-hidden {isPast ? 'opacity-50' : ''}">
<!-- Race Header -->
<div class="px-4 py-3 flex items-center justify-between">
<div class="flex items-center gap-3">
<span class="text-f1-text-muted text-sm font-mono w-8">R{race.round}</span>
<div>
<h3 class="font-semibold text-white">{race.race_name}</h3>
<p class="text-xs text-f1-text-muted">{race.circuit} &middot; {race.locality}, {race.country}</p>
</div>
</div>
<span class="text-sm text-f1-text-muted">{formatShortDate(race.date)}</span>
</div>
<!-- Sessions -->
<div class="border-t border-f1-border">
<div class="grid grid-cols-1 sm:grid-cols-2 md:grid-cols-3 lg:grid-cols-4 gap-px bg-f1-border">
{#each race.sessions as session}
{@const isLive = session.status === 'live'}
{@const isClickable = isLive}
<div class="bg-f1-surface px-3 py-2 {isLive ? 'bg-f1-red/10' : ''} {isClickable ? 'hover:bg-f1-surface-hover cursor-pointer' : ''}">
{#if isClickable}
<a href="/watch?session={session.type}&round={race.round}" class="block">
<div class="flex items-center justify-between">
<span class="text-sm font-medium text-white">{session.name}</span>
<span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded {statusClasses(session.status)}">
{session.status}
</span>
</div>
<p class="text-xs text-f1-text-muted mt-0.5">{formatTime(session.start_utc)}</p>
</a>
{:else}
<div class="flex items-center justify-between">
<span class="text-sm font-medium {session.status === 'past' ? 'text-f1-text-muted' : 'text-white'}">{session.name}</span>
<span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded {statusClasses(session.status)}">
{session.status}
</span>
</div>
<p class="text-xs text-f1-text-muted mt-0.5">
{formatTime(session.start_utc)}
{#if session.status === 'upcoming'}
&middot; {countdown(session.start_utc)}
{/if}
</p>
{/if}
</div>
{/each}
</div>
</div>
</div>
{/each}
</div>
{/if}
</div>
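
The day/hour/minute breakdown in `countdown` can be checked outside the browser. The sketch below repeats the same arithmetic in Python with `divmod`; `format_countdown` is an illustrative name and is not part of the frontend or backend code.

```python
from datetime import datetime, timedelta, timezone


def format_countdown(target: datetime, now: datetime) -> str:
    """Format the time remaining until target as 'Xd Yh Zm', 'Yh Zm', or 'Zm'."""
    delta = target - now
    if delta <= timedelta(0):
        return "Now"
    total_seconds = int(delta.total_seconds())
    days, remainder = divmod(total_seconds, 24 * 60 * 60)
    hours, remainder = divmod(remainder, 60 * 60)
    mins = remainder // 60  # leftover seconds are dropped, as in the JS helper
    if days > 0:
        return f"{days}d {hours}h {mins}m"
    if hours > 0:
        return f"{hours}h {mins}m"
    return f"{mins}m"


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
print(format_countdown(now + timedelta(days=2, hours=3, minutes=4, seconds=59), now))
print(format_countdown(now + timedelta(minutes=45), now))
print(format_countdown(now - timedelta(minutes=1), now))
```

Since the page only refreshes `now` every 30 seconds, minute resolution is as fine as the display can meaningfully get.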


@ -0,0 +1,484 @@
<script>
import { fetchStreams, fetchSchedule, getProxyUrl, getEmbedProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
import { onMount, onDestroy } from 'svelte';
import { page } from '$app/state';
// Lazy-load hls.js to code-split it into a separate chunk
let Hls = $state(null);
// Query params
let sessionType = $derived(page.url?.searchParams?.get('session') || '');
let roundNumber = $derived(page.url?.searchParams?.get('round') || '');
// State
let streamsData = $state(null);
let scheduleData = $state(null);
let loading = $state(true);
let errorMsg = $state(null);
// Multi-stream player state: array of active player slots
let players = $state([]);
const MAX_PLAYERS = 4;
// Current session info from schedule
let currentRace = $derived.by(() => {
  if (!scheduleData?.races || !roundNumber) return null;
  // Always pass the radix so the query param is parsed as base 10
  return scheduleData.races.find(r => r.round === parseInt(roundNumber, 10));
});
let currentSession = $derived.by(() => {
if (!currentRace || !sessionType) return null;
return currentRace.sessions.find(s => s.type === sessionType);
});
// Layout class based on player count: a single column for one player,
// two columns (up to a 2x2 grid) for 2-4 players
let layoutClass = $derived(players.length <= 1 ? 'grid-cols-1' : 'grid-cols-2');
onMount(async () => {
const hlsModule = await import('hls.js');
Hls = hlsModule.default;
loadData();
document.addEventListener('fullscreenchange', onFullscreenChange);
});
onDestroy(() => {
// Clean up all players
for (const player of players) {
cleanupPlayer(player);
}
if (typeof document !== 'undefined') {
document.removeEventListener('fullscreenchange', onFullscreenChange);
}
});
async function loadData() {
loading = true;
errorMsg = null;
try {
const [streamsResult, scheduleResult] = await Promise.all([
fetchStreams(),
fetchSchedule()
]);
streamsData = streamsResult;
scheduleData = scheduleResult;
} catch (e) {
errorMsg = e.message;
} finally {
loading = false;
}
}
function cleanupPlayer(player) {
if (player.hls) {
player.hls.destroy();
player.hls = null;
}
if (player.originalUrl) {
deactivateStream(player.originalUrl).catch(() => {});
}
if (player.controlsTimer) {
clearTimeout(player.controlsTimer);
}
}
function removePlayer(index) {
const player = players[index];
cleanupPlayer(player);
players = players.filter((_, i) => i !== index);
}
function isStreamActive(url) {
return players.some(p => p.originalUrl === url);
}
function playStream(stream) {
// If already playing this stream, don't add a duplicate
const streamUrl = stream.stream_type === 'embed' ? stream.embed_url : stream.url;
if (isStreamActive(streamUrl)) return;
// If at max players, replace the last one
if (players.length >= MAX_PLAYERS) {
removePlayer(players.length - 1);
}
if (stream.stream_type === 'embed') {
// Embed/iframe player — route through our /embed proxy so the
// upstream's X-Frame-Options / CSP / JS frame-busters can't
// block the iframe.
const newPlayer = {
id: Date.now(),
proxyUrl: '',
originalUrl: stream.embed_url,
embedUrl: getEmbedProxyUrl(stream.embed_url),
streamType: 'embed',
siteKey: stream.site_key || '',
siteName: stream.site_name || stream.site_key || 'Unknown',
quality: stream.quality || '',
isPlaying: true,
isMuted: false,
volume: 1,
showControls: true,
error: null,
videoEl: null,
containerEl: null,
hls: null,
controlsTimer: null,
};
players = [...players, newPlayer];
return;
}
// m3u8 player — use hls.js
if (!Hls) return;
const proxyUrl = getProxyUrl(stream.url);
const newPlayer = {
id: Date.now(),
proxyUrl,
originalUrl: stream.url,
embedUrl: '',
streamType: 'm3u8',
siteKey: stream.site_key || '',
siteName: stream.site_name || stream.site_key || 'Unknown',
quality: stream.quality || '',
isPlaying: false,
isMuted: false,
volume: 1,
showControls: true,
error: null,
videoEl: null,
containerEl: null,
hls: null,
controlsTimer: null,
};
players = [...players, newPlayer];
// Activate stream for token refresh
activateStream(stream.url, stream.site_key || '').catch(() => {});
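// Wait for DOM to update then initialize player. Look the slot up by id
// instead of assuming it is still last: another player may have been
// added or removed before the frame callbacks run.
requestAnimationFrame(() => {
  requestAnimationFrame(() => {
    const index = players.findIndex(p => p.id === newPlayer.id);
    if (index !== -1) initPlayer(index);
  });
});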
}
function initPlayer(index) {
const player = players[index];
if (!player || !player.videoEl) return;
if (Hls.isSupported()) {
// `lowLatencyMode` previously broke playback on regular (non-LL-HLS)
// providers like RallyTV — they don't ship the LL-HLS extensions
// hls.js needs in that mode. Default off; explicit per-stream flag
// can re-enable later.
const hlsInstance = new Hls({
enableWorker: true,
lowLatencyMode: false,
backBufferLength: 90
});
hlsInstance.loadSource(player.proxyUrl);
hlsInstance.attachMedia(player.videoEl);
hlsInstance.on(Hls.Events.MANIFEST_PARSED, () => {
player.videoEl.play().catch(() => {});
players[index] = { ...player, isPlaying: true, hls: hlsInstance };
});
hlsInstance.on(Hls.Events.ERROR, (event, data) => {
if (data.fatal) {
switch (data.type) {
case Hls.ErrorTypes.NETWORK_ERROR:
players[index] = { ...players[index], error: `Network error: ${data.details}` };
hlsInstance.startLoad();
break;
case Hls.ErrorTypes.MEDIA_ERROR:
players[index] = { ...players[index], error: `Media error: ${data.details}` };
hlsInstance.recoverMediaError();
break;
default:
players[index] = { ...players[index], error: `Fatal error: ${data.details}` };
removePlayer(index);
break;
}
}
});
player.hls = hlsInstance;
} else if (player.videoEl.canPlayType('application/vnd.apple.mpegurl')) {
// Native HLS (Safari)
player.videoEl.src = player.proxyUrl;
player.videoEl.addEventListener('loadedmetadata', () => {
player.videoEl.play().catch(() => {});
players[index] = { ...player, isPlaying: true };
});
}
}
function togglePlay(index) {
const player = players[index];
if (!player?.videoEl) return;
if (player.videoEl.paused) {
player.videoEl.play().catch(() => {});
players[index] = { ...player, isPlaying: true };
} else {
player.videoEl.pause();
players[index] = { ...player, isPlaying: false };
}
}
function toggleMute(index) {
const player = players[index];
if (!player?.videoEl) return;
const newMuted = !player.isMuted;
player.videoEl.muted = newMuted;
players[index] = { ...player, isMuted: newMuted };
}
function setVolume(index, e) {
const player = players[index];
if (!player?.videoEl) return;
const vol = parseFloat(e.target.value);
player.videoEl.volume = vol;
const muted = vol === 0;
player.videoEl.muted = muted;
players[index] = { ...player, volume: vol, isMuted: muted };
}
function toggleFullscreen(index) {
const player = players[index];
if (!player?.containerEl) return;
if (!document.fullscreenElement) {
player.containerEl.requestFullscreen().catch(() => {});
} else {
document.exitFullscreen().catch(() => {});
}
}
let isFullscreen = $state(false);
function onFullscreenChange() {
isFullscreen = !!document.fullscreenElement;
}
function onPlayerMouseMove(index) {
const player = players[index];
if (!player) return;
if (player.controlsTimer) clearTimeout(player.controlsTimer);
players[index] = { ...player, showControls: true };
const timer = setTimeout(() => {
if (players[index]?.isPlaying) {
players[index] = { ...players[index], showControls: false };
}
}, 3000);
players[index] = { ...players[index], controlsTimer: timer };
}
function responseTimeColor(ms) {
if (ms < 500) return 'text-green-400';
if (ms < 1500) return 'text-yellow-400';
return 'text-red-400';
}
</script>
<svelte:head>
<title>F1 Stream - Watch{currentRace ? ` - ${currentRace.race_name}` : ''}</title>
</svelte:head>
<div class="max-w-7xl mx-auto px-4 py-6">
<!-- Session Info Header -->
{#if currentRace && currentSession}
<div class="mb-6">
<p class="text-f1-text-muted text-sm uppercase tracking-wider">
Round {currentRace.round} &middot; {currentSession.name}
</p>
<h1 class="text-2xl font-bold text-white">{currentRace.race_name}</h1>
<p class="text-f1-text-muted text-sm">{currentRace.circuit} &middot; {currentRace.country}</p>
</div>
{:else}
<h1 class="text-2xl font-bold text-white mb-6">Watch</h1>
{/if}
<!-- Multi-Stream Players Grid -->
{#if players.length > 0}
<div class="grid {layoutClass} gap-2 mb-6">
{#each players as player, i (player.id)}
<div
class="bg-black rounded-lg overflow-hidden relative group"
bind:this={player.containerEl}
onmousemove={() => onPlayerMouseMove(i)}
role="region"
aria-label="Video player {i + 1}"
>
<!-- Stream label -->
<div class="absolute top-2 left-2 z-10 bg-black/60 rounded px-2 py-0.5 text-xs text-white">
{player.siteName}{#if player.quality} &middot; {player.quality}{/if}
</div>
<!-- Close button -->
<button
onclick={() => removePlayer(i)}
class="absolute top-2 right-2 z-10 bg-black/60 rounded-full w-6 h-6 flex items-center justify-center text-white hover:text-f1-red hover:bg-black/80 transition-colors"
aria-label="Close stream"
>
<svg class="w-3.5 h-3.5" fill="currentColor" viewBox="0 0 24 24"><path d="M19 6.41L17.59 5 12 10.59 6.41 5 5 6.41 10.59 12 5 17.59 6.41 19 12 13.41 17.59 19 19 17.59 13.41 12z"/></svg>
</button>
<!-- Video or Iframe -->
{#if player.streamType === 'embed'}
<iframe
src={player.embedUrl}
class="w-full aspect-video bg-black"
allow="autoplay; encrypted-media; fullscreen; picture-in-picture"
allowfullscreen
frameborder="0"
title="{player.siteName} stream"
></iframe>
{:else}
<video
bind:this={player.videoEl}
class="w-full aspect-video bg-black"
playsinline
></video>
{/if}
<!-- Controls Overlay -->
<div class="absolute bottom-0 left-0 right-0 bg-gradient-to-t from-black/80 to-transparent px-3 py-2 transition-opacity duration-300 {player.showControls ? 'opacity-100' : 'opacity-0'}">
<div class="flex items-center gap-2">
<button onclick={() => togglePlay(i)} class="text-white hover:text-f1-red transition-colors" aria-label={player.isPlaying ? 'Pause' : 'Play'}>
{#if player.isPlaying}
<svg class="w-5 h-5" fill="currentColor" viewBox="0 0 24 24"><path d="M6 4h4v16H6V4zm8 0h4v16h-4V4z"/></svg>
{:else}
<svg class="w-5 h-5" fill="currentColor" viewBox="0 0 24 24"><path d="M8 5v14l11-7z"/></svg>
{/if}
</button>
<button onclick={() => toggleMute(i)} class="text-white hover:text-f1-red transition-colors" aria-label={player.isMuted ? 'Unmute' : 'Mute'}>
{#if player.isMuted || player.volume === 0}
<svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><path d="M16.5 12c0-1.77-1.02-3.29-2.5-4.03v2.21l2.45 2.45c.03-.2.05-.41.05-.63zm2.5 0c0 .94-.2 1.82-.54 2.64l1.51 1.51C20.63 14.91 21 13.5 21 12c0-4.28-2.99-7.86-7-8.77v2.06c2.89.86 5 3.54 5 6.71zM4.27 3L3 4.27 7.73 9H3v6h4l5 5v-6.73l4.25 4.25c-.67.52-1.42.93-2.25 1.18v2.06c1.38-.31 2.63-.95 3.69-1.81L19.73 21 21 19.73l-9-9L4.27 3zM12 4L9.91 6.09 12 8.18V4z"/></svg>
{:else}
<svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><path d="M3 9v6h4l5 5V4L7 9H3zm13.5 3c0-1.77-1.02-3.29-2.5-4.03v8.05c1.48-.73 2.5-2.25 2.5-4.02z"/></svg>
{/if}
</button>
<input
type="range" min="0" max="1" step="0.05"
value={player.volume}
oninput={(e) => setVolume(i, e)}
class="w-16 h-1 accent-f1-red"
aria-label="Volume"
/>
<div class="flex-1"></div>
<button onclick={() => toggleFullscreen(i)} class="text-white hover:text-f1-red transition-colors" aria-label="Fullscreen">
<svg class="w-4 h-4" fill="currentColor" viewBox="0 0 24 24"><path d="M7 14H5v5h5v-2H7v-3zm-2-4h2V7h3V5H5v5zm12 7h-3v2h5v-5h-2v3zM14 5v2h3v3h2V5h-5z"/></svg>
</button>
</div>
</div>
<!-- Error overlay -->
{#if player.error}
<div class="absolute bottom-12 left-2 right-2 bg-red-900/80 rounded px-2 py-1 text-xs text-red-300">
{player.error}
</div>
{/if}
</div>
{/each}
</div>
{/if}
<!-- Stream List -->
{#if loading}
<div class="flex items-center justify-center py-20">
<div class="w-8 h-8 border-2 border-f1-red border-t-transparent rounded-full animate-spin"></div>
<span class="ml-3 text-f1-text-muted">Loading streams...</span>
</div>
{:else if errorMsg}
<div class="bg-red-900/30 border border-red-700 rounded-lg p-4 text-center">
<p class="text-red-300">Failed to load streams: {errorMsg}</p>
<button onclick={loadData} class="mt-2 px-4 py-1 bg-f1-red text-white rounded text-sm hover:bg-f1-red-dark transition-colors">
Retry
</button>
</div>
{:else if streamsData}
<div class="flex items-center justify-between mb-4">
<h2 class="text-lg font-semibold text-white">
Available Streams
<span class="text-f1-text-muted font-normal text-sm ml-2">({streamsData.count})</span>
</h2>
<div class="flex items-center gap-4">
{#if players.length > 0}
<span class="text-xs text-f1-text-muted">{players.length}/{MAX_PLAYERS} streams active</span>
{/if}
<button onclick={loadData} class="text-xs text-f1-text-muted hover:text-white transition-colors uppercase tracking-wider">
Refresh
</button>
</div>
</div>
{#if streamsData.streams.length === 0}
<div class="bg-f1-surface border border-f1-border rounded-lg p-8 text-center">
<p class="text-f1-text-muted">No streams available right now.</p>
<p class="text-f1-text-muted text-sm mt-2">Streams appear when a session is live. Check the schedule for upcoming sessions.</p>
<a href="/" class="inline-block mt-4 px-4 py-2 bg-f1-surface-hover border border-f1-border rounded text-sm text-white hover:border-f1-red transition-colors">
View Schedule
</a>
</div>
{:else}
<div class="space-y-2">
{#each streamsData.streams as stream, i}
{@const active = isStreamActive(stream.stream_type === 'embed' ? stream.embed_url : stream.url)}
<div class="bg-f1-surface border rounded-lg px-4 py-3 flex items-center gap-4 {active ? 'border-f1-red' : 'border-f1-border hover:border-f1-border'}">
<div class="flex-1 min-w-0">
<div class="flex items-center gap-2">
<span class="text-sm font-medium text-white truncate">{stream.site_name || stream.site_key || 'Unknown'}</span>
{#if stream.is_live}
<span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded bg-f1-red text-white">Live</span>
{/if}
{#if stream.stream_type === 'embed'}
<span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded bg-blue-600 text-white">Embed</span>
{/if}
{#if active}
<span class="text-[10px] font-bold uppercase px-1.5 py-0.5 rounded bg-green-600 text-white">Playing</span>
{/if}
</div>
<div class="flex items-center gap-3 mt-1 text-xs text-f1-text-muted">
{#if stream.title}
<span class="truncate">{stream.title}</span>
{/if}
{#if stream.quality}
<span>{stream.quality}</span>
{/if}
{#if stream.response_time_ms != null}
<span class={responseTimeColor(stream.response_time_ms)}>
{stream.response_time_ms}ms
</span>
{/if}
</div>
</div>
<div class="flex items-center gap-2">
{#if !active}
<button
onclick={() => playStream(stream)}
class="px-4 py-1.5 rounded text-sm font-medium bg-f1-red text-white hover:bg-f1-red-dark transition-colors"
>
{players.length > 0 ? 'Add' : 'Watch'}
</button>
{:else}
<span class="text-xs text-green-400">Active</span>
{/if}
</div>
</div>
{/each}
</div>
{/if}
{/if}
</div>

@ -0,0 +1,19 @@
import adapter from '@sveltejs/adapter-static';
/** @type {import('@sveltejs/kit').Config} */
const config = {
kit: {
adapter: adapter({
pages: 'build',
assets: 'build',
fallback: 'index.html',
precompress: false,
strict: true
}),
paths: {
base: ''
}
}
};
export default config;

@ -0,0 +1,10 @@
import { sveltekit } from '@sveltejs/kit/vite';
import tailwindcss from '@tailwindcss/vite';
import { defineConfig } from 'vite';
export default defineConfig({
plugins: [
tailwindcss(),
sveltekit()
]
});

@ -0,0 +1,7 @@
#!/usr/bin/env bash
set -e
docker buildx build --platform linux/amd64 --provenance=false \
-t viktorbarzin/f1-stream:v2.0.1 -t viktorbarzin/f1-stream:latest \
--push .
kubectl -n f1-stream rollout restart deployment f1-stream

@ -6,15 +6,6 @@ variable "nfs_server" { type = string }
variable "discord_f1_guild_id" { type = string }
variable "discord_f1_channel_ids" { type = string }
# Image tag for the Forgejo-registry image. CI (.woodpecker.yml in
# viktor/f1-stream) builds + pushes `latest` and `<short-sha>`, then drives the
# rollout via `kubectl set image`. Keel stays enrolled as a redundant net, so
# the running tag is managed outside Terraform (see KEEL_IGNORE_IMAGE below).
variable "image_tag" {
type = string
default = "latest"
}
resource "kubernetes_namespace" "f1-stream" {
metadata {
name = "f1-stream"
@ -22,7 +13,7 @@ resource "kubernetes_namespace" "f1-stream" {
"istio-injection" : "disabled"
tier = local.tiers.aux
"chrome-service.viktorbarzin.me/client" = "true"
"keel.sh/enrolled" = "true"
"keel.sh/enrolled" = "true"
}
}
lifecycle {
@ -127,7 +118,7 @@ resource "kubernetes_deployment" "f1-stream" {
}
spec {
container {
image = "forgejo.viktorbarzin.me/viktor/f1-stream:${var.image_tag}"
image = "viktorbarzin/f1-stream:latest"
image_pull_policy = "Always"
name = "f1-stream"
resources {
@ -185,11 +176,6 @@ resource "kubernetes_deployment" "f1-stream" {
claim_name = module.nfs_data_host.claim_name
}
}
# Pull the (private) Forgejo-registry image. Kyverno syncs
# registry-credentials into every namespace.
image_pull_secrets {
name = "registry-credentials"
}
}
}
}