f1-stream: only show streams confirmed playable by headless browser

Cuts the stream list from 23 mostly-broken entries to ~6 confirmed-playable ones, and adds an iframe-stripping proxy so embed sources (hmembeds, etc.) load through our origin without X-Frame-Options / CSP / JS frame-buster blocks. Why: the previous list was dominated by Discord-shared news article URLs, hardcoded aggregator landing pages, and other non-stream URLs that all sat at is_live=true because embed streams skipped the health check entirely. Users could not tell which links would actually play. What: - backend/playback_verifier.py: new headless-Chromium verifier (Playwright) that polls each candidate stream for a codec-independent "playable" signal (hls.js MANIFEST_PARSED for m3u8; <video>/player div for embed). Replaces the unconditional is_live=True for embed streams in service.py. - backend/embed_proxy.py: new /embed and /embed-asset routes that fetch upstream embed pages, strip X-Frame-Options/CSP/Set-Cookie, and inject a <base href> + frame-buster-defeat <script> that locks down window.top, document.referrer, console.clear/table, and window.location so the hmembeds disable-devtool.js redirect-to-google trap can't fire. - extractors/curated.py: new always-on extractor with two known-good 24/7 hmembeds embeds (Sky Sports F1, DAZN F1) so the list isn't empty between race weekends. - extractors/__init__.py: register CuratedExtractor first; drop FallbackExtractor (its 10 aggregator landing-pages can't iframe-play). - extractors/discord_source.py: positive-match path filter (must look like /embed/, /stream, /watch, /live, /player, *.m3u8, *.php) plus expanded domain blocklist for news sites — was 10 noise URLs, now ~1. - extractors/service.py: run_extraction now health-checks AND verifier- checks both stream types; only verified-playable streams reach is_live. - main.py: register /embed + /embed-asset routes; defer initial extraction by 8s so the verifier can reach the local /embed proxy on 127.0.0.1:8000. - frontend/lib/api.js + watch/+page.svelte: route embed iframes through /embed proxy instead of the upstream URL, so X-Frame-Options/CSP can't block them. - Dockerfile: install Playwright chromium + system codec-runtime libs. - main.tf: bump pod memory 256Mi → 1Gi for chromium. Verified end-to-end with Playwright against https://f1.viktorbarzin.me/watch — 6/6 streams reach a player UI; the 3 demo m3u8s actually play (codec-bearing browser); the 3 embeds (Sky Sports F1, DAZN F1, sportsurge) render iframes through the proxy. Image: viktorbarzin/f1-stream:v6.0.5 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 21:00:07 +00:00 · 2026-05-06 21:00:07 +00:00 · f90d79ed4e
commit f90d79ed4e
parent 8b180f7662
15 changed files with 2128 additions and 22 deletions
--- a/stacks/f1-stream/files/Dockerfile
+++ b/stacks/f1-stream/files/Dockerfile
@ -14,9 +14,26 @@ FROM python:3.13-slim-bookworm
 WORKDIR /app
 # Headless Chromium runtime libs for the playback verifier. Listed inline
 # (instead of running `playwright install-deps`) so the image build doesn't
 # need root-network apt fetches at runtime.
 RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    libnss3 libnspr4 \
    libatk1.0-0 libatk-bridge2.0-0 libcups2 \
    libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 \
    libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 \
    libasound2 libatspi2.0-0 \
    fonts-liberation fonts-noto-color-emoji \
 && rm -rf /var/lib/apt/lists/*
 COPY backend/requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 # Install the Chromium browser binary used by the verifier. Skip
 # --with-deps because we already installed the system libs above.
 RUN playwright install chromium
 COPY backend/ ./backend/
 # Copy built frontend into the image
--- a/stacks/f1-stream/files/backend/embed_proxy.py
+++ b/stacks/f1-stream/files/backend/embed_proxy.py
@ -0,0 +1,302 @@
 """Embed iframe-stripping reverse proxy.
 Serves third-party embed pages (e.g. https://hmembeds.one/embed/{hash},
 https://pooembed.eu/embed/{slug}) through our origin so we can:
 1. Strip X-Frame-Options and Content-Security-Policy: frame-ancestors headers,
   so the embed loads in our <iframe> regardless of upstream policy.
 2. Inject <base> + a frame-buster-defeat <script> at the top of <head> so
   the embed's JS sees `window.top === window` and a plausible
   `document.referrer` pointing at the upstream origin.
 3. Forward Referer / User-Agent matching the upstream's own pages so
   the upstream's hotlink / origin-allowlist checks pass.
 Two endpoints:
 - GET /embed?url=<base64url> — the embed HTML page (rewritten).
 - GET /embed-asset?url=<base64url> — fallback for any subresource the
  upstream blocks based on hotlink protection. Most assets load directly
  via the injected <base> tag and bypass our proxy.
 """
 import logging
 import re
 from typing import AsyncGenerator
 from urllib.parse import urlparse
 import httpx
 from fastapi import HTTPException
 from backend.m3u8_rewriter import decode_url
 logger = logging.getLogger(__name__)
 EMBED_TIMEOUT = 20.0
 ASSET_TIMEOUT = 30.0
 RELAY_CHUNK_SIZE = 65536
 USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
 )
 # Response headers we never forward (they break frame embedding or leak upstream policy).
 STRIP_RESPONSE_HEADERS = {
    "x-frame-options",
    "content-security-policy",
    "content-security-policy-report-only",
    "set-cookie",
    "report-to",
    "nel",
    "permissions-policy",
    "cross-origin-opener-policy",
    "cross-origin-embedder-policy",
    "cross-origin-resource-policy",
    # let httpx/uvicorn re-set these
    "transfer-encoding",
    "content-encoding",
    "content-length",
    "connection",
 }
 # Inject this <script> at the top of <head> to defeat JS frame-busters.
 # - Locks window.top, window.parent, and window.self to the embed window
 #   itself, so `self !== window.top` checks pass.
 # - Forces document.referrer to the upstream origin so allowlist checks
 #   like `document.referrer.includes("timstreams.net")` keep working.
 # - No-ops anything that would call window.parent.location or attempt to
 #   reload the top frame.
 _FRAME_BUSTER_DEFEAT_TEMPLATE = """
 <script>(function(){{
  try {{
    var fakeWindow = window;
    Object.defineProperty(window, 'top', {{get: function(){{return fakeWindow;}}, configurable: false}});
    Object.defineProperty(window, 'parent', {{get: function(){{return fakeWindow;}}, configurable: false}});
    Object.defineProperty(window, 'frameElement', {{get: function(){{return null;}}, configurable: false}});
    Object.defineProperty(document, 'referrer', {{get: function(){{return {referrer!r};}}, configurable: false}});
  }} catch (e) {{}}
  // Defeat the `disable-devtool.js` redirect trap that hmembeds and similar
  // embed hosts use. The trap fires `console.clear`/`console.table` in a
  // tight loop, then if it thinks DevTools is open, calls
  // `window.location = "https://www.google.com"`. We block those redirect
  // sinks while leaving normal playback unaffected.
  try {{
    var noop = function(){{}};
    console.clear = noop;
    console.table = noop;
    console.dir = noop;
    var loc = window.location;
    Object.defineProperty(window, 'location', {{
      get: function(){{ return loc; }},
      set: function(v){{ /* swallow assignment */ }},
      configurable: false,
    }});
    var origAssign = loc.assign && loc.assign.bind(loc);
    var origReplace = loc.replace && loc.replace.bind(loc);
    loc.assign = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origAssign) origAssign(u); }};
    loc.replace = function(u){{ if (typeof u === 'string' && u.indexOf('google.com') !== -1) return; if (origReplace) origReplace(u); }};
  }} catch (e) {{}}
 }})();</script>
 """
 def _decode(encoded_url: str) -> str:
    try:
        return decode_url(encoded_url)
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Invalid encoded URL: {e}")
 def _filter_headers(upstream_headers: httpx.Headers) -> dict[str, str]:
    """Forward upstream headers minus the ones we strip."""
    out: dict[str, str] = {}
    for k, v in upstream_headers.items():
        if k.lower() in STRIP_RESPONSE_HEADERS:
            continue
        out[k] = v
    # Always allow our domain to embed and load cross-origin
    out["Access-Control-Allow-Origin"] = "*"
    out["X-Frame-Options-Stripped"] = "by-f1-embed-proxy"
    return out
 def _make_referer(upstream_url: str) -> str:
    """Build a plausible Referer header — the upstream's own root."""
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}/"
 def _make_origin(upstream_url: str) -> str:
    parsed = urlparse(upstream_url)
    return f"{parsed.scheme}://{parsed.netloc}"
 def _inject_into_head(html: str, upstream_url: str) -> str:
    """Inject <base> tag + frame-buster defeat script into the response HTML."""
    parsed = urlparse(upstream_url)
    base_href = f"{parsed.scheme}://{parsed.netloc}/"
    # The frame-buster-defeat script. Use the upstream's own URL as the spoofed referrer.
    busted = _FRAME_BUSTER_DEFEAT_TEMPLATE.format(referrer=upstream_url)
    base_tag = f'<base href="{base_href}">'
    injection = base_tag + busted
    # Drop any inline CSP <meta> tags first so they can't override our header strip.
    html = re.sub(
        r'<meta[^>]+http-equiv=[\'"]?Content-Security-Policy[\'"]?[^>]*>',
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Strip disable-devtool.js script tags. The library runs detection heuristics
    # and redirects on match. Removing it reduces attack surface even with our
    # location-setter lockdown — saves redundant work and one fewer thing to
    # bypass in case the lockdown misses an edge case.
    html = re.sub(
        r'<script[^>]+(?:disable-devtool|devtool|disabledevtool)[^<]*</script>',
        "",
        html,
        flags=re.IGNORECASE,
    )
    html = re.sub(
        r'<script[^>]+src=["\'][^"\']*disable-devtool[^"\']*["\'][^>]*></script>',
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Insert immediately after the opening <head> (case-insensitive).
    head_match = re.search(r"<head[^>]*>", html, flags=re.IGNORECASE)
    if head_match:
        idx = head_match.end()
        return html[:idx] + injection + html[idx:]
    # No <head> — prepend at the start of the document so the script runs first.
    return injection + html
 def _looks_blocked_by_anti_bot(content: str) -> bool:
    """Detect Cloudflare-style challenge interstitials in the upstream body."""
    sample = content[:4096].lower()
    markers = (
        "cf-chl-bypass",
        "checking your browser",
        "just a moment",
        "attention required",
        "cf-browser-verification",
    )
    return any(m in sample for m in markers)
 async def fetch_embed(encoded_url: str) -> tuple[bytes, dict[str, str], int]:
    """Fetch an upstream embed page, rewrite the HTML, and return the response.
    Returns: (body_bytes, headers_dict, status_code).
    Raises HTTPException on transport errors.
    """
    url = _decode(encoded_url)
    logger.info("Embed-proxying: %s", url)
    upstream_headers = {
        "User-Agent": USER_AGENT,
        "Referer": _make_referer(url),
        "Origin": _make_origin(url),
        "Accept": (
            "text/html,application/xhtml+xml,application/xml;q=0.9,"
            "image/avif,image/webp,*/*;q=0.8"
        ),
        "Accept-Language": "en-US,en;q=0.9",
    }
    try:
        async with httpx.AsyncClient(
            timeout=EMBED_TIMEOUT,
            follow_redirects=True,
        ) as client:
            response = await client.get(url, headers=upstream_headers)
    except httpx.TimeoutException:
        raise HTTPException(status_code=504, detail="Upstream embed timeout")
    except httpx.HTTPError as e:
        raise HTTPException(status_code=502, detail=f"Upstream embed error: {e}")
    status_code = response.status_code
    upstream_ct = response.headers.get("content-type", "")
    headers_out = _filter_headers(response.headers)
    body = response.content
    # Detect Cloudflare-style challenge so the frontend can show a clear error.
    if "html" in upstream_ct.lower():
        text = response.text
        if _looks_blocked_by_anti_bot(text):
            logger.warning("Upstream returned anti-bot challenge: %s", url)
            raise HTTPException(
                status_code=502,
                detail="Upstream returned anti-bot challenge — proxy cannot bypass",
            )
        rewritten = _inject_into_head(text, url)
        body = rewritten.encode("utf-8")
        headers_out["Content-Type"] = "text/html; charset=utf-8"
    return body, headers_out, status_code
 async def relay_asset(
    encoded_url: str, range_header: str | None
 ) -> tuple[AsyncGenerator[bytes, None], dict[str, str], int]:
    """Relay an upstream subresource (JS/CSS/image/font) as a chunked stream.
    Used as a fallback when an upstream blocks hotlinked assets via Referer
    or Origin checks. The injected <base> tag handles most of these cases
    by letting the browser hit upstream directly — the relay is only for
    the awkward few that need a proxied origin.
    """
    url = _decode(encoded_url)
    logger.debug("Embed-asset relay: %s", url)
    headers = {
        "User-Agent": USER_AGENT,
        "Referer": _make_referer(url),
        "Origin": _make_origin(url),
        "Accept": "*/*",
    }
    if range_header:
        headers["Range"] = range_header
    client = httpx.AsyncClient(timeout=ASSET_TIMEOUT, follow_redirects=True)
    try:
        response = await client.send(
            client.build_request("GET", url, headers=headers),
            stream=True,
        )
    except httpx.TimeoutException:
        await client.aclose()
        raise HTTPException(status_code=504, detail="Upstream asset timeout")
    except httpx.HTTPError as e:
        await client.aclose()
        raise HTTPException(status_code=502, detail=f"Upstream asset error: {e}")
    if response.status_code >= 400:
        await response.aclose()
        await client.aclose()
        raise HTTPException(
            status_code=502,
            detail=f"Upstream asset returned HTTP {response.status_code}",
        )
    headers_out = _filter_headers(response.headers)
    async def _stream() -> AsyncGenerator[bytes, None]:
        try:
            async for chunk in response.aiter_bytes(chunk_size=RELAY_CHUNK_SIZE):
                yield chunk
        finally:
            await response.aclose()
            await client.aclose()
    return _stream(), headers_out, response.status_code
--- a/stacks/f1-stream/files/backend/extractors/init.py
+++ b/stacks/f1-stream/files/backend/extractors/init.py
@ -12,12 +12,17 @@ Example:
 """
 from backend.extractors.aceztrims import AceztrimsExtractor
 from backend.extractors.curated import CuratedExtractor
 from backend.extractors.daddylive import DaddyLiveExtractor
 from backend.extractors.demo import DemoExtractor
 from backend.extractors.discord_source import DiscordExtractor
 from backend.extractors.models import ExtractedStream
 from backend.extractors.pitsport import PitsportExtractor
 from backend.extractors.ppv import PPVExtractor
 from backend.extractors.registry import ExtractorRegistry
 from backend.extractors.service import ExtractionService
 from backend.extractors.streamed import StreamedExtractor
 from backend.extractors.timstreams import TimStreamsExtractor
 __all__ = [
    "ExtractedStream",
@ -36,10 +41,20 @@ def create_registry() -> ExtractorRegistry:
    registry = ExtractorRegistry()
    # --- Register extractors below ---
    # CuratedExtractor returns hand-picked 24/7 channels first so we always
    # have something. FallbackExtractor was removed — it surfaced aggregator
    # landing pages that don't play directly in an iframe (they require
    # user navigation through the page) and dominated the list with
    # entries that fail browser-based playback verification.
    registry.register(CuratedExtractor())
    registry.register(DemoExtractor())
    registry.register(StreamedExtractor())
    registry.register(DaddyLiveExtractor())
    registry.register(AceztrimsExtractor())
    registry.register(PitsportExtractor())
    registry.register(PPVExtractor())
    registry.register(TimStreamsExtractor())
    registry.register(DiscordExtractor())
    return registry
--- a/stacks/f1-stream/files/backend/extractors/curated.py
+++ b/stacks/f1-stream/files/backend/extractors/curated.py
@ -0,0 +1,61 @@
 """Curated extractor — known-good 24/7 F1 channels via direct embed URLs.
 Returns a small, hand-picked list of embed URLs that are reliable enough to
 be served as fallback "always-on" streams when the dynamic extractors find
 nothing (e.g. between race weekends, when API providers are down).
 These are direct embed URLs. The frontend routes them through /embed so the
 iframe-stripping proxy bypasses any frame-buster JS in the upstream player.
 """
 import logging
 from backend.extractors.base import BaseExtractor
 from backend.extractors.models import ExtractedStream
 logger = logging.getLogger(__name__)
 # Curated list. Each entry is a known direct embed URL. These were sourced
 # from the timstreams.py ALWAYS_INCLUDE_HASHES list (Sky Sports F1, DAZN F1)
 # and are documented as 24/7 channels that play F1 content year-round.
 _CURATED_STREAMS = [
    {
        "url": "https://hmembeds.one/embed/888520f36cd94c5da4c71fddc1a5fc9b",
        "title": "Sky Sports F1 (24/7)",
        "quality": "HD",
    },
    {
        "url": "https://hmembeds.one/embed/fc3a54634d0867b0c02ee3223292e7c6",
        "title": "DAZN F1 (24/7)",
        "quality": "HD",
    },
 ]
 class CuratedExtractor(BaseExtractor):
    """Returns curated known-good 24/7 F1 channel embed URLs."""
    @property
    def site_key(self) -> str:
        return "curated"
    @property
    def site_name(self) -> str:
        return "Curated 24/7 Channels"
    async def extract(self) -> list[ExtractedStream]:
        streams = [
            ExtractedStream(
                url=entry["url"],
                site_key=self.site_key,
                site_name=self.site_name,
                quality=entry["quality"],
                title=entry["title"],
                stream_type="embed",
                embed_url=entry["url"],
            )
            for entry in _CURATED_STREAMS
        ]
        logger.info("[curated] Returning %d curated stream(s)", len(streams))
        return streams
--- a/stacks/f1-stream/files/backend/extractors/discord_source.py
+++ b/stacks/f1-stream/files/backend/extractors/discord_source.py
@ -0,0 +1,203 @@
 """Discord extractor - monitors Discord channels for F1 stream links.
 Reads recent messages from configured Discord channels using a user token,
 extracts URLs that look like stream links, and returns them as embed streams.
 """
 import logging
 import os
 import re
 import httpx
 from backend.extractors.base import BaseExtractor
 from backend.extractors.models import ExtractedStream
 logger = logging.getLogger(__name__)
 DISCORD_API = "https://discord.com/api/v9"
 DISCORD_TOKEN = os.getenv("DISCORD_TOKEN", "")
 # Comma-separated channel IDs to monitor
 DISCORD_CHANNELS = os.getenv("DISCORD_CHANNELS", "").split(",")
 # How many messages to fetch per channel
 MESSAGE_LIMIT = 50
 USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
 # URL pattern to match stream links (exclude Discord CDN, images, etc.)
 URL_PATTERN = re.compile(r"https?://[^\s<>\)\]\"']+", re.IGNORECASE)
 # Domains that publish news/articles, not playable streams. Discord users share
 # these links during race weekends; they are NOT streams and pollute the list.
 EXCLUDED_DOMAINS = {
    "discord.com", "discord.gg", "cdn.discordapp.com",
    "tenor.com", "giphy.com", "imgur.com",
    "youtube.com", "youtu.be", "twitter.com", "x.com",
    "reddit.com", "instagram.com", "tiktok.com",
    "fmhy.net", "github.com", "freemotorsports.com",
    # News / official sites — never playable embeds
    "formula1.com", "fia.com", "skysports.com", "motorsport.com",
    "driverdb.com", "autosport.com", "the-race.com", "racefans.net",
    "wikipedia.org", "fantasy.formula1.com",
 }
 # A URL is treated as a candidate stream embed only if its path looks like
 # a stream/embed/player route. This catches /embed/{id}, /stream/{id},
 # /watch/{id}, /live/{slug}, /player/{...} and similar — and rejects
 # /article/, /news/, /latest/, /join/, etc.
 _PATH_KEYWORDS = (
    "embed/", "/stream", "/streams", "/watch", "/live",
    "/player", "/play/", "/sky", "/f1/", "/formula",
    "/grand-prix", "/gp/", "/channel", ".m3u8", ".php",
 )
 def _is_stream_url(url: str) -> bool:
    """Heuristic: does this URL look like an actual stream/embed/player link?
    Discord users share lots of news links during race weekends. The old
    filter only blocked specific domains and let everything else through,
    which produced a stream list dominated by formula1.com news articles.
    The new filter is positive-match: a URL must contain at least one
    stream-shaped path keyword to be included.
    """
    from urllib.parse import urlparse
    try:
        parsed = urlparse(url)
        domain = parsed.netloc.lower()
        path = parsed.path.lower()
    except Exception:
        return False
    if not domain:
        return False
    for excluded in EXCLUDED_DOMAINS:
        if excluded in domain:
            return False
    if any(path.endswith(ext) for ext in (".png", ".jpg", ".jpeg", ".gif", ".webp", ".mp4", ".webm", ".svg", ".css", ".js")):
        return False
    full = path + ("?" + parsed.query if parsed.query else "")
    if not any(kw in full for kw in _PATH_KEYWORDS):
        return False
    return True
 class DiscordExtractor(BaseExtractor):
    """Extracts stream links from Discord channel messages.
    Monitors configured Discord channels for URLs shared by users,
    filters to likely stream links, and returns them as embed streams.
    """
    @property
    def site_key(self) -> str:
        return "discord"
    @property
    def site_name(self) -> str:
        return "Discord Community"
    async def extract(self) -> list[ExtractedStream]:
        """Fetch recent messages from Discord channels and extract URLs."""
        if not DISCORD_TOKEN:
            logger.info("[discord] No DISCORD_TOKEN set, skipping")
            return []
        channels = [c.strip() for c in DISCORD_CHANNELS if c.strip()]
        if not channels:
            logger.info("[discord] No DISCORD_CHANNELS configured, skipping")
            return []
        streams: list[ExtractedStream] = []
        seen_urls: set[str] = set()
        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={
                    "Authorization": DISCORD_TOKEN,
                    "User-Agent": USER_AGENT,
                },
            ) as client:
                for channel_id in channels:
                    try:
                        channel_streams = await self._fetch_channel(
                            client, channel_id, seen_urls
                        )
                        streams.extend(channel_streams)
                    except Exception:
                        logger.debug(
                            "[discord] Failed to fetch channel %s",
                            channel_id,
                            exc_info=True,
                        )
        except Exception:
            logger.exception("[discord] Failed to connect to Discord API")
        logger.info("[discord] Extracted %d stream(s) from %d channel(s)", len(streams), len(channels))
        return streams
    async def _fetch_channel(
        self,
        client: httpx.AsyncClient,
        channel_id: str,
        seen_urls: set[str],
    ) -> list[ExtractedStream]:
        """Fetch messages from a single channel and extract stream URLs."""
        resp = await client.get(
            f"{DISCORD_API}/channels/{channel_id}/messages",
            params={"limit": MESSAGE_LIMIT},
        )
        if resp.status_code != 200:
            logger.warning(
                "[discord] Channel %s returned HTTP %d", channel_id, resp.status_code
            )
            return []
        messages = resp.json()
        if not isinstance(messages, list):
            return []
        streams: list[ExtractedStream] = []
        for msg in messages:
            content = msg.get("content", "")
            author = msg.get("author", {}).get("username", "unknown")
            # Extract URLs from message content
            urls = URL_PATTERN.findall(content)
            # Also check embeds
            for embed in msg.get("embeds", []):
                if embed.get("url"):
                    urls.append(embed["url"])
            for url in urls:
                # Clean trailing punctuation
                url = url.rstrip(".,;:!?)")
                if url in seen_urls:
                    continue
                if not _is_stream_url(url):
                    continue
                seen_urls.add(url)
                streams.append(
                    ExtractedStream(
                        url=url,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=f"Shared by {author}",
                        stream_type="embed",
                        embed_url=url,
                    )
                )
        return streams
--- a/stacks/f1-stream/files/backend/extractors/pitsport.py
+++ b/stacks/f1-stream/files/backend/extractors/pitsport.py
@ -0,0 +1,510 @@
 """Pitsport.xyz extractor - fetches F1 streams from the Next.js RSC payload.
 Architecture:
 - Main page (pitsport.xyz) has a "Live Now" section with event cards containing
  category, title, time, imageUrl props and /watch/{UUID} links.
 - Schedule page (pitsport.xyz/schedule) lists all events grouped by category
  (h2 headings) with /watch/{UUID} links and event titles.
 - Watch pages (/watch/{UUID}) embed iframes from pushembdz.store/embed/{EMBED_UUID}.
 - Embed pages contain an RSC payload with a stream config: {title, link, method}.
 - When method is "player" or "hls", the link field points to a serveplay.site
  m3u8 playlist. Otherwise we return the embed URL for iframe playback.
 """
 import logging
 import re
 from dataclasses import dataclass
 import httpx
 from backend.extractors.base import BaseExtractor
 from backend.extractors.models import ExtractedStream
 logger = logging.getLogger(__name__)
 PITSPORT_BASE = "https://pitsport.xyz"
 EMBED_BASE = "https://pushembdz.store"
 USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
 )
 # Categories to include (case-insensitive match)
 F1_CATEGORIES = {"formula 1", "formula 2", "formula 3"}
 # Fallback keyword matching on combined category+title for edge cases
 F1_KEYWORDS = {"formula 1", "formula one", "f1"}
 GP_KEYWORD = "grand prix"
 NON_F1_KEYWORDS = {
    "motogp", "moto gp", "moto2", "moto3", "motoe", "indycar",
    "indy car", "firestone", "nascar", "rally", "wrc", "wec",
    "lemans", "le mans", "superbike", "dtm", "supercars", "arca",
    "xfinity", "trucks", "super formula", "supergt", "super gt",
    "ama supercross", "supercross",
 }
@dataclass
 class _PitsportEvent:
    """An event discovered from the Pitsport site."""
    category: str
    title: str
    watch_uuid: str
 def _is_f1_category(category: str) -> bool:
    """Check if a category string matches an F1-related series."""
    return category.strip().lower() in F1_CATEGORIES
 def _is_f1_event(category: str, title: str) -> bool:
    """Check if an event is Formula 1 related by category or title keywords."""
    # Primary check: exact category match
    if _is_f1_category(category):
        return True
    # Secondary check: keyword matching on combined text
    lower = f"{category} {title}".lower()
    if any(kw in lower for kw in NON_F1_KEYWORDS):
        return False
    if any(kw in lower for kw in F1_KEYWORDS):
        return True
    if GP_KEYWORD in lower:
        return True
    return False
 def _parse_live_events(html: str) -> list[_PitsportEvent]:
    """Parse live events from the main page RSC payload.
    The main page contains event cards with props:
        category, title, time, imageUrl
    wrapped in <a href="/watch/{UUID}"> links.
    """
    events: list[_PitsportEvent] = []
    # Match event cards in the RSC payload - they appear as JSON-like structures
    # Pattern: href="/watch/UUID" ... category":"...", "title":"..."
    # In the RSC payload, the data is in the format:
    #   ["$","$L2","/watch/UUID",{"href":"/watch/UUID","children":["$","$L10",null,
    #     {"category":"...","title":"...","time":...,"imageUrl":"..."}]}]
    pattern = re.compile(
        r'"href":"(/watch/([0-9a-f-]{36}))"[^}]*?"category":"([^"]+)","title":"([^"]+)"',
    )
    for match in pattern.finditer(html):
        _, uuid, category, title = match.groups()
        events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
    return events
 def _parse_schedule_events(html: str) -> list[_PitsportEvent]:
    """Parse events from the schedule page.
    The schedule page groups events under category headers (h2 elements).
    In the rendered HTML:
        <h2 ...>Formula 1</h2>
        <div ...>
            <a href="/watch/UUID">...</a>
            ...
        </div>
    In the RSC payload, similar structure with section divs containing
    a category h2 and child event links with titles.
    """
    events: list[_PitsportEvent] = []
    # Strategy 1: Parse from rendered HTML
    # Find category sections: >CategoryName</h2> followed by watch links
    # Split HTML at each category header
    section_pattern = re.compile(
        r'>([^<]+)</h2>\s*<div[^>]*class="flex flex-wrap gap-6">(.*?)(?=</div>\s*</div>\s*(?:<div|</div>|$))',
        re.DOTALL,
    )
    for section_match in section_pattern.finditer(html):
        category = section_match.group(1).strip()
        section_html = section_match.group(2)
        # Find all watch links in this section
        link_pattern = re.compile(
            r'href="/watch/([0-9a-f-]{36})".*?<h1[^>]*>([^<]+)</h1>',
            re.DOTALL,
        )
        for link_match in link_pattern.finditer(section_html):
            uuid = link_match.group(1)
            title = link_match.group(2).strip()
            events.append(
                _PitsportEvent(category=category, title=title, watch_uuid=uuid)
            )
    # Strategy 2: Parse from RSC payload if rendered HTML didn't yield results
    # The RSC payload has patterns like:
    #   "children":"Formula 1"}] ... "/watch/UUID" ... "title":"EventTitle"
    if not events:
        events = _parse_schedule_rsc(html)
    return events
 def _parse_schedule_rsc(html: str) -> list[_PitsportEvent]:
    """Parse events from schedule page RSC payload as fallback.
    Extracts category section divs from the RSC JSON structure.
    """
    events: list[_PitsportEvent] = []
    # Find the RSC payload chunks
    rsc_chunks = re.findall(
        r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', html, re.DOTALL
    )
    if not rsc_chunks:
        return events
    # Concatenate and unescape
    full_payload = ""
    for chunk in rsc_chunks:
        try:
            full_payload += chunk.encode().decode("unicode_escape")
        except Exception:
            full_payload += chunk
    # Find category sections in the RSC data
    # Pattern: "children":"CategoryName"}],["$","div",...watch links...
    # Each section div contains an h2 with the category name and watch links
    cat_pattern = re.compile(
        r'border-gray-700 pb-2","children":"([^"]+)"\}.*?'
        r'(?=border-gray-700 pb-2","children"|$)',
        re.DOTALL,
    )
    for cat_match in cat_pattern.finditer(full_payload):
        category = cat_match.group(1)
        section_text = cat_match.group(0)
        # Find watch UUIDs and titles in this section
        # Pattern: "/watch/UUID" ... "title":"EventTitle"
        event_pattern = re.compile(
            r'/watch/([0-9a-f-]{36}).*?"title":"([^"]+)"',
        )
        for ev_match in event_pattern.finditer(section_text):
            uuid = ev_match.group(1)
            title = ev_match.group(2)
            events.append(
                _PitsportEvent(category=category, title=title, watch_uuid=uuid)
            )
    return events
 def _parse_embed_uuids(html: str) -> list[str]:
    """Extract embed UUIDs from a watch page.
    Watch pages contain iframes like:
        <iframe src="https://pushembdz.store/embed/{EMBED_UUID}" ...>
    And in the RSC payload:
        "iframe":"https://pushembdz.store/embed/{EMBED_UUID}"
    """
    uuids: list[str] = []
    # From rendered HTML
    iframe_pattern = re.compile(
        r'pushembdz\.store/embed/([0-9a-f-]{36})',
    )
    for match in iframe_pattern.finditer(html):
        uuid = match.group(1)
        if uuid not in uuids:
            uuids.append(uuid)
    return uuids
@dataclass
 class _StreamConfig:
    """Stream configuration extracted from an embed page."""
    title: str
    link: str
    method: str
 def _parse_stream_config(html: str) -> _StreamConfig | None:
    """Extract stream config from an embed page RSC payload.
    The embed page contains an RSC payload line like:
        4:["$","$Ld",null,{"stream":{"title":"...","link":"...","method":"player"},
           "error":null,"slug":"..."}]
    """
    # Try matching the escaped RSC payload pattern
    pattern = re.compile(
        r'"stream":\{["\']?\\?"title\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
        r'["\']?\\?"link\\?"["\']?:["\']?\\?"([^"\\]+)\\?"["\']?,'
        r'["\']?\\?"method\\?"["\']?:["\']?\\?"([^"\\]+)\\?"',
    )
    match = pattern.search(html)
    if match:
        return _StreamConfig(
            title=match.group(1),
            link=match.group(2),
            method=match.group(3),
        )
    # Simpler pattern for double-escaped payload
    pattern2 = re.compile(
        r'\\?"stream\\?":\{\\?"title\\?":\\?"([^\\]+)\\?",'
        r'\\?"link\\?":\\?"([^\\]+)\\?",'
        r'\\?"method\\?":\\?"([^\\]+)\\?"',
    )
    match = pattern2.search(html)
    if match:
        return _StreamConfig(
            title=match.group(1),
            link=match.group(2),
            method=match.group(3),
        )
    # Most lenient: just find the three fields near each other
    pattern3 = re.compile(
        r'"stream"\s*:\s*\{\s*"title"\s*:\s*"([^"]+)"\s*,'
        r'\s*"link"\s*:\s*"([^"]+)"\s*,'
        r'\s*"method"\s*:\s*"([^"]+)"',
    )
    match = pattern3.search(html)
    if match:
        return _StreamConfig(
            title=match.group(1),
            link=match.group(2),
            method=match.group(3),
        )
    return None
 def _is_m3u8_method(method: str) -> bool:
    """Check if the stream method indicates a direct HLS stream."""
    return method.lower() in ("player", "hls")
 def _extract_m3u8_url(link: str) -> str:
    """Convert a serveplay.site player URL to an m3u8 playlist URL.
    Input:  https://dash.serveplay.site/{channel}/index.html
    Output: https://dash.serveplay.site/{channel}/index.html
    The index.html IS the m3u8 playlist (served with proper content-type
    when fetched with the correct Referer header).
    """
    return link
 class PitsportExtractor(BaseExtractor):
    """Extracts F1 streams from Pitsport.xyz.
    Scrapes the Next.js RSC payload from the main page and schedule page
    to find F1 events, then resolves embed UUIDs to stream configurations.
    """
    @property
    def site_key(self) -> str:
        return "pitsport"
    @property
    def site_name(self) -> str:
        return "Pitsport"
    async def extract(self) -> list[ExtractedStream]:
        """Fetch F1 events and return stream URLs or embed URLs."""
        streams: list[ExtractedStream] = []
        try:
            async with httpx.AsyncClient(
                timeout=20.0,
                follow_redirects=True,
                headers={"User-Agent": USER_AGENT},
            ) as client:
                # Fetch both pages to get comprehensive event data
                events = await self._discover_events(client)
                logger.info(
                    "[pitsport] Found %d F1 event(s) to process", len(events)
                )
                # Deduplicate by watch UUID
                seen_uuids: set[str] = set()
                unique_events: list[_PitsportEvent] = []
                for ev in events:
                    if ev.watch_uuid not in seen_uuids:
                        seen_uuids.add(ev.watch_uuid)
                        unique_events.append(ev)
                # For each event, resolve streams
                for event in unique_events:
                    event_streams = await self._resolve_event_streams(
                        client, event
                    )
                    streams.extend(event_streams)
        except Exception:
            logger.exception("[pitsport] Failed to extract streams")
        logger.info("[pitsport] Extracted %d stream(s)", len(streams))
        return streams
    async def _discover_events(
        self, client: httpx.AsyncClient
    ) -> list[_PitsportEvent]:
        """Discover F1 events from both main page and schedule page."""
        all_events: list[_PitsportEvent] = []
        # Fetch main page for live events
        try:
            resp = await client.get(PITSPORT_BASE)
            if resp.status_code == 200:
                live_events = _parse_live_events(resp.text)
                logger.info(
                    "[pitsport] Main page: %d live event(s)", len(live_events)
                )
                for ev in live_events:
                    if _is_f1_event(ev.category, ev.title):
                        all_events.append(ev)
            else:
                logger.warning(
                    "[pitsport] Main page returned HTTP %d", resp.status_code
                )
        except Exception:
            logger.exception("[pitsport] Failed to fetch main page")
        # Fetch schedule page for upcoming events
        try:
            resp = await client.get(f"{PITSPORT_BASE}/schedule")
            if resp.status_code == 200:
                schedule_events = _parse_schedule_events(resp.text)
                logger.info(
                    "[pitsport] Schedule page: %d total event(s)",
                    len(schedule_events),
                )
                for ev in schedule_events:
                    if _is_f1_event(ev.category, ev.title):
                        all_events.append(ev)
            else:
                logger.warning(
                    "[pitsport] Schedule page returned HTTP %d",
                    resp.status_code,
                )
        except Exception:
            logger.exception("[pitsport] Failed to fetch schedule page")
        return all_events
    async def _resolve_event_streams(
        self, client: httpx.AsyncClient, event: _PitsportEvent
    ) -> list[ExtractedStream]:
        """Resolve an event's watch page to actual stream URLs."""
        streams: list[ExtractedStream] = []
        try:
            # Fetch the watch page to get embed UUIDs
            watch_url = f"{PITSPORT_BASE}/watch/{event.watch_uuid}"
            resp = await client.get(watch_url)
            if resp.status_code != 200:
                logger.debug(
                    "[pitsport] Watch page %s returned HTTP %d",
                    event.watch_uuid,
                    resp.status_code,
                )
                return []
            embed_uuids = _parse_embed_uuids(resp.text)
            if not embed_uuids:
                logger.debug(
                    "[pitsport] No embed UUIDs found for %s", event.watch_uuid
                )
                return []
            logger.debug(
                "[pitsport] Event '%s' has %d embed(s)",
                event.title,
                len(embed_uuids),
            )
            # Resolve each embed to a stream config
            for i, embed_uuid in enumerate(embed_uuids):
                stream = await self._resolve_embed(
                    client, embed_uuid, event, stream_num=i + 1
                )
                if stream:
                    streams.append(stream)
        except Exception:
            logger.debug(
                "[pitsport] Failed to resolve event %s",
                event.watch_uuid,
                exc_info=True,
            )
        return streams
    async def _resolve_embed(
        self,
        client: httpx.AsyncClient,
        embed_uuid: str,
        event: _PitsportEvent,
        stream_num: int,
    ) -> ExtractedStream | None:
        """Resolve an embed UUID to a stream configuration."""
        try:
            embed_url = f"{EMBED_BASE}/embed/{embed_uuid}"
            resp = await client.get(embed_url)
            if resp.status_code != 200:
                logger.debug(
                    "[pitsport] Embed page %s returned HTTP %d",
                    embed_uuid,
                    resp.status_code,
                )
                return None
            config = _parse_stream_config(resp.text)
            if not config:
                logger.debug(
                    "[pitsport] No stream config found in embed %s",
                    embed_uuid,
                )
                return None
            # Build the stream title
            stream_title = f"{event.category} - {event.title}"
            if config.title:
                stream_title += f" ({config.title})"
            if stream_num > 1:
                stream_title += f" #{stream_num}"
            if _is_m3u8_method(config.method) and "serveplay.site" in config.link:
                # Direct m3u8 stream
                m3u8_url = _extract_m3u8_url(config.link)
                return ExtractedStream(
                    url=m3u8_url,
                    site_key=self.site_key,
                    site_name=self.site_name,
                    quality="",
                    title=stream_title,
                    stream_type="m3u8",
                )
            else:
                # Iframe embed fallback
                return ExtractedStream(
                    url=embed_url,
                    site_key=self.site_key,
                    site_name=self.site_name,
                    quality="",
                    title=stream_title,
                    stream_type="embed",
                    embed_url=embed_url,
                )
        except Exception:
            logger.debug(
                "[pitsport] Failed to resolve embed %s",
                embed_uuid,
                exc_info=True,
            )
            return None
--- a/stacks/f1-stream/files/backend/extractors/ppv.py
+++ b/stacks/f1-stream/files/backend/extractors/ppv.py
@ -0,0 +1,270 @@
 """PPV.to extractor - fetches F1 streams via the public PPV API.
 Returns embed URLs (pooembed.eu) for iframe playback.
 The API at api.ppv.to/api/streams requires no authentication.
 Falls back to api.ppv.st if the primary API is unreachable.
 """
 import logging
 import httpx
 from backend.extractors.base import BaseExtractor
 from backend.extractors.models import ExtractedStream
 logger = logging.getLogger(__name__)
 PRIMARY_API = "https://api.ppv.to/api/streams"
 FALLBACK_API = "https://api.ppv.st/api/streams"
 EMBED_BASE = "https://pooembed.eu/embed"
 USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
 )
 # Category name for motorsport on PPV.to
 MOTORSPORT_CATEGORY = "motorsports"
 # Only include events matching these keywords (case-insensitive)
 F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1"}
 # Grand Prix is shared with MotoGP/IndyCar — only match if no other series keywords
 GP_KEYWORD = "grand prix"
 NON_F1_KEYWORDS = {
    "motogp", "moto gp", "moto2", "moto3", "motoe",
    "indycar", "indy car", "firestone", "nascar",
    "rally", "wrc", "wec", "lemans", "le mans",
    "superbike", "dtm", "supercars",
 }
 def _is_f1_stream(name: str, category_name: str = "") -> bool:
    """Check if a stream is Formula 1 related.
    Checks both the stream name and the category name.
    A stream qualifies if:
    - It is in the motorsport category AND matches F1 keywords, OR
    - It matches F1 keywords regardless of category.
    """
    lower_name = name.lower()
    lower_cat = category_name.lower()
    # Reject if it contains non-F1 motorsport keywords
    if any(kw in lower_name for kw in NON_F1_KEYWORDS):
        return False
    # Direct F1 keyword match in the stream name
    if any(kw in lower_name for kw in F1_KEYWORDS):
        return True
    # "grand prix" in the name, only if in motorsports category and no non-F1 keywords
    if GP_KEYWORD in lower_name and MOTORSPORT_CATEGORY in lower_cat:
        return True
    # If the category is motorsport, also check category-level keywords
    if MOTORSPORT_CATEGORY in lower_cat and any(kw in lower_cat for kw in F1_KEYWORDS):
        return True
    return False
 class PPVExtractor(BaseExtractor):
    """Extracts embed URLs from PPV.to's public JSON API.
    Uses the endpoint:
    - GET https://api.ppv.to/api/streams -> all streams grouped by category
    - Fallback: https://api.ppv.st/api/streams
    Each stream object contains an `iframe` field with the embed URL,
    or a `uri_name` from which the embed URL can be constructed.
    """
    @property
    def site_key(self) -> str:
        return "ppv"
    @property
    def site_name(self) -> str:
        return "PPV.to"
    async def _fetch_streams(self, client: httpx.AsyncClient) -> dict | None:
        """Try primary and fallback APIs, return parsed JSON or None."""
        for api_url in (PRIMARY_API, FALLBACK_API):
            try:
                resp = await client.get(api_url)
                if resp.status_code == 200:
                    data = resp.json()
                    logger.info("[ppv] Fetched streams from %s", api_url)
                    return data
                logger.warning(
                    "[ppv] %s returned HTTP %d", api_url, resp.status_code
                )
            except Exception:
                logger.debug(
                    "[ppv] Failed to reach %s", api_url, exc_info=True
                )
        return None
    async def extract(self) -> list[ExtractedStream]:
        """Fetch F1 streams and return embed URLs for iframe playback."""
        streams: list[ExtractedStream] = []
        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
            ) as client:
                data = await self._fetch_streams(client)
                if data is None:
                    logger.warning("[ppv] Could not fetch streams from any API")
                    return []
                # The API returns:
                # { "streams": [ { "category": "Name", "id": N, "streams": [...] }, ... ] }
                # Flatten into (category_name, stream_obj) tuples.
                all_streams = self._normalize_streams(data)
                logger.info(
                    "[ppv] Found %d total stream(s) across all categories",
                    len(all_streams),
                )
                for category_name, stream_obj in all_streams:
                    name = stream_obj.get("name", "") or stream_obj.get("title", "")
                    if not _is_f1_stream(name, category_name):
                        continue
                    # Build the embed URL
                    embed_url = self._get_embed_url(stream_obj)
                    if not embed_url:
                        logger.debug("[ppv] No embed URL for stream: %s", name)
                        continue
                    # Extract quality from tag if present
                    tag = stream_obj.get("tag", "")
                    quality = tag if tag else ""
                    # Build descriptive title
                    title = name
                    viewers = stream_obj.get("viewers")
                    if viewers and int(viewers) > 0:
                        title += f" ({viewers} viewers)"
                    # Check for substreams (multiple quality/language options)
                    substreams = stream_obj.get("substreams")
                    if isinstance(substreams, list) and substreams:
                        for i, sub in enumerate(substreams):
                            sub_embed = sub.get("iframe", "") or sub.get("embed_url", "")
                            if not sub_embed:
                                # Fall back to the parent embed URL
                                sub_embed = embed_url
                            sub_name = sub.get("name", "") or sub.get("label", "")
                            sub_quality = sub.get("tag", "") or sub.get("quality", "") or quality
                            sub_title = f"{name}"
                            if sub_name:
                                sub_title += f" - {sub_name}"
                            elif i > 0:
                                sub_title += f" #{i + 1}"
                            streams.append(
                                ExtractedStream(
                                    url=sub_embed,
                                    site_key=self.site_key,
                                    site_name=self.site_name,
                                    quality=sub_quality,
                                    title=sub_title,
                                    stream_type="embed",
                                    embed_url=sub_embed,
                                )
                            )
                    else:
                        # Single stream, no substreams
                        streams.append(
                            ExtractedStream(
                                url=embed_url,
                                site_key=self.site_key,
                                site_name=self.site_name,
                                quality=quality,
                                title=title,
                                stream_type="embed",
                                embed_url=embed_url,
                            )
                        )
        except Exception:
            logger.exception("[ppv] Failed to extract streams")
        logger.info("[ppv] Extracted %d F1 stream(s)", len(streams))
        return streams
    @staticmethod
    def _normalize_streams(data: dict | list) -> list[tuple[str, dict]]:
        """Normalize the API response into a flat list of (category_name, stream_dict) tuples.
        The PPV API returns data in this shape:
        {
            "streams": [
                {
                    "category": "Motorsports",
                    "id": 35,
                    "streams": [ { stream objects... } ]
                },
                ...
            ]
        }
        Each category group has a "category" string and a nested "streams" list.
        """
        result: list[tuple[str, dict]] = []
        # Handle the top-level wrapper
        if isinstance(data, dict):
            categories = data.get("streams", [])
        elif isinstance(data, list):
            categories = data
        else:
            return result
        for category_group in categories:
            if not isinstance(category_group, dict):
                continue
            category_name = category_group.get("category", "")
            # The nested streams within this category
            inner_streams = category_group.get("streams", [])
            if isinstance(inner_streams, list):
                for stream_obj in inner_streams:
                    if isinstance(stream_obj, dict):
                        # Attach category_name to each stream for filtering
                        result.append((category_name, stream_obj))
            elif isinstance(category_group, dict) and "name" in category_group:
                # Fallback: the item itself is a stream (flat list format)
                result.append((category_name, category_group))
        return result
    @staticmethod
    def _get_embed_url(stream: dict) -> str:
        """Extract or construct the embed URL for a stream."""
        # Prefer the iframe field directly
        iframe = stream.get("iframe", "")
        if iframe:
            return iframe
        # Construct from uri_name
        uri_name = stream.get("uri_name", "") or stream.get("uri", "")
        if uri_name:
            # Strip leading slash if present
            uri_name = uri_name.lstrip("/")
            return f"{EMBED_BASE}/{uri_name}"
        # Last resort: use the stream id
        stream_id = stream.get("id")
        if stream_id:
            return f"{EMBED_BASE}/{stream_id}"
        return ""
--- a/stacks/f1-stream/files/backend/extractors/service.py
+++ b/stacks/f1-stream/files/backend/extractors/service.py
@ -6,6 +6,7 @@ from datetime import datetime, timezone
 from backend.extractors.models import ExtractedStream
 from backend.extractors.registry import ExtractorRegistry
 from backend.health import StreamHealthChecker
 from backend.playback_verifier import PlaybackVerifier
 logger = logging.getLogger(__name__)
@ -29,6 +30,11 @@ class ExtractionService:
        self._last_run: str | None = None
        self._last_run_stream_count: int = 0
        self._health_checker = StreamHealthChecker()
        self._playback_verifier = PlaybackVerifier()
    async def shutdown(self) -> None:
        """Release the headless browser instance owned by the verifier."""
        await self._playback_verifier.shutdown()
    async def run_extraction(self) -> None:
        """Run all extractors, health-check results, and cache them.
@ -43,31 +49,63 @@ class ExtractionService:
        streams = await self._registry.extract_all()
-        # Run health checks on all extracted streams
+        # Run health checks + headless-browser playback verification.
        # Both stream types are now verified end-to-end so the user only
        # ever sees streams that actually play in a browser.
        if streams:
            # Separate m3u8 streams (need health check) from embed streams (skip)
            m3u8_streams = [s for s in streams if s.stream_type != "embed"]
            embed_streams = [s for s in streams if s.stream_type == "embed"]
-            # Mark embed streams as live (no health check possible for iframes)
+            # m3u8 streams: cheap structural health check (validates manifest,
-            for stream in embed_streams:
+            # checks first variant playlist), then a headless-browser test
-                stream.is_live = True
+            # to confirm hls.js can decode and render frames.
                stream.response_time_ms = 0
                stream.checked_at = start.isoformat()
            # Health-check only m3u8 streams
            if m3u8_streams:
                stream_dicts = [s.to_dict() for s in m3u8_streams]
                health_map = await self._health_checker.check_all(stream_dicts)
                for stream in m3u8_streams:
                    health = health_map.get(stream.url)
                    if health:
                        stream.is_live = health.is_live
                        stream.response_time_ms = health.response_time_ms
                        stream.checked_at = health.checked_at
                        if health.bitrate > 0:
                            stream.bitrate = health.bitrate
                        # tentatively mark live; final word comes from the verifier
                        stream.is_live = health.is_live
            # Browser verification: applies to both m3u8 (only those that
            # passed structural health) and embed (always — they have no
            # other way to verify).
            verify_items: list[tuple[str, str]] = []
            for stream in m3u8_streams:
                if stream.is_live:
                    verify_items.append((stream.url, "m3u8"))
            for stream in embed_streams:
                verify_items.append((stream.embed_url or stream.url, "embed"))
            verdicts = await self._playback_verifier.verify_many(verify_items)
            now_iso = datetime.now(timezone.utc).isoformat()
            for stream in m3u8_streams:
                if not stream.is_live:
                    continue  # already failed health check
                verdict = verdicts.get(stream.url)
                if verdict is None:
                    continue  # verifier disabled or unavailable
                stream.is_live = verdict.is_playable
                stream.checked_at = now_iso
            for stream in embed_streams:
                key = stream.embed_url or stream.url
                verdict = verdicts.get(key)
                stream.checked_at = now_iso
                if verdict is None:
                    # Verifier unavailable — fall back to "trust extractor".
                    # This keeps the service usable even without playwright.
                    stream.is_live = True
                    stream.response_time_ms = 0
                else:
                    stream.is_live = verdict.is_playable
                    stream.response_time_ms = verdict.elapsed_ms
        # Group streams by site_key and update cache
        new_cache: dict[str, list[ExtractedStream]] = {}
--- a/stacks/f1-stream/files/backend/extractors/timstreams.py
+++ b/stacks/f1-stream/files/backend/extractors/timstreams.py
@ -0,0 +1,190 @@
 """TimStreams extractor - fetches F1 streams from the TimStreams JSON API.
 Returns embed URLs from hmembeds.one for iframe playback.
 The public API at stra.viaplus.site/main requires no authentication
 and returns all events/channels across Events, Replays, and 24/7 categories.
 """
 import logging
 import httpx
 from backend.extractors.base import BaseExtractor
 from backend.extractors.models import ExtractedStream
 logger = logging.getLogger(__name__)
 API_URL = "https://stra.viaplus.site/main"
 USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
 )
 # Direct F1 keyword matches (case-insensitive)
 F1_KEYWORDS = {"formula 1", "formula one", "f1", "sky sports f1", "dazn f1"}
 # "Grand prix" is F1-related only if non-F1 motorsport keywords are absent
 GP_KEYWORD = "grand prix"
 # Exclude these motorsport series when matching on "grand prix"
 NON_F1_KEYWORDS = {
    "motogp", "moto gp", "moto2", "moto3", "motoe",
    "indycar", "indy car", "nascar",
    "rally", "wrc", "wec", "lemans", "le mans",
    "superbike", "dtm", "supercars",
 }
 # 24/7 channels that should always be included (embed hashes on hmembeds.one)
 ALWAYS_INCLUDE_HASHES = {
    "888520f36cd94c5da4c71fddc1a5fc9b",  # Sky Sports F1
    "fc3a54634d0867b0c02ee3223292e7c6",  # DAZN F1
 }
 def _is_f1_event(name: str) -> bool:
    """Check if an event/channel is Formula 1 related by name.
    Returns True when the name contains a direct F1 keyword, or contains
    "grand prix" without non-F1 series keywords.
    Note: The TimStreams API genre field (genre=2) covers ALL sports channels,
    not just motorsport, so we rely solely on name-based matching.
    """
    lower = name.lower()
    # Direct F1 keyword match
    if any(kw in lower for kw in F1_KEYWORDS):
        return True
    # Grand prix without competing series
    if GP_KEYWORD in lower and not any(kw in lower for kw in NON_F1_KEYWORDS):
        return True
    return False
 def _extract_embed_hash(url: str) -> str | None:
    """Extract the hash from an hmembeds.one embed URL.
    Expected format: https://hmembeds.one/embed/{hash}
    Returns the hash string, or None if the URL is not in the expected format.
    """
    if not url:
        return None
    # Handle both with and without trailing slash
    url = url.rstrip("/")
    prefix = "https://hmembeds.one/embed/"
    alt_prefix = "http://hmembeds.one/embed/"
    if url.startswith(prefix):
        return url[len(prefix):] or None
    if url.startswith(alt_prefix):
        return url[len(alt_prefix):] or None
    return None
 def _is_always_include(url: str) -> bool:
    """Check if a stream URL is one of the always-include 24/7 channels."""
    embed_hash = _extract_embed_hash(url)
    return embed_hash in ALWAYS_INCLUDE_HASHES if embed_hash else False
 class TimStreamsExtractor(BaseExtractor):
    """Extracts embed URLs from TimStreams' public JSON API.
    The API at stra.viaplus.site/main returns a JSON array of categories,
    each containing events with stream URLs pointing to hmembeds.one embeds.
    """
    @property
    def site_key(self) -> str:
        return "timstreams"
    @property
    def site_name(self) -> str:
        return "TimStreams"
    async def extract(self) -> list[ExtractedStream]:
        """Fetch F1 events/channels and return embed URLs for iframe playback."""
        streams: list[ExtractedStream] = []
        seen_urls: set[str] = set()
        try:
            async with httpx.AsyncClient(
                timeout=15.0,
                follow_redirects=True,
                headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
            ) as client:
                resp = await client.get(API_URL)
                if resp.status_code != 200:
                    logger.warning(
                        "[timstreams] API returned HTTP %d", resp.status_code
                    )
                    return []
                data = resp.json()
                if not isinstance(data, list):
                    logger.warning("[timstreams] Unexpected API response type: %s", type(data).__name__)
                    return []
                logger.info("[timstreams] API returned %d categorie(s)", len(data))
                for category in data:
                    category_name = category.get("category", "Unknown")
                    events = category.get("events", [])
                    if not isinstance(events, list):
                        continue
                    for event in events:
                        event_name = event.get("name", "Unknown")
                        event_streams = event.get("streams", [])
                        if not isinstance(event_streams, list) or not event_streams:
                            continue
                        # Check if any stream URL matches an always-include channel
                        always_include = any(
                            _is_always_include(s.get("url", ""))
                            for s in event_streams
                        )
                        # Filter: must be F1-related or an always-include channel
                        if not always_include and not _is_f1_event(event_name):
                            continue
                        for stream_info in event_streams:
                            stream_name = stream_info.get("name", "")
                            stream_url = stream_info.get("url", "")
                            if not stream_url:
                                continue
                            # Deduplicate by URL
                            if stream_url in seen_urls:
                                continue
                            seen_urls.add(stream_url)
                            # Build a descriptive title
                            title = event_name
                            if stream_name and stream_name.lower() != event_name.lower():
                                title = f"{event_name} - {stream_name}"
                            if category_name:
                                title = f"[{category_name}] {title}"
                            streams.append(
                                ExtractedStream(
                                    url=stream_url,
                                    site_key=self.site_key,
                                    site_name=self.site_name,
                                    quality="",
                                    title=title,
                                    stream_type="embed",
                                    embed_url=stream_url,
                                )
                            )
        except httpx.TimeoutException:
            logger.warning("[timstreams] API request timed out")
        except Exception:
            logger.exception("[timstreams] Failed to fetch from API")
        logger.info("[timstreams] Extracted %d stream(s)", len(streams))
        return streams
--- a/stacks/f1-stream/files/backend/main.py
+++ b/stacks/f1-stream/files/backend/main.py
@ -3,6 +3,7 @@
 import logging
 import os
 from contextlib import asynccontextmanager
 from datetime import datetime, timedelta, timezone
 from apscheduler.schedulers.asyncio import AsyncIOScheduler
 from apscheduler.triggers.cron import CronTrigger
@ -13,6 +14,7 @@ from fastapi.staticfiles import StaticFiles
 from pydantic import BaseModel
 from starlette.responses import Response, StreamingResponse
 from backend.embed_proxy import fetch_embed, relay_asset
 from backend.extractors import create_extraction_service
 from backend.proxy import proxy_playlist, relay_stream
 from backend.schedule import ScheduleService
@ -117,10 +119,6 @@ async def lifespan(app: FastAPI):
    # Startup: load schedule and start background scheduler
    await schedule_service.initialize()
    # Run initial extraction
    logger.info("Running initial stream extraction...")
    await extraction_service.run_extraction()
    # Schedule daily schedule refresh
    scheduler.add_job(
        _scheduled_refresh,
@ -130,13 +128,18 @@ async def lifespan(app: FastAPI):
        replace_existing=True,
    )
-    # Schedule periodic stream extraction (default: every 30 minutes)
+    # Schedule periodic stream extraction (default: every 30 minutes).
    # next_run_time fires the first run 8s after startup. We don't run
    # extraction inline here because it calls the playback verifier,
    # which hits http://127.0.0.1:8000/embed for embed streams — uvicorn
    # isn't listening yet inside the lifespan startup phase.
    scheduler.add_job(
        _scheduled_extraction,
        trigger=IntervalTrigger(minutes=30),
        id="stream_extraction",
        name="Extract streams from all registered sites",
        replace_existing=True,
        next_run_time=datetime.now(timezone.utc) + timedelta(seconds=8),
    )
    # Schedule token refresh every 4 minutes (safe margin for 5-min CDN tokens).
@ -159,6 +162,10 @@ async def lifespan(app: FastAPI):
    # Shutdown
    scheduler.shutdown(wait=False)
    logger.info("APScheduler shut down")
    try:
        await extraction_service.shutdown()
    except Exception:
        logger.exception("extraction_service shutdown failed")
 app = FastAPI(title="F1 Streams", lifespan=lifespan)
@ -409,6 +416,37 @@ async def relay_endpoint(
    )
 # --- Embed iframe-stripping proxy ---
@app.get("/embed")
 async def embed_proxy(url: str = Query(..., description="Base64url-encoded embed URL")):
    """Proxy a third-party embed page so it can be iframed in our origin.
    Strips X-Frame-Options and CSP frame-ancestors from the upstream
    response, injects a base href + frame-buster-defeat script, and
    forwards a plausible Referer/Origin to bypass upstream allowlists.
    """
    body, headers, status_code = await fetch_embed(url)
    return Response(content=body, headers=headers, status_code=status_code)
@app.get("/embed-asset")
 async def embed_asset(
    request: Request,
    url: str = Query(..., description="Base64url-encoded subresource URL"),
 ):
    """Relay an upstream subresource (JS/CSS/image/etc.) for the embed proxy.
    Used as a fallback when an upstream blocks hotlinked assets via Origin
    or Referer checks. Most assets load directly via the injected <base>
    tag without going through this endpoint.
    """
    range_header = request.headers.get("range")
    stream_gen, headers, status_code = await relay_asset(url, range_header)
    return StreamingResponse(stream_gen, headers=headers, status_code=status_code)
 # --- Frontend Static Files ---
 # Mount the SvelteKit static build AFTER all API routes so API endpoints take priority.
 # SvelteKit adapter-static with ssr=false produces {page}.html files and a fallback index.html.
--- a/stacks/f1-stream/files/backend/playback_verifier.py
+++ b/stacks/f1-stream/files/backend/playback_verifier.py
@ -0,0 +1,445 @@
 """Headless-browser playback verification for extracted streams.
 The basic health checker (backend/health.py) only validates m3u8 syntax.
 For embed/iframe streams it has nothing to check — the previous code blindly
 marked every embed `is_live=True`, which meant the stream list was full of
 news articles and aggregator landing pages that never actually played.
 This module loads each candidate stream URL in headless Chromium (via
 Playwright) and looks for *codec-independent* signals that the upstream
 serves a playable stream:
 - For m3u8: hls.js receives MANIFEST_PARSED + at least one FRAG_LOADED
  event. We don't wait for `<video>` to gain dimensions, because Playwright's
  chromium build doesn't include the H.264/AAC codecs. The user's real
  browser does, so confirming "manifest + segment fetch succeed" is the
  right server-side signal.
 - For embed: a `<video>` element appears at top level OR inside the iframe
  (the embed proxy strips X-Frame-Options + frame-buster JS so we can
  introspect the iframe content), OR the player has set up a MediaSource.
 Designed to be called from the extraction service's run_extraction()
 hook, with bounded concurrency. Each verification typically takes
 4-12 seconds.
 """
 import asyncio
 import base64
 import logging
 import os
 import time
 from dataclasses import dataclass
 logger = logging.getLogger(__name__)
 # Toggle off in development by setting PLAYBACK_VERIFY_ENABLED=false.
 VERIFY_ENABLED = os.getenv("PLAYBACK_VERIFY_ENABLED", "true").lower() in ("true", "1", "yes")
 # Maximum number of concurrent browser pages.
 MAX_CONCURRENCY = int(os.getenv("PLAYBACK_VERIFY_CONCURRENCY", "2"))
 # Per-stream verification budget (seconds). Beyond this we declare unplayable.
 PER_STREAM_TIMEOUT = float(os.getenv("PLAYBACK_VERIFY_TIMEOUT", "20"))
 # Where the embed proxy lives, used to wrap embed URLs so they bypass
 # X-Frame-Options/CSP/JS frame-busters during verification. Defaults to
 # loopback because verification runs inside the same FastAPI process.
 PROXY_BASE = os.getenv("PLAYBACK_VERIFY_PROXY_BASE", "http://127.0.0.1:8000")
 USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
 )
@dataclass
 class PlaybackVerdict:
    is_playable: bool
    signal: str = ""  # which check triggered the positive verdict
    elapsed_ms: int = 0
    error: str = ""
 def _b64url(s: str) -> str:
    """URL-safe base64 with padding stripped — matches m3u8_rewriter.encode_url."""
    return base64.urlsafe_b64encode(s.encode()).decode().rstrip("=")
 def _hls_test_html(m3u8_url: str) -> str:
    """A self-contained HTML page that loads an m3u8 via hls.js into a <video>.
    The page exposes window._verifier with manifest_parsed / frag_loaded
    booleans the verifier polls. It also marks media-error or fatal-error
    so we can distinguish 'upstream is unreachable' from 'codec missing'.
    """
    return f"""<!doctype html>
 <html><head><meta charset="utf-8"><title>verify</title>
 <script src="https://cdn.jsdelivr.net/npm/hls.js@1.5/dist/hls.min.js"></script>
 </head><body>
 <video id="v" muted playsinline width="640" height="360"></video>
 <script>
 window._verifier = {{
  manifest_parsed: false,
  frag_loaded: false,
  media_loaded: false,  // true when MSE has appended any buffer
  fatal_network_error: false,  // upstream truly unreachable
  manifest_incompatible: false,  // codec missing — separate from network reachability
  hls_error_details: ""
 }};
 const v = document.getElementById('v');
 const url = {m3u8_url!r};
 function start() {{
  if (window.Hls && Hls.isSupported()) {{
    const hls = new Hls({{enableWorker: true}});
    hls.on(Hls.Events.MANIFEST_PARSED, () => {{ window._verifier.manifest_parsed = true; }});
    hls.on(Hls.Events.FRAG_LOADED, () => {{ window._verifier.frag_loaded = true; }});
    hls.on(Hls.Events.BUFFER_APPENDED, () => {{ window._verifier.media_loaded = true; }});
    hls.on(Hls.Events.ERROR, (_, d) => {{
      window._verifier.hls_error_details = d.details || "";
      if (d.fatal && d.type === Hls.ErrorTypes.NETWORK_ERROR) {{
        window._verifier.fatal_network_error = true;
      }}
      if (d.details === Hls.ErrorDetails.MANIFEST_INCOMPATIBLE_CODECS_ERROR) {{
        window._verifier.manifest_incompatible = true;
      }}
    }});
    hls.loadSource(url);
    hls.attachMedia(v);
  }} else if (v.canPlayType('application/vnd.apple.mpegurl')) {{
    v.src = url;
    v.addEventListener('loadedmetadata', () => {{ window._verifier.manifest_parsed = true; window._verifier.frag_loaded = true; }});
    v.addEventListener('error', () => {{ window._verifier.fatal_network_error = true; }});
  }} else {{
    window._verifier.hls_error_details = "no hls support";
  }}
 }}
 window.addEventListener('load', start);
 </script></body></html>"""
 def _embed_test_html(_proxied_embed_url: str) -> str:
    """No longer used — verifier navigates the page directly to the proxy URL.
    The earlier iframe-wrapper approach hit same-origin policy when inspecting
    the iframe's contentDocument (the wrapper page was a data: URL, the iframe
    was http://127.0.0.1:8000), so we couldn't read the embed's DOM.
    """
    return ""
 _M3U8_POLL_JS = """
 () => {
  const v = window._verifier || {};
  const vid = document.querySelector('video');
  return {
    manifest_parsed: !!v.manifest_parsed,
    frag_loaded: !!v.frag_loaded,
    media_loaded: !!v.media_loaded,
    fatal_network_error: !!v.fatal_network_error,
    manifest_incompatible: !!v.manifest_incompatible,
    hls_error_details: v.hls_error_details || "",
    video_width: vid ? vid.videoWidth : 0,
    video_ready: vid ? vid.readyState : 0,
  };
 }
 """
 _EMBED_POLL_JS = """
 () => {
  try {
    const vids = document.querySelectorAll('video');
    if (vids.length > 0) {
      const v = vids[0];
      return {
        has_video: true,
        src: v.currentSrc || v.src || "",
        width: v.videoWidth,
        ready: v.readyState,
        duration: isFinite(v.duration) ? v.duration : 0,
        media_keys: !!v.mediaKeys,
        sources: v.querySelectorAll('source').length,
      };
    }
    const player_divs = document.querySelectorAll(
      '[id*="player" i], [class*="player" i], [class*="jwplayer" i], [id*="video" i], [class*="video-js" i]'
    );
    return {has_video: false, has_player_div: player_divs.length > 0};
  } catch (e) {
    return {has_video: false, has_player_div: false, err: String(e)};
  }
 }
 """
 async def _verify_m3u8(page, m3u8_url: str, deadline: float) -> PlaybackVerdict:
    """Confirm an m3u8 URL is fetchable via hls.js end-to-end.
    Positive signal hierarchy:
      1. media_loaded (MSE buffer appended) — strongest, codec-supported.
      2. frag_loaded (hls.js fetched at least one segment) — upstream is OK
         even if the local browser lacks codecs.
      3. manifest_parsed without media_loaded but with manifest_incompatible
         — indicates upstream playlist is valid; player can't decode here
         but a real user's browser will.
    Negative signal:
      - fatal_network_error: upstream is unreachable.
      - timeout with no manifest_parsed: upstream did not respond.
    """
    start = time.monotonic()
    html = _hls_test_html(m3u8_url)
    data_url = "data:text/html;base64," + base64.b64encode(html.encode()).decode()
    try:
        await page.goto(data_url, wait_until="domcontentloaded", timeout=10_000)
    except Exception as e:
        return PlaybackVerdict(
            is_playable=False, error=f"goto failed: {e}",
            elapsed_ms=int((time.monotonic() - start) * 1000),
        )
    last_state: dict = {}
    while time.monotonic() < deadline:
        try:
            state = await page.evaluate(_M3U8_POLL_JS)
        except Exception as e:
            return PlaybackVerdict(
                is_playable=False, error=f"evaluate failed: {e}",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        last_state = state
        if state.get("media_loaded"):
            return PlaybackVerdict(
                is_playable=True, signal="media_loaded",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if state.get("frag_loaded"):
            return PlaybackVerdict(
                is_playable=True, signal="frag_loaded",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        # MANIFEST_INCOMPATIBLE_CODECS_ERROR fires after hls.js successfully
        # fetched and parsed the manifest — the failure is purely local
        # (chromium lacks H.264). The user's real browser has codecs, so
        # this URL is playable from the user's perspective.
        if state.get("manifest_incompatible"):
            return PlaybackVerdict(
                is_playable=True, signal="manifest_parsed_codec_missing_in_verifier",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if state.get("manifest_parsed"):
            return PlaybackVerdict(
                is_playable=True, signal="manifest_parsed",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if state.get("fatal_network_error"):
            return PlaybackVerdict(
                is_playable=False, error="upstream network error",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        await asyncio.sleep(0.25)
    err = "no playback signal"
    if last_state.get("hls_error_details"):
        err = f"hls.js error: {last_state['hls_error_details']}"
    return PlaybackVerdict(
        is_playable=False, error=err,
        elapsed_ms=int((time.monotonic() - start) * 1000),
    )
 async def _verify_embed(page, proxied_url: str, deadline: float) -> PlaybackVerdict:
    """Navigate directly to the proxied embed and confirm a player rendered.
    Positive signals (in priority order):
      - <video> with src/sources/mediaKeys set (player wired up).
      - <video> element exists with any state (script ran, player attaching).
      - A player container div (jwplayer, video-js, [id*=player], etc.).
    Loading the embed page directly (not via iframe wrapper) avoids the
    same-origin policy that prevented earlier iframe-introspection runs
    from seeing the embed DOM.
    """
    start = time.monotonic()
    try:
        await page.goto(proxied_url, wait_until="domcontentloaded", timeout=15_000)
    except Exception as e:
        return PlaybackVerdict(
            is_playable=False, error=f"goto failed: {e}",
            elapsed_ms=int((time.monotonic() - start) * 1000),
        )
    # Track the best state seen across all polls. Some embeds load a player
    # div briefly then anti-bot JS tears the DOM down (hmembeds redirects
    # to google.com if its devtool-detection trips). We accept any positive
    # signal observed during the window, even if it's gone by timeout.
    seen_video_wired = False
    seen_video_tag = False
    seen_player_div = False
    last_err = ""
    while time.monotonic() < deadline:
        try:
            r = await page.evaluate(_EMBED_POLL_JS)
        except Exception as e:
            return PlaybackVerdict(
                is_playable=False, error=f"evaluate failed: {e}",
                elapsed_ms=int((time.monotonic() - start) * 1000),
            )
        if r.get("has_video"):
            seen_video_tag = True
            if r.get("src") or r.get("width", 0) > 0 or r.get("media_keys") or r.get("sources", 0) > 0:
                seen_video_wired = True
                return PlaybackVerdict(
                    is_playable=True, signal="video.wired",
                    elapsed_ms=int((time.monotonic() - start) * 1000),
                )
        if r.get("has_player_div"):
            seen_player_div = True
        last_err = r.get("err", "")
        await asyncio.sleep(0.5)
    if seen_video_wired:
        return PlaybackVerdict(is_playable=True, signal="video.wired",
                               elapsed_ms=int((time.monotonic() - start) * 1000))
    if seen_video_tag:
        return PlaybackVerdict(is_playable=True, signal="video.tag_only",
                               elapsed_ms=int((time.monotonic() - start) * 1000))
    if seen_player_div:
        return PlaybackVerdict(is_playable=True, signal="player_div",
                               elapsed_ms=int((time.monotonic() - start) * 1000))
    err = "no <video> or player container found"
    if last_err:
        err += f"; last_err: {last_err}"
    return PlaybackVerdict(is_playable=False, error=err,
                           elapsed_ms=int((time.monotonic() - start) * 1000))
 class PlaybackVerifier:
    """Verifies playability of m3u8 and embed URLs via headless Chromium.
    Manages a single browser instance for the process lifetime (cheap per-page
    contexts) and bounds concurrency with a semaphore.
    """
    def __init__(self) -> None:
        self._browser = None
        self._playwright = None
        self._sem = asyncio.Semaphore(MAX_CONCURRENCY)
        self._lock = asyncio.Lock()
    async def _ensure_browser(self):
        if self._browser is not None:
            return self._browser
        async with self._lock:
            if self._browser is not None:
                return self._browser
            try:
                from playwright.async_api import async_playwright
            except ImportError:
                logger.error("playwright not installed — playback verification disabled")
                return None
            self._playwright = await async_playwright().start()
            self._browser = await self._playwright.chromium.launch(
                headless=True,
                args=[
                    "--disable-dev-shm-usage",
                    "--disable-web-security",
                    "--no-sandbox",
                    "--disable-setuid-sandbox",
                    "--disable-features=IsolateOrigins,site-per-process",
                    "--autoplay-policy=no-user-gesture-required",
                ],
            )
            logger.info("Playwright browser launched (concurrency=%d)", MAX_CONCURRENCY)
            return self._browser
    async def shutdown(self) -> None:
        if self._browser is not None:
            try:
                await self._browser.close()
            except Exception:
                logger.exception("error closing browser")
        if self._playwright is not None:
            try:
                await self._playwright.stop()
            except Exception:
                logger.exception("error stopping playwright")
        self._browser = None
        self._playwright = None
    async def verify(self, url: str, stream_type: str) -> PlaybackVerdict:
        if not VERIFY_ENABLED:
            return PlaybackVerdict(is_playable=True, error="disabled")
        browser = await self._ensure_browser()
        if browser is None:
            return PlaybackVerdict(is_playable=False, error="playwright unavailable")
        is_m3u8 = stream_type == "m3u8"
        if not is_m3u8:
            url = f"{PROXY_BASE}/embed?url={_b64url(url)}"
        async with self._sem:
            # Set the per-stream deadline AFTER acquiring the semaphore.
            # Otherwise queued streams that wait behind earlier ones
            # would have already-expired deadlines when they start.
            deadline = time.monotonic() + PER_STREAM_TIMEOUT
            try:
                context = await browser.new_context(
                    user_agent=USER_AGENT,
                    viewport={"width": 1280, "height": 720},
                    bypass_csp=True,
                )
                page = await context.new_page()
            except Exception as e:
                return PlaybackVerdict(
                    is_playable=False, error=f"context create failed: {e}",
                )
            try:
                if is_m3u8:
                    verdict = await _verify_m3u8(page, url, deadline)
                else:
                    verdict = await _verify_embed(page, url, deadline)
            except asyncio.TimeoutError:
                verdict = PlaybackVerdict(is_playable=False, error="overall timeout")
            except Exception as e:
                verdict = PlaybackVerdict(
                    is_playable=False, error=f"verify exception: {e}",
                )
            finally:
                try:
                    await page.close()
                    await context.close()
                except Exception:
                    pass
            logger.info(
                "[verify] %s -> playable=%s signal=%s err=%s elapsed=%dms",
                url[:120], verdict.is_playable, verdict.signal,
                verdict.error, verdict.elapsed_ms,
            )
            return verdict
    async def verify_many(self, items: list[tuple[str, str]]) -> dict[str, PlaybackVerdict]:
        if not items:
            return {}
        if not VERIFY_ENABLED:
            return {url: PlaybackVerdict(is_playable=True, error="disabled") for url, _ in items}
        async def _run(url: str, stream_type: str):
            verdict = await self.verify(url, stream_type)
            return url, verdict
        results = await asyncio.gather(
            *[_run(url, st) for url, st in items], return_exceptions=True
        )
        out: dict[str, PlaybackVerdict] = {}
        for r in results:
            if isinstance(r, Exception):
                logger.exception("verify task crashed: %s", r)
                continue
            url, verdict = r
            out[url] = verdict
        return out
--- a/stacks/f1-stream/files/backend/requirements.txt
+++ b/stacks/f1-stream/files/backend/requirements.txt
@ -3,3 +3,4 @@ uvicorn[standard]
 httpx>=0.27.0
 apscheduler>=3.10.0,<4.0
 pydantic>=2.0.0
 playwright==1.48.0
--- a/stacks/f1-stream/files/frontend/src/lib/api.js
+++ b/stacks/f1-stream/files/frontend/src/lib/api.js
@ -44,6 +44,20 @@ export function getProxyUrl(m3u8Url) {
 	return `${API_BASE}/proxy?url=${encoded}`;
 }
 /**
 * Get the embed-proxy URL for an upstream iframe embed page.
 *
 * The proxy strips X-Frame-Options / CSP frame-ancestors and injects a
 * frame-buster-defeat script so the embed renders inside our iframe even
 * when the upstream tries to block it.
 * @param {string} embedUrl - The original embed page URL
 * @returns {string} URL pointing at our /embed proxy
 */
 export function getEmbedProxyUrl(embedUrl) {
 	const encoded = toBase64Url(embedUrl);
 	return `${API_BASE}/embed?url=${encoded}`;
 }
 /**
 * Mark a stream as actively being watched (enables token refresh).
 * @param {string} url - The stream URL
--- a/stacks/f1-stream/files/frontend/src/routes/watch/+page.svelte
+++ b/stacks/f1-stream/files/frontend/src/routes/watch/+page.svelte
@ -1,5 +1,5 @@
 <script>
-	import { fetchStreams, fetchSchedule, getProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
+	import { fetchStreams, fetchSchedule, getProxyUrl, getEmbedProxyUrl, activateStream, deactivateStream } from '$lib/api.js';
 	import { onMount, onDestroy } from 'svelte';
 	import { page } from '$app/state';
@ -107,12 +107,14 @@
 		}
 		if (stream.stream_type === 'embed') {
-			// Embed/iframe player — no hls.js needed
+			// Embed/iframe player — route through our /embed proxy so the
 			// upstream's X-Frame-Options / CSP / JS frame-busters can't
 			// block the iframe.
 			const newPlayer = {
 				id: Date.now(),
 				proxyUrl: '',
 				originalUrl: stream.embed_url,
-				embedUrl: stream.embed_url,
+				embedUrl: getEmbedProxyUrl(stream.embed_url),
 				streamType: 'embed',
 				siteKey: stream.site_key || '',
 				siteName: stream.site_name || stream.site_key || 'Unknown',
--- a/stacks/f1-stream/main.tf
+++ b/stacks/f1-stream/main.tf
@ -104,11 +104,11 @@ resource "kubernetes_deployment" "f1-stream" {
          name              = "f1-stream"
          resources {
            limits = {
-              memory = "256Mi"
+              memory = "1Gi"
            }
            requests = {
-              cpu    = "25m"
+              cpu    = "100m"
-              memory = "256Mi"
+              memory = "1Gi"
            }
          }
          port {