f1-stream: only show streams confirmed playable by headless browser
Cuts the stream list from 23 mostly-broken entries to ~6 confirmed-playable
ones, and adds an iframe-stripping proxy so embed sources (hmembeds, etc.)
load through our origin without X-Frame-Options / CSP / JS frame-buster
blocks.

Why: the previous list was dominated by Discord-shared news article URLs,
hardcoded aggregator landing pages, and other non-stream URLs that all sat
at is_live=true because embed streams skipped the health check entirely.
Users could not tell which links would actually play.

What:
- backend/playback_verifier.py: new headless-Chromium verifier (Playwright)
  that polls each candidate stream for a codec-independent "playable" signal
  (hls.js MANIFEST_PARSED for m3u8; <video>/player div for embed). Replaces
  the unconditional is_live=True for embed streams in service.py.
- backend/embed_proxy.py: new /embed and /embed-asset routes that fetch
  upstream embed pages, strip X-Frame-Options/CSP/Set-Cookie, and inject a
  <base href> + frame-buster-defeat <script> that locks down window.top,
  document.referrer, console.clear/table, and window.location so the
  hmembeds disable-devtool.js redirect-to-google trap can't fire.
- extractors/curated.py: new always-on extractor with two known-good 24/7
  hmembeds embeds (Sky Sports F1, DAZN F1) so the list isn't empty between
  race weekends.
- extractors/__init__.py: register CuratedExtractor first; drop
  FallbackExtractor (its 10 aggregator landing-pages can't iframe-play).
- extractors/discord_source.py: positive-match path filter (must look like
  /embed/, /stream, /watch, /live, /player, *.m3u8, *.php) plus expanded
  domain blocklist for news sites; was 10 noise URLs, now ~1.
- extractors/service.py: run_extraction now health-checks AND
  verifier-checks both stream types; only verified-playable streams reach
  is_live.
- main.py: register /embed + /embed-asset routes; defer initial extraction
  by 8s so the verifier can reach the local /embed proxy on 127.0.0.1:8000.
- frontend/lib/api.js + watch/+page.svelte: route embed iframes through
  /embed proxy instead of the upstream URL, so X-Frame-Options/CSP can't
  block them.
- Dockerfile: install Playwright chromium + system codec-runtime libs.
- main.tf: bump pod memory 256Mi → 1Gi for chromium.

Verified end-to-end with Playwright against https://f1.viktorbarzin.me/watch:
6/6 streams reach a player UI; the 3 demo m3u8s actually play (codec-bearing
browser); the 3 embeds (Sky Sports F1, DAZN F1, sportsurge) render iframes
through the proxy.

Image: viktorbarzin/f1-stream:v6.0.5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
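
As a rough illustration of the header stripping described for backend/embed_proxy.py, the sketch below filters an upstream header mapping before it is returned to the browser. It is not the code from this commit: the function name `strip_framing_headers` and the constant `BLOCKED_HEADERS` are hypothetical, and only the three header names listed in the message are assumed.

```python
from typing import Mapping

# Hypothetical constant: the three response headers the commit message says
# the proxy strips, lower-cased for case-insensitive comparison.
BLOCKED_HEADERS = frozenset({
    "x-frame-options",
    "content-security-policy",
    "set-cookie",
})


def strip_framing_headers(upstream: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of the upstream headers without the blocked ones.

    Header names are compared case-insensitively, since HTTP header names
    are not case-sensitive.
    """
    return {
        name: value
        for name, value in upstream.items()
        if name.lower() not in BLOCKED_HEADERS
    }


if __name__ == "__main__":
    example = {
        "Content-Type": "text/html; charset=utf-8",
        "X-Frame-Options": "DENY",
        "Content-Security-Policy": "frame-ancestors 'none'",
        "Set-Cookie": "session=abc",
    }
    print(strip_framing_headers(example))
```

Filtering on a deny-list keeps every other upstream header (Content-Type, caching headers) intact; a real proxy would likely also need to handle hop-by-hop headers, which this sketch ignores.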
parent 8b180f7662
commit f90d79ed4e
15 changed files with 2128 additions and 22 deletions
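
The positive-match path filter mentioned for extractors/discord_source.py can be pictured roughly as follows. This is a sketch under assumptions, not the filter shipped in this commit: the name `looks_like_stream` and the exact regular expression are hypothetical, built only from the path patterns listed in the commit message, and the domain blocklist is left out.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern built from the paths named in the commit message:
# /embed/, /stream, /watch, /live, /player, *.m3u8, *.php
_STREAM_PATH = re.compile(
    r"/embed/|/stream|/watch|/live|/player|\.m3u8$|\.php$",
    re.IGNORECASE,
)


def looks_like_stream(url: str) -> bool:
    """Return True if the URL path matches one of the stream-like patterns.

    Only the path component is inspected, so query strings such as
    ?token=... do not affect the *.m3u8 / *.php suffix checks.
    """
    path = urlsplit(url).path
    return bool(_STREAM_PATH.search(path))


if __name__ == "__main__":
    for candidate in (
        "https://example.com/embed/sky-f1",
        "https://cdn.example.com/hls/master.m3u8?token=1",
        "https://news.example.com/articles/f1-race-report",
    ):
        print(candidate, looks_like_stream(candidate))
```

A positive match is deliberately loose (for instance, /liveblog would also pass the /live check), which is presumably why the commit pairs it with a domain blocklist for news sites.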
@@ -3,6 +3,7 @@
 import logging
 import os
 from contextlib import asynccontextmanager
+from datetime import datetime, timedelta, timezone
 
 from apscheduler.schedulers.asyncio import AsyncIOScheduler
 from apscheduler.triggers.cron import CronTrigger
@@ -13,6 +14,7 @@ from fastapi.staticfiles import StaticFiles
 from pydantic import BaseModel
 from starlette.responses import Response, StreamingResponse
 
+from backend.embed_proxy import fetch_embed, relay_asset
 from backend.extractors import create_extraction_service
 from backend.proxy import proxy_playlist, relay_stream
 from backend.schedule import ScheduleService
@@ -117,10 +119,6 @@ async def lifespan(app: FastAPI):
     # Startup: load schedule and start background scheduler
     await schedule_service.initialize()
 
-    # Run initial extraction
-    logger.info("Running initial stream extraction...")
-    await extraction_service.run_extraction()
-
     # Schedule daily schedule refresh
     scheduler.add_job(
         _scheduled_refresh,
@@ -130,13 +128,18 @@ async def lifespan(app: FastAPI):
         replace_existing=True,
     )
 
-    # Schedule periodic stream extraction (default: every 30 minutes)
+    # Schedule periodic stream extraction (default: every 30 minutes).
+    # next_run_time fires the first run 8s after startup. We don't run
+    # extraction inline here because it calls the playback verifier,
+    # which hits http://127.0.0.1:8000/embed for embed streams — uvicorn
+    # isn't listening yet inside the lifespan startup phase.
     scheduler.add_job(
         _scheduled_extraction,
         trigger=IntervalTrigger(minutes=30),
         id="stream_extraction",
         name="Extract streams from all registered sites",
         replace_existing=True,
+        next_run_time=datetime.now(timezone.utc) + timedelta(seconds=8),
     )
 
     # Schedule token refresh every 4 minutes (safe margin for 5-min CDN tokens).
@@ -159,6 +162,10 @@ async def lifespan(app: FastAPI):
     # Shutdown
     scheduler.shutdown(wait=False)
     logger.info("APScheduler shut down")
+    try:
+        await extraction_service.shutdown()
+    except Exception:
+        logger.exception("extraction_service shutdown failed")
 
 
 app = FastAPI(title="F1 Streams", lifespan=lifespan)
@@ -409,6 +416,37 @@ async def relay_endpoint(
     )
 
 
+# --- Embed iframe-stripping proxy ---
+
+
+@app.get("/embed")
+async def embed_proxy(url: str = Query(..., description="Base64url-encoded embed URL")):
+    """Proxy a third-party embed page so it can be iframed in our origin.
+
+    Strips X-Frame-Options and CSP frame-ancestors from the upstream
+    response, injects a base href + frame-buster-defeat script, and
+    forwards a plausible Referer/Origin to bypass upstream allowlists.
+    """
+    body, headers, status_code = await fetch_embed(url)
+    return Response(content=body, headers=headers, status_code=status_code)
+
+
+@app.get("/embed-asset")
+async def embed_asset(
+    request: Request,
+    url: str = Query(..., description="Base64url-encoded subresource URL"),
+):
+    """Relay an upstream subresource (JS/CSS/image/etc.) for the embed proxy.
+
+    Used as a fallback when an upstream blocks hotlinked assets via Origin
+    or Referer checks. Most assets load directly via the injected <base>
+    tag without going through this endpoint.
+    """
+    range_header = request.headers.get("range")
+    stream_gen, headers, status_code = await relay_asset(url, range_header)
+    return StreamingResponse(stream_gen, headers=headers, status_code=status_code)
+
+
 # --- Frontend Static Files ---
 # Mount the SvelteKit static build AFTER all API routes so API endpoints take priority.
 # SvelteKit adapter-static with ssr=false produces {page}.html files and a fallback index.html.
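
Both new routes take a `url` query parameter described as Base64url-encoded. A minimal sketch of how a caller might produce and decode such a value is shown below. The helper names `encode_embed_url` and `decode_embed_url` are hypothetical, and stripping the `=` padding is an assumption; the diff does not show whether `fetch_embed` expects padded or unpadded input.

```python
import base64


def encode_embed_url(url: str) -> str:
    """Encode a URL with the URL-safe Base64 alphabet ('-' and '_').

    Assumption: trailing '=' padding is stripped so the value can be placed
    in a query string without percent-encoding.
    """
    raw = base64.urlsafe_b64encode(url.encode("utf-8"))
    return raw.decode("ascii").rstrip("=")


def decode_embed_url(token: str) -> str:
    """Reverse encode_embed_url, restoring any stripped padding first."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


if __name__ == "__main__":
    upstream = "https://example.com/embed/channel?id=1"
    token = encode_embed_url(upstream)
    # The proxied iframe source would then look like /embed?url=<token>
    print(f"/embed?url={token}")
    print(decode_embed_url(token))
```

Encoding the upstream URL this way avoids nested percent-encoding of `?`, `&` and `/` inside the proxy's own query string.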