f1-stream: drop broken curated, dedupe streams, accept all pitsport categories

User feedback: every stream on /watch shows ads but the player fails
to load. Three causes, three fixes:

1. CuratedExtractor's two hmembeds 24/7 channels (Sky F1, DAZN F1)
   sat at the top of the list and ALWAYS failed: they load the
   upstream's ad overlay then JW Player throws error 102630 (empty
   playlist; the obfuscated decoder produces no fileURL in our
   environment). Disabled the registration in extractors/__init__.py
   until/unless we find a working bypass — leaving the existing
   `CURATED_BYPASS = {"curated"}` shim in service.py so the swap is
   reversible.

2. Pitsport surfaces every WRC stage / MotoGP session as its own
   /watch UUID, but they all resolve to the same upstream m3u8 URL
   (e.g. RallyTV serves one master.m3u8 for all 22 Rally de Portugal
   stages). Added URL-keyed dedupe in service.run_extraction so the
   /streams response shows one row per actual stream.

3. The pitsport category filter was still narrowed to motorsport.
   Pitsport.xyz only lists curated sports broadcasts (WRC, MotoGP,
   IndyCar, NASCAR, Premier League Darts, Premier League football…),
   so the site's own selection is the right filter. Replaced the
   hand-maintained MOTORSPORT_KEYWORDS list with `bool(category or
   title)` — anything pitsport returns goes through. Streams that
   aren't actually live get filtered out downstream when the embed
   API returns an empty manifest.

Frontend: hls.js `lowLatencyMode` was on by default but RallyTV (and
most non-LL-HLS providers) don't ship the LL-HLS extensions, which
broke playback in real browsers. Default to `lowLatencyMode: false`.

Result: /streams now returns 1 verified live entry (the RallyTV WRC
stage currently airing), down from 24: the 2 always-broken curated
channels plus 22 rows that all pointed at the same RallyTV URL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-07 15:42:24 +00:00
parent 18d96712c7
commit 00614a3302
4 changed files with 37 additions and 16 deletions

@@ -40,12 +40,13 @@ def create_registry() -> ExtractorRegistry:
     registry = ExtractorRegistry()
     # --- Register extractors below ---
-    # CuratedExtractor returns hand-picked 24/7 channels first so we always
-    # have something. DemoExtractor and FallbackExtractor were removed —
-    # demo streams aren't F1 content (just Big Buck Bunny etc.) and
-    # FallbackExtractor surfaced aggregator landing pages that don't play
-    # directly in an iframe.
-    registry.register(CuratedExtractor())
+    # CuratedExtractor previously surfaced two hmembeds 24/7 channels (Sky
+    # Sports F1, DAZN F1) but their JW Player decoder produces an empty
+    # playlist in our environment (error 102630) regardless of headed mode,
+    # IP, or fingerprint we tried. The streams loaded the upstream's ad
+    # overlay but never produced a video element, so they confused users —
+    # disabled until/unless we find a working bypass.
+    # registry.register(CuratedExtractor())
     registry.register(StreamedExtractor())
     registry.register(DaddyLiveExtractor())
     registry.register(AceztrimsExtractor())

@@ -71,15 +71,12 @@ def _is_motorsport_category(category: str) -> bool:
 def _is_motorsport_event(category: str, title: str) -> bool:
-    """Check if an event is a motorsport we want to surface (F1 + adjacent)."""
-    if _is_motorsport_category(category):
-        return True
-    lower = f"{category} {title}".lower()
-    if any(kw in lower for kw in MOTORSPORT_KEYWORDS):
-        return True
-    if GP_KEYWORD in lower:
-        return True
-    return False
+    """Accept anything pitsport.xyz lists. Pitsport curates sports
+    broadcasts (WRC, MotoGP, IndyCar, NASCAR, Premier League Darts,
+    Premier League football, etc.) the site's own selection is the
+    filter we want. Empty/garbage events still get filtered downstream
+    when `_resolve_event_streams` produces no playable URL."""
+    return bool(category or title)
 # Aliases kept so older call-sites stay compiling. Both now point at the
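
A minimal sketch of how the relaxed predicate in this hunk behaves, assuming events arrive as plain strings. `accepts_event` is a hypothetical stand-in for `_is_motorsport_event`, and the sample categories and titles are invented for illustration:

```python
def accepts_event(category: str, title: str) -> bool:
    """Hypothetical stand-in: accept any event with a non-empty category or title."""
    return bool(category or title)


# Invented sample events, not real pitsport data.
examples = [
    ("Darts", "Premier League Darts"),  # non-motorsport, accepted under the new rule
    ("", "Rally de Portugal"),          # a title alone is enough
    ("", ""),                           # nothing to show, rejected
]
results = [accepts_event(category, title) for category, title in examples]
print(results)  # -> [True, True, False]
```

Note that a whitespace-only string is still truthy in Python, so an event whose category is `" "` would pass this check; per the docstring, such events are expected to be dropped downstream when no playable URL is resolved.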

@@ -49,6 +49,25 @@ class ExtractionService:
         streams = await self._registry.extract_all()
+        # Dedupe by canonical URL — pitsport surfaces every WRC stage as a
+        # separate event but they all point at the same RallyTV master.m3u8
+        # (and similar for MotoGP weekend sessions). Keep the first
+        # occurrence so the user sees one entry per actual stream.
+        deduped: list[ExtractedStream] = []
+        seen_urls: set[str] = set()
+        for stream in streams:
+            key = (stream.embed_url or "").strip() or (stream.url or "").strip()
+            if not key or key in seen_urls:
+                continue
+            seen_urls.add(key)
+            deduped.append(stream)
+        if len(deduped) < len(streams):
+            logger.info(
+                "Deduped streams: %d -> %d (collapsed %d duplicate URL(s))",
+                len(streams), len(deduped), len(streams) - len(deduped),
+            )
+        streams = deduped
         # Run health checks + headless-browser playback verification.
         # Both stream types are now verified end-to-end so the user only
         # ever sees streams that actually play in a browser.
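
The dedupe step in this hunk can be tried in isolation. The sketch below is a standalone approximation, not the service code: `Stream` is a hypothetical stand-in for `ExtractedStream` carrying only the fields the key uses, and the URLs are made up.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """Hypothetical stand-in for ExtractedStream; only the fields the key uses."""
    title: str
    url: str = ""
    embed_url: str = ""


def dedupe_by_url(streams: list[Stream]) -> list[Stream]:
    """Keep the first stream per URL; skip streams that have no URL at all."""
    seen: set[str] = set()
    kept: list[Stream] = []
    for stream in streams:
        # Prefer embed_url, fall back to url (same key order as the hunk above).
        key = (stream.embed_url or "").strip() or (stream.url or "").strip()
        if not key or key in seen:
            continue
        seen.add(key)
        kept.append(stream)
    return kept


# 22 stage entries sharing one made-up playlist URL, plus one distinct stream.
stages = [Stream(f"Stage {n}", url="https://example.invalid/master.m3u8") for n in range(1, 23)]
other = Stream("Other event", url="https://example.invalid/other.m3u8")
deduped = dedupe_by_url(stages + [other])
print([s.title for s in deduped])  # -> ['Stage 1', 'Other event']
```

One consequence of the `if not key` branch: a stream with neither `embed_url` nor `url` is dropped as well, so in the hunk above it would be counted among the "duplicate URL(s)" in the log line.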

@@ -175,9 +175,13 @@
         if (!player || !player.videoEl) return;
         if (Hls.isSupported()) {
+            // `lowLatencyMode` previously broke playback on regular (non-LL-HLS)
+            // providers like RallyTV — they don't ship the LL-HLS extensions
+            // hls.js needs in that mode. Default off; explicit per-stream flag
+            // can re-enable later.
             const hlsInstance = new Hls({
                 enableWorker: true,
-                lowLatencyMode: true,
+                lowLatencyMode: false,
                 backBufferLength: 90
             });
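
The comment above leaves the door open for an explicit per-stream flag. One possible way to derive such a flag server-side, sketched here purely as an assumption (nothing like it exists in this commit), is to look for Low-Latency HLS tags in a media playlist. `looks_like_ll_hls` is a hypothetical helper, and the playlists are invented samples:

```python
# Tags that Low-Latency HLS adds to media playlists.
LL_HLS_TAGS = ("#EXT-X-PART:", "#EXT-X-PART-INF:", "#EXT-X-PRELOAD-HINT:")


def looks_like_ll_hls(playlist_text: str) -> bool:
    """Hypothetical helper: True if any line starts with an LL-HLS tag."""
    return any(
        line.strip().startswith(LL_HLS_TAGS)
        for line in playlist_text.splitlines()
    )


# Invented sample playlists for illustration only.
regular = "#EXTM3U\n#EXT-X-TARGETDURATION:6\n#EXTINF:6.0,\nseg1.ts\n"
low_latency = (
    "#EXTM3U\n"
    "#EXT-X-TARGETDURATION:4\n"
    "#EXT-X-PART-INF:PART-TARGET=1.0\n"
    '#EXT-X-PART:DURATION=1.0,URI="seg1.part0.mp4"\n'
)
print(looks_like_ll_hls(regular), looks_like_ll_hls(low_latency))  # -> False True
```

These tags live in media playlists rather than in the multivariant (master) playlist, so a check like this would have to fetch a variant playlist first; whether that extra request is worth it is a design decision this sketch does not settle.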