f1-stream: drop demo + landing-page extractors, add fetch-proxy injection

Per user feedback: the demo Big Buck Bunny / Apple test streams aren't
useful in an F1-streams app. Removed DemoExtractor entirely. Tightened
the discord-extractor path filter from "any stream-shaped path" to
"direct embed/player path only" — the previous filter still let
sportsurge `/event/...` landing pages through, which the verifier
mistook for playable because they render player-class divs without a
real player.
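
A minimal sketch of the tightened filter, assuming the keywords are matched as plain substrings of the lowercased URL path. The matching logic itself is not part of this commit, and `looks_like_direct_embed` is a hypothetical name used only for illustration:

```python
from urllib.parse import urlparse

# Keyword tuple as it stands after this commit.
_PATH_KEYWORDS = ("/embed/", "/player/", ".m3u8", ".php")


def looks_like_direct_embed(url: str) -> bool:
    """Hypothetical helper: True if the URL path contains a direct-embed keyword.

    Assumption: substring matching on the path only (query string ignored);
    the real extractor may match differently.
    """
    path = urlparse(url).path.lower()
    return any(keyword in path for keyword in _PATH_KEYWORDS)
```

Under that assumption, `/event/...` landing pages and `/watch?session=...` URLs no longer qualify, while `/embed/{id}` and `*.m3u8` URLs still do.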

Embed proxy now also rewrites window.fetch + XMLHttpRequest.open inside
the upstream HTML so that cross-origin XHRs (e.g. the hmembeds
`/sec/<JWT>` token-binding endpoint) go through our /embed-asset relay.
This avoids the CORS reject that fired when the player JS tried to call
hghndasw.gbgdhdffhf.shop/sec/... from an `f1.viktorbarzin.me` origin.
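
The injection could look roughly like the sketch below. The proxy code is not visible in this diff, so everything here is an assumption: the names `inject_fetch_shim` and `_js_string`, the `?url=` query parameter on `/embed-asset`, and the choice to insert the shim right after `<head>` so it runs before the player scripts.

```python
import json
import re

# JavaScript shim; __BASE__ and __RELAY__ are placeholders filled in below.
# Only string URLs are rewritten; fetch(Request) objects pass through untouched.
_SHIM = """<script>(function () {
  var BASE = __BASE__, RELAY = __RELAY__;
  function viaRelay(u) {
    try {
      var abs = new URL(u, BASE);
      if (abs.origin === location.origin) return u;
      return RELAY + encodeURIComponent(abs.href);
    } catch (e) { return u; }
  }
  var origFetch = window.fetch;
  window.fetch = function (input, init) {
    if (typeof input === "string") input = viaRelay(input);
    return origFetch.call(window, input, init);
  };
  var origOpen = XMLHttpRequest.prototype.open;
  XMLHttpRequest.prototype.open = function (method, url) {
    var args = Array.prototype.slice.call(arguments);
    args[1] = viaRelay(String(url));
    return origOpen.apply(this, args);
  };
})();</script>"""


def _js_string(value: str) -> str:
    """Encode a Python string as a JS literal that is safe inside <script>."""
    return json.dumps(value).replace("</", "<\\/")


def inject_fetch_shim(html: str, upstream_url: str,
                      relay_prefix: str = "/embed-asset?url=") -> str:
    """Insert the shim right after <head>, or prepend it if there is no <head>."""
    shim = (_SHIM.replace("__BASE__", _js_string(upstream_url))
                 .replace("__RELAY__", _js_string(relay_prefix)))
    head = re.search(r"<head(\s[^>]*)?>", html, flags=re.IGNORECASE)
    if head is None:
        return shim + html
    return html[:head.end()] + shim + html[head.end():]
```

Resolving URLs against the upstream page URL rather than the proxied one means relative requests such as `/sec/<JWT>` are also sent through the relay.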

The verifier now requires a `<video>` element to mark embed streams
playable (not just a player-class div). Curated streams bypass the
verifier — hmembeds aggressively detects headless Chromium (devtool
trap, console-clear timing, automation flags) and won't progress past
JW Player init in our pod, but the user's real browser should clear
those checks. We can't honestly headless-verify hmembeds, so we trust
the curator instead of falsely rejecting them.
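
As a rough illustration of the two rules above: the real verifier drives headless Chromium through playwright, so the sketch below substitutes a serialized DOM snapshot for the live page. `has_video_element` and `is_embed_playable` are hypothetical names; only `CURATED_BYPASS` and the `"curated"` site key come from this commit.

```python
from html.parser import HTMLParser

CURATED_BYPASS = {"curated"}


class _VideoFinder(HTMLParser):
    """Records whether any <video> start tag appears in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "video":
            self.found = True


def has_video_element(dom_html: str) -> bool:
    """Assumption: dom_html is a snapshot of the rendered embed page."""
    finder = _VideoFinder()
    finder.feed(dom_html)
    return finder.found


def is_embed_playable(site_key: str, dom_html: str) -> bool:
    """Curated streams are trusted; everything else needs a real <video>."""
    if site_key in CURATED_BYPASS:
        return True
    return has_video_element(dom_html)
```

A page that renders only a player-class div is rejected under this rule, unless the stream comes from the curated extractor.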

Image: viktorbarzin/f1-stream:v6.1.1
Viktor Barzin 2026-05-06 21:50:54 +00:00
parent f90d79ed4e
commit 574cdf08d2
5 changed files with 87 additions and 27 deletions

@@ -14,7 +14,6 @@ Example:
 from backend.extractors.aceztrims import AceztrimsExtractor
 from backend.extractors.curated import CuratedExtractor
 from backend.extractors.daddylive import DaddyLiveExtractor
-from backend.extractors.demo import DemoExtractor
 from backend.extractors.discord_source import DiscordExtractor
 from backend.extractors.models import ExtractedStream
 from backend.extractors.pitsport import PitsportExtractor
@@ -42,12 +41,11 @@ def create_registry() -> ExtractorRegistry:
     # --- Register extractors below ---
     # CuratedExtractor returns hand-picked 24/7 channels first so we always
-    # have something. FallbackExtractor was removed — it surfaced aggregator
-    # landing pages that don't play directly in an iframe (they require
-    # user navigation through the page) and dominated the list with
-    # entries that fail browser-based playback verification.
+    # have something. DemoExtractor and FallbackExtractor were removed —
+    # demo streams aren't F1 content (just Big Buck Bunny etc.) and
+    # FallbackExtractor surfaced aggregator landing pages that don't play
+    # directly in an iframe.
     registry.register(CuratedExtractor())
-    registry.register(DemoExtractor())
     registry.register(StreamedExtractor())
     registry.register(DaddyLiveExtractor())
     registry.register(AceztrimsExtractor())

@@ -42,13 +42,13 @@ EXCLUDED_DOMAINS = {
 }
 # A URL is treated as a candidate stream embed only if its path looks like
-# a stream/embed/player route. This catches /embed/{id}, /stream/{id},
-# /watch/{id}, /live/{slug}, /player/{...} and similar — and rejects
-# /article/, /news/, /latest/, /join/, etc.
+# a *direct* player/embed page — `/embed/{id}`, `/player/{...}`, `*.m3u8`,
+# `*.php` (legacy iframe1.php style). Aggregator landing pages
+# (`/event/...`, `/watch?session=...`, etc.) are rejected because they
+# show a list of links instead of playing automatically — those produce
+# verifier-passing UI without actual playback.
 _PATH_KEYWORDS = (
-    "embed/", "/stream", "/streams", "/watch", "/live",
-    "/player", "/play/", "/sky", "/f1/", "/formula",
-    "/grand-prix", "/gp/", "/channel", ".m3u8", ".php",
+    "/embed/", "/player/", ".m3u8", ".php",
 )

@@ -94,10 +94,21 @@ class ExtractionService:
             stream.is_live = verdict.is_playable
             stream.checked_at = now_iso
+        # Curated streams skip the verifier — they are hand-picked
+        # 24/7 channels whose embed pages aggressively detect headless
+        # automation. We can't reliably confirm playback server-side,
+        # but we trust the curator. The user's real browser does NOT
+        # trigger the same anti-bot heuristics (real plugins, real
+        # mouse movements, etc.).
+        CURATED_BYPASS = {"curated"}
         for stream in embed_streams:
+            stream.checked_at = now_iso
+            if stream.site_key in CURATED_BYPASS:
+                stream.is_live = True
+                stream.response_time_ms = 0
+                continue
             key = stream.embed_url or stream.url
             verdict = verdicts.get(key)
-            stream.checked_at = now_iso
             if verdict is None:
                 # Verifier unavailable — fall back to "trust extractor".
                 # This keeps the service usable even without playwright.