f1-stream: drop demo + landing-page extractors, add fetch-proxy injection
Per user feedback, the demo streams (Big Buck Bunny / Apple test streams) aren't useful in an F1 streams app, so DemoExtractor is removed entirely.

Tightened the Discord extractor's path filter from "any stream-shaped path" to "direct embed/player paths only". The previous filter still let sportsurge `/event/...` landing pages through, and the verifier mistook them for playable because they render player-class divs without a real player.

The embed proxy now also rewrites `window.fetch` and `XMLHttpRequest.open` inside the upstream HTML, so cross-origin XHRs (e.g. the hmembeds `/sec/<JWT>` token-binding endpoint) go through our `/embed-asset` relay. This avoids the CORS rejection that fired when the player JS called `hghndasw.gbgdhdffhf.shop/sec/...` from the `f1.viktorbarzin.me` origin.

The verifier now requires a `<video>` element before marking an embed stream playable; a player-class div alone is no longer enough.

Curated streams bypass the verifier. hmembeds aggressively detects headless Chromium (devtools trap, console-clear timing, automation flags) and won't progress past JW Player init in our pod, but the user's real browser should pass those checks. Since we can't honestly verify hmembeds headlessly, we trust the curator rather than falsely rejecting those streams.

Image: viktorbarzin/f1-stream:v6.1.1
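The proxy code itself isn't part of the hunks shown here, so the following is only a minimal sketch of the fetch/XHR rewriting described above. It assumes the relay takes the target as a URL-encoded `url` query parameter on `/embed-asset` (an assumption; the real relay's URL scheme isn't shown), and `inject_fetch_shim` plus the shim's JavaScript are hypothetical.

```python
import json
import re

# Hypothetical shim; the real proxy's injected JS is not shown in this commit.
# __RELAY__ is replaced with a JSON string literal of the relay prefix.
_SHIM_JS = """
(function () {
  var RELAY = __RELAY__;
  function rewrite(u) {
    try {
      var abs = new URL(u, document.baseURI);
      // Same-origin requests already work; only relay cross-origin ones.
      if (abs.origin === location.origin) return u;
      return RELAY + encodeURIComponent(abs.href);
    } catch (e) {
      return u;  // unparseable URL: leave it alone
    }
  }
  var origFetch = window.fetch;
  window.fetch = function (input, init) {
    // Only string/URL inputs are rewritten; Request objects pass through.
    if (typeof input === "string" || input instanceof URL) {
      input = rewrite(String(input));
    }
    return origFetch.call(window, input, init);
  };
  var origOpen = XMLHttpRequest.prototype.open;
  XMLHttpRequest.prototype.open = function (method, url) {
    var args = Array.prototype.slice.call(arguments);
    args[1] = rewrite(String(url));
    return origOpen.apply(this, args);
  };
})();
"""

# Matches the opening <head> tag (with or without attributes), not <header>.
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def inject_fetch_shim(html: str, relay_prefix: str = "/embed-asset?url=") -> str:
    """Insert the fetch/XHR shim as early as possible in the upstream HTML."""
    shim = _SHIM_JS.replace("__RELAY__", json.dumps(relay_prefix))
    tag = "<script>" + shim + "</script>"
    match = _HEAD_OPEN.search(html)
    if match is None:
        # No <head>: prepend so the shim still runs before the player's scripts.
        return tag + html
    return html[: match.end()] + tag + html[match.end():]
```

Injecting right after `<head>` installs the wrappers before any player script runs, so the token-binding call is already routed through the relay by the time JW Player initialises; same-origin requests are left untouched.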
parent f90d79ed4e
commit 574cdf08d2
5 changed files with 87 additions and 27 deletions
```diff
@@ -14,7 +14,6 @@ Example:
 from backend.extractors.aceztrims import AceztrimsExtractor
 from backend.extractors.curated import CuratedExtractor
 from backend.extractors.daddylive import DaddyLiveExtractor
-from backend.extractors.demo import DemoExtractor
 from backend.extractors.discord_source import DiscordExtractor
 from backend.extractors.models import ExtractedStream
 from backend.extractors.pitsport import PitsportExtractor
```
```diff
@@ -42,12 +41,11 @@ def create_registry() -> ExtractorRegistry:
 
     # --- Register extractors below ---
     # CuratedExtractor returns hand-picked 24/7 channels first so we always
-    # have something. FallbackExtractor was removed — it surfaced aggregator
-    # landing pages that don't play directly in an iframe (they require
-    # user navigation through the page) and dominated the list with
-    # entries that fail browser-based playback verification.
+    # have something. DemoExtractor and FallbackExtractor were removed —
+    # demo streams aren't F1 content (just Big Buck Bunny etc.) and
+    # FallbackExtractor surfaced aggregator landing pages that don't play
+    # directly in an iframe.
     registry.register(CuratedExtractor())
-    registry.register(DemoExtractor())
     registry.register(StreamedExtractor())
     registry.register(DaddyLiveExtractor())
     registry.register(AceztrimsExtractor())
```
```diff
@@ -42,13 +42,13 @@ EXCLUDED_DOMAINS = {
 }
 
 # A URL is treated as a candidate stream embed only if its path looks like
-# a stream/embed/player route. This catches /embed/{id}, /stream/{id},
-# /watch/{id}, /live/{slug}, /player/{...} and similar — and rejects
-# /article/, /news/, /latest/, /join/, etc.
+# a *direct* player/embed page — `/embed/{id}`, `/player/{...}`, `*.m3u8`,
+# `*.php` (legacy iframe1.php style). Aggregator landing pages
+# (`/event/...`, `/watch?session=...`, etc.) are rejected because they
+# show a list of links instead of playing automatically — those produce
+# verifier-passing UI without actual playback.
 _PATH_KEYWORDS = (
-    "embed/", "/stream", "/streams", "/watch", "/live",
-    "/player", "/play/", "/sky", "/f1/", "/formula",
-    "/grand-prix", "/gp/", "/channel", ".m3u8", ".php",
+    "/embed/", "/player/", ".m3u8", ".php",
 )
 
 
```
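The hunk doesn't show the function that consumes `_PATH_KEYWORDS`, so here is a minimal sketch of how the tightened filter might be applied, assuming a case-insensitive substring match against the URL's path only. `is_candidate_embed` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Same keywords as the new tuple in the diff.
_PATH_KEYWORDS = ("/embed/", "/player/", ".m3u8", ".php")


def is_candidate_embed(url: str) -> bool:
    """Accept only direct player/embed URLs.

    Assumption: keywords are matched as substrings of the lowercased
    path; the real helper may differ.
    """
    path = urlsplit(url).path.lower()
    return any(keyword in path for keyword in _PATH_KEYWORDS)
```

Matching on the parsed path means query strings are ignored, so `/watch?session=42` reduces to the path `/watch` and is rejected, while `/live/iframe1.php` still passes on the `.php` keyword. Whether the real filter also ignores the query string isn't visible in this diff.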
```diff
@@ -94,10 +94,21 @@ class ExtractionService:
             stream.is_live = verdict.is_playable
             stream.checked_at = now_iso
 
+        # Curated streams skip the verifier — they are hand-picked
+        # 24/7 channels whose embed pages aggressively detect headless
+        # automation. We can't reliably confirm playback server-side,
+        # but we trust the curator. The user's real browser does NOT
+        # trigger the same anti-bot heuristics (real plugins, real
+        # mouse movements, etc.).
+        CURATED_BYPASS = {"curated"}
         for stream in embed_streams:
+            stream.checked_at = now_iso
+            if stream.site_key in CURATED_BYPASS:
+                stream.is_live = True
+                stream.response_time_ms = 0
+                continue
             key = stream.embed_url or stream.url
             verdict = verdicts.get(key)
-            stream.checked_at = now_iso
             if verdict is None:
                 # Verifier unavailable — fall back to "trust extractor".
                 # This keeps the service usable even without playwright.
```
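A self-contained sketch of the embed-stream loop in the hunk, together with the stricter `<video>` rule from the commit message. The real verifier drives a headless browser (the code mentions playwright); here the stdlib `html.parser` over a static HTML snapshot stands in for it. `Stream`, `Verdict`, `verdict_from_html` and `apply_verdicts` are hypothetical names, and since the diff cuts off inside the `verdict is None` branch, leaving the extractor's own `is_live` unchanged there is an assumption.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List, Optional


@dataclass
class Stream:
    # Hypothetical subset of the real stream model's fields.
    url: str
    site_key: str
    embed_url: Optional[str] = None
    is_live: bool = False
    checked_at: Optional[str] = None
    response_time_ms: Optional[int] = None


@dataclass
class Verdict:
    is_playable: bool


class _VideoFinder(HTMLParser):
    """Records whether a <video> start tag appears anywhere in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <VIDEO> also matches.
        if tag == "video":
            self.found = True


def verdict_from_html(html: str) -> Verdict:
    # Stand-in for the browser verifier: a player-class <div> alone is
    # not enough; a <video> element must be present.
    finder = _VideoFinder()
    finder.feed(html)
    finder.close()
    return Verdict(is_playable=finder.found)


CURATED_BYPASS = {"curated"}


def apply_verdicts(streams: List[Stream], verdicts: Dict[str, Verdict], now_iso: str) -> None:
    for stream in streams:
        stream.checked_at = now_iso
        if stream.site_key in CURATED_BYPASS:
            # Trust the curator; headless verification is unreliable here.
            stream.is_live = True
            stream.response_time_ms = 0
            continue
        verdict = verdicts.get(stream.embed_url or stream.url)
        if verdict is None:
            # Assumed fallback: keep whatever the extractor reported.
            continue
        stream.is_live = verdict.is_playable
```

Setting `checked_at` before the bypass check mirrors the diff's move of that line to the top of the loop, so curated streams still carry a check timestamp even though no verification ran.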