infra/stacks/f1-stream/files/backend
Viktor Barzin a91bbe189e f1-stream: subreddit extractor finds Reddit '[Watch / Download]' threads
Two fixes for the previously dormant subreddit extractor, plus a pivot of `chrome_browser.TARGETS` to the live MotoGP weekend URLs.

1. **Reddit fetch was 403'd by `Accept: application/json`**. The cluster
   IP combined with that header trips Reddit's anti-bot fingerprinting,
   and Reddit answers with an HTML 403. Removing the explicit Accept
   header (so the default `*/*` is sent) restores HTTP 200 with JSON.
   Confirmed via a direct httpx test from the f1-stream pod.

2. **Search the right things**. The community uses a stable
   `[Watch / Download] <Series> <Year> - <Round> | <Event>` post pattern
   with selftext links to admin-curated WordPress sites (motomundo.net
   for MotoGP, sister sites for F1 when active). New extractor:
   - Hits both /new.json and /search.json across r/MotorsportsReplays
     and three smaller motorsport subs.
   - Keeps only posts whose title contains `[watch` or `watch online`,
     or whose flair is `live`.
   - Extracts URLs from selftext (regex), filters to a positive
     `_INTERESTING_HOSTS` allowlist (motomundo, freemotorsports,
     pitsport, rerace, dd12, etc.) so we don't drown the verifier in
     YouTube/Discord/gofile links.
   - Returns each URL as an embed-type stream so the chrome-service
     verifier visits it.

3. **chrome_browser.TARGETS pivoted** to the live MotoMundo MotoGP
   French GP iframes (motomundo.top/e/<id> + motomundo.upns.xyz/#<id>)
   while the weekend is on. The previous DD12 NASCAR + Acestrlms F1
   targets were both broken JW Player paths anyway.
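
A minimal sketch of the filter-and-allowlist step from item 2, for illustration only. It is not the code in `extractors/`: the function names, the URL regex, the case-insensitive matching, and the substring host check are assumptions, and `_INTERESTING_HOSTS` here holds only the hosts named above, not the full list. The `title`, `selftext`, and `link_flair_text` keys are the fields Reddit returns in its listing JSON.

```python
import re
from urllib.parse import urlparse

# Assumed subset: only the hosts named in this commit message.
_INTERESTING_HOSTS = ("motomundo", "freemotorsports", "pitsport", "rerace", "dd12")

# Assumed pattern: stop at whitespace and at markdown/HTML delimiters.
_URL_RE = re.compile(r"https?://[^\s\)\]>\"']+")


def is_watch_post(post: dict) -> bool:
    """Keep posts whose title looks like a watch thread or whose flair is 'live'."""
    title = (post.get("title") or "").lower()
    flair = (post.get("link_flair_text") or "").strip().lower()
    return "[watch" in title or "watch online" in title or flair == "live"


def interesting_urls(selftext: str) -> list[str]:
    """Pull URLs out of selftext and keep only allowlisted hosts, in order, deduplicated."""
    seen: set[str] = set()
    kept: list[str] = []
    for raw in _URL_RE.findall(selftext or ""):
        url = raw.rstrip(".,;")  # drop trailing sentence punctuation
        host = (urlparse(url).hostname or "").lower()
        # Substring match is a simplification; a stricter check would compare domains.
        if url not in seen and any(name in host for name in _INTERESTING_HOSTS):
            seen.add(url)
            kept.append(url)
    return kept


def extract(posts: list[dict]) -> list[dict]:
    """Turn matching posts into embed-type candidates for the verifier."""
    results = []
    for post in posts:
        if not is_watch_post(post):
            continue
        for url in interesting_urls(post.get("selftext") or ""):
            results.append({"url": url, "type": "embed", "title": post.get("title", "")})
    return results
```

The allowlist is positive on purpose: anything not explicitly listed (YouTube, Discord, gofile) is dropped before it reaches the verifier, so a new host has to be added to the tuple before its links are surfaced.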

State after deploy:
- /streams: 3 verified live (WRC Rally Portugal, NASCAR 24/7, Premier League Darts). Darts is currently active because the UK is mid-match.
- Subreddit extractor surfaces the live MotoMundo URL but the verifier
  marks the WordPress wrapper page playable=False (no top-level <video>
  element; the m3u8 lives in nested iframes). Next iteration: drill the
  verifier into iframe contentDocument and capture from there.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
| Name | Last commit | Date |
| --- | --- | --- |
| extractors | f1-stream: subreddit extractor finds Reddit '[Watch / Download]' threads | 2026-05-07 23:29:34 +00:00 |
| `__init__.py` | [ci skip] f1-stream: add F1 schedule subsystem (Phase 2) | 2026-02-23 22:55:13 +00:00 |
| embed_proxy.py | f1-stream: drop demo + landing-page extractors, add fetch-proxy injection | 2026-05-07 23:29:32 +00:00 |
| health.py | f1-stream: add real F1 stream extractors and iframe player support | 2026-03-01 14:35:19 +00:00 |
| m3u8_rewriter.py | [ci skip] f1-stream: add stream health checker and HLS proxy (Phases 4-5) | 2026-02-23 23:41:16 +00:00 |
| main.py | f1-stream: only show streams confirmed playable by headless browser | 2026-05-07 23:29:31 +00:00 |
| playback_verifier.py | chrome-service: in-cluster headed Chromium pool for f1-stream verifier | 2026-05-07 23:29:32 +00:00 |
| proxy.py | [ci skip] f1-stream: add CDN token refresh, SvelteKit frontend, multi-stream layout (Phases 6-8) | 2026-02-23 23:59:35 +00:00 |
| requirements.txt | f1-stream: only show streams confirmed playable by headless browser | 2026-05-07 23:29:31 +00:00 |
| schedule.py | [ci skip] f1-stream: add F1 schedule subsystem (Phase 2) | 2026-02-23 22:55:13 +00:00 |
| stealth.py | chrome-service: in-cluster headed Chromium pool for f1-stream verifier | 2026-05-07 23:29:32 +00:00 |
| token_refresh.py | [ci skip] f1-stream: add CDN token refresh, SvelteKit frontend, multi-stream layout (Phases 6-8) | 2026-02-23 23:59:35 +00:00 |