f1-stream: subreddit extractor finds Reddit '[Watch / Download]' threads
Two fixes for the previously dormant subreddit extractor, plus a chrome_browser.TARGETS pivot to MotoGP weekend live URLs.
1. **Reddit fetch was 403'd by `Accept: application/json`**. The cluster IP
combined with that header trips Reddit's anti-bot fingerprinting, and
Reddit responds with an HTML 403. Removing the explicit Accept header
(so the client sends its default `*/*`) restores HTTP 200 with JSON.
Confirmed via a direct httpx test from the f1-stream pod.
2. **Search the right things**. The community uses a stable
`[Watch / Download] <Series> <Year> - <Round> | <Event>` post pattern
with selftext links to admin-curated WordPress sites (motomundo.net
for MotoGP, sister sites for F1 when active). New extractor:
- Hits both /new.json and /search.json across r/MotorsportsReplays
and three smaller motorsport subs.
- Keeps posts whose title contains `[watch` or `watch online`, or whose
flair is `live`.
- Extracts URLs from selftext (regex) and keeps only hosts on the
`_INTERESTING_HOSTS` allowlist (motomundo, freemotorsports,
pitsport, rerace, dd12, etc.) so we don't drown the verifier in
YouTube/Discord/gofile links.
- Returns each URL as embed-type so the chrome-service verifier visits it.
3. **chrome_browser.TARGETS pivoted** to the live MotoMundo MotoGP
French GP iframes (motomundo.top/e/<id> + motomundo.upns.xyz/#<id>)
while the weekend is on. The previous DD12 NASCAR + Acestrlms F1
targets were both broken JW Player paths anyway.
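The fetch and filter steps in items 1 and 2 could look roughly like the sketch below. It is not the code from this commit: the function names, the `User-Agent` value, the URL regex, and the short host tuple are assumptions made for illustration. Only the title/flair rules, the idea of a positive host allowlist, and the absence of an explicit `Accept` header come from the message above.

```python
import re
from urllib.parse import urlparse

# Assumed subset of the allowlist; the real _INTERESTING_HOSTS is not shown here.
_INTERESTING_HOSTS = ("motomundo", "freemotorsports", "pitsport", "rerace", "dd12")

# Assumed URL pattern: stops at whitespace, brackets, parentheses, and quotes
# so that markdown links like [text](https://host/path) yield a clean URL.
_URL_RE = re.compile(r"https?://[^\s<>\[\]()\"']+")


def request_headers() -> dict[str, str]:
    """Headers for the Reddit JSON fetch; deliberately no explicit Accept."""
    # The User-Agent value is a placeholder, not taken from the commit.
    return {"User-Agent": "f1-stream-sketch/0.1"}


def is_watch_post(post: dict) -> bool:
    """True when the title or flair matches the rules in the commit message."""
    title = (post.get("title") or "").lower()
    flair = (post.get("link_flair_text") or "").lower()
    return "[watch" in title or "watch online" in title or flair == "live"


def interesting_urls(selftext: str) -> list[str]:
    """Extract URLs from selftext and keep only allowlisted hosts, in order."""
    urls: list[str] = []
    for match in _URL_RE.finditer(selftext or ""):
        url = match.group(0).rstrip(".,;")
        host = (urlparse(url).hostname or "").lower()
        if any(name in host for name in _INTERESTING_HOSTS) and url not in urls:
            urls.append(url)
    return urls


def extract(posts: list[dict]) -> list[dict]:
    """Turn matching posts into embed-type candidates for the verifier."""
    results = []
    for post in posts:
        if not is_watch_post(post):
            continue
        for url in interesting_urls(post.get("selftext") or ""):
            results.append({"type": "embed", "url": url, "title": post.get("title", "")})
    return results
```

Substring matching on the hostname keeps the allowlist short (one entry covers several TLDs of the same site), at the cost of also matching unrelated hosts that happen to contain the same string.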
State after deploy:
- /streams: 3 verified live (WRC Rally Portugal, NASCAR 24/7, Premier League Darts) — Darts is currently active because UK is mid-match.
- Subreddit extractor surfaces the live MotoMundo URL but the verifier
marks the WordPress wrapper page playable=False (no top-level <video>
element; the m3u8 lives in nested iframes). Next iteration: drill the
verifier into iframe contentDocument and capture from there.
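One possible shape for that next iteration is sketched below, under stated assumptions. `FrameSnapshot` is a hypothetical structure standing in for whatever the verifier records per frame; it is not a type from this repository. Page scripts can only read `contentDocument` for same-origin iframes, so a real implementation would probably have to walk the frame tree exposed by the browser automation layer. The sketch shows only the recursive search itself.

```python
from dataclasses import dataclass, field


@dataclass
class FrameSnapshot:
    """Hypothetical record of one frame: its URL, media URLs seen, child frames."""
    url: str
    media_urls: list[str] = field(default_factory=list)
    children: list["FrameSnapshot"] = field(default_factory=list)


def find_playlists(frame: FrameSnapshot, max_depth: int = 4) -> list[str]:
    """Collect m3u8 URLs from a frame and its nested iframes, depth-limited."""
    found: list[str] = []
    stack = [(frame, 0)]
    while stack:
        current, depth = stack.pop()
        found.extend(u for u in current.media_urls if ".m3u8" in u)
        if depth < max_depth:
            # Reversed so children are visited in document order.
            stack.extend((child, depth + 1) for child in reversed(current.children))
    return found


def is_playable(frame: FrameSnapshot) -> bool:
    """A page counts as playable if any frame in its tree exposes an m3u8."""
    return bool(find_playlists(frame))
```

With this rule, a wrapper page that has no top-level `<video>` would still be marked playable as long as one of its nested iframes yields a playlist URL.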
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 3148d15d5a
commit 4a636f3fb7
2 changed files with 147 additions and 77 deletions
@@ -51,23 +51,22 @@ class _Target:
 # whose m3u8 is JS-computed. Add freely — each one takes ~12s to scrape.
 # ---------------------------------------------------------------------------
 TARGETS: tuple[_Target, ...] = (
-    # DD12streams' /nas iframe → /new-nas/jwplayer → JW player setup with
-    # an inline m3u8. The HTML is generated server-side and the URL string
-    # is embedded directly, so this would also work over curl — but the
-    # browser path future-proofs against dd12 starting to compute the URL
-    # in JS.
+    # MotoMundo embed pages — the community-curated WordPress site for
+    # MotoGP. Each /e/<id> URL is one of the iframes their "Watch Online"
+    # post lists for the active session (FP/Q/Race). The m3u8 is
+    # JS-computed at load time so a real browser is required to capture
+    # it. Update IDs each weekend to match the current race; subreddit.py
+    # discovers them from the Reddit "[Watch / Download]" thread.
     _Target(
-        label="DD12Streams",
-        title="NASCAR Cup (24/7) — DD12",
-        url="https://dd12streams.com/nas",
-        settle=10,
+        label="MotoMundo",
+        title="MotoGP Live (MotoMundo) — French GP / Le Mans",
+        url="https://motomundo.top/e/9yzn08jk9py4",
+        settle=15,
     ),
-    # Acestrlms aggregator — always-on F1 page that re-frames pooembed.
-    # Captured m3u8 only appears once the embed's JS runs.
     _Target(
-        label="Acestrlms",
-        title="Sky Sports F1 (24/7) — Acestrlms",
-        url="https://acestrlms.pages.dev/f11/",
+        label="MotoMundo",
+        title="MotoGP Live (MotoMundo upns) — French GP / Le Mans",
+        url="https://motomundo.upns.xyz/#kqasde",
         settle=15,
     ),
 )