# Pitfalls Research

**Domain:** Live stream aggregation and proxy service (F1-focused)
**Researched:** 2026-02-23
**Confidence:** MEDIUM. Findings synthesized from yt-dlp/streamlink source analysis (HIGH), nginx proxy documentation (HIGH), HLS RFC/spec analysis (HIGH), OpenF1 API docs (HIGH), and web searches that returned sparse results (LOW where noted)

---

## Critical Pitfalls

### Pitfall 1: Treating JavaScript-Rendered Tokens as Static

**What goes wrong:**
Stream URLs on sports streaming sites are typically not present in the raw HTML. They are computed client-side by obfuscated JavaScript: the page HTML contains an encrypted or encoded config blob, and the actual HLS URL is assembled by executing that JS. A scraper that fetches the raw HTML page and runs a regex over it finds nothing.

**Why it happens:**
Developers assume "the URL must be somewhere in the page source." They inspect the page in DevTools, see the URL in the network tab, and try to replicate what they observe, but miss that the URL was produced by JS execution, not served in the initial response.

**How to avoid:**
- For each target site: trace the actual network request that fetches the m3u8 in browser DevTools (Network tab, filter by `.m3u8`). Identify the API endpoint that the JS calls to get the signed URL. Replicate *that* API call (often a JSON endpoint), not the page fetch.
- If the token is computed entirely client-side (e.g., via CryptoJS with a hardcoded key), implement the same algorithm in your extractor. Do not run a headless browser; reverse-engineer the JS algorithm instead.
- Document which sites require JS execution vs. which expose a clean API endpoint. Sites often have a backend API that the JavaScript calls; scraping that API is faster and more stable than re-implementing the JS.

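
The first bullet above (replicate the API call, not the page fetch) can be sketched roughly as follows. This is a minimal illustration, not any real site's API: `API_URL`, the `Referer` value, and the `stream_url` field name are all hypothetical placeholders for whatever the Network tab reveals on a given target site.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoint and page URL; replace with what DevTools shows per site.
API_URL = "https://example.invalid/api/stream?channel=f1"
PAGE_URL = "https://example.invalid/watch/f1"


def http_get_json(url: str, referer: str) -> dict:
    """Fetch a JSON document, sending headers similar to the site's own JS."""
    request = urllib.request.Request(
        url,
        headers={
            "Referer": referer,
            "User-Agent": "Mozilla/5.0",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_stream_url(
    fetch_json: Callable[[str, str], dict] = http_get_json,
) -> str:
    """Call the JSON endpoint directly and return the signed m3u8 URL."""
    data = fetch_json(API_URL, PAGE_URL)
    url = data.get("stream_url")  # assumed field name, differs per site
    if not isinstance(url, str) or ".m3u8" not in url:
        # Fail loudly instead of returning an empty result.
        raise ValueError(f"no m3u8 URL in API response; keys: {sorted(data)}")
    return url
```

Passing the fetcher as a parameter keeps the extractor testable without network access: a test can supply a function that returns a canned response.
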
**Warning signs:**
- Extractor returns empty results on a page you can watch in the browser
- Network tab shows the m3u8 URL appearing only after JavaScript fires an XHR/fetch call
- Page HTML contains a large base64 blob or heavily obfuscated JS variable (e.g., `var _0x1a2b = [...]`)

**Phase to address:**
Extractor design phase (before writing a single extractor). Establish upfront: for each target site, determine if raw HTTP fetch is sufficient or if API reverse-engineering is required.

---

### Pitfall 2: m3u8 Segment URLs Break When Proxied Through a Different Domain

**What goes wrong:**
When you fetch an m3u8 playlist and serve it through your proxy, the segment URLs inside the playlist may be absolute URLs pointing to the original CDN. The browser (or HLS.js) follows those segment URLs directly, bypassing your proxy entirely. This means you cannot control access, cannot inject headers, and CORS blocks the segments if the CDN doesn't allow cross-origin requests from your frontend domain.

**Why it happens:**
HLS playlists can contain either absolute URLs (`https://cdn.example.com/seg001.ts`) or relative paths (`seg001.ts`). Most streaming CDNs use absolute URLs with signed tokens. Proxying only the m3u8 is insufficient: every segment URL must also be rewritten to route through your relay.

**How to avoid:**
- When serving the m3u8 through your proxy, **rewrite all segment URLs** to point to your relay endpoint before sending the playlist to the client. Example: replace `https://cdn.site.com/segment001.ts?token=xyz` with `https://your-relay.domain/proxy/segment?url=`.
- Your relay endpoint then fetches the original segment and streams it to the client.
- Handle multi-level playlists: master playlists (variant streams) reference child playlists which reference segments, so rewrite at each level.

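
The rewriting step described above could look something like the sketch below. It assumes the relay accepts the original URL percent-encoded in a `url` query parameter; `RELAY_PREFIX` and `rewrite_playlist` are illustrative names, not part of any library.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical relay endpoint; the original URL is appended percent-encoded.
RELAY_PREFIX = "https://your-relay.domain/proxy/segment?url="

# Tags such as EXT-X-KEY, EXT-X-MAP and EXT-X-MEDIA carry URIs in an attribute.
_URI_ATTR = re.compile(r'URI="([^"]+)"')


def rewrite_playlist(
    playlist: str, playlist_url: str, relay_prefix: str = RELAY_PREFIX
) -> str:
    """Return the playlist with every URI routed through the relay."""

    def via_relay(uri: str) -> str:
        # Resolve relative paths against the playlist's own URL first.
        absolute = urljoin(playlist_url, uri)
        return relay_prefix + quote(absolute, safe="")

    rewritten = []
    for line in playlist.splitlines():
        stripped = line.strip()
        if not stripped:
            rewritten.append(line)
        elif stripped.startswith("#"):
            # Tag line: rewrite only URI="..." attributes, keep the rest.
            rewritten.append(
                _URI_ATTR.sub(lambda m: f'URI="{via_relay(m.group(1))}"', line)
            )
        else:
            # Any non-tag, non-blank line is a segment or child playlist URI.
            rewritten.append(via_relay(stripped))
    return "\n".join(rewritten) + "\n"
```

Because child playlists in a master playlist are also plain URI lines, the same function covers both levels, provided the relay applies it again whenever the upstream response is itself a playlist rather than a media segment.
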
**Warning signs:**
- Client-side CORS errors in browser console referencing the original CDN domain
- Network tab shows segment fetches bypassing your proxy after the m3u8 loads
- Some quality variants play but others don't (partial rewriting)

**Phase to address:**
Stream relay/proxy phase. Must be designed before the first end-to-end stream test.

---

### Pitfall 3: CDN-Signed Token URLs Expire Mid-Stream

**What goes wrong:**
Many CDNs sign stream URLs with a short-lived token (often 5–30 minutes). The m3u8 playlist URL itself may be signed, and so may each segment URL. A user who starts watching near the token expiry time will get a working stream for the first few segments, then receive 403 Forbidden errors as the token expires mid-playback.

**Why it happens:**
Developers test stream extraction, confirm the URL works, and ship it without accounting for token TTL. The token was valid at extraction time but expires before or during playback. Live streams compound this: the m3u8 playlist updates every few seconds and each update may contain newly signed segment URLs. If your relay cached an old playlist, segments within it are expired.

**How to avoid:**
- Never cache m3u8 playlist files. Always fetch the live playlist from upstream on each client request. Cache only TS/m4s segments (which have longer or no expiry).
- When extracting the initial stream URL, record the extraction timestamp and the token TTL (if discoverable from response headers like `Cache-Control: max-age=N`). Re-extract before expiry.
- Implement a background refresh: when serving a stream, periodically re-run the extractor to get a fresh URL and pivot the relay to the new upstream without interrupting the client.
- Test expiry by extracting a URL and waiting 30 minutes before playing; a failure here reveals token TTL issues.

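
One way to implement "record the extraction timestamp and re-extract before expiry" is sketched below. Note that this is a lazy variant that refreshes on access rather than the background loop described above, and that `RefreshingStreamUrl`, `ttl_seconds`, and `safety_margin` are names invented for the example. The TTL itself is assumed to be known or estimated per site.

```python
import time
from typing import Callable, Optional


class RefreshingStreamUrl:
    """Holds an extracted URL and re-runs the extractor shortly before expiry."""

    def __init__(
        self,
        extract: Callable[[], str],
        ttl_seconds: float,
        safety_margin: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._extract = extract
        self._ttl = ttl_seconds
        self._margin = safety_margin
        self._clock = clock
        self._url: Optional[str] = None
        self._extracted_at = 0.0

    def get(self) -> str:
        """Return a URL whose token should still be valid."""
        now = self._clock()
        age = now - self._extracted_at
        if self._url is None or age >= self._ttl - self._margin:
            # Token is missing or close to expiry: extract a fresh URL.
            self._url = self._extract()
            self._extracted_at = now
        return self._url
```

The relay would call `get()` each time it fetches the upstream playlist, so the switch to a freshly signed URL happens between playlist requests and the client never sees the change. The injectable `clock` exists so expiry behavior can be tested without actually waiting.
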
**Warning signs:**
- Stream plays for exactly N minutes then fails with 403 on segments
- Extracting a URL works in isolation but fails when embedded in the player after a delay
- Segment URLs carry `?token=` or `?sig=` parameters (common on sites behind Cloudflare or a custom CDN)

**Phase to address:**
Stream relay phase. The relay architecture must include a URL refresh loop from day one.

---

### Pitfall 4: Per-Site Extractor Maintenance Burden Is Dramatically Underestimated

**What goes wrong:**
Each target site is a custom engineering problem. Sites change their HTML structure, JavaScript obfuscation, API endpoints, or anti-bot measures without notice. A working extractor can break silently overnight. With 5 target sites, you effectively have 5 separate maintenance tracks. Expecting "set and forget" behavior is the most common planning mistake in this domain.

**Why it happens:**
Developers build an extractor, it works, and they move on. Sites then deploy a CDN update, change their frontend framework, or rotate their obfuscation keys. The failure is silent: no exception is raised, the extractor just returns no URL, and the user sees an empty player.

**How to avoid:**
- Build a health check system that runs each extractor on a schedule (every 15 minutes during race weekends, every hour otherwise), logs success/failure, and triggers alerts on failure.
- Design extractors with failure visibility: log exactly which step failed (page fetch, URL parse, API call, etc.) so debugging is fast.
- Keep extractor logic isolated and testable: each extractor is a module that takes no inputs and returns a stream URL or raises an exception. Run integration tests against live sites on a schedule.
- Plan 1–2 hours of maintenance per extractor per month as baseline, more during site redesigns.

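
The extractor contract and health check described above might be sketched as follows. `ExtractorError`, `HealthResult`, and `check_extractors` are hypothetical names for this illustration; scheduling and alert delivery are left out and would sit on top of the returned results.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

log = logging.getLogger("extractor_health")


class ExtractorError(Exception):
    """Raised by an extractor, naming the step that failed."""

    def __init__(self, step: str, detail: str) -> None:
        super().__init__(f"{step}: {detail}")
        self.step = step


@dataclass
class HealthResult:
    site: str
    ok: bool
    failed_step: Optional[str]
    elapsed_s: float


def check_extractors(
    extractors: Dict[str, Callable[[], str]],
) -> List[HealthResult]:
    """Run every extractor once and record which step failed, if any."""
    results = []
    for site, extract in extractors.items():
        started = time.monotonic()
        failed_step: Optional[str] = None
        try:
            url = extract()
            if not url:
                # Treat a silent empty result as a failure, not a success.
                raise ExtractorError("url_parse", "extractor returned no URL")
        except ExtractorError as exc:
            failed_step = exc.step
            log.error("extractor %s failed at %s: %s", site, exc.step, exc)
        except Exception:
            failed_step = "unexpected"
            log.exception("extractor %s crashed", site)
        elapsed = time.monotonic() - started
        results.append(HealthResult(site, failed_step is None, failed_step, elapsed))
    return results
```

A scheduler (cron, a background task, or similar) would call `check_extractors` at the chosen interval and raise an alert for any result where `ok` is false.
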
**Warning signs:**
- No automated testing of extractors against live sites
- Extractor code tightly coupled to specific HTML element IDs or class names (breaks on any frontend change)
- No alerting when an extractor returns no URL

**Phase to address:**
Extractor design phase. The monitoring/health-check system must be built alongside the first extractor, not added later.

---

### Pitfall 5: Missing or Incorrect CORS Headers on the Relay Breaks Browser Playback

**What goes wrong:**
HLS.js in the browser makes cross-origin requests to fetch m3u8 playlists and segment files. If your relay doesn't serve the correct CORS headers, every segment request fails with a CORS error. Even a single missing header (e.g., on `.ts` segment responses but not `.m3u8` responses) breaks the stream.

**Why it happens:**
Developers test the relay with `curl` or server-to-server calls, where CORS is irrelevant. The relay works in isolation but fails when the browser's HLS.js player makes the requests.

**How to avoid:**
- Set CORS headers on **all** relay endpoints: `Access-Control-Allow-Origin: *` (or your specific frontend domain), `Access-Control-Allow-Methods: GET, HEAD, OPTIONS`, `Access-Control-Allow-Headers: Range`.
- The `Range` header is critical: HLS.js often sends range requests for segments. If `Range` is not in `Allow-Headers`, preflight OPTIONS requests fail.
- Do not use wildcard `*` if you also send `Access-Control-Allow-Credentials: true`; that combination is invalid and browsers reject it.
- Test from the actual browser environment (or use a CORS testing tool) before calling any relay endpoint "done."

**Warning signs:**
- Browser console shows `No 'Access-Control-Allow-Origin' header` errors
- Streams work when loaded directly in a `