docs: complete project research
parent 89a6e08245
commit 36fd424107
5 changed files with 1356 additions and 0 deletions

.planning/research/ARCHITECTURE.md (new file, 434 lines)

# Architecture Research

**Domain:** Live stream aggregation and proxy service (F1 streaming)
**Researched:** 2026-02-23
**Confidence:** MEDIUM — HLS spec and proxy mechanics are HIGH confidence from RFC 8216 and Apple docs; extractor patterns are MEDIUM confidence from yt-dlp/streamlink analysis; system composition for this specific use-case is inferred from domain knowledge.

---

## Standard Architecture

### System Overview
```
┌───────────────────────────────────────────────────────────────────┐
│                           CLIENT LAYER                            │
│  ┌──────────────────────────────────────────────────────────┐     │
│  │ Svelte Frontend (schedule view, stream picker, player)   │     │
│  └────────────────────────────┬─────────────────────────────┘     │
└───────────────────────────────│───────────────────────────────────┘
                                │ HTTP/REST
┌───────────────────────────────▼───────────────────────────────────┐
│                             API LAYER                             │
│  ┌──────────────────────────────────────────────────────────┐     │
│  │ Backend API (schedule, streams, health state)            │     │
│  └────────┬──────────────────┬──────────────────────────────┘     │
└───────────│──────────────────│────────────────────────────────────┘
            │                  │
            ▼                  ▼
┌───────────────────┐  ┌──────────────────────────────────────────┐
│    SCHEDULE       │  │            EXTRACTION LAYER              │
│    SUBSYSTEM      │  │  ┌───────────┐  ┌───────────┐            │
│                   │  │  │ Extractor │  │ Extractor │  ...       │
│  Jolpica/OpenF1   │  │  │  Site A   │  │  Site B   │            │
│  API client       │  │  └─────┬─────┘  └─────┬─────┘            │
│                   │  │        │              │                  │
│  Cron: refresh    │  │  ┌─────▼──────────────▼───────────────┐  │
│  schedule         │  │  │ Extractor Registry / Dispatcher    │  │
└───────────────────┘  │  └─────────────────────┬──────────────┘  │
                       │                        │                 │
                       │  ┌─────────────────────▼──────────────┐  │
                       │  │ Stream Health Checker              │  │
                       │  │ (HEAD/partial GET on .m3u8 URLs)   │  │
                       │  └────────────────────────────────────┘  │
                       └──────────────────────────────────────────┘
                                        │
                                        ▼ valid stream URLs
                       ┌──────────────────────────────────────────┐
                       │               PROXY LAYER                │
                       │                                          │
                       │  Master Playlist Rewriter                │
                       │  ┌────────────────────────────────────┐  │
                       │  │ GET /proxy?url=<encoded-m3u8>      │  │
                       │  │ → fetch upstream m3u8              │  │
                       │  │ → rewrite all URIs to proxy paths  │  │
                       │  │ → return modified playlist         │  │
                       │  └────────────────────────────────────┘  │
                       │                                          │
                       │  Segment Relay                           │
                       │  ┌────────────────────────────────────┐  │
                       │  │ GET /relay?url=<encoded-segment>   │  │
                       │  │ → upstream fetch with headers      │  │
                       │  │ → pipe response to client          │  │
                       │  └────────────────────────────────────┘  │
                       └──────────────────────────────────────────┘
                                        │
                                        ▼ piped bytes
                       ┌──────────────────────────────────────────┐
                       │             STORAGE / CACHE              │
                       │  ┌─────────────────┐  ┌───────────────┐  │
                       │  │ In-memory cache │  │ NFS mount     │  │
                       │  │ (stream links,  │  │ (schedule     │  │
                       │  │  health status) │  │  snapshots,   │  │
                       │  └─────────────────┘  │  config)      │  │
                       │                       └───────────────┘  │
                       └──────────────────────────────────────────┘
```

---

### Component Responsibilities

| Component | Responsibility | Typical Implementation |
|-----------|----------------|------------------------|
| **Svelte Frontend** | Schedule display, stream picker UI, embedded HLS player | SvelteKit app; hls.js or Video.js for player |
| **Backend API** | Serves schedule, current stream list, health status to frontend | Python (FastAPI) or Node.js; REST endpoints |
| **Schedule Subsystem** | Polls Jolpica/OpenF1 API, normalises session data, stores locally | Async background task with cron interval |
| **Extractor Registry** | Maps site hostnames to extractor implementations; dispatches extraction | Plain dict/map of site-key → extractor class |
| **Per-Site Extractor** | Performs HTTP requests with session cookies/CSRF, parses HTML/JS, follows redirect chains, returns raw stream URL | Python class per site; uses `httpx`/`requests` + `BeautifulSoup`/regex |
| **Stream Health Checker** | Verifies extracted URLs are live (partial GET on m3u8, checks HTTP 200 + content-type) | Background poller; marks streams up/down in cache |
| **Proxy / Playlist Rewriter** | Fetches upstream m3u8, rewrites all embedded URIs to go through `/relay`, returns modified playlist | Stateless HTTP handler; no buffering of media data |
| **Segment Relay** | Fetches upstream `.ts`/`.fmp4` segments and pipes bytes to client; forwards necessary headers | Streaming HTTP proxy (not buffered); forwards Range, Content-Type |
| **In-Memory Cache** | Stores current stream states and health, avoids redundant extraction on every client request | Python dict with TTL, or Redis (existing cluster Redis) |
| **NFS Storage** | Persists schedule snapshots, extractor configuration, optional diagnostics | NFS at `10.0.10.15` via existing pattern |

---

## Recommended Project Structure

```
f1-streams/
├── backend/
│   ├── api/
│   │   ├── routes/
│   │   │   ├── schedule.py        # GET /schedule
│   │   │   ├── streams.py         # GET /streams, POST /streams/refresh
│   │   │   └── proxy.py           # GET /proxy, GET /relay
│   │   └── main.py                # FastAPI app, lifespan hooks
│   ├── extractors/
│   │   ├── base.py                # Extractor ABC: extract() -> list[StreamInfo]
│   │   ├── registry.py            # Map site-key -> extractor class
│   │   ├── site_a.py              # Site-A specific extractor
│   │   └── site_b.py              # Site-B specific extractor
│   ├── schedule/
│   │   ├── client.py              # Jolpica/OpenF1 API client
│   │   ├── models.py              # Session, Race pydantic models
│   │   └── poller.py              # Background cron task
│   ├── health/
│   │   └── checker.py             # Stream liveness verification
│   ├── proxy/
│   │   ├── playlist.py            # m3u8 fetch + URI rewriting
│   │   └── relay.py               # Segment pipe-through handler
│   ├── cache.py                   # In-memory store with TTL
│   └── config.py                  # Site list, polling intervals, NFS paths
├── frontend/
│   ├── src/
│   │   ├── routes/
│   │   │   ├── +page.svelte       # Schedule home
│   │   │   └── watch/
│   │   │       └── +page.svelte   # Stream picker + player
│   │   ├── lib/
│   │   │   ├── api.ts             # Backend API client
│   │   │   ├── player.ts          # hls.js wrapper
│   │   │   └── schedule.ts        # Session time formatting
│   │   └── app.html
│   ├── static/
│   └── package.json
├── stacks/
│   └── f1-streams/
│       ├── main.tf
│       └── terragrunt.hcl
└── Dockerfile                     # Multi-stage: backend + frontend
```

### Structure Rationale

- **backend/extractors/**: One file per site; the base class enforces the interface. Adding a new site = add one file + register it. No change to core code.
- **backend/proxy/**: Isolated from extraction. The proxy only knows about URLs — it does not care how they were found.
- **backend/schedule/**: Completely independent subsystem. Can fail without breaking stream delivery.
- **backend/health/**: Decoupled checker; stores results in the cache, consulted by the API on `/streams` requests.
- **frontend/**: Standard SvelteKit layout. Minimal — schedule + player, nothing else.
- **stacks/f1-streams/**: Single Terragrunt stack following the existing pattern in the repo.

---

## Architectural Patterns

### Pattern 1: Extractor Plugin Interface

**What:** Each site extractor implements a fixed interface (`extract(session_hint) -> list[StreamURL]`). The registry maps site keys to extractor classes. The dispatcher iterates the registry, calls each extractor, and aggregates the results.

**When to use:** Always — the number of sites will grow, and their anti-scraping measures change independently. Isolation prevents one broken extractor from affecting others.

**Trade-offs:** Slightly more boilerplate per site, but each extractor is testable in isolation and replaceable without touching shared code.

**Example:**

```python
from abc import ABC, abstractmethod

class BaseExtractor(ABC):
    site_key: str  # e.g. "siteA"

    @abstractmethod
    async def extract(self, hint: SessionHint | None = None) -> list[StreamURL]:
        """Return list of live stream URLs found on this site."""
        ...

class SiteAExtractor(BaseExtractor):
    site_key = "siteA"

    async def extract(self, hint=None) -> list[StreamURL]:
        # 1. GET page, parse CSRF token from HTML
        # 2. POST with token to get obfuscated JSON
        # 3. Decode JS-obfuscated URL
        # 4. Follow redirects to final .m3u8
        ...
```
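The dispatcher half of the pattern can be sketched as below — a minimal illustration, not the project's actual API. The module-level registry dict, the `register` decorator, and the use of `return_exceptions=True` for failure isolation are all assumptions about one reasonable way to implement the description above:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

REGISTRY: dict[str, type] = {}  # site_key -> extractor class (illustrative)

def register(cls):
    """Class decorator: add an extractor class to the registry by its site_key."""
    REGISTRY[cls.site_key] = cls
    return cls

async def extract_all(hint=None) -> list:
    """Fan out to every registered extractor concurrently; isolate failures."""
    extractors = [cls() for cls in REGISTRY.values()]
    results = await asyncio.gather(
        *(e.extract(hint) for e in extractors), return_exceptions=True
    )
    streams: list = []
    for extractor, result in zip(extractors, results):
        if isinstance(result, Exception):
            # a broken site must not take down the whole extraction run
            logger.warning("extractor %s failed: %s", extractor.site_key, result)
        else:
            streams.extend(result)
    return streams
```

The dispatcher is also the single place that writes aggregated results to the cache, which keeps individual extractors free of caching concerns.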

### Pattern 2: Playlist Rewriting Proxy

**What:** The proxy layer fetches the upstream m3u8 and rewrites every URL inside it (both master → variant pointers and variant → segment pointers) to point back through `/relay?url=<base64-encoded-original>`. The client never contacts upstream directly.

**When to use:** Always when proxying HLS — the player follows the URLs in the playlist; if those URLs point at the origin CDN, the proxy is bypassed for segment delivery.

**Trade-offs:** Adds ~1 hop of latency per segment request. For a private service with 1-5 users, this is negligible. Benefits: hides the origin, enables header injection (e.g., `Referer`), and gives a unified player experience.

**Example:**

```python
import urllib.parse
from base64 import urlsafe_b64encode

def rewrite_playlist(m3u8_text: str, base_url: str, proxy_base: str) -> str:
    """Rewrite all URIs in an m3u8 to go through the proxy relay endpoint."""
    lines = []
    for line in m3u8_text.splitlines():
        if line and not line.startswith("#"):
            # resolve relative URL, then encode it for the proxy
            absolute = urllib.parse.urljoin(base_url, line)
            encoded = urlsafe_b64encode(absolute.encode()).decode()
            lines.append(f"{proxy_base}/relay?url={encoded}")
        else:
            # NB: URIs embedded in tags (#EXT-X-KEY, #EXT-X-MAP) are not
            # handled here and need the same treatment in a full rewriter
            lines.append(line)
    return "\n".join(lines)
```
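On the `/relay` side the handler must invert that encoding before fetching upstream. A minimal stdlib sketch of the pair — function names are illustrative, not part of the codebase:

```python
from base64 import urlsafe_b64decode, urlsafe_b64encode

def encode_upstream(url: str) -> str:
    """Encode an upstream URL for use as a /relay?url=... query value."""
    return urlsafe_b64encode(url.encode()).decode()

def decode_upstream(token: str) -> str:
    """Recover the original upstream URL inside the /relay handler."""
    return urlsafe_b64decode(token.encode()).decode()
```

URL-safe base64 matters here: standard base64 uses `+` and `/`, which would need an extra layer of percent-encoding to survive a query string.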

### Pattern 3: Background Polling with In-Memory Cache

**What:** Extraction and health checking run as background tasks on a schedule (e.g., every 2 minutes). Results are stored in a shared in-memory dict with timestamps. The API layer reads from the cache and returns immediately — no per-request extraction.

**When to use:** Always — on-demand extraction per client request would be slow (2-10 s per site) and would hammer the source sites.

**Trade-offs:** Cache staleness window (default 2 min). Acceptable for live sports: streams stay stable once live.

**Example:**

```python
# cache.py
import time

TTL_SECONDS = 120  # matches the 2-minute refresh interval

_stream_cache: dict[str, CachedResult] = {}

def cache_is_fresh(key: str = "streams") -> bool:
    entry = _stream_cache.get(key)
    return entry is not None and (time.monotonic() - entry.stored_at) < TTL_SECONDS

async def get_streams() -> list[StreamURL]:
    if cache_is_fresh():
        return _stream_cache["streams"].data
    # stale: trigger a background refresh (and serve last-known data meanwhile)
    ...
```

---

## Data Flow

### Stream Discovery Flow (background)

```
[Cron trigger: every 2 min]
        ↓
[Extractor Registry]
        ↓ (fan-out, concurrent)
[SiteA Extractor]  [SiteB Extractor]  [SiteN Extractor]
        ↓
[Raw stream URLs: list of .m3u8 candidates]
        ↓
[Health Checker: partial GET on each URL]
        ↓ (filter: only HTTP 200 + video/mpegURL content-type)
[Validated stream URLs]
        ↓
[Cache: store with timestamp + site metadata]
```
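The filter step above can be sketched as pure decision logic, separated from the HTTP fetch itself. This is a sketch under assumptions — the accepted content-types and the `#EXTM3U` check are heuristics, not a fixed policy (the playlist-header requirement does come from RFC 8216):

```python
# checker.py — liveness heuristic; accepted content-types are an assumption
VALID_TYPES = (
    "application/vnd.apple.mpegurl",
    "application/x-mpegurl",
    "audio/mpegurl",
)

def looks_alive(status: int, content_type: str, body_head: str) -> bool:
    """Decide liveness from the result of a partial GET on a .m3u8 URL."""
    if status != 200:
        return False
    # content-type may carry parameters, e.g. "...; charset=utf-8"
    if content_type.lower().split(";")[0].strip() not in VALID_TYPES:
        return False
    # every HLS playlist must start with the #EXTM3U header (RFC 8216)
    return body_head.lstrip().startswith("#EXTM3U")
```

The background poller would perform the partial GET (e.g., with `httpx` and a small read limit), pass the result through `looks_alive`, and mark the stream up/down in the cache.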

### Client Playback Flow (per request)

```
[User opens /watch in browser]
        ↓
[Frontend GET /api/streams]
        ↓
[Backend reads cache → returns stream list (site, quality, label)]
        ↓
[User picks a stream]
        ↓
[Player requests: GET /proxy?url=<m3u8-url>]
        ↓
[Backend: fetch upstream m3u8, rewrite URIs → return modified m3u8]
        ↓
[Player follows variant playlist: GET /proxy?url=<variant-m3u8>]
        ↓
[Backend: rewrite segment URIs]
        ↓
[Player fetches segments: GET /relay?url=<segment>]
        ↓
[Backend: upstream fetch, pipe bytes → client]
        ↓
[Video plays in browser]
```

### Schedule Flow

```
[Cron: daily or on-demand]
        ↓
[Schedule Client: GET Jolpica API /ergast/f1/current.json]
        ↓
[Parse: races, session types, UTC timestamps]
        ↓
[Normalise: map to internal Session model]
        ↓
[Store: NFS JSON file + in-memory cache]
        ↓
[Frontend GET /api/schedule → displays session list]
```
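The parse/normalise steps above can be sketched as follows. The `MRData.RaceTable.Races` shape and the `date`/`time` field format follow the public Ergast-compatible schema, but the model fields and function name are illustrative assumptions, not the project's actual code:

```python
# schedule/client.py — parse sketch for an Ergast-compatible current.json payload
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Session:
    round: int
    name: str
    starts_at: datetime  # always UTC

def parse_current_season(payload: dict) -> list[Session]:
    """Map the Ergast-style MRData payload to internal Session models."""
    races = payload["MRData"]["RaceTable"]["Races"]
    sessions = []
    for race in races:
        # date + time are UTC fields, e.g. "2026-03-08" and "04:00:00Z"
        naive = datetime.fromisoformat(f"{race['date']}T{race['time'].rstrip('Z')}")
        sessions.append(
            Session(
                round=int(race["round"]),
                name=race["raceName"],
                starts_at=naive.replace(tzinfo=timezone.utc),
            )
        )
    return sessions
```

The same normalised `Session` objects would be serialised to the NFS JSON snapshot and loaded back on pod restart.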

### Key Data Flows

1. **Extraction → Cache → API → Frontend**: All stream data originates from extractors, flows through the cache as the single source of truth, and is served read-only to the frontend. No frontend-triggered extraction.
2. **Client → Proxy → Upstream CDN**: The proxy is a pure pass-through relay. It does not store segments; bytes from upstream go directly to the client socket.
3. **Schedule API → NFS**: Schedule data is written to NFS on refresh so the pod can serve it immediately on restart without waiting for the next API poll.

---

## Component Boundaries

| Component | Owns | Does Not Own |
|-----------|------|--------------|
| Extractor (per site) | How to get a stream URL from that site | Health checking, caching, proxying |
| Health Checker | Liveness state of each URL | How the URL was found |
| Proxy / Relay | Rewriting m3u8 URIs, piping bytes | Authentication with upstream (that's the extractor's job) |
| Schedule Subsystem | F1 session calendar data | Stream availability for a given session |
| Backend API | Serving current state to the frontend | Fetching or refreshing state |
| Frontend | User interaction, player | Any backend logic |

---

## Suggested Build Order (Phase Dependencies)

The dependencies flow strictly upward — each layer depends only on the layer below it being stable:

```
Phase 1: Schedule Subsystem
        ↓ (F1 data available)
Phase 2: Extractor Framework + First Site Extractor
        ↓ (raw URLs available)
Phase 3: Health Checker
        ↓ (validated URLs available)
Phase 4: Proxy / Relay Layer
        ↓ (streams playable through service)
Phase 5: Frontend (schedule + player)
        ↓ (end-to-end usable)
Phase 6: Additional Site Extractors
        ↓ (stream coverage widened)
Phase 7: K8s Deployment (Terraform/Terragrunt stack)
```

**Rationale:**

- Schedule first: gives a testable data source with zero anti-scraping complexity.
- Extractor framework before specific sites: the base class and registry must exist before any site can plug in.
- Health checker before proxy: no point proxying dead streams; the checker filters the list fed to the proxy.
- Proxy before frontend: the frontend player needs a working `/proxy` endpoint to function.
- Frontend last of the core: all backend components are independently testable via curl/httpie before a UI exists.
- Additional extractors after the core works: adding more sites is low-risk incremental work once the pattern is proven.
- Deployment last: deploy once the service works end-to-end locally; this avoids debugging infra and app simultaneously.

---

## Anti-Patterns

### Anti-Pattern 1: On-Demand Extraction Per Client Request

**What people do:** Trigger extraction when the user clicks "show streams" in the browser.

**Why it's wrong:** Extraction takes 2-10 seconds per site (HTTP round trips, JS parsing, redirect following). With multiple sites, this is 10-30 seconds of wall time. Source sites may rate-limit aggressive bursts. Multiple concurrent users would multiply the load.

**Do this instead:** Run extraction on a background schedule. Cache the results. The API returns immediately from the cache, and the user sees streams in <100 ms.

### Anti-Pattern 2: Single Extractor Handles All Sites

**What people do:** One big function with `if site == "A": ... elif site == "B": ...` branches.

**Why it's wrong:** Sites change their obfuscation methods independently. A change to Site A's extraction logic can accidentally break Site B. Testing in isolation is impossible. Adding Site C requires modifying a shared file.

**Do this instead:** One class per site, implementing a common interface. Changes to Site A's extractor never touch Site B's code.

### Anti-Pattern 3: Buffering Segments in Memory Before Sending

**What people do:** Download the entire `.ts` segment to memory, then serve it to the client.

**Why it's wrong:** HLS segments can be 2-10 MB each. With multiple concurrent viewers, memory pressure grows quickly. It also introduces unnecessary latency: the client waits for the full download before the first byte arrives.

**Do this instead:** Pipe bytes from the upstream response directly to the client socket as they arrive (chunked transfer). The client starts receiving immediately and memory stays flat.
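The piping discipline can be sketched independently of any HTTP framework. In the sketch below, `upstream_iter` stands in for a byte stream from an HTTP client (e.g., the chunks an async client yields while reading the upstream response); in FastAPI, such a generator would typically back a `StreamingResponse`. The chunk size is an assumption:

```python
# relay.py — pipe-through sketch; never accumulates a full segment in memory
from typing import Iterable, Iterator

def pipe(upstream_iter: Iterable[bytes], chunk_limit: int = 64 * 1024) -> Iterator[bytes]:
    """Yield upstream bytes onward as they arrive, re-chunking oversized reads."""
    for chunk in upstream_iter:
        # bound per-viewer memory even if upstream hands us a huge read
        for i in range(0, len(chunk), chunk_limit):
            yield chunk[i : i + chunk_limit]
```

The key property is that the generator holds at most one chunk at a time — total memory is O(chunk size × concurrent viewers), not O(segment size × concurrent viewers).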

### Anti-Pattern 4: Hardcoding Site URLs and Tokens in Extractor Logic

**What people do:** Hardcode `BASE_URL = "https://site-a.example.com"` and referer/cookie values inside the extractor file.

**Why it's wrong:** Sites change domains and anti-scraping parameters frequently. When a site moves, you have to find and edit code rather than config.

**Do this instead:** The extractor reads its config (base URL, required headers, any known static tokens) from a config object injected at construction. The registry passes config to extractors at instantiation.
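A minimal sketch of that injection, with illustrative field names rather than a fixed schema:

```python
# config injection sketch — SiteConfig fields are assumptions, not a fixed schema
from dataclasses import dataclass, field

@dataclass
class SiteConfig:
    base_url: str
    headers: dict = field(default_factory=dict)       # e.g. Referer/Origin
    static_tokens: dict = field(default_factory=dict)  # known long-lived tokens

class SiteAExtractor:
    site_key = "siteA"

    def __init__(self, config: SiteConfig):
        # everything site-specific comes from config; nothing is hardcoded
        self.config = config

    def page_url(self, path: str) -> str:
        """Build an absolute URL against the configured base."""
        return f"{self.config.base_url.rstrip('/')}/{path.lstrip('/')}"
```

When a site moves domains, only the config entry changes; the extractor code is untouched.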
---

## Integration Points

### External Services

| Service | Integration Pattern | Notes |
|---------|---------------------|-------|
| Jolpica F1 API (`api.jolpi.ca/ergast/f1/`) | REST GET, poll daily | No API key required; backwards-compatible Ergast endpoints; schedule data available |
| OpenF1 API (`api.openf1.org/`) | REST GET, poll as needed | No API key; 3 req/s rate limit; 2023+ data only; useful for session status (live/upcoming) |
| Upstream streaming sites (Site A, B, N) | HTTP GET/POST with session cookies, CSRF tokens | Per-site; no shared pattern; treated as black boxes by the framework |
| Upstream CDN (HLS segments) | HTTP GET with Range support | Proxy relays bytes; must forward `Referer` and sometimes `Origin` headers or the CDN rejects requests |
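The OpenF1 3 req/s limit is worth enforcing client-side rather than relying on server rejections. A small interval-spacing limiter sketch — one assumed way to respect the limit, not a prescribed design:

```python
import asyncio
import time

class RateLimiter:
    """Space calls at least `interval` seconds apart (3 req/s -> interval=1/3)."""

    def __init__(self, interval: float):
        self.interval = interval
        self._lock = asyncio.Lock()   # serialises callers
        self._last = 0.0              # monotonic time of the previous call

    async def wait(self) -> None:
        async with self._lock:
            now = time.monotonic()
            delay = self._last + self.interval - now
            if delay > 0:
                await asyncio.sleep(delay)
            self._last = time.monotonic()
```

Each OpenF1 request would `await limiter.wait()` before firing; since the poller is the only caller, this keeps the service comfortably under the limit.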

### Internal Boundaries

| Boundary | Communication | Notes |
|----------|---------------|-------|
| Dispatcher → Cache | Direct function call (write) | Extractors do not write to the cache themselves — the dispatcher aggregates results, then writes once |
| API → Cache | Direct read | Synchronous, O(1) |
| API → Proxy | Not direct — the frontend calls the `/proxy` endpoint, which is part of the same backend process | Can be split into a separate service later if needed |
| Proxy → Upstream CDN | Outbound HTTP | Must preserve session headers; the upstream CDN may check Referer/Origin |
| Schedule Poller → NFS | File write (JSON) | On pod restart, reads NFS before the first API poll |

---

## Scaling Considerations

This is a single-user or small-group private service. Scaling is not a primary concern, but these are the natural pressure points:

| Scale | Architecture Adjustments |
|-------|--------------------------|
| 1-5 concurrent viewers | Single backend pod, in-memory cache, direct pipe relay — fully sufficient |
| 10-20 concurrent viewers | Same architecture; the segment relay becomes the bandwidth bottleneck (each viewer streams independently) — add an HLS caching proxy (nginx) in front of the relay |
| 50+ concurrent viewers | Segment relay load grows linearly; consider a CDN or caching layer for segments; extraction/health remain unchanged |

### Scaling Priorities

1. **First bottleneck:** Outbound bandwidth on the segment relay. Each viewer pulls the full bitrate independently through the service. At private-use scale (1-5 viewers) this is negligible.
2. **Second bottleneck:** Cache sharing if multiple pods deploy (stateless pods don't share an in-process cache). Solved by using the existing cluster Redis instead of an in-process dict — but unnecessary until horizontal scaling.

---

## Sources

- HLS specification: RFC 8216 (IETF) — playlist structure, master/media playlist relationship, segment mechanics (HIGH confidence)
- HLS proxy pattern: Apple Developer documentation (conceptual), corroborated by yt-dlp extractor framework analysis (MEDIUM confidence)
- yt-dlp plugin architecture: github.com/yt-dlp/yt-dlp README + docs (MEDIUM confidence)
- OpenF1 API: openf1.org official page — endpoints, rate limits, data coverage (HIGH confidence)
- Jolpica F1 API: github.com/jolpica/jolpica-f1 — Ergast compatibility, availability (MEDIUM confidence)
- System composition for this domain: inference from domain patterns, corroborated by extractor tool analysis (MEDIUM confidence)

---

*Architecture research for: Live stream aggregation and proxy service (F1)*
*Researched: 2026-02-23*

---

.planning/research/FEATURES.md (new file, 215 lines)

# Feature Research

**Domain:** Live Stream Aggregation / Sports Stream Proxy Service
**Researched:** 2026-02-23
**Confidence:** MEDIUM

---

## Feature Landscape

### Table Stakes (Users Expect These)

Features users assume exist. Missing these = the product feels incomplete.

| Feature | Why Expected | Complexity | Notes |
|---------|--------------|------------|-------|
| Race schedule view | Users need to know when sessions are live without an external lookup | LOW | Pull from the OpenF1 API (`/sessions` endpoint). Session types: FP1, FP2, FP3, Quali, Sprint, Sprint Quali, Race. Confidence: HIGH (OpenF1 API confirmed). |
| Live session indicator | Users need to distinguish live vs upcoming vs finished sessions at a glance | LOW | Visual status badge (LIVE / UPCOMING / FINISHED) based on session start time + duration. No polling needed at schedule level. |
| Stream picker | Multiple stream sources per session — the user picks which one to watch | LOW | List available extracted stream links with a source label. Core UX of the whole product. |
| Embedded video player | Users won't navigate to external players for each stream | MEDIUM | hls.js in Svelte for in-page playback. Must handle m3u8 sources natively. Confidence: HIGH (hls.js is the standard client-side HLS library). |
| Stream health indicator | Users don't want to click a dead stream and stare at a spinner | MEDIUM | Backend health-checks each extracted URL before displaying it. Simple HEAD or short-lived GET on the m3u8 playlist. Mark dead streams visually. |
| CORS-transparent stream proxy | Browsers block cross-origin HLS requests; streams can't play directly from scraped origins | HIGH | Proxy all m3u8 manifests + .ts/.m4s segments through your own backend. Rewrite manifest URLs to point to your proxy. This is architecturally mandatory, not optional. Confidence: HIGH (HLS-Proxy documentation confirms this). |
| All F1 session types covered | Users specifically want FP, Quali, Sprint, Race, and pre/post content — not just race day | MEDIUM | The scraper scheduler must run for every session type on the F1 calendar. The OpenF1 `/sessions` endpoint returns a `session_type` field. |
| Session countdown timer | For upcoming sessions, users want the time until start without mental math | LOW | Client-side countdown from schedule data already fetched. Zero backend cost. |
| Stream auto-refresh / re-extraction | Stream links expire (tokens, redirect chains rotate) — stale links silently fail | HIGH | Periodic re-extraction (e.g., every 5-10 min during a live session). Depends on the extractor infrastructure. |
| Multiple quality options (if available) | Users on slow connections need a lower bitrate; users on fast connections want max quality | MEDIUM | Expose quality variants from multi-variant HLS playlists if the source provides them. Let the user pick, or default to auto (hls.js handles ABR natively). |

---

### Differentiators (Competitive Advantage)

Features that set the product apart. Not required, but valuable.

| Feature | Value Proposition | Complexity | Notes |
|---------|-------------------|------------|-------|
| Automatic stream extraction at session start | Zero manual effort — streams appear when the session goes live | HIGH | Cron/scheduler tied to the F1 calendar. Triggers extractors N minutes before session start. Eliminates "is there a stream yet?" manual checking. |
| Per-site extractor isolation | Bypasses CSRF/JS obfuscation cleanly per site, without shared code that breaks globally | HIGH | Each extractor is a self-contained module. One site's changes don't break others. Confidence: MEDIUM (pattern from the streamlink plugin system). |
| Session timeline: pre/post shows + press conferences | Competitors (scrapers, IPTV playlists) cover the race only; full weekend coverage is rare | MEDIUM | Requires scheduling extractors for non-race events. OpenF1 does not cover pre/post shows — needs site-specific session detection. |
| Stream source labeling | Shows which site/feed each stream came from — users learn which sources are reliable | LOW | Store source metadata with each extracted URL. Display it in the picker. |
| Fallback stream ordering | Automatically surfaces known-good streams first when multiple sources exist | MEDIUM | Health-check results + historical success rate drive the ordering. Depends on: stream health checking + a minimal persistence layer to store success history. |
| Proxy-cached segment prefetch | Reduces buffering by prefetching upcoming .ts segments into a local cache | HIGH | Node-HLS-Proxy pattern: maintain a per-stream segment cache up to N segments ahead. High implementation cost for marginal UX gain at private scale. |
| Session notes / source reputation | Lightweight annotations (e.g., "this source often drops at lap 40") | LOW | Simple static config or admin-editable markdown. No database needed at MVP. |
| Race weekend overview page | One page showing all sessions for a Grand Prix weekend — not just the next session | LOW | Group sessions by event/round from the schedule API. Pure frontend feature once schedule data is available. |

---

### Anti-Features (Commonly Requested, Often Problematic)

Features to explicitly NOT build.

| Feature | Why Requested | Why Problematic | Alternative |
|---------|---------------|-----------------|-------------|
| DVR / stream recording | Users want to rewatch if they miss something | Massive storage cost, legal exposure, and complexity (recording live HLS streams, serving VOD). Out of scope by design. | Live viewing only. Accept the constraint. |
| Chat / comments | Social viewing experience | Scope creep. You're building a stream aggregator, not a community platform. Auth, moderation, and a DB schema all follow. | None — explicitly out of scope. |
| User accounts / watchlists | "Remember my preferred stream source" | Requires an auth layer, session storage, and a DB. Contradicts the "no auth, private URL" design decision. | Persist the last-used quality/source in browser localStorage. Zero backend cost. |
| Stream transcoding / re-encoding | Normalize quality across sources | Enormous CPU cost, latency, and complexity. An FFmpeg transcoding pipeline per stream is overkill for a private service. | Pass-through proxy only. Let hls.js handle ABR on the client. |
| Headless browser extraction | Universal extractor that handles any site's JS obfuscation | Puppeteer/Playwright adds 200-400 MB RAM per session, slow cold starts, flakiness in containers, and complex cluster scheduling. Per-site custom extractors are faster and more reliable. | Custom per-site extractors (Go/Python HTTP + regex/DOM parser). |
| Mobile app | Access on a phone | A web app with a responsive Svelte layout is sufficient. A native app is weeks of work for a private tool. | Responsive web design. PWA if needed. |
| Discovery / search for new stream sites | Auto-find new sources | Scraping discovery is an unsolved problem and a rabbit hole. You have a fixed list of sites. | User-provided site list. One extractor per site. |
| Telemetry overlay / timing data | F1 fans love live timing alongside streams | Different product category (timing dashboard vs stream aggregator). OpenF1 has timing data, but integrating it is a separate project. | Link to existing timing tools (e.g., openf1.org). |
| DRM stream support | Some quality sources use Widevine/FairPlay | DRM circumvention is legally distinct from re-streaming. Avoid. | Non-DRM HLS sources only. |

---
|
||||
|
## Feature Dependencies

```
Race Schedule View
└──requires──> F1 Schedule API Integration (OpenF1 or Jolpica, Ergast's successor)
└──enables──> Session Countdown Timer
└──enables──> Automatic Extraction Trigger

Stream Picker
└──requires──> CORS-Transparent Stream Proxy (browser cannot directly fetch cross-origin m3u8)
└──requires──> Stream Health Indicator (to filter dead streams before display)
└──requires──> Stream Health Checker (backend periodic HEAD/GET)

Embedded Video Player
└──requires──> CORS-Transparent Stream Proxy (proxied URLs served from same origin)
└──requires──> Stream Picker (to know which URL to play)

Stream Auto-Refresh
└──requires──> Per-Site Extractor (to re-run extraction)
└──requires──> Session-live detection (know when to run vs stop)

Fallback Stream Ordering
└──requires──> Stream Health Indicator
└──enhances──> Stream Picker (surfaces best streams first)

Multiple Quality Options
└──requires──> CORS-Transparent Stream Proxy (proxy must rewrite variant playlist URLs too)
└──enhances──> Embedded Video Player (user control or ABR)

Proxy-Cached Segment Prefetch
└──requires──> CORS-Transparent Stream Proxy (must be same proxy layer)
└──conflicts──> Minimal resource footprint (high memory cost)

Session Timeline (pre/post/press conf)
└──requires──> F1 Schedule API Integration (for race events)
└──requires──> Per-Site Session Detection (API doesn't include pre/post show timing)
```

### Dependency Notes

- **Stream Picker requires CORS proxy:** Browsers enforce same-origin policy. A scraped m3u8 URL from `site.com` cannot be fetched by a Svelte app on `f1.viktorbarzin.me`. Every user-facing stream URL must route through the proxy backend. This is a hard architectural dependency, not an option.
- **Stream health checker enables stream picker quality:** Without health checking, the picker shows dead links. Health checking must run before streams are displayed and periodically during live sessions.
- **Automatic extraction trigger depends on schedule:** The scheduler must know when sessions start. Schedule API integration is therefore the first thing to build — everything else gates on it.
- **Multiple quality options conflict with simple proxy:** If the source provides a multi-variant HLS playlist, the proxy must rewrite ALL variant URLs (not just the master manifest). Adds complexity to the proxy rewriting layer.
- **Fallback ordering conflicts with stateless proxy:** Tracking success history requires at least a lightweight persistence layer (e.g., Redis or SQLite). If staying fully stateless, fall back to health-check-only ordering.
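The "lightweight persistence" mentioned in the last note can be a single SQLite table. A sketch of success-history ordering (schema, function names, and the Laplace-smoothed scoring are illustrative assumptions, not a committed design):

```python
import sqlite3

def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the success-history store."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS stream_history (
        source TEXT PRIMARY KEY,
        successes INTEGER DEFAULT 0,
        failures INTEGER DEFAULT 0)""")
    return conn

def record(conn: sqlite3.Connection, source: str, ok: bool) -> None:
    """Upsert one playback attempt outcome for a source site."""
    conn.execute("""INSERT INTO stream_history (source, successes, failures)
        VALUES (?, ?, ?)
        ON CONFLICT(source) DO UPDATE SET
            successes = successes + excluded.successes,
            failures = failures + excluded.failures""",
        (source, int(ok), int(not ok)))

def ranked_sources(conn: sqlite3.Connection) -> list[str]:
    """Best historical success ratio first; smoothing keeps new sources viable."""
    rows = conn.execute(
        "SELECT source, successes, failures FROM stream_history").fetchall()
    scored = ((src, (ok + 1) / (ok + fail + 2)) for src, ok, fail in rows)
    return [src for src, _ in sorted(scored, key=lambda t: t[1], reverse=True)]
```

If the stateless route is chosen instead, `ranked_sources` degrades to sorting by the most recent health-check result only.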
---
## MVP Definition

### Launch With (v1)

Minimum viable product — what's needed to validate the concept.

- [ ] **F1 Schedule view** — Show upcoming/live sessions for the current season. Single page, no navigation needed.
- [ ] **CORS-transparent HLS proxy** — Proxy m3u8 manifests + segment URLs through the backend. Without this, nothing plays in the browser.
- [ ] **Per-site stream extractor(s)** — At least one working extractor for at least one reliable source site. Proves the extraction pipeline end-to-end.
- [ ] **Stream health checker** — Validate extracted URLs before showing. Dead streams must not surface to users.
- [ ] **Stream picker** — List available working streams for the current session. User clicks, player loads.
- [ ] **Embedded HLS player** — HLS.js in Svelte. Plays proxied m3u8 URL in-page.
- [ ] **Session countdown** — Time-until-start for upcoming sessions. Pure frontend, zero cost.
- [ ] **Live session indicator** — Visual LIVE/UPCOMING/FINISHED badge. Core navigational signal.
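The countdown and live-indicator items above are pure functions of the schedule timestamps, which is why they cost nearly nothing. A minimal sketch (function names and the badge labels mirror the list above; `now` is passed explicitly for testability):

```python
from datetime import datetime, timedelta, timezone

def session_status(start: datetime, end: datetime, now: datetime) -> str:
    """Classify a session as UPCOMING, LIVE, or FINISHED."""
    if now < start:
        return "UPCOMING"
    if now < end:
        return "LIVE"
    return "FINISHED"

def countdown(start: datetime, now: datetime) -> str:
    """Human-readable time until a session starts, e.g. '1d 02h 05m'."""
    remaining = max(start - now, timedelta(0))
    days, rest = divmod(int(remaining.total_seconds()), 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    return f"{days}d {hours:02d}h {minutes:02d}m"
```

The same logic can live in the Svelte frontend; keeping all timestamps timezone-aware (UTC from the API, converted in the browser) avoids the timezone pitfalls noted later.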

### Add After Validation (v1.x)

Features to add once the core pipeline is working and streams actually play reliably.

- [ ] **Stream auto-refresh** — Re-run extractors every 5-10 min during live sessions. Trigger: user reports dead stream or health check fails on previously-valid URL.
- [ ] **Fallback stream ordering** — Sort by health-check recency and past reliability. Trigger: multiple sources available per session.
- [ ] **Source labeling in picker** — Show site name with each stream link. Low effort, high trust signal for users.
- [ ] **Race weekend overview** — All sessions grouped per Grand Prix. Trigger: users navigating between sessions in a weekend.
- [ ] **Additional extractors** — Expand site coverage once the first extractor is stable. Each adds incremental reliability.

### Future Consideration (v2+)

Features to defer until product-market fit is established.

- [ ] **Pre/post show + press conference coverage** — Complex site-specific session detection. Defer until core race coverage is solid.
- [ ] **Multiple quality options** — Source sites may or may not provide multi-variant playlists, and rewriting variant URLs in the proxy is non-trivial. First validate that sources actually offer quality tiers.
- [ ] **Proxy segment prefetch/cache** — High memory cost. Only valuable if buffering is a real user complaint at private scale.
- [ ] **Session reputation annotations** — Nice UX polish. Not needed at launch.

---

## Feature Prioritization Matrix

| Feature | User Value | Implementation Cost | Priority |
|---------|------------|---------------------|----------|
| F1 Schedule view | HIGH | LOW | P1 |
| CORS-transparent HLS proxy | HIGH | HIGH | P1 (architectural blocker) |
| Per-site stream extractor | HIGH | HIGH | P1 (core value) |
| Embedded HLS player | HIGH | LOW | P1 |
| Stream health checker | HIGH | MEDIUM | P1 |
| Stream picker | HIGH | LOW | P1 |
| Session countdown timer | MEDIUM | LOW | P1 |
| Live session indicator | HIGH | LOW | P1 |
| Stream auto-refresh | HIGH | MEDIUM | P2 |
| Source labeling | MEDIUM | LOW | P2 |
| Fallback stream ordering | MEDIUM | MEDIUM | P2 |
| Race weekend overview page | MEDIUM | LOW | P2 |
| Additional extractors | HIGH | MEDIUM | P2 |
| Multiple quality options | MEDIUM | HIGH | P3 |
| Pre/post show coverage | MEDIUM | HIGH | P3 |
| Proxy segment prefetch | LOW | HIGH | P3 |
| Session reputation annotations | LOW | LOW | P3 |

**Priority key:**

- P1: Must have for launch
- P2: Should have, add when possible
- P3: Nice to have, future consideration

---

## Competitor Feature Analysis

Reference products surveyed: RaceControl (unofficial F1TV client), f1viewer (TUI F1TV client), streamlink (stream extraction CLI), HLS-Proxy (node HLS proxy), Threadfin (M3U proxy), ErsatzTV (self-hosted IPTV).

| Feature | RaceControl (F1TV client) | Streamlink (CLI extractor) | HLS-Proxy (node) | Our Approach |
|---------|--------------------------|---------------------------|-----------------|--------------|
| Session schedule | F1TV API (official, auth required) | None (site-specific) | None | OpenF1/Jolpica (free, unauthenticated) |
| Stream extraction | Official F1TV API | Plugin-per-site Python | N/A | Custom per-site extractors (Go/Python HTTP) |
| Stream quality selection | Multi-variant picker + Chromecast | CLI flag `--default-stream` | Pass-through | HLS.js ABR + manual picker |
| Multi-stream view | Yes (layout builder, experimental sync) | Multiple instances | N/A | Single stream (MVP), multi optional later |
| Health checking | None visible | None | None | Active periodic health checks (our differentiator) |
| Stream proxy | No (plays direct from F1TV CDN) | No (piped to local player) | Yes (manifest + segment rewrite) | Yes (mandatory for browser CORS) |
| CORS handling | N/A (desktop app) | N/A (local) | Yes (adds permissive CORS headers) | Yes (same-origin proxy) |
| Auto-extraction at session start | Via F1TV live schedule | None | None | Yes (scheduler + extractor trigger) |
| Embedded browser player | No (external VLC/mpv) | No (external player) | N/A | Yes (HLS.js in Svelte) |
| No auth required | No (F1TV subscription) | Varies by source | None | Yes (private URL, no auth layer) |

**Key insight:** Existing tools either require official F1TV credentials (RaceControl, f1viewer) or extract streams to local players (streamlink). None combine automated extraction from unofficial sources + browser-native proxied playback + schedule integration in a single web service. That combination is the product's core novelty.

---

## Sources

- OpenF1 API documentation: https://openf1.org/ — MEDIUM confidence (marketing page, limited technical detail on session endpoints)
- HLS-Proxy (warren-bank/HLS-Proxy) README — HIGH confidence for proxy architecture requirements (CORS, manifest rewriting, segment caching)
- HLS.js README (video-dev/hls.js) — HIGH confidence for client-side HLS capabilities (ABR modes, quality switching, error recovery)
- Streamlink documentation: https://streamlink.github.io/ — HIGH confidence for extraction patterns and plugin architecture
- yt-dlp README — HIGH confidence for extractor-per-site pattern and format selection
- RaceControl (robvdpol/RaceControl) README — MEDIUM confidence for F1 streaming UX expectations
- f1viewer (SoMuchForSubtlety/f1viewer) README — MEDIUM confidence for F1 session coverage expectations
- Threadfin README — MEDIUM confidence for IPTV/HLS proxy feature patterns
- Telly README — LOW confidence (Plex-specific, limited relevance)
- Eyevinn/hls-proxy README — HIGH confidence for HLS manifest manipulation patterns

---

*Feature research for: F1 Live Stream Aggregation Service*
*Researched: 2026-02-23*
291
.planning/research/PITFALLS.md
Normal file

@ -0,0 +0,291 @@
# Pitfalls Research

**Domain:** Live stream aggregation and proxy service (F1-focused)
**Researched:** 2026-02-23
**Confidence:** MEDIUM — findings synthesized from yt-dlp/streamlink source analysis (HIGH), nginx proxy documentation (HIGH), HLS RFC/spec analysis (HIGH), OpenF1 API docs (HIGH), and web searches that returned sparse results (LOW where noted)

---

## Critical Pitfalls

### Pitfall 1: Treating JavaScript-Rendered Tokens as Static

**What goes wrong:**
Stream URLs on sports streaming sites are not present in raw HTML. They are computed client-side by obfuscated JavaScript — the page HTML contains an encrypted or encoded config blob, and the actual HLS URL is assembled by executing that JS. A scraper that fetches the raw HTML page and runs a regex over it finds nothing.

**Why it happens:**
Developers assume "the URL must be somewhere in the page source." They inspect the page in DevTools, see the URL in the network tab, and try to replicate what they observe — but miss that the URL was produced by JS execution, not served in the initial response.

**How to avoid:**
- For each target site: trace the actual network request that fetches the m3u8 in browser DevTools (Network tab, filter by `.m3u8`). Identify the API endpoint that the JS calls to get the signed URL. Replicate *that* API call (often a JSON endpoint), not the page fetch.
- If the token is computed entirely client-side (e.g., via CryptoJS with a hardcoded key), implement the same algorithm in your extractor. Do not run a headless browser — reverse-engineer the JS algorithm.
- Document which sites require JS execution vs. which expose a clean API endpoint. Sites often have a backend API that the JavaScript calls; scraping that API is faster and more stable than re-implementing the JS.

**Warning signs:**
- Extractor returns empty results on a page you can watch in the browser
- Network tab shows the m3u8 URL appearing only after JavaScript fires an XHR/fetch call
- Page HTML contains a large base64 blob or heavily obfuscated JS variable (e.g., `var _0x1a2b = [...]`)
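These warning signs can be turned into a cheap pre-check so an extractor refuses to regex raw HTML when the page is clearly JS-rendered. A rough heuristic sketch (the patterns and the 200-character blob threshold are illustrative assumptions, not site-specific knowledge):

```python
import re

def needs_api_reverse_engineering(html: str) -> bool:
    """Heuristic: True when a raw-HTML regex pass is unlikely to find the
    stream URL, i.e. the page probably builds it with client-side JS."""
    # A directly usable HLS URL in the markup is the easy case.
    if ".m3u8" in html:
        return False
    # Hex-named obfuscated JS variables or a large base64-looking blob
    # suggest the URL is assembled at runtime.
    obfuscated = re.search(r"var\s+_0x[0-9a-f]{2,}", html) is not None
    big_blob = re.search(r"[A-Za-z0-9+/=]{200,}", html) is not None
    return obfuscated or big_blob
```

A check like this is only a triage tool for the design-phase site survey; the real answer still comes from tracing the network calls in DevTools.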

**Phase to address:**
Extractor design phase (before writing a single extractor). Establish upfront: for each target site, determine if raw HTTP fetch is sufficient or if API reverse-engineering is required.

---

### Pitfall 2: m3u8 Segment URLs Break When Proxied Through a Different Domain

**What goes wrong:**
When you fetch an m3u8 playlist and serve it through your proxy, the segment URLs inside the playlist may be absolute URLs pointing to the original CDN. The browser (or HLS.js) follows those segment URLs directly, bypassing your proxy entirely. This means you cannot control access, cannot inject headers, and CORS blocks the segments if the CDN doesn't allow cross-origin requests from your frontend domain.

**Why it happens:**
HLS playlists can contain either absolute URLs (`https://cdn.example.com/seg001.ts`) or relative paths (`seg001.ts`). Most streaming CDNs use absolute URLs with signed tokens. Proxying only the m3u8 is insufficient — every segment URL must also be rewritten to route through your relay.

**How to avoid:**
- When serving the m3u8 through your proxy, **rewrite all segment URLs** to point to your relay endpoint before sending the playlist to the client. Example: replace `https://cdn.site.com/segment001.ts?token=xyz` with `https://your-relay.domain/proxy/segment?url=<encoded-original-url>`.
- Your relay endpoint then fetches the original segment and streams it to the client.
- Handle multi-level playlists: master playlists (variant streams) reference child playlists which reference segments — rewrite at each level.
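The rewrite step can be sketched as a single pass over the raw playlist text. A minimal version, assuming a `/proxy/segment?url=` relay route (the endpoint path is illustrative); it covers plain URI lines as well as `URI="…"` attributes on tags, and resolves relative paths against the playlist's own URL:

```python
import re
from urllib.parse import urljoin, quote

def rewrite_playlist(text: str, playlist_url: str,
                     proxy_prefix: str = "/proxy/segment?url=") -> str:
    """Rewrite every URI in an HLS playlist to route through the relay."""
    def proxied(uri: str) -> str:
        absolute = urljoin(playlist_url, uri)  # resolve relative paths
        return proxy_prefix + quote(absolute, safe="")

    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append(line)
        elif stripped.startswith("#"):
            # Tags like #EXT-X-KEY / #EXT-X-MEDIA carry URI="..." attributes.
            out.append(re.sub(r'URI="([^"]+)"',
                              lambda m: f'URI="{proxied(m.group(1))}"', line))
        else:
            # Plain lines are segment or child-playlist URIs.
            out.append(proxied(stripped))
    return "\n".join(out)
```

For multi-level playlists, the relay must run child playlists it serves through the same function, so master → variant → segment levels are all rewritten.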

**Warning signs:**
- Client-side CORS errors in browser console referencing the original CDN domain
- Network tab shows segment fetches bypassing your proxy after the m3u8 loads
- Some quality variants play but others don't (partial rewriting)

**Phase to address:**
Stream relay/proxy phase. Must be designed before the first end-to-end stream test.

---

### Pitfall 3: CDN-Signed Token URLs Expire Mid-Stream

**What goes wrong:**
Many CDNs sign stream URLs with a short-lived token (often 5–30 minutes). The m3u8 playlist URL itself may be signed, and so may each segment URL. A user who starts watching near the token expiry time will get a working stream for the first few segments, then receive 403 Forbidden errors as the token expires mid-playback.

**Why it happens:**
Developers test stream extraction, confirm the URL works, and ship it — without accounting for token TTL. The token was valid at extraction time but expires before or during playback. Live streams compound this: the m3u8 playlist updates every few seconds and each update may contain newly signed segment URLs. If your relay cached an old playlist, segments within it are expired.

**How to avoid:**
- Never cache m3u8 playlist files. Always fetch the live playlist from upstream on each client request. Cache only TS/m4s segments (which have longer or no expiry).
- When extracting the initial stream URL, record the extraction timestamp and the token TTL (if discoverable from response headers like `Cache-Control: max-age=N`). Re-extract before expiry.
- Implement a background refresh: when serving a stream, periodically re-run the extractor to get a fresh URL and pivot the relay to the new upstream without interrupting the client.
- Test expiry by extracting a URL and waiting 30 minutes before playing — a failing test here reveals token TTL issues.
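The timestamp/TTL bookkeeping above fits in a small record plus a staleness check. A sketch (field names and the 10-minute default TTL are assumptions; `now` is passed explicitly so the check is testable):

```python
from dataclasses import dataclass

@dataclass
class ExtractedStream:
    url: str
    extracted_at: float       # Unix timestamp when the extractor ran
    ttl_seconds: float = 600  # assume 10 min unless headers say otherwise

    def needs_refresh(self, now: float, margin: float = 60.0) -> bool:
        """Re-extract `margin` seconds before the signed URL expires."""
        return now >= self.extracted_at + self.ttl_seconds - margin
```

A background task polls `needs_refresh()`, re-runs the extractor when it fires, and swaps the relay's upstream URL so the client never sees the rotation.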

**Warning signs:**
- Stream plays for exactly N minutes then fails with 403 on segments
- Extracting a URL works in isolation but fails when embedded in the player after a delay
- The site's CDN (Cloudflare or custom) appends `?token=` or `?sig=` parameters to segment URLs

**Phase to address:**
Stream relay phase. The relay architecture must include a URL refresh loop from day one.

---

### Pitfall 4: Per-Site Extractor Maintenance Burden Is Dramatically Underestimated

**What goes wrong:**
Each target site is a custom engineering problem. Sites change their HTML structure, JavaScript obfuscation, API endpoints, or anti-bot measures without notice. A working extractor can break silently overnight. With 5 target sites, you effectively have 5 separate maintenance tracks. Expecting "set and forget" behavior is the most common planning mistake in this domain.

**Why it happens:**
Developers build an extractor, it works, and they move on. Sites then deploy a CDN update, change their frontend framework, or rotate their obfuscation keys. The failure is silent — no exception is raised, the extractor just returns no URL, and the user sees an empty player.

**How to avoid:**
- Build a health check system that runs each extractor on a schedule (every 15 minutes during race weekends, every hour otherwise), logs success/failure, and triggers alerts on failure.
- Design extractors with failure visibility: log exactly which step failed (page fetch, URL parse, API call, etc.) so debugging is fast.
- Keep extractor logic isolated and testable: each extractor is a module that takes no inputs and returns a stream URL or raises an exception. Run integration tests against live sites on a schedule.
- Plan 1–2 hours of maintenance per extractor per month as baseline, more during site redesigns.
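The isolation and failure-visibility points can be captured in a tiny extractor contract. A sketch (class, stage names, and the harness shape are illustrative, not an existing codebase):

```python
from abc import ABC, abstractmethod

class ExtractionError(Exception):
    """Raised by extractors with the step that failed, for fast debugging."""
    def __init__(self, stage: str, detail: str = ""):
        super().__init__(f"{stage}: {detail}")
        self.stage = stage  # e.g. "fetch", "parse", "api", "token"

class Extractor(ABC):
    site = "unknown"

    @abstractmethod
    def extract(self) -> str:
        """Return a playable HLS URL or raise ExtractionError."""

def run_health_check(extractor: Extractor) -> tuple[bool, str]:
    """Run one extractor; report which step failed (feeds alerting)."""
    try:
        url = extractor.extract()
        return (bool(url), "ok" if url else "empty-url")
    except ExtractionError as e:
        return (False, e.stage)
    except Exception:
        return (False, "unexpected")
```

A scheduler then calls `run_health_check` per site on the 15-minute/hourly cadence described above and raises an alert whenever the boolean flips to `False`.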

**Warning signs:**
- No automated testing of extractors against live sites
- Extractor code tightly coupled to specific HTML element IDs or class names (breaks on any frontend change)
- No alerting when an extractor returns no URL

**Phase to address:**
Extractor design phase. The monitoring/health-check system must be built alongside the first extractor, not added later.

---

### Pitfall 5: Missing or Incorrect CORS Headers on the Relay Breaks Browser Playback

**What goes wrong:**
HLS.js in the browser makes cross-origin requests to fetch m3u8 playlists and segment files. If your relay doesn't serve the correct CORS headers, every segment request fails with a CORS error. Even a single missing header (e.g., on `.ts` segment responses but not `.m3u8` responses) breaks the stream.

**Why it happens:**
Developers test the relay with `curl` or server-to-server calls, where CORS is irrelevant. The relay works in isolation but fails when the browser's HLS.js player makes the requests.

**How to avoid:**
- Set CORS headers on **all** relay endpoints: `Access-Control-Allow-Origin: *` (or your specific frontend domain), `Access-Control-Allow-Methods: GET, HEAD, OPTIONS`, `Access-Control-Allow-Headers: Range`.
- The `Range` header is critical: HLS.js often sends range requests for segments. If `Range` is not in `Allow-Headers`, preflight OPTIONS requests fail.
- Do not use wildcard `*` if you also send `Access-Control-Allow-Credentials: true` — that combination is invalid and browsers reject it.
- Test from the actual browser environment (or use a CORS testing tool) before calling any relay endpoint "done."
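The header set above is small enough to spell out in one helper, which also encodes the wildcard-plus-credentials rule (framework middleware such as FastAPI's `CORSMiddleware` expresses the same thing declaratively; the `Expose-Headers` line is an addition for range responses, not from the list above):

```python
def cors_headers(origin: str = "*", allow_credentials: bool = False) -> dict[str, str]:
    """CORS headers every relay response (m3u8 AND segments) must carry."""
    if allow_credentials and origin == "*":
        # Browsers reject '*' combined with credentials; require a real origin.
        raise ValueError("wildcard origin is invalid with credentials")
    headers = {
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, HEAD, OPTIONS",
        "Access-Control-Allow-Headers": "Range",  # HLS.js sends range requests
        "Access-Control-Expose-Headers": "Content-Length, Content-Range",
    }
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```

Applying this to every response path (playlist, segment, and OPTIONS preflight alike) is what prevents the "m3u8 loads but segments fail" failure mode.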

**Warning signs:**
- Browser console shows `No 'Access-Control-Allow-Origin' header` errors
- Streams work when loaded directly in a `<video>` src but fail when loaded via HLS.js
- Preflight OPTIONS requests returning 405 Method Not Allowed

**Phase to address:**
Stream relay phase, first integration test.

---

### Pitfall 6: IP-Based Banning From Target Streaming Sites

**What goes wrong:**
Streaming sites detect and ban server IP ranges because your relay sends requests from a datacenter IP (K8s node) with no residential IP characteristics. The extractor initially works from a developer laptop (residential ISP), then fails when deployed to the cluster because the server IP is blocked or fingerprinted differently.

**Why it happens:**
Streaming sites use IP reputation databases. Cloud provider IP ranges (AWS, GCP, Hetzner, OVH, etc.) are pre-blocked or rate-limited on many streaming platforms. Your home cluster may or may not be in a residential IP range depending on your ISP.

**How to avoid:**
- Test extractors from the same environment they'll run in (the K8s cluster) before committing to a site as a target. A site that works from your laptop may be blocked from the server.
- Use realistic HTTP headers: `User-Agent` matching a current browser, `Accept`, `Accept-Language`, `Accept-Encoding` headers that match a real browser session. Missing or mismatched headers are a primary signal.
- Include `Referer` headers matching the expected source page. Many CDNs check that the referer is the streaming site itself before serving signed URLs.
- Rotate request patterns if hitting the same site repeatedly: add random delays, avoid predictable polling intervals.
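The realistic-header and jittered-timing points can live in one small helper. A sketch (the User-Agent string is a point-in-time example that will need periodic updating; the delay bounds are arbitrary):

```python
import random

def browser_headers(referer: str) -> dict[str, str]:
    """Headers mimicking a real browser session; mismatches are a bot signal."""
    return {
        "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                       "AppleWebKit/537.36 (KHTML, like Gecko) "
                       "Chrome/133.0.0.0 Safari/537.36"),
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": referer,  # many CDNs require the player page as referer
    }

def jittered_delay(base: float = 5.0, spread: float = 3.0) -> float:
    """Avoid fixed polling intervals, which are easy to fingerprint."""
    return base + random.uniform(0.0, spread)
```

Every extractor request would pass `browser_headers(...)` with the player-page URL as referer and sleep for `jittered_delay()` between polls of the same site.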

**Warning signs:**
- Extractor works in local testing but returns 403/429 in deployment
- Sites return Cloudflare IUAM challenge pages (JS challenge) when scraped from the server IP
- Response body contains "Access Denied" or "Bot Detected" rather than the expected HTML

**Phase to address:**
Extractor design phase. Test against the production network before finalizing site targets.

---

## Technical Debt Patterns

Shortcuts that seem reasonable but create long-term problems.

| Shortcut | Immediate Benefit | Long-term Cost | When Acceptable |
|----------|-------------------|----------------|-----------------|
| Hardcode stream URL for the first race | Ship fast | Breaks immediately; no automation | Never — defeats the purpose |
| Regex/`BeautifulSoup` over the entire page HTML | Simple implementation | Breaks on any frontend change; misses JS-rendered content | Only for static HTML pages with predictable structure |
| Cache m3u8 playlists at the relay | Reduce upstream requests | Serves expired segment URLs; stream breaks mid-playback | Never for live content |
| Single-threaded sequential extractor polling | Simple code | Can't handle concurrent users fetching different streams | Only in MVP with a single stream |
| Skip extractor health checks for MVP | Faster to ship | Silent failures, no visibility into broken extractors | Only if you have < 1 stream and check manually |
| Proxy all segments (relay mode) without segment caching | Correct behavior | High bandwidth usage; each viewer multiplies bandwidth | Only at low viewer count (< 5) |
| Use headless browser for all extractors | Handles all JS | Slow (3–10s per extraction), high memory, complex ops | Fallback for sites that truly cannot be handled otherwise |

---

## Integration Gotchas

Common mistakes when connecting to external services.

| Integration | Common Mistake | Correct Approach |
|-------------|----------------|------------------|
| Target streaming sites | Fetch the HTML page and regex for the stream URL | Identify the actual API endpoint the site JS calls; hit that directly |
| Target streaming sites | Ignore cookies and session state | Maintain a cookie jar per site; some sites require a session cookie from the homepage before serving stream API |
| Target streaming sites | Send requests without `Referer` header | Always set `Referer` to the page that would normally contain the player |
| CDN segment URLs | Use the same signed URL for the full stream duration | Re-fetch the m3u8 on each playlist poll to get freshly signed segment URLs |
| OpenF1 API (race schedule) | Call live-data endpoints during a session on the free tier | Free tier allows only historical data; live data costs €9.90/month — use F1 calendar static JSON for schedule |
| OpenF1 API | Assume the API is official F1 data | OpenF1 is a third-party fan project, not affiliated with F1; data may lag or be incorrect |
| Ergast API | Expect stable availability in 2026 | Ergast deprecated their API in late 2024; use OpenF1 or the unofficial `api.formula1.com` instead |
| HLS.js player | Load the proxied m3u8 URL directly without error handler | Always attach `hls.on(Hls.Events.ERROR, ...)` with media error recovery; live streams have transient failures |
| HLS.js player | Assume autoplay works | Browser autoplay policies block unmuted video. Always mute by default or show a play button |

---

## Performance Traps

Patterns that work at small scale but fail as usage grows.

| Trap | Symptoms | Prevention | When It Breaks |
|------|----------|------------|----------------|
| Relay proxies all segment bytes | Works for 1 viewer; saturates uplink for 5+ | Serve the rewritten m3u8 but let clients fetch segments directly from CDN — only proxy the playlist (viable only if the CDN serves permissive CORS headers) | > 3 concurrent viewers on a typical F1 stream (5–8 Mbps per viewer) |
| Polling all extractors every minute | Works with 1–2 sites; CPU/memory spike at race time | Poll only during race windows; use event-driven triggers from the schedule | Always — race starts matter, not constant polling |
| Synchronous extractor execution blocks the API response | First request takes 5–10s while extractor runs | Pre-warm extractors before the race start time; cache last-known working URL | First user to request a stream before pre-warming |
| No connection pooling to upstream CDNs | High segment fetch latency | Reuse HTTP connections with keep-alive | > 10 segments/second through the relay |
| Storing stream session state in memory (in-process) | Works on one pod | Lost on pod restart; user stream breaks | Any Kubernetes pod restart or rolling deployment |

---

## Security Mistakes

Domain-specific security issues beyond general web security.

| Mistake | Risk | Prevention |
|---------|------|------------|
| Exposing the raw upstream CDN URL in API response | Users bypass your relay; sites can track and block the raw URL if scraped | Keep upstream URLs server-side only; serve an opaque relay URL to the client |
| Open relay endpoint with no auth | Your relay becomes a public proxy for any content, burning bandwidth and attracting abuse | Require at minimum a shared secret or same-origin check; this is private infrastructure |
| Logging full signed CDN URLs | Signed URLs in logs = anyone with log access can watch the stream | Log only the site name and stream quality, not the signed URL |
| Storing site credentials (if target site requires login) in source code | Credentials rotate or get revoked; leaked credentials cause account bans | Use environment variables / Kubernetes secrets; never commit credentials |
| No rate limiting on the relay API | A single misbehaving client can exhaust bandwidth | Add rate limiting per IP on the `/proxy/` endpoints |

---

## UX Pitfalls

Common user experience mistakes in this domain.

| Pitfall | User Impact | Better Approach |
|---------|-------------|-----------------|
| Showing all streams including dead/offline ones | User clicks a stream, gets a black player; no indication of why | Pre-validate streams before the race and tag each as "live", "offline", or "extracting"; surface status in the stream picker |
| Player starts with audio on (bypassing autoplay mute) | Browser blocks autoplay; user sees a broken play state | Start muted by default; show a prominent unmute button |
| No stream quality selector | Users on slow connections buffer constantly | Expose the HLS quality levels via HLS.js API; let users pick |
| Race schedule shows times in UTC only | Users outside UTC miss sessions | Detect browser timezone and display in local time; let users configure their timezone |
| Stream picker has no quality or language indicator | User has to try each stream to find the best one | Label streams with: source site, resolution (1080p/720p/480p), language, and status |
| No loading state feedback during extraction | User sees blank screen for 5–10 seconds during extractor run | Show a "Finding stream..." spinner with a progress indicator |

---

## "Looks Done But Isn't" Checklist

Things that appear complete but are missing critical pieces.

- [ ] **Extractor for site X works:** Verify it works from the production K8s network (not just localhost) and handles the case where the site returns a challenge page
- [ ] **Stream proxy works:** Verify with HLS.js in a browser, not just with curl — CORS errors only appear in browser context
- [ ] **m3u8 rewriting works:** Verify that multi-level playlists (master → variant → segments) are all rewritten, not just the top-level m3u8
- [ ] **Token expiry handled:** Wait 30 minutes after extracting a URL, then try to play it — test that the refresh mechanism kicks in
- [ ] **Race schedule is accurate:** Verify timezone handling — F1 races in different countries, and session times shift with DST changes mid-season
- [ ] **Relay is actually private:** Confirm the relay endpoints are not publicly accessible without auth — check via Traefik ingress rules
- [ ] **Extractor monitoring alerts:** Trigger an extractor failure manually and verify an alert fires before the next race

---

## Recovery Strategies

When pitfalls occur despite prevention, how to recover.

| Pitfall | Recovery Cost | Recovery Steps |
|---------|---------------|----------------|
| Extractor breaks because site changed its JS | MEDIUM | Open browser DevTools on the site, re-trace the API call sequence, update the extractor; typical fix time 30–90 minutes per site |
| CDN-signed URL expires mid-stream | LOW | Restart the stream extraction (background refresh handles this if implemented); user may need to re-click play |
| Relay bandwidth saturated | MEDIUM | Switch relay strategy: serve rewritten m3u8 but redirect segment fetches directly to CDN (remove relay from segment path) |
| Site IP-bans the cluster | HIGH | Either accept the site as unavailable or route extractor requests through a residential proxy/VPN exit; may require re-evaluating the site as a target |
| OpenF1 API unavailable | LOW | Fall back to F1 calendar static JSON; race schedule data changes infrequently so a cached fallback is safe |
| HLS.js CORS errors in browser | LOW | Add missing `Access-Control-Allow-Origin` and `Access-Control-Allow-Headers: Range` to relay responses; deploy fix |

---

## Pitfall-to-Phase Mapping

How roadmap phases should address these pitfalls.

| Pitfall | Prevention Phase | Verification |
|---------|------------------|--------------|
| JS-rendered tokens not found in HTML | Extractor design (Phase 1) | Each extractor spec documents: "token source = API endpoint or JS algorithm" |
| m3u8 segment URLs bypass proxy | Stream relay (Phase 2) | End-to-end browser test: open Network tab and confirm zero requests go to original CDN domain |
| CDN token expiry mid-stream | Stream relay (Phase 2) | Play stream for 45 minutes; verify no 403 errors on segments |
| Extractor maintenance burden | Extractor design + monitoring (Phase 1 + Phase 3) | Health check system alerts fire within 5 minutes of extractor failure |
| Missing CORS on relay | Stream relay (Phase 2) | Browser-based smoke test with CORS error detection |
| IP-based banning | Extractor design (Phase 1) | Test all extractors from production network before finalizing site list |
| Silent extractor failures | Monitoring (Phase 3) | Inject a deliberate failure; verify alert reaches notification channel |
| Race schedule timezone errors | Schedule integration (Phase 1) | Test with browser timezone set to UTC+11 (Australia) and UTC-5 (Americas) |
| Open relay as public proxy | Infrastructure (Phase 2) | Verify relay endpoint returns 401/403 without auth from an external network |

---

## Sources

- yt-dlp `common.py` extractor base class source (HIGH confidence — production extractor framework): https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/extractor/common.py
- yt-dlp contribution guidelines and extractor authoring notes (HIGH confidence): https://github.com/yt-dlp/yt-dlp/blob/master/CONTRIBUTING.md
- RFC 8216 — HTTP Live Streaming specification, segment URL and playlist requirements (HIGH confidence): https://datatracker.ietf.org/doc/html/rfc8216
- nginx `ngx_http_proxy_module` documentation — proxy buffering, URL rewriting, timeout configuration (HIGH confidence): https://nginx.org/en/docs/http/ngx_http_proxy_module.html
- OpenF1 API documentation — rate limits, live vs. historical data, usage rights (HIGH confidence): https://openf1.org/
- HLS.js API documentation — initialization order, error handling, quality level management (HIGH confidence): https://github.com/video-dev/hls.js/blob/master/docs/API.md
- MDN CORS documentation — credential restrictions, preflight requirements, header rules (HIGH confidence): https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS
- F1 calendar site (f1calendar.com) — timezone complexity observations, session structure (MEDIUM confidence)
- Web search findings on streaming site anti-scraping techniques (LOW confidence — search returned sparse results for this specific domain)

---

*Pitfalls research for: F1 live stream aggregation and proxy service*
*Researched: 2026-02-23*
172 .planning/research/STACK.md Normal file
@@ -0,0 +1,172 @@
# Stack Research

**Domain:** Live stream aggregation and proxy service (F1 streams)
**Researched:** 2026-02-23
**Confidence:** HIGH (versions verified against PyPI, GitHub releases, and official docs)

---

## Recommended Stack

### Core Technologies

| Technology | Version | Purpose | Why Recommended |
|------------|---------|---------|-----------------|
| Python | 3.13 (image: `python:3.13-slim-bookworm`) | Backend runtime | Latest stable; JIT in 3.13 helps CPU-bound m3u8 rewriting; `python:3.13-slim-bookworm` is the official minimal production image. Async-native for concurrent stream proxying. |
| FastAPI | 0.132.0 | HTTP API + stream relay | Async-first ASGI framework with native streaming response support (`StreamingResponse`). Best-in-class for HTTP proxy patterns where you relay chunked data. Built-in OpenAPI docs. Pydantic integration. |
| yt-dlp | 2026.2.21 | Stream URL extraction | De-facto standard for extracting final HLS URLs from obfuscated sites. Supports 1000+ extractors, handles redirect chains, CSRF cookies, JS-rendered pages. Used as a Python library (`yt_dlp.YoutubeDL`), not just CLI. Updated continuously — this is critical for staying current with sites that change obfuscation. |
| Playwright (Python) | 1.58.0 | JS-rendered scraping | Required for sites that serve stream links only after JavaScript execution. `playwright.async_api` integrates naturally with FastAPI's async event loop. Use only when yt-dlp extractors don't cover the target site — Playwright is the fallback for custom per-site extractors. Chromium headless. |
| Svelte 5 + SvelteKit 2 | Svelte 5.53.3 / SvelteKit 2.53.0 | Frontend | Matches user's stated preference. Svelte 5's runes reactivity model is stable and production-ready. SvelteKit 2 provides SSR + SPA routing. Minimal bundle size matters for embedded player page. |
| hls.js | 1.6.15 | In-browser HLS playback | Native `<video>` does not support HLS on non-Safari browsers. hls.js is the only production-grade MSE-based HLS player. Handles adaptive bitrate switching. Integrates trivially into Svelte via `onMount`. |
| FastF1 | 3.8.1 | F1 race schedule data | Wraps the jolpica-f1 API (Ergast-compatible) with built-in caching, pandas DataFrames, and Python-native models. Direct replacement for the deprecated Ergast API. Provides race schedule, session times, round numbers. |
| Redis | 7.x (existing cluster: `redis.redis.svc.cluster.local`) | Extracted URL caching, dedup | Extracted HLS URLs are expensive (JS render + redirect chain). Cache them with TTL (15–30 min). Existing Redis in cluster — zero additional infra. Use `redis-py` (async: `redis.asyncio`). |
| APScheduler | 3.11.2 | Schedule polling cron | Runs the F1 schedule sync job (daily) and triggers scrape jobs before sessions. AsyncIOScheduler integrates with FastAPI lifespan. Avoids needing a separate Celery worker for simple periodic tasks at this scale. |
### Supporting Libraries

| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| httpx | 0.28.1 | Async HTTP client for scraping | Use for fetching pages and following redirect chains without JS. Preferred over `requests` in async context (requests is sync-only). Supports HTTP/2, connection pooling, cookie jars. |
| BeautifulSoup4 | 4.14.3 | HTML parsing for link extraction | Parse scraped HTML to find stream link candidates before handing to yt-dlp. Use with the `lxml` parser (faster). Only needed for custom extractors; yt-dlp handles most internally. |
| m3u8 | 6.0.0 | HLS playlist parsing and rewriting | Parse master playlists to rewrite segment URLs through the proxy. Required if you relay streams (rewrite absolute URLs → proxy URLs). |
| Pydantic | 2.12.5 | Data models and validation | FastAPI already depends on it. Use for `StreamSource`, `RaceEvent`, `ExtractorResult` models. Pydantic v2 is significantly faster than v1 (Rust-backed). |
| aiosqlite | 0.22.1 | Lightweight persistent store | Store race schedule cache and scrape job state. Single pod → no concurrency issues with SQLite. Use for data that outlives Redis TTL (schedule, extractor configs). |
| streamlink | 8.2.0 | Secondary stream extractor | Plugin-based extractor as an alternative/complement to yt-dlp. Has different coverage. Use as fallback when yt-dlp lacks an extractor for a specific site. Can also pipe stream bytes directly. |
| python-multipart | latest | Form data parsing | Required by FastAPI for any form-based endpoints. |
| uvicorn | latest | ASGI server | FastAPI production server. Use `uvicorn[standard]` (includes uvloop + httptools for roughly 2x throughput). |
| redis-py | 5.x | Redis async client | `redis.asyncio.from_url(...)` — async Redis client. Part of the `redis` package. |
| Tailwind CSS | 4.2.1 | Frontend styling | v4 is stable. Zero-runtime CSS. Works natively with SvelteKit via Vite plugin. |
### Development Tools

| Tool | Purpose | Notes |
|------|---------|-------|
| Vite 8 | Frontend build | SvelteKit 2.53.0 ships with Vite 8. No separate install needed. |
| pytest + pytest-asyncio | Backend tests | Test extractors and stream proxy logic. `pytest-asyncio` for async route tests. |
| ruff | Python linting + formatting | Replaces flake8 + black + isort. Single fast binary. |
| mypy | Type checking | FastAPI + Pydantic 2 provide good type inference. Catches extractor return type bugs early. |
| Playwright test runner | Extractor integration tests | Use `playwright codegen` to record site interactions for new extractors. |
---

## Installation

```bash
# Backend (Python 3.13)
pip install fastapi==0.132.0 "uvicorn[standard]" httpx==0.28.1
pip install yt-dlp==2026.2.21 playwright==1.58.0 streamlink==8.2.0
pip install fastf1==3.8.1 apscheduler==3.11.2
pip install beautifulsoup4==4.14.3 lxml m3u8==6.0.0
pip install pydantic==2.12.5 aiosqlite==0.22.1 "redis>=5,<6"
pip install python-multipart

# Install Playwright browser binaries (in Dockerfile)
playwright install chromium --with-deps

# Frontend (Node/SvelteKit)
npm create svelte@latest frontend
npm install -D tailwindcss @tailwindcss/vite
npm install hls.js
```
---

## Alternatives Considered

| Recommended | Alternative | When to Use Alternative |
|-------------|-------------|-------------------------|
| FastAPI + Python | Go + gin/fiber | If throughput > 10k concurrent streams. Go is more efficient at raw TCP proxying, but Python's yt-dlp/playwright ecosystem has no equivalent in Go — it would require shelling out to subprocesses. Python wins for this use case. |
| yt-dlp (library) | yt-dlp (subprocess) | Never subprocess — library mode gives direct access to the extractor info_dict, format selection, and cookie jars without shell overhead. |
| APScheduler | Celery | Celery requires a separate worker process + broker queue. APScheduler runs in-process. Overkill for 2 periodic jobs (schedule sync + pre-session scrape trigger). Use Celery only if scrape jobs need distributed execution across many workers. |
| hls.js | Video.js + @videojs/http-streaming | Video.js (v8.23.4) wraps hls.js internally but adds significant bundle overhead. Use hls.js directly in Svelte for minimal footprint and full control. |
| Redis (existing) | In-memory dict cache | An in-memory cache is lost on pod restart (K8s). Redis provides persistence across restarts plus shared state if you ever run multiple replicas. Already in cluster at no cost. |
| SvelteKit | Next.js / Nuxt | User preference is Svelte. SvelteKit's adapter-node works well in Docker/K8s. The bundle size advantage matters for embedded player page load time. |
| FastF1 | Direct Jolpica API calls | FastF1 adds built-in caching, retry logic, and Python models. Saves implementing Ergast-compatible parsing from scratch. jolpica-f1 API endpoint: `https://api.jolpi.ca/ergast/f1/<season>/races/` |
| Playwright (Python) | Selenium | Playwright is faster, has an async API, and Chromium headless is more stable than Selenium's ChromeDriver setup. Playwright's `page.route()` lets you intercept XHR/fetch calls to capture stream URLs without loading full pages. |
| aiosqlite | PostgreSQL | PostgreSQL is overkill for a single-node persistent schedule cache. SQLite with aiosqlite on NFS is fine for read-heavy, write-rare schedule data. If you need multi-pod writes, migrate to PostgreSQL. |
---

## What NOT to Use

| Avoid | Why | Use Instead |
|-------|-----|-------------|
| `requests` library | Synchronous only — blocks the event loop in async FastAPI handlers, causing stream proxy delays | `httpx` (drop-in async replacement with identical API style) |
| `youtube-dl` | Archived/unmaintained since 2021. yt-dlp is the maintained fork with 3x more extractors and weekly updates. Sites actively break youtube-dl. | `yt-dlp` |
| Selenium | Heavy (~500MB browser + driver), poor async support, ChromeDriver version pinning causes constant breakage in K8s | `playwright` (native async, auto-manages browser binaries) |
| Django | Sync-first ORM, not designed for long-lived streaming HTTP connections. FastAPI handles `StreamingResponse` for chunked relay natively. | `FastAPI` |
| FFmpeg-based re-encoding | Introduces a CPU-intensive transcode step, adds latency, and is unnecessary — HLS segments are already in a playable format. Proxy segments as-is. | Direct HLS segment relay (fetch + stream through) |
| Tornado | Was the async Python standard before asyncio. Replaced by asyncio-native frameworks. Less ecosystem support. | `FastAPI` + `uvicorn` |
| Celery for simple scheduling | Requires Redis as broker + a separate worker pod. Two extra moving parts for tasks that can run in-process. | `APScheduler` (AsyncIOScheduler in FastAPI lifespan) |
| `m3u8` for full stream relay | Segment-by-segment proxying via Python is high-latency (Python overhead per segment fetch). Use HTTP redirect for public segments; only rewrite the playlist if segments require auth cookies. | Redirect to original segments where possible; only proxy if cookies are required |
---

## Stack Patterns by Variant

**If a site requires only cookie passing (no JS rendering):**

- Use `httpx` with a cookie jar from the initial login request
- Extract the HLS URL from HTML with BeautifulSoup4 or regex
- No Playwright needed; much faster startup
**If a site requires JavaScript execution to reveal the stream URL:**

- Use the Playwright async API: `page.route()` to intercept XHR requests containing `.m3u8` URLs
- Use `page.evaluate()` to extract obfuscated CSRF tokens from the JS context
- Cache the result in Redis with a 20-minute TTL
**If the HLS segments require authentication cookies:**

- Use the m3u8 library to rewrite segment URLs → point to the proxy endpoint
- The proxy endpoint fetches segments with stored cookies and streams back bytes
- Required when segments have signed URLs or cookie gates
**If yt-dlp has an extractor for the site:**

- Use `yt_dlp.YoutubeDL(opts).extract_info(url)` — returns a `formats` list with direct HLS URLs
- Much simpler than a custom Playwright extractor; always try yt-dlp first
---

## Version Compatibility

| Package | Compatible With | Notes |
|---------|-----------------|-------|
| FastAPI 0.132.0 | Pydantic 2.x | FastAPI 0.100+ requires Pydantic v2. Do not mix with Pydantic v1. |
| SvelteKit 2.53.0 | Vite 8.x | SvelteKit 2.53.0 explicitly added Vite 8 support (Feb 2025). |
| yt-dlp 2026.2.21 | Python 3.9+ | yt-dlp follows CalVer; the latest version works with Python 3.13. |
| Playwright 1.58.0 | Python 3.9+ | Requires `playwright install chromium` in the Dockerfile. Chromium 145.0.7632.6. |
| APScheduler 3.11.2 | Python 3.8+ | Use `AsyncIOScheduler` with the `asyncio` event loop. APScheduler 4.x is in beta — stick with 3.x. |
| FastF1 3.8.1 | Python 3.10+ | Requires Python 3.10+. Uses the jolpica-f1 API under the hood (Ergast-compatible). |
| hls.js 1.6.15 | Modern browsers | MSE required (Chrome, Firefox, Edge). No iOS Safari (use native HLS there). |
| redis-py 5.x | Redis 7.x | `redis.asyncio` for async usage. The cluster in infra already runs Redis 7.x. |
| Tailwind CSS 4.2.1 | Vite 8 | Tailwind v4 uses a Vite plugin (`@tailwindcss/vite`), not a PostCSS config. Breaking change from v3. |
---

## Sources

- PyPI: yt-dlp 2026.2.21 — https://pypi.org/project/yt-dlp/ (HIGH confidence, verified Feb 2026)
- PyPI: FastAPI 0.132.0 — https://pypi.org/project/fastapi/ (HIGH confidence, verified Feb 2026)
- PyPI: Playwright 1.58.0 — https://pypi.org/project/playwright/ (HIGH confidence, verified Feb 2026)
- PyPI: httpx 0.28.1 — https://pypi.org/project/httpx/ (HIGH confidence, verified Feb 2026)
- PyPI: APScheduler 3.11.2 — https://pypi.org/project/APScheduler/ (HIGH confidence, verified Feb 2026)
- PyPI: FastF1 3.8.1 — https://pypi.org/project/fastf1/ (HIGH confidence, verified Feb 2026)
- PyPI: Pydantic 2.12.5 — https://pypi.org/project/pydantic/ (HIGH confidence, verified Feb 2026)
- PyPI: BeautifulSoup4 4.14.3 — https://pypi.org/project/beautifulsoup4/ (HIGH confidence, verified Feb 2026)
- PyPI: aiohttp 3.13.3 — https://pypi.org/project/aiohttp/ (HIGH confidence, verified Feb 2026)
- PyPI: aiosqlite 0.22.1 — https://pypi.org/project/aiosqlite/ (HIGH confidence, verified Feb 2026)
- PyPI: streamlink 8.2.0 — https://pypi.org/project/streamlink/ (HIGH confidence, verified Feb 2026)
- PyPI: m3u8 6.0.0 — https://pypi.org/project/m3u8/ (HIGH confidence, verified Feb 2026)
- PyPI: requests 2.32.5 — https://pypi.org/project/requests/ (HIGH confidence, verified Aug 2025)
- GitHub: svelte 5.53.3 — https://github.com/sveltejs/svelte/releases (HIGH confidence, verified Feb 2026)
- GitHub: SvelteKit 2.53.0 — https://github.com/sveltejs/kit/releases (HIGH confidence, verified Feb 2026)
- GitHub: hls.js 1.6.15 — https://github.com/video-dev/hls.js/releases (HIGH confidence, verified)
- GitHub: Tailwind CSS 4.2.1 — https://github.com/tailwindlabs/tailwindcss/releases (HIGH confidence, verified Feb 2026)
- GitHub: Traefik v3.6.9 — https://github.com/traefik/traefik/releases (HIGH confidence, verified Feb 2026)
- GitHub: jolpica-f1 — https://github.com/jolpica/jolpica-f1 (MEDIUM confidence — Ergast API replacement, community-maintained)
- GitHub: Python Docker images — https://github.com/docker-library/python/blob/master/versions.json (HIGH confidence — python:3.13-slim-bookworm, Python 3.13.12)
- GitHub: Video.js v8.23.4 — https://github.com/videojs/video.js/releases (HIGH confidence, confirmed hls.js is preferred for direct integration)
- PyPI: Celery 5.6.2 — https://pypi.org/project/celery/ (HIGH confidence — confirmed as overkill vs APScheduler for this use case)

---

*Stack research for: F1 stream aggregation and proxy service*
*Researched: 2026-02-23*
244 .planning/research/SUMMARY.md Normal file
@@ -0,0 +1,244 @@
# Project Research Summary

**Project:** F1 Live Stream Aggregation and Proxy Service
**Domain:** Live stream aggregation, HLS proxy, sports scheduling
**Researched:** 2026-02-23
**Confidence:** MEDIUM (stack HIGH, architecture MEDIUM, features MEDIUM, pitfalls MEDIUM)

## Executive Summary

This project builds a self-hosted web service that aggregates live F1 streams from unofficial streaming sites, proxies them through the service to handle CORS and authentication, and presents them via an embedded HLS player with an F1 race schedule. The recommended approach is a Python/FastAPI backend (async, streaming-capable) paired with a Svelte 5/SvelteKit 2 frontend. The backend has four distinct responsibilities that must be built in dependency order: schedule data retrieval, per-site stream extraction, stream health checking, and HLS proxy/relay. Each component is independently testable, and the architecture enforces clean separation so that one broken extractor cannot affect the rest of the system.

The core novelty of this product — and its hardest engineering challenge — is the per-site extractor subsystem. Each target streaming site uses custom anti-scraping measures (JS-rendered tokens, signed CDN URLs, IP-based blocking) that require a custom extractor per site, maintained independently. Existing tools (streamlink, yt-dlp) provide extraction patterns but not out-of-the-box support for private F1 streaming aggregators. The recommended approach treats yt-dlp as a first-pass extractor where it has coverage, and uses httpx + BeautifulSoup for custom extractors, with Playwright as a fallback only when JS execution is strictly required.

The primary risks are: (1) extractor brittleness — sites change without notice and extractors silently fail, requiring a health-check monitoring loop from day one; (2) CDN-signed URL expiry mid-stream, requiring the proxy to never cache m3u8 playlists and to implement background URL refresh; (3) IP-based blocking of the K8s cluster — all extractors must be tested from the production network before finalizing site targets. These risks are all addressable through upfront architectural decisions rather than retrofitting.

---
## Key Findings

### Recommended Stack

The backend runs Python 3.13 on FastAPI 0.132.0 with uvicorn, using async throughout. yt-dlp 2026.2.21 is the primary extractor library (used as a Python library, not CLI). Playwright 1.58.0 (async Chromium) is the fallback for JS-rendered pages. httpx handles async HTTP for custom extractors. FastF1 3.8.1 provides the F1 race schedule via the Ergast-compatible jolpica API. APScheduler 3.11.2 runs periodic jobs (schedule refresh, extraction triggers) in-process without a separate worker. The existing cluster Redis is used for URL caching with TTL. SQLite via aiosqlite persists schedule snapshots to survive pod restarts.

The frontend is Svelte 5.53.3 / SvelteKit 2.53.0 (user preference, also well-suited for minimal bundle size). hls.js 1.6.15 handles in-browser HLS playback via MSE. Tailwind CSS 4.2.1 provides styling via the Vite plugin (not PostCSS — a breaking change from v3). All infrastructure deploys as a single Terragrunt stack following the existing repo pattern.

**Core technologies:**

- **Python 3.13 + FastAPI 0.132.0**: Async-first, StreamingResponse for HLS relay, Pydantic models
- **yt-dlp 2026.2.21**: Primary stream extraction library — 1000+ extractors, Python library mode
- **Playwright 1.58.0**: JS-rendered page fallback — async API, `page.route()` for XHR interception
- **httpx 0.28.1**: Async HTTP for custom extractors and redirect chain following
- **FastF1 3.8.1**: F1 race schedule via jolpica API with built-in caching
- **APScheduler 3.11.2**: In-process async scheduler — avoids Celery overhead for 2 periodic jobs
- **hls.js 1.6.15**: Browser HLS playback via MSE — mandatory on non-Safari browsers
- **Svelte 5 + SvelteKit 2**: Frontend framework (user preference, minimal bundle size)
- **Redis (existing cluster)**: Extracted URL cache with TTL — no additional infra cost
- **aiosqlite 0.22.1**: Schedule persistence to NFS — survives pod restarts

**Do not use:** `requests` (sync, blocks the event loop), `youtube-dl` (unmaintained), Selenium (poor async/K8s support), FFmpeg re-encoding (unnecessary latency), Celery (overkill for 2 jobs).
### Expected Features

Research confirms this product's novelty: no existing tool combines automated extraction from unofficial sources + browser-native proxied playback + schedule integration in a single web service.

**Must have (table stakes):**

- F1 schedule view — show all session types (FP, Quali, Sprint, Race) with a live/upcoming/finished indicator
- CORS-transparent HLS proxy — mandatory architectural requirement; streams cannot play in the browser without it
- Per-site stream extractor — at least one working extractor proves the end-to-end pipeline
- Stream health checker — validates URLs before display; dead streams must not surface
- Stream picker — list available working streams; user clicks to load the player
- Embedded HLS player — hls.js in Svelte, plays proxied m3u8 in-page
- Session countdown timer — client-side, zero backend cost
- Live session indicator — visual LIVE/UPCOMING/FINISHED badge

**Should have (add after MVP validation):**

- Stream auto-refresh — re-extract every 5-10 min during live sessions
- Fallback stream ordering — health-check results + reliability history drive ordering
- Source labeling in picker — show the site name with each stream
- Race weekend overview — all sessions grouped per Grand Prix
- Additional site extractors — expand coverage once the first extractor is stable

**Defer (v2+):**

- Pre/post show and press conference coverage — complex site-specific session detection
- Multiple quality tiers — only if sources actually provide multi-variant playlists
- Proxy segment prefetch — high memory cost; only if buffering complaints emerge at scale
- Session reputation annotations — UX polish, not launch-critical

**Explicit anti-features (do not build):** DVR/recording, chat, user accounts, stream transcoding, DRM support, telemetry overlay.
### Architecture Approach

The system has five clearly bounded layers: (1) Schedule Subsystem — polls the jolpica/OpenF1 API, stores to NFS; (2) Extractor Layer — plugin-per-site pattern with a registry dispatcher, concurrent fan-out execution; (3) Health Checker — validates extracted URLs via partial GET, stores liveness state in cache; (4) Proxy/Relay Layer — rewrites m3u8 URIs at all levels (master → variant → segments) through `/relay`; (5) Svelte Frontend — schedule view, stream picker, hls.js player. All state flows from extractors through the cache to the API; the frontend never triggers extraction directly.

**Major components:**

1. **Extractor Registry** — maps site-key to extractor class; fan-out concurrent dispatch; one file per site
2. **Playlist Rewriter** — fetches the upstream m3u8, rewrites all URIs to point through `/relay`; stateless
3. **Segment Relay** — pipes upstream `.ts`/`.m4s` bytes to the client as chunked transfer; no buffering
4. **Schedule Subsystem** — daily cron via APScheduler, NFS persistence, jolpica API client
5. **Stream Health Checker** — background poller, HEAD/partial-GET on m3u8 URLs, results in Redis/memory cache
6. **Backend API (FastAPI)** — serves `/schedule`, `/streams`, `/proxy`, `/relay` endpoints; reads from cache only
7. **Svelte Frontend** — schedule page, watch page with stream picker and hls.js player

**Critical patterns:**

- Extraction runs on a background schedule, never on client request (on-demand extraction = 10-30s wait)
- One extractor class per site; common `BaseExtractor` interface; isolation prevents cross-site failures
- The proxy must rewrite m3u8 at every level — master, variant, and segment; partial rewriting breaks streams
- The segment relay must stream bytes chunked, never buffer an entire segment in memory
### Critical Pitfalls

1. **JS-rendered tokens not in HTML** — Before writing any extractor, trace network traffic in DevTools to find the actual API endpoint the site's JS calls. Replicate the API call, not the page fetch. Playwright is the last resort; most sites expose a clean JSON API once reverse-engineered.

2. **m3u8 segment URLs bypass the proxy** — Rewrite all URLs in the playlist at every level (master → variant → segment). Verify with the browser Network tab that zero requests reach the original CDN domain.

3. **CDN-signed URLs expire mid-stream** — Never cache m3u8 playlists in the relay. Always fetch the live playlist from upstream on each poll. Implement background URL refresh that re-extracts before the token TTL expires.

4. **Extractor maintenance burden underestimated** — Sites break extractors without notice. Build health-check monitoring alongside the first extractor, not later. Alert on extractor failure within 5 minutes. Budget 1-2 hours per extractor per month for maintenance.

5. **IP-based blocking of the K8s cluster** — Test all extractors from the production cluster network before finalizing site targets. Datacenter IPs are pre-blocked on many streaming platforms. Simulate realistic browser headers (User-Agent, Referer, Accept-Language).

6. **CORS missing on relay endpoints** — Set `Access-Control-Allow-Origin`, `Access-Control-Allow-Methods`, and `Access-Control-Allow-Headers: Range` on all relay responses. A missing `Range` header causes preflight failures for segment requests. Test from an actual browser, not curl.
---

## Implications for Roadmap

The architecture's strict dependency chain dictates phase ordering. No phase can be skipped — each provides inputs required by the next. The recommended build order from ARCHITECTURE.md is confirmed by the pitfall analysis: the schedule subsystem must exist before extractors know when to run; extractors must work before the proxy has URLs to relay; the proxy must exist before the frontend has anything to play.
### Phase 1: Foundation — Schedule, Infrastructure, and Extractor Framework

**Rationale:** Schedule data is the trigger for everything downstream. Building the extractor framework (base class + registry) before writing any site-specific code prevents architectural lock-in. Both are low anti-scraping complexity — the schedule uses a public API, and the framework is pure Python scaffolding.

**Delivers:** Working F1 schedule API endpoint, extractor plugin system with registry, Terragrunt deployment stack, NFS mount, development environment

**Addresses features:** F1 schedule view, live/upcoming/finished indicators, session countdown timer (frontend-only, depends on schedule data)

**Avoids pitfalls:** Establishes upfront which extraction approach each target site requires (API endpoint vs JS reverse-engineering vs Playwright); tests extractors from the production network before committing to sites; implements timezone-aware schedule storage

**Research flag:** STANDARD — the jolpica API is well-documented, and the Terragrunt stack pattern is established in the repo
---

### Phase 2: Extraction Pipeline — First Working Extractor

**Rationale:** One end-to-end working extractor (raw URL → validated stream URL) proves the extraction architecture before scaling to multiple sites. The health checker must be built alongside the first extractor — not after — because silent failures are the primary operational risk.

**Delivers:** First site extractor returning live HLS URLs, stream health checker (HEAD/GET validation), Redis caching with TTL, background polling scheduler

**Addresses features:** Per-site stream extractor, stream health checker, stream auto-refresh (background polling)

**Avoids pitfalls:** Extractor built with full failure visibility (logs which step fails); health-check alerts configured from day one; extractor tested from the production K8s network before finalizing

**Research flag:** NEEDS RESEARCH DURING PLANNING — specific target sites unknown; each site requires independent reverse-engineering; the Playwright requirement depends on site-specific JS analysis
---

### Phase 3: Stream Proxy and Relay Layer

**Rationale:** The proxy layer converts raw CDN URLs (which browsers cannot fetch cross-origin) into browser-playable same-origin URLs. This is the architectural blocker for the frontend — no proxy, no browser playback. Must be built before any UI work.

**Delivers:** `/proxy` endpoint (m3u8 fetch + full URI rewrite at all levels), `/relay` endpoint (chunked segment pipe-through), CORS headers on all relay responses, URL refresh loop for token expiry

**Addresses features:** CORS-transparent HLS proxy (mandatory for all browser playback), multiple quality options (variant playlist rewriting), stream picker (proxied URLs safe to expose to the frontend)

**Avoids pitfalls:** Rewrites m3u8 at master + variant + segment levels; never caches playlists; streams segments as chunked transfer (no memory buffering); CORS headers include the `Range` header; relay endpoint is not publicly accessible (Traefik auth)

**Research flag:** STANDARD — the HLS spec (RFC 8216) and proxy patterns are well-documented; implementation is mechanical once the architecture is understood
---
|
||||
|
||||
### Phase 4: Frontend — Schedule, Picker, and Player
|
||||
|
||||
**Rationale:** All backend components are independently testable via curl before the UI exists. The frontend is the final assembly step, not an intermediate one. Building it last means it integrates against a working backend rather than mocking everything.
|
||||
|
||||
**Delivers:** SvelteKit app with schedule view, stream picker, embedded hls.js player, session countdown timer, live/upcoming/finished badges
|
||||
|
||||
**Addresses features:** Embedded HLS player, stream picker, session countdown, live session indicator, race weekend overview (grouping sessions by Grand Prix)
|
||||
|
||||
**Avoids pitfalls:** hls.js error handler attached from day one; autoplay muted by default; streams display with source label and liveness status; timezone displayed in browser local time
|
||||
|
||||
**Research flag:** STANDARD — SvelteKit + hls.js integration is well-documented; component structure is straightforward given small scope
---
### Phase 5: Coverage Expansion and Reliability
**Rationale:** Once the full pipeline is proven end-to-end with one extractor, adding more sites is low-risk incremental work following the established pattern. Stream reliability features (fallback ordering, source labeling) are only meaningful once multiple sources exist.
**Delivers:** Additional site extractors (2-3 more sites), fallback stream ordering by health-check recency, source labels in stream picker, extractor monitoring alerts (notification channel)
**Addresses features:** Additional extractors, fallback stream ordering, source labeling, stream auto-refresh improvements
**Avoids pitfalls:** Each new extractor reverse-engineered independently; health-check alerts tested by deliberate failure injection before each race weekend
**Research flag:** NEEDS RESEARCH DURING PLANNING — each new target site requires individual analysis of extraction approach; cannot be planned generically

---
### Phase Ordering Rationale
- **Schedule first:** Public API, no anti-scraping complexity, required by extraction scheduler. Proves the Terragrunt stack without risking extractor failures.
- **Extractor framework before site-specific extractors:** Base class and registry must exist first; forces interface design before implementation.
- **Health checker with first extractor:** Silent failures are the top operational risk; monitoring must not be deferred.
- **Proxy before frontend:** The frontend's player cannot function without a working `/proxy` endpoint; building UI against a mock wastes time.
- **Frontend last of core phases:** All backend endpoints are curl-testable; UI is integration, not a prerequisite.
- **Additional extractors after core works:** Pattern is proven, risk is low, each site is independently scoped.
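
The base-class-and-registry argument above can be made concrete. A sketch of the plugin pattern borrowed from yt-dlp/streamlink; the class names, registry, and `extract` signature are illustrative, not an agreed interface:

```python
from abc import ABC, abstractmethod

# Registry of site key -> extractor class, filled automatically on definition.
EXTRACTOR_REGISTRY: dict[str, type["BaseExtractor"]] = {}

class BaseExtractor(ABC):
    """One subclass per target site, auto-registered when defined."""

    site: str = ""  # unique key, e.g. "site_a"

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.site:
            EXTRACTOR_REGISTRY[cls.site] = cls

    @abstractmethod
    def extract(self, session_id: str) -> list[str]:
        """Return raw CDN playlist URLs for the given session."""

class SiteAExtractor(BaseExtractor):
    site = "site_a"

    def extract(self, session_id: str) -> list[str]:
        # Real logic would hit the site's API or drive Playwright;
        # this placeholder only demonstrates the contract.
        return [f"https://cdn.site-a.example/{session_id}/master.m3u8"]
```

Defining the interface first, as the ordering above argues, means each later extractor is a single isolated subclass rather than a change to shared code.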
### Research Flags
Phases needing `/gsd:research-phase` during planning:
- **Phase 2 (Extraction Pipeline):** Target sites unknown; each requires independent DevTools session to determine extraction approach (API endpoint, JS algorithm, or Playwright). Cannot scope extractors without site-specific analysis.
- **Phase 5 (Coverage Expansion):** Each new target site is a fresh reverse-engineering problem. Budget per-site research before each extractor is scoped.

Phases with standard patterns (skip research-phase):
- **Phase 1 (Foundation):** jolpica/OpenF1 API is public and documented. Terragrunt stack follows established repo pattern. Extractor base class is standard Python ABC.
- **Phase 3 (Proxy/Relay):** HLS spec is RFC 8216. Proxy rewriting pattern is well-documented in HLS-Proxy and yt-dlp literature. CORS mechanics are standard.
- **Phase 4 (Frontend):** SvelteKit + hls.js integration has clear documentation. Component scope is small.
---
## Confidence Assessment
| Area | Confidence | Notes |
|------|------------|-------|
| Stack | HIGH | All versions verified against PyPI and GitHub releases as of 2026-02-23. Version compatibility matrix confirmed. |
| Features | MEDIUM | Feature list is well-grounded; competitor analysis confirms novelty. OpenF1 API confidence is MEDIUM (third-party fan project, not official F1). |
| Architecture | MEDIUM | HLS spec and proxy mechanics are HIGH confidence (RFC 8216, Apple docs). System composition for this specific use-case is inferred from domain patterns. |
| Pitfalls | MEDIUM | yt-dlp/streamlink source analysis and HLS RFC are HIGH; streaming site anti-scraping behavior is LOW (sparse public documentation). |
**Overall confidence:** MEDIUM
### Gaps to Address
- **Target site list not defined:** The research assumes a list of specific streaming sites to target but does not name them. Phase 2 cannot be scoped until specific sites are identified and reverse-engineered in a DevTools session. This is the largest planning gap.
- **OpenF1 live data cost:** OpenF1's live session data is paid (€9.90/month); the free tier covers historical data only. Research recommends using the F1 calendar static JSON for the schedule. Validate whether the jolpica API provides sufficient real-time session status (live/upcoming) before finalizing the schedule integration approach.
- **Home ISP IP classification:** Whether the K8s cluster's home ISP IP is treated as residential or datacenter by streaming site IP reputation databases is unknown. Must test each target site from the cluster before committing. Recovery if blocked: residential proxy or VPN exit node.
- **Multi-variant playlist availability:** The multiple-quality feature depends on source sites providing multi-variant HLS playlists. This cannot be confirmed until specific sites are targeted. Phase 3 proxy rewriting should handle it correctly regardless, but the UX feature may not be usable at launch.
- **Token TTL per site:** Each site's CDN token TTL is unknown until extractors are built and tested. The background refresh architecture is in place, but the refresh interval must be configured per-site based on observed TTLs.
---
## Sources
### Primary (HIGH confidence)
- PyPI release pages — all stack versions (FastAPI, yt-dlp, Playwright, httpx, APScheduler, FastF1, Pydantic, hls.js, Tailwind CSS, SvelteKit, Svelte)
- RFC 8216 (IETF) — HLS specification, playlist structure, segment URL mechanics
- yt-dlp `common.py` + CONTRIBUTING.md — extractor plugin pattern, format selection
- HLS.js API documentation — initialization, error handling, quality level management
- MDN CORS documentation — preflight requirements, credential restrictions, header rules
- OpenF1 API documentation — rate limits, live vs. historical tiers, session endpoints
### Secondary (MEDIUM confidence)
- jolpica-f1 GitHub README — Ergast-compatible API, availability guarantees (community-maintained)
- Streamlink plugin documentation — per-site extractor isolation pattern
- HLS-Proxy (warren-bank) README — CORS proxy architecture requirements
- RaceControl (robvdpol), f1viewer (SoMuchForSubtlety) READMEs — F1 streaming UX expectations
### Tertiary (LOW confidence)
- Web searches on streaming site anti-scraping techniques — sparse results; pitfalls inferred from yt-dlp source patterns
- f1calendar.com — timezone complexity observations; not an authoritative source
---
*Research completed: 2026-02-23*

*Ready for roadmap: yes*