infra/stacks/f1-stream/files/backend/extractors
Viktor Barzin deede6dd11 chrome-service: switch to CDP + persistent profile + hourly snapshot pipeline
The chrome-service stack ran `playwright launch-server`, which creates
ephemeral browser contexts per `connect()`. Despite the encrypted PVC
mounted at /profile, no chromium user-data ever persisted — only npm
cache + fontconfig. Logging in via noVNC was effectively a no-op.

Refactor:
- Replace launch-server with direct chromium (TCP CDP on :9223 internal),
  fronted by a Python HTTP+WS bridge on :9222 that rewrites the Host
  header to bypass Chrome's hardcoded DNS-rebinding protection (no
  `--remote-allow-hosts` flag exists in stock Chrome 130; verified by
  binary string grep). Bridge also forces Connection: close on HTTP
  responses so Node ws opens a fresh TCP for the WS upgrade rather than
  trying to reuse the dead keep-alive socket.
- Add `--user-data-dir=/profile/chromium-data` so cookies/localStorage
  actually persist on the encrypted PVC.
- New snapshot-server sidecar (stdlib Python HTTP server) serves
  GET /api/snapshot at chrome.viktorbarzin.me/api/snapshot,
  bearer-token-gated by the existing api_bearer_token.
- New chrome-service-snapshot-harvester CronJob (hourly) connects via
  CDP, dumps storage_state() (cookies + localStorage), writes atomically
  to /profile/snapshots/storage-state.json.
- NetworkPolicy: allow TCP/9222 (previously TCP/3000); add TCP/8088 for Traefik.
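
The two header tricks the bridge relies on can be illustrated with a minimal sketch. This is not the bridge code from the commit: the function names, the list-of-tuples header representation, and the `127.0.0.1:9223` upstream value are assumptions made for illustration. The idea is that the proxied request carries a loopback Host value, which Chrome's DevTools endpoint accepts, and that plain HTTP responses are marked `Connection: close` so the client opens a new socket for the WebSocket upgrade.

```python
# Hypothetical helpers; names and header representation are illustrative only.
Header = tuple[str, str]


def rewrite_request_headers(headers: list[Header], upstream_host: str) -> list[Header]:
    """Drop any client-supplied Host header and substitute the upstream one."""
    kept = [(name, value) for name, value in headers if name.lower() != "host"]
    return [("Host", upstream_host)] + kept


def force_connection_close(headers: list[Header], is_upgrade: bool) -> list[Header]:
    """Mark a plain HTTP response as non-reusable.

    A 101 Switching Protocols response must keep its Connection: Upgrade
    header, so upgrade responses are passed through unchanged.
    """
    if is_upgrade:
        return list(headers)
    hop_headers = {"connection", "keep-alive"}
    kept = [(name, value) for name, value in headers if name.lower() not in hop_headers]
    return kept + [("Connection", "close")]


if __name__ == "__main__":
    request = [("Host", "chrome-service:9222"), ("Accept", "*/*")]
    print(rewrite_request_headers(request, "127.0.0.1:9223"))
    response = [("Content-Type", "application/json"), ("Connection", "keep-alive")]
    print(force_connection_close(response, is_upgrade=False))
```

Keeping both steps as pure functions over header lists makes them easy to unit-test independently of whatever socket or asyncio plumbing the real bridge uses.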

Caller migration:
- f1-stream: `chromium.connect(ws_url)` → `chromium.connect_over_cdp(cdp_url)`,
  env var CHROME_WS_URL → CHROME_CDP_URL. CHROME_WS_TOKEN dropped (no
  longer used by code; ExternalSecret kept for symmetry with the snapshot
  endpoint).
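
A caller-side sketch of the migration, assuming the Playwright sync API (f1-stream may well use the async API instead). `resolve_cdp_url` and `fetch_title` are hypothetical helper names, not functions from `chrome_browser.py`; only `CHROME_CDP_URL`, `CHROME_WS_URL`, and `connect_over_cdp` come from the commit message.

```python
import os
from typing import Mapping, Optional


def resolve_cdp_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Read CHROME_CDP_URL, pointing out a leftover CHROME_WS_URL if present."""
    env = os.environ if env is None else env
    url = env.get("CHROME_CDP_URL")
    if not url:
        hint = ""
        if env.get("CHROME_WS_URL"):
            hint = " (CHROME_WS_URL is set but is no longer read)"
        raise RuntimeError("CHROME_CDP_URL is not set" + hint)
    return url


def fetch_title(page_url: str, cdp_url: str) -> str:
    """Open a page in the shared browser over CDP and return its title."""
    # Imported lazily so the env helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        # Over CDP the browser's existing default context is exposed, which is
        # where the persistent profile's cookies live; reuse it when available.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        try:
            page.goto(page_url)
            return page.title()
        finally:
            # Close only our own page; the remote browser keeps running.
            page.close()
```

Reusing `browser.contexts[0]` rather than calling `new_context()` is the design choice that matters here: a fresh context would start without the cookies stored under `--user-data-dir`.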

Dev-box side (out of scope for this commit — see ~/.config/systemd/user/):
- playwright-mcp.service flips to `--isolated --storage-state=...`
  so per-Claude-Code-session ephemeral contexts seed from the snapshot.
- playwright-snapshot-refresh.{service,timer} (hourly) pulls the
  snapshot via the bearer-gated HTTPS endpoint.
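
A rough sketch of what the refresh step could look like, using only the standard library. The endpoint URL is the one named above; the `Authorization: Bearer` scheme, the function names, and the destination path argument are assumptions, and the real unit may simply shell out to `curl`. The atomic write (temp file in the same directory, then `os.replace`) mirrors the approach the harvester is described as using.

```python
import json
import os
import tempfile
import urllib.request

SNAPSHOT_URL = "https://chrome.viktorbarzin.me/api/snapshot"


def build_snapshot_request(token: str, url: str = SNAPSHOT_URL) -> urllib.request.Request:
    """Build a GET request carrying the bearer token (header scheme assumed)."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def write_atomically(path: str, payload: bytes) -> None:
    """Write payload to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".storage-state-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def refresh_snapshot(token: str, dest: str, url: str = SNAPSHOT_URL, timeout: float = 30.0) -> None:
    """Download the snapshot and replace dest only if the body parses as JSON."""
    with urllib.request.urlopen(build_snapshot_request(token, url), timeout=timeout) as resp:
        body = resp.read()
    json.loads(body)  # raises ValueError before a good snapshot is overwritten
    write_atomically(dest, body)
```

Validating the body before the rename means a failed or truncated download leaves the previous snapshot in place for the next session to seed from.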

Docs updated:
- docs/architecture/chrome-service.md — new architecture diagram + wire protocol.
- docs/runbooks/chrome-service-snapshot.md — day-2 ops (refresh, rotation,
  failure modes, restore).
- stacks/chrome-service/README.md — connect_over_cdp recipe.

Design spec at docs/superpowers/specs/2026-06-04-playwright-per-session-browser-design.md.
2026-06-05 09:19:10 +00:00
__init__.py f1-stream: register HmembedsExtractor in registry 2026-05-07 23:47:50 +00:00
aceztrims.py f1-stream: revive aceztrims + pitsport, more ppv variants 2026-05-24 22:05:37 +00:00
base.py [ci skip] f1-stream: add extractor framework with demo streams (Phase 3) 2026-02-23 23:02:56 +00:00
chrome_browser.py chrome-service: switch to CDP + persistent profile + hourly snapshot pipeline 2026-06-05 09:19:10 +00:00
curated.py f1-stream: only show streams confirmed playable by headless browser 2026-05-06 21:00:07 +00:00
daddylive.py f1-stream: add real F1 stream extractors and iframe player support 2026-03-01 14:35:19 +00:00
dd12.py f1-stream: add chrome-browser, subreddit, dd12 extractors; fix streamed.pk 2026-05-07 16:05:25 +00:00
demo.py f1-stream: add real F1 stream extractors and iframe player support 2026-03-01 14:35:19 +00:00
discord_source.py f1-stream: drop demo + landing-page extractors, add fetch-proxy injection 2026-05-06 21:50:54 +00:00
hmembeds.py f1-stream: hmembeds offline decoder — reverse-engineered the JW Player trap 2026-05-07 23:47:25 +00:00
models.py f1-stream: add real F1 stream extractors and iframe player support 2026-03-01 14:35:19 +00:00
pitsport.py f1-stream: revive aceztrims + pitsport, more ppv variants 2026-05-24 22:05:37 +00:00
ppv.py f1-stream: revive aceztrims + pitsport, more ppv variants 2026-05-24 22:05:37 +00:00
registry.py [ci skip] f1-stream: add extractor framework with demo streams (Phase 3) 2026-02-23 23:02:56 +00:00
service.py f1-stream: drop broken curated, dedupe streams, accept all pitsport categories 2026-05-07 15:42:24 +00:00
streamed.py f1-stream: add chrome-browser, subreddit, dd12 extractors; fix streamed.pk 2026-05-07 16:05:25 +00:00
stremio.py f1-stream: Stremio addon extractor — TvVoo + StremVerse Sky F1 / DAZN F1 2026-05-07 23:16:39 +00:00
subreddit.py f1-stream: subreddit extractor scans r/motorsportsstreams2 (active sub) 2026-05-07 22:42:51 +00:00
timstreams.py f1-stream: only show streams confirmed playable by headless browser 2026-05-06 21:00:07 +00:00