stacks/f1-stream/files/backend/playback_verifier.py and chrome_browser.py describe an in-cluster CDP caller, but the deployed f1-stream image is built from github.com/ViktorBarzin/f1-stream which has neither file — verified by `kubectl exec ls /app/backend/` and grepping for 'CHROME' in the deployed pod. The infra/stacks/f1-stream/files/backend/ tree is a vestigial design that was never wired up to a build pipeline. Calling it out so the next reader doesn't waste time debugging why the migration "didn't take effect" — it took effect on dead code. The hourly snapshot-harvester CronJob is the only live in-cluster caller of the CDP endpoint today.
199 lines
11 KiB
Markdown
199 lines
11 KiB
Markdown
# chrome-service — In-cluster headed Chromium with persistent profile
|
|
|
|
## Overview
|
|
|
|
`chrome-service` is a single-replica, persistent-profile, headed
|
|
Chromium browser exposed over the Chrome DevTools Protocol (CDP). It
|
|
serves two distinct populations:
|
|
|
|
1. **In-cluster automation callers** — connect via
|
|
`chromium.connect_over_cdp("http://chrome-service.chrome-service.svc:9222")`
|
|
to drive a real browser when upstream anti-bot trips a headless one
|
|
(`disable-devtool.js` redirect-to-google trap, `navigator.webdriver`
|
|
checks, console-clear timing tricks). The only currently-active
|
|
in-cluster caller is the `chrome-service-snapshot-harvester` CronJob;
|
|
the `stacks/f1-stream/files/backend/playback_verifier.py` +
|
|
`chrome_browser.py` tree is a vestigial design — the deployed
|
|
f1-stream image (built from `github.com/ViktorBarzin/f1-stream`)
|
|
does not use this code path.
|
|
2. **External dev-box Claude Code sessions** — pull an hourly snapshot
|
|
of cookies + localStorage from `chrome.viktorbarzin.me/api/snapshot`
|
|
(bearer-gated) and seed local `@playwright/mcp` instances in
|
|
`--isolated --storage-state=…` mode. This is how concurrent Claude
|
|
Code sessions get their own isolated browser contexts without losing
|
|
shared cookies for logged-in sites.
|
|
|
|
## Why a separate stack
|
|
|
|
In-process Chromium inside `f1-stream`:
|
|
|
|
- Runs **headless** by default (no `Xvfb`/`DISPLAY`).
|
|
- Has the `HeadlessChromium/...` UA suffix and `navigator.webdriver === true`.
|
|
- Trips `disable-devtool.js`'s **Performance** detector — Playwright's CDP
|
|
adds latency to `console.log(largeArray)` vs `console.table(largeArray)`,
|
|
which the lib reads as "DevTools is open" and redirects to
|
|
`https://www.google.com/`.
|
|
|
|
`chrome-service` solves this by:
|
|
|
|
1. Running **headed** under `Xvfb :99` (chromium with `DISPLAY=:99`,
|
|
not `--headless`).
|
|
2. Living in a long-lived pod so JIT browser launch latency disappears.
|
|
3. Allowing a per-context init script
|
|
(`stacks/chrome-service/files/stealth.js` ~ 40 lines, vendored from
|
|
`puppeteer-extra-plugin-stealth`) to spoof `webdriver`, `chrome.runtime`,
|
|
`plugins`, `languages`, `Permissions.query`, WebGL renderer strings, and
|
|
to hide the `disable-devtool-auto` script-tag attribute so the lib's
|
|
IIFE exits early.
|
|
|
|
## Wire protocol — CDP (current, since 2026-06-04)
|
|
|
|
```text
|
|
http://chrome-service.chrome-service.svc.cluster.local:9222
|
|
│
|
|
┌───────────────────────────────┼───────────────────────────────┐
|
|
│ caller pod │ chrome-service pod
|
|
│ (e.g. f1-stream) │ (single replica)
|
|
│ │
|
|
│ CHROME_CDP_URL ──────────────┘
|
|
│
|
|
│ await chromium.connect_over_cdp(cdp_url)
|
|
│ context = await browser.new_context() ← incognito (no cookies)
|
|
│ OR: context = browser.contexts[0] ← persistent (shared cookies)
|
|
│ await context.add_init_script(STEALTH_JS)
|
|
│ page.goto("https://upstream.com/embed/...")
|
|
│
|
|
└─── ←── pages render under Xvfb, headed Chromium ──── ─────────┘
|
|
```
|
|
|
|
### Wire protocol — WS (legacy, removed 2026-06-04)
|
|
|
|
The previous design used `playwright launch-server --browser chromium`
|
|
with a path-token (`ws://...:3000/<TOKEN>`). Callers used
|
|
`chromium.connect(ws_url)`. **Problem**: `launch-server` creates
|
|
ephemeral browser contexts per `connect()` call, so cookies never
|
|
persisted to the PVC despite the `/profile` mount. We migrated to
|
|
direct chromium launch with `--user-data-dir` + CDP exposed on :9222
|
|
so cookies actually live across pod restarts.
|
|
|
|
## Cookie warming + snapshot pipeline
|
|
|
|
```text
|
|
┌─────────── chrome-service pod ──────────────────────────────────────────┐
|
|
│ │
|
|
│ chrome-service container (chromium --user-data-dir=/profile/chromium-data
|
|
│ --remote-debugging-port=9222) │
|
|
│ ▲ │
|
|
│ │ user logs in via noVNC ← chrome.viktorbarzin.me (Authentik) │
|
|
│ │ │
|
|
│ Cookies + localStorage land in /profile/chromium-data/Default/ │
|
|
│ │
|
|
│ snapshot-server sidecar (python stdlib HTTP server, :8088) │
|
|
│ ↑ serves /profile/snapshots/storage-state.json (bearer-gated) │
|
|
└──────────────────────────────────────────────────────────────────────────┘
|
|
▲
|
|
│ hourly (cron 23 * * * *)
|
|
│
|
|
┌──────┴── chrome-service-snapshot-harvester CronJob ─────────────────────┐
|
|
│ podAffinity → same node as chrome-service (RWO PVC) │
|
|
│ python: connect_over_cdp + ctx.storage_state(path=...) │
|
|
│ writes /profile/snapshots/storage-state.json (atomic rename) │
|
|
└──────────────────────────────────────────────────────────────────────────┘
|
|
|
|
External caller (dev box):
|
|
systemd timer (hourly) → curl -H "Authorization: Bearer $TOKEN"
|
|
https://chrome.viktorbarzin.me/api/snapshot
|
|
-o ~/.cache/playwright-shared-storage-state.json
|
|
@playwright/mcp --isolated --storage-state ~/.cache/...storage-state.json
|
|
```
|
|
|
|
## Image pin
|
|
|
|
Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
|
|
`stacks/chrome-service/main.tf`) and the Python client
|
|
(`playwright==1.48.0` in callers' `requirements.txt`) **must match
|
|
minor-versions**. Bump in lockstep — Playwright protocol changes between
|
|
minors and the client cannot connect to a mismatched server.
|
|
|
|
The harvester + snapshot-server sidecar use
|
|
`mcr.microsoft.com/playwright/python:v1.48.0-noble` — same playwright
|
|
minor, with Python-side bindings pre-installed.
|
|
|
|
## Storage
|
|
|
|
- **`chrome-service-profile-encrypted`** (PVC, 2Gi → 10Gi autoresize,
|
|
`proxmox-lvm-encrypted`) — Chromium user-data dir at
|
|
`/profile/chromium-data` + snapshot at `/profile/snapshots/storage-state.json`.
|
|
Encrypted because cookies/localStorage may include third-party auth tokens
|
|
for sites callers drive.
|
|
- **`chrome-service-backup-host`** (NFS, RWX) — destination for a 6-hourly
|
|
CronJob that `tar -czf /backup/<YYYY_MM_DD_HH>.tar.gz -C /profile .`,
|
|
retention 30 days.
|
|
|
|
## Auth + secrets
|
|
|
|
- Vault KV `secret/chrome-service.api_bearer_token` — 32-byte URL-safe
|
|
random, rotated by hand:
|
|
`vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
|
|
- ESO syncs into namespace-local Secret `chrome-service-secrets`. The
|
|
`snapshot-server` sidecar reads it via `secret_key_ref`.
|
|
- f1-stream still imports the secret (via `chrome-service-client-secrets`)
|
|
for parity, but the CDP endpoint no longer requires it for connection —
|
|
NetworkPolicy is the gate.
|
|
- Reloader (`reloader.stakater.com/auto = "true"`) cascades token rotation
|
|
to the snapshot-server sidecar.
|
|
- **Dev-box cache**: each dev box keeps a local copy at
|
|
`~/.config/playwright/token` (chmod 600). Re-fetch from Vault after
|
|
rotation: `vault kv get -field=api_bearer_token secret/chrome-service > ~/.config/playwright/token`.
|
|
|
|
## Network controls
|
|
|
|
- **`kubernetes_network_policy_v1.ws_ingress`** — three ingress rules:
|
|
- **TCP/9222** (Chromium CDP): only namespaces labelled
|
|
`chrome-service.viktorbarzin.me/client = "true"` (plus an explicit
|
|
fallback for `f1-stream` by `kubernetes.io/metadata.name`, plus
|
|
`chrome-service`'s own namespace for the harvester CronJob).
|
|
- **TCP/6080** (noVNC HTTP+WS): only the `traefik` namespace.
|
|
- **TCP/8088** (snapshot-server): only the `traefik` namespace
|
|
(bearer-token check happens in `snapshot_server.py`).
|
|
- **CDP port 9222** is internal-only (no ingress, no Cloudflare DNS).
|
|
- **noVNC sidecar** (`forgejo.viktorbarzin.me/viktor/chrome-service-novnc`)
|
|
exposes a live HTML5 view of the headed Chromium session via
|
|
`x11vnc` (connected to Xvfb on `localhost:6099`) bridged to
|
|
`websockify` on port 6080. Service `chrome` maps :80 → :6080 and is
|
|
exposed via `ingress_factory` at `chrome.viktorbarzin.me`,
|
|
Authentik-gated.
|
|
- **snapshot-server sidecar** (`mcr.microsoft.com/playwright/python:v1.48.0-noble`)
|
|
serves `GET /api/snapshot` from `/profile/snapshots/storage-state.json`,
|
|
bearer-gated by `PW_TOKEN`. Service `chrome-snapshot` maps :8088 → :8088
|
|
and is exposed at `chrome.viktorbarzin.me/api/snapshot` via a second
|
|
`ingress_factory` call with `auth = "none"` (the bearer check is in
|
|
the sidecar, not at the ingress layer).
|
|
|
|
## Adding a new in-cluster caller
|
|
|
|
See `stacks/chrome-service/README.md` for the recipe (label namespace,
|
|
inject `CHROME_CDP_URL`, vendor `stealth.js`).
|
|
|
|
## Limits + risks
|
|
|
|
- **Anti-bot vs stealth arms race** — when an upstream beats us (DRM
|
|
license check, device-fingerprint mismatch, hotlink protection that
|
|
whitelists specific parent domains), the verifier returns
|
|
`is_playable=False` and the extractor moves on. No user-visible
|
|
breakage, just empty stream lists for that source.
|
|
- **JWPlayer DRM error 102630** — observed with several hmembeds embeds
|
|
even from the headed chrome-service. The license check bails because
|
|
the request origin isn't on the embed's allowlist; this is upstream
|
|
policy, not an infra defect.
|
|
- **Single replica + RWO PVC** — the deployment uses `Recreate` strategy.
|
|
Brief outage on rollout, ~30s for browser warmup.
|
|
- **No `/metrics` endpoint** — the cluster's generic
|
|
`KubePodCrashLooping` rule covers basic alerting. A Prometheus scrape
|
|
exporter is day-2 work.
|
|
- **Snapshot covers cookies + localStorage only** — Playwright's
|
|
`storage_state()` API doesn't capture IndexedDB or sessionStorage.
|
|
Sites that rely on those for auth won't warm via the snapshot.
|
|
- **Snapshot freshness up to 1h stale** — if a site rotates session
|
|
cookies more often than that, an on-demand refresh CLI is needed
|
|
(deferred to follow-on).
|