stacks/f1-stream/files/backend/playback_verifier.py and chrome_browser.py describe an in-cluster CDP caller, but the deployed f1-stream image is built from github.com/ViktorBarzin/f1-stream which has neither file — verified by `kubectl exec ls /app/backend/` and grepping for 'CHROME' in the deployed pod. The infra/stacks/f1-stream/files/backend/ tree is a vestigial design that was never wired up to a build pipeline. Calling it out so the next reader doesn't waste time debugging why the migration "didn't take effect" — it took effect on dead code. The hourly snapshot-harvester CronJob is the only live in-cluster caller of the CDP endpoint today.
chrome-service — In-cluster headed Chromium with persistent profile
Overview
chrome-service is a single-replica, persistent-profile, headed
Chromium browser exposed over the Chrome DevTools Protocol (CDP). It
serves two distinct populations:
- In-cluster automation callers: connect via `chromium.connect_over_cdp("http://chrome-service.chrome-service.svc:9222")` to drive a real browser when upstream anti-bot trips a headless one (`disable-devtool.js` redirect-to-google trap, `navigator.webdriver` checks, console-clear timing tricks). The only currently-active in-cluster caller is the `chrome-service-snapshot-harvester` CronJob; the `stacks/f1-stream/files/backend/playback_verifier.py` + `chrome_browser.py` tree is a vestigial design, and the deployed f1-stream image (built from `github.com/ViktorBarzin/f1-stream`) does not use this code path.
- External dev-box Claude Code sessions: pull an hourly snapshot of cookies + localStorage from `chrome.viktorbarzin.me/api/snapshot` (bearer-gated) and seed local `@playwright/mcp` instances in `--isolated --storage-state=…` mode. This is how concurrent Claude Code sessions get their own isolated browser contexts without losing shared cookies for logged-in sites.
Why a separate stack
In-process Chromium inside f1-stream:
- Runs headless by default (no `Xvfb`/`DISPLAY`).
- Has the `HeadlessChrome/...` UA token and `navigator.webdriver === true`.
- Trips `disable-devtool.js`'s Performance detector: Playwright's CDP adds latency to `console.log(largeArray)` vs `console.table(largeArray)`, which the lib reads as "DevTools is open" and redirects to `https://www.google.com/`.
chrome-service solves this by:
- Running headed under `Xvfb :99` (chromium with `DISPLAY=:99`, not `--headless`).
- Living in a long-lived pod so JIT browser launch latency disappears.
- Allowing a per-context init script (`stacks/chrome-service/files/stealth.js`, ~40 lines, vendored from `puppeteer-extra-plugin-stealth`) to spoof `webdriver`, `chrome.runtime`, `plugins`, `languages`, `Permissions.query`, WebGL renderer strings, and to hide the `disable-devtool-auto` script-tag attribute so the lib's IIFE exits early.
Wire protocol — CDP (current, since 2026-06-04)
http://chrome-service.chrome-service.svc.cluster.local:9222
│
┌───────────────────────────────┼───────────────────────────────┐
│ caller pod │ chrome-service pod
│ (e.g. f1-stream) │ (single replica)
│ │
│ CHROME_CDP_URL ──────────────┘
│
│ await chromium.connect_over_cdp(cdp_url)
│ context = await browser.new_context() ← incognito (no cookies)
│ OR: context = browser.contexts[0] ← persistent (shared cookies)
│ await context.add_init_script(STEALTH_JS)
│ page.goto("https://upstream.com/embed/...")
│
└─── ←── pages render under Xvfb, headed Chromium ──── ─────────┘
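The caller side of the diagram can be sketched roughly as follows. This is a minimal illustration, not the code of any deployed caller: the helper names `resolve_cdp_url` and `fetch_title` and the `stealth_js_path` parameter are hypothetical, and the sketch assumes the Playwright Python async API is installed in the caller image.

```python
import os
from pathlib import Path

# Default endpoint copied from the diagram; callers override it via CHROME_CDP_URL.
DEFAULT_CDP_URL = "http://chrome-service.chrome-service.svc.cluster.local:9222"


def resolve_cdp_url(env=None):
    """Return the CDP endpoint from CHROME_CDP_URL, or the in-cluster default."""
    env = os.environ if env is None else env
    return env.get("CHROME_CDP_URL") or DEFAULT_CDP_URL


async def fetch_title(url, stealth_js_path, persistent=False):
    """Open `url` in the remote headed Chromium and return the page title.

    Hypothetical helper: `stealth_js_path` points at a vendored copy of stealth.js.
    """
    # Imported lazily so the module still loads where Playwright is absent.
    from playwright.async_api import async_playwright

    stealth_js = Path(stealth_js_path).read_text()
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(resolve_cdp_url())
        if persistent:
            # Default context of the running browser: shares the profile's cookies.
            context = browser.contexts[0]
        else:
            # Fresh incognito-style context: no cookies.
            context = await browser.new_context()
        await context.add_init_script(stealth_js)
        page = await context.new_page()
        try:
            await page.goto(url)
            return await page.title()
        finally:
            await page.close()
            if not persistent:
                await context.close()
```

A caller would run it with something like `asyncio.run(fetch_title("https://upstream.com/embed/...", "stealth.js"))`. The persistent context is deliberately never closed here, since it belongs to the long-lived browser rather than to the caller.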
Wire protocol — WS (legacy, removed 2026-06-04)
The previous design used playwright launch-server --browser chromium
with a path-token (ws://...:3000/<TOKEN>). Callers used
chromium.connect(ws_url). Problem: launch-server creates
ephemeral browser contexts per connect() call, so cookies never
persisted to the PVC despite the /profile mount. We migrated to
direct chromium launch with --user-data-dir + CDP exposed on :9222
so cookies actually live across pod restarts.
Cookie warming + snapshot pipeline
┌─────────── chrome-service pod ──────────────────────────────────────────┐
│ │
│ chrome-service container (chromium --user-data-dir=/profile/chromium-data
│ --remote-debugging-port=9222) │
│ ▲ │
│ │ user logs in via noVNC ← chrome.viktorbarzin.me (Authentik) │
│ │ │
│ Cookies + localStorage land in /profile/chromium-data/Default/ │
│ │
│ snapshot-server sidecar (python stdlib HTTP server, :8088) │
│ ↑ serves /profile/snapshots/storage-state.json (bearer-gated) │
└──────────────────────────────────────────────────────────────────────────┘
▲
│ hourly (cron 23 * * * *)
│
┌──────┴── chrome-service-snapshot-harvester CronJob ─────────────────────┐
│ podAffinity → same node as chrome-service (RWO PVC) │
│ python: connect_over_cdp + ctx.storage_state(path=...) │
│ writes /profile/snapshots/storage-state.json (atomic rename) │
└──────────────────────────────────────────────────────────────────────────┘
External caller (dev box):
systemd timer (hourly) → curl -H "Authorization: Bearer $TOKEN"
https://chrome.viktorbarzin.me/api/snapshot
-o ~/.cache/playwright-shared-storage-state.json
@playwright/mcp --isolated --storage-state ~/.cache/...storage-state.json
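The "atomic rename" step in the harvester box matters because the sidecar may be serving the file while it is rewritten. A minimal sketch of that write pattern, assuming the state is obtained as a dict (for example from `await ctx.storage_state()` without `path`); the function name `write_snapshot_atomically` is hypothetical and the real harvester script may differ:

```python
import json
import os
import tempfile
from pathlib import Path


def write_snapshot_atomically(state, dest):
    """Write `state` as JSON to `dest` via a temp file in the same directory."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=dest.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, dest)  # readers see either the old or the new file
    except BaseException:
        # Do not leave a partial temp file behind on failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return dest
```

With this pattern a concurrent `GET /api/snapshot` never reads a half-written JSON document.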
Image pin
The server image (mcr.microsoft.com/playwright:v1.48.0-noble in
stacks/chrome-service/main.tf) and the Python client
(playwright==1.48.0 in callers' requirements.txt) must be on the same
Playwright minor version. Bump them in lockstep: the Playwright protocol
changes between minors, and the client cannot connect to a mismatched server.
The harvester + snapshot-server sidecar use
mcr.microsoft.com/playwright/python:v1.48.0-noble — same playwright
minor, with Python-side bindings pre-installed.
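The lockstep rule lends itself to a small pre-merge check. The sketch below is an assumption about how one might enforce it; no such check is described in this stack, and the function names are hypothetical. It compares the major.minor of an image tag against a requirements pin.

```python
import re


def playwright_minor(spec):
    """Extract (major, minor) from an image tag (':v1.48.0-noble') or a pin ('==1.48.0')."""
    match = re.search(r"(?:v|==)(\d+)\.(\d+)\.\d+", spec)
    if match is None:
        raise ValueError(f"no Playwright version found in {spec!r}")
    return int(match.group(1)), int(match.group(2))


def pins_match(image, requirement):
    """True when the image and the Python pin share the same major.minor."""
    return playwright_minor(image) == playwright_minor(requirement)
```

For example, `pins_match("mcr.microsoft.com/playwright:v1.48.0-noble", "playwright==1.48.0")` holds, while a client bumped to `playwright==1.49.1` against the same image does not.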
Storage
- `chrome-service-profile-encrypted` (PVC, 2Gi → 10Gi autoresize, `proxmox-lvm-encrypted`): Chromium user-data dir at `/profile/chromium-data` + snapshot at `/profile/snapshots/storage-state.json`. Encrypted because cookies/localStorage may include third-party auth tokens for sites callers drive.
- `chrome-service-backup-host` (NFS, RWX): destination for a 6-hourly CronJob that runs `tar -czf /backup/<YYYY_MM_DD_HH>.tar.gz -C /profile .`, retention 30 days.
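The backup naming scheme and the 30-day retention can be illustrated with a short sketch. How the actual CronJob prunes old archives is not shown in this document, so the logic below is an assumption, and `backup_name` and `expired_backups` are hypothetical helpers.

```python
from datetime import datetime, timedelta

NAME_FORMAT = "%Y_%m_%d_%H"  # mirrors the <YYYY_MM_DD_HH> pattern
RETENTION = timedelta(days=30)


def backup_name(now):
    """Archive file name for a backup taken at `now`."""
    return now.strftime(NAME_FORMAT) + ".tar.gz"


def expired_backups(names, now, retention=RETENTION):
    """Return the archive names whose embedded timestamp is older than `retention`."""
    expired = []
    for name in names:
        stem = name.removesuffix(".tar.gz")
        try:
            taken = datetime.strptime(stem, NAME_FORMAT)
        except ValueError:
            continue  # skip files that do not follow the naming scheme
        if now - taken > retention:
            expired.append(name)
    return sorted(expired)
```

Deriving the age from the file name rather than from mtime keeps pruning independent of NFS timestamp behaviour, which is one reason a sortable timestamped name is convenient.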
Auth + secrets
- Vault KV `secret/chrome-service.api_bearer_token`: 32-byte URL-safe random, rotated by hand with `vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
- ESO syncs into namespace-local Secret `chrome-service-secrets`. The `snapshot-server` sidecar reads it via `secret_key_ref`.
- f1-stream still imports the secret (via `chrome-service-client-secrets`) for parity, but the CDP endpoint no longer requires it for connection; NetworkPolicy is the gate.
- Reloader (`reloader.stakater.com/auto = "true"`) cascades token rotation to the snapshot-server sidecar.
- Dev-box cache: each dev box keeps a local copy at `~/.config/playwright/token` (chmod 600). Re-fetch from Vault after rotation: `vault kv get -field=api_bearer_token secret/chrome-service > ~/.config/playwright/token`.
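On the dev box, the cached token feeds the hourly snapshot pull. The documented mechanism is a systemd timer running `curl`; the sketch below is only a Python equivalent for illustration, with hypothetical function names, using the token and cache paths named in this document.

```python
import urllib.request
from pathlib import Path

SNAPSHOT_URL = "https://chrome.viktorbarzin.me/api/snapshot"
TOKEN_PATH = Path.home() / ".config" / "playwright" / "token"
CACHE_PATH = Path.home() / ".cache" / "playwright-shared-storage-state.json"


def build_snapshot_request(token, url=SNAPSHOT_URL):
    """Build the bearer-authenticated GET request for the snapshot endpoint."""
    # Strip the trailing newline that `vault kv get ... > token` leaves in the file.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token.strip()}"})


def refresh_snapshot(token_path=TOKEN_PATH, cache_path=CACHE_PATH):
    """Download the snapshot and store it where @playwright/mcp will read it."""
    request = build_snapshot_request(Path(token_path).read_text())
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    cache_path = Path(cache_path)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_bytes(body)
    return cache_path
```

A 401 from `refresh_snapshot()` after a rotation is the cue to re-fetch the token from Vault.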
Network controls
- `kubernetes_network_policy_v1.ws_ingress`, three ingress rules:
  - TCP/9222 (Chromium CDP): only namespaces labelled `chrome-service.viktorbarzin.me/client = "true"` (plus an explicit fallback for `f1-stream` by `kubernetes.io/metadata.name`, plus `chrome-service`'s own namespace for the harvester CronJob).
  - TCP/6080 (noVNC HTTP+WS): only the `traefik` namespace.
  - TCP/8088 (snapshot-server): only the `traefik` namespace (bearer-token check happens in `snapshot_server.py`).
- CDP port 9222 is internal-only (no ingress, no Cloudflare DNS).
- noVNC sidecar (`forgejo.viktorbarzin.me/viktor/chrome-service-novnc`) exposes a live HTML5 view of the headed Chromium session via `x11vnc` (connected to Xvfb on `localhost:6099`) bridged to `websockify` on port 6080. Service `chrome` maps :80 → :6080 and is exposed via `ingress_factory` at `chrome.viktorbarzin.me`, Authentik-gated.
- snapshot-server sidecar (`mcr.microsoft.com/playwright/python:v1.48.0-noble`) serves `GET /api/snapshot` from `/profile/snapshots/storage-state.json`, bearer-gated by `PW_TOKEN`. Service `chrome-snapshot` maps :8088 → :8088 and is exposed at `chrome.viktorbarzin.me/api/snapshot` via a second `ingress_factory` call with `auth = "none"` (the bearer check is in the sidecar, not at the ingress layer).
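Because the ingress for `/api/snapshot` carries no auth of its own, the bearer check in the sidecar is the only gate on that path. The actual `snapshot_server.py` is not reproduced here; the following is a minimal stdlib sketch of what such a check could look like, with hypothetical names (`is_authorized`, `SnapshotHandler`, `serve`) and assumed 404/401/503 responses.

```python
import hmac
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

SNAPSHOT_PATH = Path("/profile/snapshots/storage-state.json")


def is_authorized(header, token):
    """Constant-time check of an 'Authorization: Bearer <token>' header value."""
    if not token or not header or not header.startswith("Bearer "):
        return False
    presented = header[len("Bearer "):]
    return hmac.compare_digest(presented.encode(), token.encode())


class SnapshotHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/api/snapshot":
            self.send_error(404)
            return
        # PW_TOKEN is injected from the chrome-service-secrets Secret.
        if not is_authorized(self.headers.get("Authorization"), os.environ.get("PW_TOKEN", "")):
            self.send_error(401)
            return
        if not SNAPSHOT_PATH.exists():
            self.send_error(503, "snapshot not harvested yet")
            return
        body = SNAPSHOT_PATH.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8088):
    """Serve forever on the sidecar port."""
    ThreadingHTTPServer(("", port), SnapshotHandler).serve_forever()
```

Rejecting an empty `PW_TOKEN` outright means a missing Secret fails closed rather than accepting `Bearer ` with an empty value.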
Adding a new in-cluster caller
See stacks/chrome-service/README.md for the recipe (label namespace,
inject CHROME_CDP_URL, vendor stealth.js).
Limits + risks
- Anti-bot vs stealth arms race: when an upstream beats us (DRM license check, device-fingerprint mismatch, hotlink protection that whitelists specific parent domains), the verifier returns `is_playable=False` and the extractor moves on. No user-visible breakage, just empty stream lists for that source.
- JWPlayer DRM error 102630: observed with several hmembeds embeds even from the headed chrome-service. The license check bails because the request origin isn't on the embed's allowlist; this is upstream policy, not an infra defect.
- Single replica + RWO PVC: the deployment uses `Recreate` strategy. Brief outage on rollout, ~30s for browser warmup.
- No `/metrics` endpoint: the cluster's generic `KubePodCrashLooping` rule covers basic alerting. A Prometheus scrape exporter is day-2 work.
- Snapshot covers cookies + localStorage only: Playwright's `storage_state()` API doesn't capture IndexedDB or sessionStorage. Sites that rely on those for auth won't warm via the snapshot.
- Snapshot freshness up to 1h stale: if a site rotates session cookies more often than that, an on-demand refresh CLI is needed (deferred to follow-on).
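The last two limits can be checked locally against a downloaded snapshot. The sketch below relies on the storage-state JSON layout Playwright writes (a top-level `cookies` list and an `origins` list whose entries carry `localStorage`); the helper name `summarize_storage_state` is hypothetical, and file mtime is only an approximation of when the snapshot was harvested.

```python
import json
import time
from pathlib import Path


def summarize_storage_state(path, max_age_seconds=3600, now=None):
    """Report what a storage-state file contains and whether it looks stale."""
    path = Path(path)
    state = json.loads(path.read_text())
    now = time.time() if now is None else now
    age = now - path.stat().st_mtime  # time since download, not since harvest
    origins = state.get("origins", [])
    return {
        "cookies": len(state.get("cookies", [])),
        "origins_with_local_storage": sum(1 for o in origins if o.get("localStorage")),
        "age_seconds": age,
        "stale": age > max_age_seconds,
    }
```

A site that shows zero cookies and no localStorage origin here, yet works in the headed browser, is a likely candidate for the IndexedDB/sessionStorage gap.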