The existing NetworkPolicy only admitted port 3000 (Playwright WS) from labelled client namespaces, which blocked Traefik's traffic to the noVNC sidecar on port 6080. As a result the chrome.viktorbarzin.me ingress hung: the page never loaded and the request eventually timed out. Adds a second ingress rule allowing TCP/6080 from the traefik namespace only. Authentik forward-auth still gates external access at the Traefik layer.

Also reconciles the noVNC image to the new Forgejo registry path (tag :v4 unchanged). The path was already declared in TF; this only fixes live-state drift left by the Phase 3 registry consolidation.

Updates the architecture doc: the previous text still described the old nginx static health stub that noVNC replaced.

# chrome-service: In-cluster headed Chromium pool

## Overview

`chrome-service` is a single-replica, persistent-profile, bearer-token-gated
Playwright **launch-server** that exposes a headed Chromium browser over a
WebSocket. Sibling services connect to it instead of running their own
in-process Chromium when the upstream's anti-bot tooling
(`disable-devtool.js` redirect-to-Google trap, console-clear timing tricks,
`navigator.webdriver` checks) defeats a headless browser.

Initial caller: `f1-stream`'s `playback_verifier`. Future callers attach
via the WS+token contract documented in `stacks/chrome-service/README.md`.

## Why a separate stack

In-process Chromium inside `f1-stream`:

- Runs **headless** by default (no `Xvfb`/`DISPLAY`).
- Advertises a `HeadlessChrome/...` token in its UA string and exposes
  `navigator.webdriver === true`.
- Trips `disable-devtool.js`'s **Performance** detector. Playwright's CDP
  session adds latency to `console.log(largeArray)` relative to
  `console.table(largeArray)`. The lib reads that as "DevTools is open" and
  redirects to `https://www.google.com/`.

`chrome-service` solves this by:

1. Running **headed** under `Xvfb :99` (via `playwright launch-server` with
   a JSON config that pins `headless: false`).
2. Living in a long-lived pod, so per-request browser launch latency
   disappears.
3. Allowing a per-context init script
   (`stacks/chrome-service/files/stealth.js`, about 40 lines, vendored from
   `puppeteer-extra-plugin-stealth`). The script spoofs `webdriver`,
   `chrome.runtime`, `plugins`, `languages`, `Permissions.query` and WebGL
   renderer strings. It also hides the `disable-devtool-auto` script-tag
   attribute, so the lib's IIFE exits early.
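
A minimal sketch of such a launch config follows. It assumes three things:

- `launch-server` is invoked as `npx -y playwright@1.48.0 launch-server --browser chromium --config <file>`.
- The bearer token is enforced by serving the WebSocket on `wsPath=/<TOKEN>`, which matches the `ws://...:3000/<TOKEN>` URL that callers dial.
- `DISPLAY=:99` is set on the container.

The keys are Playwright `launchServer` options. The function name is hypothetical, and the real file in the stack may differ.

```python
import json


def launch_server_config(token: str, port: int = 3000) -> str:
    """Render a JSON file for `playwright launch-server --config` (hypothetical helper).

    Every key is a `BrowserType.launchServer` option.
    """
    config = {
        # Headed: Chromium draws into the Xvfb display instead of running
        # headless, so there is no HeadlessChrome UA token to detect.
        "headless": False,
        # The internal-only WS port that callers dial.
        "port": port,
        # Assumption: the bearer token is used as the WebSocket path,
        # matching ws://chrome-service...:3000/<TOKEN>.
        "wsPath": f"/{token}",
    }
    # Assumption: DISPLAY=:99 is set on the container. launchServer's `env`
    # option defaults to the server's own environment, so Chromium inherits it.
    return json.dumps(config, indent=2)
```

Leaving `env` unset is deliberate. Passing `env` replaces the browser's whole environment rather than adding to it.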

## Wire protocol

```text
ws://chrome-service.chrome-service.svc.cluster.local:3000/<TOKEN>
                                │
┌───────────────────────────────┼───────────────────────────────┐
│ caller pod                    │ chrome-service pod
│ (e.g. f1-stream)              │ (single replica)
│                               │
│ CHROME_WS_URL ────────────────┘
│ CHROME_WS_TOKEN ─── from `secret/chrome-service.api_bearer_token` (ESO)
│
│ await chromium.connect(f"{ws}/{token}")
│ await ctx.add_init_script(STEALTH_JS)
│ await page.goto("https://upstream.com/embed/...")
│
└─── ←── pages render under Xvfb, headed Chromium ──────────────┘
```
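
The caller side of the diagram can be sketched as a small helper. It assumes `CHROME_WS_URL` holds the base endpoint without the token, as the `f"{ws}/{token}"` line suggests. `ws_endpoint` is a hypothetical name, not part of any existing caller.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def ws_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Join CHROME_WS_URL and CHROME_WS_TOKEN into the URL passed to connect()."""
    env = os.environ if env is None else env
    base = env["CHROME_WS_URL"].rstrip("/")  # tolerate a trailing slash
    token = env["CHROME_WS_TOKEN"].strip()  # ignore a stray trailing newline
    if not base.startswith(("ws://", "wss://")):
        raise ValueError(f"CHROME_WS_URL must be ws:// or wss://, got {base!r}")
    if not token:
        raise ValueError("CHROME_WS_TOKEN is empty")
    # token_urlsafe() output is unchanged by quote(); escaping only matters
    # for a hand-set token containing '/', '?', '#' or similar characters.
    return f"{base}/{quote(token, safe='')}"
```

A caller would hand the result to Playwright, e.g. `browser = await p.chromium.connect(ws_endpoint())`.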

## Image pin

Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
`stacks/chrome-service/main.tf`) and the Python client
(`playwright==1.48.0` in callers' `requirements.txt`) **must match at the
minor version**. Bump them in lockstep: the Playwright protocol changes
between minor releases, and the client cannot connect to a mismatched server.

The Microsoft image ships only the browser binaries, not the `playwright`
npm SDK. The start command runs `npx -y playwright@1.48.0 launch-server`,
which downloads the SDK on first start (cached under `$HOME/.npm` on the
PVC) and reuses it on subsequent restarts. That `npx` pin is a third copy
of the version and must move with the other two.
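
The lockstep rule can be enforced by a hypothetical CI guard like the one below. It compares only major.minor, following the "must match at the minor version" rule, so patch releases may differ. Reading the three strings out of Terraform, the start command and `requirements.txt` is left to the caller. The function names are illustrative.

```python
import re


def minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from the first x.y.z found in `version`."""
    m = re.search(r"(\d+)\.(\d+)\.\d+", version)
    if m is None:
        raise ValueError(f"no x.y.z version in {version!r}")
    return int(m.group(1)), int(m.group(2))


def check_lockstep(image: str, npx_spec: str, requirement: str) -> None:
    """Exit non-zero if the three Playwright pins disagree at minor level."""
    found = {
        # mcr.microsoft.com/playwright:v1.48.0-noble -> tag v1.48.0-noble
        "server image": minor(image.rsplit(":", 1)[1]),
        # playwright@1.48.0 -> 1.48.0
        "npx launch-server": minor(npx_spec.split("@", 1)[1]),
        # playwright==1.48.0 -> 1.48.0
        "python client": minor(requirement.split("==", 1)[1]),
    }
    if len(set(found.values())) != 1:
        raise SystemExit(f"Playwright minor versions differ: {found}")


check_lockstep(
    "mcr.microsoft.com/playwright:v1.48.0-noble",
    "playwright@1.48.0",
    "playwright==1.48.0",
)
```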

## Storage

- **`chrome-service-profile-encrypted`** (PVC, 2Gi → 10Gi autoresize,
  `proxmox-lvm-encrypted`): Chromium user-data dir + npm cache.
  It is encrypted because cookies/localStorage may include third-party auth
  tokens for the sites callers drive. The pod sets `HOME=/profile`, so npx
  caches there.
- **`chrome-service-backup-host`** (NFS, RWX): destination for a CronJob
  that runs every 6 hours
  (`tar -czf /backup/<YYYY_MM_DD_HH>.tar.gz -C /profile .`), with 30-day
  retention.
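
The source does not show how the CronJob enforces retention. Below is a minimal sketch of the naming scheme and a 30-day prune. It assumes archives are only ever named `<YYYY_MM_DD_HH>.tar.gz` and that age is taken from the name rather than the file mtime. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path

STAMP = "%Y_%m_%d_%H"  # matches <YYYY_MM_DD_HH>
SUFFIX = ".tar.gz"
RETENTION = timedelta(days=30)


def archive_name(now: datetime) -> str:
    """File name for a backup taken at `now`."""
    return now.strftime(STAMP) + SUFFIX


def expired(names, now: datetime, retention: timedelta = RETENTION) -> list[str]:
    """Archive names older than `retention`, judged by the timestamp in the name."""
    out = []
    for name in names:
        if not name.endswith(SUFFIX):
            continue
        try:
            taken = datetime.strptime(name[: -len(SUFFIX)], STAMP)
        except ValueError:
            continue  # skip files that do not follow the naming scheme
        if now - taken > retention:
            out.append(name)
    return sorted(out)


def prune(backup_dir: Path, now: datetime) -> list[str]:
    """Delete expired archives in `backup_dir` and return their names."""
    doomed = expired((p.name for p in backup_dir.glob("*" + SUFFIX)), now)
    for name in doomed:
        (backup_dir / name).unlink()
    return doomed
```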

## Auth + secrets

- Vault KV `secret/chrome-service.api_bearer_token`: 32-byte URL-safe
  random value, rotated by hand:
  `vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
- ESO syncs it into the namespace-local Secret `chrome-service-secrets`
  (server pod) and `chrome-service-client-secrets` (each caller pod).
- Reloader (`reloader.stakater.com/auto = "true"`) cascades token rotation
  to the server and to any annotated caller, so no manual rollout is needed.
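
`token_urlsafe(32)` suits a token that travels in the URL. It produces 32 random bytes (256 bits) encoded as 43 base64url characters with no padding, so the token can sit verbatim in the WS path. A small check, with hypothetical helper names:

```python
import re
import secrets

# base64url alphabet: letters, digits, '-' and '_' (no '+', '/' or '=')
_URL_SAFE = re.compile(r"[A-Za-z0-9_-]+")


def new_token() -> str:
    """Same call as the rotation one-liner: 32 random bytes, base64url, unpadded."""
    return secrets.token_urlsafe(32)


def is_path_safe(token: str) -> bool:
    """True if the token can be used verbatim as a single URL path segment."""
    return _URL_SAFE.fullmatch(token) is not None
```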

## Network controls

- **`kubernetes_network_policy_v1.ws_ingress`**: two separate ingress
  rules on the same policy:
  - **TCP/3000** (Playwright WS): only namespaces labelled
    `chrome-service.viktorbarzin.me/client = "true"`, plus an explicit
    fallback for `f1-stream` matched by `kubernetes.io/metadata.name`.
  - **TCP/6080** (noVNC HTTP+WS): only the `traefik` namespace, because
    the public-facing path is `chrome.viktorbarzin.me` ingress →
    Traefik → sidecar. Authentik forward-auth still gates external
    access at the Traefik layer.
- **WS port 3000** is internal-only (no ingress, no Cloudflare DNS).
- **noVNC sidecar** (`forgejo.viktorbarzin.me/viktor/chrome-service-novnc`)
  exposes a live HTML5 view of the headed Chromium session. `x11vnc`
  attaches to the Xvfb display (`localhost:6099`, i.e. display `:99`), and
  `websockify` bridges its output to port 6080. Service `chrome` maps
  :80 → :6080 and is exposed via `ingress_factory` at
  `chrome.viktorbarzin.me`, gated by Authentik. The static page and the
  WebSocket upgrade share the same path. Cloudflare proxy, Cloudflared
  tunnel, Traefik and Authentik forward-auth all preserve
  `Upgrade: websocket`.
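
The sketch below writes the two rules out as the equivalent raw NetworkPolicy manifest, plus a toy evaluator. Caveats:

- The object name and the empty `podSelector` are assumptions.
- The real policy is managed in Terraform.
- The evaluator only understands `namespaceSelector.matchLabels` peers.

Before the 6080 rule was added, Traefik's namespace matched no rule for that port, so its traffic to the sidecar was dropped.

```python
CLIENT_LABEL = "chrome-service.viktorbarzin.me/client"
NS_NAME = "kubernetes.io/metadata.name"  # set automatically on every namespace

POLICY = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "ws-ingress", "namespace": "chrome-service"},  # name assumed
    "spec": {
        "podSelector": {},  # assumption: applies to every pod in the namespace
        "policyTypes": ["Ingress"],
        "ingress": [
            {  # Rule 1: Playwright WS from labelled clients, or f1-stream by name
                "ports": [{"protocol": "TCP", "port": 3000}],
                "from": [
                    {"namespaceSelector": {"matchLabels": {CLIENT_LABEL: "true"}}},
                    {"namespaceSelector": {"matchLabels": {NS_NAME: "f1-stream"}}},
                ],
            },
            {  # Rule 2: noVNC from Traefik only
                "ports": [{"protocol": "TCP", "port": 6080}],
                "from": [
                    {"namespaceSelector": {"matchLabels": {NS_NAME: "traefik"}}},
                ],
            },
        ],
    },
}


def admits(policy: dict, ns_labels: dict, port: int) -> bool:
    """True if a pod in a namespace carrying `ns_labels` may reach TCP `port`.

    Rules are OR-ed. Within a rule the port must match AND at least one
    `from` peer must match. Peers are OR-ed; matchLabels entries are AND-ed.
    """
    for rule in policy["spec"]["ingress"]:
        if not any(p["port"] == port for p in rule["ports"]):
            continue
        for peer in rule["from"]:
            wanted = peer["namespaceSelector"]["matchLabels"]
            if all(ns_labels.get(k) == v for k, v in wanted.items()):
                return True
    return False
```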

## Adding a new caller

See `stacks/chrome-service/README.md` for the four-step recipe:

1. Label the caller's namespace with
   `chrome-service.viktorbarzin.me/client = "true"`.
2. Add an `ExternalSecret` pulling `secret/chrome-service`.
3. Inject `CHROME_WS_URL` + `CHROME_WS_TOKEN` env vars.
4. Vendor `stealth.js` and apply it via `await context.add_init_script(...)`
   after every `new_context()`.
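
Step 4 can be written as a hypothetical caller helper. It registers `stealth.js` on each new context before any page exists, so the first document every page loads already sees the spoofed properties. `new_stealth_context` and `fetch_title` are illustrative names. Only the Playwright calls (`chromium.connect`, `new_context`, `add_init_script`, `new_page`, `goto`, `title`, `close`) are real API.

```python
from pathlib import Path


async def new_stealth_context(browser, stealth_js: str, **context_kwargs):
    """Create a context on `browser` with stealth.js registered as an init script."""
    ctx = await browser.new_context(**context_kwargs)
    # Register before ctx.new_page(): init scripts run after the document is
    # created but before any of the page's own scripts.
    await ctx.add_init_script(script=stealth_js)
    return ctx


async def fetch_title(ws: str, stealth_path: str, url: str) -> str:
    """Example caller: connect to chrome-service, open one page, return its title."""
    # Imported lazily so the helper above carries no hard Playwright dependency.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.connect(ws)
        try:
            ctx = await new_stealth_context(browser, Path(stealth_path).read_text())
            page = await ctx.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            # On a connected browser, close() drops our contexts and disconnects;
            # it does not shut down the shared launch-server.
            await browser.close()
```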

## Limits + risks

- **Anti-bot vs. stealth arms race**: when an upstream beats us (DRM
  license check, device-fingerprint mismatch, hotlink protection that
  whitelists specific parent domains), the verifier returns
  `is_playable=False` and the extractor moves on. There is no user-visible
  breakage, just an empty stream list for that source.
- **JWPlayer DRM error 102630**: observed with several hmembeds embeds,
  even from the headed chrome-service. The license check fails because
  the request origin isn't on the embed's allowlist; this is upstream
  policy, not an infra defect.
- **Single replica + RWO PVC**: the deployment uses the `Recreate` strategy,
  so the old pod releases the volume before the new one starts. Expect a
  brief outage on every rollout (~30s for browser warmup).
- **No `/metrics` endpoint**: the cluster's generic
  `KubePodCrashLooping` rule covers basic alerting. A Prometheus scrape
  exporter is day-2 work.
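
Every rollout briefly takes the only replica down. A caller could ride out the gap with a bounded retry around its connect call, though nothing above says existing callers do this. A sketch with hypothetical names follows. With the defaults it sleeps 2, 4, 8, 15 and 15 seconds between six attempts (44s in total), which covers the ~30s warmup.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def connect_with_retry(
    connect: Callable[[], Awaitable[T]],
    attempts: int = 6,
    first_delay: float = 2.0,
    max_delay: float = 15.0,
    retry_on: tuple = (Exception,),
) -> T:
    """Await connect() until it succeeds, doubling the sleep up to max_delay.

    The last failure is re-raised once `attempts` calls have failed.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return await connect()
        except retry_on:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ValueError("attempts must be >= 1")
```

Usage: `browser = await connect_with_retry(lambda: p.chromium.connect(ws))`. The lambda creates a fresh coroutine for each attempt.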