infra/stacks/chrome-service
Viktor Barzin 2e50c1235c
chrome-service: grant emo shared browser access (noVNC + homelab browser CLI)
Viktor asked to give emo access to the cluster's headed Chrome so he can fill
in forms and get past anti-bot / captcha pages. emo was deliberately locked
out of chrome-service (noVNC Authentik allowlist was Viktor-only + his
power-user RBAC has no pods/portforward). Viktor's explicit decision: SHARE
his existing browser rather than stand up an isolated per-user instance,
accepting that emo can therefore reach Viktor's warmed logged-in sessions
(CDP has no per-context auth, so the single shared persistent profile is
reachable by anyone who can drive the browser). emo's CLI use is hands-off
(his agent can run it unattended).

- authentik: add emo (emil.barzin / emil.barzin@gmail.com) to CHROME_ALLOWED
  so the admin-services-restriction policy admits him to chrome.viktorbarzin.me
  (noVNC). Reverses the prior Viktor-only lock; comment updated to record why.
- chrome-service/rbac.tf (new): emo-browser ServiceAccount + long-lived token
  (dashboard-sa.tf pattern), a chrome-service-portforward Role granting
  pods/portforward, and a cluster read-only binding (oidc-power-user-readonly)
  so the SA can resolve the Service and emo's normal read access doesn't regress.
- t3-provision-users.sh: install_browser_kubeconfig installs a dual-context
  kubeconfig for any user with a <user>-browser SA — SA token as the default
  context (non-interactive, works headless), personal OIDC retained as the
  oidc@homelab named context. emo's OIDC-only kubeconfig can't authenticate the
  headless agent session that homelab browser needs.
- docs/architecture/chrome-service.md: document the shared-browser multi-user
  access model, the session-exposure trade-off, and how to grant/revoke a user.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 15:20:07 +00:00
files chrome-service: supervise x11vnc in noVNC sidecar so the VNC view self-heals 2026-06-27 08:03:29 +00:00
main.tf chrome-service: reconcile state after pipeline #366 was killed mid-apply + document cancel-previous hazard 2026-06-27 08:15:41 +00:00
rbac.tf chrome-service: grant emo shared browser access (noVNC + homelab browser CLI) 2026-06-28 15:20:07 +00:00
README.md fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
terragrunt.hcl fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00

chrome-service

In-cluster headed Chromium exposed over the Chrome DevTools Protocol (CDP) on TCP :9222. Sibling services drive it instead of running their own in-process browser, which helps when the upstream site tries to detect headless mode (e.g. hmembeds' disable-devtool.js redirect-to-Google trap). It also publishes an hourly snapshot of cookies + localStorage so that external dev-box Claude Code sessions can warm their isolated Playwright contexts from the same logged-in profile.

Connect (in-cluster callers)

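import asyncio
from pathlib import Path

from playwright.async_api import async_playwright

CDP_URL = "http://chrome-service.chrome-service.svc.cluster.local:9222"
# stealth.js vendored into the caller (step 3 of "Add a new in-cluster caller").
STEALTH_JS = Path("stealth.js").read_text()


async def main() -> None:
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(CDP_URL, timeout=15_000)
        # browser.contexts[0] is the persistent default context (the one
        # the user logs into via noVNC). For bot work that should NOT share
        # cookies, create a fresh incognito context:
        context = await browser.new_context()
        await context.add_init_script(STEALTH_JS)
        page = await context.new_page()
        ...  # drive the page here
        await context.close()  # discard the throwaway context
        await browser.close()  # disconnect this client


asyncio.run(main())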

NetworkPolicy is the only gate on the CDP endpoint: it admits namespaces carrying the client label, plus an explicit fallback for f1-stream. No bearer token is required for the connection itself.

Snapshot endpoint (external callers)

# Bearer token comes from Vault secret/chrome-service.api_bearer_token.
TOKEN=$(vault kv get -field=api_bearer_token secret/chrome-service)
curl -fsSL \
  -H "Authorization: Bearer $TOKEN" \
  https://chrome.viktorbarzin.me/api/snapshot \
  > storage-state.json

# Use the snapshot with @playwright/mcp:
npx @playwright/mcp@latest --port 8931 --host localhost \
  --headless --browser chrome \
  --isolated --storage-state ./storage-state.json
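Where curl is not handy, or to sanity-check the file before handing it to Playwright, the same fetch can be sketched in Python with only the standard library. This assumes the endpoint returns Playwright's storage-state JSON shape (a cookies list plus an origins list with per-origin localStorage); the helper names are illustrative, not part of chrome-service.

```python
import json
import urllib.request

SNAPSHOT_URL = "https://chrome.viktorbarzin.me/api/snapshot"


def parse_storage_state(raw: bytes) -> dict:
    """Decode a storage-state document and check the Playwright shape."""
    state = json.loads(raw)
    if not isinstance(state, dict):
        raise ValueError("snapshot is not a JSON object")
    for key in ("cookies", "origins"):
        if not isinstance(state.get(key), list):
            raise ValueError(f"snapshot has no {key!r} list")
    return state


def summarize(state: dict) -> dict:
    """Count cookies, origins and localStorage entries for a quick sanity check."""
    return {
        "cookies": len(state["cookies"]),
        "origins": len(state["origins"]),
        "local_storage_items": sum(
            len(origin.get("localStorage", [])) for origin in state["origins"]
        ),
    }


def fetch_snapshot(token: str, url: str = SNAPSHOT_URL, timeout: float = 30.0) -> dict:
    """Same request as the curl above: GET with an Authorization: Bearer header."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    # urlopen raises urllib.error.HTTPError on 4xx/5xx, much like curl -f.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_storage_state(resp.read())
```

Writing the returned dict out with json.dump gives an equivalent storage-state.json for --storage-state, and a summary with zero cookies is a cheap hint that the harvester has not picked up a logged-in session.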

The snapshot is refreshed hourly by the chrome-service-snapshot-harvester CronJob (schedule 23 * * * *, i.e. minute 23 of every hour), which calls context.storageState() over the CDP endpoint and writes the result to /profile/snapshots/storage-state.json via an atomic rename. The snapshot-server sidecar serves that file.
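The harvester's job can be sketched in Python with Playwright's async API: attach over CDP, read the persistent default context's storage state, and publish it with write-to-temp-then-rename so the sidecar never serves a half-written file. This is an illustration, not the CronJob's actual code; it assumes the job can reach the CDP URL and mounts the PVC at /profile, and `atomic_write_json` / `harvest` are hypothetical names. The temp file is created in the target directory because `os.replace` is only atomic within one filesystem.

```python
import json
import os
import tempfile

CDP_URL = "http://chrome-service.chrome-service.svc.cluster.local:9222"
SNAPSHOT_PATH = "/profile/snapshots/storage-state.json"


def atomic_write_json(path: str, obj: dict) -> None:
    """Write obj as JSON so readers see either the old file or the new one."""
    directory = os.path.dirname(path) or "."
    # Same directory as the target: os.replace is only atomic on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".storage-state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # never leave partial temp files behind
        raise


async def harvest(cdp_url: str = CDP_URL, out_path: str = SNAPSHOT_PATH) -> None:
    # Imported lazily so atomic_write_json works without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(cdp_url, timeout=15_000)
        # contexts[0] is the persistent default context holding the logged-in profile.
        state = await browser.contexts[0].storage_state()
        atomic_write_json(out_path, state)
        await browser.close()  # disconnect this client
```

A CronJob entrypoint would then just run asyncio.run(harvest()).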

Add a new in-cluster caller

  1. Label the caller's namespace so the chrome-service NetworkPolicy admits it:
    resource "kubernetes_namespace" "<ns>" {
      metadata {
        name = "<ns>"
        labels = {
          "chrome-service.viktorbarzin.me/client" = "true"
        }
      }
    }
    
  2. Inject CHROME_CDP_URL into the caller's pod env:
    env {
      name  = "CHROME_CDP_URL"
      value = "http://chrome-service.chrome-service.svc.cluster.local:9222"
    }
    
  3. Vendor stealth.js into the caller (or paste it in directly; it is ~40 lines) and apply it with await context.add_init_script(STEALTH_JS) after every new_context(). Without it, hmembeds-class anti-bot checks still trip.
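Steps 2 and 3 can be folded into a small caller-side helper: read CHROME_CDP_URL from the environment, and route every new context through one function so none is ever created without the stealth script. A minimal sketch; `cdp_url_from_env` and `new_stealth_context` are hypothetical helpers, not an existing chrome-service library.

```python
import os

DEFAULT_CDP_URL = "http://chrome-service.chrome-service.svc.cluster.local:9222"


def cdp_url_from_env(default: str = DEFAULT_CDP_URL) -> str:
    """Return CHROME_CDP_URL as injected in step 2, else the in-cluster Service URL."""
    url = os.environ.get("CHROME_CDP_URL", "").strip()
    return url or default


async def new_stealth_context(browser, stealth_js: str, **context_options):
    """Create an isolated context with the stealth init script already attached.

    `browser` is the Playwright Browser returned by chromium.connect_over_cdp();
    extra keyword arguments are passed straight through to browser.new_context().
    """
    context = await browser.new_context(**context_options)
    await context.add_init_script(stealth_js)
    return context
```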

Image pin

Both the server image (mcr.microsoft.com/playwright:v1.48.0-noble in main.tf) and the client (playwright==1.48.0 in callers' requirements) must be on the same minor version. Bump them in lockstep, since the Playwright protocol changes between minor releases.
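A caller's CI could guard the pin with a check along these lines. This sketch assumes the image reference keeps the `mcr.microsoft.com/playwright:vX.Y.Z-<distro>` tag shape used in main.tf; the function names are hypothetical, and the installed client version is read with `importlib.metadata`.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

SERVER_IMAGE = "mcr.microsoft.com/playwright:v1.48.0-noble"


def major_minor(ver: str) -> tuple[int, int]:
    """'1.48.0' or 'v1.48.0' -> (1, 48)."""
    m = re.match(r"v?(\d+)\.(\d+)", ver)
    if m is None:
        raise ValueError(f"unrecognised version: {ver!r}")
    return int(m.group(1)), int(m.group(2))


def image_version(image_ref: str) -> str:
    """Extract '1.48.0' from 'mcr.microsoft.com/playwright:v1.48.0-noble'."""
    tag = image_ref.rsplit(":", 1)[1]
    return tag.lstrip("v").split("-", 1)[0]


def pins_match(image_ref: str, client_version: str) -> bool:
    """True when server image and client library share major.minor."""
    return major_minor(image_version(image_ref)) == major_minor(client_version)


def installed_playwright() -> Optional[str]:
    """Version of the installed playwright package, or None if it is absent."""
    try:
        return version("playwright")
    except PackageNotFoundError:
        return None
```

In CI this would be `assert pins_match(SERVER_IMAGE, installed_playwright())`, with SERVER_IMAGE kept in sync with main.tf.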

Operations

  • Storage: encrypted PVC at /profile. Chromium user-data-dir lives at /profile/chromium-data — cookies + localStorage + IndexedDB persist here. Snapshots at /profile/snapshots/storage-state.json. Backed up tar+gzip every 6h to /srv/nfs/chrome-service-backup/, 30-day retention.
  • Probes: TCP/9222. Chrome's CDP serves /json/version once it's bound; TCP-open is enough for readiness.
  • Health page: visit https://chrome.viktorbarzin.me (Authentik-gated) to confirm the pod is up and to log into sites. The CDP port stays internal-only.
  • Token rotation: vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))'). Reloader cascades to the snapshot-server sidecar. Update the cached token on any dev box that pulls the snapshot: vault kv get -field=api_bearer_token secret/chrome-service > ~/.config/playwright/token.
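The probe logic above (TCP-open on 9222, with /json/version as a richer signal) can be reproduced by hand when debugging. A standard-library sketch; `tcp_ready`, `parse_version` and `fetch_version` are hypothetical helpers, and the `Browser` / `webSocketDebuggerUrl` fields are the ones Chrome's /json/version normally returns. Chrome's DevTools HTTP handler may refuse Host headers that are neither an IP address nor localhost, so the fetch defaults to a port-forwarded localhost.

```python
import json
import socket
import urllib.request


def tcp_ready(host: str, port: int = 9222, timeout: float = 3.0) -> bool:
    """Same signal as the TCP readiness probe: can a connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def parse_version(body: bytes) -> tuple[str, str]:
    """Pull the browser build and browser-level WebSocket URL out of /json/version."""
    info = json.loads(body)
    return info["Browser"], info["webSocketDebuggerUrl"]


def fetch_version(base_url: str = "http://localhost:9222", timeout: float = 5.0) -> tuple[str, str]:
    # Defaults to localhost (e.g. behind kubectl port-forward) to stay clear of
    # Chrome's Host-header check on its DevTools HTTP endpoints.
    with urllib.request.urlopen(f"{base_url}/json/version", timeout=timeout) as resp:
        return parse_version(resp.read())
```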

Why headed (Xvfb) instead of headless?

disable-devtool.js and similar libraries detect navigator.webdriver, console-clear timing, and the HeadlessChrome/... user-agent token. Running headed inside Xvfb :99 makes the browser report as a normal Chromium, and the stealth init script handles the remaining JS-visible giveaways.
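To see what a detection script would see, a caller can evaluate the same signals inside a page. A rough sketch: `headless_giveaways` and `page_giveaways` are hypothetical helpers that only check navigator.webdriver and the HeadlessChrome/<version> token that Chrome's headless mode puts in its user agent; real libraries such as disable-devtool.js probe more than this.

```python
from typing import Optional


def headless_giveaways(user_agent: str, webdriver: Optional[bool]) -> list[str]:
    """Return the obvious automation tells among the two signals checked here."""
    tells = []
    if webdriver:
        tells.append("navigator.webdriver is true")
    if "HeadlessChrome" in user_agent:
        tells.append("user agent contains HeadlessChrome")
    return tells


async def page_giveaways(page) -> list[str]:
    """Evaluate the signals in a live Playwright page (after stealth.js is applied)."""
    signals = await page.evaluate(
        "() => ({ ua: navigator.userAgent, webdriver: navigator.webdriver })"
    )
    return headless_giveaways(signals["ua"], signals["webdriver"])
```

An empty list from page_giveaways is necessary but not sufficient: it only rules out the two cheapest checks.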

Why direct chromium (CDP) instead of playwright launch-server?

playwright launch-server creates ephemeral browser contexts per connect() call, so cookies and localStorage never persist to the PVC. The /profile mount only ever held npm cache + fontconfig cache, despite the original docs claiming it held "cookies, localStorage, IndexedDB". On 2026-06-04 the service switched to launching chromium directly with --user-data-dir=/profile/chromium-data --remote-debugging-port=9222 so the persistent profile actually persists, and callers moved from chromium.connect(ws_url) to chromium.connect_over_cdp(cdp_url).
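The switch comes down to how the server process is started. A sketch of that launch as a Python argv list, not the container's actual entrypoint: the binary name `chromium` is an assumption (check main.tf for the real command), only the two flags named above are included, and DISPLAY=:99 points Chromium at the Xvfb display from the previous section.

```python
import os
import subprocess

USER_DATA_DIR = "/profile/chromium-data"
CDP_PORT = 9222


def chromium_argv(binary: str = "chromium", user_data_dir: str = USER_DATA_DIR,
                  port: int = CDP_PORT) -> list[str]:
    """Build the command line for a persistent, CDP-enabled, headed Chromium."""
    return [
        binary,
        f"--user-data-dir={user_data_dir}",  # cookies/localStorage/IndexedDB land on the PVC
        f"--remote-debugging-port={port}",   # exposes CDP for connect_over_cdp()
    ]


def launch(argv: list[str]) -> subprocess.Popen:
    # Headed mode: no --headless flag; render into the Xvfb server on :99 instead.
    env = dict(os.environ, DISPLAY=":99")
    return subprocess.Popen(argv, env=env)
```

Because the profile directory is reused across restarts, the logged-in state survives pod restarts, which is exactly what launch-server's per-connect() contexts could not provide.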