Viktor Barzin fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]

6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-09 08:45:33 +00:00

7.5 KiB

Raw Blame History

Runbook — chrome-service snapshot pipeline

Operational playbook for the hourly cookie-snapshot pipeline that warms external Claude Code sessions on the dev box. Architecture in architecture/chrome-service.md.

At a glance

Component	Where	When	What
chrome-service Deployment	`chrome-service` ns	always-on	headed chromium, CDP :9222, persistent /profile/chromium-data
snapshot-server sidecar	same pod	always-on	serves `/api/snapshot`, bearer-gated, port 8088
snapshot-harvester CronJob	`chrome-service` ns	`23 * * * *`	dumps `storage_state()` via CDP → `/profile/snapshots/storage-state.json`
dev-box refresh timer	each dev box	hourly	curls `chrome.viktorbarzin.me/api/snapshot` → `~/.cache/playwright-shared-storage-state.json`
dev-box `playwright-mcp.service`	each dev box	always-on	`@playwright/mcp --isolated --storage-state=…` per-MCP-connection contexts

Day-to-day

Log into a new site (warm the profile)

Open https://chrome.viktorbarzin.me/ (Authentik will gate).
The noVNC view of the in-cluster headed chromium loads. Click on the browser window, navigate, log in.
Cookies land in /profile/chromium-data/Default/Cookies on the PVC.
Within ≤60 min, the snapshot-harvester CronJob picks them up and writes the snapshot. Within ≤60 min after that, dev boxes pull the new file. New Claude Code sessions see the new cookies.
To skip the wait: trigger the harvester now (next section).

Trigger snapshot harvester manually

kubectl -n chrome-service create job \
  --from=cronjob/chrome-service-snapshot-harvester \
  snapshot-harvest-$(date +%s)

# Watch logs
kubectl -n chrome-service logs -f -l job-name=$(kubectl -n chrome-service get jobs -o name | tail -1 | cut -d/ -f2)

Expected: wrote snapshot (… bytes) to /profile/snapshots/storage-state.json.

Trigger dev-box refresh manually

# On the dev box, as the user whose Claude Code sessions need the new state:
systemctl --user start playwright-snapshot-refresh.service

# Or directly:
/usr/local/bin/playwright-snapshot-refresh

# Verify
ls -la ~/.cache/playwright-shared-storage-state.json

Inspect the current snapshot

# In-cluster (from any pod with kubectl exec into the chrome-service pod):
kubectl -n chrome-service exec deploy/chrome-service -c snapshot-server -- \
  cat /profile/snapshots/storage-state.json | jq '.cookies | length'

# Externally (via the bearer-gated endpoint):
TOKEN=$(vault kv get -field=api_bearer_token secret/chrome-service)
curl -fsSL -H "Authorization: Bearer $TOKEN" \
  https://chrome.viktorbarzin.me/api/snapshot | jq '.cookies | length'

Failure modes

"no browser contexts found"

The harvester reports no browser contexts found — chrome-service may not have launched a persistent context yet and exits non-zero.

Cause: chromium just started and hasn't created its default context yet, or it crashed.

Fix: check chrome-service pod logs (kubectl -n chrome-service logs deploy/chrome-service -c chrome-service). The next hourly run will retry. If chromium is wedged: kubectl -n chrome-service rollout restart deploy/chrome-service (strategy = Recreate, brief downtime).

"connect_over_cdp failed"

Harvester or any in-cluster caller can't reach the CDP endpoint.

Cause: chrome-service pod not Ready, NetworkPolicy doesn't admit the caller's namespace, or chromium isn't listening on :9222.

Diagnose:

kubectl -n chrome-service get pods
kubectl -n chrome-service describe networkpolicy chrome-service-ws-ingress

# From inside the cluster (e.g. a debug pod in chrome-service ns):
nc -zv chrome-service.chrome-service.svc.cluster.local 9222
curl -fsSL http://chrome-service.chrome-service.svc.cluster.local:9222/json/version

Fix: depends on the diagnosis. NetworkPolicy needs the caller's namespace label or an explicit name-fallback. If chromium isn't binding, check the container logs.

Dev-box `playwright-snapshot-refresh` returns 401

The bearer token in ~/.config/playwright/token doesn't match the server's. Almost always means the Vault secret was rotated and the local cache is stale.

Fix:

vault login -method=oidc  # if needed
vault kv get -field=api_bearer_token secret/chrome-service > ~/.config/playwright/token
chmod 600 ~/.config/playwright/token
systemctl --user start playwright-snapshot-refresh.service

Dev-box `playwright-snapshot-refresh` returns 404 with "snapshot not yet available"

The harvester hasn't run successfully yet (fresh cluster, or all recent runs failed). Trigger it manually (see "Trigger snapshot harvester manually").

Claude Code sessions still see old cookies

The MCP server reads the snapshot file at process start and seeds each new context with it. Existing MCP sessions don't hot-reload — they keep the cookies they were seeded with at session start. New sessions get the fresh snapshot.

Fix: restart the MCP server on the dev box to pick up the new file:

systemctl --user restart playwright-mcp.service

Snapshot file is suspiciously small or empty cookies array

The persistent chromium context isn't holding any cookies. Probably means the user hasn't logged into anything via noVNC, or chromium was relaunched without preserving /profile/chromium-data.