chrome-service: switch to CDP + persistent profile + hourly snapshot pipeline
The chrome-service stack ran `playwright launch-server`, which creates
ephemeral browser contexts per `connect()`. Despite the encrypted PVC
mounted at /profile, no chromium user-data ever persisted — only npm
cache + fontconfig. Logging in via noVNC was effectively a no-op.
Refactor:
- Replace launch-server with direct chromium (TCP CDP on :9223 internal),
fronted by a Python HTTP+WS bridge on :9222 that rewrites the Host
header to bypass Chrome's hardcoded DNS-rebinding protection (no
`--remote-allow-hosts` flag exists in stock Chrome 130; verified by
binary string grep). Bridge also forces Connection: close on HTTP
responses so Node ws opens a fresh TCP for the WS upgrade rather than
trying to reuse the dead keep-alive socket.
- Add `--user-data-dir=/profile/chromium-data` so cookies/localStorage
actually persist on the encrypted PVC.
- New snapshot-server sidecar (stdlib python HTTP) serves
GET /api/snapshot at chrome.viktorbarzin.me/api/snapshot,
bearer-token-gated by the existing api_bearer_token.
- New chrome-service-snapshot-harvester CronJob (hourly) connects via
CDP, dumps storage_state() (cookies + localStorage), writes atomically
to /profile/snapshots/storage-state.json.
- NetworkPolicy: TCP/9222 (was :3000), TCP/8088 added for traefik.
Caller migration:
- f1-stream: `chromium.connect(ws_url)` → `chromium.connect_over_cdp(cdp_url)`,
env var CHROME_WS_URL → CHROME_CDP_URL. CHROME_WS_TOKEN dropped (no
longer used by code; ExternalSecret kept for symmetry with the snapshot
endpoint).
Dev-box side (out of scope for this commit — see ~/.config/systemd/user/):
- playwright-mcp.service flips to `--isolated --storage-state=...`
so per-Claude-Code-session ephemeral contexts seed from the snapshot.
- playwright-snapshot-refresh.{service,timer} (hourly) pulls the
snapshot via the bearer-gated HTTPS endpoint.
Docs updated:
- docs/architecture/chrome-service.md — new architecture diagram + wire protocol.
- docs/runbooks/chrome-service-snapshot.md — day-2 ops (refresh, rotation,
failure modes, restore).
- stacks/chrome-service/README.md — connect_over_cdp recipe.
Design spec at docs/superpowers/specs/2026-06-04-playwright-per-session-browser-design.md.
parent
b64d8d6168
commit
deede6dd11
10 changed files with 1152 additions and 177 deletions
|
|
@ -1,16 +1,23 @@
|
|||
# chrome-service — In-cluster headed Chromium pool
|
||||
# chrome-service — In-cluster headed Chromium with persistent profile
|
||||
|
||||
## Overview
|
||||
|
||||
`chrome-service` is a single-replica, persistent-profile, bearer-token-gated
|
||||
Playwright **launch-server** that exposes a headed Chromium browser over a
|
||||
WebSocket. Sibling services connect to it instead of running their own
|
||||
in-process Chromium when the upstream's anti-bot tooling
|
||||
(`disable-devtool.js` redirect-to-google trap, console-clear timing tricks,
|
||||
`navigator.webdriver` checks) defeats a headless browser.
|
||||
`chrome-service` is a single-replica, persistent-profile, headed
|
||||
Chromium browser exposed over the Chrome DevTools Protocol (CDP). It
|
||||
serves two distinct populations:
|
||||
|
||||
Initial caller: `f1-stream`'s `playback_verifier`. Future callers attach
|
||||
via the WS+token contract documented in `stacks/chrome-service/README.md`.
|
||||
1. **In-cluster automation callers** (e.g. `f1-stream`'s
|
||||
`playback_verifier`, `chrome_browser` extractor) — connect via
|
||||
`chromium.connect_over_cdp("http://chrome-service.chrome-service.svc:9222")`
|
||||
to drive a real browser when upstream anti-bot trips a headless one
|
||||
(`disable-devtool.js` redirect-to-google trap, `navigator.webdriver`
|
||||
checks, console-clear timing tricks).
|
||||
2. **External dev-box Claude Code sessions** — pull an hourly snapshot
|
||||
of cookies + localStorage from `chrome.viktorbarzin.me/api/snapshot`
|
||||
(bearer-gated) and seed local `@playwright/mcp` instances in
|
||||
`--isolated --storage-state=…` mode. This is how concurrent Claude
|
||||
Code sessions get their own isolated browser contexts without losing
|
||||
shared cookies for logged-in sites.
|
||||
|
||||
## Why a separate stack
|
||||
|
||||
|
|
@ -25,8 +32,8 @@ In-process Chromium inside `f1-stream`:
|
|||
|
||||
`chrome-service` solves this by:
|
||||
|
||||
1. Running **headed** under `Xvfb :99` (via `playwright launch-server` with
|
||||
a JSON config that pins `headless: false`).
|
||||
1. Running **headed** under `Xvfb :99` (chromium with `DISPLAY=:99`,
|
||||
not `--headless`).
|
||||
2. Living in a long-lived pod so JIT browser launch latency disappears.
|
||||
3. Allowing a per-context init script
|
||||
(`stacks/chrome-service/files/stealth.js` ~ 40 lines, vendored from
|
||||
|
|
@ -35,25 +42,67 @@ In-process Chromium inside `f1-stream`:
|
|||
to hide the `disable-devtool-auto` script-tag attribute so the lib's
|
||||
IIFE exits early.
|
||||
|
||||
## Wire protocol
|
||||
## Wire protocol — CDP (current, since 2026-06-04)
|
||||
|
||||
```text
|
||||
ws://chrome-service.chrome-service.svc.cluster.local:3000/<TOKEN>
|
||||
http://chrome-service.chrome-service.svc.cluster.local:9222
|
||||
│
|
||||
┌───────────────────────────────┼───────────────────────────────┐
|
||||
│ caller pod │ chrome-service pod
|
||||
│ (e.g. f1-stream) │ (single replica)
|
||||
│ │
|
||||
│ CHROME_WS_URL ──────────────┘
|
||||
│ CHROME_WS_TOKEN ─── from `secret/chrome-service.api_bearer_token` (ESO)
|
||||
│ CHROME_CDP_URL ──────────────┘
|
||||
│
|
||||
│ await chromium.connect(f"{ws}/{token}")
|
||||
│ await ctx.add_init_script(STEALTH_JS)
|
||||
│ await chromium.connect_over_cdp(cdp_url)
|
||||
│ context = await browser.new_context() ← incognito (no cookies)
|
||||
│ OR: context = browser.contexts[0] ← persistent (shared cookies)
|
||||
│ await context.add_init_script(STEALTH_JS)
|
||||
│ page.goto("https://upstream.com/embed/...")
|
||||
│
|
||||
└─── ←── pages render under Xvfb, headed Chromium ──── ─────────┘
|
||||
```
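The two context choices in the diagram above can be sketched as a small helper. This is a minimal sketch, not the code `f1-stream` ships: `open_context` and `fetch_title` are hypothetical names, and it assumes the Playwright async API at the pinned 1.48 line.

```python
import asyncio


async def open_context(browser, stealth_js: str, persistent: bool = False):
    """Return a context with the stealth init script applied.

    persistent=False -> fresh incognito context (no cookies).
    persistent=True  -> the browser's default context (shared cookies).
    """
    if persistent:
        if not browser.contexts:
            raise RuntimeError("browser exposes no default context yet")
        context = browser.contexts[0]
    else:
        context = await browser.new_context()
    # Must be added before any page navigates so it runs ahead of site scripts.
    await context.add_init_script(stealth_js)
    return context


async def fetch_title(cdp_url: str, page_url: str, stealth_js: str) -> str:
    """Hypothetical caller: load one page in an incognito context."""
    # Imported lazily so the helper above can be exercised without Playwright.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(cdp_url)
        context = await open_context(browser, stealth_js)
        try:
            page = await context.new_page()
            await page.goto(page_url)
            return await page.title()
        finally:
            # Closes only the incognito context this call created.
            await context.close()
```

A caller would run something like `asyncio.run(fetch_title(os.environ["CHROME_CDP_URL"], url, STEALTH_JS))`. The incognito default keeps callers from accidentally mutating the shared logged-in profile.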
|
||||
|
||||
### Wire protocol — WS (legacy, removed 2026-06-04)
|
||||
|
||||
The previous design used `playwright launch-server --browser chromium`
|
||||
with a path-token (`ws://...:3000/<TOKEN>`). Callers used
|
||||
`chromium.connect(ws_url)`. **Problem**: `launch-server` creates
|
||||
ephemeral browser contexts per `connect()` call, so cookies never
|
||||
persisted to the PVC despite the `/profile` mount. We migrated to
|
||||
direct chromium launch with `--user-data-dir` + CDP exposed on :9222
|
||||
so cookies actually live across pod restarts.
|
||||
|
||||
## Cookie warming + snapshot pipeline
|
||||
|
||||
```text
|
||||
┌─────────── chrome-service pod ──────────────────────────────────────────┐
|
||||
│ │
|
||||
│ chrome-service container (chromium --user-data-dir=/profile/chromium-data
|
||||
│ --remote-debugging-port=9223, fronted by the CDP bridge on :9222) │
|
||||
│ ▲ │
|
||||
│ │ user logs in via noVNC ← chrome.viktorbarzin.me (Authentik) │
|
||||
│ │ │
|
||||
│ Cookies + localStorage land in /profile/chromium-data/Default/ │
|
||||
│ │
|
||||
│ snapshot-server sidecar (python stdlib HTTP server, :8088) │
|
||||
│ ↑ serves /profile/snapshots/storage-state.json (bearer-gated) │
|
||||
└──────────────────────────────────────────────────────────────────────────┘
|
||||
▲
|
||||
│ hourly (cron 23 * * * *)
|
||||
│
|
||||
┌──────┴── chrome-service-snapshot-harvester CronJob ─────────────────────┐
|
||||
│ podAffinity → same node as chrome-service (RWO PVC) │
|
||||
│ python: connect_over_cdp + ctx.storage_state(path=...) │
|
||||
│ writes /profile/snapshots/storage-state.json (atomic rename) │
|
||||
└──────────────────────────────────────────────────────────────────────────┘
|
||||
|
||||
External caller (dev box):
|
||||
systemd timer (hourly) → curl -H "Authorization: Bearer $TOKEN"
|
||||
https://chrome.viktorbarzin.me/api/snapshot
|
||||
-o ~/.cache/playwright-shared-storage-state.json
|
||||
@playwright/mcp --isolated --storage-state ~/.cache/...storage-state.json
|
||||
```
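The "atomic rename" step is what lets the snapshot-server sidecar read the file while the harvester is writing it. A minimal sketch of that pattern follows. It assumes nothing about the real harvester beyond the destination path, and `write_json_atomically` is a hypothetical helper name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomically(data: dict, dest: Path) -> int:
    """Write `data` as JSON to `dest` so readers never see a partial file.

    Returns the number of bytes written.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    payload = json.dumps(data).encode("utf-8")
    # The temp file lives in the destination directory: os.replace() is only
    # an atomic rename when source and target are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=".storage-state-", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push bytes to disk before the rename
        os.replace(tmp_name, dest)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return len(payload)
```

With this shape, a reader opening `/profile/snapshots/storage-state.json` gets either the previous complete snapshot or the new complete one.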
|
||||
|
||||
## Image pin
|
||||
|
||||
Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
|
||||
|
|
@ -62,17 +111,17 @@ Both the server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
|
|||
minor-versions**. Bump in lockstep — Playwright protocol changes between
|
||||
minors and the client cannot connect to a mismatched server.
|
||||
|
||||
The Microsoft image ships only the browser binaries, not the `playwright`
|
||||
npm SDK; the start command runs `npx -y playwright@1.48.0 launch-server`
|
||||
which downloads the SDK on first start (cached under `$HOME/.npm` via the
|
||||
PVC) and reuses it on subsequent restarts.
|
||||
The harvester + snapshot-server sidecar use
|
||||
`mcr.microsoft.com/playwright/python:v1.48.0-noble` — same playwright
|
||||
minor, with Python-side bindings pre-installed.
|
||||
|
||||
## Storage
|
||||
|
||||
- **`chrome-service-profile-encrypted`** (PVC, 2Gi → 10Gi autoresize,
|
||||
`proxmox-lvm-encrypted`) — Chromium user-data dir + npm cache.
|
||||
`proxmox-lvm-encrypted`) — Chromium user-data dir at
|
||||
`/profile/chromium-data` + snapshot at `/profile/snapshots/storage-state.json`.
|
||||
Encrypted because cookies/localStorage may include third-party auth tokens
|
||||
for sites callers drive. `HOME=/profile` so npx caches there.
|
||||
for sites callers drive.
|
||||
- **`chrome-service-backup-host`** (NFS, RWX) — destination for a 6-hourly
|
||||
CronJob that `tar -czf /backup/<YYYY_MM_DD_HH>.tar.gz -C /profile .`,
|
||||
retention 30 days.
|
||||
|
|
@ -82,41 +131,45 @@ PVC) and reuses it on subsequent restarts.
|
|||
- Vault KV `secret/chrome-service.api_bearer_token` — 32-byte URL-safe
|
||||
random, rotated by hand:
|
||||
`vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
|
||||
- ESO syncs into namespace-local Secret `chrome-service-secrets`
|
||||
(server pod) and `chrome-service-client-secrets` (each caller pod).
|
||||
- ESO syncs into namespace-local Secret `chrome-service-secrets`. The
|
||||
`snapshot-server` sidecar reads it via `secret_key_ref`.
|
||||
- f1-stream still imports the secret (via `chrome-service-client-secrets`)
|
||||
for parity, but the CDP endpoint no longer requires it for connection —
|
||||
NetworkPolicy is the gate.
|
||||
- Reloader (`reloader.stakater.com/auto = "true"`) cascades token rotation
|
||||
to both server and any annotated caller — no manual rollout.
|
||||
to the snapshot-server sidecar.
|
||||
- **Dev-box cache**: each dev box keeps a local copy at
|
||||
`~/.config/playwright/token` (chmod 600). Re-fetch from Vault after
|
||||
rotation: `vault kv get -field=api_bearer_token secret/chrome-service > ~/.config/playwright/token`.
|
||||
|
||||
## Network controls
|
||||
|
||||
- **`kubernetes_network_policy_v1.ws_ingress`** — two separate ingress
|
||||
rules on the same policy:
|
||||
- **TCP/3000** (Playwright WS): only namespaces labelled
|
||||
- **`kubernetes_network_policy_v1.ws_ingress`** — three ingress rules:
|
||||
- **TCP/9222** (Chromium CDP): only namespaces labelled
|
||||
`chrome-service.viktorbarzin.me/client = "true"` (plus an explicit
|
||||
fallback for `f1-stream` by `kubernetes.io/metadata.name`).
|
||||
- **TCP/6080** (noVNC HTTP+WS): only the `traefik` namespace, since
|
||||
the public-facing path is `chrome.viktorbarzin.me` ingress →
|
||||
Traefik → sidecar. Authentik forward-auth still gates external
|
||||
access at the Traefik layer.
|
||||
- **WS port 3000** is internal-only (no ingress, no Cloudflare DNS).
|
||||
fallback for `f1-stream` by `kubernetes.io/metadata.name`, plus
|
||||
`chrome-service`'s own namespace for the harvester CronJob).
|
||||
- **TCP/6080** (noVNC HTTP+WS): only the `traefik` namespace.
|
||||
- **TCP/8088** (snapshot-server): only the `traefik` namespace
|
||||
(bearer-token check happens in `snapshot_server.py`).
|
||||
- **CDP port 9222** is internal-only (no ingress, no Cloudflare DNS).
|
||||
- **noVNC sidecar** (`forgejo.viktorbarzin.me/viktor/chrome-service-novnc`)
|
||||
exposes a live HTML5 view of the headed Chromium session via
|
||||
`x11vnc` (connected to Xvfb on `localhost:6099`) bridged to
|
||||
`websockify` on port 6080. Service `chrome` maps :80 → :6080 and is
|
||||
exposed via `ingress_factory` at `chrome.viktorbarzin.me`,
|
||||
Authentik-gated. Both static page and WebSocket upgrade share the
|
||||
same path — Cloudflare proxy, Cloudflared tunnel, Traefik, and
|
||||
Authentik forward-auth all preserve `Upgrade: websocket`.
|
||||
Authentik-gated.
|
||||
- **snapshot-server sidecar** (`mcr.microsoft.com/playwright/python:v1.48.0-noble`)
|
||||
serves `GET /api/snapshot` from `/profile/snapshots/storage-state.json`,
|
||||
bearer-gated by `PW_TOKEN`. Service `chrome-snapshot` maps :8088 → :8088
|
||||
and is exposed at `chrome.viktorbarzin.me/api/snapshot` via a second
|
||||
`ingress_factory` call with `auth = "none"` (the bearer check is in
|
||||
the sidecar, not at the ingress layer).
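Because the ingress runs with `auth = "none"`, the bearer check in the sidecar is the only gate on the snapshot. The following is a sketch of how a stdlib handler could do that check, not the contents of `snapshot_server.py`. `make_handler` and `serve` are hypothetical names; the `PW_TOKEN` variable, port 8088, the snapshot path and the 404 message are taken from this document.

```python
import hmac
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

SNAPSHOT_PATH = Path("/profile/snapshots/storage-state.json")


def make_handler(token: str, snapshot_path: Path = SNAPSHOT_PATH):
    """Build a handler class that serves the snapshot to bearer-token holders."""
    if not token:
        raise ValueError("refusing to start with an empty bearer token")
    expected = f"Bearer {token}".encode()

    class SnapshotHandler(BaseHTTPRequestHandler):
        def _reply(self, status: int, body: bytes, ctype: str = "text/plain") -> None:
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self) -> None:
            if self.path != "/api/snapshot":
                return self._reply(404, b"not found\n")
            supplied = self.headers.get("Authorization", "").encode()
            # Constant-time comparison avoids leaking the token via timing.
            if not hmac.compare_digest(supplied, expected):
                return self._reply(401, b"unauthorized\n")
            if not snapshot_path.is_file():
                return self._reply(404, b"snapshot not yet available\n")
            self._reply(200, snapshot_path.read_bytes(), "application/json")

        def log_message(self, fmt, *args) -> None:
            pass  # keep the sketch quiet; a real sidecar would log requests

    return SnapshotHandler


def serve(port: int = 8088) -> None:
    """Run the server with the token injected via the PW_TOKEN env var."""
    handler = make_handler(os.environ["PW_TOKEN"])
    ThreadingHTTPServer(("0.0.0.0", port), handler).serve_forever()
```

Checking the token before checking for the file means an unauthenticated client cannot learn whether a snapshot exists.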
|
||||
|
||||
## Adding a new caller
|
||||
## Adding a new in-cluster caller
|
||||
|
||||
See `stacks/chrome-service/README.md` for the four-step recipe:
|
||||
|
||||
1. Label the caller's namespace.
|
||||
2. Add an `ExternalSecret` pulling `secret/chrome-service`.
|
||||
3. Inject `CHROME_WS_URL` + `CHROME_WS_TOKEN` env vars.
|
||||
4. Vendor `stealth.js` and apply via `await context.add_init_script(...)`
|
||||
after every `new_context()`.
|
||||
See `stacks/chrome-service/README.md` for the recipe (label namespace,
|
||||
inject `CHROME_CDP_URL`, vendor `stealth.js`).
|
||||
|
||||
## Limits + risks
|
||||
|
||||
|
|
@ -134,3 +187,9 @@ See `stacks/chrome-service/README.md` for the four-step recipe:
|
|||
- **No `/metrics` endpoint** — the cluster's generic
|
||||
`KubePodCrashLooping` rule covers basic alerting. A Prometheus scrape
|
||||
exporter is day-2 work.
|
||||
- **Snapshot covers cookies + localStorage only** — Playwright's
|
||||
`storage_state()` API (at the pinned v1.48) doesn't capture IndexedDB or sessionStorage.
|
||||
Sites that rely on those for auth won't warm via the snapshot.
|
||||
- **Snapshot freshness up to 1h stale** — if a site rotates session
|
||||
cookies more often than that, an on-demand refresh CLI is needed
(deferred to follow-on). Dev boxes pull on their own hourly timer, so
their copy can lag a login by up to about 2h.
|
||||
|
|
|
|||
211
docs/runbooks/chrome-service-snapshot.md
Normal file
|
|
@ -0,0 +1,211 @@
|
|||
# Runbook — chrome-service snapshot pipeline
|
||||
|
||||
Operational playbook for the hourly cookie-snapshot pipeline that warms
|
||||
external Claude Code sessions on the dev box. Architecture in
|
||||
`architecture/chrome-service.md`.
|
||||
|
||||
## At a glance
|
||||
|
||||
| Component | Where | When | What |
|
||||
|---|---|---|---|
|
||||
| chrome-service Deployment | `chrome-service` ns | always-on | headed chromium, CDP :9222, persistent /profile/chromium-data |
|
||||
| snapshot-server sidecar | same pod | always-on | serves `/api/snapshot`, bearer-gated, port 8088 |
|
||||
| snapshot-harvester CronJob | `chrome-service` ns | `23 * * * *` | dumps `storage_state()` via CDP → `/profile/snapshots/storage-state.json` |
|
||||
| dev-box refresh timer | each dev box | hourly | curls `chrome.viktorbarzin.me/api/snapshot` → `~/.cache/playwright-shared-storage-state.json` |
|
||||
| dev-box `playwright-mcp.service` | each dev box | always-on | `@playwright/mcp --isolated --storage-state=…` per-MCP-connection contexts |
|
||||
|
||||
## Day-to-day
|
||||
|
||||
### Log into a new site (warm the profile)
|
||||
|
||||
1. Open `https://chrome.viktorbarzin.me/` (Authentik will gate).
|
||||
2. The noVNC view of the in-cluster headed chromium loads. Click on the
|
||||
browser window, navigate, log in.
|
||||
3. Cookies land in `/profile/chromium-data/Default/Cookies` on the PVC.
|
||||
4. Within 60 min, the snapshot-harvester CronJob picks them up and
writes the snapshot. Within a further 60 min, dev boxes pull the
|
||||
new file. New Claude Code sessions see the new cookies.
|
||||
5. To skip the wait: trigger the harvester now (next section).
|
||||
|
||||
### Trigger snapshot harvester manually
|
||||
|
||||
```bash
|
||||
kubectl -n chrome-service create job \
|
||||
--from=cronjob/chrome-service-snapshot-harvester \
|
||||
snapshot-harvest-$(date +%s)
|
||||
|
||||
# Watch logs
|
||||
kubectl -n chrome-service logs -f -l job-name=$(kubectl -n chrome-service get jobs --sort-by=.metadata.creationTimestamp -o name | tail -1 | cut -d/ -f2)
|
||||
```
|
||||
|
||||
Expected: `wrote snapshot (… bytes) to /profile/snapshots/storage-state.json`.
|
||||
|
||||
### Trigger dev-box refresh manually
|
||||
|
||||
```bash
|
||||
# On the dev box, as the user whose Claude Code sessions need the new state:
|
||||
systemctl --user start playwright-snapshot-refresh.service
|
||||
|
||||
# Or directly:
|
||||
/usr/local/bin/playwright-snapshot-refresh
|
||||
|
||||
# Verify
|
||||
ls -la ~/.cache/playwright-shared-storage-state.json
|
||||
```
|
||||
|
||||
### Inspect the current snapshot
|
||||
|
||||
```bash
|
||||
# In-cluster (from any machine with kubectl access to the chrome-service namespace):
|
||||
kubectl -n chrome-service exec deploy/chrome-service -c snapshot-server -- \
|
||||
cat /profile/snapshots/storage-state.json | jq '.cookies | length'
|
||||
|
||||
# Externally (via the bearer-gated endpoint):
|
||||
TOKEN=$(vault kv get -field=api_bearer_token secret/chrome-service)
|
||||
curl -fsSL -H "Authorization: Bearer $TOKEN" \
|
||||
https://chrome.viktorbarzin.me/api/snapshot | jq '.cookies | length'
|
||||
```
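When `jq` is not at hand, or a per-domain breakdown is more useful than a bare count, the same file can be summarized in Python. This is a sketch: `summarize_storage_state` is a hypothetical helper, and it assumes the usual Playwright storage-state layout (a `cookies` list and an `origins` list whose entries carry `localStorage`).

```python
import json
import time
from collections import Counter
from typing import Optional


def summarize_storage_state(state: dict, now: Optional[float] = None) -> dict:
    """Count cookies, expired cookies, cookies per domain and localStorage items."""
    now = time.time() if now is None else now
    cookies = state.get("cookies", [])
    origins = state.get("origins", [])
    # `expires` is Unix seconds; -1 marks a session cookie with no expiry.
    expired = [c for c in cookies if 0 <= c.get("expires", -1) < now]
    return {
        "cookies": len(cookies),
        "expired": len(expired),
        # Strip the leading dot so ".example.com" and "example.com" group together.
        "by_domain": dict(Counter(c["domain"].lstrip(".") for c in cookies)),
        "local_storage_items": sum(len(o.get("localStorage", [])) for o in origins),
    }
```

On a dev box this could be fed from the cached file, for example `summarize_storage_state(json.load(open(os.path.expanduser("~/.cache/playwright-shared-storage-state.json"))))`. A high `expired` count is a hint that the profile needs a fresh login via noVNC.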
|
||||
|
||||
## Failure modes
|
||||
|
||||
### "no browser contexts found"
|
||||
|
||||
The harvester reports `no browser contexts found — chrome-service may
|
||||
not have launched a persistent context yet` and exits non-zero.
|
||||
|
||||
**Cause**: chromium just started and hasn't created its default context
|
||||
yet, or it crashed.
|
||||
|
||||
**Fix**: check chrome-service pod logs (`kubectl -n chrome-service logs
|
||||
deploy/chrome-service -c chrome-service`). The next hourly run will
|
||||
retry. If chromium is wedged: `kubectl -n chrome-service rollout restart
|
||||
deploy/chrome-service` (strategy = Recreate, brief downtime).
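For orientation, the check behind this message can be sketched as follows. This is an illustration under assumptions, not the harvester's actual source: `harvest_storage_state` and `harvest_over_cdp` are hypothetical names, and the sketch uses the Playwright sync API.

```python
def harvest_storage_state(browser) -> dict:
    """Return the storage state of the browser's default (persistent) context.

    Raises RuntimeError when the browser exposes no context, which is the
    situation this failure mode describes.
    """
    contexts = browser.contexts
    if not contexts:
        raise RuntimeError(
            "no browser contexts found; chrome-service may not have "
            "launched a persistent context yet"
        )
    # Over CDP, the first context is the one backed by --user-data-dir.
    return contexts[0].storage_state()


def harvest_over_cdp(cdp_url: str) -> dict:
    """Connect to a running Chromium over CDP and dump cookies + localStorage."""
    # Imported lazily so harvest_storage_state() can be tested without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        return harvest_storage_state(browser)
```

Raising instead of writing an empty snapshot keeps the last good `storage-state.json` in place until the next hourly run.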
|
||||
|
||||
### "connect_over_cdp failed"
|
||||
|
||||
Harvester or any in-cluster caller can't reach the CDP endpoint.
|
||||
|
||||
**Cause**: chrome-service pod not Ready, NetworkPolicy doesn't admit
|
||||
the caller's namespace, or chromium isn't listening on :9222.
|
||||
|
||||
**Diagnose**:
|
||||
```bash
|
||||
kubectl -n chrome-service get pods
|
||||
kubectl -n chrome-service describe networkpolicy chrome-service-ws-ingress
|
||||
|
||||
# From inside the cluster (e.g. a debug pod in chrome-service ns):
|
||||
nc -zv chrome-service.chrome-service.svc.cluster.local 9222
|
||||
curl -fsSL http://chrome-service.chrome-service.svc.cluster.local:9222/json/version
|
||||
```
|
||||
|
||||
**Fix**: depends on the diagnosis. NetworkPolicy needs the caller's
|
||||
namespace label or an explicit name-fallback. If chromium isn't
|
||||
binding, check the container logs.
|
||||
|
||||
### Dev-box `playwright-snapshot-refresh` returns 401
|
||||
|
||||
The bearer token in `~/.config/playwright/token` doesn't match the
|
||||
server's. Almost always means the Vault secret was rotated and the
|
||||
local cache is stale.
|
||||
|
||||
**Fix**:
|
||||
```bash
|
||||
vault login -method=oidc # if needed
|
||||
vault kv get -field=api_bearer_token secret/chrome-service > ~/.config/playwright/token
|
||||
chmod 600 ~/.config/playwright/token
|
||||
systemctl --user start playwright-snapshot-refresh.service
|
||||
```
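To reproduce the 401 outside systemd, the same request can be built by hand. The sketch below is not the contents of `/usr/local/bin/playwright-snapshot-refresh`; `read_token`, `build_snapshot_request` and `fetch_snapshot` are hypothetical names. The URL and token path are the ones used in this runbook.

```python
import urllib.request
from pathlib import Path

SNAPSHOT_URL = "https://chrome.viktorbarzin.me/api/snapshot"
TOKEN_FILE = Path.home() / ".config" / "playwright" / "token"


def read_token(token_file: Path = TOKEN_FILE) -> str:
    """Read the cached bearer token, ignoring surrounding whitespace."""
    token = token_file.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"{token_file} is empty; re-fetch the token from Vault")
    return token


def build_snapshot_request(token: str, url: str = SNAPSHOT_URL) -> urllib.request.Request:
    """Build the GET request with the Authorization header the endpoint expects."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_snapshot(
    token_file: Path = TOKEN_FILE, url: str = SNAPSHOT_URL, timeout: float = 30.0
) -> bytes:
    """Download the snapshot; urllib raises HTTPError on 401 or 404."""
    request = build_snapshot_request(read_token(token_file), url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch_snapshot()` and catching `urllib.error.HTTPError` gives the status code directly: 401 points at the token, 404 at a snapshot that has not been harvested yet.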
|
||||
|
||||
### Dev-box `playwright-snapshot-refresh` returns 404 with "snapshot not yet available"
|
||||
|
||||
The harvester hasn't run successfully yet (fresh cluster, or all
|
||||
recent runs failed). Trigger it manually (see "Trigger snapshot
|
||||
harvester manually").
|
||||
|
||||
### Claude Code sessions still see old cookies
|
||||
|
||||
The MCP server reads the snapshot file at process start and seeds each
|
||||
new context with it. **Existing MCP sessions don't hot-reload** — they
|
||||
keep the cookies they were seeded with at session start. New sessions
|
||||
get the fresh snapshot.
|
||||
|
||||
**Fix**: restart the MCP server on the dev box to pick up the new file:
|
||||
```bash
|
||||
systemctl --user restart playwright-mcp.service
|
||||
```
|
||||
|
||||
### Snapshot file is suspiciously small or empty cookies array
|
||||
|
||||
The persistent chromium context isn't holding any cookies. Probably
|
||||
means the user hasn't logged into anything via noVNC, or chromium was
|
||||
relaunched without preserving `/profile/chromium-data`.
|
||||
|
||||
**Diagnose**:
|
||||
```bash
|
||||
kubectl -n chrome-service exec deploy/chrome-service -c chrome-service -- \
|
||||
ls -la /profile/chromium-data/Default/Cookies
|
||||
```
|
||||
|
||||
A populated `Cookies` SQLite file should be several hundred KB once
|
||||
real logins exist. If it's missing or empty, log in via noVNC.
|
||||
|
||||
## Token rotation
|
||||
|
||||
```bash
|
||||
# Rotate Vault secret (32-byte URL-safe random).
|
||||
vault kv put secret/chrome-service \
|
||||
api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')
|
||||
|
||||
# Reloader auto-restarts chrome-service pod (snapshot-server picks up new token).
|
||||
|
||||
# On EVERY dev box that pulls the snapshot:
|
||||
vault kv get -field=api_bearer_token secret/chrome-service > ~/.config/playwright/token
|
||||
chmod 600 ~/.config/playwright/token
|
||||
|
||||
# Verify the next refresh succeeds:
|
||||
systemctl --user start playwright-snapshot-refresh.service
|
||||
journalctl --user -u playwright-snapshot-refresh.service -n 20
|
||||
```
|
||||
|
||||
## Restore from a backup tarball
|
||||
|
||||
The 6-hourly backup CronJob writes `tar -czf /backup/YYYY_MM_DD_HH.tar.gz
|
||||
-C /profile .` to NFS at `/srv/nfs/chrome-service-backup/`. To restore
|
||||
the entire profile:
|
||||
|
||||
```bash
|
||||
# 1. Scale chrome-service down so its lock is released.
|
||||
kubectl -n chrome-service scale deploy/chrome-service --replicas=0
|
||||
|
||||
# 2. Mount the PVC in a helper pod and restore.
|
||||
kubectl -n chrome-service apply -f - <<EOF
|
||||
apiVersion: v1
|
||||
kind: Pod
|
||||
metadata: {name: restore-helper, namespace: chrome-service}
|
||||
spec:
|
||||
containers:
|
||||
- name: helper
|
||||
image: alpine:3.20
|
||||
command: [sleep, infinity]
|
||||
volumeMounts:
|
||||
- {name: profile, mountPath: /profile}
|
||||
- {name: backup, mountPath: /backup, readOnly: true}
|
||||
volumes:
|
||||
- name: profile
|
||||
persistentVolumeClaim: {claimName: chrome-service-profile-encrypted}
|
||||
- name: backup
|
||||
persistentVolumeClaim: {claimName: chrome-service-backup-host}
|
||||
restartPolicy: Never
|
||||
EOF
|
||||
|
||||
kubectl -n chrome-service wait --for=condition=Ready pod/restore-helper --timeout=120s
|
||||
|
||||
kubectl -n chrome-service exec restore-helper -- sh -c '
|
||||
rm -rf /profile/chromium-data /profile/snapshots &&
|
||||
tar -xzf /backup/2026_06_04_18.tar.gz -C /profile
|
||||
'
|
||||
|
||||
# 3. Cleanup helper, scale chrome-service back up.
|
||||
kubectl -n chrome-service delete pod restore-helper
|
||||
kubectl -n chrome-service scale deploy/chrome-service --replicas=1
|
||||
```
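The tarball name in step 2 is an example. Since backups are named `YYYY_MM_DD_HH.tar.gz`, choosing the right one can be scripted. This is a sketch with hypothetical helper names (`backup_timestamp`, `newest_backup`); it assumes only the naming scheme stated above.

```python
from datetime import datetime
from typing import Iterable, Optional

BACKUP_NAME_FORMAT = "%Y_%m_%d_%H.tar.gz"


def backup_timestamp(name: str) -> Optional[datetime]:
    """Parse a backup file name; return None for files that do not match."""
    try:
        return datetime.strptime(name, BACKUP_NAME_FORMAT)
    except ValueError:
        return None


def newest_backup(
    names: Iterable[str], not_after: Optional[datetime] = None
) -> Optional[str]:
    """Pick the most recent backup, optionally no later than `not_after`."""
    dated = [(ts, n) for n in names if (ts := backup_timestamp(n)) is not None]
    if not_after is not None:
        dated = [(ts, n) for ts, n in dated if ts <= not_after]
    return max(dated)[1] if dated else None
```

Passing `not_after` is useful when the profile was damaged at a known time and the restore should use the last backup taken before it, for example `newest_backup(os.listdir("/backup"), not_after=datetime(2026, 6, 4, 18))`.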
|
||||