chrome-service: in-cluster headed Chromium pool for f1-stream verifier

The f1-stream verifier's in-process headless Chromium kept tripping hmembeds' disable-devtool.js Performance detector (CDP latency on console.log vs console.table) and getting redirected to google.com.

This adds a single-replica chrome-service stack running Playwright launch-server under Xvfb so callers can connect via WS+token to a shared headed browser. f1-stream's _ensure_browser now prefers chromium.connect(CHROME_WS_URL/CHROME_WS_TOKEN) and adds a vendored stealth init script (webdriver/plugins/languages/Permissions/WebGL spoofs + querySelector hijack to disarm disable-devtool-auto) on every new context. Falls back to in-process headless if the env vars aren't set.

Encrypted PVC for profile + npm cache, NetworkPolicy to TCP/3000 gated by client-namespace label, 6h tar.gz backup CronJob to NFS, Authentik-gated nginx sidecar at chrome.viktorbarzin.me for human liveness checks. Image pinned to playwright:v1.48.0-noble in lockstep with the Python client's playwright==1.48.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 41655096c7
commit f18cd1d314
9 changed files with 901 additions and 14 deletions
stacks/chrome-service/README.md (normal file, +90)
@@ -0,0 +1,90 @@
# chrome-service

In-cluster headed Chromium exposed over Playwright's WebSocket protocol.
Sibling services drive it instead of running their own in-process browser,
which helps when the upstream tries to detect headless mode (e.g. hmembeds'
`disable-devtool.js` redirect-to-google trap).

## Connect

```python
import asyncio
import os
from pathlib import Path

from playwright.async_api import async_playwright

WS_URL = "ws://chrome-service.chrome-service.svc.cluster.local:3000"
WS_TOKEN = os.environ["CHROME_WS_TOKEN"]  # 32-byte URL-safe random
STEALTH_JS = Path("files/stealth.js").read_text()  # see files/stealth.js


async def main() -> None:
    async with async_playwright() as p:
        # The token is the last path segment of the WebSocket URL.
        browser = await p.chromium.connect(f"{WS_URL}/{WS_TOKEN}", timeout=15_000)
        context = await browser.new_context()
        await context.add_init_script(STEALTH_JS)  # register before any page exists
        page = await context.new_page()
        ...
        await browser.close()


asyncio.run(main())
```

The token is the `api_bearer_token` key of Vault KV `secret/chrome-service`,
which ESO syncs into a per-namespace K8s Secret in each caller stack
(see f1-stream's `chrome-service-client-secrets`).
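
The commit message says f1-stream's `_ensure_browser` prefers the shared browser when `CHROME_WS_URL` and `CHROME_WS_TOKEN` are set and falls back to in-process headless Chromium otherwise. The following is a minimal sketch of that selection logic, not the actual implementation; `remote_endpoint` and `ensure_browser` are hypothetical names, and `playwright` is assumed to be the object yielded by `async_playwright()`.

```python
import os
from typing import Mapping, Optional


def remote_endpoint(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the WS endpoint if both variables are set, else None."""
    url = env.get("CHROME_WS_URL", "").rstrip("/")
    token = env.get("CHROME_WS_TOKEN", "")
    if url and token:
        return f"{url}/{token}"
    return None


async def ensure_browser(playwright):
    """Prefer the shared headed browser; fall back to in-process headless."""
    endpoint = remote_endpoint()
    if endpoint is not None:
        # Remote path: connect to chrome-service over WebSocket.
        return await playwright.chromium.connect(endpoint, timeout=15_000)
    # Fallback path: env vars missing, launch a local headless browser.
    return await playwright.chromium.launch(headless=True)
```

Keeping the endpoint resolution in a pure function makes the fallback decision easy to unit-test without a browser.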

## Add a new caller

1. **Label the caller's namespace** so the chrome-service NetworkPolicy
   admits it:
   ```hcl
   resource "kubernetes_namespace" "<ns>" {
     metadata {
       labels = {
         "chrome-service.viktorbarzin.me/client" = "true"
       }
     }
   }
   ```
2. **Add an ExternalSecret** in the caller stack pulling the token:
   ```hcl
   resource "kubernetes_manifest" "chrome_token" {
     manifest = {
       apiVersion = "external-secrets.io/v1beta1"
       kind = "ExternalSecret"
       metadata = { name = "chrome-service-client-secrets", namespace = "<ns>" }
       spec = {
         refreshInterval = "15m"
         secretStoreRef = { name = "vault-kv", kind = "ClusterSecretStore" }
         target = { name = "chrome-service-client-secrets" }
         dataFrom = [{ extract = { key = "chrome-service" } }]
       }
     }
   }
   ```
3. **Inject `CHROME_WS_URL` + `CHROME_WS_TOKEN`** into the caller's pod env.
   Use `secret_key_ref` for the token; the URL is a plain value.
4. **Vendor `stealth.js`** into the caller (or just paste it; it's ~40 lines)
   and apply it via `await context.add_init_script(STEALTH_JS)` after every
   `new_context()`. Without it, hmembeds-class anti-bot checks still trip.
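
One way to make step 4 hard to forget is to route every context creation through a single helper. This is a sketch under that assumption; `new_stealth_context` is a hypothetical name, and `browser` is any object exposing Playwright's async `new_context()`.

```python
async def new_stealth_context(browser, stealth_js: str, **context_kwargs):
    """Create a context and register the stealth script before any page exists."""
    context = await browser.new_context(**context_kwargs)
    # Init scripts run in every page (and frame) later created in this context.
    await context.add_init_script(stealth_js)
    return context
```

Callers would then use `context = await new_stealth_context(browser, STEALTH_JS)` wherever they previously called `browser.new_context()` directly.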

## Image pin

The server image (`mcr.microsoft.com/playwright:v1.48.0-noble` in
`main.tf`) and the client (`playwright==1.48.0` in callers' requirements)
must be on the same minor version. Bump them in lockstep: the Playwright
protocol changes between minors.
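
The lockstep rule can be checked mechanically, for example in CI or at caller startup. The sketch below compares the major.minor part of an image tag with a client version string; `minor_version` and `pins_match` are hypothetical helpers, and how the image tag reaches the caller is left open.

```python
import re


def minor_version(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'v1.48.0-noble' or '1.48.0'."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no version found in {version!r}")
    return int(match.group(1)), int(match.group(2))


def pins_match(image_tag: str, client_version: str) -> bool:
    """True when the server image tag and client package share major.minor."""
    return minor_version(image_tag) == minor_version(client_version)
```

On the client side, the installed version can be read with `importlib.metadata.version("playwright")` and passed as `client_version`.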

## Operations

- **Storage**: encrypted PVC at `/profile` for cookies + npm cache. Ephemeral
  contexts (`browser.new_context()`) bypass the profile; persistent contexts
  share it. Backed up as tar+gzip every 6h to `/srv/nfs/chrome-service-backup/`,
  with 30-day retention.
- **Probes**: TCP/3000. Playwright run-server has no HTTP `/health`; a TCP
  open is the only liveness signal available without spinning up a browser.
- **Health page**: visit `https://chrome.viktorbarzin.me` (Authentik-gated)
  to confirm the pod is up. The WS port stays internal-only.
- **Token rotation**: `vault kv put secret/chrome-service api_bearer_token=$(python3 -c 'import secrets; print(secrets.token_urlsafe(32))')`.
  Reloader cascades the rotation to both the server pod and any caller
  workload that carries the `reloader.stakater.com/auto = "true"` annotation.
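
Because a TCP open is the only cheap liveness signal, a caller can run the same check before attempting a full Playwright connect. This is a minimal sketch of such a check in Python, not the probe definition used by the stack; `tcp_open` is a hypothetical helper.

```python
import socket


def tcp_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

From a labelled client namespace, `tcp_open("chrome-service.chrome-service.svc.cluster.local", 3000)` should return `True` while the pod is up. A `False` result from an unlabelled namespace may simply mean the NetworkPolicy is dropping the traffic.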

## Why headed (Xvfb) instead of headless?

`disable-devtool.js` and similar libraries detect `navigator.webdriver`,
console-clear timing, and the `HeadlessChrome/...` user-agent token.
Running headed inside `Xvfb :99` reports as a regular Chromium, and the
stealth init script handles the JS-visible giveaways.
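
To sanity-check a session, the two static signals named above can be read back from a page and inspected. The sketch below is an illustrative checker only: `headless_giveaways` is a hypothetical helper, it does not reproduce what `disable-devtool.js` actually does, and it does not cover the timing-based checks.

```python
def headless_giveaways(user_agent: str, webdriver: bool) -> list[str]:
    """List the static signals a page could use to flag an automated browser."""
    found = []
    if webdriver:
        found.append("navigator.webdriver")
    if "Headless" in user_agent:
        found.append("headless user-agent")
    return found
```

The inputs can be collected with `await page.evaluate("navigator.userAgent")` and `await page.evaluate("navigator.webdriver")`. An empty list means neither static signal is present in that page.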