From 9a2e9200063d4486a7eaaa5955d571b7f6b84388 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sat, 18 Apr 2026 13:23:15 +0000
Subject: [PATCH] =?UTF-8?q?[rybbit]=20Narrow=20CF=20Worker=20routes=20to?=
 =?UTF-8?q?=20SITE=5FIDS=20hosts=20=E2=80=94=20fix=20free-tier=20quota=20b?=
 =?UTF-8?q?reach?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Context

The `rybbit-analytics` Cloudflare Worker hit the free-tier quota of 100k
requests/day. CF GraphQL analytics showed **97,153 invocations in the last
24h**, up from ~0 before 2026-04-17 21:26 UTC, when Rybbit script injection
migrated off the broken Traefik rewrite-body plugin (Yaegi ResponseWriter
bug on Traefik v3.6.12) onto this Worker.

Root cause: `wrangler.toml` registered two wildcard routes
(`viktorbarzin.me/*` + `*.viktorbarzin.me/*`), which match every
Cloudflare-proxied request on the zone. Only 27 of ~119 proxied hostnames
appear in `SITE_IDS` in `index.js`; the rest burn Worker invocations for
nothing, since `siteId` is `null` and the Worker no-ops. Worse, the wildcard
caught `rybbit.viktorbarzin.me` itself: every tracker `script.js` fetch and
event POST round-trip spawned its own Worker invocation
(self-amplification).

CF GraphQL per-host breakdown (last 24h, zone `viktorbarzin.me`):

- Top waste (NOT in SITE_IDS): tuya-bridge 96.6k, beadboard 55.8k,
  terminal 30.2k, authentik 19.9k, claude-memory 12.6k
- Sum of 27 SITE_IDS hosts: 47.2k
- `rybbit.viktorbarzin.me` self-amplifier: 782
- Projected post-narrow: 46.4k/day (52% reduction, well under quota)

## This change

Replaces the two wildcards with an explicit list of **26** of the 27
hostnames present in `SITE_IDS`. `rybbit.viktorbarzin.me` is deliberately
excluded even though it has a site ID: it serves `/api/script.js` (JS) and
`/api/track` (JSON), both of which fail the Worker's `text/html`
content-type guard anyway. Leaving it routed just burned invocations.
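The routing and guard behaviour described above can be sketched as follows. This is a minimal illustration, not the contents of `index.js`: the `SITE_IDS` entries use placeholder IDs, the route lists are truncated, and `routeMatches`, `workerInvoked`, and `wouldInject` are hypothetical helpers. The matching logic is a simplification that assumes a leading `*.` covers subdomains but not the apex, and that every pattern ends in `/*`.

```javascript
// Hypothetical excerpt: the real SITE_IDS map has 27 entries and real IDs.
const SITE_IDS = {
  "viktorbarzin.me": "placeholder-1",
  "rybbit.viktorbarzin.me": "placeholder-2",
};

// Simplified route matching: handles only "host/*" and "*.host/*" patterns.
function routeMatches(pattern, hostname) {
  const host = pattern.replace(/\/\*$/, ""); // drop the trailing "/*"
  if (host.startsWith("*.")) {
    // "*.example.com" matches any subdomain, but not the apex itself.
    return hostname.endsWith(host.slice(1));
  }
  return hostname === host;
}

// A request costs one Worker invocation if any route matches its hostname.
function workerInvoked(routes, hostname) {
  return routes.some((pattern) => routeMatches(pattern, hostname));
}

// The two guards named in this commit: SITE_IDS lookup, then text/html check.
function wouldInject(hostname, contentType) {
  const siteId = SITE_IDS[hostname] ?? null;
  if (siteId === null) return false; // Worker no-ops, invocation still billed
  return (contentType || "").toLowerCase().startsWith("text/html");
}

const before = ["viktorbarzin.me/*", "*.viktorbarzin.me/*"];
const after = ["viktorbarzin.me/*", "www.viktorbarzin.me/*"]; // truncated list

// Assumed full hostname for the "tuya-bridge" entry in the breakdown.
const wasted = "tuya-bridge.viktorbarzin.me";
console.log(workerInvoked(before, wasted), wouldInject(wasted, "text/html"));
console.log(workerInvoked(after, wasted));
console.log(workerInvoked(after, "rybbit.viktorbarzin.me"));
```

Under these assumptions, the wildcard list invokes the Worker for a host that can never be injected, while the explicit list skips both that host and `rybbit.viktorbarzin.me`.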

    BEFORE                          AFTER
    ──────────────────────────      ──────────────────────────────────
    viktorbarzin.me/*     ┐         viktorbarzin.me/*        ┐
    *.viktorbarzin.me/*   ┘         www.viktorbarzin.me/*    │
                                    actualbudget.vb.me/*     │
    → matches ~119 hosts            ... (26 total)           │ → matches
    → ~97k Worker inv/day           stirling-pdf.vb.me/*     │   only 26
    → rybbit → self-amplifies       vaultwarden.vb.me/*      ┘   specific hosts

                                    rybbit.vb.me INTENTIONALLY EXCLUDED
                                    (self-amplifier)

Deployment is unchanged: this Worker is not in Terraform. Deploy from
`stacks/rybbit/worker/` via:

    CLOUDFLARE_EMAIL=vbarzin@gmail.com \
    CLOUDFLARE_API_KEY=$(vault kv get -field=cloudflare_api_key secret/platform) \
    npx --yes wrangler@latest deploy

`wrangler deploy` replaces all worker routes on the zone with the list from
`wrangler.toml`, so there is no cleanup step. Already deployed today as
version `d7f83980-a499-40f5-ba55-f8e18d531863`; this commit just captures
the source of truth in git.

## What is NOT in this change

- Self-hosted injection (nginx `sub_filter` sidecar, compiled Traefik
  plugin). Deferred: revisit only if analytics traffic grows past 80k/day
  again, or if we add more high-traffic hosts to `SITE_IDS`.
- Cloudflare Workers Paid plan ($5/mo for 10M requests). User declined.
- Moving the Worker into Terraform. Out of scope.
- Any Rybbit backend/frontend changes. Rybbit itself continues running.

## Test plan

### Automated

Post-deploy CF API enumeration of zone routes:

    $ curl -s -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
        "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/workers/routes" \
        | jq -r '.result[] | "\(.pattern)\t→ \(.script)"' | wc -l
    26

    # Wildcards gone:
    $ curl -s ... | jq -r '.result[].pattern' | grep -c '\*\.'
    0

### Manual Verification

Script injection behaviour, verified via `curl`:

1. SITE_IDS host (script IS injected):

       $ curl -s -L https://viktorbarzin.me/ | grep -oE '<script[^>]*rybbit[^>]*>'