[rybbit] Narrow CF Worker routes to SITE_IDS hosts — fix free-tier quota breach
## Context
The `rybbit-analytics` Cloudflare Worker hit the free-tier quota of 100k
requests/day. CF GraphQL analytics showed **97,153 invocations in the last
24h**, up from ~0 before 2026-04-17 21:26 UTC when Rybbit script injection
migrated off the broken Traefik rewrite-body plugin (Yaegi ResponseWriter
bug on Traefik v3.6.12) onto this Worker.
Root cause: `wrangler.toml` registered two wildcard routes
(`viktorbarzin.me/*` + `*.viktorbarzin.me/*`) which match every Cloudflare-
proxied request on the zone. Only 27 of ~119 proxied hostnames appear in
`SITE_IDS` in `index.js`; the rest burn Worker invocations for nothing since
`siteId` is `null` and the Worker no-ops. Worse, the wildcard caught
`rybbit.viktorbarzin.me` itself — every tracker `script.js` fetch and event
POST round-trip was spawning its own Worker invocation (self-amplification).
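To make the over-match concrete, here is a rough sketch that approximates Cloudflare route patterns with bash glob matching. It assumes that bash's `[[ $url == $pattern ]]` behaves like Cloudflare's route matcher for these simple patterns (leading `*` for any subdomain, trailing `/*` for any path), which is only an approximation. `matches_any` is a hypothetical helper, and `new_routes` is a three-entry excerpt of the 26, not the full list.

```bash
#!/usr/bin/env bash
# Rough model of Cloudflare route matching using bash glob patterns.
# ASSUMPTION: [[ $url == $pattern ]] only approximates Cloudflare's matcher.

# Hypothetical helper: succeed if "host/path" matches any given route pattern.
matches_any() {
  local url=$1 p
  shift
  for p in "$@"; do
    # $p is deliberately unquoted so bash treats it as a glob pattern.
    [[ $url == $p ]] && return 0
  done
  return 1
}

old_routes=("viktorbarzin.me/*" "*.viktorbarzin.me/*")
# Excerpt of the explicit list; the real wrangler.toml has 26 entries.
new_routes=("viktorbarzin.me/*" "www.viktorbarzin.me/*" "calibre.viktorbarzin.me/*")

for url in "viktorbarzin.me/" "calibre.viktorbarzin.me/" \
           "tuya-bridge.viktorbarzin.me/" "rybbit.viktorbarzin.me/api/track"; do
  old=no new=no
  matches_any "$url" "${old_routes[@]}" && old=yes
  matches_any "$url" "${new_routes[@]}" && new=yes
  printf '%-34s old=%-3s new=%s\n' "$url" "$old" "$new"
done
```

Under the old pair all four URLs match, including `tuya-bridge` and `rybbit` itself. Under the explicit list only the bare domain and `calibre` do.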
CF GraphQL per-host breakdown (last 24h, zone `viktorbarzin.me`):
- Top waste (NOT in SITE_IDS): tuya-bridge 96.6k, beadboard 55.8k,
terminal 30.2k, authentik 19.9k, claude-memory 12.6k
- Sum of 27 SITE_IDS hosts: 47.2k
- `rybbit.viktorbarzin.me` self-amplifier: 782
- Projected post-narrow: 46.4k/day (52% reduction, well under quota)
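The projection is plain arithmetic on the figures above. Here is a small sketch that re-derives it. It assumes the rounded 47.2k SITE_IDS total is close enough to the exact per-host sum.

```bash
#!/usr/bin/env bash
# Re-derive the post-narrow projection from the per-host breakdown.
total_now=97153     # Worker invocations in the last 24h
site_ids_sum=47200  # 27 SITE_IDS hosts (rounded "47.2k" from the breakdown)
rybbit_self=782     # rybbit.viktorbarzin.me, no longer routed
quota=100000        # Workers free tier, requests/day

projected=$(( site_ids_sum - rybbit_self ))
# Integer percentage; bash arithmetic truncates toward zero.
reduction_pct=$(( (total_now - projected) * 100 / total_now ))

echo "projected=${projected}/day reduction=${reduction_pct}%"
if (( projected < quota )); then echo "under quota"; else echo "over quota"; fi
```

This prints `projected=46418/day reduction=52%` followed by `under quota`, which matches the 46.4k and 52% quoted above.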
## This change
Replaces the two wildcards with an explicit list of **26** routes, one for each
`SITE_IDS` hostname except `rybbit.viktorbarzin.me`. That host is deliberately
excluded even though it has a site ID: it serves `/api/script.js` (JS) and
`/api/track` (JSON), both of which fail the Worker's `text/html`
content-type guard anyway, so routing it only burned invocations.
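The Worker leaves a response untouched under two conditions, and both can be modelled as a tiny shell function. This is a hypothetical mirror of the decision in `index.js`, whose source is not shown here. It is not the Worker itself. `inject_site_id` is an invented name. The two real site IDs come from the `curl` output in the test plan, and the rybbit entry uses a placeholder value.

```bash
#!/usr/bin/env bash
# Hypothetical model of the Worker's two no-op conditions (requires bash 4+).
declare -A SITE_IDS=(
  [viktorbarzin.me]=da853a2438d0
  [calibre.viktorbarzin.me]=ce5f8aed6bbb
  [rybbit.viktorbarzin.me]=placeholder-id  # PLACEHOLDER: real ID is in index.js
)

# Print the site ID to inject for (host, content-type); fail if the Worker
# would pass the response through unchanged.
inject_site_id() {
  local host=$1 content_type=$2
  local id=${SITE_IDS[$host]:-}
  [[ -n $id ]] || return 1                       # host not in SITE_IDS: siteId null
  [[ $content_type == text/html* ]] || return 1  # text/html content-type guard
  printf '%s\n' "$id"
}

inject_site_id calibre.viktorbarzin.me 'text/html; charset=utf-8'
inject_site_id rybbit.viktorbarzin.me 'application/javascript; charset=utf-8' || echo "rybbit: pass-through"
inject_site_id tuya-bridge.viktorbarzin.me 'text/html' || echo "tuya-bridge: pass-through"
```

The rybbit case shows why routing that host was pure cost: it has a site ID, but its JS and JSON responses always fail the content-type check.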
    BEFORE                        AFTER
    ──────────────────────────    ──────────────────────────────────
    viktorbarzin.me/*     ┐       viktorbarzin.me/*          ┐
    *.viktorbarzin.me/*   ┘       www.viktorbarzin.me/*      │
                                  actualbudget.vb.me/*       │
    → matches ~119 hosts          ... (26 total)             │ → matches
    → ~97k Worker inv/day         stirling-pdf.vb.me/*       │   only 26
    → rybbit → self-amplifies     vaultwarden.vb.me/*        ┘   specific
                                                                 hosts

                                  rybbit.vb.me  INTENTIONALLY
                                                EXCLUDED (self-amplifier)

    (vb.me = viktorbarzin.me)
Deployment is unchanged — this Worker is not in Terraform. Deploy from
`stacks/rybbit/worker/` via:
    CLOUDFLARE_EMAIL=vbarzin@gmail.com \
    CLOUDFLARE_API_KEY=$(vault kv get -field=cloudflare_api_key secret/platform) \
    npx --yes wrangler@latest deploy
`wrangler deploy` replaces this Worker's routes with the list in `wrangler.toml`,
so the deploy itself removed the two wildcard routes and there is no separate
cleanup step. Already deployed today as version
`d7f83980-a499-40f5-ba55-f8e18d531863`; this commit just captures the source of
truth in git.
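Because `wrangler deploy` takes its route list straight from `wrangler.toml`, a wildcard that slips back in goes live on the next deploy. Below is a hypothetical pre-deploy guard. The `check_routes` name and its regexes are assumptions, written against the one-route-per-line layout used in this commit's diff.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy guard for wrangler.toml route lists.
check_routes() {
  local toml=$1 n wild
  # One route per line, e.g. { pattern = "calibre.viktorbarzin.me/*", ... }
  n=$(grep -cE '^[[:space:]]*\{[[:space:]]*pattern[[:space:]]*=' "$toml")
  # Any pattern starting with "*" is a wildcard-host route.
  wild=$(grep -cE 'pattern[[:space:]]*=[[:space:]]*"\*' "$toml")
  echo "routes=$n wildcard=$wild"
  (( wild == 0 ))
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
cat >"$tmp" <<'EOF'
routes = [
  { pattern = "viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
  { pattern = "*.viktorbarzin.me/*", zone_name = "viktorbarzin.me" }
]
EOF
check_routes "$tmp" || echo "refusing to deploy: wildcard route present"
```

Run against the old route list, this sample reports `routes=2 wildcard=1` and refuses. In practice you would point it at the real `wrangler.toml` before running `wrangler deploy`.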
## What is NOT in this change
- Self-hosted injection (nginx `sub_filter` sidecar, compiled Traefik
plugin). Deferred — revisit only if analytics traffic grows past 80k/day
again, or if we add more high-traffic hosts to `SITE_IDS`.
- Cloudflare Workers Paid plan ($5/mo for 10M requests). User declined.
- Moving the Worker into Terraform. Out of scope.
- Any Rybbit backend/frontend changes. Rybbit itself continues running.
## Test plan
### Automated
Post-deploy CF API enumeration of zone routes:
    $ curl -s -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
        "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/workers/routes" \
        | jq -r '.result[] | "\(.pattern)\t→ \(.script)"' | wc -l
    26
    # Wildcards gone:
    $ curl -s ... | jq -r '.result[].pattern' | grep -c '\*\.'
    0
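The `wrangler.toml` comment asks that every new `SITE_IDS` entry also get a route. The sketch below checks for drift between the two lists. It assumes you already have the `SITE_IDS` hostnames (one per line, however you extract them from `index.js`) and the route patterns from the routes API call above. `route_drift` is a hypothetical helper, and the sample inputs are abbreviated.

```bash
#!/usr/bin/env bash
# Hypothetical drift check: compare SITE_IDS hostnames with route patterns.
# $1: SITE_IDS hostnames, one per line.
# $2: route patterns ("host/*"), one per line, e.g. from the routes API + jq.
route_drift() {
  local expected actual
  # rybbit.viktorbarzin.me is excluded on purpose (self-amplifier).
  expected=$(grep -vx 'rybbit\.viktorbarzin\.me' <<<"$1" | sort -u)
  actual=$(sed 's#/\*$##' <<<"$2" | sort -u)
  # comm -23: only in expected (missing route); comm -13: only in actual (extra).
  comm -23 <(printf '%s\n' "$expected") <(printf '%s\n' "$actual") | sed 's/^/missing route: /'
  comm -13 <(printf '%s\n' "$expected") <(printf '%s\n' "$actual") | sed 's/^/extra route:   /'
}

site_hosts=$'viktorbarzin.me\nwww.viktorbarzin.me\ncalibre.viktorbarzin.me\nrybbit.viktorbarzin.me'
patterns=$'viktorbarzin.me/*\nwww.viktorbarzin.me/*\ntuya-bridge.viktorbarzin.me/*'
route_drift "$site_hosts" "$patterns"
```

Empty output means the route list and `SITE_IDS` agree, apart from the intentional rybbit exclusion.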
### Manual verification
Script injection behaviour, verified via `curl`:
1. SITE_IDS host, script IS injected:
    $ curl -s -L https://viktorbarzin.me/ | grep -oE '<script[^>]*rybbit[^>]*>'
    <script src="https://rybbit.viktorbarzin.me/api/script.js"
    data-site-id="da853a2438d0" defer>
    $ curl -s -L https://calibre.viktorbarzin.me/ | grep -oE '<script[^>]*rybbit[^>]*>'
    <script src="https://rybbit.viktorbarzin.me/api/script.js"
    data-site-id="ce5f8aed6bbb" defer>
2. Non-SITE_IDS host, script NOT injected:
    $ curl -s -L https://tuya-bridge.viktorbarzin.me/ | grep -c 'data-site-id'
    0
3. `rybbit.viktorbarzin.me` has no route (it is absent from the route
   enumeration above), so its requests bypass the Worker entirely; the tracker
   returns raw JS:
    $ curl -sI https://rybbit.viktorbarzin.me/api/script.js | grep -i content-type
    content-type: application/javascript; charset=utf-8
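These manual checks can be scripted. Below is a minimal sketch. It assumes the injected tag keeps the `data-site-id="..."` attribute shown above. `injected_site_id` is a hypothetical helper that reads HTML on stdin and prints the site ID, or prints nothing if no tag was injected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Rybbit site ID injected into an HTML page.
injected_site_id() {
  grep -oE 'data-site-id="[^"]*"' | head -n 1 | sed -E 's/^data-site-id="([^"]*)"$/\1/'
}

# Offline example; against a live host pipe curl output instead, e.g.
#   curl -s -L https://calibre.viktorbarzin.me/ | injected_site_id
html='<head><script src="https://rybbit.viktorbarzin.me/api/script.js" data-site-id="ce5f8aed6bbb" defer></script></head>'
printf '%s\n' "$html" | injected_site_id
```

For a non-SITE_IDS host such as `tuya-bridge`, the same pipeline prints nothing.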
### Reproduce locally
    # 1. Confirm the Worker sees only the 26 narrowed routes.
    CF_EMAIL=vbarzin@gmail.com
    CF_KEY=$(vault kv get -field=cloudflare_api_key secret/platform)
    ZONE_ID=fd2c5dd4efe8fe38958944e74d0ced6d
    curl -s -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
      "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/workers/routes" \
      | jq -r '.result[] | .pattern' | sort
    # 2. 24h after deploy, re-check invocation count; expect < 80k.
    curl -s https://api.cloudflare.com/client/v4/graphql \
      -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
      -H "Content-Type: application/json" \
      -d '{"query":"query($acc:String!,$since:Time!,$until:Time!){viewer{accounts(filter:{accountTag:$acc}){workersInvocationsAdaptive(limit:100,filter:{datetime_geq:$since,datetime_leq:$until}){sum{requests} dimensions{scriptName date}}}}}",
        "variables":{"acc":"02e035473cfc4834fb10c5d35470d8b4",
        "since":"'"$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)"'",
        "until":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}}'
Follow-up monitoring tracked in code-dka (P3, 3-day check).
Closes: code-l9b
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent a24cf8c689, commit 9a2e920006
1 changed file with 38 additions and 5 deletions
    @@ -2,10 +2,43 @@ name = "rybbit-analytics"
     main = "index.js"
     compatibility_date = "2024-01-01"
     
    -# Route: all Cloudflare-proxied *.viktorbarzin.me traffic.
    -# The worker only injects into HTML responses; non-HTML passes through untouched.
    -# Wildcard covers all proxied subdomains. The bare domain needs its own route.
    +# Explicit per-host routes. Replaces the previous wildcard routes which hit all
    +# ~119 proxied *.viktorbarzin.me hosts and burned the Workers free-tier quota
    +# (100k/day). Only hostnames present in SITE_IDS (index.js) get a route; all
    +# other proxied hosts bypass the Worker entirely.
    +#
    +# rybbit.viktorbarzin.me is INTENTIONALLY EXCLUDED even though it has a site ID
    +# in index.js — it serves the tracker JS (/api/script.js) and event POSTs
    +# (/api/track), which are JSON/JS not HTML, so the content-type guard in
    +# index.js would no-op anyway. Leaving it routed would make every Rybbit event
    +# spawn a Worker invocation for no value (self-amplification).
    +#
    +# When adding a new site to SITE_IDS in index.js, also add its route here.
     routes = [
    -  { pattern = "viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    -  { pattern = "*.viktorbarzin.me/*", zone_name = "viktorbarzin.me" }
    +  { pattern = "viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "www.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "actualbudget.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "crowdsec.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "cyberchef.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "dawarich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "pma.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "pgadmin.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "audiobookshelf.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "calibre.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "stacks.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "f1.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "frigate.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "highlights-immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "mail.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "navidrome.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "networking-toolbox.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "nextcloud.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "paperless-ngx.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "privatebin.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "wrongmove.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "send.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "stirling-pdf.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "uptime-kuma.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
    +  { pattern = "vaultwarden.viktorbarzin.me/*", zone_name = "viktorbarzin.me" }
     ]