infra/stacks/rybbit/worker/wrangler.toml

name = "rybbit-analytics"
main = "index.js"
compatibility_date = "2024-01-01"
[rybbit] Narrow CF Worker routes to SITE_IDS hosts — fix free-tier quota breach

## Context

The `rybbit-analytics` Cloudflare Worker hit the free-tier quota of 100k requests/day. CF GraphQL analytics showed **97,153 invocations in the last 24h**, up from ~0 before 2026-04-17 21:26 UTC when Rybbit script injection migrated off the broken Traefik rewrite-body plugin (Yaegi ResponseWriter bug on Traefik v3.6.12) onto this Worker.

Root cause: `wrangler.toml` registered two wildcard routes (`viktorbarzin.me/*` + `*.viktorbarzin.me/*`) which match every Cloudflare-proxied request on the zone. Only 27 of ~119 proxied hostnames appear in `SITE_IDS` in `index.js`; the rest burn Worker invocations for nothing since `siteId` is `null` and the Worker no-ops. Worse, the wildcard caught `rybbit.viktorbarzin.me` itself — every tracker `script.js` fetch and event POST round-trip was spawning its own Worker invocation (self-amplification).

CF GraphQL per-host breakdown (last 24h, zone `viktorbarzin.me`):

- Top waste (NOT in SITE_IDS): tuya-bridge 96.6k, beadboard 55.8k, terminal 30.2k, authentik 19.9k, claude-memory 12.6k
- Sum of 27 SITE_IDS hosts: 47.2k
- `rybbit.viktorbarzin.me` self-amplifier: 782
- Projected post-narrow: 46.4k/day (52% reduction, well under quota)

## This change

Replaces the two wildcards with an explicit list of the **26** hostnames present in `SITE_IDS`. `rybbit.viktorbarzin.me` is deliberately excluded even though it has a site ID — it serves `/api/script.js` (JS) and `/api/track` (JSON), both of which fail the Worker's `text/html` content-type guard anyway. Leaving it routed just burned invocations.

    BEFORE                         AFTER
    ──────────────────────────     ──────────────────────────────────
    viktorbarzin.me/*    ┐         viktorbarzin.me/*       ┐
    *.viktorbarzin.me/*  ┘         www.viktorbarzin.me/*   │
                                   actualbudget.vb.me/*    │
    → matches ~119 hosts           ... (26 total)          │ → matches
    → ~97k Worker inv/day          stirling-pdf.vb.me/*    │   only 26
    → rybbit → self-amplifies      vaultwarden.vb.me/*     ┘   specific hosts

                                   rybbit.vb.me   INTENTIONALLY EXCLUDED
                                                  (self-amplifier)

Deployment is unchanged — this Worker is not in Terraform. Deploy from `stacks/rybbit/worker/` via:

    CLOUDFLARE_EMAIL=vbarzin@gmail.com \
    CLOUDFLARE_API_KEY=$(vault kv get -field=cloudflare_api_key secret/platform) \
      npx --yes wrangler@latest deploy

`wrangler deploy` replaces all worker routes on the zone with the list from `wrangler.toml`, so there is no cleanup step. Already deployed today as version `d7f83980-a499-40f5-ba55-f8e18d531863` — this commit just captures the source of truth in git.

## What is NOT in this change

- Self-hosted injection (nginx `sub_filter` sidecar, compiled Traefik plugin). Deferred — revisit only if analytics traffic grows past 80k/day again, or if we add more high-traffic hosts to `SITE_IDS`.
- Cloudflare Workers Paid plan ($5/mo for 10M requests). User declined.
- Moving the Worker into Terraform. Out of scope.
- Any Rybbit backend/frontend changes. Rybbit itself continues running.

## Test plan

### Automated

Post-deploy CF API enumeration of zone routes:

    $ curl -s -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
        "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/workers/routes" \
        | jq -r '.result[] | "\(.pattern)\t→ \(.script)"' | wc -l
    26

    # Wildcards gone:
    $ curl -s ... | jq -r '.result[].pattern' | grep -c '\*\.'
    0

### Manual Verification

Script injection behaviour, verified via `curl`:

1. SITE_IDS host — script IS injected:

       $ curl -s -L https://viktorbarzin.me/ | grep -oE '<script[^>]*rybbit[^>]*>'
       <script src="https://rybbit.viktorbarzin.me/api/script.js" data-site-id="da853a2438d0" defer>
       $ curl -s -L https://calibre.viktorbarzin.me/ | grep -oE '<script[^>]*rybbit[^>]*>'
       <script src="https://rybbit.viktorbarzin.me/api/script.js" data-site-id="ce5f8aed6bbb" defer>

2. Non-SITE_IDS host — script NOT injected:

       $ curl -s -L https://tuya-bridge.viktorbarzin.me/ | grep -c 'data-site-id'
       0

3. `rybbit.viktorbarzin.me` bypasses Worker entirely — tracker returns raw JS:

       $ curl -sI https://rybbit.viktorbarzin.me/api/script.js | grep -i content-type
       content-type: application/javascript; charset=utf-8

### Reproduce locally

    # 1. Confirm the Worker sees only the 26 narrowed routes.
    CF_EMAIL=vbarzin@gmail.com
    CF_KEY=$(vault kv get -field=cloudflare_api_key secret/platform)
    ZONE_ID=fd2c5dd4efe8fe38958944e74d0ced6d
    curl -s -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
      "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/workers/routes" \
      | jq -r '.result[] | .pattern' | sort

    # 2. 24h after deploy, re-check invocation count — expect < 80k.
    curl -s https://api.cloudflare.com/client/v4/graphql \
      -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
      -H "Content-Type: application/json" \
      -d '{"query":"query($acc:String!,$since:Time!,$until:Time!){viewer{accounts(filter:{accountTag:$acc}){workersInvocationsAdaptive(limit:100,filter:{datetime_geq:$since,datetime_leq:$until}){sum{requests} dimensions{scriptName date}}}}}",
           "variables":{"acc":"02e035473cfc4834fb10c5d35470d8b4",
             "since":"'"$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)"'",
             "until":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}}'

Follow-up monitoring tracked in code-dka (P3, 3-day check).

Closes: code-l9b
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-18 13:23:15 +00:00
# Explicit per-host routes. Replaces the previous wildcard routes which hit all
# ~119 proxied *.viktorbarzin.me hosts and burned the Workers free-tier quota
# (100k/day). Only hostnames present in SITE_IDS (index.js) get a route; all
# other proxied hosts bypass the Worker entirely.
#
# rybbit.viktorbarzin.me is INTENTIONALLY EXCLUDED even though it has a site ID
# in index.js — it serves the tracker JS (/api/script.js) and event POSTs
# (/api/track), which are JSON/JS not HTML, so the content-type guard in
# index.js would no-op anyway. Leaving it routed would make every Rybbit event
# spawn a Worker invocation for no value (self-amplification).
#
# When adding a new site to SITE_IDS in index.js, also add its route here.
routes = [
{ pattern = "viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "www.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "actualbudget.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "crowdsec.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "cyberchef.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "dawarich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "pma.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "pgadmin.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "audiobookshelf.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "calibre.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "stacks.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "f1.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "frigate.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "highlights-immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "mail.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "navidrome.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "networking-toolbox.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "nextcloud.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "paperless-ngx.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "privatebin.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "wrongmove.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "send.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "stirling-pdf.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "uptime-kuma.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
{ pattern = "vaultwarden.viktorbarzin.me/*", zone_name = "viktorbarzin.me" }
]