[rybbit] Narrow CF Worker routes to SITE_IDS hosts — fix free-tier quota breach

## Context

The `rybbit-analytics` Cloudflare Worker hit the free-tier quota of 100k
requests/day. CF GraphQL analytics showed **97,153 invocations in the last
24h**, up from ~0 before 2026-04-17 21:26 UTC when Rybbit script injection
migrated off the broken Traefik rewrite-body plugin (Yaegi ResponseWriter
bug on Traefik v3.6.12) onto this Worker.

Root cause: `wrangler.toml` registered two wildcard routes
(`viktorbarzin.me/*` + `*.viktorbarzin.me/*`) which match every Cloudflare-
proxied request on the zone. Only 27 of ~119 proxied hostnames appear in
`SITE_IDS` in `index.js`; the rest burn Worker invocations for nothing since
`siteId` is `null` and the Worker no-ops. Worse, the wildcard caught
`rybbit.viktorbarzin.me` itself — every tracker `script.js` fetch and event
POST round-trip was spawning its own Worker invocation (self-amplification).
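
The over-matching can be illustrated with shell globs, which behave closely enough to Cloudflare route wildcards (`*` matches zero or more characters) for this purpose. This is only an approximation of the route matcher, and `matches_route` is a hypothetical helper, not part of the Worker:

```shell
# Approximate a Cloudflare route match with a shell glob.
# $1 = route pattern, $2 = host plus path of the request.
matches_route() {
  case "$2" in
    $1) return 0 ;;   # unquoted on purpose so the pattern is treated as a glob
    *)  return 1 ;;
  esac
}

# The old wildcard catches the tracker host and unrelated hosts alike.
matches_route '*.viktorbarzin.me/*' 'rybbit.viktorbarzin.me/api/script.js' && echo "wildcard: rybbit matched"
matches_route '*.viktorbarzin.me/*' 'tuya-bridge.viktorbarzin.me/' && echo "wildcard: tuya-bridge matched"

# An explicit per-host route does not.
matches_route 'calibre.viktorbarzin.me/*' 'tuya-bridge.viktorbarzin.me/' || echo "explicit: tuya-bridge not matched"
```

The subdomain wildcard also does not match the bare domain, which is why `viktorbarzin.me/*` needed its own route.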

CF GraphQL per-host breakdown (last 24h, zone `viktorbarzin.me`):
- Top waste (NOT in SITE_IDS): tuya-bridge 96.6k, beadboard 55.8k,
  terminal 30.2k, authentik 19.9k, claude-memory 12.6k
- Sum of 27 SITE_IDS hosts: 47.2k
- `rybbit.viktorbarzin.me` self-amplifier: 782
- Projected post-narrow: 46.4k/day (52% reduction, well under quota)
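
The projection follows from the figures above: the SITE_IDS total minus the `rybbit.viktorbarzin.me` share, compared against the 24h invocation count. A quick arithmetic check, using only numbers quoted in this message (variable names are illustrative):

```shell
# All figures are in thousands of requests per day, copied from the breakdown.
total=97.153       # Worker invocations in the last 24h
site_ids_sum=47.2  # sum over the 27 SITE_IDS hosts
rybbit_self=0.782  # rybbit.viktorbarzin.me, no longer routed

# Projected load once only the 26 routed hosts reach the Worker.
projection=$(awk -v s="$site_ids_sum" -v r="$rybbit_self" 'BEGIN { printf "%.1f", s - r }')

# Relative reduction against the current invocation count, in percent.
reduction=$(awk -v t="$total" -v p="$projection" 'BEGIN { printf "%.0f", (t - p) / t * 100 }')

echo "projected: ${projection}k/day, reduction: ${reduction}%"
# prints: projected: 46.4k/day, reduction: 52%
```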

## This change

Replaces the two wildcards with explicit routes for **26** of the 27
hostnames present in `SITE_IDS`. `rybbit.viktorbarzin.me` is deliberately
excluded even though it has a site ID: it serves `/api/script.js` (JS) and
`/api/track` (JSON), both of which fail the Worker's `text/html`
content-type guard anyway. Leaving it routed just burned invocations.

    BEFORE                              AFTER
    ──────────────────────────          ──────────────────────────────────
    viktorbarzin.me/*          ┐        viktorbarzin.me/*          ┐
    *.viktorbarzin.me/*        ┘        www.viktorbarzin.me/*      │
                                        actualbudget.vb.me/*       │
    → matches ~119 hosts                ... (26 total)             │ → matches
    → ~97k Worker inv/day                stirling-pdf.vb.me/*      │   only 26
    → rybbit → self-amplifies            vaultwarden.vb.me/*       ┘   specific
                                                                        hosts
                                        rybbit.vb.me INTENTIONALLY
                                        EXCLUDED (self-amplifier)

Deployment is unchanged — this Worker is not in Terraform. Deploy from
`stacks/rybbit/worker/` via:

    CLOUDFLARE_EMAIL=vbarzin@gmail.com \
    CLOUDFLARE_API_KEY=$(vault kv get -field=cloudflare_api_key secret/platform) \
    npx --yes wrangler@latest deploy

`wrangler deploy` replaces all worker routes on the zone with the list from
`wrangler.toml`, so there is no cleanup step. Already deployed today as
version `d7f83980-a499-40f5-ba55-f8e18d531863` — this commit just captures
the source of truth in git.
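
Before a future deploy, a local sanity check along these lines could confirm that `wrangler.toml` carries no wildcard host and no route for the tracker host. This is a sketch only: `route_hosts` and `check_routes` are hypothetical helpers, and the parsing assumes one inline `{ pattern = "..." }` table per line, as in this commit.

```shell
# Print one hostname per route pattern found in a wrangler.toml read from stdin.
route_hosts() {
  { grep -oE 'pattern = "[^"]+"' || true; } \
    | sed -E 's/^pattern = "//; s#/\*"$##'
}

# Fail if any route uses a wildcard host or targets the excluded tracker host;
# otherwise print the number of routes.
check_routes() {
  hosts=$(route_hosts)
  if [ -z "$hosts" ]; then
    echo 0
    return 0
  fi
  if printf '%s\n' "$hosts" | grep -qE '^\*\.|^rybbit\.'; then
    echo "unexpected wildcard or rybbit route" >&2
    return 1
  fi
  printf '%s\n' "$hosts" | wc -l | tr -d ' '
}
```

Run as `check_routes < wrangler.toml` from `stacks/rybbit/worker/`; for this commit the expected count is 26.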

## What is NOT in this change

- Self-hosted injection (nginx `sub_filter` sidecar, compiled Traefik
  plugin). Deferred — revisit only if analytics traffic grows past 80k/day
  again, or if we add more high-traffic hosts to `SITE_IDS`.
- Cloudflare Workers Paid plan ($5/mo for 10M requests). User declined.
- Moving the Worker into Terraform. Out of scope.
- Any Rybbit backend/frontend changes. Rybbit itself continues running.

## Test plan

### Automated

Post-deploy CF API enumeration of zone routes:

    $ curl -s -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
        "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/workers/routes" \
      | jq -r '.result[] | "\(.pattern)\t→ \(.script)"' | wc -l
    26

    # Wildcards gone:
    $ curl -s ... | jq -r '.result[].pattern' | grep -c '\*\.'
    0
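
A count of 26 does not by itself show that the right hosts are routed. One possible stricter check compares the deployed patterns against an expected host list. The sketch below assumes the expected hostnames are available as a plain text file, one per line; no such file exists in this commit, and `diff_routes` is a hypothetical helper.

```shell
# Compare expected hostnames (file $1, one per line) against route patterns
# (file $2, one per line, e.g. saved output of jq -r '.result[].pattern').
# Prints "missing <host>" for expected hosts without a route and
# "extra <host>" for routes without an expected host; prints nothing on a match.
diff_routes() {
  expected_file=$1
  patterns_file=$2
  tmp_expected=$(mktemp) || return 1
  tmp_actual=$(mktemp) || return 1
  sort -u "$expected_file" > "$tmp_expected"
  # Strip the trailing "/*" so patterns compare as bare hostnames.
  sed 's#/\*$##' "$patterns_file" | sort -u > "$tmp_actual"
  comm -23 "$tmp_expected" "$tmp_actual" | sed 's/^/missing /'
  comm -13 "$tmp_expected" "$tmp_actual" | sed 's/^/extra /'
  rm -f "$tmp_expected" "$tmp_actual"
}
```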

### Manual verification

Script injection behaviour, verified via `curl`:

1. SITE_IDS host — script IS injected:

       $ curl -s -L https://viktorbarzin.me/ | grep -oE '<script[^>]*rybbit[^>]*>'
       <script src="https://rybbit.viktorbarzin.me/api/script.js"
         data-site-id="da853a2438d0" defer>

       $ curl -s -L https://calibre.viktorbarzin.me/ | grep -oE '<script[^>]*rybbit[^>]*>'
       <script src="https://rybbit.viktorbarzin.me/api/script.js"
         data-site-id="ce5f8aed6bbb" defer>

2. Non-SITE_IDS host — script NOT injected:

       $ curl -s -L https://tuya-bridge.viktorbarzin.me/ | grep -c 'data-site-id'
       0

3. `rybbit.viktorbarzin.me` bypasses Worker entirely — tracker returns raw JS:

       $ curl -sI https://rybbit.viktorbarzin.me/api/script.js | grep -i content-type
       content-type: application/javascript; charset=utf-8
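
The checks in steps 1 and 2 can be folded into a small helper that extracts the site ID from a page. This is a sketch: `rybbit_site_id` is a hypothetical name, and it assumes the injected tag has the shape shown in step 1. Newlines are flattened first so a tag that spans several lines still matches.

```shell
# Read HTML on stdin; print the data-site-id of the first injected Rybbit
# script tag, or nothing if the page carries no such tag.
rybbit_site_id() {
  tr '\n' ' ' \
    | { grep -oE '<script[^>]*rybbit[^>]*>' || true; } \
    | sed -nE 's/.*data-site-id="([^"]+)".*/\1/p' \
    | head -n 1
}
```

For example, `curl -s -L https://calibre.viktorbarzin.me/ | rybbit_site_id` should print a site ID for a routed host and print nothing for a host outside `SITE_IDS`.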

### Reproduce locally

    # 1. Confirm the Worker sees only the 26 narrowed routes.
    CF_EMAIL=vbarzin@gmail.com
    CF_KEY=$(vault kv get -field=cloudflare_api_key secret/platform)
    ZONE_ID=fd2c5dd4efe8fe38958944e74d0ced6d
    curl -s -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
      "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/workers/routes" \
      | jq -r '.result[] | .pattern' | sort

    # 2. 24h after deploy, re-check invocation count — expect < 80k.
    curl -s https://api.cloudflare.com/client/v4/graphql \
      -H "X-Auth-Email: $CF_EMAIL" -H "X-Auth-Key: $CF_KEY" \
      -H "Content-Type: application/json" \
      -d '{"query":"query($acc:String!,$since:Time!,$until:Time!){viewer{accounts(filter:{accountTag:$acc}){workersInvocationsAdaptive(limit:100,filter:{datetime_geq:$since,datetime_leq:$until}){sum{requests} dimensions{scriptName date}}}}}",
           "variables":{"acc":"02e035473cfc4834fb10c5d35470d8b4",
                        "since":"'"$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)"'",
                        "until":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'"}}'

Follow-up monitoring tracked in code-dka (P3, 3-day check).
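
For the 24h re-check in step 2 above, the GraphQL response can be reduced to a single number and compared against the 80k threshold. The sketch assumes the response mirrors the query shape (`data.viewer.accounts[].workersInvocationsAdaptive[].sum.requests`); `sum_invocations` and `under_threshold` are hypothetical helpers.

```shell
# Read a GraphQL response on stdin and print the total Worker requests.
sum_invocations() {
  jq '[.data.viewer.accounts[].workersInvocationsAdaptive[].sum.requests] | add // 0'
}

# Succeed only if the total stays below the threshold (default 80000).
under_threshold() {
  total=$(sum_invocations)
  [ "$total" -lt "${1:-80000}" ]
}
```

Pipe the `curl` output from step 2 into `under_threshold`. The query groups by `scriptName` without filtering on it, so the total covers every Worker on the account, and `limit:100` caps the number of rows returned.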

Closes: code-l9b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Viktor Barzin 2026-04-18 13:23:15 +00:00
parent a24cf8c689
commit 9a2e920006

@@ -2,10 +2,43 @@ name = "rybbit-analytics"
 main = "index.js"
 compatibility_date = "2024-01-01"
-# Route: all Cloudflare-proxied *.viktorbarzin.me traffic.
-# The worker only injects into HTML responses; non-HTML passes through untouched.
-# Wildcard covers all proxied subdomains. The bare domain needs its own route.
+# Explicit per-host routes. Replaces the previous wildcard routes which hit all
+# ~119 proxied *.viktorbarzin.me hosts and burned the Workers free-tier quota
+# (100k/day). Only hostnames present in SITE_IDS (index.js) get a route; all
+# other proxied hosts bypass the Worker entirely.
+#
+# rybbit.viktorbarzin.me is INTENTIONALLY EXCLUDED even though it has a site ID
+# in index.js — it serves the tracker JS (/api/script.js) and event POSTs
+# (/api/track), which are JSON/JS not HTML, so the content-type guard in
+# index.js would no-op anyway. Leaving it routed would make every Rybbit event
+# spawn a Worker invocation for no value (self-amplification).
+#
+# When adding a new site to SITE_IDS in index.js, also add its route here.
 routes = [
-{ pattern = "viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
-{ pattern = "*.viktorbarzin.me/*", zone_name = "viktorbarzin.me" }
+{ pattern = "viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "www.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "actualbudget.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "crowdsec.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "cyberchef.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "dawarich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "pma.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "pgadmin.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "audiobookshelf.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "calibre.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "stacks.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "f1.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "frigate.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "highlights-immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "mail.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "navidrome.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "networking-toolbox.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "nextcloud.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "paperless-ngx.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "privatebin.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "wrongmove.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "send.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "stirling-pdf.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "uptime-kuma.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
+{ pattern = "vaultwarden.viktorbarzin.me/*", zone_name = "viktorbarzin.me" }
 ]