infra/stacks/servarr
Viktor Barzin 789cb61310 [servarr] Rewrite MAM ratio farming — break Mouse death spiral, adopt in TF
## Context

A MAM (MyAnonamouse) freeleech farming workflow was deployed on 2026-04-14
via kubectl apply (outside Terraform). Five days later the account was
still stuck in Mouse class: 715 MiB downloaded, 0 uploaded, ratio 0.
Tracker responses on 7 of 9 active torrents returned
`status=4 | msg="User currently mouse rank, you need to get your ratio up!"`
— MAM was actively refusing to serve peer lists because the account was
in Mouse class, and refusing to serve peer lists made the ratio impossible
to recover. Meanwhile the grabber kept digging: 501 torrents sat in
qBittorrent, 0 completed, 0 bytes uploaded.

Root causes (ranked):
1. Death spiral — Mouse class blocks announces, nothing uploads.
2. BP-spender 30 000 BP threshold blocked the only exit even though the
   account already had 24 500 BP.
3. Grabber selection (`score = 1.0 / (seeders+1)`) preferred low-demand
   torrents filtered to <100 MiB — ratio-hostile by design.
4. Grabber/cleanup deadlock: cleanup only fired on seed_time > 3d, so
   torrents that never started never qualified. Combined with the 500-
   torrent cap this stalled the grabber indefinitely.
5. qBittorrent queueing amplified (4) — 495/501 stuck in queuedDL.
6. Ratio-monitor labelled queued torrents `unknown` (empty tracker
   field), hiding the problem on the MAM Grafana panel.
7. qBittorrent memory limit (256 Mi LimitRange default) too low.
8. All of the above was Terraform drift with no reviewability.

## This change

Introduces `stacks/servarr/mam-farming/` — a new TF module that adopts
the three kubectl-applied resources and replaces their scripts with
demand-first, H&R-aware logic. Also bumps qBittorrent resources, fixes
ratio-monitor labelling, and adds five Prometheus alerts plus a Grafana
panel row.

### Architecture

    MAM API ───┬─── jsonLoad.php (profile: ratio, class, BP)
               ├─── loadSearchJSONbasic.php (freeleech search)
               ├─── bonusBuy.php (50 GiB min tier for API)
               └─── download.php (torrent file)
                               │
    Pushgateway <──┬────────────┤
                   │  mam_ratio            ┌────────────────────┐
                   │  mam_class_code       │ freeleech-grabber  │ */30
                   │  mam_bp_balance   ◄───│  (ratio-guarded)   │
                   │  mam_farming_*        └──────────┬─────────┘
                   │  mam_janitor_*                   │ adds to
                   │                                  ▼
                   │  Grafana panels      qBittorrent (mam-farming)
                   │  + 5 alerts                      ▲
                   │                                  │ deletes by rule
                   │                       ┌──────────┴─────────┐
                   │                   ◄───│ farming-janitor    │ */15
                   │                       │  (H&R-aware)       │
                   │                       └──────────┬─────────┘
                   │                                  │ buys credit
                   │                       ┌──────────┴─────────┐
                   └───────────────────────│ bp-spender         │ 0 */6
                                           │  (tier-aware)      │
                                           └────────────────────┘

### Key decisions

- **Ratio guard on grabber** — refuse to grab if ratio < 1.2 OR class ==
  Mouse. Prevents the death spiral from deepening. Emits
  `mam_grabber_skipped_reason{reason=...}` and exits clean.
- **Demand-first selection** — new score formula
  `leechers*3 - seeders*0.5 + 200 if freeleech_wedge else 0`; size band
  50 MiB – 1 GiB; leecher floor 1; seeder ceiling 50. Picks titles that
  will actually upload.
- **Janitor decoupled from grabber** — runs every 15 min regardless of
  the ratio-guard state. Without this, stuck torrents accumulate
  fastest exactly when the grabber is skipping (Mouse class). H&R-aware:
  never deletes `progress==1.0 AND seeding_time < 72h`. Six delete
  reasons observable via `mam_janitor_deleted_per_run{reason=...}`.
- **BP-spender tier-aware** — MAM imposes a hard 50 GiB minimum on API
  buyers ("Automated spenders are limited to buying at least 50 GB...
  due to log spam"). Valid API tiers: 50/100/200/500 GiB at 500 BP/GiB.
  The spender picks the smallest tier that satisfies the ratio deficit
  AND fits the budget, preserving a 500 BP reserve. If even the 50 GiB
  tier is too expensive, it skips and retries on the next 6-hour cron.
- **Authoritative metrics use MAM profile fields** —
  `downloaded_bytes` / `uploaded_bytes` (integers) rather than the
  pretty-printed `downloaded` / `uploaded` strings like "715.55 MiB"
  that MAM also returns.
- **Ratio-monitor category-first labelling** — `tracker` is empty for
  queued torrents that never announced. Now maps `category==mam-farming`
  to label `mam` first, only falls back to tracker-URL parsing when
  category is absent. Stops hundreds of MAM torrents collecting under
  `unknown`.
- **qBittorrent resources bumped** to `requests=512Mi / limits=1Gi` so
  hundreds of active torrents don't OOM.

### Emergency recovery performed this session

1. Adopted 5 in-cluster resources via root-module `import {}` blocks
   (Terraform 1.5+ rejects imports inside child modules).
2. Ran the janitor in DRY_RUN=1 to verify rules against live state —
   466 `never_started` candidates, 0 false positives in any other
   reason bucket. Flipped to enforce mode.
3. Janitor deleted 466 stuck torrents (matches plan's ~495 target; 35
   preserved as active/in-progress).
4. Truncated `/data/grabbed_ids.txt` so newly-popular titles become
   eligible again.

The ratio is still 0 because the API cannot buy below 50 GiB and the
account sits at 24 551 BP (needs 25 000). Manual 1 GiB purchase via the
MAM web UI — 500 BP — would immediately lift the account to ratio ≈ 1.4
and unblock announces. Future automation cannot do this for us due to
MAMs anti-spam rule.

### What is NOT in this change

- qBittorrent prefs reconciliation (max_active_downloads=20,
  max_active_uploads=150, max_active_torrents=150). The plan wanted
  this; deferred to a follow-up because the janitor + ratio recovery
  handles the 500-torrent backlog first. A small reconciler CronJob
  posting to /api/v2/app/setPreferences is the intended follow-up.
- VIP purchase (~100 k BP) — deferred until BP accumulates.
- Cross-seed / autobrr — separate initiative.

## Alerts added

- P1 MAMMouseClass — `mam_class_code == 0` for 1h
- P1 MAMCookieExpired — `mam_farming_cookie_expired > 0`
- P2 MAMRatioBelowOne — `mam_ratio < 1.0` for 24h (replaces old
  QBittorrentMAMRatioLow, now driven by authoritative profile metric)
- P2 MAMFarmingStuck — no grabs in 4h while ratio is healthy
- P2 MAMJanitorStuckBacklog — `skipped_active > 400` for 6h

## Test plan

### Automated

    $ cd infra/stacks/servarr && ../../scripts/tg plan 2>&1 | grep Plan
    Plan: 5 to import, 2 to add, 6 to change, 0 to destroy.

    $ ../../scripts/tg apply --non-interactive
    Apply complete! Resources: 5 imported, 2 added, 6 changed, 0 destroyed.

    # Re-plan after import block removal (idempotent)
    $ ../../scripts/tg plan 2>&1 | grep Plan
    Plan: 0 to add, 1 to change, 0 to destroy.
    # The 1 change is a pre-existing MetalLB annotation drift on the
    # qbittorrent-torrenting Service — unrelated to this change.

    $ cd ../monitoring && ../../scripts/tg apply --non-interactive
    Apply complete! Resources: 0 added, 2 changed, 0 destroyed.

    # Python + JSON syntax
    $ python3 -c 'import ast; [ast.parse(open(p).read()) for p in [
        "infra/stacks/servarr/mam-farming/files/freeleech-grabber.py",
        "infra/stacks/servarr/mam-farming/files/bp-spender.py",
        "infra/stacks/servarr/mam-farming/files/mam-farming-janitor.py"]]'
    $ python3 -c 'import json; json.load(open(
        "infra/stacks/monitoring/modules/monitoring/dashboards/qbittorrent.json"))'

### Manual Verification

1. Grabber ratio-guard path:

       $ kubectl -n servarr create job --from=cronjob/mam-freeleech-grabber g1
       $ kubectl -n servarr logs job/g1
       Skip grab: ratio=0.0 class=Mouse (floor=1.2) reason=mouse_class

2. BP-spender tier path:

       $ kubectl -n servarr create job --from=cronjob/mam-bp-spender s1
       $ kubectl -n servarr logs job/s1
       Profile: ratio=0.0 class=Mouse DL=0.70 GiB UL=0.00 GiB BP=24551
         | deficit=1.40 GiB needed=3 affordable=48 buy=0
       Done: BP=24551, spent=0 GiB (needed=3, affordable=48)

   Correctly skips because affordable (48) < smallest API tier (50).

3. Janitor in enforce mode:

       $ kubectl -n servarr create job --from=cronjob/mam-farming-janitor j1
       $ kubectl -n servarr logs job/j1 | tail -3
       Done: deleted=466 preserved_hnr=0 skipped_active=35 dry_run=False
         per reason: {'never_started': 466, ...}

   Second run immediately after: `deleted=0 skipped_active=35` —
   steady state with only active/seeding torrents left.

4. Alerts loaded:

       $ kubectl -n monitoring get cm prometheus-server \
           -o jsonpath='{.data.alerting_rules\.yml}' \
           | grep -E "alert: MAM|alert: QBittorrent"
         - alert: MAMMouseClass
         - alert: MAMCookieExpired
         - alert: MAMRatioBelowOne
         - alert: MAMFarmingStuck
         - alert: MAMJanitorStuckBacklog
         - alert: QBittorrentDisconnected
         - alert: QBittorrentMAMUnsatisfied

5. Dashboard: browse to Grafana "qBittorrent - Seeding & Ratio" → new
   "MAM Profile (from jsonLoad.php)" row at the bottom shows class, BP
   balance, profile ratio, transfer, BP-vs-reserve timeseries, janitor
   deletion stacked chart, janitor state stat, grabber state stat.

## Reproduce locally

1. `cd infra/stacks/servarr && ../../scripts/tg plan` — expect
   0 add / 1 change (unrelated MetalLB annotation drift).
2. `kubectl -n servarr get cronjobs` — expect three:
   mam-freeleech-grabber, mam-bp-spender, mam-farming-janitor.
3. Trigger each via `kubectl create job --from=cronjob/<name> <job>`
   and read logs; outputs match the manual-verification snippets above.

Closes: code-qfs
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:45:38 +00:00
..
aiostreams [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
flaresolverr [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
lidarr [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
listenarr [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
mam-farming [servarr] Rewrite MAM ratio farming — break Mouse death spiral, adopt in TF 2026-04-19 11:45:38 +00:00
prowlarr [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
qbittorrent [servarr] Rewrite MAM ratio farming — break Mouse death spiral, adopt in TF 2026-04-19 11:45:38 +00:00
readarr [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
soulseek [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
.terraform.lock.hcl [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
backend.tf [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
main.tf [servarr] Rewrite MAM ratio farming — break Mouse death spiral, adopt in TF 2026-04-19 11:45:38 +00:00
providers.tf [infra] Add Cloudflare provider to all stack lock files and generated providers 2026-04-16 16:31:36 +00:00
secrets [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
terragrunt.hcl migrate all secrets from SOPS to Vault KV 2026-03-14 17:15:48 +00:00