[servarr/mam-farming] Tune grabber for MAM's real catalogue

## Context

After the Mouse-class unblock on 2026-04-19, end-to-end testing of the
grabber revealed three issues with the plan's original filter values:

1. **`SEEDER_CEILING=50` rejects ~99% of MAM's catalogue.** MAM is a
   well-seeded private tracker — 100-700 seeders per torrent is normal.
   A ceiling of 50 makes the filter too tight: across 140 FL torrents
   sampled in one loop, only 0-1 matched. The intent ("avoid oversupplied
   swarms") is still valid; the threshold was wrong for MAM's shape.

2. **`RATIO_FLOOR=1.2` was sized for Mouse-class defence and is now
   over-tight.** Its job is preventing the death spiral where Mouse-class
   accounts can't announce, so any grab deepens the ratio hole. Once
   class > Mouse, MAM serves peer lists normally and demand-first
   filtering (`leechers>=1`) keeps new grabs upload-positive on average.
   With ratio sitting at 0.7 post-recovery (we over-downloaded while
   unblocking), 1.2 was preventing the very grabs that would earn us
   back to healthy ratio.

3. **`parse_size` crashed on `"1,002.9 MiB"`.** MAM's pretty-printed
   sizes use thousands separators; `float("1,002.9")` raises
   `ValueError`. Every grabber run that hit a ≥1000-MiB candidate on
   the page crashed with a traceback instead of skipping the size.

## This change

- `SEEDER_CEILING`: 50 → 200 — live catalogue evidence showed 50 was
  rejecting viable demand-first candidates like `Zen and the Art of
  Motorcycle Maintenance` (S=156, L=1, score=125).
- `RATIO_FLOOR`: 1.2 → 0.5 — still a tripwire for catastrophic dips,
  but no longer a steady-state block. Class == Mouse remains an
  absolute skip (separate branch).
- `parse_size`: `s.replace(",", "").split()` before int-parse.

## Verified post-change

Manual grabber loop (5 runs at random offsets) after applying:

    run=1  parse_size crash on "1,002.9" (this crash motivated fix #3)
    run=2  GRABBED 3 torrents:
             Dean and Me: A Love Story      (240.7 MiB, S:18, L:1)  score=194
             Digital Nature Photography      (83.7 MiB, S:42, L:1)  score=182
             Zen and the Art of Motorcycle   (830.3 MiB, S:156, L:1) score=125
    run=3-5 grabbed=0 at offsets that landed on pages with no matches
            (expected — MAM returns 20/page, many offsets yield nothing)

MAM profile: class=User, ratio=0.7 (recovering from the Mouse unblock),
BP=24,053. 28 mam-farming torrents in forcedUP state, actively uploading
~8 MiB to MAM this session across 2 of the Maxximized comic issues.

## What is NOT in this change

- No alert threshold changes — `MAMRatioBelowOne` (24h) and `MAMMouseClass`
  (1h) already handle the "going back to Mouse" case; lowering the floor
  on the grabber doesn't change alerting.
- No janitor changes — the janitor rules are H&R-based and independent
  of ratio/class state.

## Test plan

### Automated

    $ cd infra/stacks/servarr && ../../scripts/tg apply --non-interactive
    Apply complete! Resources: 0 added, 2 changed, 0 destroyed.

    $ python3 -c 'import ast; ast.parse(open(
        "infra/stacks/servarr/mam-farming/files/freeleech-grabber.py").read())'

### Manual Verification

1. Trigger the grabber and confirm it doesn't skip-for-ratio at ratio 0.7:

       $ kubectl -n servarr create job --from=cronjob/mam-freeleech-grabber g1
       $ kubectl -n servarr logs job/g1 | head -5
       Profile: ratio=0.7 class=User | Farming: 33, 2.0 GiB, tracked IDs: 4
       Search offset=<random>, found=1323, page_results=20
       Added (score=...) ...

2. Repeat 3-5× at different random offsets. Over the course of a 30-min
   cron cadence, expect 2-5 grabs across the day given MAM's catalogue
   churn and our filter intersection.

## Reproduce locally

    cd infra/stacks/servarr
    ../../scripts/tg plan  # expect: 0 to add, 2 to change (configmap + cronjob)
    ../../scripts/tg apply --non-interactive
    kubectl -n servarr create job --from=cronjob/mam-freeleech-grabber g1
    kubectl -n servarr logs job/g1

Follow-up: `bd close code-qfs` already completed in the parent commit;
this is a post-shipping tune, no beads action needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-04-19 15:46:46 +00:00
parent 0f6321ce86
commit bc866d53fa

View file

@ -26,10 +26,18 @@ GRABBED_IDS_FILE = "/data/grabbed_ids.txt"
MIN_MB = int(os.environ.get("MIN_MB", "50")) MIN_MB = int(os.environ.get("MIN_MB", "50"))
MAX_MB = int(os.environ.get("MAX_MB", "1024")) MAX_MB = int(os.environ.get("MAX_MB", "1024"))
LEECHER_FLOOR = int(os.environ.get("LEECHER_FLOOR", "1")) LEECHER_FLOOR = int(os.environ.get("LEECHER_FLOOR", "1"))
SEEDER_CEILING = int(os.environ.get("SEEDER_CEILING", "50")) # MAM's catalogue is well-seeded by design — a ceiling of 50 rejected ~99%
# of candidates in live testing. 200 still filters out truly oversupplied
# swarms while keeping enough working-set to grab 3-5 titles per run.
SEEDER_CEILING = int(os.environ.get("SEEDER_CEILING", "200"))
GRAB_PER_RUN = int(os.environ.get("GRAB_PER_RUN", "5")) GRAB_PER_RUN = int(os.environ.get("GRAB_PER_RUN", "5"))
MAX_TORRENTS = int(os.environ.get("MAX_TORRENTS", "500")) MAX_TORRENTS = int(os.environ.get("MAX_TORRENTS", "500"))
RATIO_FLOOR = float(os.environ.get("RATIO_FLOOR", "1.2")) # The guard's real job is to prevent the Mouse-class death spiral (see RC1
# in the original recovery plan). Once class > Mouse, MAM serves peer
# lists normally and demand-first filtering (leechers>=1) keeps new grabs
# upload-positive. Keep a low floor as a tripwire for catastrophic dips
# rather than a steady-state block.
RATIO_FLOOR = float(os.environ.get("RATIO_FLOOR", "0.5"))
REQUEST_SLEEP = float(os.environ.get("REQUEST_SLEEP", "3")) REQUEST_SLEEP = float(os.environ.get("REQUEST_SLEEP", "3"))
CLASS_CODES = { CLASS_CODES = {
@ -46,8 +54,9 @@ CLASS_CODES = {
def parse_size(s): def parse_size(s):
# MAM pretty-prints sizes with thousands separators (e.g. "1,002.9 MiB").
units = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4} units = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
parts = s.split() parts = s.replace(",", "").split()
if len(parts) != 2: if len(parts) != 2:
return 0 return 0
return int(float(parts[0]) * units.get(parts[1], 1)) return int(float(parts[0]) * units.get(parts[1], 1))