mam-farming: make MAMFarmingStuck a grabber heartbeat, not a grab-count check
Some checks failed
ci/woodpecker/push/default Pipeline failed

MAMFarmingStuck fired whenever the freeleech grabber added 0 torrents in 4h, but
grabbing 0 is normal: each run searches a random catalogue offset and
legitimately finds nothing when freeleech is dry. (The account ratio was a
healthy 37.5; the alert even misreported it as "0.00" because $value held the
grabbed count, not the ratio.) The alert's real intent was to catch the grabber
not running at all (CronJob blocked by concurrencyPolicy: Forbid, or wedged), but
increase(grabbed[4h])==0 cannot distinguish "didn't run" from "ran, nothing to
grab", because Pushgateway serves the last pushed value forever.
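To make the Pushgateway point concrete, here is a toy Python model (an assumption, not the real monitoring stack): each scrape sees the most recent value pushed for mam_farming_grabbed, and `simple_increase` is a deliberately simplified stand-in for PromQL `increase()`. A grabber that ran every hour and found nothing and a grabber that stopped five hours ago produce the identical scraped series, so any check on the grab count sees the same thing.

```python
# Toy model, not the real stack: Prometheus scraping a Pushgateway that keeps
# the last pushed value of mam_farming_grabbed with no expiry.
HOUR = 3600


def scraped_values(pushes, scrape_times):
    """pushes: (push_time, value) pairs sorted by time.
    Returns what each scrape sees: the latest push at or before that moment,
    or None if nothing has been pushed yet."""
    out = []
    for t in scrape_times:
        last = None
        for push_time, value in pushes:
            if push_time <= t:
                last = value
        out.append(last)
    return out


def simple_increase(series):
    """Simplified stand-in for PromQL increase(): sums positive deltas only
    (the real function also handles counter resets and extrapolation)."""
    return sum(max(b - a, 0) for a, b in zip(series, series[1:]))


now = 10 * HOUR
window = [now - h * HOUR for h in range(4, -1, -1)]  # hourly scrapes over 4h

# Healthy but dry: the grabber ran every hour and grabbed nothing.
dry = [(h * HOUR, 0) for h in range(0, 11)]
# Stuck: the grabber last ran at hour 5 and never again.
stuck = [(h * HOUR, 0) for h in range(0, 6)]

dry_series = scraped_values(dry, window)
stuck_series = scraped_values(stuck, window)
print(dry_series, stuck_series)  # [0, 0, 0, 0, 0] [0, 0, 0, 0, 0]
print(simple_increase(dry_series) == 0, simple_increase(stuck_series) == 0)  # True True
```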

The grabber now heartbeats mam_grabber_last_run_timestamp on every completed run
(main success, ratio/mouse skip, and qBittorrent-unreachable paths). The alert
fires only when that heartbeat is more than 4h stale, which is the true stuck
condition. Cookie expiry and qBittorrent-down keep their own dedicated alerts.
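The replacement check can be sketched the same way. The alert rule itself lives in the second changed file, which is not shown here, so the PromQL in the docstring (`time() - mam_grabber_last_run_timestamp > 4 * 3600`) is an assumed shape rather than the committed expression, and `grabber_stuck` is a hypothetical helper.

```python
import time

MAX_AGE_S = 4 * 3600  # 4h, the staleness threshold from the commit message


def grabber_stuck(last_run_ts, now=None, max_age_s=MAX_AGE_S):
    """True when the last heartbeat is older than max_age_s seconds.

    Mirrors an assumed PromQL rule of the form
        time() - mam_grabber_last_run_timestamp > 4 * 3600
    """
    if now is None:
        now = time.time()
    return now - last_run_ts > max_age_s


HOUR = 3600
now = 10 * HOUR
print(grabber_stuck(10 * HOUR, now))  # ran just now, grabbed 0 -> False
print(grabber_stuck(5 * HOUR, now))   # last run 5h ago -> True
```

A run that grabs nothing still refreshes the timestamp, so only a grabber that stops running trips the check. One caveat of the PromQL form: if the heartbeat series has never been pushed, the subtraction returns no samples and the alert stays silent.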

Surfaced by /cluster-health as a false-firing alert.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-16 08:18:33 +00:00
parent a0725ede57
commit 2479560fa2
2 changed files with 19 additions and 6 deletions


@@ -134,6 +134,7 @@ def main():
             profile_metrics
             + f'mam_grabber_skipped_reason{{reason="{reason}"}} 1\n'
             + f"mam_farming_grabbed 0\n"
+            + f"mam_grabber_last_run_timestamp {int(time.time())}\n"
         )
         return
@@ -153,7 +154,11 @@ def main():
         ).json()
     except Exception as e:
         print(f"qBittorrent unreachable: {e}", file=sys.stderr)
-        push(profile_metrics + "mam_farming_grabbed 0\n")
+        push(
+            profile_metrics
+            + "mam_farming_grabbed 0\n"
+            + f"mam_grabber_last_run_timestamp {int(time.time())}\n"
+        )
         sys.exit(1)
     farming = [t for t in all_torrents if t.get("category") == "mam-farming"]
@@ -264,6 +269,7 @@ def main():
         + f"mam_farming_grabbed {grabbed}\n"
         + f"mam_farming_total_seeding {len(farming) + grabbed}\n"
         + f"mam_farming_size_bytes {total_size}\n"
+        + f"mam_grabber_last_run_timestamp {int(time.time())}\n"
     )
     push(metrics)
     print(f"Done: grabbed={grabbed}")
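One way to sanity-check the three hunks above is to confirm that each push payload carries a parsable heartbeat line. This sketch rebuilds the payloads with hypothetical values (an empty `profile_metrics`, `reason="ratio"`, zero counts and sizes) and reads them back with a hypothetical `heartbeat_ts` helper. It is not the repository's test suite.

```python
import time


def heartbeat_ts(payload):
    """Return the integer value of mam_grabber_last_run_timestamp in a
    text-format payload, or None when the line is missing."""
    for line in payload.splitlines():
        name, _, value = line.partition(" ")
        if name == "mam_grabber_last_run_timestamp":
            return int(value)
    return None


now = int(time.time())
profile_metrics = ""  # hypothetical stand-in; the real profile metrics are not shown
heartbeat = f"mam_grabber_last_run_timestamp {now}\n"

# Hypothetical rebuilds of the three payloads touched by the diff.
payloads = {
    "ratio/mouse skip": (
        profile_metrics
        + 'mam_grabber_skipped_reason{reason="ratio"} 1\n'
        + "mam_farming_grabbed 0\n"
        + heartbeat
    ),
    "qBittorrent unreachable": (
        profile_metrics + "mam_farming_grabbed 0\n" + heartbeat
    ),
    "main success": (
        profile_metrics
        + "mam_farming_grabbed 0\n"
        + "mam_farming_total_seeding 0\n"
        + "mam_farming_size_bytes 0\n"
        + heartbeat
    ),
}

for path, payload in payloads.items():
    assert heartbeat_ts(payload) == now, f"{path}: heartbeat missing"

# The pre-change qBittorrent-unreachable payload carried no heartbeat at all.
assert heartbeat_ts(profile_metrics + "mam_farming_grabbed 0\n") is None
print("all exit paths carry the heartbeat")
```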