From eb53f6dbb6d482290a0192ddd5725685c9285e34 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Thu, 28 May 2026 22:34:24 +0000
Subject: [PATCH] examples: document deferred follow-ups (metrics, fixtures, pushshift)

---
 .../2026-05-28-reddit-examples-followups.md | 40 +++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 docs/plans/2026-05-28-reddit-examples-followups.md

diff --git a/docs/plans/2026-05-28-reddit-examples-followups.md b/docs/plans/2026-05-28-reddit-examples-followups.md
new file mode 100644
index 0000000..7185b41
--- /dev/null
+++ b/docs/plans/2026-05-28-reddit-examples-followups.md
@@ -0,0 +1,40 @@
+# Reddit examples — follow-ups (deferred from initial plan)
+
+Items deliberately deferred from
+`docs/plans/2026-05-28-reddit-examples-plan.md` so the initial cut can
+ship without disproportionate scaffolding.
+
+## Prometheus counters (Task 11a)
+
+The design doc names four counters:
+
+- `fire_examples_scraped_total{sub}`
+- `fire_examples_extracted_total{sub,confidence_bucket}`
+- `fire_examples_llm_fallback_total`
+- `fire_examples_extract_failed_total{reason}`
+
+Current state: log-based observability only. The ingest CLI emits one
+"ingest done: inserted=X skipped=Y sub_runs_succ=Z/T" line per run.
+
+Reason for deferral: the ingest runs once a week via CronJob. Wiring
+`prometheus_client` Counters into a non-FastAPI Job requires either a
+pushgateway hop or a metrics-emitting HTTP endpoint exposed for the
+Job's lifetime — both are disproportionate plumbing for a weekly
+batch. Revisit if we ever need drift signal (e.g., extraction quality
+collapsing silently) or higher cadence.
+
+## Fixture expansion
+
+The design said 20 hand-curated fixtures; Task 12 ships with 5 (one
+per fi_status / currency combination that exercises a distinct code
+path). Add the remaining 15 once the live ingest has run — pull
+candidates from the real DB by hand and lock them in as regression
+anchors.
+
+## Pushshift / pullpush.io for older posts
+
+PRAW's `top-of-all-time` caps at 1000 posts per subreddit. For the 12
+target subs that's ~12k posts before deduplication. If a Q3-2027
+review of `fire_planner.fire_example` shows we're consistently missing
+milestone posts older than the cap, layer pullpush.io as a secondary
+source feeding the same pipeline.
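
Note on the Pushshift item: "a secondary source feeding the same pipeline" could be as small as a deduplicating merge in front of the existing extractor. The sketch below is only an illustration under stated assumptions: each post is a dict carrying a unique `id`, and `merge_sources` plus the sample records are hypothetical names, not code from this repository and not the pullpush.io API.

```python
from typing import Dict, Iterable, Iterator


def merge_sources(primary: Iterable[Dict], secondary: Iterable[Dict]) -> Iterator[Dict]:
    """Yield every primary post, then only secondary posts with unseen ids."""
    seen = set()
    for source in (primary, secondary):
        for post in source:
            post_id = post["id"]  # assumption: both sources expose the same id
            if post_id in seen:
                continue  # already yielded from an earlier source; skip the duplicate
            seen.add(post_id)
            yield post


# Hypothetical sample records, for illustration only.
praw_posts = [{"id": "a1", "title": "post A"}, {"id": "b2", "title": "post B"}]
archive_posts = [{"id": "b2", "title": "post B"}, {"id": "c3", "title": "post C"}]

merged = list(merge_sources(praw_posts, archive_posts))
merged_ids = [post["id"] for post in merged]
```

Listing the primary source first means its copy of a post wins on overlap, so the archive would only contribute posts older than the 1000-post cap.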