Reddit examples — follow-ups (deferred from initial plan)
Items deliberately deferred from
docs/plans/2026-05-28-reddit-examples-plan.md so the initial cut can
ship without disproportionate scaffolding.
Prometheus counters (Task 11a)
The design doc names four counters:
fire_examples_scraped_total{sub}
fire_examples_extracted_total{sub,confidence_bucket}
fire_examples_llm_fallback_total
fire_examples_extract_failed_total{reason}
Current state: log-based observability only. The ingest CLI emits one "ingest done: inserted=X skipped=Y sub_runs_succ=Z/T" line per run.
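While observability stays log-based, the summary line can still be turned into numbers for ad-hoc checks. The sketch below assumes the line has exactly the shape quoted above, with integer values; the function name `parse_ingest_summary` and the result keys are hypothetical, not part of the ingest CLI.

```python
import re
from typing import Dict, Optional

# Assumed shape, taken from the summary line quoted above:
#   ingest done: inserted=X skipped=Y sub_runs_succ=Z/T
_SUMMARY_RE = re.compile(
    r"ingest done: inserted=(?P<inserted>\d+) skipped=(?P<skipped>\d+) "
    r"sub_runs_succ=(?P<succeeded>\d+)/(?P<total>\d+)"
)


def parse_ingest_summary(line: str) -> Optional[Dict[str, int]]:
    """Return the counts from one summary line, or None if it does not match."""
    match = _SUMMARY_RE.search(line)
    if match is None:
        return None
    # Every named group is a run of digits, so int() cannot fail here.
    return {key: int(value) for key, value in match.groupdict().items()}
```

Using `search` rather than `match` lets the line carry a timestamp or log-level prefix without breaking the parse.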
Reason for deferral: the ingest runs once a week via CronJob. Wiring
prometheus_client Counters into a non-FastAPI Job requires either a
pushgateway hop or a metrics-emitting HTTP endpoint exposed for the
Job's lifetime. Both are disproportionate plumbing for a weekly
batch. Revisit if we ever need a drift signal (e.g., extraction quality
collapsing silently) or a higher cadence.
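If this is revisited, the pushgateway route might look roughly like the sketch below. It is not the planned implementation: the gateway address, the job name `fire-examples-ingest`, the help strings, and the `push_metrics` helper are all assumptions made for illustration. Only the four counter names and their labels come from the design doc.

```python
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

# A dedicated registry so that only this Job's metrics are pushed.
registry = CollectorRegistry()

# Counter names and labels as listed in the design doc; help strings are assumed.
scraped = Counter(
    "fire_examples_scraped_total", "Posts scraped", ["sub"], registry=registry
)
extracted = Counter(
    "fire_examples_extracted_total",
    "Examples extracted",
    ["sub", "confidence_bucket"],
    registry=registry,
)
llm_fallback = Counter(
    "fire_examples_llm_fallback_total", "Extractions that used the LLM fallback",
    registry=registry,
)
extract_failed = Counter(
    "fire_examples_extract_failed_total", "Failed extractions", ["reason"],
    registry=registry,
)


def push_metrics(gateway: str, job: str = "fire-examples-ingest") -> None:
    """Push the registry once, at the end of the run (hypothetical helper)."""
    push_to_gateway(gateway, job=job, registry=registry)
```

The ingest code would increment counters as it goes, for example `scraped.labels(sub="financialindependence").inc()`, and call `push_metrics` once before the Job exits. Note that a Counter pushed from a fresh process starts at zero on every run, so the pushed values are per-run totals rather than lifetime totals.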
Fixture expansion
The design said 20 hand-curated fixtures; Task 12 ships with 5 (one per fi_status / currency combination that exercises a distinct code path). Add the remaining 15 once the live ingest has run — pull candidates from the real DB by hand and lock them in as regression anchors.
Pushshift / pullpush.io for older posts
PRAW's top-of-all-time listing caps at about 1000 posts per subreddit. For the 12
target subs that's ~12k posts before deduplication. If a Q3-2027
review of fire_planner.fire_example shows we're consistently missing
milestone posts older than the cap, layer pullpush.io as a secondary
source feeding the same pipeline.
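Layering a secondary source mostly comes down to deduplicating before the shared pipeline sees the posts. A minimal sketch, assuming each post is a dict carrying a stable Reddit post ID under an `"id"` key; that record shape and the `merge_sources` name are hypothetical, not the pipeline's actual types.

```python
from itertools import chain
from typing import Iterable, Iterator


def merge_sources(primary: Iterable[dict], secondary: Iterable[dict]) -> Iterator[dict]:
    """Yield posts from both sources, keeping the first copy of each post ID.

    Primary posts are consumed first, so on an ID collision the primary
    (e.g. PRAW) record wins over the secondary (e.g. pullpush.io) record.
    """
    seen = set()
    for post in chain(primary, secondary):
        post_id = post["id"]  # assumed key; adjust to the real record shape
        if post_id in seen:
            continue
        seen.add(post_id)
        yield post
```

Because the function is a generator, the secondary source is only read after the primary one is exhausted, and nothing beyond the set of seen IDs is held in memory.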