fire-planner/docs/plans/2026-05-28-reddit-examples-followups.md
Viktor Barzin eb53f6dbb6
examples: document deferred follow-ups (metrics, fixtures, pushshift)
2026-05-28 22:34:24 +00:00

Reddit examples — follow-ups (deferred from initial plan)

Items deliberately deferred from docs/plans/2026-05-28-reddit-examples-plan.md so the initial cut can ship without disproportionate scaffolding.

Prometheus counters (Task 11a)

The design doc names four counters:

  • fire_examples_scraped_total{sub}
  • fire_examples_extracted_total{sub,confidence_bucket}
  • fire_examples_llm_fallback_total
  • fire_examples_extract_failed_total{reason}

Current state: log-based observability only. The ingest CLI emits one "ingest done: inserted=X skipped=Y sub_runs_succ=Z/T" line per run.
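Until real counters exist, that summary line is the only run-level signal. The sketch below turns it into numbers, for example from the `kubectl logs` output of the last Job. It assumes the line format quoted above stays stable. `parse_summary` and `IngestSummary` are hypothetical names, not existing fire-planner code.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the summary line quoted above, e.g.
# "ingest done: inserted=37 skipped=5 sub_runs_succ=11/12".
# Any log prefix (timestamp, logger name) is an assumption, so the pattern
# is searched anywhere in the text rather than anchored to line start.
_SUMMARY_RE = re.compile(
    r"ingest done: inserted=(\d+) skipped=(\d+) sub_runs_succ=(\d+)/(\d+)"
)


@dataclass(frozen=True)
class IngestSummary:
    inserted: int
    skipped: int
    subs_succeeded: int
    subs_total: int

    @property
    def all_subs_ok(self) -> bool:
        # True only when every subreddit run in the batch succeeded.
        return self.subs_succeeded == self.subs_total


def parse_summary(log_text: str) -> Optional[IngestSummary]:
    """Return the last summary found in a Job's log output, or None."""
    matches = _SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    inserted, skipped, succ, total = (int(x) for x in matches[-1])
    return IngestSummary(inserted, skipped, succ, total)
```

Taking the last match means a log that also contains an earlier partial run still reports the final summary.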

Reason for deferral: the ingest runs once a week via a CronJob. Wiring prometheus_client Counters into a non-FastAPI Job requires either a Pushgateway hop or an HTTP metrics endpoint that stays up for the Job's lifetime. That endpoint is only scraped if Prometheus happens to hit it before the Pod exits. Both options are disproportionate plumbing for a weekly batch. Revisit if we ever need a drift signal (e.g., extraction quality collapsing silently) or a higher cadence.

Fixture expansion

The design called for 20 hand-curated fixtures. Task 12 ships with 5, one per fi_status / currency combination that exercises a distinct code path. Add the remaining 15 once the live ingest has run: pull candidates from the real DB by hand and lock them in as regression anchors.
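Choosing the extra 15 stays a manual call. A shortlist that spreads candidates across combinations keeps the new fixtures from clustering on the most common case. The sketch below assumes rows from fire_planner.fire_example arrive as dicts with `id`, `fi_status`, `currency` and `confidence` keys. Every column name except fi_status is a guess, and `pick_fixture_candidates` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple


def pick_fixture_candidates(
    rows: Iterable[Mapping],
    already_locked: Set[str],
    want: int = 15,
) -> List[Mapping]:
    """Round-robin over (fi_status, currency) buckets, highest confidence first."""
    buckets: Dict[Tuple[str, str], List[Mapping]] = defaultdict(list)
    for row in rows:
        if row["id"] in already_locked:  # skip rows already used as fixtures
            continue
        buckets[(row["fi_status"], row["currency"])].append(row)
    # Within each bucket, put the strongest extractions first.
    for bucket in buckets.values():
        bucket.sort(key=lambda r: r["confidence"], reverse=True)

    picked: List[Mapping] = []
    keys = sorted(buckets)  # deterministic bucket order
    depth = 0
    while len(picked) < want:
        progressed = False
        # Take the depth-th best row from every bucket before going deeper.
        for key in keys:
            if depth < len(buckets[key]):
                picked.append(buckets[key][depth])
                progressed = True
                if len(picked) == want:
                    break
        if not progressed:  # every bucket is exhausted
            break
        depth += 1
    return picked
```

The output is only a shortlist to read through. Each row still gets checked by hand before it becomes a fixture.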

Pushshift / pullpush.io for older posts

Reddit listings, and therefore PRAW's top(time_filter="all"), cap at 1000 posts per subreddit. For the 12 target subs that's at most ~12k posts before deduplication. If a Q3-2027 review of fire_planner.fire_example shows we're consistently missing milestone posts older than the cap, layer pullpush.io in as a secondary source feeding the same pipeline.
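If the secondary source is ever layered in, the natural handoff point is the oldest post PRAW returned. Page backwards from there and deduplicate against what PRAW already produced. In the sketch below, `fetch` stands in for a pullpush.io client whose endpoint, `before` cursor and field names are not verified here. Posts are assumed to be dicts keyed by the Reddit post `id`, and both helpers are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Mapping

# Assumed fetcher contract: return posts for `sub` strictly older than
# `before` (epoch seconds), newest first; an empty list means no more history.
FetchPage = Callable[[str, int], List[Mapping]]


def iter_older_posts(fetch: FetchPage, sub: str, start_before: int) -> Iterator[Mapping]:
    """Walk backwards in time from start_before, one page at a time."""
    before = start_before
    while True:
        page = fetch(sub, before)
        if not page:
            return
        yield from page
        oldest = min(p["created_utc"] for p in page)
        # Stop if the cursor did not move backwards (a source ignoring `before`).
        if oldest >= before:
            return
        # Using the oldest timestamp as the next cursor can skip posts that
        # share that exact second; acceptable for a best-effort backfill.
        before = oldest


def merge_sources(primary: Iterable[Mapping], secondary: Iterable[Mapping]) -> List[Mapping]:
    """Dedupe on the Reddit post id; the PRAW copy wins when both have a post."""
    merged: Dict[str, Mapping] = {}
    for post in primary:
        merged[post["id"]] = post
    for post in secondary:
        merged.setdefault(post["id"], post)
    return list(merged.values())
```

Letting the PRAW copy win on collisions keeps the live copy from the primary source. The secondary source only fills the gap past the 1000-post cap.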