40 lines
1.6 KiB
Markdown
40 lines
1.6 KiB
Markdown
# Reddit examples — follow-ups (deferred from initial plan)
|
|
|
|
Items deliberately deferred from
|
|
`docs/plans/2026-05-28-reddit-examples-plan.md` so the initial cut can
|
|
ship without disproportionate scaffolding.
|
|
|
|
## Prometheus counters (Task 11a)
|
|
|
|
The design doc names four counters:
|
|
|
|
- `fire_examples_scraped_total{sub}`
|
|
- `fire_examples_extracted_total{sub,confidence_bucket}`
|
|
- `fire_examples_llm_fallback_total`
|
|
- `fire_examples_extract_failed_total{reason}`
|
|
|
|
Current state: log-based observability only. The ingest CLI emits one
|
|
"ingest done: inserted=X skipped=Y sub_runs_succ=Z/T" line per run.
|
|
|
|
Reason for deferral: the ingest runs once a week via CronJob. Wiring
|
|
`prometheus_client` Counters into a non-FastAPI Job requires either a
|
|
pushgateway hop or a metrics-emitting HTTP endpoint exposed for the
|
|
Job's lifetime — both are disproportionate plumbing for a weekly
|
|
batch. Revisit if we ever need drift signal (e.g., extraction quality
|
|
collapsing silently) or higher cadence.
|
|
|
|
## Fixture expansion
|
|
|
|
The design said 20 hand-curated fixtures; Task 12 ships with 5 (one
|
|
per fi_status / currency combination that exercises a distinct code
|
|
path). Add the remaining 15 once the live ingest has run — pull
|
|
candidates from the real DB by hand and lock them in as regression
|
|
anchors.
|
|
|
|
## Pushshift / pullpush.io for older posts
|
|
|
|
PRAW's `top-of-all-time` caps at 1000 posts per subreddit. For the 12
|
|
target subs that's ~12k posts before deduplication. If a Q3-2027
|
|
review of `fire_planner.fire_example` shows we're consistently missing
|
|
milestone posts older than the cap, layer pullpush.io as a secondary
|
|
source feeding the same pipeline.
|