examples: document deferred follow-ups (metrics, fixtures, pushshift)
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful

This commit is contained in:
Viktor Barzin 2026-05-28 22:34:24 +00:00
parent 2271d7d5e5
commit eb53f6dbb6

View file

@ -0,0 +1,40 @@
# Reddit examples — follow-ups (deferred from initial plan)
Items deliberately deferred from
`docs/plans/2026-05-28-reddit-examples-plan.md` so the initial cut can
ship without disproportionate scaffolding.
## Prometheus counters (Task 11a)
The design doc names four counters:
- `fire_examples_scraped_total{sub}`
- `fire_examples_extracted_total{sub,confidence_bucket}`
- `fire_examples_llm_fallback_total`
- `fire_examples_extract_failed_total{reason}`
Current state: log-based observability only. The ingest CLI emits one
"ingest done: inserted=X skipped=Y sub_runs_succ=Z/T" line per run.
Reason for deferral: the ingest runs once a week via CronJob. Wiring
`prometheus_client` Counters into a non-FastAPI Job requires either a
pushgateway hop or a metrics-emitting HTTP endpoint exposed for the
Job's lifetime — both are disproportionate plumbing for a weekly
batch. Revisit if we ever need drift signal (e.g., extraction quality
collapsing silently) or higher cadence.
## Fixture expansion
The design said 20 hand-curated fixtures; Task 12 ships with 5 (one
per fi_status / currency combination that exercises a distinct code
path). Add the remaining 15 once the live ingest has run — pull
candidates from the real DB by hand and lock them in as regression
anchors.
## Pushshift / pullpush.io for older posts
PRAW's `top-of-all-time` caps at 1000 posts per subreddit. For the 12
target subs that's ~12k posts before deduplication. If a Q3-2027
review of `fire_planner.fire_example` shows we're consistently missing
milestone posts older than the cap, layer pullpush.io as a secondary
source feeding the same pipeline.