examples: document deferred follow-ups (metrics, fixtures, pushshift)
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
This commit is contained in:
parent
2271d7d5e5
commit
eb53f6dbb6
1 changed files with 40 additions and 0 deletions
40
docs/plans/2026-05-28-reddit-examples-followups.md
Normal file
40
docs/plans/2026-05-28-reddit-examples-followups.md
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
# Reddit examples — follow-ups (deferred from initial plan)
|
||||
|
||||
Items deliberately deferred from
|
||||
`docs/plans/2026-05-28-reddit-examples-plan.md` so the initial cut can
|
||||
ship without disproportionate scaffolding.
|
||||
|
||||
## Prometheus counters (Task 11a)
|
||||
|
||||
The design doc names four counters:
|
||||
|
||||
- `fire_examples_scraped_total{sub}`
|
||||
- `fire_examples_extracted_total{sub,confidence_bucket}`
|
||||
- `fire_examples_llm_fallback_total`
|
||||
- `fire_examples_extract_failed_total{reason}`
|
||||
|
||||
Current state: log-based observability only. The ingest CLI emits one
|
||||
"ingest done: inserted=X skipped=Y sub_runs_succ=Z/T" line per run.
|
||||
|
||||
Reason for deferral: the ingest runs once a week via CronJob. Wiring
|
||||
`prometheus_client` Counters into a non-FastAPI Job requires either a
|
||||
pushgateway hop or a metrics-emitting HTTP endpoint exposed for the
|
||||
Job's lifetime — both are disproportionate plumbing for a weekly
|
||||
batch. Revisit if we ever need drift signal (e.g., extraction quality
|
||||
collapsing silently) or higher cadence.
|
||||
|
||||
## Fixture expansion
|
||||
|
||||
The design said 20 hand-curated fixtures; Task 12 ships with 5 (one
|
||||
per fi_status / currency combination that exercises a distinct code
|
||||
path). Add the remaining 15 once the live ingest has run — pull
|
||||
candidates from the real DB by hand and lock them in as regression
|
||||
anchors.
|
||||
|
||||
## Pushshift / pullpush.io for older posts
|
||||
|
||||
PRAW's `top-of-all-time` caps at 1000 posts per subreddit. For the 12
|
||||
target subs that's ~12k posts before deduplication. If a Q3-2027
|
||||
review of `fire_planner.fire_example` shows we're consistently missing
|
||||
milestone posts older than the cap, layer pullpush.io as a secondary
|
||||
source feeding the same pipeline.
|
||||
Loading…
Add table
Add a link
Reference in a new issue