Commit graph

16 commits

Author SHA1 Message Date
Viktor Barzin
b9f576ae2b
Stream-process listings as IDs arrive via asyncio.Queue
Replace the sequential fetch-all-then-process pipeline with a streaming
architecture where listing processing starts as soon as IDs become
available from each subquery. A producer task fetches pages and enqueues
new IDs (filtered inline against DB), while 20 consumer workers process
listings concurrently from the queue.

- Add ListingRepository.get_listing_ids() for fast ID-only projection
- Refactor listing_tasks.py: remove get_ids_to_process/dump_listings_and_monitor,
  replace with unified producer/worker/monitor pipeline
- Apply same pattern to CLI path in listing_fetcher.py
- Remove 'filtering' phase from frontend, show combined fetch+process metrics
- Add fetching_done flag to TaskResult for phase transition tracking
2026-02-06 23:43:54 +00:00
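
The streaming pipeline described in b9f576ae2b can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `producer`, `worker`, `run_pipeline`, the `_DONE` sentinel, and the in-memory `known_ids` set (standing in for the database lookup) are all hypothetical names. Only the use of asyncio.Queue and the figure of 20 workers come from the commit message.

```python
import asyncio

NUM_WORKERS = 20   # worker count taken from the commit message
_DONE = object()   # hypothetical sentinel telling a worker to stop


async def producer(pages, known_ids, queue):
    """Enqueue every ID that is not already known, page by page."""
    seen = set(known_ids)  # stand-in for the inline DB filter
    for page in pages:
        for listing_id in page:
            if listing_id not in seen:
                seen.add(listing_id)
                await queue.put(listing_id)
        # Yield control so workers can start before fetching has finished.
        await asyncio.sleep(0)
    # One sentinel per worker so each of them shuts down cleanly.
    for _ in range(NUM_WORKERS):
        await queue.put(_DONE)


async def worker(queue, processed):
    """Consume IDs from the queue until the sentinel arrives."""
    while True:
        item = await queue.get()
        if item is _DONE:
            return
        await asyncio.sleep(0)  # placeholder for real listing processing
        processed.append(item)


async def run_pipeline(pages, known_ids):
    """Run one producer and NUM_WORKERS consumers over a shared queue."""
    queue = asyncio.Queue()
    processed = []
    workers = [
        asyncio.create_task(worker(queue, processed))
        for _ in range(NUM_WORKERS)
    ]
    await producer(pages, known_ids, queue)
    await asyncio.gather(*workers)
    return processed


if __name__ == "__main__":
    # Two pages of IDs; ID 2 is already "in the database".
    print(sorted(asyncio.run(run_pipeline([[1, 2, 3], [3, 4]], {2}))))
```

Processing order across workers is not guaranteed, which is why the usage line sorts the result before printing it.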
Viktor Barzin
7e8f1f0339
Fix live logs: replace monkey-patch with direct log injection
The previous approach monkey-patched task.update_state on the Celery
Task instance, but the assignment didn't take effect (Celery's Task
singleton may prevent instance method shadowing). Additionally, the
celery.task logger level was left at WARNING by Celery's worker setup,
silencing all INFO-level log capture.

Fix:
- Replace wrapper with _update_task_state() helper that directly injects
  logs from a module-level _active_log_buffer into every meta dict
- Attach TaskLogHandler to BOTH celery.task and uvicorn.error loggers
- Force both loggers to INFO level during task execution
- Every task.update_state call site now uses _update_task_state()
2026-02-06 23:13:01 +00:00
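
The direct-injection approach from 7e8f1f0339 might look something like the sketch below. The names `TaskLogHandler`, `_update_task_state`, and `_active_log_buffer` come from the commit message, but their bodies here are assumptions, as are the buffer size, the `logs` meta key, and the `FakeTask` stand-in used in place of a real Celery task.

```python
import logging
from collections import deque

# Module-level ring buffer; the size of 200 is an assumption.
_active_log_buffer = deque(maxlen=200)


class TaskLogHandler(logging.Handler):
    """Append each formatted log record to the ring buffer."""

    def emit(self, record):
        _active_log_buffer.append(self.format(record))


def _update_task_state(task, state, meta):
    """Copy the buffered logs into the meta dict, then update the task."""
    merged = dict(meta)
    merged["logs"] = list(_active_log_buffer)  # "logs" key is assumed
    task.update_state(state=state, meta=merged)


class FakeTask:
    """Hypothetical stand-in exposing only update_state(state=, meta=)."""

    def __init__(self):
        self.calls = []

    def update_state(self, state=None, meta=None):
        self.calls.append((state, meta))


def demo():
    """Attach the handler, log one line, and return the injected state."""
    _active_log_buffer.clear()
    logger = logging.getLogger("celery.task")
    handler = TaskLogHandler()
    previous_level = logger.level
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)  # force INFO so records are captured
    try:
        logger.info("fetched page 1")
        task = FakeTask()
        _update_task_state(task, "PROGRESS", {"phase": "fetching"})
        return task.calls[-1]
    finally:
        # Restore the logger so the change lasts only for the task run.
        logger.removeHandler(handler)
        logger.setLevel(previous_level)
```

Routing every call site through one helper avoids relying on method assignment on the task instance, which is the part the commit reports as not taking effect.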
Viktor Barzin
b4837e1603
Add crawl job progress drawer with phase tracking and live logs
- Add phase-aware progress reporting across all crawl phases (splitting,
  fetching, filtering, processing) with per-step counters
- Add TaskProgressDrawer component with phase timeline stepper, detail
  counters, progress bar with ETA, and live worker log viewer
- Add on_step_complete callback to ListingProcessor for granular tracking
  of details/images/OCR steps
- Extend QuerySplitter on_progress callback with structured counter data
- Capture celery worker logs via ring buffer handler and inject into task
  state updates for frontend display
- Guard taskResult updates with phase presence check to prevent drawer
  from blanking during state transitions
2026-02-06 22:37:53 +00:00
Viktor Barzin
5514fa6381
Remove 1000-result limit, add Redis caching and virtual scrolling
- Remove hard-coded limit=1000 default from listing_geojson and streaming
  endpoints, allowing all matching results to be returned
- Add Redis caching service (db=2, 30min TTL) that caches query results
  as Redis Lists for fast re-queries with reduced DB load
- Integrate cache into streaming endpoint: serve from cache on hit,
  populate cache on miss during DB streaming
- Invalidate cache after scrape completes (both success and no-new-listings)
- Replace ScrollArea with react-virtuoso in ListView for virtual scrolling,
  keeping only ~20-30 DOM nodes regardless of list size
- Handle metadata streaming message to show "0 / N" progress from start
- Throttle frontend state updates with requestAnimationFrame to prevent
  UI jank from rapid re-renders during cached response streaming
2026-02-06 20:47:36 +00:00
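
The serve-from-cache-on-hit, populate-on-miss behaviour described in 5514fa6381 follows a cache-aside pattern, sketched below. To keep the example runnable without a Redis server, it uses a hypothetical in-memory class, `InMemoryListCache`, in place of Redis Lists. `stream_listings` and `query_db` are also illustrative names. Only the 30-minute TTL is taken from the commit message.

```python
import time

CACHE_TTL_SECONDS = 30 * 60  # 30-minute TTL from the commit message


class InMemoryListCache:
    """Hypothetical stand-in for Redis Lists with a per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}
        self._clock = clock

    def get(self, key):
        """Return the cached rows, or None if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, rows = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return rows

    def set(self, key, rows, ttl=CACHE_TTL_SECONDS):
        self._entries[key] = (self._clock() + ttl, list(rows))

    def invalidate(self):
        """Drop everything, e.g. after a scrape completes."""
        self._entries.clear()


def stream_listings(cache, key, query_db):
    """Yield rows from the cache on a hit, else from the DB while caching."""
    cached = cache.get(key)
    if cached is not None:
        yield from cached
        return
    rows = []
    for row in query_db():
        rows.append(row)
        yield row
    # Store only after the stream has been consumed to the end.
    cache.set(key, rows)
```

In this sketch a partially consumed stream leaves the cache empty, so an aborted request cannot store a truncated result.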
Viktor Barzin
c4b11ccfe9
Add comprehensive logging to Celery tasks and listing processor 2026-02-06 20:47:36 +00:00
Viktor Barzin
e8293c6042
Add intelligent query splitting to maximize Rightmove data extraction 2026-02-06 20:47:36 +00:00
Viktor Barzin
ceb943f198
Fix refresh listings returning immediate success with no progress
The get_ids_to_process function was using set union instead of set
difference, causing it to return all existing listing IDs along with
new ones. This meant:

1. When there were no new listings, the task would iterate through all
   existing listings, find nothing to process for each, and complete
   almost instantly

2. The task showed no progress because processing was too fast

Fixed by:
- Changed `all_listing_ids.union(identifiers)` to `identifiers - all_listing_ids`
  to only return IDs that are NOT already in the database
- Added explicit check for empty set with informative task state
  "No new listings found" so users understand why the task completed quickly
2026-02-01 22:30:37 +00:00
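
The union-versus-difference bug fixed in ceb943f198 is easy to see on a small example. The two function names below are hypothetical; the expressions inside them are the ones quoted in the commit message.

```python
def ids_to_process_buggy(identifiers, all_listing_ids):
    """Original behaviour: union returns existing IDs as well as new ones."""
    return all_listing_ids.union(identifiers)


def ids_to_process_fixed(identifiers, all_listing_ids):
    """Fixed behaviour: keep only IDs that are not already in the database."""
    return identifiers - all_listing_ids


if __name__ == "__main__":
    in_database = {1, 2, 3}
    fetched = {2, 3, 4}
    print(ids_to_process_buggy(fetched, in_database))  # every ID, old and new
    print(ids_to_process_fixed(fetched, in_database))  # only the new ID
```

When nothing new has been fetched, the fixed version returns an empty set, which is the case the commit handles with the explicit "No new listings found" state.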
Viktor Barzin
0ca955796e
Show processed/total counts in task progress indicator 2026-02-01 19:19:59 +00:00
Viktor Barzin
c7ac448f15
Add configurable scheduling, UI health/task indicators, and auto-load map with default filters 2026-02-01 17:28:37 +00:00
Viktor Barzin
d1cef99c5a
make task processing a bit better. still doing 1 query to check if needs processing; will fix later 2025-07-27 20:09:41 +00:00
Viktor Barzin
87efe0694c
format progress to 2 digits and add status updates before starting 2025-07-27 18:47:09 +00:00
Viktor Barzin
91a0436f7f
migrate processing to a pipeline approach where each listing is processed in a pipeline in parallel and status reported back to track progress 2025-07-27 18:33:39 +00:00
Viktor Barzin
206471cee8
fix argument error in tasks 2025-07-26 10:38:51 +00:00
Viktor Barzin
272d54d014
add daily scrape of interesting rent listings 2025-07-25 22:14:45 +00:00
Viktor Barzin
d4b22deda0
save user queries in redis so that user can refresh the page and still come back to their latest task 2025-07-06 12:02:25 +00:00
Viktor Barzin
93129333e6
migrate background tasks to celery 2025-06-22 21:18:52 +00:00