- Add phase-aware progress reporting across all crawl phases (splitting,
fetching, filtering, processing) with per-step counters
- Add TaskProgressDrawer component with phase timeline stepper, detail
counters, progress bar with ETA, and live worker log viewer
- Add on_step_complete callback to ListingProcessor for granular tracking
of details/images/OCR steps
- Extend QuerySplitter on_progress callback with structured counter data
- Capture celery worker logs via ring buffer handler and inject into task
state updates for frontend display
- Guard taskResult updates with phase presence check to prevent drawer
from blanking during state transitions
- Remove hard-coded limit=1000 default from listing_geojson and streaming
endpoints, allowing all matching results to be returned
- Add Redis caching service (db=2, 30min TTL) that caches query results
as Redis Lists for fast re-queries with reduced DB load
- Integrate cache into streaming endpoint: serve from cache on hit,
populate cache on miss during DB streaming
- Invalidate cache after scrape completes (both success and no-new-listings)
- Replace ScrollArea with react-virtuoso in ListView for virtual scrolling,
keeping only ~20-30 DOM nodes regardless of list size
- Handle metadata streaming message to show "0 / N" progress from start
- Throttle frontend state updates with requestAnimationFrame to prevent
UI jank from rapid re-renders during cached response streaming
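
The worker log capture described above (ring buffer handler feeding task state updates) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this change: `RingBufferHandler`, `build_progress_meta`, the default capacity, and the payload keys are all hypothetical names chosen for the example.

```python
import logging
from collections import deque
from typing import Deque, Dict, List


class RingBufferHandler(logging.Handler):
    """Keep only the most recent `capacity` formatted log lines in memory."""

    def __init__(self, capacity: int = 200) -> None:
        super().__init__()
        # A deque with maxlen drops the oldest entry once it is full.
        self._lines: Deque[str] = deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._lines.append(self.format(record))
        except Exception:
            # Standard logging convention: never let logging errors propagate.
            self.handleError(record)

    def tail(self, n: int = 50) -> List[str]:
        """Return up to the last `n` captured lines, oldest first."""
        if n <= 0:
            return []
        return list(self._lines)[-n:]


def build_progress_meta(phase: str, current: int, total: int,
                        handler: RingBufferHandler) -> Dict[str, object]:
    """Assemble a hypothetical task-state payload with recent log lines attached."""
    return {
        "phase": phase,
        "current": current,
        "total": total,
        "logs": handler.tail(),
    }
```

In a Celery task, a payload like this could presumably be passed as `meta` to `self.update_state(state="PROGRESS", meta=...)`, so the frontend receives the latest log lines with each progress update.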

The `get_ids_to_process` function used set union instead of set
difference, so it returned every existing listing ID along with the
new ones. This meant:
1. When there were no new listings, the task would iterate through all
existing listings, find nothing to process for each, and complete
almost instantly
2. The task showed no progress because processing was too fast
Fixed by:
- Changed `all_listing_ids.union(identifiers)` to `identifiers - all_listing_ids`
to only return IDs that are NOT already in the database
- Added explicit check for empty set with informative task state
"No new listings found" so users understand why the task completed quickly
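
The corrected logic can be illustrated with a small sketch. The function signature, the string ID type, and the `describe_outcome` helper are assumptions made for this example; only the set expressions and the "No new listings found" message come from the description above.

```python
from typing import Iterable, Set


def get_ids_to_process(identifiers: Iterable[str],
                       all_listing_ids: Set[str]) -> Set[str]:
    """Return only the scraped identifiers that are not already stored."""
    # Set difference keeps IDs from `identifiers` that are absent from the
    # database. The buggy version, all_listing_ids.union(identifiers),
    # also returned every ID that was already stored.
    return set(identifiers) - all_listing_ids


def describe_outcome(new_ids: Set[str]) -> str:
    """Hypothetical helper: choose the message reported in the task state."""
    if not new_ids:
        # Explicit empty-set check so a fast completion is explained.
        return "No new listings found"
    return f"Processing {len(new_ids)} new listing(s)"
```

With existing IDs `{"a", "b", "c"}` and scraped identifiers `["b", "c", "d"]`, the difference yields `{"d"}`, whereas the union would have yielded all four IDs.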