Stream-process listings as IDs arrive via asyncio.Queue
Replace the sequential fetch-all-then-process pipeline with a streaming architecture where listing processing starts as soon as IDs become available from each subquery. A producer task fetches pages and enqueues new IDs (filtered inline against DB), while 20 consumer workers process listings concurrently from the queue.

- Add ListingRepository.get_listing_ids() for fast ID-only projection
- Refactor listing_tasks.py: remove get_ids_to_process/dump_listings_and_monitor, replace with unified producer/worker/monitor pipeline
- Apply same pattern to CLI path in listing_fetcher.py
- Remove 'filtering' phase from frontend, show combined fetch+process metrics
- Add fetching_done flag to TaskResult for phase transition tracking
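The producer/worker flow described in the message above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `PAGES`, `fetch_pages`, `producer`, `worker`, `run_pipeline` and the `stats` dict are hypothetical stand-ins, the "processing" step is a no-op, and deduplicating IDs across subqueries is an assumption. Only the 20-worker count, the inline filter against known DB IDs, and the `fetching_done` flag come from the message.

```python
import asyncio

NUM_WORKERS = 20  # worker count taken from the commit message

# Hypothetical stand-in for paged API results, keyed by subquery.
PAGES = {
    "subquery-a": [[1, 2, 3], [4, 5]],
    "subquery-b": [[5, 6], [7]],
}


async def fetch_pages(subquery):
    """Yield pages of listing IDs for one subquery (simulated I/O)."""
    for page in PAGES[subquery]:
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        yield page


async def producer(queue, known_ids, stats):
    """Fetch pages and enqueue only IDs that are not already known."""
    seen = set(known_ids)  # IDs already in the DB (assumed to be preloaded)
    for subquery in PAGES:
        async for page in fetch_pages(subquery):
            for listing_id in page:
                if listing_id not in seen:
                    seen.add(listing_id)  # assumption: dedupe across subqueries too
                    await queue.put(listing_id)
    stats["fetching_done"] = True  # mirrors the fetching_done flag in the message


async def worker(queue, processed):
    """Take IDs off the queue and process them until cancelled."""
    while True:
        listing_id = await queue.get()
        try:
            await asyncio.sleep(0)  # placeholder for the real per-listing work
            processed.append(listing_id)
        finally:
            queue.task_done()


async def run_pipeline(known_ids):
    queue = asyncio.Queue()
    processed = []
    stats = {"fetching_done": False}
    # Workers start first, so processing overlaps with fetching.
    workers = [
        asyncio.create_task(worker(queue, processed)) for _ in range(NUM_WORKERS)
    ]
    await producer(queue, known_ids, stats)
    await queue.join()  # wait until every enqueued ID has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return processed, stats


if __name__ == "__main__":
    processed, stats = asyncio.run(run_pipeline({2, 6}))
    print(sorted(processed), stats)
```

Starting the workers before the producer is what lets processing begin while pages are still being fetched. `queue.join()` followed by cancellation is one common way to shut idle workers down; a sentinel value per worker would work as well.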
parent 7e8f1f0339
commit b9f576ae2b
6 changed files with 372 additions and 420 deletions
@@ -343,6 +343,20 @@ class ListingRepository:
         return model_listing
 
+    def get_listing_ids(
+        self,
+        listing_type: ListingType = ListingType.RENT,
+    ) -> set[int]:
+        """Get all listing IDs from the database (ID-only projection).
+
+        Much faster than get_listings() when only IDs are needed for
+        filtering against API results.
+        """
+        model = RentListing if listing_type == ListingType.RENT else BuyListing
+        with Session(self.engine) as session:
+            result = session.execute(sa_select(model.id))
+            return {row[0] for row in result.fetchall()}
+
     async def mark_seen(self, listing_id: int) -> None:
         listings = await self.get_listings(only_ids=[listing_id])
         if len(listings) == 0: