Commit graph

16 commits

Author SHA1 Message Date
Viktor Barzin
1ae00b7cbf
Add multi-layer caching: 24h Redis TTL, stale-while-revalidate, frontend LRU cache
- Increase Redis cache TTL from 30 minutes to 24 hours
- Add stale-while-revalidate: serve stale cache (>4h) immediately while
  repopulating in background with SETNX lock to prevent concurrent rebuilds
- Add in-memory frontend LRU cache (5 entries) so repeat filter visits
  are instant without network requests
- Invalidate frontend cache on listing refresh and task completion
- Add unit tests for get_cache_age, is_cache_stale, acquire_repopulation_lock
2026-02-23 20:09:36 +00:00
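The stale-while-revalidate flow described in 1ae00b7cbf can be sketched roughly as follows. This is an illustration only, not the repository's code: the helper names get_cache_age, is_cache_stale and acquire_repopulation_lock come from the commit message, but their signatures, the FakeRedis stand-in, read_listings and the entry layout are assumptions made so the example runs without a Redis server.

```python
import time

STALE_AFTER_SECONDS = 4 * 60 * 60  # the ">4h" staleness threshold from the commit


class FakeRedis:
    """In-memory stand-in for the few Redis operations the sketch needs."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, nx=False):
        # nx=True mimics SET ... NX: only write when the key is absent.
        if nx and key in self.data:
            return False
        self.data[key] = value
        return True


def get_cache_age(store, key, now):
    """Seconds since the entry was written, or None when nothing is cached."""
    entry = store.get(key)
    return None if entry is None else now - entry["written_at"]


def is_cache_stale(age):
    return age is not None and age > STALE_AFTER_SECONDS


def acquire_repopulation_lock(store, key):
    """True for exactly one caller while the lock key exists."""
    return store.set(f"{key}:lock", "1", nx=True)


def read_listings(store, key, rebuild, schedule, now=None):
    """Return cached data at once; schedule one background rebuild if stale."""
    now = time.time() if now is None else now
    age = get_cache_age(store, key, now)
    if age is None:
        # Cold cache: nothing to serve, so build synchronously.
        payload = rebuild()
        store.set(key, {"written_at": now, "payload": payload})
        return payload
    if is_cache_stale(age) and acquire_repopulation_lock(store, key):
        # Only the caller that won the lock schedules the rebuild.
        schedule(rebuild)
    return store.get(key)["payload"]
```

In a real deployment the lock would presumably carry an expiry so that a crashed rebuild cannot block later ones; the sketch leaves that out.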
Viktor Barzin
cde3540a1e
Remove watchdog and tqdm dependencies, replace with logging
- Remove watchdog (unused) and tqdm from pyproject.toml dependencies
- Replace tqdm.gather() with asyncio.gather() + logger.info() in
  image_fetcher, floorplan_detector, and route_calculator services
- Replace tqdm progress bar with logger.info() in listing_repository
- Remove tqdm from mypy ignore_missing_imports overrides
2026-02-21 19:39:49 +00:00
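A minimal sketch of the replacement pattern in cde3540a1e: plain asyncio.gather() with log lines standing in for the progress bar. gather_with_logging and fetch_image are hypothetical names, and return_exceptions=True is an assumption about how failures should be tolerated, not something the commit states.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def gather_with_logging(coros, label):
    """Run coroutines concurrently and log a summary instead of drawing a bar."""
    coros = list(coros)
    logger.info("%s: starting %d tasks", label, len(coros))
    # return_exceptions=True keeps one failure from discarding the other results.
    results = await asyncio.gather(*coros, return_exceptions=True)
    failures = sum(isinstance(r, BaseException) for r in results)
    logger.info("%s: finished %d tasks (%d failed)", label, len(results), failures)
    return results


async def fetch_image(listing_id):
    # Placeholder for real network I/O.
    await asyncio.sleep(0)
    return f"image-{listing_id}"
```

Calling asyncio.run(gather_with_logging([fetch_image(i) for i in range(3)], "image_fetcher")) returns the three results in submission order, since gather() preserves the order of its arguments.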
Viktor Barzin
9e1beb7495
Add listing decisions (like/dislike) backend with detail endpoint
- ListingDecision model with unique constraint on (user_id, listing_id, listing_type)
- Alembic migration for listingdecision table
- DecisionRepository with dialect-aware upsert (MySQL/SQLite)
- DecisionService with input validation
- Decision API routes: PUT/GET/DELETE on /api/decisions
- GET /api/listing/{id}/detail endpoint extracting full property info from additional_info
- Add listing ID to GeoJSON feature properties
- Decision filtering on GeoJSON stream endpoint (decision_filter param)
2026-02-21 15:49:10 +00:00
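The upsert against the unique key (user_id, listing_id, listing_type) from 9e1beb7495 might look like this on the SQLite side. This is a sketch using the standard-library sqlite3 module rather than the project's repository layer; the decision column and set_decision are assumed names. SQLite needs version 3.24 or newer for ON CONFLICT ... DO UPDATE; on MySQL the corresponding form is INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE listingdecision (
        user_id      INTEGER NOT NULL,
        listing_id   INTEGER NOT NULL,
        listing_type TEXT    NOT NULL,
        decision     TEXT    NOT NULL,
        UNIQUE (user_id, listing_id, listing_type)
    )
    """
)

# SQLite flavour of the upsert; "excluded" refers to the row that failed to insert.
UPSERT_SQLITE = """
    INSERT INTO listingdecision (user_id, listing_id, listing_type, decision)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (user_id, listing_id, listing_type)
    DO UPDATE SET decision = excluded.decision
"""


def set_decision(user_id, listing_id, listing_type, decision):
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(UPSERT_SQLITE, (user_id, listing_id, listing_type, decision))


set_decision(1, 42, "RENT", "like")
set_decision(1, 42, "RENT", "dislike")  # same key: updates instead of inserting
rows = conn.execute("SELECT decision FROM listingdecision").fetchall()
```

After the two calls the table holds a single row for the key, carrying the later decision.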
Viktor Barzin
49280d9679
fix: QA fixes for property decisions feature
- Replace deprecated datetime.utcnow() with datetime.now(UTC) in model
  and repository
- Add listing_type validation to decision_service (RENT/BUY only)
- Fix decision filtering tests failing due to rate limiting by patching
  _match_endpoint
- Add SwipeCard component test suite (11 tests covering rendering,
  interactions, and POI distances)
- Add test for invalid listing_type validation
2026-02-21 14:04:34 +00:00
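Two of the fixes in 49280d9679 are easy to illustrate. The sketch below assumes listing types are compared as exact upper-case strings and uses hypothetical function names; it is not the service's actual code.

```python
from datetime import datetime, timezone

ALLOWED_LISTING_TYPES = {"RENT", "BUY"}


def validate_listing_type(value):
    """Reject anything other than RENT or BUY."""
    if value not in ALLOWED_LISTING_TYPES:
        raise ValueError(
            f"listing_type must be one of {sorted(ALLOWED_LISTING_TYPES)}, got {value!r}"
        )
    return value


def utc_now():
    # datetime.utcnow() returns a naive datetime and is deprecated since
    # Python 3.12; datetime.now(tz) returns a timezone-aware one.
    # timezone.utc is the object that datetime.UTC aliases on Python 3.11+.
    return datetime.now(timezone.utc)
```

The aware timestamp carries its offset, so comparisons against other aware datetimes no longer depend on the server's local time zone.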
Viktor Barzin
4877a5fc9f
feat: add decision_service with unit tests
Service layer provides validation, delegation to repository, and
get_disliked_listing_ids for filtering. All 7 unit tests pass.
2026-02-21 13:52:54 +00:00
Viktor Barzin
f833309297
Refactor backend for cleaner error handling, DRY, and type safety
- DRY up the rate limiter: consolidate 3 duplicated check/respond paths
  into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and
  RightmoveApiError; narrow catch clauses to specific exception types;
  fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract
  _find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels,
  guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py
  stale comments and missing type annotation, auth.py type:ignore,
  ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests,
  bypass rate limiter for test isolation, fix cache invalidation test to
  account for two-pattern scan loop
2026-02-10 22:19:24 +00:00
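The csv_exporter change in f833309297 (NaN instead of -1 sentinels, plus a guard against dividing by missing square meters) could be sketched like this. The commit does not say which quantity is divided; price per square meter and the function name are assumptions made for illustration.

```python
import math


def price_per_square_meter(price, square_meters):
    """Return price / area, or NaN when either input is missing or the area is zero."""
    if price is None or not square_meters:
        # NaN marks "unknown" without polluting averages the way -1 would.
        return math.nan
    return price / square_meters
```

A -1 sentinel silently skews sums and means, whereas NaN either propagates visibly or is skipped by NaN-aware aggregations.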
Viktor Barzin
791b5a9d55
Fix real-time task progress by closing WS on pubsub exit and keeping polling active
Three interconnected bugs prevented progress updates from reaching the frontend:

1. _forward_pubsub could exit silently while _handle_client_messages kept
   the WebSocket alive (responding to pings), so the client never detected
   the broken forwarding path. Replace asyncio.gather with asyncio.wait
   (FIRST_COMPLETED) so that when either coroutine exits, the other is
   cancelled too.

2. Polling was stopped on WS connect with no fallback if forwarding broke.
   Polling now always runs alongside the WebSocket as a safety net.

3. Redis publish failures in task_progress_publisher were logged at DEBUG
   and the broken client was reused forever. Log at WARNING and reset the
   client so the next call reconnects.
2026-02-09 22:48:57 +00:00
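The first fix in 791b5a9d55 relies on asyncio.wait with FIRST_COMPLETED. A minimal sketch of that pattern follows; run_until_first_exit is a hypothetical helper, and the two coroutines are stand-ins named after the ones in the commit message, not the real implementations.

```python
import asyncio


async def run_until_first_exit(*coros):
    """Run coroutines together; when any one finishes, cancel the rest."""
    tasks = [asyncio.ensure_future(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Wait for the cancellations to land so nothing is left running.
    await asyncio.gather(*pending, return_exceptions=True)
    return done, pending


async def forward_pubsub():
    # Stand-in for a forwarder whose subscription ends unexpectedly.
    await asyncio.sleep(0.01)
    return "pubsub exited"


async def handle_client_messages():
    # Stand-in for a loop that would otherwise keep answering pings forever.
    while True:
        await asyncio.sleep(0.001)
```

With asyncio.gather the second coroutine would keep the connection open indefinitely; here the caller regains control as soon as either side stops and can close the WebSocket.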
Viktor Barzin
8559c4b461
Add real-time WebSocket task progress with multi-job drawer
Replace 5s HTTP polling with WebSocket-based real-time updates for task
progress. Celery workers publish progress to Redis pub/sub channels;
a FastAPI WebSocket endpoint subscribes and forwards to the browser.
Polling is kept as a 30s fallback when WebSocket is unavailable.

The task progress drawer now supports multiple concurrent jobs with a
tab bar for switching between scrape and POI distance tasks.

Backend:
- Add services/task_progress_publisher.py (Redis pub/sub bridge)
- Add api/ws_routes.py (WebSocket endpoint with JWT auth)
- Publish progress from listing_tasks and poi_tasks
- Publish REVOKED via pub/sub on cancel/clear to fix stuck UI

Frontend:
- Add useTaskWebSocket hook with reconnection and keepalive
- Add TaskState and WS message types
- TaskIndicator: WS-driven updates with polling fallback
- TaskProgressDrawer: multi-job tabs, POI phase timeline
- Guard against WS overwriting local cancel state
2026-02-09 21:31:45 +00:00
Viktor Barzin
73d19e29d5
Fix duplicate listings via staged Redis cache and frontend stream cancellation
Three-pronged fix for duplicate listings appearing in the UI:

1. Backend: Replace direct rpush cache writes with staged population
   (write to temp key, then atomic RENAME to live key). Skip cache
   writes entirely for POI-enriched requests. Clean staging keys on
   invalidation.

2. Frontend: Add AbortController to cancel in-flight streaming requests
   when loadListings is called again, preventing data mixing.

3. Frontend: Deduplicate features by URL during stream accumulation as
   a safety net against any remaining server-side duplicates.
2026-02-09 21:17:30 +00:00
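The staged cache population from 73d19e29d5 (write to a temporary key, then RENAME onto the live key) can be sketched as below. FakeListStore is an in-memory stand-in that mimics only the Redis list commands used here; the staging key format, request_id and populate_cache_staged are assumptions.

```python
class FakeListStore:
    """In-memory stand-in for Redis list keys (rpush, rename, delete, lrange)."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, *values):
        self.lists.setdefault(key, []).extend(values)

    def rename(self, src, dst):
        # Like RENAME: replaces dst in one step and fails if src is missing.
        self.lists[dst] = self.lists.pop(src)

    def delete(self, key):
        self.lists.pop(key, None)

    def lrange(self, key, start, end):
        items = self.lists.get(key, [])
        return items[start:] if end == -1 else items[start:end + 1]


def populate_cache_staged(store, live_key, features, request_id):
    """Build the list under a staging key, then swap it into place."""
    staging_key = f"{live_key}:staging:{request_id}"
    store.delete(staging_key)  # clear leftovers from an earlier aborted run
    wrote_any = False
    try:
        for feature in features:
            store.rpush(staging_key, feature)
            wrote_any = True
        if wrote_any:
            # Readers see either the old list or the complete new one.
            store.rename(staging_key, live_key)
    except Exception:
        store.delete(staging_key)
        raise
```

Because readers never observe a half-written list, two overlapping population runs cannot interleave their entries under the live key, which is the failure mode direct rpush writes allow.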
Viktor Barzin
5b566bab4c
Fix POI distance calculation reliability for remote/Celery execution
- Fix silent log loss: replace hardcoded "uvicorn.error" logger with __name__
  in osrm_client, otp_client, poi_distance_calculator, and poi_tasks (uvicorn
  logger has no handlers in Celery worker, so all errors were silently dropped)
- Add Celery retry: autoretry_for=(Exception,), max_retries=3, retry_backoff
- Add top-level exception handling in task with full traceback logging
- Fix upsert_distances: replace session.merge() (PK-based) with proper
  dialect-aware INSERT ON DUPLICATE KEY UPDATE / ON CONFLICT DO UPDATE
- Filter out listings with null/zero coordinates before routing
- Raise OSError when all routing engines fail with 0 results computed,
  distinguishing "nothing to compute" from "all engines unreachable"
2026-02-08 20:11:12 +00:00
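Three of the points in 5b566bab4c (a module-named logger, filtering unusable coordinates, and raising OSError only when every engine fails) are sketched below. The listing dict layout, the engine callables and the choice to treat (0, 0) rather than any single zero as a placeholder are all assumptions; the Celery retry settings are not reproduced here.

```python
import logging

# Named after the module, so records propagate to whatever handlers the
# running process has configured, instead of relying on a server-specific logger.
logger = logging.getLogger(__name__)


def has_usable_coordinates(listing):
    lat, lon = listing.get("latitude"), listing.get("longitude")
    if lat is None or lon is None:
        return False
    # Assumption: (0, 0) is a placeholder, not a real location.
    return not (lat == 0 and lon == 0)


def compute_distances(listings, engines):
    """engines maps a name to a callable returning {listing_id: metres}."""
    routable = [item for item in listings if has_usable_coordinates(item)]
    if not routable:
        return {}  # nothing to compute: a normal, empty result
    results, failed = {}, []
    for name, engine in engines.items():
        try:
            results.update(engine(routable))
        except ConnectionError:
            logger.exception("routing engine %s unreachable", name)
            failed.append(name)
    if failed and len(failed) == len(engines):
        # Every engine was unreachable: surface it instead of returning {}.
        raise OSError(f"all routing engines failed: {', '.join(failed)}")
    return results
```

An empty dict now means "nothing needed routing", while a total outage raises, so a task retry policy can react to the latter only.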
Viktor Barzin
e431eaf2aa
Fix POI distance calculation for buy listings
The distance calculator always queried the rentlisting table regardless of
listing type because get_listings() defaulted to RentListing when called
without query_parameters. Added a listing_type parameter to get_listings()
and _get_model_for_query() so callers can select the correct table directly.
2026-02-08 19:10:32 +00:00
Viktor Barzin
8a5d1b3787
Fix POI distance calculation: OSRM index separator and error handling
- Fix OSRM client to use semicolons (not commas) for source/destination
  indices in /table API requests. Commas caused "Query string malformed"
  errors for any batch with more than one origin.
- Add error handling in poi_distance_calculator for unreachable routing
  engines (OSRM/OTP). Connection failures now log an error and skip the
  mode instead of crashing the entire Celery task.
2026-02-08 14:50:09 +00:00
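The separator fix in 8a5d1b3787 comes down to how the OSRM /table URL is assembled: commas separate longitude from latitude inside one coordinate, while semicolons separate coordinates and index lists. A sketch, with build_table_url, the host and the "foot" profile as illustrative assumptions:

```python
def build_table_url(base_url, profile, coordinates, sources, destinations):
    """Build an OSRM table request; coordinates are (lon, lat) pairs."""
    coords = ";".join(f"{lon},{lat}" for lon, lat in coordinates)
    # Index lists use semicolons, never commas.
    src = ";".join(str(i) for i in sources)
    dst = ";".join(str(i) for i in destinations)
    return (
        f"{base_url}/table/v1/{profile}/{coords}"
        f"?sources={src}&destinations={dst}&annotations=distance"
    )


url = build_table_url(
    "http://osrm.example",
    "foot",
    [(-0.1276, 51.5072), (-0.0759, 51.5081), (-0.1426, 51.5014)],
    sources=[0, 1],
    destinations=[2],
)
```

With a single origin the two separators produce the same string, which matches the commit's observation that only batches with more than one origin failed.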
Viktor Barzin
da0a56895d
Add self-hosted routing clients and distance calculator
RoutingConfig loads OSRM/OTP URLs from env vars. OSRM client uses the
/table endpoint for batch NxM distance matrices (walk/cycle). OTP client
uses GraphQL API for transit routes. POI distance calculator orchestrates
both, skipping already-computed distances and reporting progress.
2026-02-08 13:14:37 +00:00
Viktor Barzin
8a31e5449c
Add POI repository and service layer
POIRepository handles all database operations for POIs and distances
including upsert, cascading delete, and skip-on-recompute via
get_existing_distance_keys(). POI service provides unified high-level
functions shared by both CLI and API.
2026-02-08 13:13:17 +00:00
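The skip-on-recompute behaviour in 8a31e5449c amounts to subtracting the stored keys from the wanted ones. A sketch, assuming a distance is keyed by (listing_id, poi_id, mode) and that get_existing_distance_keys() returns a set of such tuples; the real key shape is not given in the commit.

```python
from itertools import product


def pending_distance_keys(listing_ids, poi_ids, modes, existing_keys):
    """Return the (listing_id, poi_id, mode) combinations not yet stored."""
    wanted = product(listing_ids, poi_ids, modes)
    return [key for key in wanted if key not in existing_keys]


existing = {(1, 10, "walk"), (1, 10, "cycle")}
todo = pending_distance_keys([1, 2], [10], ["walk", "cycle"], existing)
```

Here only the two combinations for listing 2 remain, so a re-run after a partial failure does not repeat routing requests that already succeeded.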
Viktor Barzin
e5ce8c1201
Fix buy listing support: thread ListingType through processing pipeline
The listing processor was hardcoded to create RentListing objects and
query only the rentlisting table. Buy listings fetched from Rightmove
were stored in the wrong table with missing fields. This threads
ListingType through ListingProcessor and all Step subclasses so the
correct model (RentListing/BuyListing) is created, the correct table
is queried, and buy-specific fields (service_charge, lease_left) are
parsed from the API response and included in GeoJSON streaming output.
2026-02-07 23:34:08 +00:00
Viktor Barzin
eafbc1ac52
Flatten repo structure: move crawler/ to root, remove vqa/ and immoweb/
The crawler subdirectory was the only active project. Moving it to the
repo root simplifies paths and removes the unnecessary nesting. The
vqa/ and immoweb/ directories were legacy/unused and have been removed.

Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect
the new flat structure.
2026-02-07 23:01:20 +00:00