Commit graph

20 commits

Author SHA1 Message Date
Viktor Barzin
43f9d210fb
Fix tests to match decision service API and add filtering to non-streaming endpoint
- Update test mocks from _get_disliked_ids to _get_user_id_safe
- Fix decision service test method names (clear_decision -> remove_decision, etc.)
- Fix positional vs keyword arg assertion in set_decision test
- Add decision_filter param to non-streaming listing_geojson endpoint
2026-02-21 15:52:31 +00:00
Viktor Barzin
9e1beb7495
Add listing decisions (like/dislike) backend with detail endpoint
- ListingDecision model with unique constraint on (user_id, listing_id, listing_type)
- Alembic migration for listingdecision table
- DecisionRepository with dialect-aware upsert (MySQL/SQLite)
- DecisionService with input validation
- Decision API routes: PUT/GET/DELETE on /api/decisions
- GET /api/listing/{id}/detail endpoint extracting full property info from additional_info
- Add listing ID to GeoJSON feature properties
- Decision filtering on GeoJSON stream endpoint (decision_filter param)
2026-02-21 15:49:10 +00:00
Viktor Barzin
8452f65d25
feat: filter disliked listings from GeoJSON endpoints
Both /api/listing_geojson and /api/listing_geojson/stream now exclude
disliked listings by default. A decision_filter='everything' param
bypasses filtering. 2 integration tests verify the behavior.
2026-02-21 13:57:43 +00:00
Viktor Barzin
341de89004
feat: add decision API routes with tests
PUT /api/decisions/{listing_id} to set decision,
GET /api/decisions to list all user decisions,
DELETE /api/decisions/{listing_id} to remove a decision.
All 6 API route tests pass.
2026-02-21 13:54:38 +00:00
Viktor Barzin
2d6726dcd7
Show last listing update time next to connection status in header
Add last_updated timestamp to /api/status endpoint by querying
MAX(last_seen) across both listing tables. Display it in the
HealthIndicator as relative time (e.g. "2h ago") with full
date/time in the tooltip on hover.
2026-02-17 19:54:15 +00:00
Viktor Barzin
25912eac0c
Fix metric imports: use module-level access instead of name imports
Type-annotated metric variables (e.g. `geojson_cache_operations: Counter`)
don't exist as importable names until init_metrics() runs.  Switch all
`from api.metrics import <metric>` to `import api.metrics as m` and
access instruments as attributes at runtime to avoid ImportError.
2026-02-14 11:21:49 +00:00
Viktor Barzin
d6edb747d2
Add structured JSON logging, OTel business metrics, and Grafana dashboard
Structured logging via JsonFormatter replaces uvicorn's default format so
Loki can parse timestamps and fields.  14 business metrics (scrape stats,
throttle events, circuit breaker state, cache hit rate, OCR success rate,
Celery task lifecycle) are defined in a shared metrics module and
instrumented across the scraper pipeline, API, and workers.  Celery
workers expose a Prometheus HTTP endpoint on configurable ports.
2026-02-14 10:59:12 +00:00
Viktor Barzin
41b7d221e4
Fix 7 bugs: security, memory leak, stale state, error handling
- WebSocket: verify task ownership before allowing subscribe (security)
- POI routes: replace assert with HTTPException for production safety
- cancel_task: return HTTP 404 instead of 200 for missing tasks
- routing_config: add descriptive ValueError for invalid env vars
- POIManager: show error feedback instead of silently swallowing failures
- VisualizationCard: reset POI/travel mode state on metric switch
- Map: clean up heatmap layers/sources on unmount to prevent memory leak
- Update test to expect 404 from cancel_task ownership check
2026-02-13 19:36:43 +00:00
Viktor Barzin
f833309297
Refactor backend for cleaner error handling, DRY, and type safety
- Extract rate limiter DRY: consolidate 3 duplicated check/respond paths
  into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and
  RightmoveApiError; narrow catch clauses to specific exception types;
  fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract
  _find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels,
  guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py
  stale comments and missing type annotation, auth.py type:ignore,
  ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests,
  bypass rate limiter for test isolation, fix cache invalidation test to
  account for two-pattern scan loop
2026-02-10 22:19:24 +00:00
Viktor Barzin
902f1b0852
Switch task progress to throttled event-driven updates
Replace timer-based _monitor_progress (1s sleep loop) with a
ProgressReporter class that publishes on actual state changes,
throttled to at most 1 publish per 250ms. A background flush
every 2s keeps ETA/elapsed current during quiet periods.

Switch WebSocket forwarder from get_message() polling (1s timeout)
to async pubsub.listen() for instant Redis-to-WebSocket delivery.

Combined latency improvement: ~1.5s average → ~250ms.
2026-02-10 21:24:33 +00:00
Viktor Barzin
3616e678ac
Reduce task polling frequency and raise rate limits to prevent 429s
With 8+ active tasks, polling every 5s generates ~96 task_status
requests/min, exceeding the 60/60s rate limit. Two fixes:

- Adaptive polling: 30s when WebSocket is connected (safety net),
  5s only when WebSocket is down (primary source)
- Raise task_status rate limit to 200/60s and tasks_for_user to
  60/60s to handle burst scenarios (page reloads, WS reconnects)
2026-02-09 22:59:39 +00:00
Viktor Barzin
791b5a9d55
Fix real-time task progress by closing WS on pubsub exit and keeping polling active
Three interconnected bugs prevented progress updates from reaching the frontend:

1. _forward_pubsub could exit silently while _handle_client_messages kept
   the WebSocket alive (responding to pings), so the client never detected
   the broken forwarding path. Replace asyncio.gather with asyncio.wait
   (FIRST_COMPLETED) so both coroutines are cancelled together.

2. Polling was stopped on WS connect with no fallback if forwarding broke.
   Now polling runs always alongside WebSocket as a safety net.

3. Redis publish failures in task_progress_publisher were logged at DEBUG
   and the broken client was reused forever. Log at WARNING and reset the
   client so the next call reconnects.
2026-02-09 22:48:57 +00:00
Viktor Barzin
8559c4b461
Add real-time WebSocket task progress with multi-job drawer
Replace 5s HTTP polling with WebSocket-based real-time updates for task
progress. Celery workers publish progress to Redis pub/sub channels;
a FastAPI WebSocket endpoint subscribes and forwards to the browser.
Polling is kept as a 30s fallback when WebSocket is unavailable.

The task progress drawer now supports multiple concurrent jobs with a
tab bar for switching between scrape and POI distance tasks.

Backend:
- Add services/task_progress_publisher.py (Redis pub/sub bridge)
- Add api/ws_routes.py (WebSocket endpoint with JWT auth)
- Publish progress from listing_tasks and poi_tasks
- Publish REVOKED via pub/sub on cancel/clear to fix stuck UI

Frontend:
- Add useTaskWebSocket hook with reconnection and keepalive
- Add TaskState and WS message types
- TaskIndicator: WS-driven updates with polling fallback
- TaskProgressDrawer: multi-job tabs, POI phase timeline
- Guard against WS overwriting local cancel state
2026-02-09 21:31:45 +00:00
Viktor Barzin
73d19e29d5
Fix duplicate listings via staged Redis cache and frontend stream cancellation
Three-pronged fix for duplicate listings appearing in the UI:

1. Backend: Replace direct rpush cache writes with staged population
   (write to temp key, then atomic RENAME to live key). Skip cache
   writes entirely for POI-enriched requests. Clean staging keys on
   invalidation.

2. Frontend: Add AbortController to cancel in-flight streaming requests
   when loadListings is called again, preventing data mixing.

3. Frontend: Deduplicate features by URL during stream accumulation as
   a safety net against any remaining server-side duplicates.
2026-02-09 21:17:30 +00:00
Viktor Barzin
1ace45353a
Add API anti-abuse hardening: disable docs in prod, origin validator, exception handler
- Disable OpenAPI docs/redoc/openapi.json when APP_ENV=production
- Strip uvicorn Server header with --no-server-header in Dockerfile and docker-compose.yml
- Add OriginValidatorMiddleware to reject state-changing requests from disallowed origins
- Add global exception handler to prevent stack trace leakage on unhandled errors
- Add tests for all new security features (OpenAPI, origin validation, exception handler, server header)
2026-02-08 20:06:46 +00:00
Viktor Barzin
0a9a83507e
Harden backend security: IDOR fix, error sanitization, rate limiter fallback, security headers
- Fix task status IDOR by adding ownership check; suppress traceback/error in production
- Passkey routes: return generic error messages for internal exceptions, keep ValueError for user-facing
- JWT_SECRET and OIDC_CLIENT_ID: raise RuntimeError in production when using defaults
- Rate limiter: add in-memory fallback counter when Redis is unavailable
- Fix X-Forwarded-For IP spoofing with trusted_proxy_depth (rightmost-N selection)
- Add SecurityHeadersMiddleware (X-Content-Type-Options, X-Frame-Options, CSP, conditional HSTS)
- CORS: add PUT/DELETE methods for POI routes
- POI input validation: field length and coordinate range constraints
- QueryParameters: add min_sqm <= max_sqm validation
2026-02-08 19:42:30 +00:00
Viktor Barzin
743e018668
Redesign filter panel with range sliders, separated visualization card, and backend filter support
Simplify the filter UI to show only essential filters (type toggle, price/bedroom
range sliders, min size) by default, with advanced filters collapsed. Extract
visualization controls (color-by metric, POI travel mode) into a separate
VisualizationCard component. Wire up previously ignored backend filters: max_sqm,
min/max_price_per_sqm, and district_names now work end-to-end.
2026-02-08 18:50:06 +00:00
Viktor Barzin
bd788df9aa
Add POI API routes and Celery task
FastAPI router with CRUD endpoints for POIs, distance calculation
trigger, and distance queries. Streaming GeoJSON endpoint now accepts
include_poi_distances=true to inject travel times into features.
Celery task wraps the distance calculator with progress reporting.
2026-02-08 13:14:47 +00:00
Viktor Barzin
87b5bd8676
Add API rate limiting, metrics guard, and audit middleware
Per-user rate limits via Redis sliding window, IP-restricted /metrics
endpoint, audit logging of all requests, CORS tightening, and export
caps on listing/geojson endpoints.
2026-02-08 00:45:43 +00:00
Viktor Barzin
eafbc1ac52
Flatten repo structure: move crawler/ to root, remove vqa/ and immoweb/
The crawler subdirectory was the only active project. Moving it to the
repo root simplifies paths and removes the unnecessary nesting. The
vqa/ and immoweb/ directories were legacy/unused and have been removed.

Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect
the new flat structure.
2026-02-07 23:01:20 +00:00