487 commits

Author SHA1 Message Date
Viktor Barzin
941004cf1b
docs: add property decisions feature design document
Tinder-style swipe card UI for liking/disliking properties with
shared shortlists. Two-phase rollout: core decisions first, sharing second.
2026-02-21 13:41:30 +00:00
Viktor Barzin
13637ec881
fix: update tests for mobile responsive changes
- Add matchMedia polyfill to test setup (jsdom doesn't implement it)
- Remove Header tests for filter toggle (replaced by mobile FAB)
2026-02-21 11:45:50 +00:00
Viktor Barzin
a744b33578
feat: make frontend fully responsive with mobile-first layout
Add mobile-responsive design with full feature parity:
- Bottom sheet (vaul) with 3 snap points for map+list coexistence
- Swipeable property cards with horizontal scroll-snap
- Hamburger menu with health, tasks, user info
- Full-screen map with repositioned legend (top-left on mobile)
- Filter FAB opening Sheet drawer
- TaskProgressDrawer from bottom on mobile
- All changes gated behind useIsMobile() hook (768px breakpoint)
- Desktop layout completely untouched

New components: MobileBottomSheet, SwipeableCardRow,
PropertyCardCompact, MobileMenu

Also fixes: idempotent longitude migration, React hooks order
2026-02-21 11:34:53 +00:00
Viktor Barzin
8f068a581e
Fix frontend CI pipeline OOM kills and test timeouts
- Set memory limit to 2048MiB for the "Run frontend tests" step
  (node:24-alpine was OOM killed at the default 1Gi)
- Set memory limit to 2048MiB for the "Build frontend image" kaniko
  step (also OOM killed at 1Gi)
- Increase vitest testTimeout to 30s (CI runner is ~75x slower than
  local dev; tests were timing out at the default 5s)
2026-02-17 21:46:29 +00:00
Viktor Barzin
2d6726dcd7
Show last listing update time next to connection status in header
Add last_updated timestamp to /api/status endpoint by querying
MAX(last_seen) across both listing tables. Display it in the
HealthIndicator as relative time (e.g. "2h ago") with full
date/time in the tooltip on hover.
2026-02-17 19:54:15 +00:00
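The relative-time rendering can be sketched as below. The helper name `format_relative` and the minute/hour/day buckets are assumptions, and the real HealthIndicator is frontend code. In the commit, the input timestamp is MAX(last_seen) from /api/status.

```python
from datetime import datetime, timedelta, timezone


def format_relative(last_updated: datetime, now: datetime) -> str:
    """Render a timestamp as a short relative string such as "2h ago".

    Hypothetical helper; the bucket boundaries are illustrative.
    """
    seconds = int((now - last_updated).total_seconds())
    if seconds < 60:
        return "just now"
    minutes = seconds // 60
    if minutes < 60:
        return f"{minutes}m ago"
    hours = minutes // 60
    if hours < 24:
        return f"{hours}h ago"
    return f"{hours // 24}d ago"


now = datetime(2026, 2, 17, 19, 54, tzinfo=timezone.utc)
print(format_relative(now - timedelta(hours=2, minutes=5), now))
```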
Viktor Barzin
7833bd3ecf
Add retry logic to Drone CI clone step for transient DNS failures
Disable the default clone and use a custom clone step with up to 5
retry attempts in both frontend and api pipelines. This prevents
builds from failing due to transient "Could not resolve host" errors.
2026-02-15 17:24:56 +00:00
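The Drone clone step itself is shell. The retry policy it describes (up to 5 attempts, then fail) can be sketched in Python as follows. `retry`, the 2s delay, and the injectable `sleep` are hypothetical. DNS failures surface in Python as `socket.gaierror`, which is a subclass of `OSError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`, retrying on OSError up to `attempts` times in total.

    The last error is re-raised, so a persistent failure still fails the step.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # out of retries: surface the real error
            sleep(delay_s)  # brief pause before the next attempt
    raise AssertionError("unreachable")
```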
Viktor Barzin
79586015f1
Remove pre-push hook that required Docker for tests
2026-02-15 15:48:01 +00:00
Viktor Barzin
6489012e8f
Auto-pan map to keep popup fully visible in viewport
When a property popup opens near the edge of the map, the map now
automatically pans so the entire popup is visible with 20px padding.
2026-02-15 15:27:51 +00:00
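The pan computation can be sketched per axis: shift the view just enough that the popup's box fits inside the viewport shrunk by 20px on each side. This sketch makes three assumptions:

- Both boxes are known in screen pixels.
- A positive offset moves the visible window right/down. Check this sign convention against the map library's pan call (e.g. Mapbox GL's `panBy`).
- `Rect` and `pan_offset` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Screen-space box in pixels (y grows downward)."""
    left: float
    top: float
    right: float
    bottom: float


def _axis_shift(lo: float, hi: float, view_lo: float, view_hi: float, pad: float) -> float:
    """How far to move the view along one axis so [lo, hi] fits in the padded view."""
    if lo < view_lo + pad:
        return lo - (view_lo + pad)  # negative: move the view left/up
    if hi > view_hi - pad:
        return hi - (view_hi - pad)  # positive: move the view right/down
    return 0.0


def pan_offset(popup: Rect, viewport: Rect, padding: float = 20.0) -> Tuple[float, float]:
    """Pixel offset (dx, dy) that makes the whole popup visible with `padding` to spare."""
    dx = _axis_shift(popup.left, popup.right, viewport.left, viewport.right, padding)
    dy = _axis_shift(popup.top, popup.bottom, viewport.top, viewport.bottom, padding)
    return dx, dy
```

If the popup is larger than the padded viewport, the left/top edge wins, so the popup's header stays visible.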
Viktor Barzin
f6f0aa8d14
Fix metrics not initialized in Celery prefork worker processes
worker_ready fires in the main process after pool workers are forked,
so init_metrics() never ran in the child processes. Add a
worker_process_init handler to initialize OTel metrics in each worker.
2026-02-15 12:58:59 +00:00
Viktor Barzin
51e7a7ccc5
Add setup-test-venv.sh for running tests without Docker [ci skip]
2026-02-14 11:44:49 +00:00
Viktor Barzin
556b5b0f42
Add local venv fallback for running tests in CLAUDE.md [ci skip]
2026-02-14 11:42:26 +00:00
Viktor Barzin
fb98e5c12e
Add pre-commit test requirement to CLAUDE.md [ci skip]
2026-02-14 11:38:06 +00:00
Viktor Barzin
dbabdf39d0
Add shared pre-push hook to run tests before pushing
Hook lives in .githooks/pre-push (tracked) and runs pytest inside the
Docker app container.  start.sh auto-configures core.hooksPath so new
clones pick it up on first run.
2026-02-14 11:36:26 +00:00
Viktor Barzin
25912eac0c
Fix metric imports: use module-level access instead of name imports
Type-annotated metric variables (e.g. `geojson_cache_operations: Counter`)
don't exist as importable names until init_metrics() runs.  Switch all
`from api.metrics import <metric>` to `import api.metrics as m` and
access instruments as attributes at runtime to avoid ImportError.
2026-02-14 11:21:49 +00:00
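Why the name import fails can be reproduced with a throwaway module. `fake_metrics` and its contents are hypothetical stand-ins for api/metrics.py, and a dict stands in for an OTel Counter. A bare annotation binds no name, so `from ... import` raises ImportError until init_metrics() assigns a value. Attribute access through the module object is resolved at call time instead.

```python
import sys
import types

# Stand-in for api/metrics.py (hypothetical contents): the instrument is only
# annotated at import time and receives a value inside init_metrics().
METRICS_SOURCE = '''
geojson_cache_operations: "Counter"

def init_metrics():
    global geojson_cache_operations
    geojson_cache_operations = {"hits": 0}  # a dict stands in for an OTel Counter
'''

fake = types.ModuleType("fake_metrics")
exec(METRICS_SOURCE, fake.__dict__)
sys.modules["fake_metrics"] = fake


def name_import_works() -> bool:
    """Mimic `from api.metrics import <metric>` at the point it would run."""
    try:
        from fake_metrics import geojson_cache_operations  # noqa: F401
    except ImportError:
        return False
    return True


before_init = name_import_works()  # False: an annotation alone binds no name

import fake_metrics as m  # noqa: E402  (importing the module itself always works)

m.init_metrics()
m.geojson_cache_operations["hits"] += 1  # attribute looked up at call time
```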
Viktor Barzin
d6edb747d2
Add structured JSON logging, OTel business metrics, and Grafana dashboard
Structured logging via JsonFormatter replaces uvicorn's default format so
Loki can parse timestamps and fields.  14 business metrics (scrape stats,
throttle events, circuit breaker state, cache hit rate, OCR success rate,
Celery task lifecycle) are defined in a shared metrics module and
instrumented across the scraper pipeline, API, and workers.  Celery
workers expose a Prometheus HTTP endpoint on configurable ports.
2026-02-14 10:59:12 +00:00
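A JSON formatter of the kind described can be built on the standard `logging.Formatter`. This is a minimal sketch, and the field names (`timestamp`, `level`, `logger`, `message`) are assumptions rather than the project's actual schema. Each record becomes one JSON object per line, which Loki can parse into fields.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line (hypothetical field names)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("scraper")  # hypothetical logger name
log.addHandler(handler)
log.setLevel(logging.INFO)
log.info("scraped %d listings", 42)
```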
Viktor Barzin
a1829957c1
Auto-redirect to login on 401 API responses
When a session token expires, API calls return 401 but nothing caught
it — errors were shown as generic dialogs or swallowed. Now both
apiClient and streamingService detect 401 responses and clear auth
state, which causes App.tsx to render the login modal automatically.
2026-02-13 21:16:53 +00:00
Viktor Barzin
3acf8db7af
Switch frontend build to kaniko to avoid ephemeral-storage eviction
plugins/docker uses Docker-in-Docker which consumes significant
ephemeral storage for image layers. On nodes with disk pressure
(k8s-node1), the build pod gets evicted before completing.

Switch to plugins/kaniko which builds OCI images without a Docker
daemon, using significantly less ephemeral storage. Enable kaniko's
built-in layer caching via a dedicated cache repo.
2026-02-13 20:12:15 +00:00
Viktor Barzin
431e276d05
Remove cache builder stages to fix ephemeral-storage eviction
Both frontend and API pipelines had separate "Cache builder stage"
steps that built and pushed intermediate Docker targets purely for
layer caching. Running these alongside the actual build steps doubled
the Docker build work per pipeline, causing pods to be evicted when
the node hit its ephemeral-storage threshold (~20GB).

The "Build image" steps already use cache_from with the existing
:builder tags from previous builds, so the separate cache steps
are redundant for the common case (unchanged package files).
2026-02-13 19:52:17 +00:00
Viktor Barzin
41b7d221e4
Fix 7 bugs: security, memory leak, stale state, error handling
- WebSocket: verify task ownership before allowing subscribe (security)
- POI routes: replace assert with HTTPException for production safety
- cancel_task: return HTTP 404 instead of 200 for missing tasks
- routing_config: add descriptive ValueError for invalid env vars
- POIManager: show error feedback instead of silently swallowing failures
- VisualizationCard: reset POI/travel mode state on metric switch
- Map: clean up heatmap layers/sources on unmount to prevent memory leak
- Update test to expect 404 from cancel_task ownership check
2026-02-13 19:36:43 +00:00
Viktor Barzin
25c87da1cf
Move NODE_OPTIONS before npm ci and bump pod memory to 4Gi
npm ci can also OOM during dependency installation. Move the heap
limit before npm ci so it applies to all Node processes. Bump Drone
pod limits to 4Gi (requests 2Gi) to cover Docker-in-Docker overhead.
2026-02-11 23:11:55 +00:00
Viktor Barzin
fd4864fd03
Fix frontend build OOM: skip tsc, use vite only, bump memory limits
- Replace `npm run build` (tsc -b && vite build) with `npx vite build`
  in Dockerfile since Vite transpiles via SWC independently of tsc.
  Type-checking is already done in the test step.
- Set Node heap to 1024MB (was 384MB which OOMed even for Vite)
- Bump Drone pod memory: requests 1.5Gi, limits 3Gi to cover
  plugins/docker overhead
2026-02-11 22:59:12 +00:00
Viktor Barzin
f9e4960783
Fix Drone resource spec: use bytes instead of Gi suffix
2026-02-11 22:05:04 +00:00
Viktor Barzin
6492741757
Fix frontend Docker build OOM: limit Node heap + request more memory
- Set NODE_OPTIONS=--max-old-space-size=512 in Dockerfile to cap tsc
  heap usage within constrained CI pods
- Add resource requests (1Gi) and limits (2Gi) to frontend Docker
  build steps in Drone pipeline
2026-02-11 21:50:16 +00:00
Viktor Barzin
34f70b2ba4
Exclude test files from frontend production TypeScript build
Test files use Vitest globals (vi, describe, it, expect) which aren't
available to tsc during production builds. Exclude __tests__ dirs and
*.test.* / *.spec.* files from tsconfig.app.json so tsc -b succeeds.
2026-02-11 21:26:50 +00:00
Viktor Barzin
96bcebdb40
Cache test image to speed up CI backend test step
Add a 'test' stage to Dockerfile that extends runtime-base with the
venv and test dependencies (pytest, fakeredis, etc.) pre-installed.
Drone CI now builds and caches this image as :test, then uses it
directly for running tests — eliminating apt-get and pip install
on every build.
2026-02-10 22:57:42 +00:00
Viktor Barzin
909cb8e04b
Fix async_client fixture to patch engine with in-memory DB
The fixture accepted in_memory_engine but never actually patched
database.engine or api.app.engine, causing tests to hit the real
SQLite path which fails in CI where data/ doesn't exist.
2026-02-10 22:49:34 +00:00
Viktor Barzin
aa21e926bb
Fall back to SQLite default when DB_CONNECTION_STRING is unset
2026-02-10 22:38:52 +00:00
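The fallback amounts to an environment lookup with a default. In this sketch, the default URL (a file under data/) is hypothetical because the commit does not name it. Treating an empty string the same as unset is also an assumption.

```python
import os
from typing import Mapping, Optional

# Hypothetical default; the project's actual SQLite path is not stated here.
DEFAULT_DB_URL = "sqlite:///data/listings.db"


def get_connection_string(env: Optional[Mapping[str, str]] = None) -> str:
    """Return DB_CONNECTION_STRING, or the SQLite default when unset or empty."""
    env = os.environ if env is None else env
    return env.get("DB_CONNECTION_STRING") or DEFAULT_DB_URL
```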
Viktor Barzin
be12bdcb81
Add libgl1 and libglib2.0-0 to CI for OpenCV runtime deps
2026-02-10 22:32:44 +00:00
Viktor Barzin
35cf3fa862
Add pkg-config to CI test step for mysqlclient build
2026-02-10 22:26:38 +00:00
Viktor Barzin
f833309297
Refactor backend for cleaner error handling, DRY, and type safety
- Extract rate limiter DRY: consolidate 3 duplicated check/respond paths
  into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and
  RightmoveApiError; narrow catch clauses to specific exception types;
  fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract
  _find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels,
  guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py
  stale comments and missing type annotation, auth.py type:ignore,
  ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests,
  bypass rate limiter for test isolation, fix cache invalidation test to
  account for two-pattern scan loop
2026-02-10 22:19:24 +00:00
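The csv_exporter change (NaN instead of -1 sentinels, no division by zero on missing square meters) can be illustrated with a price-per-square-metre helper. `price_per_sqm` is a hypothetical name. NaN is treated as missing by tools such as pandas. A -1 sentinel, by contrast, would be averaged and sorted as if it were a real value.

```python
import math
from typing import Optional


def price_per_sqm(price: Optional[float], square_meters: Optional[float]) -> float:
    """Price per square metre, or NaN when price or area is missing.

    A zero or missing area returns NaN instead of raising ZeroDivisionError.
    """
    if price is None or not square_meters:
        return math.nan
    return price / square_meters
```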
Viktor Barzin
6897820cc7
Add test steps to Drone CI pipelines before image builds
Frontend pipeline: runs vitest via node:24-alpine before building.
API pipeline: installs deps and runs pytest via python:3.13-slim before building.
Both steps fail-fast (-x) so broken tests block deployment.
2026-02-10 22:01:28 +00:00
Viktor Barzin
8d22c97320
Add comprehensive test suite: 219 new tests across backend and frontend
Backend (103 tests):
- Unit tests for listing_service, export_service, district_service
- Regression tests for API response contracts and query parameter validation
- Integration tests for API workflows, Redis listing cache, listing processor pipeline, and repository advanced queries
- E2E tests for streaming with filters, batching, caching, and task management

Frontend (116 tests):
- Service tests for apiClient, streamingService, taskService, listingService, healthService
- Hook tests for useTaskProgress (WebSocket + polling)
- Component tests for PropertyCard, FilterPanel, Header, ListView, TaskProgressDrawer, TaskIndicator, StreamingProgressBar, HealthIndicator
- E2E tests for filter-stream-display flow

Infrastructure:
- Add pytest-xdist and test markers (regression, integration, e2e)
- Add conftest fixtures: fake_redis, rent_listing_factory, seeded_repository
- Add vitest + testing-library + MSW for frontend testing
2026-02-10 21:59:45 +00:00
Viktor Barzin
a3ac9cc060
Fix Drone CI variable interpolation in API verify-deploy step
Drone expands ${VAR} as its own variables before the shell runs, so
${BASE_API} and ${DEPLOY} were replaced with empty strings. Use $VAR
(no braces) so the shell handles them instead. Also add fallback for
empty jq output to prevent "sh: out of range" errors.
2026-02-10 21:32:11 +00:00
Viktor Barzin
902f1b0852
Switch task progress to throttled event-driven updates
Replace timer-based _monitor_progress (1s sleep loop) with a
ProgressReporter class that publishes on actual state changes,
throttled to at most 1 publish per 250ms. A background flush
every 2s keeps ETA/elapsed current during quiet periods.

Switch WebSocket forwarder from get_message() polling (1s timeout)
to async pubsub.listen() for instant Redis-to-WebSocket delivery.

Combined latency improvement: ~1.5s average → ~250ms.
2026-02-10 21:24:33 +00:00
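The throttling idea behind ProgressReporter can be sketched as follows: publish on state change at most once per 250ms, and use a separate flush() that a background loop calls every ~2s. The `publish` callable (standing in for the Redis publish) and the injectable `clock` are assumptions, not the project's signatures.

```python
import time
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


class ProgressReporter:
    """Publish progress on state changes, at most once per `min_interval_s` (sketch)."""

    def __init__(
        self,
        publish: Callable[[State], None],
        min_interval_s: float = 0.25,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._publish = publish
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._latest: Optional[State] = None
        self._last_sent_at: Optional[float] = None

    def update(self, state: State) -> None:
        """Record a state change; publish now unless the last publish was too recent.

        A suppressed change is not lost: the next update() or flush() sends it.
        """
        self._latest = state
        now = self._clock()
        if self._last_sent_at is None or now - self._last_sent_at >= self._min_interval_s:
            self._send(now)

    def flush(self) -> None:
        """Re-publish the latest state even if unchanged (called every ~2s in the
        commit so ETA/elapsed stay current; this sketch just re-sends the dict)."""
        if self._latest is not None:
            self._send(self._clock())

    def _send(self, now: float) -> None:
        self._publish(dict(self._latest))
        self._last_sent_at = now
```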
Viktor Barzin
b816f695f0
Replace deployment-level rollout check with pod-level verify-deploy
Check for a pod younger than 60s that is ready and running the exact
expected image tag, instead of polling deployment status fields.
2026-02-09 23:21:07 +00:00
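The pod-level check can be sketched as a predicate over a Pod object as returned by the Kubernetes API. The commit does not say how the pipeline fetches pods, so that part is omitted. The function name and image strings are hypothetical. `status.startTime`, `status.conditions`, and `status.containerStatuses[].image` are standard Pod status fields.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping


def is_fresh_ready_pod(
    pod: Mapping[str, Any],
    expected_image: str,
    now: datetime,
    max_age: timedelta = timedelta(seconds=60),
) -> bool:
    """True if a Pod is younger than `max_age`, has condition Ready=True, and has a
    container running exactly `expected_image`."""
    status = pod.get("status", {})
    start_time = status.get("startTime")
    if not start_time:
        return False
    # K8s timestamps look like "2026-02-09T23:20:30Z"; older Pythons reject the "Z".
    started = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    if now - started > max_age:
        return False
    ready = any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    # containerStatuses[].image may be normalized by the runtime (e.g. a registry
    # prefix added), so compare against the form the cluster actually reports.
    runs_image = any(
        c.get("image") == expected_image for c in status.get("containerStatuses", [])
    )
    return ready and runs_image
```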
Viktor Barzin
dbcd30679a
Add POI task phases to TaskPhase type union
Add 'starting' and 'computing' to TaskPhase so TypeScript accepts
the POI phase comparisons in TaskProgressDrawer.
2026-02-09 23:06:47 +00:00
Viktor Barzin
749e99de93
Add Claude agent definitions [ci skip]
2026-02-09 23:02:30 +00:00
Viktor Barzin
2d86213db5
Refactor task progress to unified useTaskProgress hook
Replace WebSocket-only useTaskWebSocket with useTaskProgress that
provides a unified task state interface. TaskIndicator no longer
manages its own polling or auth — it receives task state from the
parent via props. Rename wsTasks prop to tasks throughout.
2026-02-09 23:02:24 +00:00
Viktor Barzin
3616e678ac
Reduce task polling frequency and raise rate limits to prevent 429s
With 8+ active tasks, polling every 5s generates ~96 task_status
requests/min, exceeding the 60/60s rate limit. Two fixes:

- Adaptive polling: 30s when WebSocket is connected (safety net),
  5s only when WebSocket is down (primary source)
- Raise task_status rate limit to 200/60s and tasks_for_user to
  60/60s to handle burst scenarios (page reloads, WS reconnects)
2026-02-09 22:59:39 +00:00
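The request-rate arithmetic above, assuming one task_status request per active task per poll, can be checked directly: 8 tasks polled every 5s is 8 × 12 = 96 requests/min. Function names are hypothetical.

```python
def requests_per_minute(active_tasks: int, poll_interval_s: float) -> float:
    """task_status requests/min when each active task is polled every poll_interval_s."""
    return active_tasks * 60 / poll_interval_s


def poll_interval_s(ws_connected: bool) -> int:
    """Adaptive interval: 30s safety net with a live WebSocket, 5s without one."""
    return 30 if ws_connected else 5


print(requests_per_minute(8, poll_interval_s(False)))  # WebSocket down
print(requests_per_minute(8, poll_interval_s(True)))   # WebSocket connected
```

With the raised 200/60s limit, even the 5s fallback fits 16 tasks (192 requests/min).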
Viktor Barzin
791b5a9d55
Fix real-time task progress by closing WS on pubsub exit and keeping polling active
Three interconnected bugs prevented progress updates from reaching the frontend:

1. _forward_pubsub could exit silently while _handle_client_messages kept
   the WebSocket alive (responding to pings), so the client never detected
   the broken forwarding path. Replace asyncio.gather with asyncio.wait
   (FIRST_COMPLETED) so both coroutines are cancelled together.

2. Polling was stopped on WS connect with no fallback if forwarding broke.
   Now polling runs always alongside WebSocket as a safety net.

3. Redis publish failures in task_progress_publisher were logged at DEBUG
   and the broken client was reused forever. Log at WARNING and reset the
   client so the next call reconnects.
2026-02-09 22:48:57 +00:00
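Fix 1 can be sketched with plain asyncio. `asyncio.gather` waits for every coroutine, so a forwarder that returns early leaves the ping handler holding the socket open. `asyncio.wait(..., return_when=FIRST_COMPLETED)` returns as soon as one finishes, and the rest can then be cancelled. The coroutines in `demo()` are hypothetical stand-ins for _forward_pubsub and _handle_client_messages.

```python
import asyncio
from typing import Any, Coroutine, List


async def run_until_first_exits(*coros: Coroutine[Any, Any, Any]) -> None:
    """Run coroutines concurrently; as soon as one finishes, cancel the others."""
    tasks = [asyncio.create_task(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let cancellations complete (CancelledError is collected, not raised).
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raise if the first task to finish had failed


async def demo() -> List[str]:
    events: List[str] = []

    async def forward_pubsub() -> None:
        await asyncio.sleep(0.01)  # the forwarding path breaks and returns
        events.append("forwarder exited")

    async def handle_client_messages() -> None:
        try:
            await asyncio.sleep(10)  # would otherwise keep answering pings
        except asyncio.CancelledError:
            events.append("client handler cancelled")
            raise

    await run_until_first_exits(forward_pubsub(), handle_client_messages())
    return events


print(asyncio.run(demo()))
```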
Viktor Barzin
8d52bdf99d
Fix stale task polling by keeping it always active alongside WebSocket
Polling was disabled when wsConnected was true, but if the WS connected
while workers hadn't been redeployed (no pub/sub messages flowing), the
UI received no updates at all. Polling now always runs at 5s as the
baseline. WebSocket provides faster real-time updates on top when
available — the two coexist, last writer wins.
2026-02-09 21:37:07 +00:00
Viktor Barzin
f3cbeb3f5e
Cache Docker builder stages in Drone CI for faster builds
Push intermediate builder stages as :builder tags and use cache_from
to reuse dependency layers (pip install, npm ci) across builds.
2026-02-09 21:35:42 +00:00
Viktor Barzin
8559c4b461
Add real-time WebSocket task progress with multi-job drawer
Replace 5s HTTP polling with WebSocket-based real-time updates for task
progress. Celery workers publish progress to Redis pub/sub channels;
a FastAPI WebSocket endpoint subscribes and forwards to the browser.
Polling is kept as a 30s fallback when WebSocket is unavailable.

The task progress drawer now supports multiple concurrent jobs with a
tab bar for switching between scrape and POI distance tasks.

Backend:
- Add services/task_progress_publisher.py (Redis pub/sub bridge)
- Add api/ws_routes.py (WebSocket endpoint with JWT auth)
- Publish progress from listing_tasks and poi_tasks
- Publish REVOKED via pub/sub on cancel/clear to fix stuck UI

Frontend:
- Add useTaskWebSocket hook with reconnection and keepalive
- Add TaskState and WS message types
- TaskIndicator: WS-driven updates with polling fallback
- TaskProgressDrawer: multi-job tabs, POI phase timeline
- Guard against WS overwriting local cancel state
2026-02-09 21:31:45 +00:00
Viktor Barzin
73d19e29d5
Fix duplicate listings via staged Redis cache and frontend stream cancellation
Three-pronged fix for duplicate listings appearing in the UI:

1. Backend: Replace direct rpush cache writes with staged population
   (write to temp key, then atomic RENAME to live key). Skip cache
   writes entirely for POI-enriched requests. Clean staging keys on
   invalidation.

2. Frontend: Add AbortController to cancel in-flight streaming requests
   when loadListings is called again, preventing data mixing.

3. Frontend: Deduplicate features by URL during stream accumulation as
   a safety net against any remaining server-side duplicates.
2026-02-09 21:17:30 +00:00
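The staged population in fix 1 can be sketched as follows: build the list under a private key, then RENAME it over the live key. The sketch makes these assumptions:

- `r` is a redis-py client; `rpush`, `expire`, and `rename` are its method names.
- The key layout, the uuid suffix, and the TTL are hypothetical.

Redis RENAME is atomic and overwrites an existing destination. Readers therefore see either the old complete list or the new one, never a partially written or doubly appended list.

```python
import uuid
from typing import Any, Iterable


def populate_cache_staged(r: Any, live_key: str, items: Iterable[str], ttl_s: int = 3600) -> None:
    """Fill a staging list, then atomically swap it in as `live_key` (sketch)."""
    batch = list(items)
    if not batch:
        return  # RENAME errors on a missing source key; leave the cache alone
    # A unique suffix keeps concurrent populators from appending to the same
    # list; the ":staging:" segment lets invalidation find leftovers by pattern.
    staging_key = f"{live_key}:staging:{uuid.uuid4().hex}"
    r.rpush(staging_key, *batch)
    r.expire(staging_key, ttl_s)  # an abandoned staging key eventually disappears
    r.rename(staging_key, live_key)  # atomic swap; the old live list is replaced
```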
Viktor Barzin
5b8aa98446
Reformat docker-compose command arrays and update tsconfig buildinfo
2026-02-09 20:52:44 +00:00
Viktor Barzin
f465ccb8b8
Fix alembic migration to rename buylisting.longtitude to longitude
The migration was incorrectly changed to a no-op, leaving the typo
in the buylisting table which caused OperationalError during scraping.
2026-02-09 20:47:30 +00:00
Viktor Barzin
0fb4504646
Fix stale browser cache after redeployment with proper nginx cache headers
index.html is served with Cache-Control: no-cache so the browser always
fetches the latest version with updated asset hashes. Hashed assets under
/assets/ are cached indefinitely since their filenames change on rebuild.

This prevents browsers from serving old cached JS bundles (including the
broken obfuscated build) after a new deployment.
2026-02-08 22:51:11 +00:00
Viktor Barzin
7319f77f1d
Remove JS obfuscator that broke Mapbox GL map rendering
vite-plugin-obfuscator processes ALL output chunks including vendor
libraries, corrupting Mapbox GL's WebGL shader string literals via
base64 encoding and string splitting. This caused the map to render
as a blank screen in production.

Vite's built-in esbuild minification already mangles identifiers and
removes whitespace, providing sufficient code protection.

Adds regression tests to prevent re-introducing obfuscation plugins.
2026-02-08 22:47:01 +00:00
Viktor Barzin
6a1c35946e
Add rollout wait step to Drone CI pipelines
Both frontend and API pipelines now wait for K8s deployments to fully
roll out before marking the build as successful. Polls the K8s API
every 5s for up to 300s, checking observedGeneration, updatedReplicas,
and readyReplicas to confirm the new image is live in production.
2026-02-08 20:28:02 +00:00
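The rollout check can be sketched as a predicate over a Deployment object from the Kubernetes API, plus a poll loop: every 5s for up to 300s, i.e. 60 polls. `fetch` is a hypothetical callable returning the current Deployment JSON. Treating `spec.replicas` as the target count is an assumption.

```python
import time
from typing import Any, Callable, Mapping


def rollout_complete(deployment: Mapping[str, Any]) -> bool:
    """True once the controller has seen the latest spec and all replicas are new and ready."""
    meta = deployment.get("metadata", {})
    spec = deployment.get("spec", {})
    status = deployment.get("status", {})
    desired = spec.get("replicas", 1)
    return (
        status.get("observedGeneration", 0) >= meta.get("generation", 0)
        and status.get("updatedReplicas", 0) == desired
        and status.get("readyReplicas", 0) == desired
    )


def wait_for_rollout(
    fetch: Callable[[], Mapping[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the rollout completes; False if it does not within timeout_s."""
    for _ in range(int(timeout_s // interval_s)):  # 300 / 5 = 60 polls
        if rollout_complete(fetch()):
            return True
        sleep(interval_s)
    return False
```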
Viktor Barzin
5b566bab4c
Fix POI distance calculation reliability for remote/Celery execution
- Fix silent log loss: replace hardcoded "uvicorn.error" logger with __name__
  in osrm_client, otp_client, poi_distance_calculator, and poi_tasks (uvicorn
  logger has no handlers in Celery worker, so all errors were silently dropped)
- Add Celery retry: autoretry_for=(Exception,), max_retries=3, retry_backoff
- Add top-level exception handling in task with full traceback logging
- Fix upsert_distances: replace session.merge() (PK-based) with proper
  dialect-aware INSERT ON DUPLICATE KEY UPDATE / ON CONFLICT DO UPDATE
- Filter out listings with null/zero coordinates before routing
- Raise OSError when all routing engines fail with 0 results computed,
  distinguishing "nothing to compute" from "all engines unreachable"
2026-02-08 20:11:12 +00:00
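The dialect-aware upsert can be illustrated with the SQLite branch (ON CONFLICT DO UPDATE) using the standard library. The table and column names are hypothetical, and so is the surrogate `id` primary key beside a unique natural key. That layout shows why a PK-based `session.merge()` is not enough. The MySQL branch would use INSERT ... ON DUPLICATE KEY UPDATE instead.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical schema: surrogate PK plus a unique natural key.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE poi_distance (
           id INTEGER PRIMARY KEY,
           listing_id INTEGER NOT NULL,
           poi_id INTEGER NOT NULL,
           travel_mode TEXT NOT NULL,
           seconds INTEGER,
           UNIQUE (listing_id, poi_id, travel_mode)
       )"""
)

UPSERT_SQL = """
INSERT INTO poi_distance (listing_id, poi_id, travel_mode, seconds)
VALUES (?, ?, ?, ?)
ON CONFLICT (listing_id, poi_id, travel_mode) DO UPDATE SET seconds = excluded.seconds
"""


def upsert_distances(conn: sqlite3.Connection, rows: Iterable[Tuple[int, int, str, int]]) -> None:
    """Insert new distances; overwrite existing rows that match on the unique key."""
    conn.executemany(UPSERT_SQL, rows)
    conn.commit()
```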