Add ListingDecision SQLModel entity for tracking user like/dislike
decisions on properties, with unique constraint on (user_id, listing_id,
listing_type) and appropriate indexes.
Add mobile-responsive design with full feature parity:
- Bottom sheet (vaul) with 3 snap points for map+list coexistence
- Swipeable property cards with horizontal scroll-snap
- Hamburger menu with health, tasks, user info
- Full-screen map with repositioned legend (top-left on mobile)
- Filter FAB opening Sheet drawer
- TaskProgressDrawer from bottom on mobile
- All changes gated behind useIsMobile() hook (768px breakpoint)
- Desktop layout completely untouched
New components: MobileBottomSheet, SwipeableCardRow,
PropertyCardCompact, MobileMenu
Also fixes: idempotent longitude migration, React hooks order
- Set memory limit to 2048MiB for the "Run frontend tests" step
(node:24-alpine was OOM killed at the default 1Gi)
- Set memory limit to 2048MiB for the "Build frontend image" kaniko
step (also OOM killed at 1Gi)
- Increase vitest testTimeout to 30s (CI runner is ~75x slower than
local dev; tests were timing out at the default 5s)
Add last_updated timestamp to /api/status endpoint by querying
MAX(last_seen) across both listing tables. Display it in the
HealthIndicator as relative time (e.g. "2h ago") with full
date/time in the tooltip on hover.
Disable the default clone and use a custom clone step with up to 5
retry attempts in both frontend and api pipelines. This prevents
builds from failing due to transient "Could not resolve host" errors.
worker_ready fires in the main process after pool workers are forked,
so init_metrics() never ran in the child processes. Add a
worker_process_init handler to initialize OTel metrics in each worker.
Hook lives in .githooks/pre-push (tracked) and runs pytest inside the
Docker app container. start.sh auto-configures core.hooksPath so new
clones pick it up on first run.
Type-annotated metric variables (e.g. `geojson_cache_operations: Counter`)
don't exist as importable names until init_metrics() runs. Switch all
`from api.metrics import <metric>` to `import api.metrics as m` and
access instruments as attributes at runtime to avoid ImportError.
Structured logging via JsonFormatter replaces uvicorn's default format so
Loki can parse timestamps and fields. 14 business metrics (scrape stats,
throttle events, circuit breaker state, cache hit rate, OCR success rate,
Celery task lifecycle) are defined in a shared metrics module and
instrumented across the scraper pipeline, API, and workers. Celery
workers expose a Prometheus HTTP endpoint on configurable ports.
When a session token expires, API calls return 401 but nothing caught
it — errors were shown as generic dialogs or swallowed. Now both
apiClient and streamingService detect 401 responses and clear auth
state, which causes App.tsx to render the login modal automatically.
plugins/docker uses Docker-in-Docker which consumes significant
ephemeral storage for image layers. On nodes with disk pressure
(k8s-node1), the build pod gets evicted before completing.
Switch to plugins/kaniko which builds OCI images without a Docker
daemon, using significantly less ephemeral storage. Enable kaniko's
built-in layer caching via a dedicated cache repo.
Both frontend and API pipelines had separate "Cache builder stage"
steps that built and pushed intermediate Docker targets purely for
layer caching. Running these alongside the actual build steps doubled
the Docker build work per pipeline, causing pods to be evicted when
the node hit its ephemeral-storage threshold (~20GB).
The "Build image" steps already use cache_from with the existing
:builder tags from previous builds, so the separate cache steps
are redundant for the common case (unchanged package files).
- WebSocket: verify task ownership before allowing subscribe (security)
- POI routes: replace assert with HTTPException for production safety
- cancel_task: return HTTP 404 instead of 200 for missing tasks
- routing_config: add descriptive ValueError for invalid env vars
- POIManager: show error feedback instead of silently swallowing failures
- VisualizationCard: reset POI/travel mode state on metric switch
- Map: clean up heatmap layers/sources on unmount to prevent memory leak
- Update test to expect 404 from cancel_task ownership check
npm ci can also OOM during dependency installation. Move the heap
limit before npm ci so it applies to all Node processes. Bump Drone
pod limits to 4Gi (requests 2Gi) to cover Docker-in-Docker overhead.
- Replace `npm run build` (tsc -b && vite build) with `npx vite build`
in Dockerfile since Vite transpiles via SWC independently of tsc.
Type-checking is already done in the test step.
- Set Node heap to 1024MB (was 384MB which OOMed even for Vite)
- Bump Drone pod memory: requests 1.5Gi, limits 3Gi to cover
plugins/docker overhead
- Set NODE_OPTIONS=--max-old-space-size=512 in Dockerfile to cap tsc
heap usage within constrained CI pods
- Add resource requests (1Gi) and limits (2Gi) to frontend Docker
build steps in Drone pipeline
Test files use Vitest globals (vi, describe, it, expect) which aren't
available to tsc during production builds. Exclude __tests__ dirs and
*.test.* / *.spec.* files from tsconfig.app.json so tsc -b succeeds.
Add a 'test' stage to Dockerfile that extends runtime-base with the
venv and test dependencies (pytest, fakeredis, etc.) pre-installed.
Drone CI now builds and caches this image as :test, then uses it
directly for running tests — eliminating apt-get and pip install
on every build.
The fixture accepted in_memory_engine but never actually patched
database.engine or api.app.engine, causing tests to hit the real
SQLite path which fails in CI where data/ doesn't exist.
- Extract rate limiter DRY: consolidate 3 duplicated check/respond paths
into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and
RightmoveApiError; narrow catch clauses to specific exception types;
fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract
_find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels,
guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py
stale comments and missing type annotation, auth.py type:ignore,
ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests,
bypass rate limiter for test isolation, fix cache invalidation test to
account for two-pattern scan loop
Frontend pipeline: runs vitest via node:24-alpine before building.
API pipeline: installs deps and runs pytest via python:3.13-slim before building.
Both steps fail-fast (-x) so broken tests block deployment.
Drone expands ${VAR} as its own variables before the shell runs, so
${BASE_API} and ${DEPLOY} were replaced with empty strings. Use $VAR
(no braces) so the shell handles them instead. Also add fallback for
empty jq output to prevent "sh: out of range" errors.
Replace timer-based _monitor_progress (1s sleep loop) with a
ProgressReporter class that publishes on actual state changes,
throttled to at most 1 publish per 250ms. A background flush
every 2s keeps ETA/elapsed current during quiet periods.
Switch WebSocket forwarder from get_message() polling (1s timeout)
to async pubsub.listen() for instant Redis-to-WebSocket delivery.
Combined latency improvement: ~1.5s average → ~250ms.
Replace WebSocket-only useTaskWebSocket with useTaskProgress that
provides a unified task state interface. TaskIndicator no longer
manages its own polling or auth — it receives task state from the
parent via props. Rename wsTasks prop to tasks throughout.
With 8+ active tasks, polling every 5s generates ~96 task_status
requests/min, exceeding the 60/60s rate limit. Two fixes:
- Adaptive polling: 30s when WebSocket is connected (safety net),
5s only when WebSocket is down (primary source)
- Raise task_status rate limit to 200/60s and tasks_for_user to
60/60s to handle burst scenarios (page reloads, WS reconnects)
Three interconnected bugs prevented progress updates from reaching the frontend:
1. _forward_pubsub could exit silently while _handle_client_messages kept
the WebSocket alive (responding to pings), so the client never detected
the broken forwarding path. Replace asyncio.gather with asyncio.wait
(FIRST_COMPLETED) so both coroutines are cancelled together.
2. Polling was stopped on WS connect with no fallback if forwarding broke.
Now polling runs always alongside WebSocket as a safety net.
3. Redis publish failures in task_progress_publisher were logged at DEBUG
and the broken client was reused forever. Log at WARNING and reset the
client so the next call reconnects.
Polling was disabled when wsConnected was true, but if the WS connected
while workers hadn't been redeployed (no pub/sub messages flowing), the
UI received no updates at all. Polling now always runs at 5s as the
baseline. WebSocket provides faster real-time updates on top when
available — the two coexist, last writer wins.
Replace 5s HTTP polling with WebSocket-based real-time updates for task
progress. Celery workers publish progress to Redis pub/sub channels;
a FastAPI WebSocket endpoint subscribes and forwards to the browser.
Polling is kept as a 30s fallback when WebSocket is unavailable.
The task progress drawer now supports multiple concurrent jobs with a
tab bar for switching between scrape and POI distance tasks.
Backend:
- Add services/task_progress_publisher.py (Redis pub/sub bridge)
- Add api/ws_routes.py (WebSocket endpoint with JWT auth)
- Publish progress from listing_tasks and poi_tasks
- Publish REVOKED via pub/sub on cancel/clear to fix stuck UI
Frontend:
- Add useTaskWebSocket hook with reconnection and keepalive
- Add TaskState and WS message types
- TaskIndicator: WS-driven updates with polling fallback
- TaskProgressDrawer: multi-job tabs, POI phase timeline
- Guard against WS overwriting local cancel state
Three-pronged fix for duplicate listings appearing in the UI:
1. Backend: Replace direct rpush cache writes with staged population
(write to temp key, then atomic RENAME to live key). Skip cache
writes entirely for POI-enriched requests. Clean staging keys on
invalidation.
2. Frontend: Add AbortController to cancel in-flight streaming requests
when loadListings is called again, preventing data mixing.
3. Frontend: Deduplicate features by URL during stream accumulation as
a safety net against any remaining server-side duplicates.
index.html is served with Cache-Control: no-cache so the browser always
fetches the latest version with updated asset hashes. Hashed assets under
/assets/ are cached indefinitely since their filenames change on rebuild.
This prevents browsers from serving old cached JS bundles (including the
broken obfuscated build) after a new deployment.