Both /api/listing_geojson and /api/listing_geojson/stream now exclude
disliked listings by default. A decision_filter='everything' param
bypasses filtering. 2 integration tests verify the behavior.
PUT /api/decisions/{listing_id} to set decision,
GET /api/decisions to list all user decisions,
DELETE /api/decisions/{listing_id} to remove a decision.
All 6 API route tests pass.
Add ListingDecision SQLModel entity for tracking user like/dislike
decisions on properties, with unique constraint on (user_id, listing_id,
listing_type) and appropriate indexes.
Add mobile-responsive design with full feature parity:
- Bottom sheet (vaul) with 3 snap points for map+list coexistence
- Swipeable property cards with horizontal scroll-snap
- Hamburger menu with health, tasks, user info
- Full-screen map with repositioned legend (top-left on mobile)
- Filter FAB opening Sheet drawer
- TaskProgressDrawer from bottom on mobile
- All changes gated behind useIsMobile() hook (768px breakpoint)
- Desktop layout completely untouched
New components: MobileBottomSheet, SwipeableCardRow,
PropertyCardCompact, MobileMenu
Also fixes: idempotent longitude migration, React hooks order
- Set memory limit to 2048MiB for the "Run frontend tests" step
(node:24-alpine was OOM killed at the default 1Gi)
- Set memory limit to 2048MiB for the "Build frontend image" kaniko
step (also OOM killed at 1Gi)
- Increase vitest testTimeout to 30s (CI runner is ~75x slower than
local dev; tests were timing out at the default 5s)
Add last_updated timestamp to /api/status endpoint by querying
MAX(last_seen) across both listing tables. Display it in the
HealthIndicator as relative time (e.g. "2h ago") with full
date/time in the tooltip on hover.
Disable the default clone and use a custom clone step with up to 5
retry attempts in both frontend and api pipelines. This prevents
builds from failing due to transient "Could not resolve host" errors.
worker_ready fires in the main process after pool workers are forked,
so init_metrics() never ran in the child processes. Add a
worker_process_init handler to initialize OTel metrics in each worker.
Hook lives in .githooks/pre-push (tracked) and runs pytest inside the
Docker app container. start.sh auto-configures core.hooksPath so new
clones pick it up on first run.
Type-annotated metric variables (e.g. `geojson_cache_operations: Counter`)
don't exist as importable names until init_metrics() runs. Switch all
`from api.metrics import <metric>` to `import api.metrics as m` and
access instruments as attributes at runtime to avoid ImportError.
Structured logging via JsonFormatter replaces uvicorn's default format so
Loki can parse timestamps and fields. 14 business metrics (scrape stats,
throttle events, circuit breaker state, cache hit rate, OCR success rate,
Celery task lifecycle) are defined in a shared metrics module and
instrumented across the scraper pipeline, API, and workers. Celery
workers expose a Prometheus HTTP endpoint on configurable ports.
When a session token expires, API calls return 401 but nothing caught
it — errors were shown as generic dialogs or swallowed. Now both
apiClient and streamingService detect 401 responses and clear auth
state, which causes App.tsx to render the login modal automatically.
plugins/docker uses Docker-in-Docker which consumes significant
ephemeral storage for image layers. On nodes with disk pressure
(k8s-node1), the build pod gets evicted before completing.
Switch to plugins/kaniko which builds OCI images without a Docker
daemon, using significantly less ephemeral storage. Enable kaniko's
built-in layer caching via a dedicated cache repo.
Both frontend and API pipelines had separate "Cache builder stage"
steps that built and pushed intermediate Docker targets purely for
layer caching. Running these alongside the actual build steps doubled
the Docker build work per pipeline, causing pods to be evicted when
the node hit its ephemeral-storage threshold (~20GB).
The "Build image" steps already use cache_from with the existing
:builder tags from previous builds, so the separate cache steps
are redundant for the common case (unchanged package files).
- WebSocket: verify task ownership before allowing subscribe (security)
- POI routes: replace assert with HTTPException for production safety
- cancel_task: return HTTP 404 instead of 200 for missing tasks
- routing_config: add descriptive ValueError for invalid env vars
- POIManager: show error feedback instead of silently swallowing failures
- VisualizationCard: reset POI/travel mode state on metric switch
- Map: clean up heatmap layers/sources on unmount to prevent memory leak
- Update test to expect 404 from cancel_task ownership check
npm ci can also OOM during dependency installation. Move the heap
limit before npm ci so it applies to all Node processes. Bump Drone
pod limits to 4Gi (requests 2Gi) to cover Docker-in-Docker overhead.
- Replace `npm run build` (tsc -b && vite build) with `npx vite build`
in Dockerfile since Vite transpiles via SWC independently of tsc.
Type-checking is already done in the test step.
- Set Node heap to 1024MB (was 384MB which OOMed even for Vite)
- Bump Drone pod memory: requests 1.5Gi, limits 3Gi to cover
plugins/docker overhead
- Set NODE_OPTIONS=--max-old-space-size=512 in Dockerfile to cap tsc
heap usage within constrained CI pods
- Add resource requests (1Gi) and limits (2Gi) to frontend Docker
build steps in Drone pipeline
Test files use Vitest globals (vi, describe, it, expect) which aren't
available to tsc during production builds. Exclude __tests__ dirs and
*.test.* / *.spec.* files from tsconfig.app.json so tsc -b succeeds.
Add a 'test' stage to Dockerfile that extends runtime-base with the
venv and test dependencies (pytest, fakeredis, etc.) pre-installed.
Drone CI now builds and caches this image as :test, then uses it
directly for running tests — eliminating apt-get and pip install
on every build.
The fixture accepted in_memory_engine but never actually patched
database.engine or api.app.engine, causing tests to hit the real
SQLite path which fails in CI where data/ doesn't exist.
- Extract rate limiter DRY: consolidate 3 duplicated check/respond paths
into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and
RightmoveApiError; narrow catch clauses to specific exception types;
fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract
_find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels,
guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py
stale comments and missing type annotation, auth.py type:ignore,
ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests,
bypass rate limiter for test isolation, fix cache invalidation test to
account for two-pattern scan loop
Frontend pipeline: runs vitest via node:24-alpine before building.
API pipeline: installs deps and runs pytest via python:3.13-slim before building.
Both steps fail-fast (-x) so broken tests block deployment.
Drone expands ${VAR} as its own variables before the shell runs, so
${BASE_API} and ${DEPLOY} were replaced with empty strings. Use $VAR
(no braces) so the shell handles them instead. Also add fallback for
empty jq output to prevent "sh: out of range" errors.
Replace timer-based _monitor_progress (1s sleep loop) with a
ProgressReporter class that publishes on actual state changes,
throttled to at most 1 publish per 250ms. A background flush
every 2s keeps ETA/elapsed current during quiet periods.
Switch WebSocket forwarder from get_message() polling (1s timeout)
to async pubsub.listen() for instant Redis-to-WebSocket delivery.
Combined latency improvement: ~1.5s average → ~250ms.
Replace WebSocket-only useTaskWebSocket with useTaskProgress that
provides a unified task state interface. TaskIndicator no longer
manages its own polling or auth — it receives task state from the
parent via props. Rename wsTasks prop to tasks throughout.
With 8+ active tasks, polling every 5s generates ~96 task_status
requests/min, exceeding the 60/60s rate limit. Two fixes:
- Adaptive polling: 30s when WebSocket is connected (safety net),
5s only when WebSocket is down (primary source)
- Raise task_status rate limit to 200/60s and tasks_for_user to
60/60s to handle burst scenarios (page reloads, WS reconnects)