- Fix task status IDOR by adding ownership check; suppress traceback/error in production
- Passkey routes: return generic error messages for internal exceptions; keep ValueError messages for user-facing errors
- JWT_SECRET and OIDC_CLIENT_ID: raise RuntimeError in production when using defaults
- Rate limiter: add in-memory fallback counter when Redis is unavailable
- Fix X-Forwarded-For IP spoofing with trusted_proxy_depth (rightmost-N selection)
- Add SecurityHeadersMiddleware (X-Content-Type-Options, X-Frame-Options, CSP, conditional HSTS)
- CORS: add PUT/DELETE methods for POI routes
- POI input validation: field length and coordinate range constraints
- QueryParameters: add min_sqm <= max_sqm validation
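The rightmost-N selection for X-Forwarded-For can be sketched as below. This is a minimal illustration, not the code from this commit: the function name `client_ip`, its parameters, and the fallback to the socket peer address are assumptions; only `trusted_proxy_depth` comes from the change itself.

```python
from typing import Optional


def client_ip(forwarded_for: Optional[str], peer_ip: str, trusted_proxy_depth: int) -> str:
    """Return the client IP, trusting only the last N entries of X-Forwarded-For.

    Each trusted proxy appends the address it received the request from, so
    with N trusted proxies the Nth entry from the right is the first address
    that was not supplied by the client itself.
    """
    if not forwarded_for or trusted_proxy_depth <= 0:
        # No header or no trusted proxies: only the socket peer is reliable.
        return peer_ip
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    if len(hops) < trusted_proxy_depth:
        # Header is shorter than the proxy chain; treat it as untrustworthy.
        return peer_ip
    return hops[-trusted_proxy_depth]


# A client that sends a forged "X-Forwarded-For: 1.2.3.4" through one trusted
# proxy still resolves to the address the proxy observed.
spoofed = client_ip("1.2.3.4, 203.0.113.7", "10.0.0.1", 1)
```

Entries to the left of the selected position are client-controlled and are ignored.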
The distance calculator always queried the rentlisting table regardless of
listing type because get_listings() defaulted to RentListing when called
without query_parameters. Added a listing_type parameter to get_listings()
and _get_model_for_query() so callers can select the correct table directly.
Simplify the filter UI to show only essential filters (type toggle, price/bedroom
range sliders, min size) by default, with advanced filters collapsed. Extract
visualization controls (color-by metric, POI travel mode) into a separate
VisualizationCard component. Wire up previously ignored backend filters: max_sqm,
min/max_price_per_sqm, and district_names now work end-to-end.
Math.round(values.length * 0.95) produces an out-of-bounds index when
the dataset has 10 or fewer features (e.g. after tight travel time
filtering). values[outOfBounds] returns undefined, cascading to NaN
color stops which crash Mapbox's expression evaluator. Clamp both
min and max indices to values.length - 1.
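The clamp can be illustrated with a Python rendering of the same index arithmetic. The frontend code is JavaScript; the helper names, the 0.05 lower fraction, and the empty-input behaviour here are assumptions made for the sketch.

```python
import math


def js_round(x: float) -> int:
    # Mimics JavaScript's Math.round for non-negative inputs (halves round up).
    return math.floor(x + 0.5)


def percentile_bounds(values, low=0.05, high=0.95):
    """Return the values at the clamped low/high percentile indices."""
    if not values:
        raise ValueError("no values to compute bounds from")
    ordered = sorted(values)
    last = len(ordered) - 1
    # Without the min(..., last) clamp, small inputs can yield index == len.
    low_index = min(js_round(len(ordered) * low), last)
    high_index = min(js_round(len(ordered) * high), last)
    return ordered[low_index], ordered[high_index]


small = percentile_bounds([1, 2, 3])
```

With three values the unclamped high index would be 3, one past the end of the array.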
Replace the single global max travel time filter with per-POI filters.
Each POI gets its own travel mode selector and max minutes input in the
filter panel. Listings must satisfy ALL active filters (AND logic).
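A minimal sketch of the AND semantics, with hypothetical data shapes: travel times keyed by (POI id, mode) and filters as (POI id, mode, max minutes) tuples. Treating a missing travel time as a failed filter is an assumption of this sketch, not something the commit states.

```python
def satisfies_all_filters(travel_minutes, filters):
    """Return True only if the listing passes every active per-POI filter.

    travel_minutes: dict mapping (poi_id, mode) -> minutes
    filters: iterable of (poi_id, mode, max_minutes)
    """
    for poi_id, mode, max_minutes in filters:
        minutes = travel_minutes.get((poi_id, mode))
        # Assumption: a listing with no computed time for this POI/mode fails.
        if minutes is None or minutes > max_minutes:
            return False
    return True


listing_times = {(1, "WALK"): 12, (2, "BICYCLE"): 25}
active_filters = [(1, "WALK", 15), (2, "BICYCLE", 20)]
passes = satisfies_all_filters(listing_times, active_filters)
```

Here the listing meets the walking limit but exceeds the cycling limit, so it is excluded.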
Fix Mapbox "Input is not a number" error by ensuring color stops are
always strictly monotonic (guard min === max) and always set (even when
no valid metric values exist). Also filter Infinity values from the
color scale computation. Widen the filter panel from w-64 to w-80.
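The guard can be sketched in Python as follows. The real code lives in the TypeScript frontend; the function name, the three-stop layout, the fallback range, and the +1 widening step are all assumptions chosen for illustration.

```python
import math


def color_stop_values(values, fallback=(0.0, 1.0)):
    """Return three strictly increasing stop values for a color ramp."""
    # Drop None, NaN, and +/-Infinity before computing the range.
    finite = [v for v in values if isinstance(v, (int, float)) and math.isfinite(v)]
    if finite:
        low, high = min(finite), max(finite)
    else:
        # Always produce stops, even when no valid metric values exist.
        low, high = fallback
    if high <= low:
        # Guard min === max: widen the range so stops stay strictly monotonic.
        high = low + 1.0
    return [low, (low + high) / 2, high]


flat = color_stop_values([5, 5, 5])
empty = color_stop_values([float("inf"), float("nan")])
```

Both degenerate inputs still yield increasing stops, which is what an interpolate expression requires.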
Thread onTaskCompleted callback from TaskIndicator through Header to App.tsx
so listings auto-refresh when a background task (e.g. POI distance calculation)
completes. Add AllPOIDistances component to PropertyCard that shows all user
POIs with travel times, or a "—" placeholder for missing modes.
- Update Geofabrik download URL from great-britain to united-kingdom
(old path returns 302 redirect to homepage).
- Switch OSRM Docker volumes from named volume to bind mount
(./osrm-data:/data) so osrm-setup.sh output is used directly.
- Add osrm-data/ to .gitignore (large binaries, regenerated by script).
After creating a POI, automatically trigger WALK and BICYCLE distance
calculations (cheap OSRM batch API). TRANSIT is excluded since it uses
the expensive OTP backend — users trigger it manually via the calculator
button. Failure is non-fatal: the POI is still created and calculation
can be retried manually.
- Fix OSRM client to use semicolons (not commas) for source/destination
indices in /table API requests. Commas caused "Query string malformed"
errors for any batch with more than one origin.
- Add error handling in poi_distance_calculator for unreachable routing
engines (OSRM/OTP). Connection failures now log an error and skip the
mode instead of crashing the entire Celery task.
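For reference, a sketch of how a /table request URL is assembled with semicolon-separated indices. The helper name, the base URL, and the profile name in the usage line are hypothetical; the URL layout follows the OSRM HTTP API, where coordinates are `lon,lat` pairs joined by semicolons.

```python
def build_table_url(base_url, profile, coords, sources, destinations):
    """Build an OSRM /table URL.

    coords: list of (lon, lat) tuples
    sources / destinations: indices into coords
    """
    # Commas separate lon and lat inside one coordinate; semicolons separate
    # list elements, both for coordinates and for index lists.
    coord_part = ";".join(f"{lon},{lat}" for lon, lat in coords)
    source_part = ";".join(str(i) for i in sources)
    destination_part = ";".join(str(i) for i in destinations)
    return (
        f"{base_url}/table/v1/{profile}/{coord_part}"
        f"?sources={source_part}&destinations={destination_part}"
    )


url = build_table_url(
    "http://osrm-foot:5000",  # hypothetical service address
    "foot",
    [(-0.1, 51.5), (-0.2, 51.6), (-0.3, 51.4)],
    sources=[0, 1],
    destinations=[2],
)
```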
Deployments and Services for osrm-foot (256-512MB), osrm-bicycle
(256-512MB), and OTP (1-2GB). Includes PVCs for data storage and an
init Job to download and pre-process Greater London OSM data.
POIManager component in FilterPanel for creating/deleting POIs and
triggering distance calculations. PropertyCard shows travel time badges
(walk/cycle/transit) per POI. Map renders POI locations as red markers.
API client extended with POST body support for POI endpoints.
Adds osrm-foot, osrm-bicycle, and otp services to Docker Compose under
a 'routing' profile (opt-in). Setup scripts download Greater London OSM
data and pre-process for OSRM foot/bicycle profiles, plus TfL GTFS for
OTP transit. Routing engine env vars added to .env.sample.
FastAPI router with CRUD endpoints for POIs, distance calculation
trigger, and distance queries. Streaming GeoJSON endpoint now accepts
include_poi_distances=true to inject travel times into features.
Celery task wraps the distance calculator with progress reporting.
POIRepository handles all database operations for POIs and distances
including upsert, cascading delete, and skip-on-recompute via
get_existing_distance_keys(). POI service provides unified high-level
functions shared by both CLI and API.
Introduces PointOfInterest (per-user named locations with lat/lng) and
POIDistance (travel time/distance per listing+POI+mode triple) SQLModel
entities, plus an Alembic migration to create both tables with indexes
and a composite unique constraint.
Per-user rate limits via Redis sliding window, IP-restricted /metrics
endpoint, audit logging of all requests, CORS tightening, and export
caps on listing/geojson endpoints.
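The sliding-window idea can be shown with an in-memory sketch. This is not the Redis implementation from this commit: the class name, the per-key deque, and the explicit `now` argument are assumptions that keep the example self-contained and deterministic.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window` seconds."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now):
        hits = self._hits[key]
        # Evict timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=2, window_seconds=60)
decisions = [limiter.allow("user-1", t) for t in (0, 1, 2, 61)]
```

The third request is rejected because two requests already sit inside the window; by t=61 both have expired.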
The listing processor was hardcoded to create RentListing objects and
query only the rentlisting table. Buy listings fetched from Rightmove
were stored in the wrong table with missing fields. This threads
ListingType through ListingProcessor and all Step subclasses so the
correct model (RentListing/BuyListing) is created, the correct table
is queried, and buy-specific fields (service_charge, lease_left) are
parsed from the API response and included in GeoJSON streaming output.
Images are now tagged with both :latest and :${DRONE_BUILD_NUMBER}.
The deploy step uses JSON Patch to set the container image to the
specific build number tag, making deployments deterministic and
compatible with Terraform (which should set ignore_changes on the image).
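A sketch of the kind of JSON Patch (RFC 6902) document such a deploy step could send. The function name, the registry path, and the assumption that the target is the first container of a Kubernetes Deployment are illustrative only; the actual patch is defined in .drone.yml.

```python
import json


def image_patch(repository, build_number, container_index=0):
    """Return a JSON Patch that pins a Deployment container to one image tag."""
    return json.dumps([
        {
            "op": "replace",
            # Assumed path: the pod template's container list in a Deployment.
            "path": f"/spec/template/spec/containers/{container_index}/image",
            "value": f"{repository}:{build_number}",
        }
    ])


patch = image_patch("registry.example.com/crawler-api", "128")
```

Pinning to the build-number tag means a rollout always changes the pod spec, which a move of :latest alone would not.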
The crawler subdirectory was the only active project. Moving it to the
repo root simplifies paths and removes the unnecessary nesting. The
vqa/ and immoweb/ directories were legacy/unused and have been removed.
Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect
the new flat structure.
The metric dropdown only updated form state, requiring "Apply Filters"
to take effect — which re-fetched identical data and remounted the Map,
making the result look unchanged. Now the Select's onValueChange
immediately propagates the metric to queryParameters, so the Map
re-renders in-place with the new color scheme, legend, and hex values.
Replace single-layer fade-out/swap/fade-in with a dual-layer (A/B)
cross-fade so old hexagons dissolve while new ones appear simultaneously.
Pans still do a direct data swap on the active layer since hex positions
are origin-anchored and stable.
Fix click/hover handlers to query both layers via queryRenderedFeatures,
and use moveLayer to keep the active layer on top so stale features from
the faded-out layer don't intercept clicks.
- Skip fly-to animation on initial data load (fitBounds duration: 0)
- Replace turf.hexGrid with origin-anchored manual hex generation so
hexagons stay stable across pan/zoom instead of flickering
- Quantize zoom to integers so hex pattern only changes at discrete levels
- Scale search radius with cell size so low-zoom hexagons find their data
- Make hexagons clickable with pointer cursor; each hex shows the exact
listings it aggregated via stored search bounds
- Remove unused SEARCH_BUFFER constant
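Why origin anchoring keeps hexagons stable can be seen in a small planar sketch. It is written in Python rather than the frontend's TypeScript, uses flat-top hexagons in plain x/y units instead of map coordinates, and the function name and layout constants are assumptions.

```python
import math


def hex_centers(min_x, min_y, max_x, max_y, radius):
    """Return centers of flat-top hexagons covering a bounding box.

    Centers depend only on integer (col, row) indices measured from the
    origin, so the same cell appears at the same place for any viewport.
    """
    dx = 1.5 * radius            # horizontal distance between columns
    dy = math.sqrt(3) * radius   # vertical distance between rows
    centers = []
    for col in range(math.floor(min_x / dx), math.ceil(max_x / dx) + 1):
        offset = dy / 2 if col % 2 else 0.0  # odd columns are shifted half a row
        first_row = math.floor((min_y - offset) / dy)
        last_row = math.ceil((max_y - offset) / dy)
        for row in range(first_row, last_row + 1):
            centers.append((col * dx, row * dy + offset))
    return centers


before_pan = set(hex_centers(0, 0, 10, 10, 1.0))
after_pan = set(hex_centers(3, 3, 13, 13, 1.0))
```

Cells inside the region shared by both viewports are identical, whereas a grid anchored at the viewport corner would shift with every pan.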
Split runtime apt-get into a separate stage so BuildKit runs it
concurrently with the builder's apt-get + pip install. Add cache_from
to the Drone CI API build step to reuse layers from the previous image.
Drop transformers, matplotlib, geopy, and diskcache (~560MB) from
dependencies — none are imported in the codebase. Replace single-stage
poetry-based Docker build with a two-stage pip-based build: builder
stage installs into a venv from an exported requirements.txt, runtime
stage copies only the venv and app code. Eliminates poetry from the
image (fixes ARM segfaults) and removes ~200MB of build tools from
the final image.
Enable users to sign up and sign in using passkeys (biometrics/security
keys) without needing a manually-created Authentik account. The existing
SSO login remains as an alternative.
Backend:
- Add WebAuthn registration/authentication endpoints via py-webauthn
- Issue HS256 JWTs for passkey users, with Redis-backed challenge storage
- Dual JWT verification in auth middleware (issuer-based routing: passkey
HS256 vs Authentik RS256)
- PasskeyCredential model + migration making user.password nullable
- UserRepository with full CRUD for users and credentials
Frontend:
- AuthUser type abstraction unifying OIDC and passkey users
- Passkey service using @simplewebauthn/browser for WebAuthn ceremonies
- LoginModal redesigned with Sign In / Sign Up tabs
- Type migration from oidc-client-ts User to AuthUser across all services
and components
Containerize the frontend dev server (Vite) and add a Caddy reverse
proxy for HTTPS termination, replacing the manual local setup. The
Caddy config proxies /api/* to the backend and everything else to the
frontend dev server.
Also simplify start.sh: remove --local Poetry mode, extract
get_compose_cmd helper, and document new services and DEV_HOST env var.
DEV_HOST env var now controls the hostname, defaulting to localhost.
This makes the dev setup work on any machine without requiring a
specific DNS entry.
Replace the sequential fetch-all-then-process pipeline with a streaming
architecture where listing processing starts as soon as IDs become
available from each subquery. A producer task fetches pages and enqueues
new IDs (filtered inline against DB), while 20 consumer workers process
listings concurrently from the queue.
- Add ListingRepository.get_listing_ids() for fast ID-only projection
- Refactor listing_tasks.py: remove get_ids_to_process/dump_listings_and_monitor,
replace with unified producer/worker/monitor pipeline
- Apply same pattern to CLI path in listing_fetcher.py
- Remove 'filtering' phase from frontend, show combined fetch+process metrics
- Add fetching_done flag to TaskResult for phase transition tracking
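The producer/worker shape can be sketched with asyncio. Whether the real pipeline uses asyncio, how pages are fetched, and the names `run_pipeline` and `process` are assumptions; pages are passed in as plain lists here so the example runs on its own.

```python
import asyncio


async def run_pipeline(pages, known_ids, process, workers=20):
    """Stream new IDs from pages into a queue consumed by concurrent workers."""
    queue = asyncio.Queue()
    results = []

    async def producer():
        seen = set(known_ids)  # IDs already in the DB, plus ones enqueued so far
        for page in pages:
            for listing_id in page:
                if listing_id not in seen:
                    seen.add(listing_id)
                    await queue.put(listing_id)
        # One sentinel per worker signals that fetching is done.
        for _ in range(workers):
            await queue.put(None)

    async def worker():
        while True:
            listing_id = await queue.get()
            if listing_id is None:
                break
            results.append(await process(listing_id))

    await asyncio.gather(producer(), *(worker() for _ in range(workers)))
    return results


async def fake_process(listing_id):
    await asyncio.sleep(0)  # stands in for detail/image/OCR steps
    return listing_id * 2


processed = asyncio.run(run_pipeline([[1, 2], [2, 3]], {1}, fake_process, workers=3))
```

Workers start consuming as soon as the first ID is enqueued, instead of waiting for every page to be fetched.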
The previous approach monkey-patched task.update_state on the Celery
Task instance, but the assignment didn't take effect (Celery's Task
singleton may prevent instance method shadowing). Additionally, the
celery.task logger level was left at WARNING by Celery's worker setup,
silencing all INFO-level log capture.
Fix:
- Replace wrapper with _update_task_state() helper that directly injects
logs from a module-level _active_log_buffer into every meta dict
- Attach TaskLogHandler to BOTH celery.task and uvicorn.error loggers
- Force both loggers to INFO level during task execution
- Every task.update_state call site now uses _update_task_state()
- Add phase-aware progress reporting across all crawl phases (splitting,
fetching, filtering, processing) with per-step counters
- Add TaskProgressDrawer component with phase timeline stepper, detail
counters, progress bar with ETA, and live worker log viewer
- Add on_step_complete callback to ListingProcessor for granular tracking
of details/images/OCR steps
- Extend QuerySplitter on_progress callback with structured counter data
- Capture celery worker logs via ring buffer handler and inject into task
state updates for frontend display
- Guard taskResult updates with phase presence check to prevent drawer
from blanking during state transitions
- Remove hard-coded limit=1000 default from listing_geojson and streaming
endpoints, allowing all matching results to be returned
- Add Redis caching service (db=2, 30min TTL) that caches query results
as Redis Lists for fast re-queries with reduced DB load
- Integrate cache into streaming endpoint: serve from cache on hit,
populate cache on miss during DB streaming
- Invalidate cache after scrape completes (both success and no-new-listings)
- Replace ScrollArea with react-virtuoso in ListView for virtual scrolling,
keeping only ~20-30 DOM nodes regardless of list size
- Handle metadata streaming message to show "0 / N" progress from start
- Throttle frontend state updates with requestAnimationFrame to prevent
UI jank from rapid re-renders during cached response streaming
The get_ids_to_process function was using set union instead of set
difference, causing it to return all existing listing IDs along with
new ones. This meant:
1. When there were no new listings, the task would iterate through all
existing listings, find nothing to process for each, and complete
almost instantly
2. The task showed no progress because processing was too fast
Fixed by:
- Changed `all_listing_ids.union(identifiers)` to `identifiers - all_listing_ids`
to only return IDs that are NOT already in the database
- Added explicit check for empty set with informative task state
"No new listings found" so users understand why the task completed quickly
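The difference between the two set operations, shown on toy data. The function signature here is an assumption; only the expression `identifiers - all_listing_ids` is taken from the fix.

```python
def get_ids_to_process(identifiers, all_listing_ids):
    """Return only the fetched IDs that are not already in the database."""
    return set(identifiers) - set(all_listing_ids)


fetched = {"a", "b", "c"}
in_database = {"a", "b"}

buggy = in_database.union(fetched)                 # old behaviour: everything
fixed = get_ids_to_process(fetched, in_database)   # new behaviour: only "c"
nothing_new = get_ids_to_process({"a"}, in_database)
```

An empty result is the case that now reports "No new listings found".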