This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
A real estate listing crawler and aggregator that scrapes property listings from Rightmove UK, extracts square meter data from floorplan images using OCR, calculates transit routes, and provides a web UI for browsing listings.
```bash
alembic revision -m "description"  # Create new migration
```
### Code Formatting
```bash
yapf --style .style.yapf --recursive .
```
## Architecture
### Core Data Flow
1. **Scraping** (`rec/query.py`): Fetches listing IDs and details from Rightmove's Android API
2. **Processing** (`listing_processor.py`): Pipeline with steps for fetching details, downloading images, and OCR detection
3. **Storage**: SQLModel/SQLAlchemy with MySQL or SQLite, plus JSON files in `data/rs/<listing_id>/`
4. **API** (`api/app.py`): FastAPI endpoints authenticated via JWT from external Authentik service
5. **Background Tasks** (`tasks/listing_tasks.py`): Celery tasks for async listing processing with Redis broker
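
The processing step above is described as a pipeline of steps. A minimal sketch of that idea follows, assuming each step is a plain function that takes a listing dict and returns an enriched copy. The step bodies, the `floorplan.png` filename, and the `square_meters` key are placeholders for illustration and are not taken from `listing_processor.py`.

```python
from typing import Callable, Dict, List

Listing = Dict[str, object]
Step = Callable[[Listing], Listing]


def fetch_details(listing: Listing) -> Listing:
    # Placeholder: a real step would request listing details over the network.
    return {**listing, "details": {"bedrooms": 2}}


def download_images(listing: Listing) -> Listing:
    # Placeholder: records where a floorplan would be stored (filename is assumed).
    return {**listing, "floorplan_path": f"data/rs/{listing['id']}/floorplan.png"}


def detect_floorplan(listing: Listing) -> Listing:
    # Placeholder: a real step would run OCR on the downloaded floorplan image.
    return {**listing, "square_meters": 54.0}


def run_pipeline(listing: Listing, steps: List[Step]) -> Listing:
    """Apply each step in order, passing the enriched listing to the next step."""
    for step in steps:
        listing = step(listing)
    return listing


result = run_pipeline({"id": 123}, [fetch_details, download_images, detect_floorplan])
```

Keeping each step as an independent function makes it straightforward to run only a subset, for example OCR alone on listings whose images are already downloaded.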
### Key Models
- `models/listing.py`: SQLModel entities (`RentListing`, `BuyListing`) with `QueryParameters` for filtering
- `data_access.py`: **DEPRECATED** - Legacy `Listing` dataclass for filesystem-based data access. Use `models.listing.RentListing` or `models.listing.BuyListing` instead.
### Services Layer (Unified CLI and API)
**IMPORTANT**: The `services/` directory contains unified handler functions that both the CLI and HTTP API use. This ensures consistency and code reuse.
#### High-level services (use these in CLI and API):
- **`listing_service.py`**: Listing operations
  - `get_listings()` - Retrieve listings from database
  - `refresh_listings()` - Fetch new listings from Rightmove (sync or async)
  - `download_images()` - Download floorplan images
  - `detect_floorplans()` - Run OCR on floorplans
  - `calculate_routes()` - Calculate transit routes
- **`export_service.py`**: Export operations
  - `export_to_csv()` - Export listings to CSV file
  - `export_to_geojson()` - Export listings to GeoJSON (file or in-memory)
- **`district_service.py`**: District management
  - `get_all_districts()` - Get district name → region ID mapping
  - `get_district_names()` - Get list of district names
- `listing_fetcher.py`: Fetches listing data from Rightmove API
- `image_fetcher.py`: Downloads floorplan images
- `floorplan_detector.py`: OCR-based square meter detection
- `route_calculator.py`: Calculates transit routes using Google Maps API
- `query_splitter.py`: Intelligent query splitting to maximize data extraction
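
To illustrate the unified-handler idea, here is a minimal sketch in which a CLI entry point and an API-style handler both delegate to one service function. The `max_price` parameter, the in-memory `_FAKE_DB`, and the `cli_main` / `api_get_listings` wrappers are assumptions made for this example; the real `get_listings()` signature in `listing_service.py` may differ.

```python
import argparse
import json
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the database.
_FAKE_DB: List[Dict] = [
    {"id": 1, "price": 1400},
    {"id": 2, "price": 2100},
]


def get_listings(max_price: Optional[int] = None) -> List[Dict]:
    """Service function: the single place where the filtering logic lives."""
    if max_price is None:
        return list(_FAKE_DB)
    return [row for row in _FAKE_DB if row["price"] <= max_price]


def cli_main(argv: List[str]) -> str:
    """CLI wrapper: parses arguments, then delegates to the service."""
    parser = argparse.ArgumentParser(prog="listings")
    parser.add_argument("--max-price", type=int, default=None)
    args = parser.parse_args(argv)
    return json.dumps(get_listings(max_price=args.max_price))


def api_get_listings(query_params: Dict[str, str]) -> Dict:
    """API-style wrapper: converts query parameters, then delegates to the service."""
    raw = query_params.get("max_price")
    max_price = int(raw) if raw is not None else None
    return {"listings": get_listings(max_price=max_price)}
```

Because both wrappers only translate their inputs and call `get_listings()`, the CLI and the API cannot drift apart in how they filter listings.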
### Query Splitting System
Rightmove's API caps search results at ~1,500 listings per query. The query splitting system works around this limitation to fetch **all matching listings**.
#### How it works:
1. **Initial Split**: Queries are split by district and bedroom count
2. **Probe**: Each subquery is probed (minimal API request) to get `totalAvailableResults`
3. **Adaptive Split**: If the result count exceeds the threshold (1,200), the price range is split in half
4. **Recursive Refinement**: Splitting continues until every subquery is under the threshold
5. **Full Fetch**: Each subquery fetches up to 60 pages (1,500 results max)
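
The steps above can be sketched roughly as follows. This is not the code in `query_splitter.py`: the `SubQuery` dataclass, the `split_query` and `initial_split` functions, and the `fake_probe` stand-in (which replaces the real probe request for `totalAvailableResults`) are hypothetical names introduced for illustration. Only the 1,200 threshold comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List

THRESHOLD = 1200  # split threshold from the description above


@dataclass(frozen=True)
class SubQuery:
    """Hypothetical subquery: one district, one bedroom count, one price band."""
    district: str
    bedrooms: int
    min_price: int
    max_price: int  # inclusive upper bound


def initial_split(districts: List[str], bedroom_counts: List[int],
                  min_price: int, max_price: int) -> List[SubQuery]:
    """Step 1: one subquery per (district, bedroom count) pair."""
    return [SubQuery(d, b, min_price, max_price)
            for d in districts for b in bedroom_counts]


def split_query(query: SubQuery, probe: Callable[[SubQuery], int],
                threshold: int = THRESHOLD) -> List[SubQuery]:
    """Steps 2-4: probe, then halve the price band until under the threshold."""
    total = probe(query)
    # Stop when the subquery is small enough or the band cannot be halved further.
    if total <= threshold or query.min_price >= query.max_price:
        return [query]
    mid = (query.min_price + query.max_price) // 2
    lower = SubQuery(query.district, query.bedrooms, query.min_price, mid)
    upper = SubQuery(query.district, query.bedrooms, mid + 1, query.max_price)
    return split_query(lower, probe, threshold) + split_query(upper, probe, threshold)


def fake_probe(query: SubQuery) -> int:
    """Stand-in for the probe request: pretends there is one listing per price unit."""
    return query.max_price - query.min_price + 1


parts = split_query(SubQuery("Camden", 2, 0, 4999), fake_probe)
```

Splitting below the 1,500 cap (at 1,200) leaves headroom for listings that appear between the probe and the full fetch. The guard on `min_price >= max_price` keeps the recursion from looping forever when a single price point alone exceeds the threshold.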