# Real Estate Crawler - Backend Documentation

A property listing aggregator that scrapes Rightmove UK, extracts square meters via OCR, and calculates transit routes.

## Quick Start

```bash
# Docker (recommended) - starts Redis, MySQL, API, and Celery
./start.sh

# Or run locally with Poetry
poetry install
./start.sh --local
```

API available at `http://localhost:5001`

## Dependencies

| Dependency | Purpose |
|------------|---------|
| Python 3.11+ | Runtime |
| Redis | Celery message broker |
| MySQL/SQLite | Database |
| Tesseract OCR | Floorplan text extraction |
| Docker | Containerized deployment |

### Python Packages (key)

- `fastapi` + `uvicorn` - HTTP API
- `celery` - Background tasks
- `sqlmodel` - ORM
- `pytesseract` + `opencv` - OCR
- `aiohttp` - Async HTTP client

## API Endpoints

### Health Check

```bash
curl http://localhost:5001/api/status
# {"status": "OK"}
```

### Get Listings

```bash
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5001/api/listing?limit=10"
```

### Get Listings as GeoJSON

```bash
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5001/api/listing_geojson?listing_type=RENT&min_bedrooms=2&max_price=3000"
```

### Refresh Listings (async)

```bash
curl -X POST -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5001/api/refresh_listings?listing_type=RENT&min_bedrooms=2&max_bedrooms=3&min_price=2000&max_price=4000"
# {"task_id": "abc123", "message": "Task abc123 started"}
```

### Check Task Status

```bash
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5001/api/task_status?task_id=abc123"
# {"task_id": "abc123", "status": "SUCCESS", "result": "..."}
```

### Get Districts

```bash
curl -H "Authorization: Bearer $TOKEN" \
  "http://localhost:5001/api/get_districts"
# {"Westminster": "REGION^93965", "Camden": "REGION^93934", ...}
```

## CLI Commands

```bash
# Fetch listings from Rightmove
python main.py dump-listings -t rent --min-bedrooms 2 --max-price 4000

# Download floorplan images
python main.py dump-images

# Run OCR on floorplans
python main.py detect-floorplan

# Calculate transit routes
python main.py routing -d "10 Downing Street, London" -m TRANSIT -l 10

# Export to GeoJSON
python main.py export-immoweb -O output.geojson -t rent --min-bedrooms 2

# Export to CSV
python main.py export-csv -O output.csv -t rent

# List available districts
python main.py list-districts
```

## Query Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `listing_type` | RENT/BUY | Property type |
| `min_bedrooms` | int | Minimum bedrooms |
| `max_bedrooms` | int | Maximum bedrooms |
| `min_price` | int | Minimum price |
| `max_price` | int | Maximum price |
| `min_sqm` | int | Minimum square meters |
| `district` | string | District name (repeatable) |
| `furnish_types` | string | FURNISHED/UNFURNISHED/PART_FURNISHED |
| `last_seen_days` | int | Only listings seen in last N days |

## Architecture

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│     CLI     │     │  HTTP API   │     │   Celery    │
│  (main.py)  │     │ (api/app.py)│     │   Worker    │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘
                           │
                  ┌────────▼────────┐
                  │    Services     │
                  │ (services/*.py) │
                  └────────┬────────┘
                           │
              ┌────────────┼────────────┐
              │            │            │
       ┌──────▼──────┐ ┌───▼───┐ ┌──────▼──────┐
       │ Repository  │ │ Redis │ │  Rightmove  │
       │   (MySQL)   │ │       │ │     API     │
       └─────────────┘ └───────┘ └─────────────┘
```

## Environment Variables

```bash
# Database
DB_CONNECTION_STRING=mysql://user:pass@localhost:3306/wrongmove

# Redis (Celery)
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# Google Maps (optional, for routing)
ROUTING_API_KEY=your_api_key
```

## Authentication

API endpoints (except `/api/status`) require JWT authentication via Authentik OIDC.
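
The same authenticated calls can be made from Python. The sketch below is illustrative and not part of the project: `build_request` is a hypothetical helper, the token is a placeholder, and it assumes the repeatable `district` parameter is sent as repeated query keys (`district=A&district=B`). It only constructs the request and does not send anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:5001"  # default address from Quick Start


def build_request(path, token, params=None, method="GET"):
    """Build an authenticated request for an API endpoint (nothing is sent)."""
    # Drop unset filters so they are not serialized as the string "None".
    clean = {k: v for k, v in (params or {}).items() if v is not None}
    # doseq=True repeats list values as separate keys (assumed format for `district`).
    query = urlencode(clean, doseq=True)
    url = f"{BASE_URL}{path}" + (f"?{query}" if query else "")
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method=method)


req = build_request(
    "/api/listing_geojson",
    token="example-token",  # placeholder; a real token comes from Authentik
    params={
        "listing_type": "RENT",
        "min_bedrooms": 2,
        "district": ["Westminster", "Camden"],
    },
)
print(req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would perform the call, provided the API is running and the token is valid.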

```bash
# Get token from Authentik, then:
curl -H "Authorization: Bearer $TOKEN" http://localhost:5001/api/listing
```

## Project Structure

```
├── main.py              # CLI entry point
├── api/app.py           # FastAPI application
├── services/            # Business logic (shared by CLI + API)
│   ├── listing_service.py
│   ├── export_service.py
│   ├── district_service.py
│   └── task_service.py
├── repositories/        # Database access
├── models/              # SQLModel entities
├── rec/                 # Core logic (query, OCR, routing)
├── tasks/               # Celery background tasks
└── tests/               # Test suite
```

## Running Tests

```bash
pytest tests/ -v --cov=.
mypy .
```