Real Estate Crawler - Backend Documentation
A property listing aggregator that scrapes Rightmove UK, extracts floor area in square meters from floorplan images via OCR, and calculates transit routes to a destination.
Quick Start
# Docker (recommended) - starts Redis, MySQL, API, and Celery
./start.sh
# Or run locally with Poetry
poetry install
./start.sh --local
API available at http://localhost:5001
Dependencies
| Dependency | Purpose |
|---|---|
| Python 3.11+ | Runtime |
| Redis | Celery message broker |
| MySQL/SQLite | Database |
| Tesseract OCR | Floorplan text extraction |
| Docker | Containerized deployment |
Python Packages (key)
- fastapi + uvicorn: HTTP API
- celery: Background tasks
- sqlmodel: ORM
- pytesseract + opencv: OCR
- aiohttp: Async HTTP client
API Endpoints
Health Check
curl http://localhost:5001/api/status
# {"status": "OK"}
Get Listings
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:5001/api/listing?limit=10"
Get Listings as GeoJSON
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:5001/api/listing_geojson?listing_type=RENT&min_bedrooms=2&max_price=3000"
Refresh Listings (async)
curl -X POST -H "Authorization: Bearer $TOKEN" \
"http://localhost:5001/api/refresh_listings?listing_type=RENT&min_bedrooms=2&max_bedrooms=3&min_price=2000&max_price=4000"
# {"task_id": "abc123", "message": "Task abc123 started"}
Check Task Status
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:5001/api/task_status?task_id=abc123"
# {"task_id": "abc123", "status": "SUCCESS", "result": "..."}
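Because a refresh runs asynchronously, a client normally polls the task status endpoint until the task finishes. The following is a minimal sketch of that loop, not the project's own client. `fetch_status` is a hypothetical callable that performs the GET against `/api/task_status` and returns the decoded JSON; the terminal state names are Celery's standard ones, and it is an assumption that the API passes them through unchanged in the `status` field.

```python
import time
from typing import Callable

# Celery's standard terminal states; any other state means the task is still in flight.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def wait_for_task(
    fetch_status: Callable[[str], dict],  # hypothetical: GETs /api/task_status for a task id
    task_id: str,
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches a terminal state or the attempts run out."""
    for _ in range(max_attempts):
        payload = fetch_status(task_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Passing the fetch and sleep functions in keeps the loop independent of any particular HTTP client and makes it easy to exercise without a running server.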
Get Districts
curl -H "Authorization: Bearer $TOKEN" \
"http://localhost:5001/api/get_districts"
# {"Westminster": "REGION^93965", "Camden": "REGION^93934", ...}
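The district values in the sample response look like a location type and a numeric id joined by `^`. A small sketch of splitting them apart is shown here; that reading of the format is an assumption based only on the sample output above, and `parse_location_id` is a hypothetical helper, not part of the project.

```python
def parse_location_id(identifier: str) -> tuple[str, int]:
    """Split an identifier such as 'REGION^93965' into its type and numeric id."""
    kind, sep, number = identifier.partition("^")
    if not sep or not kind or not number.isdigit():
        raise ValueError(f"unexpected location identifier: {identifier!r}")
    return kind, int(number)


# Sample taken from the response shown above.
districts = {"Westminster": "REGION^93965", "Camden": "REGION^93934"}
parsed = {name: parse_location_id(value) for name, value in districts.items()}
```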
CLI Commands
# Fetch listings from Rightmove
python main.py dump-listings -t rent --min-bedrooms 2 --max-price 4000
# Download floorplan images
python main.py dump-images
# Run OCR on floorplans
python main.py detect-floorplan
# Calculate transit routes
python main.py routing -d "10 Downing Street, London" -m TRANSIT -l 10
# Export to GeoJSON
python main.py export-immoweb -O output.geojson -t rent --min-bedrooms 2
# Export to CSV
python main.py export-csv -O output.csv -t rent
# List available districts
python main.py list-districts
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| listing_type | RENT/BUY | Property type |
| min_bedrooms | int | Minimum bedrooms |
| max_bedrooms | int | Maximum bedrooms |
| min_price | int | Minimum price |
| max_price | int | Maximum price |
| min_sqm | int | Minimum square meters |
| district | string | District name (repeatable) |
| furnish_types | string | FURNISHED/UNFURNISHED/PART_FURNISHED |
| last_seen_days | int | Only listings seen in last N days |
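To illustrate how these parameters combine into a request URL, here is a short sketch using only the standard library. `build_listing_url` is a hypothetical helper, and it assumes that "repeatable" means the `district` key is repeated in the query string (`district=A&district=B`).

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5001"


def build_listing_url(endpoint: str, **filters) -> str:
    """Build an API URL, skipping filters that are left as None."""
    params = {key: value for key, value in filters.items() if value is not None}
    # doseq=True expands list values into repeated keys: district=A&district=B
    query = urlencode(params, doseq=True)
    url = f"{BASE_URL}/api/{endpoint}"
    return f"{url}?{query}" if query else url


url = build_listing_url(
    "listing_geojson",
    listing_type="RENT",
    min_bedrooms=2,
    max_price=None,  # unset, so it is omitted from the query
    district=["Camden", "Westminster"],
)
```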
Architecture
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ CLI │ │ HTTP API │ │ Celery │
│ (main.py) │ │ (api/app.py)│ │ Worker │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└───────────────────┼───────────────────┘
│
┌────────▼────────┐
│ Services │
│ (services/*.py) │
└────────┬────────┘
│
┌────────────┼────────────┐
│ │ │
┌──────▼──────┐ ┌───▼───┐ ┌──────▼──────┐
│ Repository │ │ Redis │ │ Rightmove │
│ (MySQL) │ │ │ │ API │
└─────────────┘ └───────┘ └─────────────┘
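The diagram shows the CLI, the HTTP API, and the Celery worker all calling into one service layer, which in turn talks to the repository. The sketch below illustrates that layering in miniature. Every name in it (`Listing`, `ListingRepository`, `InMemoryRepository`, `ListingService`) is hypothetical and chosen for illustration; the real classes in `services/` and `repositories/` may look quite different.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Listing:  # hypothetical entity, much smaller than a real model
    id: int
    bedrooms: int
    price: int


class ListingRepository(Protocol):
    """What the service needs from storage, independent of MySQL or SQLite."""

    def all(self) -> list[Listing]: ...


class InMemoryRepository:
    """Stand-in repository so the sketch runs without a database."""

    def __init__(self, listings: list[Listing]) -> None:
        self._listings = list(listings)

    def all(self) -> list[Listing]:
        return list(self._listings)


class ListingService:
    """Business logic that a CLI command, API route, or Celery task could share."""

    def __init__(self, repo: ListingRepository) -> None:
        self._repo = repo

    def search(self, min_bedrooms: int = 0, max_price: Optional[int] = None) -> list[Listing]:
        return [
            listing
            for listing in self._repo.all()
            if listing.bedrooms >= min_bedrooms
            and (max_price is None or listing.price <= max_price)
        ]


service = ListingService(
    InMemoryRepository([Listing(1, 1, 1800), Listing(2, 2, 2900), Listing(3, 3, 4500)])
)
matches = service.search(min_bedrooms=2, max_price=3000)
```

Keeping the filtering in the service means each entry point only has to parse its own input and hand it over.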
Environment Variables
# Database
DB_CONNECTION_STRING=mysql://user:pass@localhost:3306/wrongmove
# Redis (Celery)
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0
# Google Maps (optional, for routing)
ROUTING_API_KEY=your_api_key
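One way to read these variables at startup is sketched below. `Settings` and `load_settings` are hypothetical names, and treating the database and Celery variables as required is an assumption; the project may supply its own defaults. Only `ROUTING_API_KEY` is documented as optional.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed to be required; the documentation only marks ROUTING_API_KEY as optional.
REQUIRED = ("DB_CONNECTION_STRING", "CELERY_BROKER_URL", "CELERY_RESULT_BACKEND")


@dataclass(frozen=True)
class Settings:
    db_connection_string: str
    celery_broker_url: str
    celery_result_backend: str
    routing_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Collect configuration from the environment and fail early if something is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        db_connection_string=env["DB_CONNECTION_STRING"],
        celery_broker_url=env["CELERY_BROKER_URL"],
        celery_result_backend=env["CELERY_RESULT_BACKEND"],
        routing_api_key=env.get("ROUTING_API_KEY") or None,
    )
```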
Authentication
All API endpoints except /api/status require a JWT bearer token issued by Authentik (OIDC).
# Get token from Authentik, then:
curl -H "Authorization: Bearer $TOKEN" http://localhost:5001/api/listing
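The same authenticated call can be made from Python with the standard library alone. This is a sketch under the assumption that a valid token has already been obtained from Authentik; `authorized_request` and `fetch_json` are hypothetical helpers.

```python
import json
import urllib.request


def authorized_request(url: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request that carries the JWT as a bearer token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method=method,
    )


def fetch_json(url: str, token: str, method: str = "GET"):
    """Send the request and decode the JSON body (needs the API to be running)."""
    with urllib.request.urlopen(authorized_request(url, token, method), timeout=10) as resp:
        return json.load(resp)


request = authorized_request("http://localhost:5001/api/listing", "example-token")
```

An expired or missing token will most likely surface as an `urllib.error.HTTPError`, which callers may want to catch and handle by fetching a fresh token.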
Project Structure
├── main.py # CLI entry point
├── api/app.py # FastAPI application
├── services/ # Business logic (shared by CLI + API)
│ ├── listing_service.py
│ ├── export_service.py
│ ├── district_service.py
│ └── task_service.py
├── repositories/ # Database access
├── models/ # SQLModel entities
├── rec/ # Core logic (query, OCR, routing)
├── tasks/ # Celery background tasks
└── tests/ # Test suite
Running Tests
pytest tests/ -v --cov=.
mypy .