Flatten repo structure: move crawler/ to root, remove vqa/ and immoweb/

The crawler subdirectory was the only active project. Moving it to the
repo root simplifies paths and removes the unnecessary nesting. The
vqa/ and immoweb/ directories were legacy/unused and have been removed.

Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect
the new flat structure.
Viktor Barzin 2026-02-07 23:01:20 +00:00
parent e2247be700
commit eafbc1ac52
221 changed files with 70 additions and 146140 deletions

@@ -2,11 +2,7 @@
## Project Overview
A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI. The repo contains three sub-projects:
- **`crawler/`** — Main application (Python backend + React frontend). Has its own `CLAUDE.md` with detailed architecture docs.
- **`immoweb/`** — Separate scraper (Node.js, legacy/reference).
- **`vqa/`** — Visual QA / testing tooling.
A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI.
## Command Execution
@@ -19,18 +15,18 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
## Quick Reference
| Action | Command | Where |
|--------|---------|-------|
| Start all services | `./start.sh` | `crawler/` |
| Rebuild & start | `./start.sh --build` | `crawler/` |
| Stop services | `./start.sh --down` | `crawler/` |
| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` | `crawler/` |
| Type check | `mypy .` | `crawler/` |
| Format code | `yapf --style .style.yapf --recursive .` | `crawler/` |
| Lint (CI runs Ruff) | `ruff check .` | `crawler/` |
| DB migration | `alembic upgrade head` | `crawler/` |
| New migration | `alembic revision -m "description"` | `crawler/` |
| Frontend dev | `cd frontend && ./start.sh` | `crawler/` |
| Action | Command |
|--------|---------|
| Start all services | `./start.sh` |
| Rebuild & start | `./start.sh --build` |
| Stop services | `./start.sh --down` |
| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` |
| Type check | `mypy .` |
| Format code | `yapf --style .style.yapf --recursive .` |
| Lint (CI runs Ruff) | `ruff check .` |
| DB migration | `alembic upgrade head` |
| New migration | `alembic revision -m "description"` |
| Frontend dev | `cd frontend && ./start.sh` |
## Tech Stack
@@ -49,7 +45,7 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
- **Repository pattern:** `repositories/` for database queries. Don't put raw SQL in services or API.
- **Tests:** pytest with `pytest-asyncio` (auto mode). Unit tests in `tests/unit/`, integration in `tests/integration/`.
## Architecture Layers (in `crawler/`)
## Architecture Layers
```
API (api/app.py) ←→ Services (services/) ←→ Repositories (repositories/)
@@ -75,7 +71,7 @@ API (api/app.py) ←→ Services (services/) ←→ Repositories (repositori
## Environment Variables
See `crawler/.env.sample` for the full list. Key ones:
See `.env.sample` for the full list. Key ones:
- `DB_CONNECTION_STRING` — Database URL
- `CELERY_BROKER_URL` / `CELERY_RESULT_BACKEND` — Redis URLs
@@ -92,4 +88,4 @@ See `crawler/.env.sample` for the full list. Key ones:
## Directories to Ignore
- `node_modules/`, `__pycache__/`, `.idea/`, `crawler/data/`, `venv/`, `_cache/`
- `node_modules/`, `__pycache__/`, `.idea/`, `data/`, `venv/`, `_cache/`

@@ -18,28 +18,28 @@ All commands run locally. Images are pushed to Docker Hub under the `viktorbarzi
| Component | Docker Hub repo | Dockerfile location |
|-----------|------------------------------------|------------------------------|
| API | `viktorbarzin/realestatecrawler` | `crawler/Dockerfile` |
| Frontend | `viktorbarzin/immoweb` | `crawler/frontend/Dockerfile`|
| API | `viktorbarzin/realestatecrawler` | `Dockerfile` |
| Frontend | `viktorbarzin/immoweb` | `frontend/Dockerfile` |
## Building Images
### Build API image
```bash
docker build -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/realestatecrawler:latest .
```
### Build Frontend image
```bash
docker build -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/immoweb:latest frontend/
```
### Build both
```bash
docker build -t viktorbarzin/realestatecrawler:latest crawler/ && \
docker build -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/realestatecrawler:latest . && \
docker build -t viktorbarzin/immoweb:latest frontend/
```
### Build with a specific tag (recommended for production)
@@ -47,8 +47,8 @@ docker build -t viktorbarzin/immoweb:latest crawler/frontend/
```bash
# Use git commit SHA
GIT_SHA=$(git rev-parse --short HEAD)
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
```
## Pushing Images
@@ -87,8 +87,8 @@ docker push viktorbarzin/immoweb:latest
GIT_SHA=$(git rev-parse --short HEAD)
# Build
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
# Push
docker push viktorbarzin/realestatecrawler:${GIT_SHA}

@@ -43,12 +43,12 @@ kubectl rollout restart deployment/realestate-crawler-api deployment/realestate-
GIT_SHA=$(git rev-parse --short HEAD)
# Build and push API
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
docker push viktorbarzin/realestatecrawler:${GIT_SHA}
docker push viktorbarzin/realestatecrawler:latest
# Build and push Frontend
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
docker push viktorbarzin/immoweb:${GIT_SHA}
docker push viktorbarzin/immoweb:latest

@@ -12,7 +12,7 @@ date: 2026-02-06
# Dev Environment Management
Docker Compose orchestrates the dev environment locally from `crawler/`. All project
Docker Compose orchestrates the dev environment locally. All project
commands (pytest, alembic, mypy, python, etc.) must run inside the `app` container
via `docker compose exec app <command>`. Only docker/kubectl commands run on the host.
@@ -20,85 +20,85 @@ via `docker compose exec app <command>`. Only docker/kubectl commands run on the
```bash
# Start all services (Redis, MySQL, API, Celery worker, Celery beat)
cd crawler && docker compose up
docker compose up
# Start in detached mode (background)
cd crawler && docker compose up -d
docker compose up -d
# Rebuild images and start (after Dockerfile or dependency changes)
cd crawler && docker compose up --build
docker compose up --build
# Or use the start.sh helper
cd crawler && ./start.sh # foreground
cd crawler && ./start.sh --build # rebuild first
./start.sh # foreground
./start.sh --build # rebuild first
```
## Stopping the Dev Environment
```bash
cd crawler && docker compose down
docker compose down
# Also remove volumes (fresh database, fresh Redis)
cd crawler && docker compose down -v
docker compose down -v
# Or use the helper
cd crawler && ./start.sh --down
./start.sh --down
```
## Checking Status
```bash
# List running containers
cd crawler && docker compose ps
docker compose ps
# Check health status
cd crawler && docker compose ps --format "table {{.Name}}\t{{.Status}}"
docker compose ps --format "table {{.Name}}\t{{.Status}}"
```
## Viewing Logs
```bash
# Follow all service logs
cd crawler && docker compose logs -f
docker compose logs -f
# Follow specific service logs
cd crawler && docker compose logs -f app
cd crawler && docker compose logs -f celery
cd crawler && docker compose logs -f celery-beat
cd crawler && docker compose logs -f mysql
cd crawler && docker compose logs -f redis
docker compose logs -f app
docker compose logs -f celery
docker compose logs -f celery-beat
docker compose logs -f mysql
docker compose logs -f redis
# Or use the helper
cd crawler && ./start.sh --logs
./start.sh --logs
```
## Restarting Individual Services
```bash
# Restart just the API (e.g., after config change)
cd crawler && docker compose restart app
docker compose restart app
# Restart Celery worker
cd crawler && docker compose restart celery
docker compose restart celery
# Rebuild and restart a single service
cd crawler && docker compose up --build app
docker compose up --build app
```
## Running Database Migrations
```bash
# Apply pending migrations
cd crawler && docker compose exec app alembic upgrade head
docker compose exec app alembic upgrade head
# Create a new migration
cd crawler && docker compose exec app alembic revision -m "description"
docker compose exec app alembic revision -m "description"
```
## Running Tests Inside Container
```bash
cd crawler && docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
```
## Running Any Command Inside Container
@@ -107,14 +107,14 @@ All project commands must be run inside the `app` container:
```bash
# General pattern
cd crawler && docker compose exec app <command>
docker compose exec app <command>
# Examples
cd crawler && docker compose exec app python main.py dump-listings --type rent
cd crawler && docker compose exec app mypy .
cd crawler && docker compose exec app ruff check .
cd crawler && docker compose exec app poetry install
cd crawler && docker compose exec app bash # interactive shell
docker compose exec app python main.py dump-listings --type rent
docker compose exec app mypy .
docker compose exec app ruff check .
docker compose exec app poetry install
docker compose exec app bash # interactive shell
```
## Services and Ports
@@ -130,7 +130,7 @@ cd crawler && docker compose exec app bash # interactive shell
## Environment Variables
Key env vars are set in `docker-compose.yml`. To override locally, create a `.env` file
in `crawler/` (see `.env.sample`). Key overrides:
(see `.env.sample`). Key overrides:
- `ROUTING_API_KEY` - Google Maps API key (passed from host env)
- `SCRAPE_SCHEDULES` - JSON array of periodic scrape configs (passed from host env)

@@ -0,0 +1,101 @@
---
name: python-313-redis-generic-type
description: |
Fix for "TypeError: <class 'redis.client.Redis'> is not a generic class" when using
redis-py with Python 3.13. Use when: (1) upgrading to Python 3.13 breaks redis type
annotations, (2) mypy passes but runtime fails with generic class error, (3) using
redis.Redis[str] or similar parameterized types. Covers redis-py generic type
compatibility with Python 3.13's stricter runtime generic checking.
author: Claude Code
version: 1.0.0
date: 2026-01-31
---
# Python 3.13 redis.Redis Generic Type Error
## Problem
Python 3.13 introduced stricter runtime checking for generic types. The redis-py library's
`Redis` class is not defined as a generic class at runtime, even though it works with type
checkers like mypy. This causes a `TypeError` when you use parameterized types like
`redis.Redis[str]` in type annotations that are evaluated at runtime.
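The failure mode can be reproduced without redis-py installed. A minimal sketch, assuming two hypothetical stand-in classes (`PlainClient` and `GenericClient`, neither of which is part of redis-py): subscripting a class that is not generic at runtime raises `TypeError`, while a class that inherits from `typing.Generic` accepts a type parameter. The exact wording of the error depends on how the class is defined, so the sketch only checks the exception type.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class PlainClient:
    """Hypothetical stand-in for a class that is not generic at runtime."""


class GenericClient(Generic[T]):
    """Hypothetical stand-in for a class that declares a type parameter."""


def can_subscript(cls: type) -> bool:
    """Return True if cls[str] evaluates at runtime, False if it raises TypeError."""
    try:
        cls[str]
    except TypeError:
        return False
    return True


print(can_subscript(PlainClient))    # False
print(can_subscript(GenericClient))  # True
```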
## Context / Trigger Conditions
- Python 3.13 or later
- Using redis-py library
- Type annotation like `redis_client: redis.Redis[str]`
- Error message: `TypeError: <class 'redis.client.Redis'> is not a generic class`
- Works fine with mypy but fails at runtime
- Often appears when instantiating a class with this annotation
## Solution
### Option 1: Remove the type parameter (Recommended)
```python
# Before (breaks in Python 3.13)
redis_client: redis.Redis[str]
# After (works in all Python versions)
redis_client: redis.Redis # type: ignore[type-arg]
```
The `# type: ignore[type-arg]` comment silences mypy's warning about missing type arguments.
### Option 2: Use string annotation (deferred evaluation)
```python
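from __future__ import annotations  # defers evaluation of all annotations in this module
import redis
# Either mechanism alone is enough: the __future__ import or the quoted annotation.
redis_client: "redis.Redis[str]"  # stored as a string, not evaluated at runtime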
```
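Why a string annotation sidesteps the error can be seen with a small sketch. It assumes a hypothetical stand-in class, `PlainClient`, in place of the real client: the quoted annotation is stored as plain text, so the subscript expression is never executed when the class body runs.

```python
class PlainClient:
    """Hypothetical stand-in for a class that cannot be subscripted at runtime."""


class Repository:
    # The quotes keep this expression from being evaluated when the class body runs.
    client: "PlainClient[str]"


stored = Repository.__annotations__["client"]
print(type(stored).__name__)  # str
```

Anything that later resolves the string, such as `typing.get_type_hints`, would still evaluate the subscript and raise the same `TypeError`.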
### Option 3: Use TYPE_CHECKING guard
```python
from typing import TYPE_CHECKING

import redis

if TYPE_CHECKING:
    RedisClient = redis.Redis[str]  # seen only by the type checker
else:
    RedisClient = redis.Redis  # plain class at runtime, never subscripted

redis_client: RedisClient
```
## Verification
1. Run your application with Python 3.13
2. The TypeError should no longer appear
3. Run mypy to ensure type checking still works (may need type: ignore comment)
## Example
### Before (Broken)
```python
import redis


class RedisRepository:
    redis_client: redis.Redis[str]  # TypeError at runtime in Python 3.13

    def __init__(self):
        self.redis_client = redis.Redis(host='localhost', decode_responses=True)
```
### After (Fixed)
```python
import redis


class RedisRepository:
    redis_client: redis.Redis  # type: ignore[type-arg]

    def __init__(self):
        self.redis_client = redis.Redis(host='localhost', decode_responses=True)
```
## Notes
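- The error surfaced here after upgrading to Python 3.13; subscripting a class that is not generic at runtime raises `TypeError` whenever the annotation is actually evaluated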
- The redis-py library may add proper generic support in future versions
- If using `decode_responses=True`, the client returns `str`; otherwise `bytes`
- The `type: ignore` comment is preferable to `Any` as it preserves some type safety
- This issue affects other libraries that aren't properly defined as Generic classes
## References
- [Python 3.13 Release Notes](https://docs.python.org/3.13/whatsnew/3.13.html)
- [redis-py GitHub Issues](https://github.com/redis/redis-py/issues)
- [PEP 585 - Type Hinting Generics In Standard Collections](https://peps.python.org/pep-0585/)

@@ -0,0 +1,132 @@
---
name: python-parentheses-comparison-bug
description: |
Debug Python comparison bug where parentheses around a variable cause unexpected behavior.
Use when: (1) condition always evaluates to False/True unexpectedly, (2) code like
"if (mylist) == 0" never triggers, (3) length check seems to not work, (4) comparison
with list/dict returns unexpected results. Common mistake where a missing len() leaves the
variable itself, wrapped in no-op parentheses, compared instead of its length.
author: Claude Code
version: 1.0.0
date: 2026-01-31
---
# Python Parentheses Comparison Bug
## Problem
A subtle Python bug where a variable wrapped in parentheses is compared directly to a
number, most likely because a `len(...)` call lost its function name. The parentheses
are a no-op: `(mylist) == 0` is identical to `mylist == 0` and compares the list itself
to 0, not its length. Since a list is never equal to an integer, this always returns False.
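The behaviour is easy to confirm in isolation. A minimal sketch, where `emptiness_checks` is a hypothetical helper written only for this illustration: it evaluates the parenthesized comparison, the bare comparison, the length check, and the truthiness check on the same list.

```python
def emptiness_checks(items: list) -> dict:
    """Evaluate four ways of 'checking for empty' on the same list."""
    return {
        "parenthesized": (items) == 0,  # same test as the next line
        "bare": items == 0,             # list vs int: never equal
        "length": len(items) == 0,      # the intended check
        "truthiness": not items,        # idiomatic equivalent
    }


print(emptiness_checks([]))
# {'parenthesized': False, 'bare': False, 'length': True, 'truthiness': True}
print(emptiness_checks([1, 2]))
# {'parenthesized': False, 'bare': False, 'length': False, 'truthiness': False}
```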
## Context / Trigger Conditions
- Condition that should sometimes be True is always False (or vice versa)
- Code pattern like `if (existing_items) == 0:` or `if (result) == expected:`
- The parentheses are valid syntax and have no effect on the comparison; the missing `len()` is what changes the meaning
- Often appears when copying/adapting code or during refactoring
- May pass code review because it "looks" correct
## Solution
### Identify the Bug Pattern
```python
# BUG: Compares list to 0, always False
if (existing_listings) == 0:
    return True

# Also wrong: compares list to integer
if (items) == 5:
    do_something()
```
### Fix: Use len() for Length Comparisons
```python
# CORRECT: Compares length to 0
if len(existing_listings) == 0:
    return True

# Alternative: Use truthiness for empty check
if not existing_listings:
    return True

# CORRECT: Compares length to integer
if len(items) == 5:
    do_something()
```
## Verification
1. Add a debug print before the condition: `print(f"list={existing_listings}, len={len(existing_listings)}")`
2. Verify the condition now evaluates correctly
3. Write a unit test that exercises both branches of the condition
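A unit test covering both branches might look like the sketch below. It is an assumption-laden illustration: `FakeListingRepository` is a hypothetical in-memory stand-in for the real repository, the step class is reduced to the one method under test, and `asyncio.run` is used so the snippet runs without a pytest plugin.

```python
import asyncio


class FakeListingRepository:
    """Hypothetical in-memory stand-in for the real listing repository."""

    def __init__(self, stored_ids):
        self._stored_ids = set(stored_ids)

    async def get_listings(self, only_ids):
        # Return only the requested IDs that are already stored.
        return [i for i in only_ids if i in self._stored_ids]


class FetchListingDetailsStep:
    """Reduced version of the step, containing only the fixed check."""

    def __init__(self, listing_repository):
        self.listing_repository = listing_repository

    async def needs_processing(self, listing_id: int) -> bool:
        existing_listings = await self.listing_repository.get_listings(
            only_ids=[listing_id]
        )
        return not existing_listings


def check_both_branches():
    step = FetchListingDetailsStep(FakeListingRepository(stored_ids=[1]))
    known = asyncio.run(step.needs_processing(1))    # listing already stored
    unknown = asyncio.run(step.needs_processing(2))  # listing not stored yet
    return known, unknown


print(check_both_branches())  # (False, True)
```

With the buggy `(existing_listings) == 0` comparison, the second value would be `False` as well, so the test fails on exactly the branch that never triggered.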
## Example
### Before (Broken)
```python
class FetchListingDetailsStep:
    async def needs_processing(self, listing_id: int) -> bool:
        existing_listings = await self.listing_repository.get_listings(
            only_ids=[listing_id]
        )
        # BUG: This compares the list object to 0, which is always False
        # The parentheses around existing_listings are misleading
        if (existing_listings) == 0:
            return True
        return False
```
### After (Fixed)
```python
class FetchListingDetailsStep:
    async def needs_processing(self, listing_id: int) -> bool:
        existing_listings = await self.listing_repository.get_listings(
            only_ids=[listing_id]
        )
        # CORRECT: Check if list is empty using len()
        if len(existing_listings) == 0:
            return True
        return False
```
### Even Better (Pythonic)
```python
class FetchListingDetailsStep:
    async def needs_processing(self, listing_id: int) -> bool:
        existing_listings = await self.listing_repository.get_listings(
            only_ids=[listing_id]
        )
        # Most Pythonic: Use truthiness
        return not existing_listings
```
## Notes
- Python's truthiness: empty collections are falsy, non-empty are truthy
- This bug is particularly insidious because:
- It's syntactically valid
- It doesn't raise an exception
- The parentheses make it look intentional
- Code review may miss it
- Linters like pylint or flake8 won't catch this specific pattern
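- Type checkers can catch it when the variable's type is known: mypy's `--strict-equality` option (part of `--strict`) reports a non-overlapping equality check when a list is compared to an `int`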
- When debugging, add print statements to verify actual vs expected values
## Prevention
- Prefer `if not mylist:` over `if len(mylist) == 0:`
- Prefer `if mylist:` over `if len(mylist) > 0:`
- Remove unnecessary parentheses around single variables
- Enable mypy's strict mode (`--strict`, which includes `--strict-equality`) to flag comparisons between non-overlapping types
- Write unit tests that exercise both branches of conditions
## Related Patterns
```python
# These are all wrong (comparing object to number):
if (mydict) == 0: # Always False
if (mylist) > 0: # TypeError in Python 3
if (mystring) == 0: # Always False
# These are correct:
if len(mydict) == 0: # True if empty
if not mydict: # True if empty (preferred)
if len(mylist) > 0: # True if non-empty
if mylist: # True if non-empty (preferred)
```
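The difference between the equality and ordering cases above can be checked directly. A small sketch, where `ordering_comparison_raises` is a hypothetical helper introduced only for this demonstration: equality against a number quietly returns `False`, while an ordering comparison raises `TypeError` in Python 3.

```python
from typing import Any


def ordering_comparison_raises(value: Any) -> bool:
    """Return True if `value > 0` raises TypeError."""
    try:
        value > 0
    except TypeError:
        return True
    return False


for sample in ([], {}, ""):
    print(type(sample).__name__, sample == 0, ordering_comparison_raises(sample))
# list False True
# dict False True
# str False True
```

The ordering case at least fails loudly; the equality case is the one that slips through unnoticed.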