Flatten repo structure: move crawler/ to root, remove vqa/ and immoweb/

The crawler subdirectory was the only active project. Moving it to the
repo root simplifies paths and removes the unnecessary nesting. The
vqa/ and immoweb/ directories were legacy/unused and have been removed.

Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect
the new flat structure.
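
The path updates in this commit all follow one rule: drop the leading `crawler/` prefix, with the old project directory itself becoming `.`. A minimal sketch of that mapping, assuming a hypothetical helper named `flatten_path` that is not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): map a pre-flatten path
# to its post-flatten equivalent by dropping the leading "crawler/".
flatten_path() {
  local path="$1"
  case "$path" in
    crawler|crawler/) printf '.\n' ;;                        # old project dir is now the repo root
    crawler/*)        printf '%s\n' "${path#crawler/}" ;;    # strip the prefix once
    *)                printf '%s\n' "$path" ;;               # paths outside crawler/ are unchanged
  esac
}

flatten_path "crawler/frontend/Dockerfile"   # prints: frontend/Dockerfile
flatten_path "crawler/"                      # prints: .
flatten_path ".drone.yml"                    # prints: .drone.yml
```

The same rule accounts for the build-context change from `crawler/` to `.` and for the `.gitignore` entries such as `data/wrongmove.db`.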
Viktor Barzin 2026-02-07 23:01:20 +00:00
parent e2247be700
commit eafbc1ac52
GPG key ID: 0EB088298288D958
221 changed files with 70 additions and 146140 deletions

@@ -2,11 +2,7 @@
## Project Overview
A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI. The repo contains three sub-projects:
- **`crawler/`** — Main application (Python backend + React frontend). Has its own `CLAUDE.md` with detailed architecture docs.
- **`immoweb/`** — Separate scraper (Node.js, legacy/reference).
- **`vqa/`** — Visual QA / testing tooling.
A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI.
## Command Execution
@@ -19,18 +15,18 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
## Quick Reference
| Action | Command | Where |
|--------|---------|-------|
| Start all services | `./start.sh` | `crawler/` |
| Rebuild & start | `./start.sh --build` | `crawler/` |
| Stop services | `./start.sh --down` | `crawler/` |
| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` | `crawler/` |
| Type check | `mypy .` | `crawler/` |
| Format code | `yapf --style .style.yapf --recursive .` | `crawler/` |
| Lint (CI runs Ruff) | `ruff check .` | `crawler/` |
| DB migration | `alembic upgrade head` | `crawler/` |
| New migration | `alembic revision -m "description"` | `crawler/` |
| Frontend dev | `cd frontend && ./start.sh` | `crawler/` |
| Action | Command |
|--------|---------|
| Start all services | `./start.sh` |
| Rebuild & start | `./start.sh --build` |
| Stop services | `./start.sh --down` |
| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` |
| Type check | `mypy .` |
| Format code | `yapf --style .style.yapf --recursive .` |
| Lint (CI runs Ruff) | `ruff check .` |
| DB migration | `alembic upgrade head` |
| New migration | `alembic revision -m "description"` |
| Frontend dev | `cd frontend && ./start.sh` |
## Tech Stack
@@ -49,7 +45,7 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
- **Repository pattern:** `repositories/` for database queries. Don't put raw SQL in services or API.
- **Tests:** pytest with `pytest-asyncio` (auto mode). Unit tests in `tests/unit/`, integration in `tests/integration/`.
## Architecture Layers (in `crawler/`)
## Architecture Layers
```
API (api/app.py) ←→ Services (services/) ←→ Repositories (repositories/)
@@ -75,7 +71,7 @@ API (api/app.py) ←→ Services (services/) ←→ Repositories (repositori
## Environment Variables
See `crawler/.env.sample` for the full list. Key ones:
See `.env.sample` for the full list. Key ones:
- `DB_CONNECTION_STRING` — Database URL
- `CELERY_BROKER_URL` / `CELERY_RESULT_BACKEND` — Redis URLs
@@ -92,4 +88,4 @@ See `crawler/.env.sample` for the full list. Key ones:
## Directories to Ignore
- `node_modules/`, `__pycache__/`, `.idea/`, `crawler/data/`, `venv/`, `_cache/`
- `node_modules/`, `__pycache__/`, `.idea/`, `data/`, `venv/`, `_cache/`

@@ -18,28 +18,28 @@ All commands run locally. Images are pushed to Docker Hub under the `viktorbarzi
| Component | Docker Hub repo | Dockerfile location |
|-----------|------------------------------------|------------------------------|
| API | `viktorbarzin/realestatecrawler` | `crawler/Dockerfile` |
| Frontend | `viktorbarzin/immoweb` | `crawler/frontend/Dockerfile`|
| API | `viktorbarzin/realestatecrawler` | `Dockerfile` |
| Frontend | `viktorbarzin/immoweb` | `frontend/Dockerfile` |
## Building Images
### Build API image
```bash
docker build -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/realestatecrawler:latest .
```
### Build Frontend image
```bash
docker build -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/immoweb:latest frontend/
```
### Build both
```bash
docker build -t viktorbarzin/realestatecrawler:latest crawler/ && \
docker build -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/realestatecrawler:latest . && \
docker build -t viktorbarzin/immoweb:latest frontend/
```
### Build with a specific tag (recommended for production)
@@ -47,8 +47,8 @@ docker build -t viktorbarzin/immoweb:latest crawler/frontend/
```bash
# Use git commit SHA
GIT_SHA=$(git rev-parse --short HEAD)
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
```
## Pushing Images
@@ -87,8 +87,8 @@ docker push viktorbarzin/immoweb:latest
GIT_SHA=$(git rev-parse --short HEAD)
# Build
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
# Push
docker push viktorbarzin/realestatecrawler:${GIT_SHA}

@@ -43,12 +43,12 @@ kubectl rollout restart deployment/realestate-crawler-api deployment/realestate-
GIT_SHA=$(git rev-parse --short HEAD)
# Build and push API
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
docker push viktorbarzin/realestatecrawler:${GIT_SHA}
docker push viktorbarzin/realestatecrawler:latest
# Build and push Frontend
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
docker push viktorbarzin/immoweb:${GIT_SHA}
docker push viktorbarzin/immoweb:latest

@@ -12,7 +12,7 @@ date: 2026-02-06
# Dev Environment Management
Docker Compose orchestrates the dev environment locally from `crawler/`. All project
Docker Compose orchestrates the dev environment locally. All project
commands (pytest, alembic, mypy, python, etc.) must run inside the `app` container
via `docker compose exec app <command>`. Only docker/kubectl commands run on the host.
@@ -20,85 +20,85 @@ via `docker compose exec app <command>`. Only docker/kubectl commands run on the
```bash
# Start all services (Redis, MySQL, API, Celery worker, Celery beat)
cd crawler && docker compose up
docker compose up
# Start in detached mode (background)
cd crawler && docker compose up -d
docker compose up -d
# Rebuild images and start (after Dockerfile or dependency changes)
cd crawler && docker compose up --build
docker compose up --build
# Or use the start.sh helper
cd crawler && ./start.sh # foreground
cd crawler && ./start.sh --build # rebuild first
./start.sh # foreground
./start.sh --build # rebuild first
```
## Stopping the Dev Environment
```bash
cd crawler && docker compose down
docker compose down
# Also remove volumes (fresh database, fresh Redis)
cd crawler && docker compose down -v
docker compose down -v
# Or use the helper
cd crawler && ./start.sh --down
./start.sh --down
```
## Checking Status
```bash
# List running containers
cd crawler && docker compose ps
docker compose ps
# Check health status
cd crawler && docker compose ps --format "table {{.Name}}\t{{.Status}}"
docker compose ps --format "table {{.Name}}\t{{.Status}}"
```
## Viewing Logs
```bash
# Follow all service logs
cd crawler && docker compose logs -f
docker compose logs -f
# Follow specific service logs
cd crawler && docker compose logs -f app
cd crawler && docker compose logs -f celery
cd crawler && docker compose logs -f celery-beat
cd crawler && docker compose logs -f mysql
cd crawler && docker compose logs -f redis
docker compose logs -f app
docker compose logs -f celery
docker compose logs -f celery-beat
docker compose logs -f mysql
docker compose logs -f redis
# Or use the helper
cd crawler && ./start.sh --logs
./start.sh --logs
```
## Restarting Individual Services
```bash
# Restart just the API (e.g., after config change)
cd crawler && docker compose restart app
docker compose restart app
# Restart Celery worker
cd crawler && docker compose restart celery
docker compose restart celery
# Rebuild and restart a single service
cd crawler && docker compose up --build app
docker compose up --build app
```
## Running Database Migrations
```bash
# Apply pending migrations
cd crawler && docker compose exec app alembic upgrade head
docker compose exec app alembic upgrade head
# Create a new migration
cd crawler && docker compose exec app alembic revision -m "description"
docker compose exec app alembic revision -m "description"
```
## Running Tests Inside Container
```bash
cd crawler && docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
```
## Running Any Command Inside Container
@@ -107,14 +107,14 @@ All project commands must be run inside the `app` container:
```bash
# General pattern
cd crawler && docker compose exec app <command>
docker compose exec app <command>
# Examples
cd crawler && docker compose exec app python main.py dump-listings --type rent
cd crawler && docker compose exec app mypy .
cd crawler && docker compose exec app ruff check .
cd crawler && docker compose exec app poetry install
cd crawler && docker compose exec app bash # interactive shell
docker compose exec app python main.py dump-listings --type rent
docker compose exec app mypy .
docker compose exec app ruff check .
docker compose exec app poetry install
docker compose exec app bash # interactive shell
```
## Services and Ports
@@ -130,7 +130,7 @@ cd crawler && docker compose exec app bash # interactive shell
## Environment Variables
Key env vars are set in `docker-compose.yml`. To override locally, create a `.env` file
in `crawler/` (see `.env.sample`). Key overrides:
(see `.env.sample`). Key overrides:
- `ROUTING_API_KEY` - Google Maps API key (passed from host env)
- `SCRAPE_SCHEDULES` - JSON array of periodic scrape configs (passed from host env)

@@ -21,8 +21,8 @@ steps:
password:
from_secret: dockerhub-token
repo: viktorbarzin/immoweb
dockerfile: crawler/frontend/Dockerfile
context: crawler/frontend
dockerfile: frontend/Dockerfile
context: frontend
auto_tag: true
- name: Update deployment
@@ -55,8 +55,8 @@ steps:
password:
from_secret: dockerhub-token
repo: viktorbarzin/realestatecrawler
dockerfile: crawler/Dockerfile
context: crawler/
dockerfile: Dockerfile
context: .
auto_tag: true
cache_from: viktorbarzin/realestatecrawler:latest

.gitignore (14 changed lines)

@@ -7,16 +7,16 @@ sqlite.db
.DS_Store
.ipynb_checkpoints/
**.env
crawler/data/wrongmove.db
crawler/data/
data/wrongmove.db
data/
crawler/frontend/caddy_dev/**
frontend/caddy_dev/**
# Runtime artifacts
crawler/celerybeat-schedule
crawler/celerybeat-schedule-shm
crawler/celerybeat-schedule-wal
celerybeat-schedule
celerybeat-schedule-shm
celerybeat-schedule-wal
*.log
*.ipynb
._*
crawler/create_commits.sh
create_commits.sh

Image file: 1.5 KiB before, 1.5 KiB after

Image file: 4 KiB before, 4 KiB after

Some files were not shown because too many files have changed in this diff.
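
One way to confirm that the updated docs and CI config no longer point at the old layout is to search for the `crawler/` prefix. A rough sketch, assuming a hypothetical `find_stale_refs` helper that is not part of this commit; note that a plain substring match would also flag unrelated names that happen to end in `crawler/`:

```bash
#!/usr/bin/env bash
# Hypothetical check, not part of the commit: print every line under the
# given files or directories that still contains the old "crawler/" prefix.
find_stale_refs() {
  # -r: recurse into directories, -n: show line numbers,
  # -F: match the pattern as a fixed string, --: end of options.
  grep -rnF -- 'crawler/' "$@"
}

# Demo on throwaway files so the sketch runs anywhere.
demo_dir=$(mktemp -d)
printf 'context: crawler/frontend\n' > "$demo_dir/old.yml"
printf 'context: frontend\n' > "$demo_dir/new.yml"

find_stale_refs "$demo_dir"   # reports line 1 of old.yml only
rm -rf "$demo_dir"
```

In the repository itself, the call might look like `find_stale_refs .drone.yml .gitignore .claude/`; `grep` exits non-zero when nothing matches, so the helper can gate a CI step.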