Flatten repo structure: move crawler/ to root, remove vqa/ and immoweb/
The crawler subdirectory was the only active project. Moving it to the repo root simplifies paths and removes the unnecessary nesting. The vqa/ and immoweb/ directories were legacy/unused and have been removed. Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect the new flat structure.
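The commit does not record which commands produced the move. As a minimal sketch of the same flattening, the snippet below works on a throwaway directory and assumes plain `mv`/`rm`; in the real repository `git mv` and `git rm -r` would be the likelier choice. The `work` variable and the placeholder files `vqa/README` and `immoweb/index.js` are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch layout standing in for the real repository.
work=$(mktemp -d)
mkdir -p "$work/crawler/frontend" "$work/vqa" "$work/immoweb"
touch "$work/crawler/Dockerfile" "$work/crawler/.env.sample" \
      "$work/crawler/frontend/Dockerfile" \
      "$work/vqa/README" "$work/immoweb/index.js"

# Move every entry of crawler/ (dotfiles included) up to the root.
find "$work/crawler" -mindepth 1 -maxdepth 1 -exec mv {} "$work/" \;

# Drop the emptied directory and the two legacy ones.
rmdir "$work/crawler"
rm -rf "$work/vqa" "$work/immoweb"

# Show the flattened layout.
ls -A "$work"
```

Using `find -mindepth 1 -maxdepth 1` rather than a `*` glob picks up dotfiles such as `.env.sample`, which a bare glob would skip.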
parent e2247be700
commit eafbc1ac52
221 changed files with 70 additions and 146140 deletions
@@ -2,11 +2,7 @@
 
 ## Project Overview
 
-A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI. The repo contains three sub-projects:
+A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI.
 
-- **`crawler/`** — Main application (Python backend + React frontend). Has its own `CLAUDE.md` with detailed architecture docs.
-- **`immoweb/`** — Separate scraper (Node.js, legacy/reference).
-- **`vqa/`** — Visual QA / testing tooling.
-
 ## Command Execution
 
@@ -19,18 +15,18 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
 
 ## Quick Reference
 
-| Action | Command | Where |
-|--------|---------|-------|
-| Start all services | `./start.sh` | `crawler/` |
-| Rebuild & start | `./start.sh --build` | `crawler/` |
-| Stop services | `./start.sh --down` | `crawler/` |
-| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` | `crawler/` |
-| Type check | `mypy .` | `crawler/` |
-| Format code | `yapf --style .style.yapf --recursive .` | `crawler/` |
-| Lint (CI runs Ruff) | `ruff check .` | `crawler/` |
-| DB migration | `alembic upgrade head` | `crawler/` |
-| New migration | `alembic revision -m "description"` | `crawler/` |
-| Frontend dev | `cd frontend && ./start.sh` | `crawler/` |
+| Action | Command |
+|--------|---------|
+| Start all services | `./start.sh` |
+| Rebuild & start | `./start.sh --build` |
+| Stop services | `./start.sh --down` |
+| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` |
+| Type check | `mypy .` |
+| Format code | `yapf --style .style.yapf --recursive .` |
+| Lint (CI runs Ruff) | `ruff check .` |
+| DB migration | `alembic upgrade head` |
+| New migration | `alembic revision -m "description"` |
+| Frontend dev | `cd frontend && ./start.sh` |
 
 ## Tech Stack
 
@@ -49,7 +45,7 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
 - **Repository pattern:** `repositories/` for database queries. Don't put raw SQL in services or API.
 - **Tests:** pytest with `pytest-asyncio` (auto mode). Unit tests in `tests/unit/`, integration in `tests/integration/`.
 
-## Architecture Layers (in `crawler/`)
+## Architecture Layers
 
 ```
 API (api/app.py) ←→ Services (services/) ←→ Repositories (repositories/)
@@ -75,7 +71,7 @@ API (api/app.py) ←→ Services (services/) ←→ Repositories (repositori
 
 ## Environment Variables
 
-See `crawler/.env.sample` for the full list. Key ones:
+See `.env.sample` for the full list. Key ones:
 
 - `DB_CONNECTION_STRING` — Database URL
 - `CELERY_BROKER_URL` / `CELERY_RESULT_BACKEND` — Redis URLs
@@ -92,4 +88,4 @@ See `crawler/.env.sample` for the full list. Key ones:
 
 ## Directories to Ignore
 
-- `node_modules/`, `__pycache__/`, `.idea/`, `crawler/data/`, `venv/`, `_cache/`
+- `node_modules/`, `__pycache__/`, `.idea/`, `data/`, `venv/`, `_cache/`
@@ -18,28 +18,28 @@ All commands run locally. Images are pushed to Docker Hub under the `viktorbarzi
 
 | Component | Docker Hub repo | Dockerfile location |
 |-----------|------------------------------------|------------------------------|
-| API | `viktorbarzin/realestatecrawler` | `crawler/Dockerfile` |
-| Frontend | `viktorbarzin/immoweb` | `crawler/frontend/Dockerfile`|
+| API | `viktorbarzin/realestatecrawler` | `Dockerfile` |
+| Frontend | `viktorbarzin/immoweb` | `frontend/Dockerfile` |
 
 ## Building Images
 
 ### Build API image
 
 ```bash
-docker build -t viktorbarzin/realestatecrawler:latest crawler/
+docker build -t viktorbarzin/realestatecrawler:latest .
 ```
 
 ### Build Frontend image
 
 ```bash
-docker build -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/immoweb:latest frontend/
 ```
 
 ### Build both
 
 ```bash
-docker build -t viktorbarzin/realestatecrawler:latest crawler/ && \
-docker build -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/realestatecrawler:latest . && \
+docker build -t viktorbarzin/immoweb:latest frontend/
 ```
 
 ### Build with a specific tag (recommended for production)
@@ -47,8 +47,8 @@ docker build -t viktorbarzin/immoweb:latest crawler/frontend/
 ```bash
 # Use git commit SHA
 GIT_SHA=$(git rev-parse --short HEAD)
-docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
-docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
+docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
 ```
 
 ## Pushing Images
@@ -87,8 +87,8 @@ docker push viktorbarzin/immoweb:latest
 GIT_SHA=$(git rev-parse --short HEAD)
 
 # Build
-docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
-docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
+docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
 
 # Push
 docker push viktorbarzin/realestatecrawler:${GIT_SHA}
@@ -43,12 +43,12 @@ kubectl rollout restart deployment/realestate-crawler-api deployment/realestate-
 GIT_SHA=$(git rev-parse --short HEAD)
 
 # Build and push API
-docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
+docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
 docker push viktorbarzin/realestatecrawler:${GIT_SHA}
 docker push viktorbarzin/realestatecrawler:latest
 
 # Build and push Frontend
-docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
 docker push viktorbarzin/immoweb:${GIT_SHA}
 docker push viktorbarzin/immoweb:latest
 
@@ -12,7 +12,7 @@ date: 2026-02-06
 
 # Dev Environment Management
 
-Docker Compose orchestrates the dev environment locally from `crawler/`. All project
+Docker Compose orchestrates the dev environment locally. All project
 commands (pytest, alembic, mypy, python, etc.) must run inside the `app` container
 via `docker compose exec app <command>`. Only docker/kubectl commands run on the host.
 
@@ -20,85 +20,85 @@ via `docker compose exec app <command>`. Only docker/kubectl commands run on the
 
 ```bash
 # Start all services (Redis, MySQL, API, Celery worker, Celery beat)
-cd crawler && docker compose up
+docker compose up
 
 # Start in detached mode (background)
-cd crawler && docker compose up -d
+docker compose up -d
 
 # Rebuild images and start (after Dockerfile or dependency changes)
-cd crawler && docker compose up --build
+docker compose up --build
 
 # Or use the start.sh helper
-cd crawler && ./start.sh # foreground
-cd crawler && ./start.sh --build # rebuild first
+./start.sh # foreground
+./start.sh --build # rebuild first
 ```
 
 ## Stopping the Dev Environment
 
 ```bash
-cd crawler && docker compose down
+docker compose down
 
 # Also remove volumes (fresh database, fresh Redis)
-cd crawler && docker compose down -v
+docker compose down -v
 
 # Or use the helper
-cd crawler && ./start.sh --down
+./start.sh --down
 ```
 
 ## Checking Status
 
 ```bash
 # List running containers
-cd crawler && docker compose ps
+docker compose ps
 
 # Check health status
-cd crawler && docker compose ps --format "table {{.Name}}\t{{.Status}}"
+docker compose ps --format "table {{.Name}}\t{{.Status}}"
 ```
 
 ## Viewing Logs
 
 ```bash
 # Follow all service logs
-cd crawler && docker compose logs -f
+docker compose logs -f
 
 # Follow specific service logs
-cd crawler && docker compose logs -f app
-cd crawler && docker compose logs -f celery
-cd crawler && docker compose logs -f celery-beat
-cd crawler && docker compose logs -f mysql
-cd crawler && docker compose logs -f redis
+docker compose logs -f app
+docker compose logs -f celery
+docker compose logs -f celery-beat
+docker compose logs -f mysql
+docker compose logs -f redis
 
 # Or use the helper
-cd crawler && ./start.sh --logs
+./start.sh --logs
 ```
 
 ## Restarting Individual Services
 
 ```bash
 # Restart just the API (e.g., after config change)
-cd crawler && docker compose restart app
+docker compose restart app
 
 # Restart Celery worker
-cd crawler && docker compose restart celery
+docker compose restart celery
 
 # Rebuild and restart a single service
-cd crawler && docker compose up --build app
+docker compose up --build app
 ```
 
 ## Running Database Migrations
 
 ```bash
 # Apply pending migrations
-cd crawler && docker compose exec app alembic upgrade head
+docker compose exec app alembic upgrade head
 
 # Create a new migration
-cd crawler && docker compose exec app alembic revision -m "description"
+docker compose exec app alembic revision -m "description"
 ```
 
 ## Running Tests Inside Container
 
 ```bash
-cd crawler && docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
+docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
 ```
 
 ## Running Any Command Inside Container
@@ -107,14 +107,14 @@ All project commands must be run inside the `app` container:
 
 ```bash
 # General pattern
-cd crawler && docker compose exec app <command>
+docker compose exec app <command>
 
 # Examples
-cd crawler && docker compose exec app python main.py dump-listings --type rent
-cd crawler && docker compose exec app mypy .
-cd crawler && docker compose exec app ruff check .
-cd crawler && docker compose exec app poetry install
-cd crawler && docker compose exec app bash # interactive shell
+docker compose exec app python main.py dump-listings --type rent
+docker compose exec app mypy .
+docker compose exec app ruff check .
+docker compose exec app poetry install
+docker compose exec app bash # interactive shell
 ```
 
 ## Services and Ports
@@ -130,7 +130,7 @@ cd crawler && docker compose exec app bash # interactive shell
 ## Environment Variables
 
 Key env vars are set in `docker-compose.yml`. To override locally, create a `.env` file
-in `crawler/` (see `.env.sample`). Key overrides:
+(see `.env.sample`). Key overrides:
 
 - `ROUTING_API_KEY` - Google Maps API key (passed from host env)
 - `SCRAPE_SCHEDULES` - JSON array of periodic scrape configs (passed from host env)
@@ -21,8 +21,8 @@ steps:
 password:
 from_secret: dockerhub-token
 repo: viktorbarzin/immoweb
-dockerfile: crawler/frontend/Dockerfile
-context: crawler/frontend
+dockerfile: frontend/Dockerfile
+context: frontend
 auto_tag: true
 
 - name: Update deployment
@@ -55,8 +55,8 @@ steps:
 password:
 from_secret: dockerhub-token
 repo: viktorbarzin/realestatecrawler
-dockerfile: crawler/Dockerfile
-context: crawler/
+dockerfile: Dockerfile
+context: .
 auto_tag: true
 cache_from: viktorbarzin/realestatecrawler:latest
 
.gitignore (vendored), 14 lines changed
@@ -7,16 +7,16 @@ sqlite.db
 .DS_Store
 .ipynb_checkpoints/
 **.env
-crawler/data/wrongmove.db
-crawler/data/
+data/wrongmove.db
+data/
 
-crawler/frontend/caddy_dev/**
+frontend/caddy_dev/**
 
 # Runtime artifacts
-crawler/celerybeat-schedule
-crawler/celerybeat-schedule-shm
-crawler/celerybeat-schedule-wal
+celerybeat-schedule
+celerybeat-schedule-shm
+celerybeat-schedule-wal
 *.log
 *.ipynb
 ._*
-crawler/create_commits.sh
+create_commits.sh
[Image file: size 1.5 KiB before and after]
[Image file: size 4 KiB before and after]
Some files were not shown because too many files have changed in this diff.