Flatten repo structure: move crawler/ to root, remove vqa/ and immoweb/
The crawler subdirectory was the only active project. Moving it to the repo root simplifies paths and removes the unnecessary nesting. The vqa/ and immoweb/ directories were legacy/unused and have been removed. Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect the new flat structure.
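After a move like this, stale references to the old prefix are easy to miss. The following is a minimal sketch of one way to look for them, assuming a `grep` that supports `-r` and `-H`. The function name `find_stale_refs` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of the repo): list lines that still mention
# the old "crawler/" prefix in the given files or directories.
# Output uses grep's file:line:text format; the exit status is non-zero
# when nothing matches.
find_stale_refs() {
    grep -rnH -- 'crawler/' "$@"
}

# Example: check the files the commit message says were updated.
# find_stale_refs .drone.yml .gitignore .claude
```

The pattern is a plain substring match, so every hit still needs a manual look before it is treated as a leftover path.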
parent e2247be700
commit eafbc1ac52
221 changed files with 70 additions and 146140 deletions
@@ -2,11 +2,7 @@
 
 ## Project Overview
 
-A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI. The repo contains three sub-projects:
-
-- **`crawler/`** — Main application (Python backend + React frontend). Has its own `CLAUDE.md` with detailed architecture docs.
-- **`immoweb/`** — Separate scraper (Node.js, legacy/reference).
-- **`vqa/`** — Visual QA / testing tooling.
+A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI.
 
 ## Command Execution
 
@@ -19,18 +15,18 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
 
 ## Quick Reference
 
-| Action | Command | Where |
-|--------|---------|-------|
-| Start all services | `./start.sh` | `crawler/` |
-| Rebuild & start | `./start.sh --build` | `crawler/` |
-| Stop services | `./start.sh --down` | `crawler/` |
-| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` | `crawler/` |
-| Type check | `mypy .` | `crawler/` |
-| Format code | `yapf --style .style.yapf --recursive .` | `crawler/` |
-| Lint (CI runs Ruff) | `ruff check .` | `crawler/` |
-| DB migration | `alembic upgrade head` | `crawler/` |
-| New migration | `alembic revision -m "description"` | `crawler/` |
-| Frontend dev | `cd frontend && ./start.sh` | `crawler/` |
+| Action | Command |
+|--------|---------|
+| Start all services | `./start.sh` |
+| Rebuild & start | `./start.sh --build` |
+| Stop services | `./start.sh --down` |
+| Run tests | `pytest tests/ -v --cov=. --cov-report=term-missing` |
+| Type check | `mypy .` |
+| Format code | `yapf --style .style.yapf --recursive .` |
+| Lint (CI runs Ruff) | `ruff check .` |
+| DB migration | `alembic upgrade head` |
+| New migration | `alembic revision -m "description"` |
+| Frontend dev | `cd frontend && ./start.sh` |
 
 ## Tech Stack
 
@@ -49,7 +45,7 @@ See `.claude/skills/` for detailed skills on dev environment, building, and depl
 - **Repository pattern:** `repositories/` for database queries. Don't put raw SQL in services or API.
 - **Tests:** pytest with `pytest-asyncio` (auto mode). Unit tests in `tests/unit/`, integration in `tests/integration/`.
 
-## Architecture Layers (in `crawler/`)
+## Architecture Layers
 
 ```
 API (api/app.py) ←→ Services (services/) ←→ Repositories (repositories/)
@@ -75,7 +71,7 @@ API (api/app.py) ←→ Services (services/) ←→ Repositories (repositori
 
 ## Environment Variables
 
-See `crawler/.env.sample` for the full list. Key ones:
+See `.env.sample` for the full list. Key ones:
 
 - `DB_CONNECTION_STRING` — Database URL
 - `CELERY_BROKER_URL` / `CELERY_RESULT_BACKEND` — Redis URLs
@@ -92,4 +88,4 @@ See `crawler/.env.sample` for the full list. Key ones:
 
 ## Directories to Ignore
 
-- `node_modules/`, `__pycache__/`, `.idea/`, `crawler/data/`, `venv/`, `_cache/`
+- `node_modules/`, `__pycache__/`, `.idea/`, `data/`, `venv/`, `_cache/`
@@ -18,28 +18,28 @@ All commands run locally. Images are pushed to Docker Hub under the `viktorbarzi
 
 | Component | Docker Hub repo | Dockerfile location |
 |-----------|------------------------------------|------------------------------|
-| API | `viktorbarzin/realestatecrawler` | `crawler/Dockerfile` |
-| Frontend | `viktorbarzin/immoweb` | `crawler/frontend/Dockerfile`|
+| API | `viktorbarzin/realestatecrawler` | `Dockerfile` |
+| Frontend | `viktorbarzin/immoweb` | `frontend/Dockerfile` |
 
 ## Building Images
 
 ### Build API image
 
 ```bash
-docker build -t viktorbarzin/realestatecrawler:latest crawler/
+docker build -t viktorbarzin/realestatecrawler:latest .
 ```
 
 ### Build Frontend image
 
 ```bash
-docker build -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/immoweb:latest frontend/
 ```
 
 ### Build both
 
 ```bash
-docker build -t viktorbarzin/realestatecrawler:latest crawler/ && \
-docker build -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/realestatecrawler:latest . && \
+docker build -t viktorbarzin/immoweb:latest frontend/
 ```
 
 ### Build with a specific tag (recommended for production)
@@ -47,8 +47,8 @@ docker build -t viktorbarzin/immoweb:latest crawler/frontend/
 ```bash
 # Use git commit SHA
 GIT_SHA=$(git rev-parse --short HEAD)
-docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
-docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
+docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
 ```
 
 ## Pushing Images
@@ -87,8 +87,8 @@ docker push viktorbarzin/immoweb:latest
 GIT_SHA=$(git rev-parse --short HEAD)
 
 # Build
-docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
-docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
+docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
 
 # Push
 docker push viktorbarzin/realestatecrawler:${GIT_SHA}
@@ -43,12 +43,12 @@ kubectl rollout restart deployment/realestate-crawler-api deployment/realestate-
 GIT_SHA=$(git rev-parse --short HEAD)
 
 # Build and push API
-docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
+docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest .
 docker push viktorbarzin/realestatecrawler:${GIT_SHA}
 docker push viktorbarzin/realestatecrawler:latest
 
 # Build and push Frontend
-docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
+docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest frontend/
 docker push viktorbarzin/immoweb:${GIT_SHA}
 docker push viktorbarzin/immoweb:latest
 
@@ -12,7 +12,7 @@ date: 2026-02-06
 
 # Dev Environment Management
 
-Docker Compose orchestrates the dev environment locally from `crawler/`. All project
+Docker Compose orchestrates the dev environment locally. All project
 commands (pytest, alembic, mypy, python, etc.) must run inside the `app` container
 via `docker compose exec app <command>`. Only docker/kubectl commands run on the host.
 
@@ -20,85 +20,85 @@ via `docker compose exec app <command>`. Only docker/kubectl commands run on the
 
 ```bash
 # Start all services (Redis, MySQL, API, Celery worker, Celery beat)
-cd crawler && docker compose up
+docker compose up
 
 # Start in detached mode (background)
-cd crawler && docker compose up -d
+docker compose up -d
 
 # Rebuild images and start (after Dockerfile or dependency changes)
-cd crawler && docker compose up --build
+docker compose up --build
 
 # Or use the start.sh helper
-cd crawler && ./start.sh # foreground
-cd crawler && ./start.sh --build # rebuild first
+./start.sh # foreground
+./start.sh --build # rebuild first
 ```
 
 ## Stopping the Dev Environment
 
 ```bash
-cd crawler && docker compose down
+docker compose down
 
 # Also remove volumes (fresh database, fresh Redis)
-cd crawler && docker compose down -v
+docker compose down -v
 
 # Or use the helper
-cd crawler && ./start.sh --down
+./start.sh --down
 ```
 
 ## Checking Status
 
 ```bash
 # List running containers
-cd crawler && docker compose ps
+docker compose ps
 
 # Check health status
-cd crawler && docker compose ps --format "table {{.Name}}\t{{.Status}}"
+docker compose ps --format "table {{.Name}}\t{{.Status}}"
 ```
 
 ## Viewing Logs
 
 ```bash
 # Follow all service logs
-cd crawler && docker compose logs -f
+docker compose logs -f
 
 # Follow specific service logs
-cd crawler && docker compose logs -f app
-cd crawler && docker compose logs -f celery
-cd crawler && docker compose logs -f celery-beat
-cd crawler && docker compose logs -f mysql
-cd crawler && docker compose logs -f redis
+docker compose logs -f app
+docker compose logs -f celery
+docker compose logs -f celery-beat
+docker compose logs -f mysql
+docker compose logs -f redis
 
 # Or use the helper
-cd crawler && ./start.sh --logs
+./start.sh --logs
 ```
 
 ## Restarting Individual Services
 
 ```bash
 # Restart just the API (e.g., after config change)
-cd crawler && docker compose restart app
+docker compose restart app
 
 # Restart Celery worker
-cd crawler && docker compose restart celery
+docker compose restart celery
 
 # Rebuild and restart a single service
-cd crawler && docker compose up --build app
+docker compose up --build app
 ```
 
 ## Running Database Migrations
 
 ```bash
 # Apply pending migrations
-cd crawler && docker compose exec app alembic upgrade head
+docker compose exec app alembic upgrade head
 
 # Create a new migration
-cd crawler && docker compose exec app alembic revision -m "description"
+docker compose exec app alembic revision -m "description"
 ```
 
 ## Running Tests Inside Container
 
 ```bash
-cd crawler && docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
+docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
 ```
 
 ## Running Any Command Inside Container
@@ -107,14 +107,14 @@ All project commands must be run inside the `app` container:
 
 ```bash
 # General pattern
-cd crawler && docker compose exec app <command>
+docker compose exec app <command>
 
 # Examples
-cd crawler && docker compose exec app python main.py dump-listings --type rent
-cd crawler && docker compose exec app mypy .
-cd crawler && docker compose exec app ruff check .
-cd crawler && docker compose exec app poetry install
-cd crawler && docker compose exec app bash # interactive shell
+docker compose exec app python main.py dump-listings --type rent
+docker compose exec app mypy .
+docker compose exec app ruff check .
+docker compose exec app poetry install
+docker compose exec app bash # interactive shell
 ```
 
 ## Services and Ports
@@ -130,7 +130,7 @@ cd crawler && docker compose exec app bash # interactive shell
 ## Environment Variables
 
 Key env vars are set in `docker-compose.yml`. To override locally, create a `.env` file
-in `crawler/` (see `.env.sample`). Key overrides:
+(see `.env.sample`). Key overrides:
 
 - `ROUTING_API_KEY` - Google Maps API key (passed from host env)
 - `SCRAPE_SCHEDULES` - JSON array of periodic scrape configs (passed from host env)
@@ -21,8 +21,8 @@ steps:
 password:
   from_secret: dockerhub-token
 repo: viktorbarzin/immoweb
-dockerfile: crawler/frontend/Dockerfile
-context: crawler/frontend
+dockerfile: frontend/Dockerfile
+context: frontend
 auto_tag: true
 
 - name: Update deployment
@@ -55,8 +55,8 @@ steps:
 password:
   from_secret: dockerhub-token
 repo: viktorbarzin/realestatecrawler
-dockerfile: crawler/Dockerfile
-context: crawler/
+dockerfile: Dockerfile
+context: .
 auto_tag: true
 cache_from: viktorbarzin/realestatecrawler:latest
 
.gitignore (14 changed lines, vendored)
@@ -7,16 +7,16 @@ sqlite.db
 .DS_Store
 .ipynb_checkpoints/
 **.env
-crawler/data/wrongmove.db
-crawler/data/
+data/wrongmove.db
+data/
 
-crawler/frontend/caddy_dev/**
+frontend/caddy_dev/**
 
 # Runtime artifacts
-crawler/celerybeat-schedule
-crawler/celerybeat-schedule-shm
-crawler/celerybeat-schedule-wal
+celerybeat-schedule
+celerybeat-schedule-shm
+celerybeat-schedule-wal
 *.log
 *.ipynb
 ._*
-crawler/create_commits.sh
+create_commits.sh
Binary image: size 1.5 KiB before, 1.5 KiB after
Binary image: size 4 KiB before, 4 KiB after
Some files were not shown because too many files have changed in this diff.