wrongmove/.claude/CLAUDE.md
Viktor Barzin c2acbf5d2e CI: migrate Docker build/push from Woodpecker to GitHub Actions
Was: Woodpecker built+pushed to DockerHub, then `kubectl set image` patched
the four Deployments to a pinned numeric tag. With Deployments pinned to
:51 (immutable tag), Keel polled forever and never saw a digest bump — and
no DockerHub pull-secret meant Keel hit 401 on the private repo at every
poll. The 4-Deployment setup also had a latent ImagePullBackOff risk: if a
node was replaced, fresh pulls would fail.

Now: GHA builds+pushes (.github/workflows/build-{api,frontend}.yml) on push
to master. Cluster Deployments reference :latest with an imagePullSecret
sourced from Vault via ESO (codified in infra/stacks/real-estate-crawler/
main.tf, separate commit). Keel polls :latest, sees the new digest after
each GHA build, and rolls all four Deployments.

- .github/workflows/build-api.yml: pytest (unit + integration/regression/
  e2e/test_listing_geojson) + buildx push viktorbarzin/realestatecrawler
  to {<8-char-sha>, latest}.
- .github/workflows/build-frontend.yml: vitest (all 4 ex-shards in one
  run) + Vite build with VITE_MAPBOX_TOKEN from GHA secret + buildx push
  viktorbarzin/immoweb to {<8-char-sha>, latest}.
- .woodpecker/{api,frontend}.yml renamed to
  .woodpecker/build-fallback-{api,frontend}.yml with `event: deployment`
  so they no longer fire on push — kept as manual-only fallback if GHA
  is down (CLAUDE.md convention from the 10 already-migrated projects).
- .claude/CLAUDE.md: Git Workflow section updated to reflect GHA as
  primary + the dockerhub-pull-secret wiring.

GHA repo secrets DOCKERHUB_TOKEN and MAPBOX_TOKEN populated from Vault
fields viktor.dockerhub_registry_password and ci/global.wrongmove-mapbox-token
respectively (DOCKERHUB_USERNAME=viktorbarzin was already set).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 19:11:31 +00:00

Realestate Crawler

Project Overview

A real estate listing aggregation platform that scrapes Rightmove UK listings, extracts square meter data from floorplan images via OCR, calculates transit routes, and serves an interactive map-based web UI.

Command Execution

All project commands run inside Docker containers; only infrastructure commands run on the host. The dev environment uses Docker Compose: start it first, then exec into the app container to run anything project-related.

  • Infrastructure commands (docker compose, kubectl) — Run locally on the Mac
  • All project commands (pytest, poetry, alembic, python, mypy, ruff, etc.) — Run inside the app container via docker compose exec app <command>

See .claude/skills/ for detailed skills on dev environment, building, and deploying.

Quick Reference

| Action | Command |
| --- | --- |
| Start all services | ./start.sh |
| Rebuild & start | ./start.sh --build |
| Stop services | ./start.sh --down |
| Run tests | pytest tests/ -v --cov=. --cov-report=term-missing |
| Type check | mypy . |
| Format code | yapf --style .style.yapf --recursive . |
| Lint (CI runs Ruff) | ruff check . |
| DB migration | alembic upgrade head |
| New migration | alembic revision -m "description" |
| Frontend dev | cd frontend && ./start.sh |

Tech Stack

  • Backend: Python 3.13, FastAPI, SQLModel (SQLAlchemy 2), Celery + Redis, pytesseract/OpenCV
  • Frontend: React 19, TypeScript, Vite, Tailwind CSS, Radix UI, Mapbox GL
  • Database: MySQL 9 (prod) / SQLite (local dev), Alembic migrations
  • Infrastructure: Docker Compose (dev), Kubernetes (prod), GitHub Actions CI (Woodpecker kept as a manual-only fallback)

Code Conventions

  • Type checking: Strict mypy (disallow_untyped_defs=true). All functions need type annotations.
  • Formatting: YAPF (.style.yapf config) + Ruff linter (CI on PRs).
  • Async: Scraper layer uses async/await throughout with aiohttp.
  • Models: SQLModel for DB entities, Pydantic for request/response validation.
  • Service layer: services/ contains unified handlers used by both CLI and API — add new business logic here, not in API routes or CLI commands directly.
  • Repository pattern: repositories/ for database queries. Don't put raw SQL in services or API.
  • Tests: pytest with pytest-asyncio (auto mode). Unit tests in tests/unit/, integration in tests/integration/.
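The service and repository conventions above can be illustrated with a minimal sketch. It uses only the standard library so it runs on its own; the `Listing` fields, `ListingRepository`, `InMemoryListingRepository`, and `ListingService` are hypothetical names chosen for illustration, not classes from this codebase, and a real repository would query the database through SQLModel.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Listing:
    """Stand-in for a SQLModel entity such as RentListing (fields are hypothetical)."""

    id: int
    price: int
    bedrooms: int


class ListingRepository(Protocol):
    """Query interface; a real implementation would live in repositories/."""

    def find_by_max_price(self, max_price: int) -> list[Listing]: ...


class InMemoryListingRepository:
    """Test double standing in for a database-backed repository."""

    def __init__(self, listings: list[Listing]) -> None:
        self._listings = listings

    def find_by_max_price(self, max_price: int) -> list[Listing]:
        return [item for item in self._listings if item.price <= max_price]


class ListingService:
    """Business logic shared by CLI and API callers; it contains no queries."""

    def __init__(self, repository: ListingRepository) -> None:
        self._repository = repository

    def cheapest_per_bedroom_count(self, max_price: int) -> dict[int, Listing]:
        # Keep the lowest-priced listing seen so far for each bedroom count.
        cheapest: dict[int, Listing] = {}
        for listing in self._repository.find_by_max_price(max_price):
            current = cheapest.get(listing.bedrooms)
            if current is None or listing.price < current.price:
                cheapest[listing.bedrooms] = listing
        return cheapest
```

Every function is fully annotated, which is what strict mypy with disallow_untyped_defs expects. Because the service depends on the `Protocol` rather than on a concrete class, unit tests can pass an in-memory repository while the API and CLI pass a database-backed one.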

Architecture Layers

API (api/app.py)  ←→  Services (services/)  ←→  Repositories (repositories/)
     ↑                      ↑                           ↑
  Auth (OIDC)         Core Logic (rec/)            SQLModel (models/)
                           ↑
                    Celery Tasks (tasks/)
  • rec/ — Core scraping, OCR, routing logic. Contains circuit breaker and throttle detection.
  • services/ — Orchestration. Query splitting, listing fetching, caching.
  • api/ — FastAPI routes, auth middleware, metrics.
  • models/ — RentListing, BuyListing SQLModel entities.
  • tasks/ — Celery background tasks with Redis broker.
  • config/ — Env-var-based configuration (scraper settings, schedules).
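As a rough illustration of the circuit breaker idea mentioned for rec/, here is a generic sketch of the pattern. It is not the code in rec/circuit_breaker.py: the class name, the thresholds, and the injected `clock` parameter are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Generic circuit breaker; thresholds and names are illustrative only."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after_s:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single failure during the trial re-opens the circuit.
            self._opened_at = None
            self._failures = self._failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit fully.
        self._failures = 0
        return result
```

Injecting the clock keeps the breaker testable without sleeping. In a scraper, `fn` would wrap the outbound request, so repeated throttling responses stop further traffic until the cool-down passes.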

Key Design Decisions

  • Query splitting works around Rightmove's ~1500-result API cap by adaptively splitting queries by district, bedrooms, and price bands (binary search). See services/query_splitter.py.
  • Circuit breaker (rec/circuit_breaker.py) and throttle detection (rec/throttle_detector.py) protect against rate limiting.
  • Streaming API uses NDJSON for progressive loading of large result sets.
  • Redis serves dual duty as Celery broker and GeoJSON cache.
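The price-band part of the query splitting described above can be sketched as a recursive bisection. This is a simplified illustration, not the logic in services/query_splitter.py: it ignores the district and bedroom dimensions, and `split_price_bands` and the `count_results` callback are hypothetical names standing in for whatever reports the result count of a query.

```python
from typing import Callable

RESULT_CAP = 1500  # approximate cap quoted above


def split_price_bands(
    min_price: int,
    max_price: int,
    count_results: Callable[[int, int], int],
    cap: int = RESULT_CAP,
) -> list[tuple[int, int]]:
    """Return inclusive price bands that each report at most `cap` results.

    A band that cannot be narrowed further (min == max) is returned as-is,
    even if it still exceeds the cap.
    """
    if count_results(min_price, max_price) <= cap or min_price >= max_price:
        return [(min_price, max_price)]
    # Bisect the band and recurse into both halves.
    mid = (min_price + max_price) // 2
    return split_price_bands(min_price, mid, count_results, cap) + split_price_bands(
        mid + 1, max_price, count_results, cap
    )
```

Each returned band can then be fetched as its own query. Bisection only subdivides the dense parts of the price range, so sparse ranges cost a single count call.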

Environment Variables

See .env.sample for the full list. Key ones:

  • DB_CONNECTION_STRING — Database URL
  • CELERY_BROKER_URL / CELERY_RESULT_BACKEND — Redis URLs
  • ROUTING_API_KEY — Google Maps API key
  • RIGHTMOVE_* — Scraper tuning (concurrency, delays, thresholds, proxy)
  • SCRAPE_SCHEDULES — JSON array of periodic scrape configs
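A minimal sketch of reading a few of these variables follows. The `Settings` dataclass and `load_settings` function are hypothetical and are not the project's config/ module. Treating DB_CONNECTION_STRING and CELERY_BROKER_URL as required and defaulting SCRAPE_SCHEDULES to an empty array are assumptions, and the shape of each schedule object is left open because it is not documented here.

```python
import json
import os
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Settings:
    db_connection_string: str
    celery_broker_url: str
    scrape_schedules: list[dict[str, Any]]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    # Assumption: an unset SCRAPE_SCHEDULES means "no periodic scrapes".
    schedules = json.loads(env.get("SCRAPE_SCHEDULES", "[]"))
    if not isinstance(schedules, list):
        raise ValueError("SCRAPE_SCHEDULES must be a JSON array")
    return Settings(
        # Assumption: these two are required, so a missing key raises KeyError.
        db_connection_string=env["DB_CONNECTION_STRING"],
        celery_broker_url=env["CELERY_BROKER_URL"],
        scrape_schedules=schedules,
    )
```

Passing the environment as a parameter lets tests supply a plain dict instead of mutating the process environment.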

Git Workflow

  • CI: GitHub Actions (.github/workflows/build-api.yml, build-frontend.yml) builds and pushes Docker images to DockerHub on every push to master. Keel in the cluster watches :latest on viktorbarzin/realestatecrawler and viktorbarzin/immoweb and rolls the four realestate-crawler-* Deployments when the digest changes. There is no Woodpecker deploy POST; Keel is the rollout mechanism.
  • Pull-secret on the namespace: dockerhub-pull-secret, synced from Vault secret/viktor.dockerhub_registry_password via ExternalSecret (codified in infra/stacks/real-estate-crawler/main.tf). Required because the DockerHub repos are private.
  • Fallback: .woodpecker/build-fallback-{api,frontend}.yml (event: deployment, manual-only) preserves the in-cluster build path if GHA is down.
  • Linting: GitHub Actions runs Ruff on PR diffs (.github/workflows/ruff.yml).
  • Keep commits focused — one logical change per commit.
  • Group related files (e.g., code + its tests) in the same commit.
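The digest-based rollout described in the CI bullet can be modelled conceptually as below. This is not Keel's implementation; `ImageRef` and `needs_rollout` are hypothetical names, and the digests in the usage note are made up. The point is only that a mutable tag such as :latest resolves to a new digest after each build, whereas a pinned, immutable tag never does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRef:
    repository: str
    tag: str
    digest: str  # content digest currently associated with repository:tag


def needs_rollout(deployed: ImageRef, registry: ImageRef) -> bool:
    """True when the same repository:tag now resolves to a different digest."""
    same_ref = (deployed.repository, deployed.tag) == (registry.repository, registry.tag)
    return same_ref and deployed.digest != registry.digest
```

For example, comparing a deployed `viktorbarzin/immoweb:latest` at digest `sha256:aaa` with a registry entry for the same tag at `sha256:bbb` yields `True`, while two entries with identical digests yield `False`.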

Directories to Ignore

  • node_modules/, __pycache__/, .idea/, data/, venv/, _cache/