Refactor codebase following Clean Code principles and add 229 tests

- Extract helpers to reduce function sizes (listing_tasks, app.py, query.py, listing_fetcher)
- Replace nonlocal mutations with _PipelineState dataclass in listing_tasks
- Fix bugs: isinstance→equality check in repository, verify_exp for OIDC tokens
- Consolidate duplicate filter methods in listing_repository
- Move hardcoded config to env vars with backward-compatible defaults
- Simplify CLI decorator to auto-build QueryParameters
- Add deprecation docstring to data_access.py
- Test count: 158 → 387 (all passing)
Viktor Barzin 2026-02-07 20:19:57 +00:00
parent 7e05b3c971
commit 150342bb9e
GPG key ID: 0EB088298288D958
48 changed files with 5029 additions and 990 deletions


@@ -0,0 +1,113 @@
---
name: build-and-push
description: |
  Build Docker images for the API and frontend, and push them to Docker Hub.
  Use when: (1) user wants to build new Docker images locally, (2) push images
  to the registry before deploying, (3) tag images for a release. Covers both
  the Python/FastAPI backend and the React/Nginx frontend.
author: Claude Code
version: 1.0.0
date: 2026-02-06
---
# Build and Push Docker Images
All commands run locally. Images are pushed to Docker Hub under the `viktorbarzin` namespace.
## Image Registries
| Component | Docker Hub repo | Dockerfile location |
|-----------|------------------------------------|------------------------------|
| API | `viktorbarzin/realestatecrawler` | `crawler/Dockerfile` |
| Frontend | `viktorbarzin/immoweb` | `crawler/frontend/Dockerfile` |
## Building Images
### Build API image
```bash
docker build -t viktorbarzin/realestatecrawler:latest crawler/
```
### Build Frontend image
```bash
docker build -t viktorbarzin/immoweb:latest crawler/frontend/
```
### Build both
```bash
docker build -t viktorbarzin/realestatecrawler:latest crawler/ && \
docker build -t viktorbarzin/immoweb:latest crawler/frontend/
```
### Build with a specific tag (recommended for production)
```bash
# Use git commit SHA
GIT_SHA=$(git rev-parse --short HEAD)
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
```
## Pushing Images
### Log in to Docker Hub (if not already logged in)
```bash
docker login -u viktorbarzin
```
### Push API image
```bash
docker push viktorbarzin/realestatecrawler:latest
```
### Push Frontend image
```bash
docker push viktorbarzin/immoweb:latest
```
### Push with specific tag
```bash
GIT_SHA=$(git rev-parse --short HEAD)
docker push viktorbarzin/realestatecrawler:${GIT_SHA}
docker push viktorbarzin/realestatecrawler:latest
docker push viktorbarzin/immoweb:${GIT_SHA}
docker push viktorbarzin/immoweb:latest
```
## Build and Push Everything (Full Release)
```bash
GIT_SHA=$(git rev-parse --short HEAD)
# Build
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
# Push
docker push viktorbarzin/realestatecrawler:${GIT_SHA}
docker push viktorbarzin/realestatecrawler:latest
docker push viktorbarzin/immoweb:${GIT_SHA}
docker push viktorbarzin/immoweb:latest
```
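The sequence above can also be wrapped in a small script so that a failed build stops the release before anything is pushed. The following is a minimal sketch, not a script from the repository: `run`, `release_image`, and the `DRY_RUN` switch are hypothetical names introduced for illustration.

```bash
#!/usr/bin/env bash
# Sketch of a release helper. The function names and DRY_RUN are
# illustrative assumptions, not part of the repository.
set -eu  # stop at the first failing command or unset variable

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build one image with an immutable tag plus :latest, then push both tags.
release_image() {
  local repo="$1" context="$2" tag="$3"
  run docker build -t "${repo}:${tag}" -t "${repo}:latest" "$context"
  run docker push "${repo}:${tag}"
  run docker push "${repo}:latest"
}

# Usage, from the repository root:
#   GIT_SHA=$(git rev-parse --short HEAD)
#   release_image viktorbarzin/realestatecrawler crawler/ "$GIT_SHA"
#   release_image viktorbarzin/immoweb crawler/frontend/ "$GIT_SHA"
```

Setting `DRY_RUN=1` prints the `docker` commands instead of running them, which is a cheap way to check the tags before a real release.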
## CI/CD Note
Drone CI automatically builds and pushes images on push to `master` (see `.drone.yml`).
Use the manual process above when you need to build or push outside CI, for example:
- Hotfix deployments
- Testing image builds locally before pushing
- Deploying from a non-master branch
## Notes
- The API Dockerfile installs system deps (OpenCV, Tesseract, MariaDB client) and Python deps via Poetry
- The Frontend Dockerfile is a multi-stage build: Node builder -> Nginx runtime
- Always tag with both `:latest` and a specific tag (git SHA or version) for traceability
- Use `docker buildx` for cross-platform builds if deploying to ARM nodes
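
For the `docker buildx` note above, a cross-platform build might look like the sketch below. The `build_multiarch` function, the `PLATFORMS` and `DRY_RUN` variables, and the default platform list are assumptions made for illustration. Multi-platform builds typically also need a builder that supports them (for example one created with `docker buildx create --use`).

```bash
# Hypothetical helper: build one image for several platforms and push it.
# PLATFORMS and DRY_RUN are illustrative switches, not project settings.
build_multiarch() {
  local repo="$1" context="$2" tag="$3"
  local platforms="${PLATFORMS:-linux/amd64,linux/arm64}"
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and
  # non-empty, so the command is printed instead of executed.
  ${DRY_RUN:+echo} docker buildx build \
    --platform "$platforms" \
    -t "${repo}:${tag}" -t "${repo}:latest" \
    --push "$context"
}

# Usage, from the repository root:
#   build_multiarch viktorbarzin/realestatecrawler crawler/ "$(git rev-parse --short HEAD)"
```

`--push` is used here because a multi-platform result usually cannot be loaded into the local image store, so building and pushing happen in one step.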


@@ -0,0 +1,158 @@
---
name: deploy-to-kubernetes
description: |
  Deploy the realestate-crawler to the Kubernetes cluster. Use when: (1) user wants
  to deploy after building new images, (2) rollout restart to pick up new images,
  (3) check deployment status, pod health, or logs in production, (4) scale
  deployments up or down, (5) debug production issues.
author: Claude Code
version: 1.0.0
date: 2026-02-06
---
# Deploy to Kubernetes
All kubectl commands run locally against the K8s cluster at `10.0.20.100:6443`.
Namespace: `realestate-crawler`.
## Deployments
| Deployment | Image | Component |
|--------------------------|------------------------------------|-----------|
| realestate-crawler-api | viktorbarzin/realestatecrawler | API |
| realestate-crawler-ui | viktorbarzin/immoweb | Frontend |
## Deploying New Images
### Rolling restart (picks up :latest after push)
```bash
# Restart API deployment
kubectl rollout restart deployment/realestate-crawler-api -n realestate-crawler
# Restart Frontend deployment
kubectl rollout restart deployment/realestate-crawler-ui -n realestate-crawler
# Restart both
kubectl rollout restart deployment/realestate-crawler-api deployment/realestate-crawler-ui -n realestate-crawler
```
### Full deploy workflow (build, push, restart)
```bash
GIT_SHA=$(git rev-parse --short HEAD)
# Build and push API
docker build -t viktorbarzin/realestatecrawler:${GIT_SHA} -t viktorbarzin/realestatecrawler:latest crawler/
docker push viktorbarzin/realestatecrawler:${GIT_SHA}
docker push viktorbarzin/realestatecrawler:latest
# Build and push Frontend
docker build -t viktorbarzin/immoweb:${GIT_SHA} -t viktorbarzin/immoweb:latest crawler/frontend/
docker push viktorbarzin/immoweb:${GIT_SHA}
docker push viktorbarzin/immoweb:latest
# Restart deployments to pick up new images
kubectl rollout restart deployment/realestate-crawler-api -n realestate-crawler
kubectl rollout restart deployment/realestate-crawler-ui -n realestate-crawler
```
### Deploy a specific image tag
```bash
# Set API to a specific image version (replace abc1234 with the tag you pushed)
kubectl set image deployment/realestate-crawler-api \
realestate-crawler-api=viktorbarzin/realestatecrawler:abc1234 \
-n realestate-crawler
# Set Frontend to a specific version
kubectl set image deployment/realestate-crawler-ui \
realestate-crawler-ui=viktorbarzin/immoweb:abc1234 \
-n realestate-crawler
```
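To avoid typing the tag twice, the two `kubectl set image` calls above can be parameterised. This is a minimal sketch: `deploy_tag` and `DRY_RUN` are hypothetical names, and it assumes, as the commands above do, that each container is named after its deployment.

```bash
# Hypothetical helper: point both deployments at the same image tag.
# Assumes container names match deployment names, as in the commands above.
deploy_tag() {
  local tag="$1" ns="realestate-crawler"
  # With DRY_RUN set, the kubectl commands are printed instead of executed.
  ${DRY_RUN:+echo} kubectl set image deployment/realestate-crawler-api \
    "realestate-crawler-api=viktorbarzin/realestatecrawler:${tag}" -n "$ns"
  ${DRY_RUN:+echo} kubectl set image deployment/realestate-crawler-ui \
    "realestate-crawler-ui=viktorbarzin/immoweb:${tag}" -n "$ns"
}

# Usage:
#   deploy_tag "$(git rev-parse --short HEAD)"
```

Unlike a rolling restart on `:latest`, pinning a SHA tag changes the pod template, so the deployment records exactly which image each revision ran.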
## Checking Deployment Status
```bash
# List all resources in namespace
kubectl get all -n realestate-crawler
# Check deployment status
kubectl get deployments -n realestate-crawler
# Check rollout status (waits for completion)
kubectl rollout status deployment/realestate-crawler-api -n realestate-crawler
kubectl rollout status deployment/realestate-crawler-ui -n realestate-crawler
# Check pods
kubectl get pods -n realestate-crawler
# Describe a specific pod (events, conditions, image)
kubectl describe pod <pod-name> -n realestate-crawler
# Check which image each pod is running
kubectl get pods -n realestate-crawler -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
```
## Viewing Logs
```bash
# API logs
kubectl logs deployment/realestate-crawler-api -n realestate-crawler --tail=100 -f
# Frontend logs
kubectl logs deployment/realestate-crawler-ui -n realestate-crawler --tail=100 -f
# Logs from a specific pod
kubectl logs <pod-name> -n realestate-crawler --tail=100 -f
# Previous container logs (if pod crashed/restarted)
kubectl logs <pod-name> -n realestate-crawler --previous
```
## Scaling
```bash
# Scale API
kubectl scale deployment/realestate-crawler-api --replicas=2 -n realestate-crawler
# Scale down (e.g., for maintenance)
kubectl scale deployment/realestate-crawler-api --replicas=0 -n realestate-crawler
```
## Rollback
```bash
# View rollout history
kubectl rollout history deployment/realestate-crawler-api -n realestate-crawler
# Rollback to previous version
kubectl rollout undo deployment/realestate-crawler-api -n realestate-crawler
# Rollback to specific revision
kubectl rollout undo deployment/realestate-crawler-api --to-revision=3 -n realestate-crawler
```
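The status check and the rollback can be combined so that a rollout that does not become ready is undone automatically. The sketch below is one possible approach, not the project's procedure: the `rollout_or_undo` name and the 120-second timeout are assumptions.

```bash
# Hypothetical helper: wait for a rollout and undo it if it does not finish.
# The 120s timeout is an illustrative value, not a project setting.
rollout_or_undo() {
  local deployment="$1" ns="realestate-crawler"
  if ! kubectl rollout status "deployment/${deployment}" -n "$ns" --timeout=120s; then
    echo "Rollout of ${deployment} failed, rolling back" >&2
    kubectl rollout undo "deployment/${deployment}" -n "$ns"
    return 1
  fi
}

# Usage:
#   rollout_or_undo realestate-crawler-api
#   rollout_or_undo realestate-crawler-ui
```

The function returns non-zero after a rollback, so a calling script can stop instead of continuing with the next deployment.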
## Debugging Production Issues
```bash
# Exec into a running API pod
kubectl exec -it deployment/realestate-crawler-api -n realestate-crawler -- bash
# Run a one-off command
kubectl exec deployment/realestate-crawler-api -n realestate-crawler -- python -c "print('hello')"
# Check pod events (useful for crash loops, image pull errors)
kubectl get events -n realestate-crawler --sort-by='.lastTimestamp' | tail -20
# Port-forward to a pod for local debugging
kubectl port-forward deployment/realestate-crawler-api 5001:5001 -n realestate-crawler
```
## Notes
- Drone CI handles automated deployments on push to `master` (see `.drone.yml`)
- Use manual deployment for hotfixes, testing, or deploying from non-master branches
- The K8s cluster is at `10.0.20.100:6443` (context: `kubernetes-admin@kubernetes`)
- If pods aren't picking up new `:latest` images, check the `kubernetes-latest-tag-image-pull` skill
- Always verify the rollout completed with `kubectl rollout status` after deploying


@@ -0,0 +1,144 @@
---
name: dev-environment
description: |
  Start, stop, rebuild, and manage the local Docker Compose development environment
  for the realestate-crawler project. Use when: (1) user wants to start/stop the dev
  environment, (2) needs to rebuild after code changes, (3) wants to check service
  status or view logs, (4) needs to run database migrations.
author: Claude Code
version: 1.0.0
date: 2026-02-06
---
# Dev Environment Management
Docker Compose orchestrates the dev environment locally from `crawler/`. All project
commands (pytest, alembic, mypy, python, etc.) must run inside the `app` container
via `docker compose exec app <command>`. Only docker/kubectl commands run on the host.
## Starting the Dev Environment
```bash
# Start all services (Redis, MySQL, API, Celery worker, Celery beat)
cd crawler && docker compose up
# Start in detached mode (background)
cd crawler && docker compose up -d
# Rebuild images and start (after Dockerfile or dependency changes)
cd crawler && docker compose up --build
# Or use the start.sh helper
cd crawler && ./start.sh # foreground
cd crawler && ./start.sh --build # rebuild first
```
## Stopping the Dev Environment
```bash
cd crawler && docker compose down
# Also remove volumes (fresh database, fresh Redis)
cd crawler && docker compose down -v
# Or use the helper
cd crawler && ./start.sh --down
```
## Checking Status
```bash
# List running containers
cd crawler && docker compose ps
# Check health status
cd crawler && docker compose ps --format "table {{.Name}}\t{{.Status}}"
```
## Viewing Logs
```bash
# Follow all service logs
cd crawler && docker compose logs -f
# Follow specific service logs
cd crawler && docker compose logs -f app
cd crawler && docker compose logs -f celery
cd crawler && docker compose logs -f celery-beat
cd crawler && docker compose logs -f mysql
cd crawler && docker compose logs -f redis
# Or use the helper
cd crawler && ./start.sh --logs
```
## Restarting Individual Services
```bash
# Restart just the API (e.g., after config change)
cd crawler && docker compose restart app
# Restart Celery worker
cd crawler && docker compose restart celery
# Rebuild and restart a single service
cd crawler && docker compose up --build app
```
## Running Database Migrations
```bash
# Apply pending migrations
cd crawler && docker compose exec app alembic upgrade head
# Create a new migration
cd crawler && docker compose exec app alembic revision -m "description"
```
## Running Tests Inside Container
```bash
cd crawler && docker compose exec app pytest tests/ -v --cov=. --cov-report=term-missing
```
## Running Any Command Inside Container
All project commands must be run inside the `app` container:
```bash
# General pattern
cd crawler && docker compose exec app <command>
# Examples
cd crawler && docker compose exec app python main.py dump-listings --type rent
cd crawler && docker compose exec app mypy .
cd crawler && docker compose exec app ruff check .
cd crawler && docker compose exec app poetry install
cd crawler && docker compose exec app bash # interactive shell
```
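Because every project command goes through `docker compose exec app`, a small shell function can shorten the pattern. This is a sketch only: `in_app` and `DRY_RUN` are hypothetical names, and it assumes the function is called from the repository root with the Compose file at `crawler/docker-compose.yml`.

```bash
# Hypothetical wrapper: run any project command inside the app container.
# Assumes the current directory is the repository root.
in_app() {
  # With DRY_RUN set, the docker command is printed instead of executed.
  ${DRY_RUN:+echo} docker compose -f crawler/docker-compose.yml exec app "$@"
}

# Usage:
#   in_app pytest tests/ -v
#   in_app alembic upgrade head
#   in_app bash
```

Passing `-f` instead of `cd crawler` leaves the caller's working directory unchanged, so the function can be sourced into an interactive shell.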
## Services and Ports
| Service | Container | Port | Description |
|-------------|-----------------|-------|--------------------------------|
| redis | rec-redis | 6379 | Celery broker + GeoJSON cache |
| mysql | rec-mysql | 3306 | Primary database |
| app | rec-app | 5001 | FastAPI server (hot-reload) |
| celery | rec-celery | - | Background task worker |
| celery-beat | rec-celery-beat | - | Periodic task scheduler |
## Environment Variables
Environment variables are set in `docker-compose.yml`. To override them locally, create a `.env` file
in `crawler/` (see `.env.sample`). Commonly overridden variables:
- `ROUTING_API_KEY` - Google Maps API key (passed from host env)
- `SCRAPE_SCHEDULES` - JSON array of periodic scrape configs (passed from host env)
## Notes
- The API server has hot-reload enabled for `api/`, `services/`, `repositories/`, and `models/` directories
- Source code is bind-mounted into containers, so local edits are reflected immediately
- Python virtualenv is stored in a named Docker volume (`app_venv`) shared across app, celery, and celery-beat
- MySQL data persists in the `mysql_data` volume; Redis data in `redis_data`
- Use `docker compose down -v` to reset all data (volumes)