diff --git a/docs/plans/2026-02-21-build-optimization-plan.md b/docs/plans/2026-02-21-build-optimization-plan.md
new file mode 100644
index 0000000..f849fe4
--- /dev/null
+++ b/docs/plans/2026-02-21-build-optimization-plan.md
@@ -0,0 +1,842 @@
+# Build Optimization Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Minimize Docker build times and production image sizes for both backend and frontend.
+
+**Architecture:** Remove unused dependencies (watchdog, tqdm, pandas, apprise), switch to opencv-python-headless, replace pip with uv in Docker builds, add BuildKit cache mounts, harden images with non-root users and healthchecks.
+
+**Tech Stack:** Python 3.13, uv, Docker BuildKit, node:24-alpine, nginx:alpine
+
+---
+
+### Task 1: Remove `watchdog` from pyproject.toml
+
+**Files:**
+- Modify: `pyproject.toml:29` (remove `watchdog = "^6.0.0"`)
+
+**Step 1: Remove the dependency**
+
+In `pyproject.toml`, delete this line:
+```
+watchdog = "^6.0.0"
+```
+
+**Step 2: Verify no imports exist**
+
+Run: `grep -r "import watchdog\|from watchdog" --include="*.py" .`
+Expected: No output (zero matches).
+
+**Step 3: Commit**
+
+```bash
+git add pyproject.toml
+git commit -m "Remove unused watchdog dependency"
+```
+
+---
+
+### Task 2: Replace `tqdm` with logging in `services/image_fetcher.py`
+
+**Files:**
+- Modify: `services/image_fetcher.py`
+
+**Step 1: Replace tqdm import with asyncio and logging**
+
+Replace:
+```python
+from tqdm.asyncio import tqdm
+```
+With:
+```python
+import asyncio
+import logging
+
+logger = logging.getLogger(__name__)
+```
+
+(If `asyncio` or `logging` or `logger` are already imported, don't duplicate.)
+
+**Step 2: Replace tqdm.gather with asyncio.gather + logging**
+
+Replace:
+```python
+    async with aiohttp.ClientSession() as session:
+        updated_listings = await tqdm.gather(
+            *[
+                dump_images_for_listing(listing, image_base_path, session=session)
+                for listing in listings
+            ]
+        )
+```
+With:
+```python
+    async with aiohttp.ClientSession() as session:
+        logger.info("Downloading images for %d listings", len(listings))
+        updated_listings = await asyncio.gather(
+            *[
+                dump_images_for_listing(listing, image_base_path, session=session)
+                for listing in listings
+            ]
+        )
+        logger.info("Finished downloading images for %d listings", len(listings))
+```
+
+**Step 3: Run tests**
+
+Run: `pytest tests/ -v --tb=short -q`
+Expected: All tests pass.
+
+**Step 4: Commit**
+
+```bash
+git add services/image_fetcher.py
+git commit -m "Replace tqdm with logging in image_fetcher"
+```
+
+---
+
+### Task 3: Replace `tqdm` with logging in `services/floorplan_detector.py`
+
+**Files:**
+- Modify: `services/floorplan_detector.py`
+
+**Step 1: Replace tqdm import with asyncio and logging**
+
+Replace:
+```python
+from tqdm.asyncio import tqdm
+```
+With:
+```python
+import asyncio
+import logging
+
+logger = logging.getLogger(__name__)
+```
+
+(Skip if already imported.)
+
+**Step 2: Replace tqdm.gather with asyncio.gather + logging**
+
+Replace:
+```python
+    updated_listings = [
+        listing
+        for listing in await tqdm.gather(
+            *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
+        )
+        if listing is not None
+    ]
+```
+With:
+```python
+    logger.info("Detecting floorplans for %d listings", len(listings))
+    updated_listings = [
+        listing
+        for listing in await asyncio.gather(
+            *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
+        )
+        if listing is not None
+    ]
+    logger.info("Finished floorplan detection, %d listings updated", len(updated_listings))
+```
+
+**Step 3: Run tests**
+
+Run: `pytest tests/ -v --tb=short -q`
+Expected: All tests pass.
+
+**Step 4: Commit**
+
+```bash
+git add services/floorplan_detector.py
+git commit -m "Replace tqdm with logging in floorplan_detector"
+```
+
+---
+
+### Task 4: Replace `tqdm` with logging in `services/route_calculator.py`
+
+**Files:**
+- Modify: `services/route_calculator.py`
+
+**Step 1: Replace tqdm import with asyncio and logging**
+
+Replace:
+```python
+from tqdm.asyncio import tqdm
+```
+With:
+```python
+import asyncio
+import logging
+
+logger = logging.getLogger(__name__)
+```
+
+(Skip if already imported.)
+
+**Step 2: Replace tqdm.gather with asyncio.gather + logging**
+
+Replace:
+```python
+    destination_mode = DestinationMode(destination_address, travel_mode)
+    updated_listings = await tqdm.gather(
+        *[update_routing_info(listing, destination_mode) for listing in listings],
+        total=len(listings),
+        desc="Updating routing info",
+    )
+```
+With:
+```python
+    destination_mode = DestinationMode(destination_address, travel_mode)
+    logger.info("Calculating routes for %d listings", len(listings))
+    updated_listings = await asyncio.gather(
+        *[update_routing_info(listing, destination_mode) for listing in listings],
+    )
+    logger.info("Finished route calculation for %d listings", len(listings))
+```
+
+**Step 3: Run tests**
+
+Run: `pytest tests/ -v --tb=short -q`
+Expected: All tests pass.
+
+**Step 4: Commit**
+
+```bash
+git add services/route_calculator.py
+git commit -m "Replace tqdm with logging in route_calculator"
+```
+
+---
+
+### Task 5: Replace `tqdm` with logging in `repositories/listing_repository.py`
+
+**Files:**
+- Modify: `repositories/listing_repository.py`
+
+**Step 1: Replace tqdm import with logging**
+
+Replace:
+```python
+from tqdm import tqdm
+```
+With:
+```python
+import logging
+```
+
+Add `logger = logging.getLogger(__name__)` if not already present.
+
+**Step 2: Replace tqdm iterator with plain loop + logging**
+
+Replace:
+```python
+        for listing in tqdm(listings, desc="Upserting listings"):
+```
+With:
+```python
+        logger.info("Upserting %d listings", len(listings))
+        for listing in listings:
+```
+
+**Step 3: Remove `tqdm` from `pyproject.toml`**
+
+Delete:
+```
+tqdm = "^4.66.2"
+```
+
+**Step 4: Run tests**
+
+Run: `pytest tests/ -v --tb=short -q`
+Expected: All tests pass.
+
+**Step 5: Commit**
+
+```bash
+git add repositories/listing_repository.py services/image_fetcher.py services/floorplan_detector.py services/route_calculator.py pyproject.toml
+git commit -m "Remove tqdm dependency, replace with logging"
+```
+
+---
+
+### Task 6: Replace `pandas` with stdlib `csv` in `csv_exporter.py`
+
+**Files:**
+- Modify: `csv_exporter.py`
+
+**Step 1: Rewrite csv_exporter.py**
+
+Replace the entire file with:
+
+```python
+import csv
+import json
+from pathlib import Path
+
+from models.listing import QueryParameters
+from repositories.listing_repository import ListingRepository
+
+
+DROP_COLUMNS = {"_sa_instance_state", "additional_info"}
+ENSURE_COLUMNS = ("service_charge", "lease_left", "square_meters")
+
+
+async def export_to_csv(
+    repository: ListingRepository,
+    output_file: Path,
+    query_parameters: QueryParameters | None = None,
+) -> None:
+    listings = await repository.get_listings(query_parameters=query_parameters)
+    rows = [
+        {k: v for k, v in listing.__dict__.items() if k not in DROP_COLUMNS}
+        for listing in listings
+    ]
+
+    if not rows:
+        output_file.write_text("")
+        return
+
+    # Read decisions file if present
+    decisions: dict[str, str] = {}
+    decisions_path = Path("data/decisions.json")
+    if decisions_path.exists():
+        with open(decisions_path) as f:
+            decisions = json.load(f)
+
+    for row in rows:
+        # Add decision column
+        row["decision"] = decisions.get(str(row.get("id")))
+
+        # Ensure optional columns exist
+        for col in ENSURE_COLUMNS:
+            row.setdefault(col, None)
+
+        # Replace -1 sentinel in square_meters
+        if row.get("square_meters") == -1:
+            row["square_meters"] = None
+
+        # Compute price_per_sqm
+        sqm = row.get("square_meters")
+        price = row.get("price")
+        if sqm and sqm > 0 and price:
+            row["price_per_sqm"] = round(price / sqm, 2)
+        else:
+            row["price_per_sqm"] = None
+
+    # Sort by price_per_sqm (None values last)
+    rows.sort(key=lambda r: (r["price_per_sqm"] is None, r["price_per_sqm"] or 0))
+
+    fieldnames = list(rows[0].keys())
+    with open(output_file, "w", newline="") as f:
+        writer = csv.DictWriter(f, fieldnames=fieldnames)
+        writer.writeheader()
+        writer.writerows(rows)
+```
+
+**Step 2: Remove `pandas` from `pyproject.toml`**
+
+Delete:
+```
+pandas = "^2.2.1"
+```
+
+**Step 3: Run tests**
+
+Run: `pytest tests/ -v --tb=short -q`
+Expected: All tests pass.
+
+**Step 4: Commit**
+
+```bash
+git add csv_exporter.py pyproject.toml
+git commit -m "Replace pandas with stdlib csv in csv_exporter"
+```
+
+---
+
+### Task 7: Replace `apprise` with direct HTTP in `notifications.py`
+
+**Files:**
+- Modify: `notifications.py`
+
+**Step 1: Rewrite notifications.py**
+
+Replace the entire file with:
+
+```python
+import json
+import logging
+import os
+
+import aiohttp
+
+logger = logging.getLogger(__name__)
+
+
+async def send_notification(body: str, title: str = "") -> bool:
+    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
+    if not webhook_url:
+        logger.debug("No SLACK_WEBHOOK_URL configured, skipping notification")
+        return False
+
+    text = f"*{title}*\n{body}" if title else body
+    try:
+        async with aiohttp.ClientSession() as session:
+            async with session.post(
+                webhook_url,
+                data=json.dumps({"text": text}),
+                headers={"Content-Type": "application/json"},
+            ) as resp:
+                return resp.status == 200
+    except Exception:
+        logger.exception("Failed to send Slack notification")
+        return False
+```
+
+**Step 2: Remove `apprise` from `pyproject.toml`**
+
+Delete:
+```
+apprise = "^1.9.3"
+```
+
+**Step 3: Run tests**
+
+Run: `pytest tests/ -v --tb=short -q`
+Expected: All tests pass (tests mock `send_notification`).
+
+**Step 4: Commit**
+
+```bash
+git add notifications.py pyproject.toml
+git commit -m "Replace apprise with direct Slack webhook call"
+```
+
+---
+
+### Task 8: Switch `opencv-python` to `opencv-python-headless` and add `httpx` to prod deps
+
+**Files:**
+- Modify: `pyproject.toml`
+
+**Step 1: Change opencv-python to headless**
+
+Replace:
+```
+opencv-python = "^4.11.0.86"
+```
+With:
+```
+opencv-python-headless = "^4.11.0.86"
+```
+
+**Step 2: Move httpx to production dependencies**
+
+Add under `[tool.poetry.dependencies]`:
+```
+httpx = "^0.27.0"
+```
+
+Remove `httpx` from `[tool.poetry.group.dev.dependencies]`.
+
+**Step 3: Commit**
+
+```bash
+git add pyproject.toml
+git commit -m "Switch to opencv-python-headless, add httpx to prod deps"
+```
+
+---
+
+### Task 9: Regenerate requirements.txt
+
+**Files:**
+- Modify: `requirements.txt`
+
+**Step 1: Regenerate requirements.txt from updated pyproject.toml**
+
+Run:
+```bash
+docker compose exec -T app poetry export -f requirements.txt -o requirements.txt
+```
+
+If poetry is not available in the container, run locally:
+```bash
+poetry lock --no-update && poetry export -f requirements.txt -o requirements.txt
+```
+
+**Step 2: Verify the removed packages are gone**
+
+Run:
+```bash
+grep -i "watchdog\|tqdm\|pandas\|apprise\|opencv-python==" requirements.txt
+```
+Expected: No output. The `opencv-python==` pattern matches only the non-headless package, so `opencv-python-headless` will not show up in this check; confirm it separately with `grep -i "opencv-python-headless" requirements.txt`.
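The grep check in Task 9 can also be expressed as a small script, which avoids substring false positives by comparing parsed package names. This is a sketch under assumptions: `requirement_names` and `leftover_packages` are hypothetical helpers, and the parsing assumes the `name==version ; markers \` plus indented `--hash=` layout that `poetry export` typically emits.

```python
import re

# Packages the plan removes; "opencv-python" means the non-headless wheel.
REMOVED = {"watchdog", "tqdm", "pandas", "apprise", "opencv-python"}


def requirement_names(text):
    """Return the normalized package names pinned in a requirements file."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and option/continuation lines such as --hash=...
        if not line or line.startswith(("#", "-")):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower().replace("_", "-"))
    return names


def leftover_packages(text):
    """Return removed packages that still appear, sorted for stable output."""
    return sorted(REMOVED & requirement_names(text))


if __name__ == "__main__":
    with open("requirements.txt") as f:
        print(leftover_packages(f.read()))
```

Because names are compared exactly, `opencv-python-headless` is not mistaken for `opencv-python`. Note that a removed package can still be listed if another dependency pulls it in transitively.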
+
+**Step 3: Commit**
+
+```bash
+git add requirements.txt pyproject.toml poetry.lock
+git commit -m "Regenerate requirements.txt after dependency cleanup"
+```
+
+---
+
+### Task 10: Optimize backend Dockerfile (uv, cache mounts, non-root, healthcheck)
+
+**Files:**
+- Modify: `Dockerfile`
+
+**Step 1: Rewrite the Dockerfile**
+
+Replace the entire `Dockerfile` with:
+
+```dockerfile
+# syntax=docker/dockerfile:1
+
+# Stage 1: Install build tools and Python dependencies
+FROM python:3.13-slim AS builder
+
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
+
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
+    apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    gcc \
+    python3-dev \
+    libopencv-dev \
+    libmariadb-dev
+
+WORKDIR /app
+
+COPY requirements.txt ./
+
+# Install dependencies into a venv using uv (typically much faster than pip).
+# --mount flags must follow RUN directly; uv lives at /bin/uv, not in the venv.
+RUN --mount=type=cache,target=/root/.cache/uv \
+    python -m venv /app/.venv && \
+    uv pip install --python /app/.venv/bin/python -r requirements.txt
+
+# Stage 2: Runtime system dependencies (runs in parallel with builder)
+FROM python:3.13-slim AS runtime-base
+
+RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
+    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
+    apt-get update && apt-get install -y --no-install-recommends \
+    libglib2.0-0 \
+    tesseract-ocr \
+    tesseract-ocr-eng \
+    libmariadb3 \
+    curl
+
+# Stage 3: Test (runtime deps + venv + test dependencies + run tests).
+# BuildKit only builds this stage when it is requested with --target test.
+FROM runtime-base AS test
+
+WORKDIR /app
+
+COPY --from=builder /app/.venv /app/.venv
+ENV PATH="/app/.venv/bin:$PATH"
+
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
+
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv pip install --python /app/.venv/bin/python pytest pytest-asyncio pytest-xdist httpx aioresponses fakeredis
+
+COPY . .
+
+RUN pytest tests/ -x -q
+
+# Stage 4: Final image (venv from builder + runtime base)
+FROM runtime-base
+
+RUN adduser --system --no-create-home appuser
+
+WORKDIR /app
+
+# Copy the venv from the builder stage
+COPY --from=builder /app/.venv /app/.venv
+
+ENV PATH="/app/.venv/bin:$PATH"
+
+# Copy the application code
+COPY . .
+
+RUN chown -R appuser /app
+
+USER appuser
+
+EXPOSE 5001
+
+HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=40s \
+    CMD curl -f http://localhost:5001/api/status || exit 1
+
+CMD ["sh", "-c", "alembic upgrade head && uvicorn api.app:app --host 0.0.0.0 --port 5001 --no-server-header"]
+```
+
+Key changes:
+- `# syntax=docker/dockerfile:1` for BuildKit
+- `uv` instead of `pip` (copied from official image)
+- BuildKit cache mounts for apt and uv
+- Removed `libgl1` (not needed with opencv-python-headless)
+- Added `curl` to runtime-base (for healthcheck)
+- Non-root `appuser`
+- `HEALTHCHECK` using existing `/api/status` endpoint
+
+**Step 2: Test the build locally**
+
+Run:
+```bash
+DOCKER_BUILDKIT=1 docker build --target test -t realestate-crawler:test-stage .
+DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
+```
+Expected: Both builds complete. The first build runs pytest in the `test` stage; BuildKit skips that stage in the second (default-target) build because the final image does not depend on it.
+
+**Step 3: Commit**
+
+```bash
+git add Dockerfile
+git commit -m "Optimize Dockerfile: uv, BuildKit cache mounts, non-root user, healthcheck"
+```
+
+---
+
+### Task 11: Optimize frontend Dockerfile (cache mounts, non-root, healthcheck)
+
+**Files:**
+- Modify: `frontend/Dockerfile`
+
+**Step 1: Rewrite the frontend Dockerfile**
+
+Replace the entire `frontend/Dockerfile` with:
+
+```dockerfile
+# syntax=docker/dockerfile:1
+
+# Stage 1: Install dependencies (cached if package-lock.json unchanged)
+FROM node:24-alpine AS deps
+
+WORKDIR /app
+
+# Limit Node.js heap to avoid OOM in constrained CI environments
+ENV NODE_OPTIONS="--max-old-space-size=1024"
+
+# Copy package files first for better layer caching
+COPY package.json package-lock.json* ./
+
+RUN --mount=type=cache,target=/root/.npm \
+    npm ci
+
+# Stage 2: Run tests (build with --target test; BuildKit skips this stage otherwise)
+FROM deps AS test
+
+COPY . .
+
+RUN npx vitest run
+
+# Stage 3: Build production bundle
+FROM deps AS builder
+
+COPY . .
+
+# Skip tsc type-checking (vitest already validated); Vite transpiles via SWC
+RUN npx vite build
+
+# Stage 4: Serve with nginx
+FROM nginx:alpine
+
+# Remove default nginx static files
+RUN rm -rf /usr/share/nginx/html/*
+
+WORKDIR /app
+
+COPY --from=builder /app/dist /usr/share/nginx/html
+COPY --from=builder /app/nginx.conf /etc/nginx/conf.d/default.conf
+
+# Let the non-root nginx user write the pid file and cache directories
+RUN touch /var/run/nginx.pid && \
+    chown -R nginx:nginx /usr/share/nginx/html /var/cache/nginx /var/run/nginx.pid
+
+USER nginx
+
+EXPOSE 80
+
+HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
+    CMD wget --no-verbose --tries=1 --spider http://localhost:80/ || exit 1
+
+CMD ["nginx", "-g", "daemon off;"]
+```
+
+Key changes:
+- `# syntax=docker/dockerfile:1` for BuildKit
+- npm cache mount (`/root/.npm`)
+- Non-root `nginx` user (with ownership of the pid file and cache directories)
+- `HEALTHCHECK` with wget (available in alpine)
+
+**Step 2: Test the build locally**
+
+Run:
+```bash
+DOCKER_BUILDKIT=1 docker build --target test -t immoweb:test-stage frontend/
+DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
+```
+Expected: Both builds complete. The first build runs vitest in the `test` stage; the second builds the production image, and BuildKit skips the `test` stage there because the final image does not depend on it.
+
+**Step 3: Commit**
+
+```bash
+git add frontend/Dockerfile
+git commit -m "Optimize frontend Dockerfile: npm cache mount, non-root nginx, healthcheck"
+```
+
+---
+
+### Task 12: Clean up frontend package.json
+
+**Files:**
+- Modify: `frontend/package.json`
+
+**Step 1: Remove unused dependencies**
+
+Remove from `dependencies`:
+- `@tabler/icons-react`
+- `crossfilter2`
+- `date-fns`
+
+**Step 2: Move @types to devDependencies**
+
+Move from `dependencies` to `devDependencies`:
+- `@types/d3`
+- `@types/mapbox-gl`
+- `@types/turf`
+
+**Step 3: Remove @types/crossfilter (crossfilter2 removed)**
+
+Since `crossfilter2` is being removed, remove `@types/crossfilter` from `dependencies` entirely rather than moving it.
+
+**Step 4: Reinstall to update lockfile**
+
+Run:
+```bash
+cd frontend && npm install
+```
+
+**Step 5: Run frontend tests**
+
+Run:
+```bash
+cd frontend && npx vitest run
+```
+Expected: All tests pass.
+
+**Step 6: Commit**
+
+```bash
+git add frontend/package.json frontend/package-lock.json
+git commit -m "Remove unused frontend deps, move @types to devDependencies"
+```
+
+---
+
+### Task 13: Expand frontend .dockerignore
+
+**Files:**
+- Modify: `frontend/.dockerignore`
+
+**Step 1: Add additional exclusions**
+
+Replace `frontend/.dockerignore` with:
+
+```
+node_modules
+dist
+.vite
+coverage
+.git
+.env.local
+.env
+*.md
+```
+
+**Step 2: Commit**
+
+```bash
+git add frontend/.dockerignore
+git commit -m "Expand frontend .dockerignore to exclude build artifacts"
+```
+
+---
+
+### Task 14: Enable BuildKit in Drone CI API pipeline
+
+**Files:**
+- Modify: `.drone.yml`
+
+**Step 1: Add DOCKER_BUILDKIT environment to API build step**
+
+In the `.drone.yml` file, under the `api` pipeline's "Build and test API image" step, add an `environment` block that sets `DOCKER_BUILDKIT`:
+
+```yaml
+  - name: Build and test API image
+    image: plugins/docker
+    environment:
+      DOCKER_BUILDKIT: 1
+    settings:
+      username: viktorbarzin
+      password:
+        from_secret: dockerhub-token
+      repo: viktorbarzin/realestatecrawler
+      dockerfile: Dockerfile
+      context: .
+      cache_from:
+        - viktorbarzin/realestatecrawler:latest
+        - viktorbarzin/realestatecrawler:builder
+      tags:
+        - latest
+        - builder
+        - "${DRONE_BUILD_NUMBER}"
+```
+
+**Step 2: Commit**
+
+```bash
+git add .drone.yml
+git commit -m "Enable BuildKit in Drone API pipeline"
+```
+
+---
+
+### Task 15: Final integration test
+
+**Step 1: Build both images locally**
+
+```bash
+DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
+DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
+```
+Expected: Both build successfully.
+
+**Step 2: Run docker compose up to verify everything works**
+
+```bash
+docker compose up -d --build
+```
+
+**Step 3: Run full test suite**
+
+```bash
+docker compose exec -T app pytest tests/ -v --tb=short
+```
+Expected: All tests pass.
+
+**Step 4: Verify image sizes (informational)**
+
+```bash
+docker images | grep -E "realestate|immoweb"
+```
+
+**Step 5: Commit any remaining changes if needed**
+
+```bash
+git status
+```