# Build Optimization Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Minimize Docker build times and production image sizes for both backend and frontend.

**Architecture:** Remove unused dependencies (watchdog, tqdm, pandas, apprise), switch to opencv-python-headless, replace pip with uv in Docker builds, add BuildKit cache mounts, harden images with non-root users and healthchecks.

**Tech Stack:** Python 3.13, uv, Docker BuildKit, node:24-alpine, nginx:alpine

---

### Task 1: Remove `watchdog` from pyproject.toml

**Files:**
- Modify: `pyproject.toml:29` (remove `watchdog = "^6.0.0"`)

**Step 1: Remove the dependency**

In `pyproject.toml`, delete this line:
```
watchdog = "^6.0.0"
```

**Step 2: Verify no imports exist**

Run: `grep -r "import watchdog\|from watchdog" --include="*.py" .`
Expected: No output (zero matches).

**Step 3: Commit**

```bash
git add pyproject.toml
git commit -m "Remove unused watchdog dependency"
```

---

### Task 2: Replace `tqdm` with logging in `services/image_fetcher.py`

**Files:**
- Modify: `services/image_fetcher.py`

**Step 1: Replace tqdm import with asyncio and logging**

Replace:
```python
from tqdm.asyncio import tqdm
```

With:
```python
import asyncio
import logging

logger = logging.getLogger(__name__)
```

(If `asyncio`, `logging`, or `logger` are already imported or defined, don't duplicate them.)

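
Dropping `tqdm` also drops the per-item progress display. If intermediate progress is still wanted in the logs, one option is a small wrapper around `asyncio.gather`. The following is only a sketch: `gather_with_progress`, its `label` and `every` parameters are hypothetical names that are not part of this plan, and the tasks below use plain `asyncio.gather` instead.

```python
import asyncio
import logging
from collections.abc import Awaitable, Iterable
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def gather_with_progress(
    awaitables: Iterable[Awaitable[T]], label: str, every: int = 10
) -> list[T]:
    """Await everything concurrently and log a progress line every `every` completions.

    Hypothetical helper: results come back in input order, as with asyncio.gather.
    """
    pending = list(awaitables)
    total = len(pending)
    done = 0

    async def tracked(aw: Awaitable[T]) -> T:
        nonlocal done
        result = await aw
        done += 1
        # Log on every Nth completion and once more when the last item finishes.
        if done % every == 0 or done == total:
            logger.info("%s: %d/%d", label, done, total)
        return result

    return await asyncio.gather(*(tracked(aw) for aw in pending))
```

Under these assumptions, a call such as `await gather_with_progress([...], "Downloading images")` could stand in for `asyncio.gather(*[...])` wherever progress lines are worth the extra helper.
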
**Step 2: Replace tqdm.gather with asyncio.gather + logging**

Replace:
```python
async with aiohttp.ClientSession() as session:
    updated_listings = await tqdm.gather(
        *[
            dump_images_for_listing(listing, image_base_path, session=session)
            for listing in listings
        ]
    )
```

With:
```python
async with aiohttp.ClientSession() as session:
    logger.info("Downloading images for %d listings", len(listings))
    updated_listings = await asyncio.gather(
        *[
            dump_images_for_listing(listing, image_base_path, session=session)
            for listing in listings
        ]
    )
    logger.info("Finished downloading images for %d listings", len(listings))
```

**Step 3: Run tests**

Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.

**Step 4: Commit**

```bash
git add services/image_fetcher.py
git commit -m "Replace tqdm with logging in image_fetcher"
```

---

### Task 3: Replace `tqdm` with logging in `services/floorplan_detector.py`

**Files:**
- Modify: `services/floorplan_detector.py`

**Step 1: Replace tqdm import with asyncio and logging**

Replace:
```python
from tqdm.asyncio import tqdm
```

With:
```python
import asyncio
import logging

logger = logging.getLogger(__name__)
```

(Skip if already imported.)

**Step 2: Replace tqdm.gather with asyncio.gather + logging**

Replace:
```python
updated_listings = [
    listing
    for listing in await tqdm.gather(
        *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
    )
    if listing is not None
]
```

With:
```python
logger.info("Detecting floorplans for %d listings", len(listings))
updated_listings = [
    listing
    for listing in await asyncio.gather(
        *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
    )
    if listing is not None
]
logger.info("Finished floorplan detection, %d listings updated", len(updated_listings))
```

**Step 3: Run tests**

Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.

**Step 4: Commit**

```bash
git add services/floorplan_detector.py
git commit -m "Replace tqdm with logging in floorplan_detector"
```

---

### Task 4: Replace `tqdm` with logging in `services/route_calculator.py`

**Files:**
- Modify: `services/route_calculator.py`

**Step 1: Replace tqdm import with asyncio and logging**

Replace:
```python
from tqdm.asyncio import tqdm
```

With:
```python
import asyncio
import logging

logger = logging.getLogger(__name__)
```

(Skip if already imported.)

**Step 2: Replace tqdm.gather with asyncio.gather + logging**

Replace:
```python
destination_mode = DestinationMode(destination_address, travel_mode)
updated_listings = await tqdm.gather(
    *[update_routing_info(listing, destination_mode) for listing in listings],
    total=len(listings),
    desc="Updating routing info",
)
```

With:
```python
destination_mode = DestinationMode(destination_address, travel_mode)
logger.info("Calculating routes for %d listings", len(listings))
updated_listings = await asyncio.gather(
    *[update_routing_info(listing, destination_mode) for listing in listings],
)
logger.info("Finished route calculation for %d listings", len(listings))
```

**Step 3: Run tests**

Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.

**Step 4: Commit**

```bash
git add services/route_calculator.py
git commit -m "Replace tqdm with logging in route_calculator"
```

---

### Task 5: Replace `tqdm` with logging in `repositories/listing_repository.py`

**Files:**
- Modify: `repositories/listing_repository.py`

**Step 1: Replace tqdm import with logging**

Replace:
```python
from tqdm import tqdm
```

With:
```python
import logging
```

Add `logger = logging.getLogger(__name__)` if not already present.

**Step 2: Replace tqdm iterator with plain loop + logging**

Replace:
```python
for listing in tqdm(listings, desc="Upserting listings"):
```

With:
```python
logger.info("Upserting %d listings", len(listings))
for listing in listings:
```

**Step 3: Remove `tqdm` from `pyproject.toml`**

Delete:
```
tqdm = "^4.66.2"
```

**Step 4: Run tests**

Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.

**Step 5: Commit**

```bash
git add repositories/listing_repository.py services/image_fetcher.py services/floorplan_detector.py services/route_calculator.py pyproject.toml
git commit -m "Remove tqdm dependency, replace with logging"
```

---

### Task 6: Replace `pandas` with stdlib `csv` in `csv_exporter.py`

**Files:**
- Modify: `csv_exporter.py`

**Step 1: Rewrite csv_exporter.py**

Replace the entire file with:

```python
import csv
import json
from pathlib import Path

from models.listing import QueryParameters
from repositories.listing_repository import ListingRepository

DROP_COLUMNS = {"_sa_instance_state", "additional_info"}
ENSURE_COLUMNS = ("service_charge", "lease_left", "square_meters")


async def export_to_csv(
    repository: ListingRepository,
    output_file: Path,
    query_parameters: QueryParameters | None = None,
) -> None:
    listings = await repository.get_listings(query_parameters=query_parameters)
    rows = [
        {k: v for k, v in listing.__dict__.items() if k not in DROP_COLUMNS}
        for listing in listings
    ]
    if not rows:
        output_file.write_text("")
        return

    # Read decisions file if present
    decisions: dict[str, str] = {}
    decisions_path = Path("data/decisions.json")
    if decisions_path.exists():
        with open(decisions_path) as f:
            decisions = json.load(f)

    for row in rows:
        # Add decision column
        row["decision"] = decisions.get(str(row.get("id")))
        # Ensure optional columns exist
        for col in ENSURE_COLUMNS:
            row.setdefault(col, None)
        # Replace -1 sentinel in square_meters
        if row.get("square_meters") == -1:
            row["square_meters"] = None
        # Compute price_per_sqm
        sqm = row.get("square_meters")
        price = row.get("price")
        if sqm and sqm > 0 and price:
            row["price_per_sqm"] = round(price / sqm, 2)
        else:
            row["price_per_sqm"] = None

    # Sort by price_per_sqm (None values last)
    rows.sort(key=lambda r: (r["price_per_sqm"] is None, r["price_per_sqm"] or 0))

    fieldnames = list(rows[0].keys())
    with open(output_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

**Step 2: Remove `pandas` from `pyproject.toml`**

Delete:
```
pandas = "^2.2.1"
```

**Step 3: Run tests**

Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.

**Step 4: Commit**

```bash
git add csv_exporter.py pyproject.toml
git commit -m "Replace pandas with stdlib csv in csv_exporter"
```

---

### Task 7: Replace `apprise` with direct HTTP in `notifications.py`

**Files:**
- Modify: `notifications.py`

**Step 1: Rewrite notifications.py**

Replace the entire file with:

```python
import json
import logging
import os

import aiohttp

logger = logging.getLogger(__name__)


async def send_notification(body: str, title: str = "") -> bool:
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        logger.debug("No SLACK_WEBHOOK_URL configured, skipping notification")
        return False

    text = f"*{title}*\n{body}" if title else body
    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                webhook_url,
                data=json.dumps({"text": text}),
                headers={"Content-Type": "application/json"},
            ) as resp:
                return resp.status == 200
    except Exception:
        logger.exception("Failed to send Slack notification")
        return False
```

**Step 2: Remove `apprise` from `pyproject.toml`**

Delete:
```
apprise = "^1.9.3"
```

**Step 3: Run tests**

Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass (tests mock `send_notification`).

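
Because the existing tests mock `send_notification`, the message formatting itself is not exercised. A minimal sketch of how it could be checked without network access, assuming the formatting were factored into a helper; `build_slack_payload` is a hypothetical name and is not part of the `notifications.py` shown in Step 1:

```python
import json


def build_slack_payload(body: str, title: str = "") -> str:
    """Return the JSON body that would be posted to the Slack webhook."""
    # Same rule as in send_notification: a bold title line first, if a title is given.
    text = f"*{title}*\n{body}" if title else body
    return json.dumps({"text": text})
```

With such a helper, a unit test could assert on the decoded payload for both the titled and untitled cases.
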
**Step 4: Commit**

```bash
git add notifications.py pyproject.toml
git commit -m "Replace apprise with direct Slack webhook call"
```

---

### Task 8: Switch `opencv-python` to `opencv-python-headless` and add `httpx` to prod deps

**Files:**
- Modify: `pyproject.toml`

**Step 1: Change opencv-python to headless**

Replace:
```
opencv-python = "^4.11.0.86"
```

With:
```
opencv-python-headless = "^4.11.0.86"
```

**Step 2: Move httpx to production dependencies**

Add under `[tool.poetry.dependencies]`:
```
httpx = "^0.27.0"
```

Remove `httpx` from `[tool.poetry.group.dev.dependencies]`.

**Step 3: Commit**

```bash
git add pyproject.toml
git commit -m "Switch to opencv-python-headless, add httpx to prod deps"
```

---

### Task 9: Regenerate requirements.txt

**Files:**
- Modify: `requirements.txt`

**Step 1: Regenerate requirements.txt from updated pyproject.toml**

Run:
```bash
docker compose exec -T app poetry export -f requirements.txt -o requirements.txt
```

If poetry is not available in the container, run locally:
```bash
poetry lock --no-update && poetry export -f requirements.txt -o requirements.txt
```

**Step 2: Verify the removed packages are gone**

Run:
```bash
grep -i "watchdog\|tqdm\|pandas\|apprise\|opencv-python==" requirements.txt
```

Expected: No output. The pattern `opencv-python==` matches only the non-headless package, so the pinned `opencv-python-headless` line does not count as a match; `opencv-python-headless` should be the only OpenCV package left in the file.

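
The same check can be expressed in Python when an exact package-name comparison is preferred over substring matching. This is a sketch only: `pinned_names`, `leftover_packages`, and the `REMOVED` set are hypothetical names, and the parsing assumes the usual `name==version` lines with `--hash` continuation lines that `poetry export` writes.

```python
import re

# Packages this plan removes; "opencv-python" is the non-headless build.
REMOVED = {"watchdog", "tqdm", "pandas", "apprise", "opencv-python"}


def pinned_names(requirements_text: str) -> set[str]:
    """Collect normalized package names from requirements.txt content."""
    names: set[str] = set()
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and option/continuation lines such as --hash=...
        if not line or line.startswith(("#", "-")):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower().replace("_", "-"))
    return names


def leftover_packages(requirements_text: str) -> set[str]:
    """Return the removed packages that are still pinned."""
    return pinned_names(requirements_text) & REMOVED
```

Calling `leftover_packages(Path("requirements.txt").read_text())` would then be expected to return an empty set after regeneration, while `opencv-python-headless` remains in `pinned_names(...)`.
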
**Step 3: Commit**

```bash
git add requirements.txt pyproject.toml poetry.lock
git commit -m "Regenerate requirements.txt after dependency cleanup"
```

---

### Task 10: Optimize backend Dockerfile (uv, cache mounts, non-root, healthcheck)

**Files:**
- Modify: `Dockerfile`

**Step 1: Rewrite the Dockerfile**

Replace the entire `Dockerfile` with:

```dockerfile
# syntax=docker/dockerfile:1

# Stage 1: Install build tools and Python dependencies
FROM python:3.13-slim AS builder

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    gcc \
    python3-dev \
    libopencv-dev \
    libmariadb-dev

WORKDIR /app

COPY requirements.txt ./

# Install dependencies into a venv using uv (10-25x faster than pip).
# --mount flags must directly follow RUN; uv lives at /bin/uv, not inside the venv.
RUN --mount=type=cache,target=/root/.cache/uv \
    python -m venv /app/.venv && \
    uv pip install --python /app/.venv/bin/python -r requirements.txt

# Stage 2: Runtime system dependencies (runs in parallel with builder)
FROM python:3.13-slim AS runtime-base

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    libglib2.0-0 \
    tesseract-ocr \
    tesseract-ocr-eng \
    libmariadb3 \
    curl

# Stage 3: Test (runtime deps + venv + test dependencies + run tests)
FROM runtime-base AS test

WORKDIR /app

COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --python /app/.venv/bin/python pytest pytest-asyncio pytest-xdist httpx aioresponses fakeredis

COPY . .

RUN pytest tests/ -x -q

# Stage 4: Final image (combine venv from builder + runtime base)
FROM runtime-base

RUN adduser --system --no-create-home appuser

WORKDIR /app

# Copy the venv from the builder stage
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

# Copy the application code
COPY . .

RUN chown -R appuser /app

USER appuser

EXPOSE 5001

HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=40s \
    CMD curl -f http://localhost:5001/api/status || exit 1

CMD ["sh", "-c", "alembic upgrade head && uvicorn api.app:app --host 0.0.0.0 --port 5001 --no-server-header"]
```

Key changes:
- `# syntax=docker/dockerfile:1` for BuildKit
- `uv` instead of `pip` (copied from official image)
- BuildKit cache mounts for apt and uv
- Removed `libgl1` (not needed with opencv-python-headless)
- Added `curl` to runtime-base (for healthcheck)
- Non-root `appuser`
- `HEALTHCHECK` using existing `/api/status` endpoint

**Step 2: Test the build locally**

Run:
```bash
DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
```

Expected: Build completes. Tests pass in the test stage.

**Step 3: Commit**

```bash
git add Dockerfile
git commit -m "Optimize Dockerfile: uv, BuildKit cache mounts, non-root user, healthcheck"
```

---

### Task 11: Optimize frontend Dockerfile (cache mounts, non-root, healthcheck)

**Files:**
- Modify: `frontend/Dockerfile`

**Step 1: Rewrite the frontend Dockerfile**

Replace the entire `frontend/Dockerfile` with:

```dockerfile
# syntax=docker/dockerfile:1

# Stage 1: Install dependencies (cached if package-lock.json unchanged)
FROM node:24-alpine AS deps

WORKDIR /app

# Limit Node.js heap to avoid OOM in constrained CI environments
ENV NODE_OPTIONS="--max-old-space-size=1024"

# Copy package files first for better layer caching
COPY package.json package-lock.json* ./

RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Stage 2: Run tests (fails the build if tests fail)
FROM deps AS test

COPY . .

RUN npx vitest run

# Stage 3: Build production bundle
FROM deps AS builder

COPY . .

# Skip tsc type-checking (vitest already validated); Vite transpiles via SWC
RUN npx vite build

# Stage 4: Serve with nginx
FROM nginx:alpine

# Remove default nginx static files
RUN rm -rf /usr/share/nginx/html/*

WORKDIR /app

COPY --from=builder /app/dist /usr/share/nginx/html
COPY --from=builder /app/nginx.conf /etc/nginx/conf.d/default.conf

# The non-root nginx user must own the paths nginx writes to at startup
RUN chown -R nginx:nginx /usr/share/nginx/html /var/cache/nginx && \
    touch /var/run/nginx.pid && \
    chown nginx:nginx /var/run/nginx.pid

USER nginx

EXPOSE 80

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:80/ || exit 1

CMD ["nginx", "-g", "daemon off;"]
```

Key changes:
- `# syntax=docker/dockerfile:1` for BuildKit
- npm cache mount (`/root/.npm`)
- Non-root `nginx` user, with ownership of the cache directory and pid file it writes to
- `HEALTHCHECK` with wget (available in alpine)

**Step 2: Test the build locally**

Run:
```bash
DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
```

Expected: Build completes. Tests pass.

**Step 3: Commit**

```bash
git add frontend/Dockerfile
git commit -m "Optimize frontend Dockerfile: npm cache mount, non-root nginx, healthcheck"
```

---

### Task 12: Clean up frontend package.json

**Files:**
- Modify: `frontend/package.json`

**Step 1: Remove unused dependencies**

Remove from `dependencies`:
- `@tabler/icons-react`
- `crossfilter2`
- `date-fns`

**Step 2: Move @types to devDependencies**

Move from `dependencies` to `devDependencies`:
- `@types/crossfilter`
- `@types/d3`
- `@types/mapbox-gl`
- `@types/turf`

**Step 3: Remove @types/crossfilter (crossfilter2 removed)**

Since `crossfilter2` is being removed, also remove `@types/crossfilter` entirely.

**Step 4: Reinstall to update lockfile**

Run:
```bash
cd frontend && npm install
```

**Step 5: Run frontend tests**

Run:
```bash
cd frontend && npx vitest run
```

Expected: All tests pass.

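
Before committing, the move of `@types/*` packages out of `dependencies` can be double-checked with a few lines of Python. This is a sketch, not part of the plan's tooling; `misplaced_type_packages` is a hypothetical helper name.

```python
import json


def misplaced_type_packages(package_json_text: str) -> list[str]:
    """List @types/* packages that are still under runtime `dependencies`."""
    manifest = json.loads(package_json_text)
    runtime_deps = manifest.get("dependencies", {})
    return sorted(name for name in runtime_deps if name.startswith("@types/"))
```

Run against the contents of `frontend/package.json`, the function would be expected to return an empty list once Steps 2 and 3 are done.
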
**Step 6: Commit**

```bash
git add frontend/package.json frontend/package-lock.json
git commit -m "Remove unused frontend deps, move @types to devDependencies"
```

---

### Task 13: Expand frontend .dockerignore

**Files:**
- Modify: `frontend/.dockerignore`

**Step 1: Add additional exclusions**

Replace `frontend/.dockerignore` with:

```
node_modules
dist
.vite
coverage
.git
.env.local
.env
*.md
```

**Step 2: Commit**

```bash
git add frontend/.dockerignore
git commit -m "Expand frontend .dockerignore to exclude build artifacts"
```

---

### Task 14: Enable BuildKit in Drone CI API pipeline

**Files:**
- Modify: `.drone.yml`

**Step 1: Add DOCKER_BUILDKIT environment to API build step**

In the `.drone.yml` file, under the `api` pipeline's "Build and test API image" step, add an `environment` block that sets `DOCKER_BUILDKIT`:

```yaml
- name: Build and test API image
  image: plugins/docker
  environment:
    DOCKER_BUILDKIT: 1
  settings:
    username: viktorbarzin
    password:
      from_secret: dockerhub-token
    repo: viktorbarzin/realestatecrawler
    dockerfile: Dockerfile
    context: .
    cache_from:
      - viktorbarzin/realestatecrawler:latest
      - viktorbarzin/realestatecrawler:builder
    tags:
      - latest
      - builder
      - "${DRONE_BUILD_NUMBER}"
```

**Step 2: Commit**

```bash
git add .drone.yml
git commit -m "Enable BuildKit in Drone API pipeline"
```

---

### Task 15: Final integration test

**Step 1: Build both images locally**

```bash
DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
```

Expected: Both build successfully.

**Step 2: Run docker compose up to verify everything works**

```bash
docker compose up -d --build
```

**Step 3: Run full test suite**

```bash
docker compose exec -T app pytest tests/ -v --tb=short
```

Expected: All tests pass.

**Step 4: Verify image sizes (informational)**

```bash
docker images | grep -E "realestate|immoweb"
```

**Step 5: Commit any remaining changes if needed**

```bash
git status
```