wrongmove/docs/plans/2026-02-21-build-optimization-plan.md
Viktor Barzin c7c3331d30
Migrate CI from Drone to Woodpecker
Replace .drone.yml with .woodpecker/ pipeline configs (frontend.yml, api.yml).
Convert Drone env vars to Woodpecker equivalents (CI_PIPELINE_NUMBER, CI_COMMIT_SHA),
use woodpeckerci/plugin-git for clone with retry, woodpeckerci/plugin-slack for
notifications, and plugins/docker for image builds. Update all docs and skills.
2026-02-23 22:00:09 +00:00


Build Optimization Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Minimize Docker build times and production image sizes for both backend and frontend.

Architecture: Remove unused dependencies (watchdog, tqdm, pandas, apprise), switch to opencv-python-headless, replace pip with uv in Docker builds, add BuildKit cache mounts, harden images with non-root users and healthchecks.

Tech Stack: Python 3.13, uv, Docker BuildKit, node:24-alpine, nginx:alpine


Task 1: Remove watchdog from pyproject.toml

Files:

  • Modify: pyproject.toml:29 (remove watchdog = "^6.0.0")

Step 1: Remove the dependency

In pyproject.toml, delete this line:

watchdog = "^6.0.0"

Step 2: Verify no imports exist

Run: grep -r "import watchdog\|from watchdog" --include="*.py" .

Expected: No output (zero matches).

Step 3: Commit

git add pyproject.toml
git commit -m "Remove unused watchdog dependency"

Task 2: Replace tqdm with logging in services/image_fetcher.py

Files:

  • Modify: services/image_fetcher.py

Step 1: Replace tqdm import with asyncio and logging

Replace:

from tqdm.asyncio import tqdm

With:

import asyncio
import logging

logger = logging.getLogger(__name__)

(If asyncio, logging, or logger is already imported or defined, don't duplicate it.)

Step 2: Replace tqdm.gather with asyncio.gather + logging

Replace:

    async with aiohttp.ClientSession() as session:
        updated_listings = await tqdm.gather(
            *[
                dump_images_for_listing(listing, image_base_path, session=session)
                for listing in listings
            ]
        )

With:

    async with aiohttp.ClientSession() as session:
        logger.info("Downloading images for %d listings", len(listings))
        updated_listings = await asyncio.gather(
            *[
                dump_images_for_listing(listing, image_base_path, session=session)
                for listing in listings
            ]
        )
        logger.info("Finished downloading images for %d listings", len(listings))
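Plain asyncio.gather logs nothing between the start and finish messages. If intermediate progress is still wanted once tqdm is gone, one option is to wrap each awaitable so it reports on completion. This is a minimal sketch only: gather_with_progress and its every parameter are hypothetical and do not exist in the codebase.

```python
import asyncio
import logging
from collections.abc import Awaitable, Iterable
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def gather_with_progress(
    awaitables: Iterable[Awaitable[T]], label: str, every: int = 50
) -> list[T]:
    """Hypothetical helper: asyncio.gather plus a log line every `every` completions."""
    pending = list(awaitables)
    total = len(pending)
    done = 0

    async def tracked(aw: Awaitable[T]) -> T:
        nonlocal done
        result = await aw
        done += 1
        if done % every == 0 or done == total:
            logger.info("%s: %d/%d done", label, done, total)
        return result

    # gather preserves input order, so results line up with the input list.
    return await asyncio.gather(*(tracked(aw) for aw in pending))
```

Under that assumption, the call site would pass the same list of dump_images_for_listing(...) coroutines plus a label such as "Downloading images".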

Step 3: Run tests

Run: pytest tests/ -v --tb=short -q

Expected: All tests pass.

Step 4: Commit

git add services/image_fetcher.py
git commit -m "Replace tqdm with logging in image_fetcher"

Task 3: Replace tqdm with logging in services/floorplan_detector.py

Files:

  • Modify: services/floorplan_detector.py

Step 1: Replace tqdm import with asyncio and logging

Replace:

from tqdm.asyncio import tqdm

With:

import asyncio
import logging

logger = logging.getLogger(__name__)

(Skip if already imported.)

Step 2: Replace tqdm.gather with asyncio.gather + logging

Replace:

    updated_listings = [
        listing
        for listing in await tqdm.gather(
            *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
        )
        if listing is not None
    ]

With:

    logger.info("Detecting floorplans for %d listings", len(listings))
    updated_listings = [
        listing
        for listing in await asyncio.gather(
            *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
        )
        if listing is not None
    ]
    logger.info("Finished floorplan detection, %d listings updated", len(updated_listings))

Step 3: Run tests

Run: pytest tests/ -v --tb=short -q

Expected: All tests pass.

Step 4: Commit

git add services/floorplan_detector.py
git commit -m "Replace tqdm with logging in floorplan_detector"

Task 4: Replace tqdm with logging in services/route_calculator.py

Files:

  • Modify: services/route_calculator.py

Step 1: Replace tqdm import with asyncio and logging

Replace:

from tqdm.asyncio import tqdm

With:

import asyncio
import logging

logger = logging.getLogger(__name__)

(Skip if already imported.)

Step 2: Replace tqdm.gather with asyncio.gather + logging

Replace:

    destination_mode = DestinationMode(destination_address, travel_mode)
    updated_listings = await tqdm.gather(
        *[update_routing_info(listing, destination_mode) for listing in listings],
        total=len(listings),
        desc="Updating routing info",
    )

With:

    destination_mode = DestinationMode(destination_address, travel_mode)
    logger.info("Calculating routes for %d listings", len(listings))
    updated_listings = await asyncio.gather(
        *[update_routing_info(listing, destination_mode) for listing in listings],
    )
    logger.info("Finished route calculation for %d listings", len(listings))

Step 3: Run tests

Run: pytest tests/ -v --tb=short -q

Expected: All tests pass.

Step 4: Commit

git add services/route_calculator.py
git commit -m "Replace tqdm with logging in route_calculator"

Task 5: Replace tqdm with logging in repositories/listing_repository.py

Files:

  • Modify: repositories/listing_repository.py

Step 1: Replace tqdm import with logging

Replace:

from tqdm import tqdm

With:

import logging

Add logger = logging.getLogger(__name__) if not already present.

Step 2: Replace tqdm iterator with plain loop + logging

Replace:

        for listing in tqdm(listings, desc="Upserting listings"):

With:

        logger.info("Upserting %d listings", len(listings))
        for listing in listings:

Step 3: Remove tqdm from pyproject.toml

Delete:

tqdm = "^4.66.2"

Step 4: Run tests

Run: pytest tests/ -v --tb=short -q

Expected: All tests pass.

Step 5: Commit

git add repositories/listing_repository.py services/image_fetcher.py services/floorplan_detector.py services/route_calculator.py pyproject.toml
git commit -m "Remove tqdm dependency, replace with logging"

Task 6: Replace pandas with stdlib csv in csv_exporter.py

Files:

  • Modify: csv_exporter.py

Step 1: Rewrite csv_exporter.py

Replace the entire file with:

import csv
import json
from pathlib import Path

from models.listing import QueryParameters
from repositories.listing_repository import ListingRepository


DROP_COLUMNS = {"_sa_instance_state", "additional_info"}
ENSURE_COLUMNS = ("service_charge", "lease_left", "square_meters")


async def export_to_csv(
    repository: ListingRepository,
    output_file: Path,
    query_parameters: QueryParameters | None = None,
) -> None:
    listings = await repository.get_listings(query_parameters=query_parameters)
    rows = [
        {k: v for k, v in listing.__dict__.items() if k not in DROP_COLUMNS}
        for listing in listings
    ]

    if not rows:
        output_file.write_text("")
        return

    # Read decisions file if present
    decisions: dict[str, str] = {}
    decisions_path = Path("data/decisions.json")
    if decisions_path.exists():
        with open(decisions_path) as f:
            decisions = json.load(f)

    for row in rows:
        # Add decision column
        row["decision"] = decisions.get(str(row.get("id")))

        # Ensure optional columns exist
        for col in ENSURE_COLUMNS:
            row.setdefault(col, None)

        # Replace -1 sentinel in square_meters
        if row.get("square_meters") == -1:
            row["square_meters"] = None

        # Compute price_per_sqm
        sqm = row.get("square_meters")
        price = row.get("price")
        if sqm and sqm > 0 and price:
            row["price_per_sqm"] = round(price / sqm, 2)
        else:
            row["price_per_sqm"] = None

    # Sort by price_per_sqm (None values last)
    rows.sort(key=lambda r: (r["price_per_sqm"] is None, r["price_per_sqm"] or 0))

    fieldnames = list(rows[0].keys())
    with open(output_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
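The -1 sentinel handling and the None-last sort key can be checked in isolation, without a repository or database. A minimal sketch using made-up sample rows (the ids, prices, and sizes are illustrative, not project data):

```python
import csv
import io

rows = [
    {"id": 1, "price": 300000, "square_meters": 60},
    {"id": 2, "price": 250000, "square_meters": -1},  # -1 sentinel: size unknown
    {"id": 3, "price": 200000, "square_meters": 80},
]

for row in rows:
    if row["square_meters"] == -1:
        row["square_meters"] = None
    sqm = row["square_meters"]
    row["price_per_sqm"] = round(row["price"] / sqm, 2) if sqm else None

# False sorts before True, so rows with a real value come first,
# ordered by that value; rows without one end up last.
rows.sort(key=lambda r: (r["price_per_sqm"] is None, r["price_per_sqm"] or 0))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)  # None values are written as empty cells
csv_text = buffer.getvalue()
```

The resulting order is id 3, id 1, id 2, and the row with the unknown size has empty square_meters and price_per_sqm cells.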

Step 2: Remove pandas from pyproject.toml

Delete:

pandas = "^2.2.1"

Step 3: Run tests

Run: pytest tests/ -v --tb=short -q

Expected: All tests pass.

Step 4: Commit

git add csv_exporter.py pyproject.toml
git commit -m "Replace pandas with stdlib csv in csv_exporter"

Task 7: Replace apprise with direct HTTP in notifications.py

Files:

  • Modify: notifications.py

Step 1: Rewrite notifications.py

Replace the entire file with:

import json
import logging
import os

import aiohttp

logger = logging.getLogger(__name__)


async def send_notification(body: str, title: str = "") -> bool:
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        logger.debug("No SLACK_WEBHOOK_URL configured, skipping notification")
        return False

    text = f"*{title}*\n{body}" if title else body
    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                webhook_url,
                data=json.dumps({"text": text}),
                headers={"Content-Type": "application/json"},
            ) as resp:
                return resp.status == 200
    except Exception:
        logger.exception("Failed to send Slack notification")
        return False

Step 2: Remove apprise from pyproject.toml

Delete:

apprise = "^1.9.3"

Step 3: Run tests

Run: pytest tests/ -v --tb=short -q

Expected: All tests pass (tests mock send_notification).

Step 4: Commit

git add notifications.py pyproject.toml
git commit -m "Replace apprise with direct Slack webhook call"

Task 8: Switch opencv-python to opencv-python-headless and add httpx to prod deps

Files:

  • Modify: pyproject.toml

Step 1: Change opencv-python to headless

Replace:

opencv-python = "^4.11.0.86"

With:

opencv-python-headless = "^4.11.0.86"

Step 2: Move httpx to production dependencies

Add under [tool.poetry.dependencies]:

httpx = "^0.27.0"

Remove httpx from [tool.poetry.group.dev.dependencies].

Step 3: Commit

git add pyproject.toml
git commit -m "Switch to opencv-python-headless, add httpx to prod deps"

Task 9: Regenerate requirements.txt

Files:

  • Modify: requirements.txt

Step 1: Regenerate requirements.txt from updated pyproject.toml

Run:

docker compose exec -T app poetry export -f requirements.txt -o requirements.txt

If poetry is not available in the container, run locally:

poetry lock --no-update && poetry export -f requirements.txt -o requirements.txt

Step 2: Verify the removed packages are gone

Run:

grep -i "watchdog\|tqdm\|pandas\|apprise\|opencv-python==" requirements.txt

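Expected: No output. The pattern opencv-python== does not match opencv-python-headless==, so the headless package is not reported by this command. Confirm it is present with a separate check: grep -i "opencv-python-headless" requirements.txt should print one entry.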
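Because opencv-python is a prefix of opencv-python-headless, a substring search can easily conflate the two. Comparing exact package names avoids that. The following is a sketch only: check_requirements and package_names are hypothetical helpers, and the parsing assumes pip-style lines as produced by poetry export.

```python
import re

REMOVED = {"watchdog", "tqdm", "pandas", "apprise", "opencv-python"}
REQUIRED = {"opencv-python-headless"}


def package_names(requirements_text: str) -> set[str]:
    """Return normalized package names from pip-style requirement lines."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and option lines such as "--hash=...".
        if not line or line.startswith(("#", "-")):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def check_requirements(requirements_text: str) -> tuple[set[str], set[str]]:
    """Return (removed packages still listed, required packages missing)."""
    names = package_names(requirements_text)
    return REMOVED & names, REQUIRED - names
```

Both returned sets should be empty for a correctly regenerated requirements.txt.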

Step 3: Commit

git add requirements.txt pyproject.toml poetry.lock
git commit -m "Regenerate requirements.txt after dependency cleanup"

Task 10: Optimize backend Dockerfile (uv, cache mounts, non-root, healthcheck)

Files:

  • Modify: Dockerfile

Step 1: Rewrite the Dockerfile

Replace the entire Dockerfile with:

# syntax=docker/dockerfile:1

# Stage 1: Install build tools and Python dependencies
FROM python:3.13-slim AS builder

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    gcc \
    python3-dev \
    libopencv-dev \
    libmariadb-dev

WORKDIR /app

COPY requirements.txt ./

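# Install dependencies into a venv using uv (10-25x faster than pip).
# --mount must directly follow RUN, and uv lives at /bin/uv (copied above), not inside the venv.
RUN --mount=type=cache,target=/root/.cache/uv \
    python -m venv /app/.venv && \
    uv pip install --python /app/.venv/bin/python -r requirements.txt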

# Stage 2: Runtime system dependencies (runs in parallel with builder)
FROM python:3.13-slim AS runtime-base

RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
    libglib2.0-0 \
    tesseract-ocr \
    tesseract-ocr-eng \
    libmariadb3 \
    curl

# Stage 3: Test (runtime deps + venv + test dependencies, then run tests).
# BuildKit builds this stage only when it is targeted, e.g. docker build --target test
FROM runtime-base AS test

WORKDIR /app

COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --python /app/.venv/bin/python pytest pytest-asyncio pytest-xdist httpx aioresponses fakeredis

COPY . .

RUN pytest tests/ -x -q

# Stage 4: Final image — combine venv from builder + runtime base
FROM runtime-base

RUN adduser --system --no-create-home appuser

WORKDIR /app

# Copy the venv from the builder stage
COPY --from=builder /app/.venv /app/.venv

ENV PATH="/app/.venv/bin:$PATH"

# Copy the application code
COPY . .

RUN chown -R appuser /app

USER appuser

EXPOSE 5001

HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=40s \
  CMD curl -f http://localhost:5001/api/status || exit 1

CMD ["sh", "-c", "alembic upgrade head && uvicorn api.app:app --host 0.0.0.0 --port 5001 --no-server-header"]

Key changes:

  • # syntax=docker/dockerfile:1 for BuildKit
  • uv instead of pip (copied from official image)
  • BuildKit cache mounts for apt and uv
  • Removed libgl1 (not needed with opencv-python-headless)
  • Added curl to runtime-base (for healthcheck)
  • Non-root appuser
  • HEALTHCHECK using existing /api/status endpoint
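curl is installed in runtime-base only to serve the HEALTHCHECK. Since the image already ships Python, an alternative would be a stdlib probe, which avoids the extra package. This is a sketch of that alternative, not what the Dockerfile above does: the is_healthy and main names are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses
        # (urlopen raises HTTPError, a URLError subclass, for those).
        return False


def main() -> int:
    # Port and path are taken from the HEALTHCHECK in the Dockerfile above.
    return 0 if is_healthy("http://localhost:5001/api/status") else 1
```

Saved as a script ending in sys.exit(main()), this could be invoked from HEALTHCHECK in place of the curl command; whether the saved megabytes justify the extra file is a judgment call.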

Step 2: Test the build locally

Run:

DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .

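Expected: Build completes. BuildKit only builds stages that the final image depends on, so the test stage is skipped by this command. To run the tests, build that stage explicitly:

DOCKER_BUILDKIT=1 docker build --target test -t realestate-crawler:test-stage .

Expected: Tests pass in the test stage.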

Step 3: Commit

git add Dockerfile
git commit -m "Optimize Dockerfile: uv, BuildKit cache mounts, non-root user, healthcheck"

Task 11: Optimize frontend Dockerfile (cache mounts, non-root, healthcheck)

Files:

  • Modify: frontend/Dockerfile

Step 1: Rewrite the frontend Dockerfile

Replace the entire frontend/Dockerfile with:

# syntax=docker/dockerfile:1

# Stage 1: Install dependencies (cached if package-lock.json unchanged)
FROM node:24-alpine AS deps

WORKDIR /app

# Limit Node.js heap to avoid OOM in constrained CI environments
ENV NODE_OPTIONS="--max-old-space-size=1024"

# Copy package files first for better layer caching
COPY package.json package-lock.json* ./

RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Stage 2: Run tests (fails the build if tests fail).
# BuildKit builds this stage only when it is targeted, e.g. docker build --target test
FROM deps AS test

COPY . .

RUN npx vitest run

# Stage 3: Build production bundle
FROM deps AS builder

COPY . .

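# Skip tsc type-checking to keep the build fast; Vite transpiles via SWC.
# Note: vitest does not type-check, so run tsc --noEmit separately if type errors must fail the build.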
RUN npx vite build

# Stage 4: Serve with nginx
FROM nginx:alpine

# Remove default nginx static files
RUN rm -rf /usr/share/nginx/html/*

WORKDIR /app

COPY --from=builder /app/dist /usr/share/nginx/html
COPY --from=builder /app/nginx.conf /etc/nginx/conf.d/default.conf

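# nginx runs unprivileged below, so it must own its cache directory and PID file
RUN chown -R nginx:nginx /usr/share/nginx/html /var/cache/nginx && \
    touch /var/run/nginx.pid && \
    chown nginx:nginx /var/run/nginx.pid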

USER nginx

EXPOSE 80

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:80/ || exit 1

CMD ["nginx", "-g", "daemon off;"]

Key changes:

  • # syntax=docker/dockerfile:1 for BuildKit
  • npm cache mount (/root/.npm)
  • Non-root nginx user
  • HEALTHCHECK with wget (available in alpine)

Step 2: Test the build locally

Run:

DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/

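Expected: Build completes. BuildKit only builds stages that the final image depends on, so the test stage is skipped by this command. To run the tests, build that stage explicitly:

DOCKER_BUILDKIT=1 docker build --target test -t immoweb:test-stage frontend/

Expected: Tests pass.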

Step 3: Commit

git add frontend/Dockerfile
git commit -m "Optimize frontend Dockerfile: npm cache mount, non-root nginx, healthcheck"

Task 12: Clean up frontend package.json

Files:

  • Modify: frontend/package.json

Step 1: Remove unused dependencies

Remove from dependencies:

  • @tabler/icons-react
  • crossfilter2
  • date-fns
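Before deleting these packages, it helps to confirm that nothing imports them, in the same spirit as the grep check in Task 1. The following is a rough sketch, assuming sources live in a directory such as frontend/src; imported_packages is a hypothetical helper, and the regex is a heuristic that will treat path aliases like "@/components" as package names.

```python
import re
from pathlib import Path

# Matches: from 'pkg', import 'pkg', import('pkg'), require('pkg')
IMPORT_RE = re.compile(r"""(?:from|import\(?|require\()\s*['"]([^'"]+)['"]""")
SOURCE_SUFFIXES = {".ts", ".tsx", ".js", ".jsx"}


def imported_packages(src_dir: Path) -> set[str]:
    """Collect package names referenced by import/require statements under src_dir."""
    found = set()
    for path in src_dir.rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        for spec in IMPORT_RE.findall(path.read_text(encoding="utf-8")):
            if spec.startswith((".", "/")):
                continue  # relative or absolute path, not a package
            parts = spec.split("/")
            # Scoped packages keep two segments: "@scope/name".
            found.add("/".join(parts[:2]) if spec.startswith("@") else parts[0])
    return found
```

A package listed above is safe to remove only if it is absent from the returned set and is not referenced from config files or scripts, which this scan does not cover.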

Step 2: Move @types to devDependencies

Move from dependencies to devDependencies:

  • @types/d3
  • @types/mapbox-gl
  • @types/turf

Step 3: Remove @types/crossfilter (crossfilter2 removed)

Since crossfilter2 is being removed, remove @types/crossfilter entirely rather than moving it to devDependencies.

Step 4: Reinstall to update lockfile

Run:

cd frontend && npm install

Step 5: Run frontend tests

Run:

cd frontend && npx vitest run

Expected: All tests pass.

Step 6: Commit

git add frontend/package.json frontend/package-lock.json
git commit -m "Remove unused frontend deps, move @types to devDependencies"

Task 13: Expand frontend .dockerignore

Files:

  • Modify: frontend/.dockerignore

Step 1: Add additional exclusions

Replace frontend/.dockerignore with:

node_modules
dist
.vite
coverage
.git
.env.local
.env
*.md

Step 2: Commit

git add frontend/.dockerignore
git commit -m "Expand frontend .dockerignore to exclude build artifacts"

Task 14: Enable BuildKit in Woodpecker CI API pipeline

Files:

  • Modify: .woodpecker/api.yml

Step 1: Add DOCKER_BUILDKIT environment to API build step

In .woodpecker/api.yml, under the build-api-image step, add an environment block that sets DOCKER_BUILDKIT: 1:

  - name: build-api-image
    image: plugins/docker
    environment:
      DOCKER_BUILDKIT: 1
    settings:
      username: viktorbarzin
      password:
        from_secret: dockerhub-token
      repo: viktorbarzin/realestatecrawler
      dockerfile: Dockerfile
      context: .
      cache_from:
        - viktorbarzin/realestatecrawler:latest
        - viktorbarzin/realestatecrawler:builder
      tags:
        - latest
        - builder
        - "${CI_PIPELINE_NUMBER}"

Step 2: Commit

git add .woodpecker/api.yml
git commit -m "Enable BuildKit in Woodpecker API pipeline"

Task 15: Final integration test

Step 1: Build both images locally

DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/

Expected: Both build successfully.

Step 2: Run docker compose up to verify everything works

docker compose up -d --build

Step 3: Run full test suite

docker compose exec -T app pytest tests/ -v --tb=short

Expected: All tests pass.

Step 4: Verify image sizes (informational)

docker images | grep -E "realestate|immoweb"

Step 5: Commit any remaining changes if needed

git status