Build Optimization Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Minimize Docker build times and production image sizes for both backend and frontend.
Architecture: Remove unused dependencies (watchdog, tqdm, pandas, apprise), switch to opencv-python-headless, replace pip with uv in Docker builds, add BuildKit cache mounts, harden images with non-root users and healthchecks.
Tech Stack: Python 3.13, uv, Docker BuildKit, node:24-alpine, nginx:alpine
Task 1: Remove watchdog from pyproject.toml
Files:
- Modify:
pyproject.toml:29 (remove watchdog = "^6.0.0")
Step 1: Remove the dependency
In pyproject.toml, delete this line:
watchdog = "^6.0.0"
Step 2: Verify no imports exist
Run: grep -r "import watchdog\|from watchdog" --include="*.py" .
Expected: No output (zero matches).
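The grep above only catches the two most common import spellings; it would miss a line such as "import os, watchdog". A stricter check can walk the syntax tree instead. The following is a minimal sketch using only the standard library; the helper name find_imports is hypothetical and not part of the project.

```python
import ast
from pathlib import Path


def find_imports(root: Path, package: str) -> list[tuple[str, int]]:
    """Return (file, line) pairs where `package` or one of its submodules is imported."""
    hits: list[tuple[str, int]] = []
    for path in sorted(root.rglob("*.py")):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                names = [node.module or ""]
            else:
                continue
            # Match the package itself or a dotted submodule, but not "watchdogs"
            if any(n == package or n.startswith(package + ".") for n in names):
                hits.append((str(path), node.lineno))
    return hits
```

Calling find_imports(Path("."), "watchdog") from the repository root should return an empty list before the dependency is removed.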
Step 3: Commit
git add pyproject.toml
git commit -m "Remove unused watchdog dependency"
Task 2: Replace tqdm with logging in services/image_fetcher.py
Files:
- Modify:
services/image_fetcher.py
Step 1: Replace tqdm import with asyncio and logging
Replace:
from tqdm.asyncio import tqdm
With:
import asyncio
import logging
logger = logging.getLogger(__name__)
(If asyncio, logging, or logger is already imported or defined in the file, don't duplicate it.)
Step 2: Replace tqdm.gather with asyncio.gather + logging
Replace:
async with aiohttp.ClientSession() as session:
    updated_listings = await tqdm.gather(
        *[
            dump_images_for_listing(listing, image_base_path, session=session)
            for listing in listings
        ]
    )
With:
async with aiohttp.ClientSession() as session:
    logger.info("Downloading images for %d listings", len(listings))
    updated_listings = await asyncio.gather(
        *[
            dump_images_for_listing(listing, image_base_path, session=session)
            for listing in listings
        ]
    )
    logger.info("Finished downloading images for %d listings", len(listings))
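Plain asyncio.gather reports nothing while the tasks run, which is the one thing tqdm.gather provided. If intermediate progress is still wanted in the logs, one option is a thin wrapper around gather. This is a sketch only; gather_with_progress and its every parameter are hypothetical and not part of the plan.

```python
import asyncio
import logging
from collections.abc import Awaitable, Iterable
from typing import TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


async def gather_with_progress(
    awaitables: Iterable[Awaitable[T]], label: str, every: int = 10
) -> list[T]:
    """Like asyncio.gather, but logs a progress line every `every` completions."""
    items = list(awaitables)
    total = len(items)
    done = 0

    async def tracked(aw: Awaitable[T]) -> T:
        nonlocal done
        try:
            return await aw
        finally:
            # Count the task whether it succeeded or raised
            done += 1
            if done % every == 0 or done == total:
                logger.info("%s: %d/%d done", label, done, total)

    # gather preserves input order, so results line up with the inputs
    return await asyncio.gather(*(tracked(aw) for aw in items))
```

Under these assumptions, the call site would pass the same list of coroutines it passes to asyncio.gather today, plus a label such as "Downloading images".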
Step 3: Run tests
Run: pytest tests/ -v --tb=short -q
Expected: All tests pass.
Step 4: Commit
git add services/image_fetcher.py
git commit -m "Replace tqdm with logging in image_fetcher"
Task 3: Replace tqdm with logging in services/floorplan_detector.py
Files:
- Modify:
services/floorplan_detector.py
Step 1: Replace tqdm import with asyncio and logging
Replace:
from tqdm.asyncio import tqdm
With:
import asyncio
import logging
logger = logging.getLogger(__name__)
(Skip if already imported.)
Step 2: Replace tqdm.gather with asyncio.gather + logging
Replace:
updated_listings = [
    listing
    for listing in await tqdm.gather(
        *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
    )
    if listing is not None
]
With:
logger.info("Detecting floorplans for %d listings", len(listings))
updated_listings = [
    listing
    for listing in await asyncio.gather(
        *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
    )
    if listing is not None
]
logger.info("Finished floorplan detection, %d listings updated", len(updated_listings))
Step 3: Run tests
Run: pytest tests/ -v --tb=short -q
Expected: All tests pass.
Step 4: Commit
git add services/floorplan_detector.py
git commit -m "Replace tqdm with logging in floorplan_detector"
Task 4: Replace tqdm with logging in services/route_calculator.py
Files:
- Modify:
services/route_calculator.py
Step 1: Replace tqdm import with asyncio and logging
Replace:
from tqdm.asyncio import tqdm
With:
import asyncio
import logging
logger = logging.getLogger(__name__)
(Skip if already imported.)
Step 2: Replace tqdm.gather with asyncio.gather + logging
Replace:
destination_mode = DestinationMode(destination_address, travel_mode)
updated_listings = await tqdm.gather(
    *[update_routing_info(listing, destination_mode) for listing in listings],
    total=len(listings),
    desc="Updating routing info",
)
With:
destination_mode = DestinationMode(destination_address, travel_mode)
logger.info("Calculating routes for %d listings", len(listings))
updated_listings = await asyncio.gather(
    *[update_routing_info(listing, destination_mode) for listing in listings],
)
logger.info("Finished route calculation for %d listings", len(listings))
Step 3: Run tests
Run: pytest tests/ -v --tb=short -q
Expected: All tests pass.
Step 4: Commit
git add services/route_calculator.py
git commit -m "Replace tqdm with logging in route_calculator"
Task 5: Replace tqdm with logging in repositories/listing_repository.py
Files:
- Modify:
repositories/listing_repository.py
Step 1: Replace tqdm import with logging
Replace:
from tqdm import tqdm
With:
import logging
Add logger = logging.getLogger(__name__) if not already present.
Step 2: Replace tqdm iterator with plain loop + logging
Replace:
for listing in tqdm(listings, desc="Upserting listings"):
With:
logger.info("Upserting %d listings", len(listings))
for listing in listings:
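A single log line before the loop is enough for small batches. For long upserts, a small generator can restore periodic progress output without bringing tqdm back. This is a sketch under that assumption; log_progress is a hypothetical helper, not existing project code.

```python
import logging
from collections.abc import Iterable, Iterator
from typing import TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def log_progress(
    items: Iterable[T], total: int, label: str, every: int = 100
) -> Iterator[T]:
    """Yield items unchanged, logging a progress line every `every` items."""
    for count, item in enumerate(items, start=1):
        yield item
        # Runs after the caller has finished processing the yielded item
        if count % every == 0 or count == total:
            logger.info("%s: %d/%d", label, count, total)
```

The loop would then read: for listing in log_progress(listings, len(listings), "Upserting listings").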
Step 3: Remove tqdm from pyproject.toml
Delete:
tqdm = "^4.66.2"
Step 4: Run tests
Run: pytest tests/ -v --tb=short -q
Expected: All tests pass.
Step 5: Commit
git add repositories/listing_repository.py services/image_fetcher.py services/floorplan_detector.py services/route_calculator.py pyproject.toml
git commit -m "Remove tqdm dependency, replace with logging"
Task 6: Replace pandas with stdlib csv in csv_exporter.py
Files:
- Modify:
csv_exporter.py
Step 1: Rewrite csv_exporter.py
Replace the entire file with:
import csv
import json
from pathlib import Path

from models.listing import QueryParameters
from repositories.listing_repository import ListingRepository

DROP_COLUMNS = {"_sa_instance_state", "additional_info"}
ENSURE_COLUMNS = ("service_charge", "lease_left", "square_meters")


async def export_to_csv(
    repository: ListingRepository,
    output_file: Path,
    query_parameters: QueryParameters | None = None,
) -> None:
    listings = await repository.get_listings(query_parameters=query_parameters)
    rows = [
        {k: v for k, v in listing.__dict__.items() if k not in DROP_COLUMNS}
        for listing in listings
    ]
    if not rows:
        output_file.write_text("")
        return

    # Read decisions file if present
    decisions: dict[str, str] = {}
    decisions_path = Path("data/decisions.json")
    if decisions_path.exists():
        with open(decisions_path) as f:
            decisions = json.load(f)

    for row in rows:
        # Add decision column
        row["decision"] = decisions.get(str(row.get("id")))
        # Ensure optional columns exist
        for col in ENSURE_COLUMNS:
            row.setdefault(col, None)
        # Replace -1 sentinel in square_meters
        if row.get("square_meters") == -1:
            row["square_meters"] = None
        # Compute price_per_sqm
        sqm = row.get("square_meters")
        price = row.get("price")
        if sqm and sqm > 0 and price:
            row["price_per_sqm"] = round(price / sqm, 2)
        else:
            row["price_per_sqm"] = None

    # Sort by price_per_sqm (None values last)
    rows.sort(key=lambda r: (r["price_per_sqm"] is None, r["price_per_sqm"] or 0))

    fieldnames = list(rows[0].keys())
    with open(output_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
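Two details of the rewrite are easy to miss: the tuple sort key pushes rows without a price_per_sqm to the end, and csv.DictWriter writes None as an empty cell. The sketch below shows both with made-up sample rows; the ids and prices are illustrative only.

```python
import csv
import io


def sort_key(row: dict) -> tuple[bool, float]:
    """Order rows by price_per_sqm, pushing rows without a value to the end."""
    value = row["price_per_sqm"]
    # False sorts before True, so rows with a value come first
    return (value is None, value or 0)


# Made-up sample rows for illustration
rows = [
    {"id": 1, "price_per_sqm": None},
    {"id": 2, "price_per_sqm": 7500.0},
    {"id": 3, "price_per_sqm": 4200.5},
]
rows.sort(key=sort_key)

# Write to an in-memory buffer instead of a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "price_per_sqm"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

After this runs, the rows are ordered 3, 2, 1 and the last CSV line is "1," with an empty price_per_sqm cell.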
Step 2: Remove pandas from pyproject.toml
Delete:
pandas = "^2.2.1"
Step 3: Run tests
Run: pytest tests/ -v --tb=short -q
Expected: All tests pass.
Step 4: Commit
git add csv_exporter.py pyproject.toml
git commit -m "Replace pandas with stdlib csv in csv_exporter"
Task 7: Replace apprise with direct HTTP in notifications.py
Files:
- Modify:
notifications.py
Step 1: Rewrite notifications.py
Replace the entire file with:
import json
import logging
import os

import aiohttp

logger = logging.getLogger(__name__)


async def send_notification(body: str, title: str = "") -> bool:
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        logger.debug("No SLACK_WEBHOOK_URL configured, skipping notification")
        return False
    text = f"*{title}*\n{body}" if title else body
    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                webhook_url,
                data=json.dumps({"text": text}),
                headers={"Content-Type": "application/json"},
            ) as resp:
                return resp.status == 200
    except Exception:
        logger.exception("Failed to send Slack notification")
        return False
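The message formatting is the only logic in this module that does not need the network, so it can be checked on its own. A minimal sketch, assuming the formatting were pulled into a separate function; build_slack_payload is a hypothetical name and the plan itself keeps the logic inline.

```python
import json


def build_slack_payload(body: str, title: str = "") -> str:
    """Build the JSON body for a Slack incoming webhook, mirroring send_notification."""
    # A non-empty title is rendered in bold on its own line above the body
    text = f"*{title}*\n{body}" if title else body
    return json.dumps({"text": text})
```

With a title, the payload text is the bolded title followed by a newline and the body; without one, it is the body alone.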
Step 2: Remove apprise from pyproject.toml
Delete:
apprise = "^1.9.3"
Step 3: Run tests
Run: pytest tests/ -v --tb=short -q
Expected: All tests pass (tests mock send_notification).
Step 4: Commit
git add notifications.py pyproject.toml
git commit -m "Replace apprise with direct Slack webhook call"
Task 8: Switch opencv-python to opencv-python-headless and add httpx to prod deps
Files:
- Modify:
pyproject.toml
Step 1: Change opencv-python to headless
Replace:
opencv-python = "^4.11.0.86"
With:
opencv-python-headless = "^4.11.0.86"
Step 2: Move httpx to production dependencies
Add under [tool.poetry.dependencies]:
httpx = "^0.27.0"
Remove httpx from [tool.poetry.group.dev.dependencies].
Step 3: Commit
git add pyproject.toml
git commit -m "Switch to opencv-python-headless, add httpx to prod deps"
Task 9: Regenerate requirements.txt
Files:
- Modify:
requirements.txt
Step 1: Regenerate requirements.txt from updated pyproject.toml
Run:
docker compose exec -T app poetry export -f requirements.txt -o requirements.txt
If poetry is not available in the container, run locally:
poetry lock --no-update && poetry export -f requirements.txt -o requirements.txt
Step 2: Verify the removed packages are gone
Run:
grep -i "watchdog\|tqdm\|pandas\|apprise\|opencv-python==" requirements.txt
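Expected: No output. The pattern opencv-python== does not match opencv-python-headless==, so a clean requirements.txt produces zero matches. To confirm the headless package is present, also run grep -i "opencv-python-headless" requirements.txt and expect one match.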
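Substring matching is what makes the opencv-python versus opencv-python-headless check awkward with grep. Comparing exact package names avoids that. The sketch below parses requirements.txt content with a simple regular expression; pinned_names and leftover_packages are hypothetical helpers, and the parsing assumes the name-first line layout that poetry export typically produces.

```python
import re

# Packages this plan removes; opencv-python is replaced by the headless build
REMOVED = {"watchdog", "tqdm", "pandas", "apprise", "opencv-python"}


def pinned_names(requirements_text: str) -> set[str]:
    """Extract normalized package names from requirements.txt content."""
    names: set[str] = set()
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")):
            continue  # skip blanks, comments, options and --hash continuation lines
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # Normalize separators and case so Foo_Bar and foo-bar compare equal
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def leftover_packages(requirements_text: str) -> set[str]:
    """Return removed packages that are still pinned in the given content."""
    return pinned_names(requirements_text) & REMOVED
```

Running leftover_packages(Path("requirements.txt").read_text()) after regeneration should return an empty set.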
Step 3: Commit
git add requirements.txt pyproject.toml poetry.lock
git commit -m "Regenerate requirements.txt after dependency cleanup"
Task 10: Optimize backend Dockerfile (uv, cache mounts, non-root, healthcheck)
Files:
- Modify:
Dockerfile
Step 1: Rewrite the Dockerfile
Replace the entire Dockerfile with:
# syntax=docker/dockerfile:1
# Stage 1: Install build tools and Python dependencies
FROM python:3.13-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt-get update && apt-get install -y --no-install-recommends \
build-essential \
gcc \
python3-dev \
libopencv-dev \
libmariadb-dev
WORKDIR /app
COPY requirements.txt ./
# Install dependencies into a venv using uv (10-25x faster than pip)
RUN --mount=type=cache,target=/root/.cache/uv \
    python -m venv /app/.venv && \
    uv pip install --python /app/.venv/bin/python -r requirements.txt
# Stage 2: Runtime system dependencies (runs in parallel with builder)
FROM python:3.13-slim AS runtime-base
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
apt-get update && apt-get install -y --no-install-recommends \
libglib2.0-0 \
tesseract-ocr \
tesseract-ocr-eng \
libmariadb3 \
curl
# Stage 3: Test — runtime deps + venv + test dependencies + run tests
FROM runtime-base AS test
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
RUN --mount=type=cache,target=/root/.cache/uv \
uv pip install --python /app/.venv/bin/python pytest pytest-asyncio pytest-xdist httpx aioresponses fakeredis
COPY . .
RUN pytest tests/ -x -q
# Stage 4: Final image — combine venv from builder + runtime base
FROM runtime-base
RUN adduser --system --no-create-home appuser
WORKDIR /app
# Copy the venv from the builder stage
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
# Copy the application code
COPY . .
RUN chown -R appuser /app
USER appuser
EXPOSE 5001
HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=40s \
CMD curl -f http://localhost:5001/api/status || exit 1
CMD ["sh", "-c", "alembic upgrade head && uvicorn api.app:app --host 0.0.0.0 --port 5001 --no-server-header"]
Key changes:
- # syntax=docker/dockerfile:1 for BuildKit
- uv instead of pip (copied from official image)
- BuildKit cache mounts for apt and uv
- Removed libgl1 (not needed with opencv-python-headless)
- Added curl to runtime-base (for healthcheck)
- Non-root appuser
- HEALTHCHECK using existing /api/status endpoint
Step 2: Test the build locally
Run:
DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
Expected: Build completes. Tests pass in the test stage.
Step 3: Commit
git add Dockerfile
git commit -m "Optimize Dockerfile: uv, BuildKit cache mounts, non-root user, healthcheck"
Task 11: Optimize frontend Dockerfile (cache mounts, non-root, healthcheck)
Files:
- Modify:
frontend/Dockerfile
Step 1: Rewrite the frontend Dockerfile
Replace the entire frontend/Dockerfile with:
# syntax=docker/dockerfile:1
# Stage 1: Install dependencies (cached if package-lock.json unchanged)
FROM node:24-alpine AS deps
WORKDIR /app
# Limit Node.js heap to avoid OOM in constrained CI environments
ENV NODE_OPTIONS="--max-old-space-size=1024"
# Copy package files first for better layer caching
COPY package.json package-lock.json* ./
RUN --mount=type=cache,target=/root/.npm \
npm ci
# Stage 2: Run tests (fails the build if tests fail)
FROM deps AS test
COPY . .
RUN npx vitest run
# Stage 3: Build production bundle
FROM deps AS builder
COPY . .
# Skip tsc type-checking (vitest already validated); Vite transpiles via SWC
RUN npx vite build
# Stage 4: Serve with nginx
FROM nginx:alpine
# Remove default nginx static files
RUN rm -rf /usr/share/nginx/html/*
WORKDIR /app
COPY --from=builder /app/dist /usr/share/nginx/html
COPY --from=builder /app/nginx.conf /etc/nginx/conf.d/default.conf
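# nginx runs as a non-root user below, so it must own its cache directory and PID file
RUN chown -R nginx:nginx /usr/share/nginx/html /var/cache/nginx && \
    touch /var/run/nginx.pid && \
    chown nginx:nginx /var/run/nginx.pid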
USER nginx
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:80/ || exit 1
CMD ["nginx", "-g", "daemon off;"]
Key changes:
- # syntax=docker/dockerfile:1 for BuildKit
- npm cache mount (/root/.npm)
- Non-root nginx user
- HEALTHCHECK with wget (available in alpine)
Step 2: Test the build locally
Run:
DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
Expected: Build completes. Tests pass.
Step 3: Commit
git add frontend/Dockerfile
git commit -m "Optimize frontend Dockerfile: npm cache mount, non-root nginx, healthcheck"
Task 12: Clean up frontend package.json
Files:
- Modify:
frontend/package.json
Step 1: Remove unused dependencies
Remove from dependencies:
- @tabler/icons-react
- crossfilter2
- date-fns
Step 2: Move @types to devDependencies
Move from dependencies to devDependencies:
- @types/crossfilter
- @types/d3
- @types/mapbox-gl
- @types/turf
Step 3: Remove @types/crossfilter (crossfilter2 removed)
Since crossfilter2 is being removed, also remove @types/crossfilter entirely.
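Steps 1 to 3 amount to one transformation of the manifest: drop the unused packages, then move every remaining @types/* entry to devDependencies. A sketch of that transformation on a parsed package.json follows; clean_package_json is a hypothetical helper, and editing the file by hand is equally valid.

```python
def clean_package_json(manifest: dict, remove: set[str]) -> dict:
    """Return a copy with `remove` dropped and @types/* moved to devDependencies."""
    deps = dict(manifest.get("dependencies", {}))
    dev_deps = dict(manifest.get("devDependencies", {}))
    # Drop removed packages from both sections
    for name in remove:
        deps.pop(name, None)
        dev_deps.pop(name, None)
    # Type packages are only needed at build time
    for name in [n for n in deps if n.startswith("@types/")]:
        dev_deps[name] = deps.pop(name)
    return {
        **manifest,
        "dependencies": deps,
        "devDependencies": dict(sorted(dev_deps.items())),
    }
```

For this task the remove set would be {"@tabler/icons-react", "crossfilter2", "date-fns", "@types/crossfilter"}. The input manifest is left unmodified, so the result can be diffed against the original before writing it back.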
Step 4: Reinstall to update lockfile
Run:
cd frontend && npm install
Step 5: Run frontend tests
Run:
cd frontend && npx vitest run
Expected: All tests pass.
Step 6: Commit
git add frontend/package.json frontend/package-lock.json
git commit -m "Remove unused frontend deps, move @types to devDependencies"
Task 13: Expand frontend .dockerignore
Files:
- Modify:
frontend/.dockerignore
Step 1: Add additional exclusions
Replace frontend/.dockerignore with:
node_modules
dist
.vite
coverage
.git
.env.local
.env
*.md
Step 2: Commit
git add frontend/.dockerignore
git commit -m "Expand frontend .dockerignore to exclude build artifacts"
Task 14: Enable BuildKit in Drone CI API pipeline
Files:
- Modify:
.drone.yml
Step 1: Add DOCKER_BUILDKIT environment to API build step
In the .drone.yml file, under the api pipeline's "Build and test API image" step, add an environment block that sets DOCKER_BUILDKIT:
- name: Build and test API image
  image: plugins/docker
  environment:
    DOCKER_BUILDKIT: 1
  settings:
    username: viktorbarzin
    password:
      from_secret: dockerhub-token
    repo: viktorbarzin/realestatecrawler
    dockerfile: Dockerfile
    context: .
    cache_from:
      - viktorbarzin/realestatecrawler:latest
      - viktorbarzin/realestatecrawler:builder
    tags:
      - latest
      - builder
      - "${DRONE_BUILD_NUMBER}"
Step 2: Commit
git add .drone.yml
git commit -m "Enable BuildKit in Drone API pipeline"
Task 15: Final integration test
Step 1: Build both images locally
DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
Expected: Both build successfully.
Step 2: Run docker compose up to verify everything works
docker compose up -d --build
Step 3: Run full test suite
docker compose exec -T app pytest tests/ -v --tb=short
Expected: All tests pass.
Step 4: Verify image sizes (informational)
docker images | grep -E "realestate|immoweb"
Step 5: Commit any remaining changes if needed
git status