Add build optimization implementation plan

Viktor Barzin 2026-02-21 19:36:33 +00:00
parent 54c0999b94
commit d488208a26
No known key found for this signature in database
GPG key ID: 0EB088298288D958

@ -0,0 +1,842 @@
# Build Optimization Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Minimize Docker build times and production image sizes for both backend and frontend.

**Architecture:** Drop the unused `watchdog` dependency; replace `tqdm`, `pandas`, and `apprise` with logging, stdlib `csv`, and a direct webhook call; switch to opencv-python-headless; replace pip with uv in Docker builds; add BuildKit cache mounts; harden images with non-root users and healthchecks.

**Tech Stack:** Python 3.13, uv, Docker BuildKit, node:24-alpine, nginx:alpine
---
### Task 1: Remove `watchdog` from pyproject.toml
**Files:**
- Modify: `pyproject.toml:29` (remove `watchdog = "^6.0.0"`)
**Step 1: Remove the dependency**
In `pyproject.toml`, delete this line:
```
watchdog = "^6.0.0"
```
**Step 2: Verify no imports exist**
Run: `grep -r "import watchdog\|from watchdog" --include="*.py" .`
Expected: No output (zero matches).
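The grep above only matches the two literal import forms and also scans any virtualenv that lives under the working directory. As a cross-check, here is a minimal sketch using the standard `ast` module. The function name `find_imports` and the skipped directory names are assumptions made for illustration; they are not part of the repository.

```python
import ast
from pathlib import Path

# Directories to skip; adjust to the actual repository layout (assumed names).
SKIP_DIRS = {".venv", "venv", "node_modules", ".git"}


def find_imports(root: Path, module: str) -> list[tuple[Path, int]]:
    """Return (file, line) pairs where `module` or one of its submodules is imported."""
    hits: list[tuple[Path, int]] = []
    for path in sorted(root.rglob("*.py")):
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError):
            continue  # unparsable files are ignored in this sketch
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.level == 0:
                names = [node.module or ""]  # absolute "from x import y" only
            else:
                continue
            if any(n == module or n.startswith(module + ".") for n in names):
                hits.append((path, node.lineno))
    return hits
```

Calling `find_imports(Path("."), "watchdog")` from the repository root should return an empty list if the dependency is truly unused.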
**Step 3: Commit**
```bash
git add pyproject.toml
git commit -m "Remove unused watchdog dependency"
```
---
### Task 2: Replace `tqdm` with logging in `services/image_fetcher.py`
**Files:**
- Modify: `services/image_fetcher.py`
**Step 1: Replace tqdm import with asyncio and logging**
Replace:
```python
from tqdm.asyncio import tqdm
```
With:
```python
import asyncio
import logging
logger = logging.getLogger(__name__)
```
(If `asyncio` and `logging` are already imported, or `logger` is already defined, don't duplicate them.)
**Step 2: Replace tqdm.gather with asyncio.gather + logging**
Replace:
```python
async with aiohttp.ClientSession() as session:
    updated_listings = await tqdm.gather(
        *[
            dump_images_for_listing(listing, image_base_path, session=session)
            for listing in listings
        ]
    )
```
With:
```python
async with aiohttp.ClientSession() as session:
    logger.info("Downloading images for %d listings", len(listings))
    updated_listings = await asyncio.gather(
        *[
            dump_images_for_listing(listing, image_base_path, session=session)
            for listing in listings
        ]
    )
    logger.info("Finished downloading images for %d listings", len(listings))
```
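Replacing `tqdm.gather` with a plain `asyncio.gather` loses the intermediate progress output and keeps only a start and a finish message. If periodic progress is still wanted in the logs, one option is a small wrapper like the sketch below. The helper name `gather_with_progress` and its `label` and `every` parameters are hypothetical, not something the plan or the codebase defines.

```python
import asyncio
import logging
from collections.abc import Awaitable, Iterable
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def gather_with_progress(
    aws: Iterable[Awaitable[T]], *, label: str, every: int = 10
) -> list[T]:
    """Like asyncio.gather, but logs a progress line every `every` completions."""
    pending = list(aws)
    total = len(pending)
    done = 0

    async def _tracked(aw: Awaitable[T]) -> T:
        nonlocal done
        result = await aw
        done += 1
        if done % every == 0 or done == total:
            logger.info("%s: %d/%d", label, done, total)
        return result

    # gather preserves input order, so results line up with the inputs
    # even though completions (and therefore log lines) arrive out of order.
    return await asyncio.gather(*(_tracked(aw) for aw in pending))
```

The same wrapper would apply unchanged to the `asyncio.gather` calls introduced in Tasks 3 and 4.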
**Step 3: Run tests**
Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.
**Step 4: Commit**
```bash
git add services/image_fetcher.py
git commit -m "Replace tqdm with logging in image_fetcher"
```
---
### Task 3: Replace `tqdm` with logging in `services/floorplan_detector.py`
**Files:**
- Modify: `services/floorplan_detector.py`
**Step 1: Replace tqdm import with asyncio and logging**
Replace:
```python
from tqdm.asyncio import tqdm
```
With:
```python
import asyncio
import logging
logger = logging.getLogger(__name__)
```
(Skip if already imported.)
**Step 2: Replace tqdm.gather with asyncio.gather + logging**
Replace:
```python
updated_listings = [
    listing
    for listing in await tqdm.gather(
        *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
    )
    if listing is not None
]
```
With:
```python
logger.info("Detecting floorplans for %d listings", len(listings))
updated_listings = [
    listing
    for listing in await asyncio.gather(
        *[_calculate_sqm_ocr(listing, semaphore) for listing in listings]
    )
    if listing is not None
]
logger.info("Finished floorplan detection, %d listings updated", len(updated_listings))
```
**Step 3: Run tests**
Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.
**Step 4: Commit**
```bash
git add services/floorplan_detector.py
git commit -m "Replace tqdm with logging in floorplan_detector"
```
---
### Task 4: Replace `tqdm` with logging in `services/route_calculator.py`
**Files:**
- Modify: `services/route_calculator.py`
**Step 1: Replace tqdm import with asyncio and logging**
Replace:
```python
from tqdm.asyncio import tqdm
```
With:
```python
import asyncio
import logging
logger = logging.getLogger(__name__)
```
(Skip if already imported.)
**Step 2: Replace tqdm.gather with asyncio.gather + logging**
Replace:
```python
destination_mode = DestinationMode(destination_address, travel_mode)
updated_listings = await tqdm.gather(
    *[update_routing_info(listing, destination_mode) for listing in listings],
    total=len(listings),
    desc="Updating routing info",
)
```
With:
```python
destination_mode = DestinationMode(destination_address, travel_mode)
logger.info("Calculating routes for %d listings", len(listings))
updated_listings = await asyncio.gather(
    *[update_routing_info(listing, destination_mode) for listing in listings],
)
logger.info("Finished route calculation for %d listings", len(listings))
```
**Step 3: Run tests**
Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.
**Step 4: Commit**
```bash
git add services/route_calculator.py
git commit -m "Replace tqdm with logging in route_calculator"
```
---
### Task 5: Replace `tqdm` with logging in `repositories/listing_repository.py`
**Files:**
- Modify: `repositories/listing_repository.py`
**Step 1: Replace tqdm import with logging**
Replace:
```python
from tqdm import tqdm
```
With:
```python
import logging
```
Add `logger = logging.getLogger(__name__)` if not already present.
**Step 2: Replace tqdm iterator with plain loop + logging**
Replace:
```python
for listing in tqdm(listings, desc="Upserting listings"):
```
With:
```python
logger.info("Upserting %d listings", len(listings))
for listing in listings:
```
**Step 3: Remove `tqdm` from `pyproject.toml`**
Delete:
```
tqdm = "^4.66.2"
```
**Step 4: Run tests**
Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.
**Step 5: Commit**
```bash
git add repositories/listing_repository.py services/image_fetcher.py services/floorplan_detector.py services/route_calculator.py pyproject.toml
git commit -m "Remove tqdm dependency, replace with logging"
```
---
### Task 6: Replace `pandas` with stdlib `csv` in `csv_exporter.py`
**Files:**
- Modify: `csv_exporter.py`
**Step 1: Rewrite csv_exporter.py**
Replace the entire file with:
```python
import csv
import json
from pathlib import Path

from models.listing import QueryParameters
from repositories.listing_repository import ListingRepository

DROP_COLUMNS = {"_sa_instance_state", "additional_info"}
ENSURE_COLUMNS = ("service_charge", "lease_left", "square_meters")


async def export_to_csv(
    repository: ListingRepository,
    output_file: Path,
    query_parameters: QueryParameters | None = None,
) -> None:
    listings = await repository.get_listings(query_parameters=query_parameters)
    rows = [
        {k: v for k, v in listing.__dict__.items() if k not in DROP_COLUMNS}
        for listing in listings
    ]
    if not rows:
        output_file.write_text("")
        return

    # Read decisions file if present
    decisions: dict[str, str] = {}
    decisions_path = Path("data/decisions.json")
    if decisions_path.exists():
        with open(decisions_path) as f:
            decisions = json.load(f)

    for row in rows:
        # Add decision column
        row["decision"] = decisions.get(str(row.get("id")))
        # Ensure optional columns exist
        for col in ENSURE_COLUMNS:
            row.setdefault(col, None)
        # Replace -1 sentinel in square_meters
        if row.get("square_meters") == -1:
            row["square_meters"] = None
        # Compute price_per_sqm
        sqm = row.get("square_meters")
        price = row.get("price")
        if sqm and sqm > 0 and price:
            row["price_per_sqm"] = round(price / sqm, 2)
        else:
            row["price_per_sqm"] = None

    # Sort by price_per_sqm (None values last)
    rows.sort(key=lambda r: (r["price_per_sqm"] is None, r["price_per_sqm"] or 0))

    fieldnames = list(rows[0].keys())
    with open(output_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```
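The two subtle parts of the rewrite are the `-1` sentinel handling and the sort key that pushes rows without a `price_per_sqm` to the end. The standalone sketch below reproduces just that logic on made-up sample data so it can be checked without a database. The helper names `price_per_sqm`, `sorted_rows`, and `to_csv`, as well as the sample values, are illustrative assumptions.

```python
import csv
import io


def price_per_sqm(price, sqm):
    """Return price per square metre, or None when either input is missing."""
    if sqm and sqm > 0 and price:
        return round(price / sqm, 2)
    return None


def sorted_rows(rows):
    """Normalise the -1 sentinel, add price_per_sqm, and sort with None last."""
    for row in rows:
        if row.get("square_meters") == -1:
            row["square_meters"] = None
        row["price_per_sqm"] = price_per_sqm(row.get("price"), row.get("square_meters"))
    # False sorts before True, so rows with a value come first, in ascending order.
    return sorted(rows, key=lambda r: (r["price_per_sqm"] is None, r["price_per_sqm"] or 0))


def to_csv(rows):
    """Serialise rows the same way the exporter does; None becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample data, not taken from the real database.
SAMPLE = [
    {"id": 1, "price": 300_000, "square_meters": 60},
    {"id": 2, "price": 250_000, "square_meters": -1},  # unknown area
    {"id": 3, "price": 200_000, "square_meters": 80},
]
```

Note that `csv.DictWriter` raises `ValueError` if a later row contains a key missing from `fieldnames`, so the exporter relies on every row having the same set of keys as the first one.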
**Step 2: Remove `pandas` from `pyproject.toml`**
Delete:
```
pandas = "^2.2.1"
```
**Step 3: Run tests**
Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass.
**Step 4: Commit**
```bash
git add csv_exporter.py pyproject.toml
git commit -m "Replace pandas with stdlib csv in csv_exporter"
```
---
### Task 7: Replace `apprise` with direct HTTP in `notifications.py`
**Files:**
- Modify: `notifications.py`
**Step 1: Rewrite notifications.py**
Replace the entire file with:
```python
import json
import logging
import os

import aiohttp

logger = logging.getLogger(__name__)


async def send_notification(body: str, title: str = "") -> bool:
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if not webhook_url:
        logger.debug("No SLACK_WEBHOOK_URL configured, skipping notification")
        return False
    text = f"*{title}*\n{body}" if title else body
    try:
        async with aiohttp.ClientSession() as session:
            async with session.post(
                webhook_url,
                data=json.dumps({"text": text}),
                headers={"Content-Type": "application/json"},
            ) as resp:
                return resp.status == 200
    except Exception:
        logger.exception("Failed to send Slack notification")
        return False
```
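Since the existing tests mock `send_notification` as a whole, the message formatting itself is not exercised. One way to make it testable without any network access would be to factor the payload construction into a pure function, as in this sketch. The name `build_slack_payload` is hypothetical and does not appear in the plan.

```python
import json


def build_slack_payload(body: str, title: str = "") -> str:
    """Build the JSON body for a Slack incoming webhook, bolding the optional title."""
    text = f"*{title}*\n{body}" if title else body
    return json.dumps({"text": text})
```

`send_notification` could then pass `data=build_slack_payload(body, title)` to `session.post`, leaving its behaviour unchanged.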
**Step 2: Remove `apprise` from `pyproject.toml`**
Delete:
```
apprise = "^1.9.3"
```
**Step 3: Run tests**
Run: `pytest tests/ -v --tb=short -q`
Expected: All tests pass (tests mock `send_notification`).
**Step 4: Commit**
```bash
git add notifications.py pyproject.toml
git commit -m "Replace apprise with direct Slack webhook call"
```
---
### Task 8: Switch `opencv-python` to `opencv-python-headless` and add `httpx` to prod deps
**Files:**
- Modify: `pyproject.toml`
**Step 1: Change opencv-python to headless**
Replace:
```
opencv-python = "^4.11.0.86"
```
With:
```
opencv-python-headless = "^4.11.0.86"
```
**Step 2: Move httpx to production dependencies**
Add under `[tool.poetry.dependencies]`:
```
httpx = "^0.27.0"
```
Remove `httpx` from `[tool.poetry.group.dev.dependencies]`.
**Step 3: Commit**
```bash
git add pyproject.toml
git commit -m "Switch to opencv-python-headless, add httpx to prod deps"
```
---
### Task 9: Regenerate requirements.txt
**Files:**
- Modify: `requirements.txt`
**Step 1: Regenerate requirements.txt from updated pyproject.toml**
Run:
```bash
docker compose exec -T app poetry export -f requirements.txt -o requirements.txt
```
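The command above assumes `poetry.lock` is already in sync with the edited `pyproject.toml`. If poetry is not available in the container, or the lock file is stale, run locally (on Poetry 2.x, drop `--no-update`, which is the default behaviour there, and make sure the `poetry-plugin-export` plugin is installed so that `poetry export` is available):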
```bash
poetry lock --no-update && poetry export -f requirements.txt -o requirements.txt
```
**Step 2: Verify the removed packages are gone**
Run:
```bash
grep -i "watchdog\|tqdm\|pandas\|apprise\|opencv-python==" requirements.txt
```
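Expected: No output. The pattern `opencv-python==` deliberately does not match `opencv-python-headless==...`, so any hit means the non-headless package is still pinned. To confirm the headless variant is present, run `grep opencv-python-headless requirements.txt` separately.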
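A substring grep can be fooled by comments or by package names that share a prefix. As an alternative, the sketch below parses each requirement line and compares normalised package names exactly. The function names and the `FORBIDDEN` set are assumptions for illustration, and the parsing only covers the simple `name==version ; markers` lines that `poetry export` typically produces.

```python
import re

# Packages that should no longer appear after the cleanup (from Tasks 1-8).
FORBIDDEN = {"watchdog", "tqdm", "pandas", "apprise", "opencv-python"}


def normalize(name: str) -> str:
    """Normalise a distribution name: lowercase, runs of -, _ and . become a single -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def forbidden_in(requirements_text: str, forbidden: set[str] = FORBIDDEN) -> set[str]:
    """Return the forbidden package names that are pinned in the given requirements text."""
    wanted = {normalize(name) for name in forbidden}
    found: set[str] = set()
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and option lines such as "--hash=sha256:...".
        if not line or line.startswith(("#", "-")):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and normalize(match.group()) in wanted:
            found.add(normalize(match.group()))
    return found
```

Running `forbidden_in(Path("requirements.txt").read_text())` (with `from pathlib import Path`) should return an empty set once the cleanup is complete.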
**Step 3: Commit**
```bash
git add requirements.txt pyproject.toml poetry.lock
git commit -m "Regenerate requirements.txt after dependency cleanup"
```
---
### Task 10: Optimize backend Dockerfile (uv, cache mounts, non-root, healthcheck)
**Files:**
- Modify: `Dockerfile`
**Step 1: Rewrite the Dockerfile**
Replace the entire `Dockerfile` with:
```dockerfile
# syntax=docker/dockerfile:1

# Stage 1: Install build tools and Python dependencies
FROM python:3.13-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
        build-essential \
        gcc \
        python3-dev \
        libopencv-dev \
        libmariadb-dev
WORKDIR /app
COPY requirements.txt ./
# Install dependencies into a venv using uv (typically much faster than pip).
# The --mount flag must follow RUN directly, and uv lives at /bin/uv, not inside the venv.
RUN --mount=type=cache,target=/root/.cache/uv \
    python -m venv /app/.venv && \
    uv pip install --python /app/.venv/bin/python -r requirements.txt

# Stage 2: Runtime system dependencies (runs in parallel with builder)
FROM python:3.13-slim AS runtime-base
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt/lists,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends \
        libglib2.0-0 \
        tesseract-ocr \
        tesseract-ocr-eng \
        libmariadb3 \
        curl

# Stage 3: Test: runtime deps + venv + test dependencies + run tests
# (BuildKit skips this stage unless it is built explicitly with --target test)
FROM runtime-base AS test
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
RUN --mount=type=cache,target=/root/.cache/uv \
    uv pip install --python /app/.venv/bin/python pytest pytest-asyncio pytest-xdist httpx aioresponses fakeredis
COPY . .
RUN pytest tests/ -x -q

# Stage 4: Final image: combine venv from builder + runtime base
FROM runtime-base
RUN adduser --system --no-create-home appuser
WORKDIR /app
# Copy the venv from the builder stage
COPY --from=builder /app/.venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"
# Copy the application code
COPY . .
RUN chown -R appuser /app
USER appuser
EXPOSE 5001
HEALTHCHECK --interval=30s --timeout=10s --retries=3 --start-period=40s \
    CMD curl -f http://localhost:5001/api/status || exit 1
CMD ["sh", "-c", "alembic upgrade head && uvicorn api.app:app --host 0.0.0.0 --port 5001 --no-server-header"]
```
Key changes:
- `# syntax=docker/dockerfile:1` for BuildKit
- `uv` instead of `pip` (copied from official image)
- BuildKit cache mounts for apt and uv
- Removed `libgl1` (not needed with opencv-python-headless)
- Added `curl` to runtime-base (for healthcheck)
- Non-root `appuser`
- `HEALTHCHECK` using existing `/api/status` endpoint
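The healthcheck is the only reason `curl` is installed in the runtime image. Since Python is already present, a possible alternative is a small standard-library probe, sketched below. This is an assumption-laden variation, not what the plan prescribes: the function name `is_healthy` and the idea of shipping it as a script are hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (4xx/5xx) is a URLError subclass; connection errors
        # and timeouts surface as URLError or OSError.
        return False
```

A script that exits with status 1 when `is_healthy("http://localhost:5001/api/status")` returns `False` could then replace the `curl -f` command, at the cost of starting a Python interpreter on every probe.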
**Step 2: Test the build locally**
Run:
```bash
DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
```
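Expected: Build completes. Note that BuildKit only builds stages the final image depends on, so the `test` stage is skipped by this command; to run the tests, build that stage explicitly by adding `--target test`.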
**Step 3: Commit**
```bash
git add Dockerfile
git commit -m "Optimize Dockerfile: uv, BuildKit cache mounts, non-root user, healthcheck"
```
---
### Task 11: Optimize frontend Dockerfile (cache mounts, non-root, healthcheck)
**Files:**
- Modify: `frontend/Dockerfile`
**Step 1: Rewrite the frontend Dockerfile**
Replace the entire `frontend/Dockerfile` with:
```dockerfile
# syntax=docker/dockerfile:1

# Stage 1: Install dependencies (cached if package-lock.json unchanged)
FROM node:24-alpine AS deps
WORKDIR /app
# Limit Node.js heap to avoid OOM in constrained CI environments
ENV NODE_OPTIONS="--max-old-space-size=1024"
# Copy package files first for better layer caching
COPY package.json package-lock.json* ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Stage 2: Run tests (build with --target test; BuildKit skips this stage otherwise)
FROM deps AS test
COPY . .
RUN npx vitest run

# Stage 3: Build production bundle
FROM deps AS builder
COPY . .
# Skip tsc type-checking here; Vite transpiles TypeScript without type-checking
# (note that vitest does not type-check either)
RUN npx vite build

# Stage 4: Serve with nginx
FROM nginx:alpine
# Remove default nginx static files
RUN rm -rf /usr/share/nginx/html/*
WORKDIR /app
COPY --from=builder /app/dist /usr/share/nginx/html
COPY --from=builder /app/nginx.conf /etc/nginx/conf.d/default.conf
# A non-root nginx must be able to write its cache directory and PID file
RUN chown -R nginx:nginx /usr/share/nginx/html /var/cache/nginx && \
    touch /var/run/nginx.pid && \
    chown nginx:nginx /var/run/nginx.pid
USER nginx
EXPOSE 80
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD wget --no-verbose --tries=1 --spider http://localhost:80/ || exit 1
CMD ["nginx", "-g", "daemon off;"]
```
Key changes:
- `# syntax=docker/dockerfile:1` for BuildKit
- npm cache mount (`/root/.npm`)
- Non-root `nginx` user
- `HEALTHCHECK` with wget (available in alpine)
**Step 2: Test the build locally**
Run:
```bash
DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
```
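Expected: Build completes. BuildKit skips the `test` stage because the final image does not depend on it; add `--target test` to run the frontend tests as part of a build.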
**Step 3: Commit**
```bash
git add frontend/Dockerfile
git commit -m "Optimize frontend Dockerfile: npm cache mount, non-root nginx, healthcheck"
```
---
### Task 12: Clean up frontend package.json
**Files:**
- Modify: `frontend/package.json`
**Step 1: Remove unused dependencies**
Remove from `dependencies`:
- `@tabler/icons-react`
- `crossfilter2`
- `date-fns`
**Step 2: Move @types to devDependencies**
Move from `dependencies` to `devDependencies`:
- `@types/d3`
- `@types/mapbox-gl`
- `@types/turf`
**Step 3: Remove @types/crossfilter (crossfilter2 removed)**
Since `crossfilter2` is being removed, remove `@types/crossfilter` entirely instead of moving it to `devDependencies`.
**Step 4: Reinstall to update lockfile**
Run:
```bash
cd frontend && npm install
```
**Step 5: Run frontend tests**
Run:
```bash
cd frontend && npx vitest run
```
Expected: All tests pass.
**Step 6: Commit**
```bash
git add frontend/package.json frontend/package-lock.json
git commit -m "Remove unused frontend deps, move @types to devDependencies"
```
---
### Task 13: Expand frontend .dockerignore
**Files:**
- Modify: `frontend/.dockerignore`
**Step 1: Add additional exclusions**
Replace `frontend/.dockerignore` with:
```
node_modules
dist
.vite
coverage
.git
.env.local
.env
*.md
```
**Step 2: Commit**
```bash
git add frontend/.dockerignore
git commit -m "Expand frontend .dockerignore to exclude build artifacts"
```
---
### Task 14: Enable BuildKit in Drone CI API pipeline
**Files:**
- Modify: `.drone.yml`
**Step 1: Add DOCKER_BUILDKIT environment to API build step**
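In the `.drone.yml` file, under the `api` pipeline's "Build and test API image" step, add the `environment` block that sets `DOCKER_BUILDKIT`. Because BuildKit skips stages the final image does not depend on, the `test` stage will no longer run implicitly in this step; if tests should still gate the push, build that stage explicitly (for example in a separate step using the plugin's `target` setting).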
```yaml
- name: Build and test API image
  image: plugins/docker
  environment:
    DOCKER_BUILDKIT: 1
  settings:
    username: viktorbarzin
    password:
      from_secret: dockerhub-token
    repo: viktorbarzin/realestatecrawler
    dockerfile: Dockerfile
    context: .
    cache_from:
      - viktorbarzin/realestatecrawler:latest
      - viktorbarzin/realestatecrawler:builder
    tags:
      - latest
      - builder
      - "${DRONE_BUILD_NUMBER}"
```
**Step 2: Commit**
```bash
git add .drone.yml
git commit -m "Enable BuildKit in Drone API pipeline"
```
---
### Task 15: Final integration test
**Step 1: Build both images locally**
```bash
DOCKER_BUILDKIT=1 docker build -t realestate-crawler:test .
DOCKER_BUILDKIT=1 docker build -t immoweb:test frontend/
```
Expected: Both build successfully.
**Step 2: Run docker compose up to verify everything works**
```bash
docker compose up -d --build
```
**Step 3: Run full test suite**
```bash
docker compose exec -T app pytest tests/ -v --tb=short
```
Expected: All tests pass.
**Step 4: Verify image sizes (informational)**
```bash
docker images | grep -E "realestate|immoweb"
```
**Step 5: Commit any remaining changes if needed**
```bash
git status
```