Add time-to-first-property implementation plan

7 tasks: K8s resource limits, adaptive first batch, deferred decision fetch, Server-Timing headers, bulk POI distances endpoint, frontend waterfall elimination, and end-to-end verification.
2026-02-22 13:16:42 +00:00 · 2026-02-22 13:16:42 +00:00 · 8db7b60493
commit 8db7b60493
parent 6d653dba63
1 changed files with 683 additions and 0 deletions
--- a/docs/plans/2026-02-22-time-to-first-property-plan.md
+++ b/docs/plans/2026-02-22-time-to-first-property-plan.md
@ -0,0 +1,683 @@
+# Time-to-First-Property Performance Optimization — Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Reduce time-to-first-property from 5s+ to under 1s by eliminating CPU throttling, frontend waterfalls, and pre-stream blocking work.
+
+**Architecture:** Six independent changes: (1) K8s resource limits fix, (2) adaptive first-batch sizing on the backend, (3) deferred decision-ID fetch, (4) Server-Timing headers, (5) bulk POI distances endpoint, (6) frontend waterfall elimination. Tasks 1-4 are backend-only; task 5 spans backend+frontend; task 6 is frontend-only.
+
+**Tech Stack:** Python/FastAPI, Redis, K8s YAML, React/TypeScript, NDJSON streaming
+
+---
+
+### Task 1: Set Explicit API Pod Resources in K8s
+
+The `realestate-crawler-api` deployment inherits LimitRange defaults (500m CPU / 1Gi memory). The 500m CPU limit causes kernel throttling during streaming requests (~240-460ms lost per request).
+
+**Files:**
+- Create: `k8s/api-deployment.yaml`
+
+**Step 1: Create the K8s deployment manifest**
+
+Extract the current deployment, then set explicit resources. Run:
+
+```bash
+KUBECONFIG=.config kubectl get deployment realestate-crawler-api -n realestate-crawler -o yaml > k8s/api-deployment.yaml
+```
+
+Then edit `k8s/api-deployment.yaml` to set these resource values on the container spec:
+
+```yaml
+resources:
+  requests:
+    cpu: "50m"
+    memory: "128Mi"
+  limits:
+    cpu: "2000m"
+    memory: "1Gi"
+```
+
+Remove server-managed fields (`managedFields`, `resourceVersion`, `uid`, `creationTimestamp`, `generation`, `status`) to make it a clean declarative manifest.
+
+**Step 2: Also create a manifest for celery-beat**
+
+Extract and clean up `realestate-crawler-celery-beat` the same way. It only needs minimal resources:
+
+```yaml
+resources:
+  requests:
+    cpu: "10m"
+    memory: "64Mi"
+  limits:
+    cpu: "200m"
+    memory: "256Mi"
+```
+
+**Step 3: Apply and verify**
+
+```bash
+KUBECONFIG=.config kubectl apply -f k8s/api-deployment.yaml
+KUBECONFIG=.config kubectl rollout status deployment/realestate-crawler-api -n realestate-crawler
+```
+
+Verify throttling is eliminated by re-running the performance test from the design doc and checking `nr_throttled` delta is 0.
+
+**Step 4: Commit**
+
+```bash
+git add k8s/api-deployment.yaml k8s/celery-beat-deployment.yaml
+git commit -m "Set explicit resource limits for API and celery-beat pods
+
+Overrides LimitRange defaults (500m CPU) which caused kernel CPU
+throttling during streaming requests. API gets 2000m CPU limit,
+celery-beat gets 200m."
+```
+
+---
+
+### Task 2: Adaptive First Batch Size
+
+Send a small "primer" batch (5 features) immediately, then switch to normal batch size. This reduces first-property latency by ~10x on the server side.
+
+**Files:**
+- Modify: `api/app.py:55` (add `FIRST_BATCH_SIZE` constant)
+- Modify: `api/app.py:271-311` (`_stream_from_cache`)
+- Modify: `api/app.py:314-383` (`_stream_from_db`)
+- Test: `tests/test_listing_geojson.py`
+
+**Step 1: Write the failing test**
+
+Add to `tests/test_listing_geojson.py` in `TestStreamingEndpoint`:
+
+```python
+def test_first_batch_is_smaller(self, client, mock_repository):
+    """Test that the first batch is smaller than subsequent batches for fast first paint."""
+    response = client.get("/api/listing_geojson/stream?listing_type=RENT&batch_size=50&limit=10")
+    lines = response.text.strip().split("\n")
+    messages = [json.loads(line) for line in lines]
+
+    batch_messages = [m for m in messages if m["type"] == "batch"]
+    assert len(batch_messages) >= 1
+
+    # First batch should contain FIRST_BATCH_SIZE (5) or fewer features
+    from api.app import FIRST_BATCH_SIZE
+    assert len(batch_messages[0]["features"]) <= FIRST_BATCH_SIZE
+```
+
+**Step 2: Run test to verify it fails**
+
+```bash
+pytest tests/test_listing_geojson.py::TestStreamingEndpoint::test_first_batch_is_smaller -v
+```
+
+Expected: FAIL — `FIRST_BATCH_SIZE` not defined, or first batch has 50 features.
+
+**Step 3: Implement adaptive batching**
+
+In `api/app.py`, add the constant near line 55:
+
+```python
+FIRST_BATCH_SIZE = 5
+```
+
+In `_stream_from_cache` (around line 291), replace the single-batch-size loop with adaptive logic:
+
+```python
+count = 0
+is_first_batch = True
+for feature_batch in get_cached_features(query_parameters, batch_size=batch_size):
+    # Apply decision filtering
+    if decision_filter != "everything":
+        feature_batch = [
+            f for f in feature_batch
+            if _should_include(
+                f.get("properties", {}).get("id", 0),
+                decision_filter,
+                disliked_ids,
+                liked_ids,
+            )
+        ]
+    if limit and count + len(feature_batch) > limit:
+        feature_batch = feature_batch[:limit - count]
+    count += len(feature_batch)
+    if feature_batch:
+        if is_first_batch:
+            # Yield a small primer batch for fast first paint
+            yield json.dumps({"type": "batch", "features": feature_batch[:FIRST_BATCH_SIZE]}) + "\n"
+            remainder = feature_batch[FIRST_BATCH_SIZE:]
+            if remainder:
+                yield json.dumps({"type": "batch", "features": remainder}) + "\n"
+            is_first_batch = False
+        else:
+            yield json.dumps({"type": "batch", "features": feature_batch}) + "\n"
+    if limit and count >= limit:
+        break
+```
+
+In `_stream_from_db` (around line 358-366), replace the batch accumulation logic:
+
+```python
+count = 0
+batch: list[dict] = []
+is_first_batch = True
+current_batch_target = FIRST_BATCH_SIZE
+for row in repository.stream_listings_optimized(
+    query_parameters, limit=limit, page_size=batch_size
+):
+    feature = convert_row_to_geojson(row, query_parameters.listing_type.value)
+    if poi_distances_lookup and row['id'] in poi_distances_lookup:
+        feature['properties']['poi_distances'] = poi_distances_lookup[row['id']]
+
+    if not _should_include(row['id'], decision_filter, disliked_ids, liked_ids):
+        if staging_key:
+            cache_features_batch_staged(staging_key, [feature])
+        continue
+
+    batch.append(feature)
+    count += 1
+
+    if len(batch) >= current_batch_target:
+        if staging_key:
+            cache_features_batch_staged(staging_key, batch)
+        yield json.dumps({"type": "batch", "features": batch}) + "\n"
+        batch = []
+        if is_first_batch:
+            is_first_batch = False
+            current_batch_target = batch_size
+```
+
+**Step 4: Run tests**
+
+```bash
+pytest tests/test_listing_geojson.py -v
+```
+
+Expected: All pass, including `test_first_batch_is_smaller`.
+
+**Step 5: Commit**
+
+```bash
+git add api/app.py tests/test_listing_geojson.py
+git commit -m "Send smaller first batch (5 features) for faster first paint
+
+Subsequent batches use the normal batch_size (default 50). This
+reduces server-side time-to-first-property by ~10x since only 5
+features need to be serialized before the first yield."
+```
+
+---
+
+### Task 3: Defer Decision ID Fetch
+
+Move the decision ID lookup out of the blocking pre-stream path. Start streaming immediately, fetch decision IDs concurrently.
+
+**Files:**
+- Modify: `api/app.py:385-459` (`stream_listing_geojson`)
+- Modify: `api/app.py:271-311` (`_stream_from_cache`)
+- Modify: `api/app.py:314-383` (`_stream_from_db`)
+- Test: `tests/test_listing_geojson.py`
+
+**Step 1: Write the failing test**
+
+Add to `tests/test_listing_geojson.py` in `TestStreamingEndpoint`:
+
+```python
+def test_streaming_still_filters_disliked(self, client):
+    """Test that disliked listings are filtered even with deferred decision loading."""
+    with patch("api.app.get_cached_count", return_value=None), \
+         patch("api.app._get_user_id_safe", return_value=42), \
+         patch("api.app.ListingRepository") as MockRepo, \
+         patch("api.app.DecisionRepository"), \
+         patch("api.app.decision_service") as mock_ds:
+
+        mock_ds.get_disliked_ids.return_value = {2}
+
+        mock_instance = MagicMock()
+        mock_instance.count_listings.return_value = 3
+        mock_instance.stream_listings_optimized.return_value = iter([
+            {
+                'id': 1, 'price': 2000.0, 'number_of_bedrooms': 2,
+                'square_meters': 50.0, 'longitude': -0.1, 'latitude': 51.5,
+                'photo_thumbnail': None, 'last_seen': datetime.now(),
+                'agency': None, 'price_history_json': '[]', 'available_from': None,
+            },
+            {
+                'id': 2, 'price': 2500.0, 'number_of_bedrooms': 2,
+                'square_meters': 60.0, 'longitude': -0.12, 'latitude': 51.51,
+                'photo_thumbnail': None, 'last_seen': datetime.now(),
+                'agency': None, 'price_history_json': '[]', 'available_from': None,
+            },
+        ])
+        MockRepo.return_value = mock_instance
+
+        response = client.get("/api/listing_geojson/stream?listing_type=RENT&limit=10")
+        messages = [json.loads(line) for line in response.text.strip().split("\n")]
+
+        all_features = []
+        for m in messages:
+            if m["type"] == "batch":
+                all_features.extend(m["features"])
+
+        # Listing 2 should be filtered out
+        feature_ids = [f["properties"]["id"] for f in all_features]
+        assert 2 not in feature_ids
+        assert 1 in feature_ids
+```
+
+**Step 2: Run test to verify it passes (existing behavior)**
+
+```bash
+pytest tests/test_listing_geojson.py::TestStreamingEndpoint::test_streaming_still_filters_disliked -v
+```
+
+Expected: PASS — this is a behavior-preservation test.
+
+**Step 3: Refactor to defer decision loading**
+
+In `stream_listing_geojson` (line ~415-431), replace the synchronous decision loading with a lazy loader pattern. Pass the user email and decision_filter to the generators, and let them fetch IDs on demand:
+
+Change the `stream_listing_geojson` endpoint to pass `user_email` and `decision_filter` to the generators instead of pre-fetched ID sets. The generators call `_get_user_id_safe` and `decision_service` internally after yielding the metadata message.
+
+In both `_stream_from_cache` and `_stream_from_db`, add parameters `user_email: str | None = None` and `decision_filter: str = "all"`. After yielding the metadata message, resolve the decision IDs:
+
+```python
+# Resolve decision IDs (deferred to after metadata is sent)
+disliked_ids: set[int] | None = None
+liked_ids: set[int] | None = None
+if decision_filter != "everything" and user_email:
+    user_id = _get_user_id_safe(user_email)
+    if user_id is not None:
+        decision_repo = DecisionRepository(engine)
+        listing_type_str = query_parameters.listing_type.value
+        if decision_filter == "liked":
+            liked_ids = decision_service.get_liked_ids(
+                decision_repo, user_id, listing_type_str
+            )
+        else:
+            disliked_ids = decision_service.get_disliked_ids(
+                decision_repo, user_id, listing_type_str
+            )
+```
+
+Remove the `disliked_ids` and `liked_ids` parameters from the generator signatures and the pre-fetch from `stream_listing_geojson`.
+
+**Step 4: Run all tests**
+
+```bash
+pytest tests/test_listing_geojson.py -v
+```
+
+Expected: All pass — decision filtering still works, but now happens after the metadata message.
+
+**Step 5: Commit**
+
+```bash
+git add api/app.py tests/test_listing_geojson.py
+git commit -m "Defer decision ID fetch to after metadata message
+
+Decision IDs are now loaded inside the streaming generators after
+the metadata message is yielded, eliminating a blocking DB query
+from the pre-stream path (~100-200ms improvement to TTFB)."
+```
+
+---
+
+### Task 4: Add Server-Timing Headers
+
+Add `Server-Timing` headers to the streaming response so latency breakdown is visible in browser DevTools.
+
+**Files:**
+- Modify: `api/app.py:385-459` (`stream_listing_geojson`)
+- Test: `tests/test_listing_geojson.py`
+
+**Step 1: Write the failing test**
+
+Add to `TestStreamingEndpoint`:
+
+```python
+def test_streaming_includes_server_timing_header(self, client, mock_repository):
+    """Test that streaming response includes Server-Timing header."""
+    response = client.get("/api/listing_geojson/stream?listing_type=RENT&limit=10")
+    assert response.status_code == 200
+    assert "server-timing" in response.headers
+    # Should contain at least the cache_check timing
+    assert "cache_check" in response.headers["server-timing"]
+```
+
+**Step 2: Run test to verify it fails**
+
+```bash
+pytest tests/test_listing_geojson.py::TestStreamingEndpoint::test_streaming_includes_server_timing_header -v
+```
+
+Expected: FAIL — no `server-timing` header.
+
+**Step 3: Implement Server-Timing**
+
+In `stream_listing_geojson`, add timing around the cache check and build a `Server-Timing` header:
+
+```python
+import time
+
+# ... inside stream_listing_geojson, after rate limit logic:
+timings: list[str] = []
+
+t0 = time.monotonic()
+cached_count = get_cached_count(query_parameters)
+timings.append(f"cache_check;dur={(time.monotonic() - t0) * 1000:.1f}")
+
+# ... rest of the logic to choose generator ...
+
+return StreamingResponse(
+    generator,
+    media_type="application/x-ndjson",
+    headers={
+        "Cache-Control": "no-cache",
+        "X-Accel-Buffering": "no",
+        "Server-Timing": ", ".join(timings),
+    }
+)
+```
+
+**Step 4: Run tests**
+
+```bash
+pytest tests/test_listing_geojson.py -v
+```
+
+Expected: All pass.
+
+**Step 5: Commit**
+
+```bash
+git add api/app.py tests/test_listing_geojson.py
+git commit -m "Add Server-Timing header to streaming endpoint
+
+Reports cache_check latency in the response header, visible in
+browser DevTools Network tab for ongoing performance monitoring."
+```
+
+---
+
+### Task 5: Bulk POI Distances Endpoint
+
+Add a new endpoint that returns all POI distances for a user in one call, keyed by listing ID. This decouples POI distances from the listing stream, allowing the stream to always hit the Redis cache.
+
+**Files:**
+- Modify: `api/poi_routes.py` (add new endpoint)
+- Modify: `frontend/src/services/poiService.ts` (add bulk fetch function)
+- Modify: `frontend/src/services/index.ts` (re-export)
+- Modify: `frontend/src/constants/index.ts` (add endpoint constant)
+- Test: `tests/unit/test_poi_validation.py` or new test file
+
+**Step 1: Write the failing test**
+
+Create or add to a test file for the POI routes:
+
+```python
+# In tests/test_listing_geojson.py or a new tests/test_poi_distances_bulk.py
+
+def test_bulk_poi_distances_returns_dict(client):
+    """Test bulk POI distances endpoint returns listing_id -> distances mapping."""
+    with patch("api.poi_routes._get_user_id", return_value=42), \
+         patch("api.poi_routes.POIRepository") as MockRepo, \
+         patch("api.poi_routes.poi_service") as mock_svc:
+
+        mock_pois = [MagicMock(id=1, name="Office")]
+        mock_svc.get_user_pois.return_value = mock_pois
+        mock_svc.get_distances_for_listings.return_value = [
+            MagicMock(listing_id=100, poi_id=1, travel_mode="WALK",
+                      duration_seconds=600, distance_meters=800),
+        ]
+
+        mock_repo_instance = MagicMock()
+        MockRepo.return_value = mock_repo_instance
+        mock_repo_instance.get_listing_ids.return_value = [100, 200]
+
+        response = client.get("/api/poi/distances/bulk?listing_type=RENT")
+        assert response.status_code == 200
+        data = response.json()
+        assert "100" in data  # JSON keys are strings
+        assert data["100"][0]["poi_name"] == "Office"
+```
+
+**Step 2: Run test to verify it fails**
+
+```bash
+pytest tests/test_poi_distances_bulk.py -v  # or wherever you placed it
+```
+
+Expected: FAIL — endpoint doesn't exist.
+
+**Step 3: Implement the bulk endpoint**
+
+In `api/poi_routes.py`, add:
+
+```python
+@poi_router.get("/distances/bulk")
+async def get_bulk_distances(
+    user: Annotated[User, Depends(get_current_user)],
+    listing_type: ListingType = ListingType.RENT,
+) -> dict[int, list[POIDistanceResponse]]:
+    """Get all POI distances for the current user, keyed by listing ID.
+
+    Used by the frontend to decouple POI distance loading from the main
+    listing stream, allowing the stream to always hit the Redis cache.
+    """
+    user_id = _get_user_id(user)
+    repo = POIRepository(engine)
+    pois = {p.id: p for p in poi_service.get_user_pois(repo, user_id)}
+    if not pois:
+        return {}
+
+    from repositories.listing_repository import ListingRepository
+    from database import engine as db_engine
+    listing_repo = ListingRepository(db_engine)
+    all_ids = list(listing_repo.get_listing_ids(listing_type))
+    if not all_ids:
+        return {}
+
+    distances = poi_service.get_distances_for_listings(
+        repo, all_ids, listing_type, user_id
+    )
+
+    result: dict[int, list[POIDistanceResponse]] = {}
+    for d in distances:
+        poi_name = pois[d.poi_id].name if d.poi_id in pois else "Unknown"
+        result.setdefault(d.listing_id, []).append(
+            POIDistanceResponse(
+                poi_id=d.poi_id,
+                poi_name=poi_name,
+                travel_mode=d.travel_mode,
+                duration_seconds=d.duration_seconds,
+                distance_meters=d.distance_meters,
+            )
+        )
+    return result
+```
+
+Note: The `/distances/bulk` route must be registered before `/distances` (which takes query params) to avoid route conflicts. Verify the order in the file.
+
+**Step 4: Add frontend fetch function**
+
+In `frontend/src/services/poiService.ts`, add:
+
+```typescript
+export async function fetchBulkPOIDistances(
+  user: AuthUser,
+  listingType: 'RENT' | 'BUY' = 'RENT'
+): Promise<Record<number, POIDistanceInfo[]>> {
+  return apiRequest<Record<number, POIDistanceInfo[]>>(user, '/api/poi/distances/bulk', {
+    params: { listing_type: listingType },
+  });
+}
+```
+
+In `frontend/src/services/index.ts`, add `fetchBulkPOIDistances` to the `poiService` re-export line.
+
+**Step 5: Run tests**
+
+```bash
+pytest tests/ -v --tb=short -k "poi_distances_bulk or streaming"
+```
+
+Expected: All pass.
+
+**Step 6: Commit**
+
+```bash
+git add api/poi_routes.py frontend/src/services/poiService.ts frontend/src/services/index.ts
+git commit -m "Add bulk POI distances endpoint for decoupled loading
+
+New GET /api/poi/distances/bulk returns all POI distances keyed by
+listing ID, allowing the frontend to fetch distances separately
+from the listing stream and keep the stream on the cached path."
+```
+
+---
+
+### Task 6: Eliminate Frontend Waterfall
+
+Remove the sequential dependency: POI fetch → loadListings. Fire both in parallel. Merge POI distances into features after both resolve.
+
+**Files:**
+- Modify: `frontend/src/App.tsx:120-203` (auto-load, loadListings, POI merge)
+- Modify: `frontend/src/services/streamingService.ts:68-73` (remove includePoiDistances)
+
+**Step 1: Modify `loadListings` to never request POI distances inline**
+
+In `frontend/src/App.tsx`, change the `loadListings` callback. Remove `includePoiDistances` from the streaming call — it should always be `false`:
+
+```typescript
+for await (const batch of streamListingGeoJSON(user, parameters, (progress) => {
+  setStreamingProgress(progress);
+}, { signal: controller.signal })) {
+```
+
+Remove `includePoiDistances` from the options entirely.
+
+**Step 2: Fire loadListings immediately on user auth, don't wait for POIs**
+
+Change the auto-load `useEffect` (around line 259-272) to fire `loadListings` immediately. Remove `userPOIs` from the `loadListings` dependency array:
+
+```typescript
+// Auto-load data with default filters when user is authenticated
+useEffect(() => {
+  if (!user || initialLoadTriggeredRef.current) {
+    return;
+  }
+  initialLoadTriggeredRef.current = true;
+
+  const defaultParams: ParameterValues = {
+    ...DEFAULT_FILTER_VALUES,
+    available_from: new Date(),
+  };
+
+  loadListings(defaultParams);
+}, [user, loadListings]);
+```
+
+The `loadListings` `useCallback` dependency array should no longer include `userPOIs`.
+
+**Step 3: Add POI distance merging after both resolve**
+
+Add a new `useEffect` that triggers when both `listingData` and `userPOIs` are available. It fetches bulk POI distances and merges them into the existing features:
+
+```typescript
+// Merge POI distances into listing data after both are available
+useEffect(() => {
+  if (!user || !listingData || userPOIs.length === 0) return;
+
+  let cancelled = false;
+  fetchBulkPOIDistances(user, queryParameters?.listing_type as 'RENT' | 'BUY' ?? 'RENT')
+    .then((distanceLookup) => {
+      if (cancelled) return;
+      // Merge distances into existing features
+      const updatedFeatures = accumulatedFeaturesRef.current.map(feature => {
+        const id = feature.properties?.id;
+        if (id && distanceLookup[id]) {
+          return {
+            ...feature,
+            properties: {
+              ...feature.properties,
+              poi_distances: distanceLookup[id],
+            },
+          };
+        }
+        return feature;
+      });
+      accumulatedFeaturesRef.current = updatedFeatures;
+      setListingData({
+        type: 'FeatureCollection',
+        features: [...updatedFeatures],
+      });
+    })
+    .catch(() => {}); // POI distances are best-effort
+
+  return () => { cancelled = true; };
+}, [user, listingData?.features.length, userPOIs.length]);
+```
+
+Import `fetchBulkPOIDistances` from `@/services`.
+
+**Step 4: Clean up streamingService**
+
+In `frontend/src/services/streamingService.ts`, remove the `includePoiDistances` option from the function signature and query string building (lines 75-77):
+
+Remove:
+```typescript
+if (options?.includePoiDistances) {
+  params.include_poi_distances = 'true';
+}
+```
+
+Update the options type to remove `includePoiDistances`.
+
+**Step 5: Run frontend tests**
+
+```bash
+cd frontend && npx vitest run --reporter=verbose
+```
+
+Expected: All pass (some tests may need the `includePoiDistances` option removed from their calls).
+
+**Step 6: Commit**
+
+```bash
+git add frontend/src/App.tsx frontend/src/services/streamingService.ts
+git commit -m "Eliminate frontend POI waterfall for faster initial load
+
+Listing stream fires immediately on auth without waiting for POI
+fetch. POI distances are fetched separately and merged into features
+after both resolve. This saves ~200-500ms on initial load and keeps
+the stream on the cached Redis path."
+```
+
+---
+
+### Task 7: Verify End-to-End
+
+**Step 1: Re-run the performance test from the design doc**
+
+Port-forward to the API pod and run the same Python timing script from the design diagnostics, comparing before/after:
+
+```bash
+KUBECONFIG=.config kubectl port-forward deployment/realestate-crawler-api -n realestate-crawler 15001:5001 &
+# Run timing script...
+```
+
+Verify:
+- `nr_throttled` delta is 0 during the request (Task 1)
+- First batch arrives within ~100ms of connection (Tasks 2+3)
+- `Server-Timing` header is present (Task 4)
+
+**Step 2: Test in browser**
+
+Open the app, clear cache, log in, and observe:
+- First property markers appear within ~1s
+- POI distances populate shortly after
+- All filters work correctly
+
+**Step 3: Commit the design doc update with results**
+
+Update `docs/plans/2026-02-22-time-to-first-property-design.md` with measured results and commit.