Add time-to-first-property performance optimization design

Measured baseline: 877ms TTFP cold, 334ms warm, with CPU throttling
confirmed (500m limit, 10-12 throttle events per streaming request).
Plan covers: CPU limit bump, frontend waterfall elimination, adaptive
first batch, deferred decision filtering, POI decoupling, and
Server-Timing headers.

2026-02-22 13:14:01 +00:00

Time-to-First-Property Performance Optimization

Problem

Initial page load takes 5+ seconds before the first property appears on screen, even when the Redis GeoJSON cache is warm. The target is sub-500ms time-to-first-property.

Measured Baseline (2026-02-22)

10,000 RENT listings, streaming endpoint via port-forward (no ingress overhead):

| Metric | Cold (DB) | Warm (Cache) |
|---|---|---|
| Time to first property | 877ms | 334ms |
| Total stream time | 13,411ms | 2,060ms |
| Throughput | 745 feat/s | 4,854 feat/s |

Real-world user experience adds: Traefik ingress + TLS (~50-100ms), frontend auth waterfall (~100-300ms), POI fetch blocking stream start (~200-500ms), and browser NDJSON parsing + rendering. Total: 5+ seconds.

Root Causes

1. CPU Throttling (High Impact)

The API pods have no explicit resource declarations. The namespace LimitRange (created 2026-02-21) silently injects defaults:

CPU limit: 500m (0.5 cores)
Memory limit: 1Gi

Measured throttling during a single streaming request:

| Path | Throttle events | CPU time stolen |
|---|---|---|
| Cold (DB) | +12 | +240ms |
| Warm (cache) | +10 | +462ms |

Cgroup proof: cpu.max = 50000 100000 (50ms of CPU quota per 100ms period, i.e. 0.5 cores), and nr_throttled increases by 10-12 per request.

Current pod usage at idle is only 3m CPU / 134-159Mi memory, so the limit only bites during streaming bursts when the pod needs to serialize thousands of GeoJSON features.
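
The throttle figures above can be reproduced by snapshotting the pod's cgroup counters before and after a streaming request. The following is a minimal sketch, assuming cgroup v2 (where `cpu.stat` exposes `nr_throttled` and `throttled_usec`) and that the file is readable inside the container at `/sys/fs/cgroup/cpu.stat`; the helper names are illustrative and not part of the codebase.

```python
from typing import Dict


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse the 'key value' lines of a cgroup v2 cpu.stat file."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttle_delta(before: str, after: str) -> Dict[str, float]:
    """Return throttle events and throttled time (ms) between two snapshots."""
    b, a = parse_cpu_stat(before), parse_cpu_stat(after)
    return {
        "throttle_events": a["nr_throttled"] - b["nr_throttled"],
        "throttled_ms": (a["throttled_usec"] - b["throttled_usec"]) / 1000,
    }
```

In practice, `before` and `after` would be the contents of `cpu.stat` read immediately before and after issuing one request to the streaming endpoint.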

2. Frontend Waterfall (Medium Impact)

The request chain is sequential:

Page load -> Auth check -> user state -> POI fetch (blocks) -> loadListings()

loadListings() reads userPOIs.length to set includePoiDistances=true, so it cannot fire until the POI fetch resolves. When POI distances are requested, the Redis cache is bypassed entirely, forcing a DB path every time.

3. Pre-Stream Server Work (Medium Impact)

Before the first byte is sent, stream_listing_geojson() synchronously:

1. Fetches disliked/liked IDs from DB (decision_service.get_disliked_ids)
2. Checks Redis cache count
3. If POIs enabled: builds full POI distances lookup table

Steps 1 and 3 are blocking DB queries that delay the metadata message.

4. Batch Size (Low Impact)

First batch requires 50 features to accumulate before yielding. At 500m CPU, building 50 GeoJSON features from cache (Redis LRANGE + JSON parse + re-serialize) takes measurable time.

Design

Change 1: Set Explicit API Pod Resources

Add explicit resource requests/limits to the realestate-crawler-api deployment, overriding the LimitRange defaults.

Proposed values:

Requests: 50m CPU, 128Mi memory
Limits: 2000m CPU, 1Gi memory

This raises the CPU limit to 4x its current value (2000m vs. 500m), giving the pod headroom during streaming bursts while keeping requests modest for scheduling. The celery-beat pod should also get explicit resources, sized small since it only needs minimal CPU.

Implementation: K8s deployment manifest stored in k8s/api-deployment.yaml and applied via kubectl.
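
One way to confirm the new limit took effect is to compare the pod's `cpu.max` with the value expected for the configured limit. The sketch below shows the arithmetic, assuming cgroup v2 and the default 100ms CFS period; the function name is illustrative.

```python
CFS_PERIOD_USEC = 100_000  # assumed default CFS period (100ms)


def cpu_max_for_limit(millicores: int, period_usec: int = CFS_PERIOD_USEC) -> str:
    """Return the cpu.max line expected for a CPU limit given in millicores."""
    quota_usec = millicores * period_usec // 1000
    return f"{quota_usec} {period_usec}"


# Current LimitRange default (500m) and the proposed limit (2000m).
current = cpu_max_for_limit(500)    # "50000 100000"
proposed = cpu_max_for_limit(2000)  # "200000 100000"
```

With the proposed limit, the pod can consume up to 200ms of CPU time per 100ms period (two full cores) before the kernel throttles it.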

Change 2: Decouple POI Distances from Listing Stream

Split POI distance fetching into a separate request so the listing stream can fire immediately and hit the cache.

loadListings() always sends includePoiDistances=false
New endpoint GET /api/listing_poi_distances?listing_type=RENT returns {<listing_id>: [distances...]} for the user's configured POIs
Frontend fetches POI distances in parallel, merges into existing features when both resolve
Redis cache is no longer bypassed when POIs are configured

Trade-off: POI-based distance filters won't apply until the POI distances request completes (~200-300ms after first property renders). Acceptable since the user sees data immediately.
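
The merge step can be sketched as a pure function. The real merge would live in the frontend; Python is used here only to show the logic. The sketch assumes each feature carries its listing ID at `properties.id` and that distances are attached under a hypothetical `poi_distances` property; neither name is confirmed by the current code.

```python
from typing import Any, Dict, List

Feature = Dict[str, Any]


def merge_poi_distances(
    features: List[Feature], distances: Dict[str, List[float]]
) -> List[Feature]:
    """Return copies of the features with POI distances attached where known."""
    merged: List[Feature] = []
    for feature in features:
        # JSON object keys are strings, so normalise the ID before lookup.
        listing_id = str(feature["properties"]["id"])
        props = dict(feature["properties"])
        if listing_id in distances:
            props["poi_distances"] = distances[listing_id]
        merged.append({**feature, "properties": props})
    return merged
```

Features without an entry in the response are passed through unchanged, so the map stays usable even if the POI distances request is slow or fails.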

Change 3: Adaptive First Batch Size

Send a small primer batch (5 features) immediately, then switch to normal batch size (50) for throughput.

Backend change in _stream_from_cache() and _stream_from_db():

First yield: after 5 features
Subsequent yields: after batch_size features

Expected improvement: the first yield needs 10x fewer features (5 instead of 50), so the server-side time to build the first batch should drop roughly in proportion.
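
The batching rule above can be sketched as a small generator. This is an illustration of the technique rather than the actual body of `_stream_from_cache()` or `_stream_from_db()`; the function and parameter names are assumptions.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def adaptive_batches(
    items: Iterable[T], first_batch_size: int = 5, batch_size: int = 50
) -> Iterator[List[T]]:
    """Yield a small primer batch first, then batches of the normal size."""
    target = first_batch_size
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) >= target:
            yield batch
            batch = []
            target = batch_size  # switch to the throughput-oriented size
    if batch:
        yield batch  # flush the final partial batch
```

For 112 features this yields batches of 5, 50, 50, and 7, so the total data delivered is unchanged and only the first flush moves earlier.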

Change 4: Defer Decision ID Fetch

Move the decision ID lookup (disliked/liked sets) out of the blocking pre-stream path:

Send metadata message immediately
Fetch decision IDs concurrently with the first (small) batch
Apply decision filtering starting from the second batch
The first 5 features may include a disliked listing, but client-side filtering in processedListingData already handles this
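
The ordering described above can be sketched with asyncio. This assumes an asyncio-based handler, which the source does not confirm; a synchronous backend would need a worker thread instead. The message field names (`type`, `features`), the feature `id` key, and `fetch_disliked_ids` are hypothetical stand-ins.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, Coroutine, Dict, Iterable, List, Optional, Set


async def stream_with_deferred_decisions(
    batches: Iterable[List[Dict[str, Any]]],
    fetch_disliked_ids: Callable[[], Coroutine[Any, Any, Set[int]]],
) -> AsyncIterator[Dict[str, Any]]:
    """Yield metadata and the first batch before waiting on decision IDs."""
    # Schedule the lookup without awaiting it; it makes progress whenever
    # the event loop runs, e.g. while the response writer awaits the socket.
    decisions_task = asyncio.create_task(fetch_disliked_ids())
    try:
        yield {"type": "metadata"}
        disliked: Optional[Set[int]] = None
        for index, batch in enumerate(batches):
            if index > 0:
                if disliked is None:
                    disliked = await decisions_task
                batch = [f for f in batch if f["id"] not in disliked]
            # The first batch is sent unfiltered; the client filters it.
            yield {"type": "batch", "features": batch}
        yield {"type": "complete"}
    finally:
        # Avoid leaving a pending task behind if the stream ends early.
        if not decisions_task.done():
            decisions_task.cancel()
```

Only the first batch bypasses server-side filtering, which matches the reliance on client-side filtering in processedListingData for those first few features.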

Change 5: Eliminate Frontend POI Waterfall

Current: useEffect(user) -> fetchPOIs() -> loadListings()

Proposed: Fire both in parallel on user auth:

useEffect(user) -> fetchPOIs()          (async, no blocking)
                -> loadListings()        (fires immediately)

POI distances arrive separately (Change 2), so the stream doesn't need to wait.

Change 6: Server-Timing Headers

Add Server-Timing headers to the streaming response for ongoing observability:

Server-Timing: auth;dur=12, decisions;dur=85, cache_check;dur=3, first_batch;dur=42

Visible in browser DevTools Network tab without any frontend code changes.
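
A header value in this shape can be built from a mapping of phase names to durations. The helper below is a minimal sketch; its name and the rounding to whole milliseconds are assumptions.

```python
from typing import Mapping


def format_server_timing(durations_ms: Mapping[str, float]) -> str:
    """Format phase durations (ms) as a Server-Timing header value."""
    return ", ".join(
        f"{name};dur={round(duration)}" for name, duration in durations_ms.items()
    )


header_value = format_server_timing(
    {"auth": 12, "decisions": 85, "cache_check": 3, "first_batch": 42}
)
```

Because response headers are sent before the body, only timings known before the first byte can appear in a regular header. Reporting `first_batch` (or a deferred `decisions` lookup) this way would likely require holding the headers until that phase finishes, or sending the value as a trailer.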

Expected Impact

| Change | TTFP improvement | Effort |
|---|---|---|
| CPU limit bump (500m -> 2000m) | ~200-400ms (eliminates throttling) | Low |
| Eliminate POI waterfall | ~200-500ms (parallel fetch) | Medium |
| Adaptive first batch (50 -> 5) | ~150-300ms (fewer features to build) | Low |
| Defer decision IDs | ~100-200ms (no pre-stream DB query) | Low |
| Decouple POI distances endpoint | Enables cache hits with POIs | Medium |
| Server-Timing headers | Observability, no direct improvement | Low |

Conservative estimate: warm cache TTFP drops from 334ms to ~100-150ms server-side, and real-world user experience drops from 5s+ to under 1s.

What Does NOT Change

NDJSON protocol (same 3 message types: metadata, batch, complete)
Redis cache structure, TTL, or key format
Database schema or repository layer
Client-side decision filtering in processedListingData
Total data delivered (all features still streamed)

Files Modified

k8s/api-deployment.yaml (new) - explicit API pod resources
api/app.py - adaptive batch, deferred decisions, Server-Timing, POI distances endpoint
services/listing_cache.py - no changes
frontend/src/App.tsx - parallel fetch, POI merge
frontend/src/services/streamingService.ts - remove POI coupling
frontend/src/services/listingService.ts - new POI distances fetch function
