Measured baseline: 877ms TTFP cold, 334ms warm, with CPU throttling confirmed (500m limit, 10-12 throttle events per streaming request). Plan covers: CPU limit bump, frontend waterfall elimination, adaptive first batch, deferred decision filtering, POI decoupling, and Server-Timing headers.
Time-to-First-Property Performance Optimization
Problem
Initial page load takes 5+ seconds before the first property appears on screen, even when the Redis GeoJSON cache is warm. The target is a sub-500ms time-to-first-property (TTFP).
Measured Baseline (2026-02-22)
10,000 RENT listings, streaming endpoint via port-forward (no ingress overhead):
| Metric | Cold (DB) | Warm (Cache) |
|---|---|---|
| Time to first property | 877ms | 334ms |
| Total stream time | 13,411ms | 2,060ms |
| Throughput | 745 feat/s | 4,854 feat/s |
Real-world user experience adds: Traefik ingress + TLS (~50-100ms), frontend auth waterfall (~100-300ms), POI fetch blocking stream start (~200-500ms), and browser NDJSON parsing + rendering. Total: 5+ seconds.
Root Causes
1. CPU Throttling (High Impact)
The API pods have no explicit resource declarations. The namespace LimitRange (created 2026-02-21) silently injects defaults:
- CPU limit: 500m (0.5 cores)
- Memory limit: 1Gi
Measured throttling during a single streaming request:
| Path | Throttle events | CPU time stolen |
|---|---|---|
| Cold (DB) | +12 | +240ms |
| Warm (cache) | +10 | +462ms |
Cgroup proof: cpu.max = 50000/100000 (50ms of CPU time per 100ms period, i.e. 0.5 cores), and nr_throttled increases by 10-12 per request.
Current pod usage at idle is only 3m CPU / 134-159Mi memory, so the limit only bites during streaming bursts when the pod needs to serialize thousands of GeoJSON features.
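The throttle figures above can be reproduced by snapshotting the pod's cgroup files before and after a request. The following is a minimal sketch, assuming cgroup v2 file formats (cpu.max holds "quota period" in microseconds, cpu.stat holds one "key value" pair per line). The helper names are hypothetical and not part of the codebase.

```python
from typing import Dict, Optional, Tuple


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the CPU limit in cores from cgroup v2 cpu.max, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse cgroup v2 cpu.stat contents into a dict of integer counters."""
    stats: Dict[str, int] = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats


def throttle_delta(before: str, after: str) -> Tuple[int, float]:
    """Return (extra throttle events, extra throttled milliseconds) between two snapshots."""
    b, a = parse_cpu_stat(before), parse_cpu_stat(after)
    events = a["nr_throttled"] - b["nr_throttled"]
    millis = (a["throttled_usec"] - b["throttled_usec"]) / 1000
    return events, millis
```

With the limit described above, parse_cpu_max("50000 100000") evaluates to 0.5, matching the 500m default injected by the LimitRange.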
2. Frontend Waterfall (Medium Impact)
The request chain is sequential:
Page load -> Auth check -> user state -> POI fetch (blocks) -> loadListings()
loadListings() reads userPOIs.length to set includePoiDistances=true, so it cannot fire until the POI fetch resolves. When POI distances are requested, the Redis cache is bypassed entirely, forcing a DB path every time.
3. Pre-Stream Server Work (Medium Impact)
Before the first byte is sent, stream_listing_geojson() synchronously:
1. Fetches disliked/liked IDs from DB (decision_service.get_disliked_ids)
2. Checks Redis cache count
3. If POIs enabled: builds full POI distances lookup table
Steps 1 and 3 are blocking DB queries that delay the metadata message.
4. Batch Size (Low Impact)
First batch requires 50 features to accumulate before yielding. At 500m CPU, building 50 GeoJSON features from cache (Redis LRANGE + JSON parse + re-serialize) takes measurable time.
Design
Change 1: Set Explicit API Pod Resources
Add explicit resource requests/limits to the realestate-crawler-api deployment, overriding the LimitRange defaults.
Proposed values:
- Requests: 50m CPU, 128Mi memory
- Limits: 2000m CPU, 1Gi memory
This gives the pod 4x the current CPU headroom during streaming bursts while keeping requests modest for scheduling. The celery-beat pod should also get explicit resources since it only needs minimal CPU.
Implementation: K8s deployment manifest stored in k8s/api-deployment.yaml and applied via kubectl.
Change 2: Decouple POI Distances from Listing Stream
Split POI distance fetching into a separate request so the listing stream can fire immediately and hit the cache.
- loadListings() always sends includePoiDistances=false
- New endpoint GET /api/listing_poi_distances?listing_type=RENT returns {<listing_id>: [distances...]} for the user's configured POIs
- Frontend fetches POI distances in parallel, merges into existing features when both resolve
- Redis cache is no longer bypassed when POIs are configured
Trade-off: POI-based distance filters won't apply until the POI distances request completes (~200-300ms after first property renders). Acceptable since the user sees data immediately.
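The merge step can be sketched as follows. It is written in Python only for consistency with the backend examples; the real merge would live in the frontend. The feature shape (properties.id, properties.poi_distances) and the function name are assumptions for illustration, not taken from the codebase.

```python
from typing import Any, Dict, List

Feature = Dict[str, Any]


def merge_poi_distances(
    features: List[Feature],
    distances: Dict[str, List[float]],
) -> List[Feature]:
    """Return copies of the features with POI distances attached where available.

    `distances` mirrors the proposed endpoint's response: {listing_id: [distances...]}.
    JSON object keys are strings, so listing IDs are compared as strings.
    """
    merged: List[Feature] = []
    for feature in features:
        listing_id = str(feature["properties"]["id"])
        if listing_id in distances:
            # Copy rather than mutate, so already-rendered features stay untouched.
            props = {**feature["properties"], "poi_distances": distances[listing_id]}
            feature = {**feature, "properties": props}
        merged.append(feature)
    return merged
```

Features without an entry in the response pass through unchanged, so the merge is safe to run whenever the distances request resolves, before or after the stream completes.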
Change 3: Adaptive First Batch Size
Send a small primer batch (5 features) immediately, then switch to normal batch size (50) for throughput.
Backend change in _stream_from_cache() and _stream_from_db():
- First yield: after 5 features
- Subsequent yields: after batch_size features
Expected improvement: the server builds 5 features instead of 50 before its first yield, so the feature-building work ahead of the first rendered property drops by roughly 10x.
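The two-phase batching can be sketched as a small generator. This is an illustration under assumptions, not the actual body of _stream_from_cache() or _stream_from_db(): the function name and the first_size parameter are hypothetical, while batch_size follows the name used above.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def adaptive_batches(
    items: Iterable[T],
    first_size: int = 5,
    batch_size: int = 50,
) -> Iterator[List[T]]:
    """Yield one small primer batch, then full-size batches, then any remainder."""
    batch: List[T] = []
    limit = first_size  # threshold for the batch currently being filled
    for item in items:
        batch.append(item)
        if len(batch) >= limit:
            yield batch
            batch = []
            limit = batch_size  # every batch after the first uses the normal size
    if batch:
        yield batch  # trailing partial batch
```

For 120 features this produces batches of 5, 50, 50, and 15, so the total data delivered is unchanged and only the first yield arrives earlier.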
Change 4: Defer Decision ID Fetch
Move the decision ID lookup (disliked/liked sets) out of the blocking pre-stream path:
- Send metadata message immediately
- Fetch decision IDs concurrently with the first (small) batch
- Apply decision filtering starting from the second batch
- The first 5 features may include a disliked listing, but client-side filtering in processedListingData already handles this
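One way to express this ordering is with an asyncio task that is started before the first yield and awaited only when the second batch is about to go out. This is a sketch under assumptions: the function name, the message field names, and the fetch_disliked_ids callable are hypothetical stand-ins, and the real stream_listing_geojson() may be structured differently.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, List, Optional, Set


async def stream_with_deferred_decisions(
    batches: Iterable[List[dict]],
    fetch_disliked_ids: Callable[[], Awaitable[Set[int]]],
) -> AsyncIterator[dict]:
    # Start the lookup without awaiting it, so it overlaps with the first yields.
    decisions_task = asyncio.create_task(fetch_disliked_ids())
    try:
        yield {"type": "metadata"}
        disliked: Optional[Set[int]] = None
        for index, batch in enumerate(batches):
            if index > 0:
                if disliked is None:
                    # Block on the lookup only once the first batch is already out.
                    disliked = await decisions_task
                batch = [f for f in batch if f["id"] not in disliked]
            yield {"type": "batch", "features": batch}
        yield {"type": "complete"}
    finally:
        # Covers early client disconnects; a no-op if the task already finished.
        decisions_task.cancel()
```

The first batch is deliberately sent unfiltered, which relies on the client-side filtering in processedListingData described above.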
Change 5: Eliminate Frontend POI Waterfall
Current: useEffect(user) -> fetchPOIs() -> loadListings()
Proposed: Fire both in parallel on user auth:
useEffect(user) -> fetchPOIs()    (async, no blocking)
                -> loadListings() (fires immediately)
POI distances arrive separately (Change 2), so the stream doesn't need to wait.
Change 6: Server-Timing Headers
Add Server-Timing headers to the streaming response for ongoing observability:
Server-Timing: auth;dur=12, decisions;dur=85, cache_check;dur=3, first_batch;dur=42
Visible in browser DevTools Network tab without any frontend code changes.
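A small helper can collect phase durations and render them in the header format shown above. This is a minimal sketch using only the standard library; the ServerTiming class and its method names are hypothetical, and how the value is attached to the response depends on the web framework in api/app.py.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class ServerTiming:
    """Collects named phase durations and formats a Server-Timing header value."""

    def __init__(self) -> None:
        self.durations_ms: Dict[str, float] = {}

    @contextmanager
    def measure(self, name: str) -> Iterator[None]:
        """Time the enclosed block and record it under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations_ms[name] = (time.perf_counter() - start) * 1000

    def header_value(self) -> str:
        """Render entries in insertion order, e.g. 'auth;dur=12, decisions;dur=85'."""
        return ", ".join(
            f"{name};dur={round(dur)}" for name, dur in self.durations_ms.items()
        )
```

Usage would look like wrapping each phase in `with timing.measure("cache_check"): ...` and setting the header from timing.header_value(). Because response headers are sent before the streamed body, only phases that finish before the first byte can be reported this way; phases that complete later would need another channel.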
Expected Impact
| Change | TTFP improvement | Effort |
|---|---|---|
| CPU limit bump (500m -> 2000m) | ~200-400ms (eliminates throttling) | Low |
| Eliminate POI waterfall | ~200-500ms (parallel fetch) | Medium |
| Adaptive first batch (50 -> 5) | ~150-300ms (fewer features to build) | Low |
| Defer decision IDs | ~100-200ms (no pre-stream DB query) | Low |
| Decouple POI distances endpoint | Enables cache hits with POIs | Medium |
| Server-Timing headers | Observability, no direct improvement | Low |
Conservative estimate: warm cache TTFP drops from 334ms to ~100-150ms server-side, and real-world user experience drops from 5s+ to under 1s.
What Does NOT Change
- NDJSON protocol (same 3 message types: metadata, batch, complete)
- Redis cache structure, TTL, or key format
- Database schema or repository layer
- Client-side decision filtering in processedListingData
- Total data delivered (all features still streamed)
Files Modified
- k8s/api-deployment.yaml (new) - explicit API pod resources
- api/app.py - adaptive batch, deferred decisions, Server-Timing, POI distances endpoint
- services/listing_cache.py - no changes
- frontend/src/App.tsx - parallel fetch, POI merge
- frontend/src/services/streamingService.ts - remove POI coupling
- frontend/src/services/listingService.ts - new POI distances fetch function