CI test-unit failed on pipeline #49 because the three TestSetupPeriodicTasks
cases asserted exact call counts on `sender.add_periodic_task` and the
new unconditional `daily-market-aggregator` registration bumped each by
one. Fix:
- `tasks/listing_tasks.py`: lifted the market-aggregator registration
out of the SchedulesConfig try-block — it's now independent of the
user's SCRAPE_SCHEDULES (a malformed scrape config no longer takes
the aggregator down with it).
- `tests/unit/test_listing_tasks.py`: updated the four cases to account
for the +1 unconditional aggregator call. `test_handles_config_error_
gracefully` now asserts the aggregator still registers when
SchedulesConfig.from_env raises (regression coverage for the
independence guarantee).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two surfaces wired up so the user can "get a vibe of the market":
**Per-listing** — each PropertyCard now shows a small pill next to the
price when the listing's total_price moved >=1% over a 14-day lookback
(e.g. "↓ £200 (-4%) in 14d"). Drops render green, rises render red.
Computed from `price_history_json` by the daily aggregator and
denormalised onto the listing row so the streaming endpoint just
passes it through.
**Macro** — new always-visible inline strip above the chip strip
showing today's median total price, median £/m², and listing count
for the current filter's bedroom band, each with a 30-day % delta:
"Rent · 1-2 bed · 30d: Median £2,500 ↓ -4% · £/m² £50 ↓ -2% · Listings 4,200 ↑ +5%".
Both data sources are populated daily at 04:00 UTC by a new Celery
beat task that fires 1h after the 03:00 RENT scrape and feeds two
sinks: a per-listing update pass and an upsert to a new
`dailylistingaggregate` table keyed on
(snapshot_date, listing_type, min_bedrooms, max_bedrooms).
## Backend
- `models/listing.py`: Listing parent gains `price_14d_ago` + `price_
change_pct_14d` nullable floats (inherited by RentListing/BuyListing).
New `DailyListingAggregate` table model with unique constraint on
(date, type, min_bed, max_bed).
- Alembic `a8b9c0d1e2f3`: adds the two columns to both listing tables
and creates the aggregate table + date index.
- `services/market_aggregator.py` (new): `compute_trend_for_listing`,
`update_per_listing_trend` (batched, idempotent), `_stats` (median
+ mean filtered to positive finite values), `compute_aggregate_
snapshot` (dialect-aware MySQL / SQLite upsert), `fetch_trend_
series` (range query for the API).
- `tasks/market_tasks.py` (new): `compute_daily_market_aggregates_task`
Celery task wrapping both stages.
- `tasks/listing_tasks.py:setup_periodic_tasks`: registers the daily
task at 04:00 UTC alongside the existing scrape schedules.
- `celery_app.py`: includes the new tasks module.
- `api/app.py`: new `GET /api/market_trend?listing_type=&min_bedrooms=&
max_bedrooms=&days=` endpoint returning the daily series.
- `ui_exporter.py`: GeoJSON feature properties now carry
`price_14d_ago` and `price_change_pct_14d` so the frontend can
render the badge without an extra round-trip.
## Frontend
- `types/index.ts`: new `MarketTrendPoint`; `PropertyProperties` gains
the two optional trend fields.
- `components/PropertyCard.tsx`: derived `trendBadge` (>=1% threshold,
null-safe) rendered as a small pill on both card variants.
- `hooks/useMarketTrend.ts` (new): fetches the trend series, derives
current-vs-oldest deltas per metric (% change rounded to 1dp).
- `components/MarketTrendStrip.tsx` (new): compact inline strip with
three metric cells. Hidden when the aggregator hasn't produced any
rows yet (graceful start during the first week post-launch).
- `App.tsx`: renders the strip above the chip strip whenever the
active queryParameters are known.
## Tests
- pytest: 10 new (trend math edge cases including null history,
malformed JSON, only-recent entries, drops, rises, zero current
price; _stats empty / nonpositive filtering; upsert idempotency on
an in-memory SQLite seed). 34 decision + aggregator tests pass.
- vitest: 8 new (useMarketTrend fetch URL, two-point delta,
single-point null delta, empty series; PropertyCard trend badge
arrow direction + sign for drops/rises, noise threshold, null
guard). 229 tests pass total, tsc clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Type-annotated metric variables (e.g. `geojson_cache_operations: Counter`)
don't exist as importable names until init_metrics() runs. Switch all
`from api.metrics import <metric>` to `import api.metrics as m` and
access instruments as attributes at runtime to avoid ImportError.
Structured logging via JsonFormatter replaces uvicorn's default format so
Loki can parse timestamps and fields. 14 business metrics (scrape stats,
throttle events, circuit breaker state, cache hit rate, OCR success rate,
Celery task lifecycle) are defined in a shared metrics module and
instrumented across the scraper pipeline, API, and workers. Celery
workers expose a Prometheus HTTP endpoint on configurable ports.
- Extract rate limiter DRY: consolidate 3 duplicated check/respond paths
into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and
RightmoveApiError; narrow catch clauses to specific exception types;
fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract
_find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels,
guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py
stale comments and missing type annotation, auth.py type:ignore,
ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests,
bypass rate limiter for test isolation, fix cache invalidation test to
account for two-pattern scan loop
Replace timer-based _monitor_progress (1s sleep loop) with a
ProgressReporter class that publishes on actual state changes,
throttled to at most 1 publish per 250ms. A background flush
every 2s keeps ETA/elapsed current during quiet periods.
Switch WebSocket forwarder from get_message() polling (1s timeout)
to async pubsub.listen() for instant Redis-to-WebSocket delivery.
Combined latency improvement: ~1.5s average → ~250ms.
Replace 5s HTTP polling with WebSocket-based real-time updates for task
progress. Celery workers publish progress to Redis pub/sub channels;
a FastAPI WebSocket endpoint subscribes and forwards to the browser.
Polling is kept as a 30s fallback when WebSocket is unavailable.
The task progress drawer now supports multiple concurrent jobs with a
tab bar for switching between scrape and POI distance tasks.
Backend:
- Add services/task_progress_publisher.py (Redis pub/sub bridge)
- Add api/ws_routes.py (WebSocket endpoint with JWT auth)
- Publish progress from listing_tasks and poi_tasks
- Publish REVOKED via pub/sub on cancel/clear to fix stuck UI
Frontend:
- Add useTaskWebSocket hook with reconnection and keepalive
- Add TaskState and WS message types
- TaskIndicator: WS-driven updates with polling fallback
- TaskProgressDrawer: multi-job tabs, POI phase timeline
- Guard against WS overwriting local cancel state
The listing processor was hardcoded to create RentListing objects and
query only the rentlisting table. Buy listings fetched from Rightmove
were stored in the wrong table with missing fields. This threads
ListingType through ListingProcessor and all Step subclasses so the
correct model (RentListing/BuyListing) is created, the correct table
is queried, and buy-specific fields (service_charge, lease_left) are
parsed from the API response and included in GeoJSON streaming output.
The crawler subdirectory was the only active project. Moving it to the
repo root simplifies paths and removes the unnecessary nesting. The
vqa/ and immoweb/ directories were legacy/unused and have been removed.
Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect
the new flat structure.
2026-02-07 23:01:20 +00:00
Renamed from crawler/tasks/listing_tasks.py (Browse further)