Commit graph

14 commits

Author SHA1 Message Date
Viktor Barzin
25912eac0c
Fix metric imports: use module-level access instead of name imports
Type-annotated metric variables (e.g. `geojson_cache_operations: Counter`)
don't exist as importable names until init_metrics() runs.  Switch all
`from api.metrics import <metric>` to `import api.metrics as m` and
access instruments as attributes at runtime to avoid ImportError.
2026-02-14 11:21:49 +00:00
Viktor Barzin
d6edb747d2
Add structured JSON logging, OTel business metrics, and Grafana dashboard
Structured logging via JsonFormatter replaces uvicorn's default format so
Loki can parse timestamps and fields.  14 business metrics (scrape stats,
throttle events, circuit breaker state, cache hit rate, OCR success rate,
Celery task lifecycle) are defined in a shared metrics module and
instrumented across the scraper pipeline, API, and workers.  Celery
workers expose a Prometheus HTTP endpoint on configurable ports.
2026-02-14 10:59:12 +00:00
Viktor Barzin
f833309297
Refactor backend for cleaner error handling, DRY, and type safety
- Extract rate limiter DRY: consolidate 3 duplicated check/respond paths
  into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and
  RightmoveApiError; narrow catch clauses to specific exception types;
  fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract
  _find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels,
  guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py
  stale comments and missing type annotation, auth.py type:ignore,
  ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests,
  bypass rate limiter for test isolation, fix cache invalidation test to
  account for two-pattern scan loop
2026-02-10 22:19:24 +00:00
Viktor Barzin
5b566bab4c
Fix POI distance calculation reliability for remote/Celery execution
- Fix silent log loss: replace hardcoded "uvicorn.error" logger with __name__
  in osrm_client, otp_client, poi_distance_calculator, and poi_tasks (uvicorn
  logger has no handlers in Celery worker, so all errors were silently dropped)
- Add Celery retry: autoretry_for=(Exception,), max_retries=3, retry_backoff
- Add top-level exception handling in task with full traceback logging
- Fix upsert_distances: replace session.merge() (PK-based) with proper
  dialect-aware INSERT ON DUPLICATE KEY UPDATE / ON CONFLICT DO UPDATE
- Filter out listings with null/zero coordinates before routing
- Raise OSError when all routing engines fail with 0 results computed,
  distinguishing "nothing to compute" from "all engines unreachable"
2026-02-08 20:11:12 +00:00
Viktor Barzin
8a5d1b3787
Fix POI distance calculation: OSRM index separator and error handling
- Fix OSRM client to use semicolons (not commas) for source/destination
  indices in /table API requests. Commas caused "Query string malformed"
  errors for any batch with more than one origin.
- Add error handling in poi_distance_calculator for unreachable routing
  engines (OSRM/OTP). Connection failures now log an error and skip the
  mode instead of crashing the entire Celery task.
2026-02-08 14:50:09 +00:00
Viktor Barzin
da0a56895d
Add self-hosted routing clients and distance calculator
RoutingConfig loads OSRM/OTP URLs from env vars. OSRM client uses the
/table endpoint for batch NxM distance matrices (walk/cycle). OTP client
uses GraphQL API for transit routes. POI distance calculator orchestrates
both, skipping already-computed distances and reporting progress.
2026-02-08 13:14:37 +00:00
Viktor Barzin
eafbc1ac52
Flatten repo structure: move crawler/ to root, remove vqa/ and immoweb/
The crawler subdirectory was the only active project. Moving it to the
repo root simplifies paths and removes the unnecessary nesting. The
vqa/ and immoweb/ directories were legacy/unused and have been removed.

Updated .drone.yml, .gitignore, .claude/ docs, and skills to reflect
the new flat structure.
2026-02-07 23:01:20 +00:00
Kadir
e2f7998ee9 merging the visual query answering with the crawler. Monorepo go! 2024-03-01 16:42:48 +01:00
Kadir Tugan
f30115eecd Fixing __repr__ for RightmoveListing and adding a small testing script 2023-11-18 13:25:00 +02:00
Kadir Tugan
8698b41e0e adding lat/lon to the db 2023-11-18 13:09:03 +02:00
Kadir Tugan
72ff6a7202 adding query to fetch a single detailpage 2023-11-18 12:56:15 +02:00
Kadir Tugan
31d760cf30 adding more parameters to the function 2023-11-18 12:38:54 +02:00
Kadir Tugan
d49b11e28b ruff format 2023-11-18 12:30:04 +02:00
Kadir Tugan
65fc4cdda4 updating poetry 2023-11-18 12:20:22 +02:00