Commit graph

9 commits

Author SHA1 Message Date
Viktor Barzin
150342bb9e
Refactor codebase following Clean Code principles and add 229 tests
- Extract helpers to reduce function sizes (listing_tasks, app.py, query.py, listing_fetcher)
  - Replace nonlocal mutations with _PipelineState dataclass in listing_tasks
  - Fix bugs: isinstance→equality check in repository, verify_exp for OIDC tokens
  - Consolidate duplicate filter methods in listing_repository
  - Move hardcoded config to env vars with backward-compatible defaults
  - Simplify CLI decorator to auto-build QueryParameters
  - Add deprecation docstring to data_access.py
  - Test count: 158 → 387 (all passing)
2026-02-07 20:19:57 +00:00
Viktor Barzin
a8b7eace48
Add passkey (WebAuthn) authentication with self-registration
Enable users to sign up and sign in using passkeys (biometrics/security
keys) without needing a manually-created Authentik account. The existing
SSO login remains as an alternative.

Backend:
- Add WebAuthn registration/authentication endpoints via py-webauthn
- Issue HS256 JWTs for passkey users, with Redis-backed challenge storage
- Dual JWT verification in auth middleware (issuer-based routing: passkey
  HS256 vs Authentik RS256)
- PasskeyCredential model + migration making user.password nullable
- UserRepository with full CRUD for users and credentials

Frontend:
- AuthUser type abstraction unifying OIDC and passkey users
- Passkey service using @simplewebauthn/browser for WebAuthn ceremonies
- LoginModal redesigned with Sign In / Sign Up tabs
- Type migration from oidc-client-ts User to AuthUser across all services
  and components
2026-02-07 00:34:47 +00:00
Viktor Barzin
b9f576ae2b
Stream-process listings as IDs arrive via asyncio.Queue
Replace the sequential fetch-all-then-process pipeline with a streaming
architecture where listing processing starts as soon as IDs become
available from each subquery. A producer task fetches pages and enqueues
new IDs (filtered inline against DB), while 20 consumer workers process
listings concurrently from the queue.

- Add ListingRepository.get_listing_ids() for fast ID-only projection
- Refactor listing_tasks.py: remove get_ids_to_process/dump_listings_and_monitor,
  replace with unified producer/worker/monitor pipeline
- Apply same pattern to CLI path in listing_fetcher.py
- Remove 'filtering' phase from frontend, show combined fetch+process metrics
- Add fetching_done flag to TaskResult for phase transition tracking
2026-02-06 23:43:54 +00:00
Viktor Barzin
b4837e1603
Add crawl job progress drawer with phase tracking and live logs
- Add phase-aware progress reporting across all crawl phases (splitting,
  fetching, filtering, processing) with per-step counters
- Add TaskProgressDrawer component with phase timeline stepper, detail
  counters, progress bar with ETA, and live worker log viewer
- Add on_step_complete callback to ListingProcessor for granular tracking
  of details/images/OCR steps
- Extend QuerySplitter on_progress callback with structured counter data
- Capture celery worker logs via ring buffer handler and inject into task
  state updates for frontend display
- Guard taskResult updates with phase presence check to prevent drawer
  from blanking during state transitions
2026-02-06 22:37:53 +00:00
Viktor Barzin
d205d15c74 Add services layer, tests, streaming UI, and cleanup legacy code 2026-02-06 20:55:10 +00:00
Viktor Barzin
5514fa6381 Remove 1000-result limit, add Redis caching and virtual scrolling
- Remove hard-coded limit=1000 default from listing_geojson and streaming
  endpoints, allowing all matching results to be returned
- Add Redis caching service (db=2, 30min TTL) that caches query results
  as Redis Lists for fast re-queries with reduced DB load
- Integrate cache into streaming endpoint: serve from cache on hit,
  populate cache on miss during DB streaming
- Invalidate cache after scrape completes (both success and no-new-listings)
- Replace ScrollArea with react-virtuoso in ListView for virtual scrolling,
  keeping only ~20-30 DOM nodes regardless of list size
- Handle metadata streaming message to show "0 / N" progress from start
- Throttle frontend state updates with requestAnimationFrame to prevent
  UI jank from rapid re-renders during cached response streaming
2026-02-06 20:47:36 +00:00
Viktor Barzin
f880664a98 Add throttling detection and circuit breaker for Rightmove scraper 2026-02-06 20:47:36 +00:00
Viktor Barzin
e8293c6042 Add intelligent query splitting to maximize Rightmove data extraction 2026-02-06 20:47:36 +00:00
Viktor Barzin
835a2a9d53 Fix stuck Celery tasks and add purge all tasks functionality 2026-02-01 20:40:07 +00:00