Drop transformers, matplotlib, geopy, and diskcache (~560MB) from
dependencies — none are imported in the codebase. Replace single-stage
poetry-based Docker build with a two-stage pip-based build: builder
stage installs into a venv from an exported requirements.txt, runtime
stage copies only the venv and app code. Eliminates poetry from the
image (fixes ARM segfaults) and removes ~200MB of build tools from
the final image.
Enable users to sign up and sign in using passkeys (biometrics/security
keys) without needing a manually-created Authentik account. The existing
SSO login remains as an alternative.
Backend:
- Add WebAuthn registration/authentication endpoints via py-webauthn
- Issue HS256 JWTs for passkey users, with Redis-backed challenge storage
- Dual JWT verification in auth middleware (issuer-based routing: passkey
HS256 vs Authentik RS256)
- PasskeyCredential model + migration making user.password nullable
- UserRepository with full CRUD for users and credentials
Frontend:
- AuthUser type abstraction unifying OIDC and passkey users
- Passkey service using @simplewebauthn/browser for WebAuthn ceremonies
- LoginModal redesigned with Sign In / Sign Up tabs
- Type migration from oidc-client-ts User to AuthUser across all services
and components
Containerize the frontend dev server (Vite) and add a Caddy reverse
proxy for HTTPS termination, replacing the manual local setup. The
Caddy config proxies /api/* to the backend and everything else to the
frontend dev server.
Also simplify start.sh: remove --local Poetry mode, extract
get_compose_cmd helper, and document new services and DEV_HOST env var.
DEV_HOST env var now controls the hostname, defaulting to localhost.
This makes the dev setup work on any machine without requiring a
specific DNS entry.
Replace the sequential fetch-all-then-process pipeline with a streaming
architecture where listing processing starts as soon as IDs become
available from each subquery. A producer task fetches pages and enqueues
new IDs (filtered inline against DB), while 20 consumer workers process
listings concurrently from the queue.
- Add ListingRepository.get_listing_ids() for fast ID-only projection
- Refactor listing_tasks.py: remove get_ids_to_process/dump_listings_and_monitor,
replace with unified producer/worker/monitor pipeline
- Apply same pattern to CLI path in listing_fetcher.py
- Remove 'filtering' phase from frontend, show combined fetch+process metrics
- Add fetching_done flag to TaskResult for phase transition tracking
The previous approach monkey-patched task.update_state on the Celery
Task instance, but the assignment didn't take effect (Celery's Task
singleton may prevent instance method shadowing). Additionally, the
celery.task logger level was left at WARNING by Celery's worker setup,
silencing all INFO-level log capture.
Fix:
- Replace wrapper with _update_task_state() helper that directly injects
logs from a module-level _active_log_buffer into every meta dict
- Attach TaskLogHandler to BOTH celery.task and uvicorn.error loggers
- Force both loggers to INFO level during task execution
- Every task.update_state call site now uses _update_task_state()
- Add phase-aware progress reporting across all crawl phases (splitting,
fetching, filtering, processing) with per-step counters
- Add TaskProgressDrawer component with phase timeline stepper, detail
counters, progress bar with ETA, and live worker log viewer
- Add on_step_complete callback to ListingProcessor for granular tracking
of details/images/OCR steps
- Extend QuerySplitter on_progress callback with structured counter data
- Capture celery worker logs via ring buffer handler and inject into task
state updates for frontend display
- Guard taskResult updates with phase presence check to prevent drawer
from blanking during state transitions
- Remove hard-coded limit=1000 default from listing_geojson and streaming
endpoints, allowing all matching results to be returned
- Add Redis caching service (db=2, 30min TTL) that caches query results
as Redis Lists for fast re-queries with reduced DB load
- Integrate cache into streaming endpoint: serve from cache on hit,
populate cache on miss during DB streaming
- Invalidate cache after scrape completes (both success and no-new-listings)
- Replace ScrollArea with react-virtuoso in ListView for virtual scrolling,
keeping only ~20-30 DOM nodes regardless of list size
- Handle metadata streaming message to show "0 / N" progress from start
- Throttle frontend state updates with requestAnimationFrame to prevent
UI jank from rapid re-renders during cached response streaming
The get_ids_to_process function was using set union instead of set
difference, causing it to return all existing listing IDs along with
new ones. This meant:
1. When there were no new listings, the task would iterate through all
existing listings, find nothing to process for each, and complete
almost instantly
2. The task showed no progress because processing was too fast
Fixed by:
- Changed `all_listing_ids.union(identifiers)` to `identifiers - all_listing_ids`
to only return IDs that are NOT already in the database
- Added explicit check for empty set with informative task state
"No new listings found" so users understand why the task completed quickly
* adding ruff auto check for pull requests as well as fixing all ruff errors
* More ruff fixes: forgot half of the ruff checks
Forgot to do a git add all :D
---------
Co-authored-by: Kadir <git@k8n.dev>
- Cleaned up some deps and moved them to the dev section
- Moved from mysqlclient to pymysql which is a python native one which does not require the OS to have the correct mysql lib
- Added a podman compose file so we can have all dependencies in one place easily without the need to install redis or a database locally
For podman install
- podman
- podman-compose (do a poetry sync I think?)
- podman-compose up to start the containers