wrongmove/crawler
Viktor Barzin 7e8f1f0339
Fix live logs: replace monkey-patch with direct log injection
The previous approach monkey-patched task.update_state on the Celery
Task instance, but the assignment didn't take effect (Celery's Task
singleton may prevent instance method shadowing). Additionally, the
celery.task logger level was left at WARNING by Celery's worker setup,
silencing all INFO-level log capture.

Fix:
- Replace wrapper with _update_task_state() helper that directly injects
  logs from a module-level _active_log_buffer into every meta dict
- Attach TaskLogHandler to BOTH celery.task and uvicorn.error loggers
- Force both loggers to INFO level during task execution
- Every task.update_state call site now uses _update_task_state()
2026-02-06 23:13:01 +00:00

Setup

  1. Install dependencies and create your .env:
     poetry install && cp .env.sample .env
  2. Edit .env if you want to customize the broker and database settings.
  3. Run ./start.sh

This starts the backend.
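cp .env.sample .env copies the template only once. Settings added to .env.sample later will not show up in an existing .env. The sketch below lists keys present in .env.sample but missing from .env. It assumes both files use simple KEY=VALUE lines. missing_settings is a hypothetical helper, not part of the repo.

```python
from pathlib import Path


def env_keys(path: Path) -> set:
    """Collect KEY names from KEY=VALUE lines, skipping blanks and comments."""
    keys = set()
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        keys.add(key)
    return keys


def missing_settings(sample: Path, env: Path) -> list:
    """Keys present in the sample file but absent from the local .env, sorted."""
    return sorted(env_keys(sample) - env_keys(env))
```

Run it from the repo root as missing_settings(Path(".env.sample"), Path(".env")). An empty list means your .env has every key the template defines.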

To start the frontend:

cd frontend && cp .env.sample .env

Set DEV_HOST in .env to the hostname you want to use to access the web interface.

Next, set up a DNS record for that hostname (e.g. in your /etc/hosts file). This is required because authentication is handled by an external authentik service, which needs a hostname to redirect back to.
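The sketch below formats an /etc/hosts line and checks whether the name resolves to a loopback address. It assumes the Caddy proxy runs on your own machine, so DEV_HOST should point at 127.0.0.1. Both helpers are hypothetical.

```python
import socket


def hosts_entry(dev_host: str, ip: str = "127.0.0.1") -> str:
    """Format an /etc/hosts line mapping dev_host to ip (loopback by default)."""
    return f"{ip}\t{dev_host}"


def resolves_locally(dev_host: str) -> bool:
    """True if dev_host currently resolves to a loopback IPv4 address."""
    try:
        return socket.gethostbyname(dev_host).startswith("127.")
    except socket.gaierror:
        return False
```

For a hypothetical DEV_HOST of crawler.test, hosts_entry("crawler.test") gives the line to append to /etc/hosts. This needs root. resolves_locally("crawler.test") should then return True.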

Run ./start.sh

This starts a Caddy proxy with the correct certificates, plus the npm dev server. Frontend requests are forwarded to the npm server, and backend requests (paths under /api/*) are forwarded to the backend service.
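The routing rule above boils down to a prefix check. This sketch mirrors it in Python to show which upstream a request reaches. Like Caddy's /api/* path matcher, it does not match a bare /api. The upstream addresses are hypothetical placeholders; the real ones live in the frontend's proxy configuration.

```python
# Hypothetical upstream addresses; check the frontend's Caddy config for real ones.
BACKEND = "http://127.0.0.1:8000"
NPM_DEV_SERVER = "http://127.0.0.1:5173"


def pick_upstream(path: str) -> str:
    """Send /api/* to the backend and every other path to the npm dev server."""
    if path.startswith("/api/"):
        return BACKEND
    return NPM_DEV_SERVER
```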

Lastly, reach out to Viktor to allowlist your DEV_HOST so that authentik can authorize callbacks to your host.

Formatting

yapf --in-place --recursive --style .style.yapf .

For VS Code, install the yapf extension and enable formatting with yapf and this repo's style file. There may be an easier way; I put this in my user settings JSON. Adjust the yapf.args path to your own checkout:

{
    "[python]": {
        "editor.formatOnSaveMode": "file",
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "eeyore.yapf",
        "editor.formatOnType": false
    },
    "yapf.args": ["--style", "/home/wizard/code/realestate-crawler/crawler/.style.yapf"]
}

ADB commands (from /Applications/BlueStacks.app/Contents/MacOS):

Set proxy

./hd-adb shell settings put global http_proxy 192.168.9.110:8080

Disable proxy

./hd-adb shell settings put global http_proxy :0

Connect adb

./hd-adb connect 127.0.0.1:5555

Disconnect adb

./hd-adb disconnect
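If you script these steps, the sketch below builds the same hd-adb invocations with an absolute path, so you do not need to cd into the BlueStacks directory first. It assumes hd-adb sits in the directory named above. The helper names are hypothetical, and only the argument lists come from the commands in this README.

```python
import subprocess
from pathlib import Path

# BlueStacks directory named in this README; adjust if installed elsewhere.
BLUESTACKS_DIR = Path("/Applications/BlueStacks.app/Contents/MacOS")


def hd_adb(*args: str) -> list:
    """Build an hd-adb argv using an absolute path, so no cd is needed."""
    return [str(BLUESTACKS_DIR / "hd-adb"), *args]


def set_proxy(host_port: str) -> list:
    """Point the emulator's global HTTP proxy at host_port ("host:port")."""
    return hd_adb("shell", "settings", "put", "global", "http_proxy", host_port)


def disable_proxy() -> list:
    """Clear the proxy by setting it to ":0", as the Disable proxy command does."""
    return set_proxy(":0")


def run(argv: list) -> None:
    """Execute one command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)
```

For example, run(hd_adb("connect", "127.0.0.1:5555")) followed by run(set_proxy("192.168.9.110:8080")) reproduces the connect and set-proxy steps.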