wrongmove/crawler
Viktor Barzin ceb943f198 Fix refresh listings returning immediate success with no progress
The get_ids_to_process function was using set union instead of set
difference, causing it to return all existing listing IDs along with
new ones. This meant:

1. When there were no new listings, the task would iterate through all
   existing listings, find nothing to process for each, and complete
   almost instantly

2. The task showed no progress because processing was too fast

Fixed by:
- Changed `all_listing_ids.union(identifiers)` to `identifiers - all_listing_ids`
  to only return IDs that are NOT already in the database
- Added explicit check for empty set with informative task state
  "No new listings found" so users understand why the task completed quickly
2026-02-01 22:30:37 +00:00
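
The union-versus-difference bug described in the commit message above can be reproduced with plain Python sets. This is a minimal sketch, not the repository's code: the function bodies, the `_buggy`/`_fixed` suffixes, and the sample IDs are assumptions for illustration. Only the two set expressions and the "No new listings found" message come from the commit message.

```python
def get_ids_to_process_buggy(identifiers: set[int], all_listing_ids: set[int]) -> set[int]:
    # Union returns every known ID plus the fetched ones, so nothing is filtered out.
    return all_listing_ids.union(identifiers)


def get_ids_to_process_fixed(identifiers: set[int], all_listing_ids: set[int]) -> set[int]:
    # Difference keeps only fetched IDs that are NOT already in the database.
    return identifiers - all_listing_ids


# Hypothetical sample data: three listings already stored, two of them fetched again.
all_listing_ids = {101, 102, 103}
identifiers = {102, 103}

buggy = get_ids_to_process_buggy(identifiers, all_listing_ids)
fixed = get_ids_to_process_fixed(identifiers, all_listing_ids)

# With the fix, an empty result can be reported explicitly instead of
# silently iterating over listings that need no work.
status = "No new listings found" if not fixed else f"{len(fixed)} new listings"
```

With a genuinely new ID in `identifiers`, the fixed version returns just that ID, while the buggy version still returns the whole database.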
alembic add alembic mutation for logitute name 2026-01-28 21:00:46 +00:00
api Fix stuck Celery tasks and add purge all tasks functionality 2026-02-01 20:40:07 +00:00
config Add configurable scheduling, UI health/task indicators, and auto-load map with default filters 2026-02-01 17:28:37 +00:00
data removing decisions from the repo 2025-06-18 00:59:16 +01:00
frontend Fix stuck Celery tasks and add purge all tasks functionality 2026-02-01 20:40:07 +00:00
models More ruff fixes (#2) 2025-09-14 19:44:03 +01:00
proof_of_concept More ruff fixes (#2) 2025-09-14 19:44:03 +01:00
rec More ruff fixes (#2) 2025-09-14 19:44:03 +01:00
repositories Add proper buy listing support with type-aware UI filters and display 2026-02-01 19:13:29 +00:00
services Fix stuck Celery tasks and add purge all tasks functionality 2026-02-01 20:40:07 +00:00
tasks Fix refresh listings returning immediate success with no progress 2026-02-01 22:30:37 +00:00
.dockerignore update dockerignore 2026-02-01 20:22:03 +00:00
.env.sample Add configurable scheduling, UI health/task indicators, and auto-load map with default filters 2026-02-01 17:28:37 +00:00
.style.yapf add readme and .style.yapf for formatting 2025-05-07 20:58:22 +00:00
5_routing.py More ruff fixes (#2) 2025-09-14 19:44:03 +01:00
9_recalculate_regex_squaremeter.py ruff format 2024-03-25 20:48:48 +00:00
91_recalculate_floorplan.py reformat most things 2025-05-07 21:25:40 +00:00
alembic.ini migrate to using db connection string from env 2025-06-22 21:16:55 +00:00
celery_app.py migrate background tasks to celery 2025-06-22 21:18:52 +00:00
csv_exporter.py use query params to filter out models; also make csv exporter work with models 2025-06-08 17:01:33 +00:00
data_access.py More ruff fixes (#2) 2025-09-14 19:44:03 +01:00
database.py bugfix - print sql statemnts only in dev 2025-06-30 23:22:12 +00:00
docker-compose.yml Add configurable scheduling, UI health/task indicators, and auto-load map with default filters 2026-02-01 17:28:37 +00:00
Dockerfile change api port to 5001 2025-06-24 19:12:20 +00:00
exploration.ipynb remove some empty fields in exploration notebook 2025-05-17 23:24:40 +00:00
GUIDE merging the visual query answering with the crawler. Monorepo go! 2024-03-01 16:42:48 +01:00
listing_processor.py More ruff fixes (#2) 2025-09-14 19:44:03 +01:00
main.py More ruff fixes (#2) 2025-09-14 19:44:03 +01:00
main_tmp.py ruff format 2024-03-25 20:48:48 +00:00
notifications.py Used ruff to cleanup 2025-09-14 19:02:30 +01:00
ocr.ipynb notebooks 2025-03-30 23:42:29 +01:00
podman-compose.yml Fix deps and move to a better local environment 2025-09-14 19:00:26 +01:00
poetry.lock replace pymysql with mysqlclient as it is "better"; also fix an issue in the ui exporter that had wrong imports 2025-10-18 09:58:55 +00:00
pyproject.toml replace pymysql with mysqlclient as it is "better"; also fix an issue in the ui exporter that had wrong imports 2025-10-18 09:58:55 +00:00
README.md update readme with instructions on how to run everything 2025-08-28 21:39:20 +00:00
redis_repository.py Fix stuck Celery tasks and add purge all tasks functionality 2026-02-01 20:40:07 +00:00
runall.sh some cleanups 2025-06-08 20:58:28 +00:00
start.sh add redis container in start.sh in case it is not running in dev mode 2025-08-28 20:48:42 +00:00
TASKS.md adding tasks and updating exploration notebook 2024-04-01 15:10:15 +02:00
ui_exporter.py Add proper buy listing support with type-aware UI filters and display 2026-02-01 19:13:29 +00:00

Setup

  1. Install deps:
poetry install && cp .env.sample .env
  2. Check .env if you want to customize the broker and db settings
  3. Run ./start.sh

This starts the backend.

To start the frontend:

cd frontend && cp .env.sample .env

Change the DEV_HOST to any name you want to use to access the web interface.

Next, set up the DNS record (e.g. in your /etc/hosts file). This is important because auth is done via an external [authentik] service that needs to redirect to a name.
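
One way to confirm the hosts entry is in place is to parse the hosts file and look for the name. The sketch below is illustrative only: `has_hosts_entry`, the host name `wrongmove.test`, and the sample file contents are hypothetical, and it assumes the usual hosts format of an address followed by one or more names, with `#` starting a comment.

```python
def has_hosts_entry(hosts_text: str, hostname: str) -> bool:
    """Return True if any non-comment line maps an address to `hostname`."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address + names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and hostname in fields[1:]:
            return True
    return False


# Hypothetical hosts file contents for demonstration.
sample_hosts = """\
127.0.0.1 localhost
# 127.0.0.1 commented.test
127.0.0.1 wrongmove.test  # DEV_HOST for the frontend
"""

found = has_hosts_entry(sample_hosts, "wrongmove.test")
```

To check the real file, pass in its contents, for example `has_hosts_entry(open("/etc/hosts").read(), "<your DEV_HOST>")`.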

Run ./start.sh

This starts a Caddy proxy with correct certificates and an npm dev server. Requests for the frontend are forwarded to the npm server, and requests for the backend (those that go to /api/*) are forwarded to the backend service.
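
The routing rule described above can be sketched as a simple path check. This is not the Caddy configuration used by the repo, only a model of the rule under the assumption that `/api/*` is a prefix match on `/api/`. The `route` function, the `"backend"`/`"frontend"` labels, and the example paths are hypothetical.

```python
def route(path: str) -> str:
    """Pick an upstream for a request path, mirroring the /api/* rule."""
    # Anything under /api/ goes to the backend service; the rest goes
    # to the npm dev server that serves the frontend.
    return "backend" if path.startswith("/api/") else "frontend"


# Hypothetical request paths for demonstration.
targets = {p: route(p) for p in ["/", "/api/listings", "/apidocs"]}
```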

Lastly, reach out to Viktor to allowlist your DEV_HOST so that authentik can authorize callbacks to your host.

Formatting

yapf --style .style.yapf --recursive .

For VSCode: install the yapf extension, then enable formatting with yapf and the style file in this repo (there may be an easier way; I put this in my user settings JSON):

{
    "[python]": {
        "editor.formatOnSaveMode": "file",
        "editor.formatOnSave": true,
        "editor.defaultFormatter": "eeyore.yapf",
        "editor.formatOnType": false
    },
    "yapf.args": ["--style", "/home/wizard/code/realestate-crawler/crawler/.style.yapf"]
}

ADB commands (from /Applications/BlueStacks.app/Contents/MacOS):

Set proxy

./hd-adb shell settings put global http_proxy 192.168.9.110:8080

Disable proxy:

./hd-adb shell settings put global http_proxy :0

Connect adb

./hd-adb connect 127.0.0.1:5555

Disconnect adb

./hd-adb disconnect