The get_ids_to_process function was using set union instead of set difference, causing it to return all existing listing IDs along with new ones. This meant:
1. When there were no new listings, the task would iterate through all existing listings, find nothing to process for each, and complete almost instantly
2. The task showed no progress because processing was too fast

Fixed by:
- Changed `all_listing_ids.union(identifiers)` to `identifiers - all_listing_ids` to only return IDs that are NOT already in the database
- Added explicit check for empty set with informative task state "No new listings found" so users understand why the task completed quickly
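The difference between the two set operations can be sketched as follows. This is a minimal illustration, not the actual code from the repository: the signature of `get_ids_to_process` and the `task_state_for` helper are assumptions made for the example; only the set expressions and the "No new listings found" message come from the description above.

```python
def get_ids_to_process(identifiers, all_listing_ids):
    """Return only the IDs that are not already in the database.

    The signature is assumed for illustration; the real function may differ.
    """
    # Set difference: keep crawled IDs that are not stored yet.
    # The buggy version used all_listing_ids.union(identifiers),
    # which also returned every ID already in the database.
    return set(identifiers) - set(all_listing_ids)


def task_state_for(new_ids):
    """Hypothetical helper that picks a task state message."""
    if not new_ids:
        # Explicit empty-set check so a fast finish is explained.
        return "No new listings found"
    return f"Processing {len(new_ids)} new listings"


if __name__ == "__main__":
    stored = {101, 102, 103}
    crawled = {102, 103, 104}
    new_ids = get_ids_to_process(crawled, stored)
    print(new_ids, task_state_for(new_ids))
    print(task_state_for(get_ids_to_process(stored, stored)))
```

With the union, the first call would have returned all four IDs instead of only the one that is new.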
Setup
- Install deps:
poetry install && cp .env.sample .env
- Check .env if you want to customize settings for broker and db
- Run:
./start.sh
This starts the backend.
To start the frontend:
cd frontend && cp .env.sample .env
Change the DEV_HOST to any name you want to use to access the web interface.
Next, set up a DNS record for that name (e.g. in your /etc/hosts file). This is important because auth is done via an external [authentik] service that needs to redirect to a name.
Run ./start.sh
This starts a Caddy proxy with correct certificates and an npm dev server.
Requests for the frontend are forwarded to the npm dev server, and requests for the backend (those that go to /api/*) are forwarded to the backend service.
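The forwarding rule can be pictured with a small sketch. This is only an illustration of the path-based split described above, not the proxy configuration used in this repo; the function name `pick_upstream` and the returned labels are made up for the example.

```python
def pick_upstream(path):
    """Illustrative only: decide which service a request path goes to.

    Mirrors the rule from the text: /api/* goes to the backend,
    everything else goes to the npm dev server.
    """
    if path.startswith("/api/"):
        return "backend"
    return "npm dev server"


if __name__ == "__main__":
    for p in ("/", "/listings", "/api/listings"):
        print(p, "->", pick_upstream(p))
```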
Lastly, reach out to Viktor to allowlist your DEV_HOST so that authentik can authorize callbacks to your host.
Formatting
yapf --style .style.yapf --recursive .
For VSCode: install the yapf extension, then enable formatting using yapf and the style file in this repo (there may be an easier way; I put this in my user settings json):
{
  "[python]": {
    "editor.formatOnSaveMode": "file",
    "editor.formatOnSave": true,
    "editor.defaultFormatter": "eeyore.yapf",
    "editor.formatOnType": false
  },
  "yapf.args": ["--style", "/home/wizard/code/realestate-crawler/crawler/.style.yapf"]
}
ADB commands (from /Applications/BlueStacks.app/Contents/MacOS):
Set proxy
./hd-adb shell settings put global http_proxy 192.168.9.110:8080
Disable proxy:
./hd-adb shell settings put global http_proxy :0
Connect adb
./hd-adb connect 127.0.0.1:5555
Disconnect adb
./hd-adb disconnect