Commit graph

16 commits

Author SHA1 Message Date
Viktor Barzin
0b9d50af47 reformat with black; looks better 2025-05-31 23:50:43 +00:00
Viktor Barzin
b873eaf203 fix types and format 2025-05-18 12:30:49 +00:00
Viktor Barzin
68cc70bd11 dump images using aiohttp and concurrently 2025-05-17 22:43:05 +00:00
Viktor Barzin
01ac24b4b7 use aiohttp to fetch details concurrently 2025-05-17 22:34:27 +00:00
Viktor Barzin
3e7a144fb4 make dumping details async 2025-05-17 22:11:33 +00:00
Viktor Barzin
e424361ed9 parameterize dump_detail to use a custom data dir and also move data dir param as part of the click context 2025-05-14 20:32:37 +00:00
Viktor Barzin
c2196c15c1 [2/n] click-ify - add 2_dump_detail command
run with
poetry run python main.py --step dump_detail
2025-05-11 19:02:23 +00:00
Kadir
302ca95cfb fixing full detail dumping 2025-02-16 03:02:21 +00:00
Kadir
4b6b8628c2 add runall script, update parameters to 4 bed etc and allow incremental updating 2024-11-23 22:57:22 +00:00
Kadir
f98cd02696 adding better exception messages and interrupt free crawling 2024-05-06 18:54:55 +01:00
Kadir
874fed3e8f format and committing notebook 2024-04-05 11:38:55 +01:00
Kadir
5305451fe8 changing detail downloads to prefiltering first. Making the progress bar more accurate and frontloading 2024-04-01 20:28:37 +02:00
Kadir
acc67192c9 add counter to 2_dump_details to understand how many new are crawled 2024-03-30 19:24:03 +01:00
Kadir
d777558b34 ruff format 2024-03-25 20:48:48 +00:00
Kadir
36258d877f crawling for 3 and refactoring to allow incremental crawls 2024-03-11 14:43:53 +00:00
Kadir
508aa02812 Real crawling scripts and floorplan detection
1. get all listings
2. get all detail jsons
3. get all images
4. get all floorplans
5. detecting floorplans

Also updating dependencies for huggingface etc.
2024-03-10 18:49:39 +00:00