Commit graph

20 commits

Author SHA1 Message Date
Kadir
b1e0a414cf Used ruff to clean up
I hope it just works, as I cannot test whether things still work
2025-09-14 19:02:30 +01:00
Viktor Barzin
91a0436f7f
migrate processing to a pipeline approach: listings are processed in parallel, each through the pipeline, with status reported back to track progress 2025-07-27 18:33:39 +00:00
Viktor Barzin
20ff91d663
reduce concurrency when fetching images + add retries 2025-07-01 16:12:06 +00:00
Viktor Barzin
59c33428c2
bugfix - reraise the error when getting 429 so that we retry later 2025-06-30 23:24:16 +00:00
Viktor Barzin
84a55eefde
add default dir path for image dumps 2025-06-22 21:15:50 +00:00
Viktor Barzin
4e13dbdb7f
retry transient errors from rightmove when fetching images 2025-06-21 12:06:19 +00:00
Viktor Barzin
289206afc0
some cleanups 2025-06-08 20:58:28 +00:00
Viktor Barzin
ba87d07cd2
migrate dump images command to use model listings 2025-06-07 13:56:00 +00:00
Viktor Barzin
0b9d50af47
reformat with black; looks better 2025-05-31 23:50:43 +00:00
Viktor Barzin
b873eaf203
fix types and format 2025-05-18 12:30:49 +00:00
Viktor Barzin
47347543d2
update runall script to use the click entrypoint 2025-05-17 23:14:00 +00:00
Viktor Barzin
68cc70bd11
dump images using aiohttp and concurrently 2025-05-17 22:43:05 +00:00
Viktor Barzin
07fef7fbab
parameterize dump images step to work with custom data paths 2025-05-14 21:02:32 +00:00
Viktor Barzin
70e8ef9f95
[3/n] click-ify: add dump images command
run with
poetry run python main.py --step dump_images
2025-05-11 19:04:19 +00:00
Kadir
6d343e52e7 adding days updated 2024-08-11 19:36:25 +01:00
Kadir
7ce54a01bd removing download of the normal images; I don't use them 2024-04-01 15:26:34 +02:00
Kadir
d777558b34 ruff format 2024-03-25 20:48:48 +00:00
Kadir
37e3e8ad6f using ruff 2024-03-25 20:48:27 +00:00
Kadir
36258d877f crawling for 3 and refactoring to allow incremental crawls 2024-03-11 14:43:53 +00:00
Kadir
508aa02812 Real crawling scripts and floorplan detection
1. get all listings
2. get all detail JSONs
3. get all images
4. get all floorplans
5. detecting floorplans

Also updating dependencies for huggingface etc.
2024-03-10 18:49:39 +00:00