Commit graph

38 commits

Author SHA1 Message Date
Viktor Barzin
f880664a98 Add throttling detection and circuit breaker for Rightmove scraper 2026-02-06 20:47:36 +00:00
Viktor Barzin
e8293c6042 Add intelligent query splitting to maximize Rightmove data extraction 2026-02-06 20:47:36 +00:00
Kadir
b1e0a414cf Used ruff to cleanup
I hope it just works right as I cannot test things if they work
2025-09-14 19:02:30 +01:00
Viktor Barzin
d4b22deda0
save user queries in redis so that user can refresh the page and still come back to their latest task 2025-07-06 12:02:25 +00:00
Viktor Barzin
20ff91d663
reduce concurrency when fetching images + add retries 2025-07-01 16:12:06 +00:00
Viktor Barzin
bf79d3c977
convert district at the last moment - when we send the query 2025-06-22 13:05:53 +00:00
Viktor Barzin
9b2653ce91
add tenacity to retry transient blockouts by rightmove 2025-06-08 20:59:04 +00:00
Viktor Barzin
289206afc0
some cleanups 2025-06-08 20:58:28 +00:00
Viktor Barzin
e317d2ec54
use query params to filter out models; also make csv exporter work with models 2025-06-08 17:01:33 +00:00
Viktor Barzin
4f5a934fa9
refactor dump listings to start using model instead of the data_access object 2025-06-07 12:46:53 +00:00
Viktor Barzin
842f7cefbe
merge dump listings and dump details commands - fetch both details and listings in the same command 2025-06-07 12:00:23 +00:00
Viktor Barzin
29213f3d26
refactor the semaphore when dumping listings 2025-06-06 20:09:16 +00:00
Viktor Barzin
f7fb891648
add models 2025-06-04 21:09:29 +00:00
Viktor Barzin
0acd417d34
add option to filter for min sqm per listing 2025-06-01 19:26:24 +00:00
Viktor Barzin
8b90ecde11
add filter for last seen days 2025-06-01 15:26:38 +00:00
Viktor Barzin
11315359d2
reuse query params when exporting to immoweb and allow filtering from available date 2025-06-01 15:17:14 +00:00
Viktor Barzin
9735db72a0
limit the number of concurrent requests when dumping listings as rightmove blocks us 2025-06-01 00:27:12 +00:00
Viktor Barzin
24bf44caf9
trust env when fetching details and listings. this allows setting proxy env vars. I use this in conjunction with a tor proxy to avoid ip-bans from rightmove 2025-06-01 00:12:31 +00:00
Viktor Barzin
1122f5a96f
use aiohttp for dumping listing query 2025-05-31 23:48:45 +00:00
Viktor Barzin
7e8c79d3d1
add command to export the data in a way that the ui (immoweb) can consume 2025-05-26 19:37:06 +00:00
Viktor Barzin
9f3e466b23
add filter for furnished/unfurnished type for rented listings 2025-05-18 17:22:48 +00:00
Viktor Barzin
b873eaf203
fix types and format 2025-05-18 12:30:49 +00:00
Viktor Barzin
01ac24b4b7
use aiohttp to fetch details concurrently 2025-05-17 22:34:27 +00:00
Viktor Barzin
3e7a144fb4
make dumping details async 2025-05-17 22:11:33 +00:00
Viktor Barzin
ad879f2d4f
convert listings dump to asyncio 2025-05-17 21:55:42 +00:00
Viktor Barzin
df24c2c1b7
add cli param for querying properties to rent
example:
python main.py --data-dir data/rs2 dump-listings --max-price 3500 --min-bedrooms 2 --max-bedrooms 4 --district islington -t rent
2025-05-17 21:22:39 +00:00
Kadir
bb2488a63b Removing sqlalchemy and the db part as it was never used 2025-05-12 01:17:45 +01:00
Viktor Barzin
835494d29f
reformat most things 2025-05-07 21:25:40 +00:00
Kadir
f98cd02696 adding better exception messages and interrupt free crawling 2024-05-06 18:54:55 +01:00
Kadir
b16fee0648 changing to 500 listings per query 2024-04-01 20:28:15 +02:00
Kadir
40285245d5 remove debug print statements 2024-03-30 18:35:32 +01:00
Kadir
ec86b572b3 adding districts and location identifier to search 2024-03-30 18:31:49 +01:00
Kadir
4c40462bb8 adding property type 2024-03-25 20:58:35 +00:00
Kadir
d777558b34 ruff format 2024-03-25 20:48:48 +00:00
Kadir
a9b8d4d630 adding musthave and lastXdays 2024-03-13 16:21:54 +00:00
Kadir
508aa02812 Real crawling scripts and floorplan detection
1. get all listings
2. get all detail jsons
3. get all images
4. get all floorplans
5. detecting floorplans

Also updating dependencies for huggingface etc.
2024-03-10 18:49:39 +00:00
Kadir
46bb641026 adding floorplans, detail json, refactored the folders 2024-03-07 22:02:09 +00:00
Kadir
e2f7998ee9 merging the visual query answering with the crawler. Monorepo go! 2024-03-01 16:42:48 +01:00
Renamed from rec/query.py
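
Several commits above (9735db72a0, 20ff91d663, 9b2653ce91) mention capping concurrent requests and retrying transient blocks. The following is a minimal sketch of that general pattern, not the repository's code. The commits name aiohttp and tenacity; here `asyncio.Semaphore` and a hand-written backoff loop stand in for them so the example runs on the standard library alone. The names `fetch_with_retry` and `fetch_all`, their parameters, and the choice of `ConnectionError` as the retryable error are all hypothetical.

```python
import asyncio
import random


async def fetch_with_retry(fetch, item, semaphore, attempts=3, base_delay=0.01):
    """Call fetch(item) under a shared semaphore, retrying on ConnectionError.

    `fetch` is any coroutine function supplied by the caller (hypothetical here);
    the semaphore is released between attempts so waiting tasks can proceed.
    """
    for attempt in range(1, attempts + 1):
        try:
            async with semaphore:  # caps the number of in-flight calls
                return await fetch(item)
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff with a little jitter before the next attempt.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            await asyncio.sleep(delay)


async def fetch_all(fetch, items, max_concurrency=5):
    """Fetch every item, with at most `max_concurrency` calls running at once."""
    # Created inside the coroutine so it belongs to the running event loop.
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [fetch_with_retry(fetch, item, semaphore) for item in items]
    # gather preserves the order of `items` in its result list.
    return await asyncio.gather(*tasks)
```

With a caller-supplied coroutine `fetch`, `asyncio.run(fetch_all(fetch, items, max_concurrency=3))` returns the results in input order. Lowering `max_concurrency` trades throughput for a smaller chance of tripping server-side rate limits.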