Commit graph

58 commits

Author SHA1 Message Date
Viktor Barzin
d205d15c74 Add services layer, tests, streaming UI, and cleanup legacy code 2026-02-06 20:55:10 +00:00
Viktor Barzin
f880664a98 Add throttling detection and circuit breaker for Rightmove scraper 2026-02-06 20:47:36 +00:00
Viktor Barzin
e8293c6042 Add intelligent query splitting to maximize Rightmove data extraction 2026-02-06 20:47:36 +00:00
Kadir
0801aaf200 More ruff fixes (#2)
* adding ruff auto check for pull requests as well as fixing all ruff errors

* More ruff fixes: forgot half of the ruff checks

Forgot to do a git add all :D

---------

Co-authored-by: Kadir <git@k8n.dev>
2025-09-14 19:44:03 +01:00
Kadir
b1e0a414cf Used ruff to cleanup
I hope it just works right as I cannot test things if they work
2025-09-14 19:02:30 +01:00
Viktor Barzin
d4b22deda0 save user queries in redis so that user can refresh the page and still come back to their latest task 2025-07-06 12:02:25 +00:00
Viktor Barzin
20ff91d663 reduce concurrency when fetching images + add retries 2025-07-01 16:12:06 +00:00
Viktor Barzin
bf79d3c977 convert district at the last moment - when we send the query 2025-06-22 13:05:53 +00:00
Viktor Barzin
9b2653ce91 add tenacity to retry transient blockouts by rightmove 2025-06-08 20:59:04 +00:00
Viktor Barzin
289206afc0 some cleanups 2025-06-08 20:58:28 +00:00
Viktor Barzin
e317d2ec54 use query params to filter out models; also make csv exporter work with models 2025-06-08 17:01:33 +00:00
Viktor Barzin
325823e631 refactor detect floorplan to use model listings 2025-06-07 14:30:32 +00:00
Viktor Barzin
4f5a934fa9 refactor dump listings to start using model instead of the data_access object 2025-06-07 12:46:53 +00:00
Viktor Barzin
842f7cefbe merge dump listings and dump details commands - fetch both details and listings in the same command 2025-06-07 12:00:23 +00:00
Viktor Barzin
29213f3d26 refactor the semaphore when dumping listings 2025-06-06 20:09:16 +00:00
Viktor Barzin
f7fb891648 add models 2025-06-04 21:09:29 +00:00
Viktor Barzin
0acd417d34 add option to filter for min sqm per listing 2025-06-01 19:26:24 +00:00
Viktor Barzin
8b90ecde11 add filter for last seen days 2025-06-01 15:26:38 +00:00
Viktor Barzin
11315359d2 reuse query params when exporting to immoweb and allow filtering from available date 2025-06-01 15:17:14 +00:00
Viktor Barzin
9735db72a0 limit the number of concurrent requests when dumping listings as rightmove blocks us 2025-06-01 00:27:12 +00:00
Viktor Barzin
24bf44caf9 trust env when fetching details and listings. this allows setting proxy env vars. I use this in conjunction with a tor proxy to avoid ip-bans from rightmove 2025-06-01 00:12:31 +00:00
Viktor Barzin
0b9d50af47 reformat with black; looks better 2025-05-31 23:50:43 +00:00
Viktor Barzin
1122f5a96f use aiohttp for dumping listing query 2025-05-31 23:48:45 +00:00
Viktor Barzin
7e8c79d3d1 add command to export the data in a way that the ui (immoweb) can consume 2025-05-26 19:37:06 +00:00
Viktor Barzin
1e0f302178 defer transformers and pytesseract imports to when used. this shortens startup time of all other commands quite a bit 2025-05-21 21:24:57 +00:00
Viktor Barzin
10ae25e0d3 allow caching routing by destination and travel mode; also export all travel methods to csv column 2025-05-20 22:12:56 +00:00
Viktor Barzin
482fff689b parameterize routing logic - extract api key as env var; allow searching dest by address; limit the number of listings to process to prevent accidental api key usage 2025-05-18 21:13:50 +00:00
Viktor Barzin
ea24847cae add stratford region 2025-05-18 18:02:32 +00:00
Viktor Barzin
9f3e466b23 add filter for furnished/unfurnished type for rented listings 2025-05-18 17:22:48 +00:00
Viktor Barzin
b873eaf203 fix types and format 2025-05-18 12:30:49 +00:00
Viktor Barzin
01ac24b4b7 use aiohttp to fetch details concurrently 2025-05-17 22:34:27 +00:00
Viktor Barzin
3e7a144fb4 make dumping details async 2025-05-17 22:11:33 +00:00
Viktor Barzin
ad879f2d4f convert listings dump to asyncio 2025-05-17 21:55:42 +00:00
Viktor Barzin
df24c2c1b7 add cli param for querying properties to rent
example:
python main.py --data-dir data/rs2 dump-listings --max-price 3500 --min-bedrooms 2 --max-bedrooms 4 --district islington -t rent
2025-05-17 21:22:39 +00:00
Viktor Barzin
ea56555884 refactor main.py click to use click commands to allow passing parameters to commands and enable fetching districts by district name 2025-05-14 19:42:08 +00:00
Kadir
bb2488a63b Removing sqlalchemy and the db part as it was never used 2025-05-12 01:17:45 +01:00
Viktor Barzin
835494d29f reformat most things 2025-05-07 21:25:40 +00:00
Kadir
2c2adcfa7c improving OCR 2025-03-30 23:41:52 +01:00
Kadir
e0e7853c8c updating districts of relevance to me 2025-02-14 21:35:44 +00:00
Kadir
dbf72e42e3 removing districts which are far away from the default crawl 2024-09-23 01:43:20 +01:00
Kadir
4dfbcc64c1 floorplan rec: fix regex rule + filter out extreme values 2024-08-25 12:11:55 +02:00
Kadir
2e94bda7fa refac 2024-08-25 09:50:41 +02:00
Kadir
f98cd02696 adding better exception messages and interrupt free crawling 2024-05-06 18:54:55 +01:00
Kadir
874fed3e8f format and committing notebook 2024-04-05 11:38:55 +01:00
Kadir
b16fee0648 changing to 500 listings per query 2024-04-01 20:28:15 +02:00
Kadir
a7a4f88a39 sort districts 2024-03-30 19:24:35 +01:00
Kadir
40285245d5 remove debug print statements 2024-03-30 18:35:32 +01:00
Kadir
ec86b572b3 adding districts and location identifier to search 2024-03-30 18:31:49 +01:00
Kadir
4c40462bb8 adding property type 2024-03-25 20:58:35 +00:00
Kadir
d777558b34 ruff format 2024-03-25 20:48:48 +00:00