Viktor Barzin
9b2653ce91
add tenacity to retry transient blockouts by rightmove
2025-06-08 20:59:04 +00:00
Viktor Barzin
289206afc0
some cleanups
2025-06-08 20:58:28 +00:00
Viktor Barzin
e317d2ec54
use query params to filter out models; also make csv exporter work with models
2025-06-08 17:01:33 +00:00
Viktor Barzin
325823e631
refactor detect floorplan to use model listings
2025-06-07 14:30:32 +00:00
Viktor Barzin
4f5a934fa9
refactor dump listings to start using model instead of the data_access object
2025-06-07 12:46:53 +00:00
Viktor Barzin
842f7cefbe
merge dump listings and dump details commands - fetch both details and listings in the same command
2025-06-07 12:00:23 +00:00
Viktor Barzin
29213f3d26
refactor the semaphore when dumping listings
2025-06-06 20:09:16 +00:00
Viktor Barzin
f7fb891648
add models
2025-06-04 21:09:29 +00:00
Viktor Barzin
0acd417d34
add option to filter for min sqm per listing
2025-06-01 19:26:24 +00:00
Viktor Barzin
8b90ecde11
add filter for last seen days
2025-06-01 15:26:38 +00:00
Viktor Barzin
11315359d2
reuse query params when exporting to immoweb and allow filtering from available date
2025-06-01 15:17:14 +00:00
Viktor Barzin
9735db72a0
limit the number of concurrent requests when dumping listings, as rightmove blocks us
2025-06-01 00:27:12 +00:00
Viktor Barzin
24bf44caf9
trust env when fetching details and listings. this allows setting proxy env vars. I use this in conjunction with a tor proxy to avoid ip-bans from rightmove
2025-06-01 00:12:31 +00:00
Viktor Barzin
0b9d50af47
reformat with black; looks better
2025-05-31 23:50:43 +00:00
Viktor Barzin
1122f5a96f
use aiohttp for dumping listing query
2025-05-31 23:48:45 +00:00
Viktor Barzin
7e8c79d3d1
add command to export the data in a way that the ui (immoweb) can consume
2025-05-26 19:37:06 +00:00
Viktor Barzin
1e0f302178
defer transformers and pytesseract imports to when used. this shortens startup time of all other commands quite a bit
2025-05-21 21:24:57 +00:00
Viktor Barzin
10ae25e0d3
allow caching routing by destination and travel mode; also export all travel methods to csv column
2025-05-20 22:12:56 +00:00
Viktor Barzin
482fff689b
parameterize routing logic - extract api key as env var; allow searching dest by address; limit the number of listings to process to prevent accidental api key usage
2025-05-18 21:13:50 +00:00
Viktor Barzin
ea24847cae
add stratford region
2025-05-18 18:02:32 +00:00
Viktor Barzin
9f3e466b23
add filter for furnished/unfurnished type for rented listings
2025-05-18 17:22:48 +00:00
Viktor Barzin
b873eaf203
fix types and format
2025-05-18 12:30:49 +00:00
Viktor Barzin
01ac24b4b7
use aiohttp to fetch details concurrently
2025-05-17 22:34:27 +00:00
Viktor Barzin
3e7a144fb4
make dumping details async
2025-05-17 22:11:33 +00:00
Viktor Barzin
ad879f2d4f
convert listings dump to asyncio
2025-05-17 21:55:42 +00:00
Viktor Barzin
df24c2c1b7
add cli param for querying properties to rent
example:
python main.py --data-dir data/rs2 dump-listings --max-price 3500 --min-bedrooms 2 --max-bedrooms 4 --district islington -t rent
2025-05-17 21:22:39 +00:00
Viktor Barzin
ea56555884
refactor main.py to use click commands, which allows passing parameters to commands, and enable fetching districts by district name
2025-05-14 19:42:08 +00:00
Kadir
bb2488a63b
Removing sqlalchemy and the db part as it was never used
2025-05-12 01:17:45 +01:00
Viktor Barzin
835494d29f
reformat most things
2025-05-07 21:25:40 +00:00
Kadir
2c2adcfa7c
improving OCR
2025-03-30 23:41:52 +01:00
Kadir
e0e7853c8c
updating districts of relevance to me
2025-02-14 21:35:44 +00:00
Kadir
dbf72e42e3
removing districts which are far away from the default crawl
2024-09-23 01:43:20 +01:00
Kadir
4dfbcc64c1
floorplan rec: fix regex rule + filter out extreme values
2024-08-25 12:11:55 +02:00
Kadir
2e94bda7fa
refactor
2024-08-25 09:50:41 +02:00
Kadir
f98cd02696
adding better exception messages and interrupt free crawling
2024-05-06 18:54:55 +01:00
Kadir
874fed3e8f
format and commit notebook
2024-04-05 11:38:55 +01:00
Kadir
b16fee0648
changing to 500 listings per query
2024-04-01 20:28:15 +02:00
Kadir
a7a4f88a39
sort districts
2024-03-30 19:24:35 +01:00
Kadir
40285245d5
remove debug print statements
2024-03-30 18:35:32 +01:00
Kadir
ec86b572b3
adding districts and location identifier to search
2024-03-30 18:31:49 +01:00
Kadir
4c40462bb8
adding property type
2024-03-25 20:58:35 +00:00
Kadir
d777558b34
ruff format
2024-03-25 20:48:48 +00:00
Kadir
47f7b2b672
Adding initial walking time and identifier to the information
2024-03-25 20:47:31 +00:00
Kadir
4dea766a12
fixing floorplan detection and adding recalculation method
2024-03-18 00:56:39 +00:00
Kadir
d4f87bed76
add routing and util to the right places
2024-03-13 16:22:53 +00:00
Kadir
a9b8d4d630
adding musthave and lastXdays
2024-03-13 16:21:54 +00:00
Kadir
d108bf11ee
adding tesseract OCR for floorplan detection
2024-03-10 22:32:34 +00:00
Kadir
508aa02812
Real crawling scripts and floorplan detection
1. get all listings
2. get all detail jsons
3. get all images
4. get all floorplans
5. detect floorplans
Also updating dependencies for huggingface etc.
2024-03-10 18:49:39 +00:00
Kadir
46bb641026
adding floorplans, detail json, refactored the folders
2024-03-07 22:02:09 +00:00
Kadir
e2f7998ee9
merging the visual query answering with the crawler. Monorepo go!
2024-03-01 16:42:48 +01:00