Commit graph

245 commits

Author SHA1 Message Date
Kadir
2c2adcfa7c improving OCR 2025-03-30 23:41:52 +01:00
Kadir
1b69fd4305 adding new dependencies 2025-03-02 17:11:48 +00:00
Kadir
244de72877 reverting isRemoved function to fix 2025-02-16 04:44:42 +00:00
Kadir
302ca95cfb fixing full detail dumping 2025-02-16 03:02:21 +00:00
Kadir
e0e7853c8c updating districts of relevance to me 2025-02-14 21:35:44 +00:00
Kadir
bb765a9312 making last_seen to a relative date 2025-02-14 21:21:50 +00:00
Kadir
29c8d1960b adding last seen date into the listing 2025-01-26 21:41:18 +00:00
Kadir
4b6b8628c2 add runall script, update parameters to 4 bed etc and allow incremental updating 2024-11-23 22:57:22 +00:00
Kadir
dbf72e42e3 removing districts which are far away from the default crawl 2024-09-23 01:43:20 +01:00
Kadir
f66007bc85 adding status and other fields 2024-09-22 11:31:32 +01:00
Kadir
4dfbcc64c1 floorplan rec: fix regex rule + filter out extreme values 2024-08-25 12:11:55 +02:00
Kadir
2e94bda7fa refac 2024-08-25 09:50:41 +02:00
Kadir
6d343e52e7 adding days updated 2024-08-11 19:36:25 +01:00
Kadir
f98cd02696 adding better exception messages and interrupt free crawling 2024-05-06 18:54:55 +01:00
Kadir
874fed3e8f format and committing notebook 2024-04-05 11:38:55 +01:00
Kadir
69a5bba65e adding service charge to excel 2024-04-05 11:38:31 +01:00
Kadir
b5e316d1df making sure the routing is only applied to listings which are not done already 2024-04-05 11:38:09 +01:00
Kadir
966b9007a0 adding service charge as the info into the sheet 2024-04-05 11:37:03 +01:00
Kadir
5305451fe8 changing detail downloads to prefiltering first. Making the progress bar more accurate and frontloading 2024-04-01 20:28:37 +02:00
Kadir
b16fee0648 changing to 500 listings per query 2024-04-01 20:28:15 +02:00
Kadir
7ce54a01bd removing download of the normal images.. I dont use them 2024-04-01 15:26:34 +02:00
Kadir
c8c53b8696 Reduced need for routing by limiting to a radius of 7 miles 2024-04-01 15:22:08 +02:00
Kadir
6a43a7f485 adding tasks and updating exploration notebook 2024-04-01 15:10:15 +02:00
Kadir
a7a4f88a39 sort districts 2024-03-30 19:24:35 +01:00
Kadir
acc67192c9 add counter to 2_dump_details to understand how many new are crawled 2024-03-30 19:24:03 +01:00
Kadir
5720e68547 Rewriting 1_dump to enabling crawling of more real estate 2024-03-30 19:23:19 +01:00
Kadir
40285245d5 remove debug print statements 2024-03-30 18:35:32 +01:00
Kadir
ec86b572b3 adding districts and location identifier to search 2024-03-30 18:31:49 +01:00
Kadir
4c40462bb8 adding property type 2024-03-25 20:58:35 +00:00
Kadir
d777558b34 ruff format 2024-03-25 20:48:48 +00:00
Kadir
37e3e8ad6f using ruff 2024-03-25 20:48:27 +00:00
Kadir
d5a0964b12 exploration notebook 2024-03-25 20:47:50 +00:00
Kadir
47f7b2b672 Adding initial walking time and identifier to the information 2024-03-25 20:47:31 +00:00
Kadir
ce632c795d fixing gitignores 2024-03-25 20:47:15 +00:00
Kadir
e79e24ff98 fix: remove debug statement 2024-03-18 01:02:06 +00:00
Kadir
4dea766a12 fixing floorplan detection and adding recalculation method 2024-03-18 00:56:39 +00:00
Kadir
335adc0856 add routing, incremental crawling, travel time, lease and development 2024-03-13 16:24:57 +00:00
Kadir
d4f87bed76 add routing and util to the right places 2024-03-13 16:22:53 +00:00
Kadir
a9b8d4d630 adding musthave and lastXdays 2024-03-13 16:21:54 +00:00
Kadir
36258d877f crawling for 3 and refactoring to allow incremental crawls 2024-03-11 14:43:53 +00:00
Kadir
de2639f9c3 fixing bugs, adding properties for querying and analysis 2024-03-11 09:44:37 +00:00
Kadir
d108bf11ee adding tesseract OCR for floorplan detection 2024-03-10 22:32:34 +00:00
Kadir
508aa02812 Real crawling scripts and floorplan detection
1. get all listings
2. get all detail jsons
3. get all images
4. get all floorplans
5. detecting floorplans

Also updating dependencies for huggingface etc.
2024-03-10 18:49:39 +00:00
Kadir
46bb641026 adding floorplans, detail json, refactored the folders 2024-03-07 22:02:09 +00:00
Kadir
e2f7998ee9 merging the visual query answering with the crawler. Monorepo go! 2024-03-01 16:42:48 +01:00