Commit graph

55 commits

Author SHA1 Message Date
Viktor Barzin
c2196c15c1 [2/n] click-ify - add 2_dump_detail command
run with
poetry run python main.py --step dump_detail
2025-05-11 19:02:23 +00:00
Viktor Barzin
90b531f5d9 [1/n] click-ify - add entrypoint for click script and add 1_dump_listings command

run via:
poetry run python main.py --step dump_listings
2025-05-11 18:59:41 +00:00
Viktor Barzin
0a66efa48a update pyproject.toml to use latest version of dev dependency section 2025-05-07 21:56:12 +00:00
Viktor Barzin
835494d29f reformat most things 2025-05-07 21:25:40 +00:00
Viktor Barzin
bd7c781adb remove requirements.txt as we are using poetry...im blind.. 2025-05-07 21:06:43 +00:00
Viktor Barzin
028eef63a8 add readme and .style.yapf for formatting 2025-05-07 20:58:22 +00:00
Viktor Barzin
d24b667e73 add sqlalchemy to requirements.txt 2025-05-07 20:57:28 +00:00
Viktor Barzin
dd2de477bc Add requirements.txt 2025-05-07 17:31:29 +00:00
Kadir
8f8956818e refactor routing executor to:
- be nicer
- include checks on floorplan
- include checks on if listing disabled
2025-03-31 03:06:41 +01:00
Kadir
cd21bd0bb6 notebooks 2025-03-30 23:42:29 +01:00
Kadir
2c2adcfa7c improving OCR 2025-03-30 23:41:52 +01:00
Kadir
1b69fd4305 adding new dependencies 2025-03-02 17:11:48 +00:00
Kadir
244de72877 reverting isRemoved function to fix 2025-02-16 04:44:42 +00:00
Kadir
302ca95cfb fixing full detail dumping 2025-02-16 03:02:21 +00:00
Kadir
e0e7853c8c updating districts of relevance to me 2025-02-14 21:35:44 +00:00
Kadir
bb765a9312 making last_seen to a relative date 2025-02-14 21:21:50 +00:00
Kadir
29c8d1960b adding last seen date into the listing 2025-01-26 21:41:18 +00:00
Kadir
4b6b8628c2 add runall script, update parameters to 4 bed etc and allow incremental updating 2024-11-23 22:57:22 +00:00
Kadir
dbf72e42e3 removing districts which are far away from the default crawl 2024-09-23 01:43:20 +01:00
Kadir
f66007bc85 adding status and other fields 2024-09-22 11:31:32 +01:00
Kadir
4dfbcc64c1 floorplan rec: fix regex rule + filter out extreme values 2024-08-25 12:11:55 +02:00
Kadir
2e94bda7fa refac 2024-08-25 09:50:41 +02:00
Kadir
6d343e52e7 adding days updated 2024-08-11 19:36:25 +01:00
Kadir
f98cd02696 adding better exception messages and interrupt free crawling 2024-05-06 18:54:55 +01:00
Kadir
874fed3e8f format and committing notebook 2024-04-05 11:38:55 +01:00
Kadir
69a5bba65e adding service charge to excel 2024-04-05 11:38:31 +01:00
Kadir
b5e316d1df making sure the routing is only applied to listings which are not done already 2024-04-05 11:38:09 +01:00
Kadir
966b9007a0 adding service charge as the info into the sheet 2024-04-05 11:37:03 +01:00
Kadir
5305451fe8 changing detail downloads to prefiltering first. Making the progress bar more accurate and frontloading 2024-04-01 20:28:37 +02:00
Kadir
b16fee0648 changing to 500 listings per query 2024-04-01 20:28:15 +02:00
Kadir
7ce54a01bd removing download of the normal images.. I dont use them 2024-04-01 15:26:34 +02:00
Kadir
c8c53b8696 Reduced need for routing by limiting to a radius of 7 miles 2024-04-01 15:22:08 +02:00
Kadir
6a43a7f485 adding tasks and updating exploration notebook 2024-04-01 15:10:15 +02:00
Kadir
a7a4f88a39 sort districts 2024-03-30 19:24:35 +01:00
Kadir
acc67192c9 add counter to 2_dump_details to understand how many new are crawled 2024-03-30 19:24:03 +01:00
Kadir
5720e68547 Rewriting 1_dump to enabling crawling of more real estate 2024-03-30 19:23:19 +01:00
Kadir
40285245d5 remove debug print statements 2024-03-30 18:35:32 +01:00
Kadir
ec86b572b3 adding districts and location identifier to search 2024-03-30 18:31:49 +01:00
Kadir
4c40462bb8 adding property type 2024-03-25 20:58:35 +00:00
Kadir
d777558b34 ruff format 2024-03-25 20:48:48 +00:00
Kadir
37e3e8ad6f using ruff 2024-03-25 20:48:27 +00:00
Kadir
d5a0964b12 exploration notebook 2024-03-25 20:47:50 +00:00
Kadir
47f7b2b672 Adding initial walking time and identifier to the information 2024-03-25 20:47:31 +00:00
Kadir
ce632c795d fixing gitignores 2024-03-25 20:47:15 +00:00
Kadir
e79e24ff98 fix: remove debug statement 2024-03-18 01:02:06 +00:00
Kadir
4dea766a12 fixing floorplan detection and adding recalculation method 2024-03-18 00:56:39 +00:00
Kadir
335adc0856 add routing, incremental crawling, travel time, lease and development 2024-03-13 16:24:57 +00:00
Kadir
d4f87bed76 add routing and util to the right places 2024-03-13 16:22:53 +00:00
Kadir
a9b8d4d630 adding musthave and lastXdays 2024-03-13 16:21:54 +00:00
Kadir
36258d877f crawling for 3 and refactoring to allow incremental crawls 2024-03-11 14:43:53 +00:00