Refactor backend for cleaner error handling, DRY, and type safety

- Extract rate limiter DRY: consolidate 3 duplicated check/respond paths into _check_counter and _enforce_limit helpers, add proper type annotations
- Replace bare Exception raises with FloorplanDownloadError and RightmoveApiError; narrow catch clauses to specific exception types; fix Step base class to inherit from ABC
- Consolidate MAX_OCR_WORKERS into config/scraper_config.py; extract _find_tenure_value helper to deduplicate tenure parsing
- Extract _build_poi_distances_lookup from stream endpoint to reduce nesting
- Fix csv_exporter: optional decisions.json, NaN instead of -1 sentinels, guard against division by zero on missing square meters
- Fix notifications.py broken list[Surface]() constructor, database.py stale comments and missing type annotation, auth.py type:ignore, ui_exporter.py stale TODO
- Fix 3 pre-existing test failures: mock cache layer in streaming tests, bypass rate limiter for test isolation, fix cache invalidation test to account for two-pattern scan loop
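
For reviewers, the csv_exporter item (NaN instead of -1 sentinels, guarded division) can be illustrated with a small vectorized sketch. This is not the code in this commit, which uses a row-wise `df.apply`; the helper name `add_price_per_sqm` is hypothetical, and it assumes the frame has a numeric `price` column. Unlike the commit, it also blanks zero or negative `square_meters` values in the column itself rather than only replacing -1.

```python
import numpy as np
import pandas as pd


def add_price_per_sqm(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a NaN-safe price_per_sqm column.

    Hypothetical helper for illustration; not part of this commit.
    """
    out = df.copy()

    # Make sure the optional columns exist, defaulting to NaN.
    for col in ("service_charge", "lease_left", "square_meters"):
        if col not in out.columns:
            out[col] = np.nan

    # Treat the legacy -1 sentinel and any non-positive area as missing.
    sqm = out["square_meters"].astype(float)
    sqm = sqm.where(sqm > 0)
    out["square_meters"] = sqm

    # Dividing by NaN yields NaN, so zero or missing areas never
    # produce inf or raise.
    out["price_per_sqm"] = (out["price"] / sqm).round(2)
    return out
```

Rows with a missing or non-positive area end up with NaN in `price_per_sqm`, which `to_csv` writes as an empty cell by default, and `sort_values` places NaN rows last by default.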
2026-02-10 22:19:24 +00:00
commit f833309297
parent 6897820cc7
20 changed files with 199 additions and 178 deletions
--- a/csv_exporter.py
+++ b/csv_exporter.py
@@ -14,27 +14,30 @@ async def export_to_csv(
    df = pd.DataFrame(ds)

    # read decisions on file
-    decisions_path = "data/decisions.json"
-    decisions = pd.read_json(decisions_path)
-    df.loc[:, "decision"] = df.id.apply(lambda x: decisions.get(x))
+    decisions_path = Path("data/decisions.json")
+    if decisions_path.exists():
+        decisions = pd.read_json(decisions_path)
+        df.loc[:, "decision"] = df.id.apply(lambda x: decisions.get(x))

    # remove _sa_instance_state column
    drop_columns = ["_sa_instance_state", "additional_info"]
    df = df.drop(columns=drop_columns)

-    # fill in gap values for service charge and lease left for Excel filters
-    if "service_charge" not in df.columns:
-        df.loc[:, "service_charge"] = -1
-    df.loc[:, "service_charge"] = df.service_charge.fillna(-1)
-    if "lease_left" not in df.columns:
-        df.loc[:, "lease_left"] = -1
-    df.loc[:, "lease_left"] = df.lease_left.fillna(-1)
-    if "square_meters" not in df.columns:
-        df.loc[:, "square_meters"] = -1
-    df.loc[:, "square_meters"] = df.square_meters.fillna(-1)
+    # Ensure columns exist with NaN defaults for clean CSV output
+    for col in ("service_charge", "lease_left", "square_meters"):
+        if col not in df.columns:
+            df.loc[:, col] = float("nan")

-    # Add price per sqm column
-    df.loc[:, "price_per_sqm"] = df.price / df.square_meters
+    # Replace -1 sentinel values with NaN
+    df.loc[:, "square_meters"] = df.square_meters.replace({-1: float("nan")})
+
+    # Add price per sqm column (guard against zero/missing square_meters)
+    df.loc[:, "price_per_sqm"] = df.apply(
+        lambda row: round(row.price / row.square_meters, 2)
+        if row.square_meters and row.square_meters > 0
+        else None,
+        axis=1,
+    )

    df = df.sort_values(by=["price_per_sqm"], ascending=True)
    df.to_csv(str(output_file), index=False)