broker-sync/broker_sync/sinks/wealthfolio.py
Viktor Barzin c830856ba1 imap: route IE BUYs to ISA first-£20k / GIA overflow per UK tax year
## Context

Viktor's InvestEngine account has both an ISA and a GIA wrapper. Trade
confirmation emails (info@investengine.com) are identical between them —
subject "Here's how your portfolio looks now", body shows "Client name:
Viktor Barzin" with no portfolio/account type. That left the IMAP parser
hardcoded to route every IE BUY to the ISA (invest-engine-primary),
which produced a 2339-share over-count when 2023-24 GIA buys landed in
the ISA during the 2026-04-18 reconciliation.

Viktor's rule: from 6 April each tax year, BUYs fill ISA up to the
£20,000 cap, then overflow to GIA. This commit codifies that rule in a
standalone batch splitter and applies it at the ImapProvider boundary.
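A minimal sketch of that rule, with hypothetical names (`uk_tax_year_start`, `route_buys`) — the real splitter works on full Activity objects rather than (date, cost) pairs:

```python
from datetime import date
from decimal import Decimal

ISA_CAP = Decimal("20000")

def uk_tax_year_start(d: date) -> date:
    # UK tax years run 6 April -> 5 April; dates before 6 April belong
    # to the tax year that started the previous calendar year.
    year = d.year if (d.month, d.day) >= (4, 6) else d.year - 1
    return date(year, 4, 6)

def route_buys(buys: list[tuple[date, Decimal]]) -> list[str]:
    """Return 'isa' or 'gia' per BUY. Whole activities only, no splits."""
    totals: dict[date, Decimal] = {}
    out: list[str] = []
    for d, cost in sorted(buys):  # batch-level sort before routing
        ty = uk_tax_year_start(d)
        running = totals.get(ty, Decimal("0"))
        # A BUY stays on the ISA iff the running total BEFORE it is < £20k;
        # a boundary BUY may therefore carry the total past the cap.
        out.append("isa" if running < ISA_CAP else "gia")
        totals[ty] = running + cost
    return out
```

Note the consequence of the "pre-running-total" check: a £6k BUY arriving when the tax-year total is £15k stays whole on the ISA (total £21k); only the next BUY flips to GIA.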

Also picks up a silent-drop bug surfaced during the same reconciliation:
WF's /import (unlike /import/check) rejects naive datetimes with
"Invalid date". The sink now coerces tzinfo=UTC defensively so every
provider gets the same guarantee.

## This change

- `_split_ie_by_isa_cap(activities)` — sorts all IE-ISA BUYs by date and
  walks them once per UK tax year (6 April boundary). A BUY whose running
  tax-year total BEFORE it is strictly below £20k stays on the ISA;
  otherwise it flips to a new `invest-engine-gia` account_id. No
  fractional splits — boundary activities go whole to whichever bucket
  their pre-running-total dictates. Non-IE and non-BUY activities pass
  through unchanged.
- `ImapProvider.accounts()` gains an `invest-engine-gia` Account so the
  pipeline's `_ensure_accounts` can resolve both.
- `ImapProvider.fetch()` calls the splitter on the full batch before
  applying the `since`/`before` date filter — batch-level sort
  guarantees consistent routing regardless of the order IMAP returns
  messages.
- `WealthfolioSink._activity_to_import_row` coerces naive datetimes to
  UTC so the row passes WF /import validation.

## What is NOT in this change

- No retroactive re-routing of data already in WF. Historical
  finance-mysql rows (all lumped to `invest-engine-primary` or
  `invest-engine-gia` by the existing heuristic) keep their current
account assignment. If a past tax year was routed "wrong" under the
  new rule, that's corrected manually via the WF API, not here.
- No change to the Schwab or trading212 paths.

## Verification

### Automated

```
$ poetry run pytest tests/providers/test_imap.py -v
tests/providers/test_imap.py::test_uk_tax_year_start_before_april_6_rolls_back PASSED
tests/providers/test_imap.py::test_single_tax_year_under_cap_stays_isa PASSED
tests/providers/test_imap.py::test_overflow_past_cap_flips_to_gia PASSED
tests/providers/test_imap.py::test_tax_year_boundary_resets_cap PASSED
tests/providers/test_imap.py::test_out_of_order_activities_sorted_before_cap_applied PASSED
tests/providers/test_imap.py::test_non_ie_activities_passed_through_unchanged PASSED
6 passed in 0.36s

$ poetry run pytest -q --ignore=tests/test_cli.py
116 passed, 1 skipped in 2.76s

$ poetry run ruff check broker_sync/providers/imap.py broker_sync/sinks/wealthfolio.py
All checks passed!

$ poetry run mypy broker_sync/providers/imap.py broker_sync/sinks/wealthfolio.py
Success: no issues found in 2 source files
```

### Manual verification

The tzinfo fix was validated against the live WF instance during the
2026-04-18 reconciliation — before the fix, /import returned
`"errors": {"symbol": ["Invalid date '2022-05-24T00:00:00'."]}` for
every IMAP activity; after, the same payload imported cleanly.

The splitter was not exercised against live IMAP data because Viktor's
mailbox only has Apr 2022 → Feb 2024 emails, all inside finance.position's
existing coverage. Running IMAP ingest with `since=2024-04-06` yields
fetched=0. The unit tests cover the boundary arithmetic; a live run
will happen when newer emails are parsed (or when finance coverage is
re-scoped).

## Reproduce locally

1. `poetry install`
2. `poetry run pytest tests/providers/test_imap.py`
3. Expected: 6 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:02:49 +00:00

264 lines
10 KiB
Python

from __future__ import annotations

import json
from collections.abc import Iterable
from datetime import UTC
from pathlib import Path
from typing import Any

import httpx

from broker_sync.models import Account, Activity

_LOGIN_PATH = "/api/v1/auth/login"
_ACCOUNTS_PATH = "/api/v1/accounts"
_IMPORT_CHECK = "/api/v1/activities/import/check"
_IMPORT_REAL = "/api/v1/activities/import"


class WealthfolioError(Exception):
    pass


class WealthfolioUnauthorizedError(WealthfolioError):
    """Raised when login itself fails (bad creds or Wealthfolio down).

    Distinct from a 401 on a random endpoint — those trigger an
    automatic re-login attempt.
    """


class ImportValidationError(WealthfolioError):
    """`/activities/import/check` returned a non-2xx. We never reach the real import."""
class WealthfolioSink:
    """Push canonical Activities to Wealthfolio via its CSV import endpoint.

    Auth is JWT-cookie via POST /api/v1/auth/login. Cookies are persisted
    to disk so CronJob pods can reuse them across runs (Wealthfolio's
    /auth/login is 5-req/min rate-limited).

    Not multi-process safe — file locking is added in Phase 1 when we
    fan out to multiple CronJobs.
    """

    def __init__(
        self,
        *,
        base_url: str,
        username: str,
        password: str,
        session_path: Path | str,
        transport: httpx.AsyncBaseTransport | None = None,
    ) -> None:
        self._username = username
        self._password = password
        self._session_path = Path(session_path)
        self._client = httpx.AsyncClient(
            base_url=base_url.rstrip("/"),
            timeout=30.0,
            transport=transport,
        )

    async def close(self) -> None:
        await self._client.aclose()
    # -- session --

    def _load_cookies(self) -> dict[str, str]:
        if not self._session_path.exists():
            return {}
        raw = json.loads(self._session_path.read_text())
        got = raw.get("cookies", {})
        assert isinstance(got, dict)
        return got

    def _save_cookies(self, cookies: dict[str, str]) -> None:
        self._session_path.parent.mkdir(parents=True, exist_ok=True)
        self._session_path.write_text(json.dumps({"cookies": cookies}))

    async def login(self) -> None:
        # Wealthfolio 3.2's LoginRequest is `{ password: String }` only — a
        # username key is rejected as an unknown field (HTTP 400). The
        # `username` constructor arg is kept for a future Wealthfolio
        # release that may add multi-user support.
        resp = await self._client.post(
            _LOGIN_PATH,
            json={"password": self._password},
        )
        if resp.status_code == 401:
            raise WealthfolioUnauthorizedError("Wealthfolio /auth/login returned 401")
        resp.raise_for_status()
        cookies = dict(resp.cookies.items())
        if not cookies:
            raise WealthfolioError("/auth/login returned 2xx but no Set-Cookie")
        self._save_cookies(cookies)

    @staticmethod
    def _cookie_header(cookies: dict[str, str]) -> str:
        return "; ".join(f"{k}={v}" for k, v in cookies.items())

    async def _request(self, method: str, path: str, **kw: Any) -> httpx.Response:
        cookies = self._load_cookies()
        headers = dict(kw.pop("headers", {}) or {})
        if cookies:
            headers["Cookie"] = self._cookie_header(cookies)
        resp = await self._client.request(method, path, headers=headers, **kw)
        if resp.status_code == 401 and path != _LOGIN_PATH:
            await self.login()
            cookies = self._load_cookies()
            headers["Cookie"] = self._cookie_header(cookies)
            resp = await self._client.request(method, path, headers=headers, **kw)
        return resp
    # -- accounts --

    async def list_accounts(self) -> list[dict[str, Any]]:
        resp = await self._request("GET", _ACCOUNTS_PATH)
        resp.raise_for_status()
        raw = resp.json()
        assert isinstance(raw, list)
        return raw

    async def ensure_account(self, account: Account) -> str:
        """Idempotently create the account and return Wealthfolio's UUID for it.

        Wealthfolio generates its own UUIDs on POST /accounts, ignoring any
        `id` we supply. We identify accounts by (provider, providerAccountId)
        which Wealthfolio DOES preserve verbatim. Our own Account.id is
        used as the providerAccountId.
        """
        existing = await self.list_accounts()
        for a in existing:
            if a.get("provider") == account.provider and a.get("providerAccountId") == account.id:
                wf_id = a.get("id")
                assert isinstance(wf_id, str)
                return wf_id
        # NewAccount is camelCase with required booleans.
        # See apps/server/src/models.rs#NewAccount.
        resp = await self._request(
            "POST",
            _ACCOUNTS_PATH,
            json={
                "name": account.name,
                "accountType": str(account.account_type),
                "currency": account.currency,
                "isDefault": False,
                "isActive": True,
                "isArchived": False,
                "trackingMode": "TRANSACTIONS",
                "provider": account.provider,
                "providerAccountId": account.id,
            },
        )
        resp.raise_for_status()
        created = resp.json()
        wf_id = created.get("id")
        if not isinstance(wf_id, str):
            raise WealthfolioError(f"POST /accounts returned no id: {created}")
        return wf_id
    # -- activity import --

    @staticmethod
    def _activity_to_import_row(a: Activity) -> dict[str, Any]:
        """Match Wealthfolio's ActivityImport struct (camelCase JSON)."""
        # WF /import rejects naive datetimes with "Invalid date" (even though
        # /import/check accepts them) — coerce to UTC if tzinfo is missing.
        date = a.date if a.date.tzinfo is not None else a.date.replace(tzinfo=UTC)
        row: dict[str, Any] = {
            "date": date.isoformat(),
            "symbol": a.symbol or "$CASH",
            "activityType": str(a.activity_type),
            "currency": a.currency,
            "accountId": a.account_id,
            # Required booleans on ActivityImport (no defaults in Rust struct).
            "isDraft": False,
            "isValid": True,
        }
        if a.quantity is not None:
            row["quantity"] = format(a.quantity, "f")
        if a.unit_price is not None:
            row["unitPrice"] = format(a.unit_price, "f")
        if a.amount is not None:
            row["amount"] = format(a.amount, "f")
        if a.fee:
            row["fee"] = format(a.fee, "f")
        if a.notes:
            row["comment"] = a.notes
        return row
    async def import_activities(self, activities: Iterable[Activity]) -> list[dict[str, Any]]:
        rows = [self._activity_to_import_row(a) for a in activities]
        if not rows:
            return []
        # Step 1 — /import/check hydrates each row with resolved asset_id,
        # exchange_mic, quote_ccy, instrument_type, quote_mode (and flags
        # errors). The /import endpoint on Wealthfolio 3.2+ DOES NOT
        # re-resolve — if we send the un-enriched row the activity is
        # silently dropped (import returns 200 OK with activities=[] in
        # the payload). We must feed check's output into import.
        check = await self._request("POST", _IMPORT_CHECK, json={"activities": rows})
        if check.status_code >= 400:
            try:
                payload = check.json()
            except Exception:
                payload = {"raw": check.text}
            raise ImportValidationError(f"Wealthfolio /import/check rejected: {payload}")
        checked = check.json()
        if not isinstance(checked, list):
            raise ImportValidationError(
                f"Wealthfolio /import/check returned non-list: {type(checked).__name__}"
            )
        invalid = [r for r in checked if isinstance(r, dict) and r.get("errors")]
        if invalid:
            raise ImportValidationError(
                f"Wealthfolio /import/check flagged {len(invalid)} row(s); "
                f"first: {invalid[0]}"
            )
        # Drop any row the server marked isValid=false (shouldn't happen
        # without errors, but defensive).
        valid_rows = [r for r in checked if isinstance(r, dict) and r.get("isValid")]
        real = await self._request("POST", _IMPORT_REAL, json={"activities": valid_rows})
        real.raise_for_status()
        raw = real.json()
        # Two observed response shapes:
        # - {activities:[...], importRunId:"...", summary:{total,imported,skipped,...}}
        # - bare list (older builds)
        if isinstance(raw, dict) and "activities" in raw:
            got = raw["activities"]
            summary = raw.get("summary") if isinstance(raw.get("summary"), dict) else None
        elif isinstance(raw, list):
            got = raw
            summary = None
        else:
            got = []
            summary = None
        # summary.imported is THE truth. The `activities` field echoes input
        # with errors annotated — its length equals input even when zero
        # rows actually persisted.
        if summary is not None:
            imported_n = int(summary.get("imported", 0))
            total_n = int(summary.get("total", len(valid_rows)))
            if imported_n < total_n:
                err_msg = summary.get("errorMessage") or "no errorMessage"
                skipped = int(summary.get("skipped", 0))
                dupes = int(summary.get("duplicates", 0))
                raise ImportValidationError(
                    f"Wealthfolio /import persisted {imported_n}/{total_n} "
                    f"(skipped={skipped} duplicates={dupes}). "
                    f"errorMessage: {err_msg}"
                )
        # Legacy silent-drop guard for no-summary responses.
        elif valid_rows and not got:
            first_warn = next(
                (r.get("warnings") for r in checked if isinstance(r, dict) and r.get("warnings")),
                None,
            )
            raise ImportValidationError(
                f"Wealthfolio /import silently dropped all {len(valid_rows)} rows. "
                f"First checked row: {checked[0] if checked else 'none'}. "
                f"First warning: {first_warn}"
            )
        assert isinstance(got, list)
        return [r for r in got if isinstance(r, dict)]