Commit graph

10 commits

Author SHA1 Message Date
Viktor Barzin
4e2da87637 sinks: detect silent Wealthfolio /import drops
After the check step returns isValid=true + no errors, a row can still
be silently dropped by /import (response returns activities=[] on
200 OK). The root cause is usually a field that check hydrates but
/import re-normalises differently (date string form, asset_id resolution).

When we send N valid rows and get back 0, raise ImportValidationError
with a snippet of the check output and the first warning. This gives the
operator a concrete hint to fix the producer, instead of silently
growing the dedup store with activities that never landed.
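
A minimal sketch of the guard described above, not the actual implementation. The function name `ensure_rows_landed` and the per-row `warnings` key are assumptions made for illustration; only `ImportValidationError` and the `activities` response key come from this message.

```python
import json
from typing import Any


class ImportValidationError(Exception):
    """Raised when /import accepts the request but imports nothing."""


def ensure_rows_landed(
    sent_count: int,
    check_rows: list[dict[str, Any]],
    import_response: dict[str, Any],
) -> list[dict[str, Any]]:
    """Return the imported activities, or raise if valid rows were dropped."""
    imported = import_response.get("activities") or []
    if sent_count > 0 and not imported:
        # Short snippet of the check output so the operator sees the row shape.
        snippet = json.dumps(check_rows[:1])[:200]
        # "warnings" is a hypothetical key; the real check response may differ.
        first_warning = next(
            (w for row in check_rows for w in (row.get("warnings") or [])),
            "none",
        )
        raise ImportValidationError(
            f"sent {sent_count} valid rows, 0 imported; "
            f"check output: {snippet}; first warning: {first_warning}"
        )
    return imported
```

Raising before the dedup store is updated is the point: rows that never landed stay eligible for the next run.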

poetry run pytest -q   →  109 passed, 1 skipped
poetry run mypy        →  clean
poetry run ruff check  →  clean
2026-04-17 22:24:36 +00:00
Viktor Barzin
6efd03570a Add imap-ingest CLI + ImapProvider: route emails to IE/Schwab parsers
Wires the IE + Schwab email parsers into an actual runnable sync. Walks
the IMAP mailbox, routes each message by sender domain:
  - *@investengine.com → invest_engine.parse_invest_engine_email
  - *@schwab.com       → schwab.parse_schwab_email
then pushes the resulting Activities through the shared pipeline.
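
The routing rule above could be sketched roughly as follows. The two parser functions are stand-in stubs (their real signatures are not shown in this message), and `route_by_sender` and `ROUTES` are hypothetical names. The sketch matches the exact sender domain only.

```python
from email.utils import parseaddr
from typing import Callable, Optional


def parse_invest_engine_email(raw: str) -> list[str]:
    """Stub standing in for invest_engine.parse_invest_engine_email."""
    return ["invest_engine"]


def parse_schwab_email(raw: str) -> list[str]:
    """Stub standing in for schwab.parse_schwab_email."""
    return ["schwab"]


Parser = Callable[[str], list[str]]

# Sender domain -> parser. Hypothetical table; subdomains are not matched.
ROUTES: dict[str, Parser] = {
    "investengine.com": parse_invest_engine_email,
    "schwab.com": parse_schwab_email,
}


def route_by_sender(from_header: str) -> Optional[Parser]:
    """Pick a parser from the From: header, or None for unknown senders."""
    _, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    return ROUTES.get(domain)
```

Returning `None` for unknown senders lets the caller skip unrelated mail instead of failing the whole run.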

broker-sync imap-ingest — new CLI command taking IMAP_HOST/USER/PASSWORD/
DIRECTORY (mirrors the old wealthfolio-sync image's env shape so the
Terraform CronJob's existing env wiring works unchanged).

Verified: poetry run pytest -q → 109 passed + 1 skipped; mypy strict
clean (37 files); ruff + yapf clean.
2026-04-17 22:12:05 +00:00
Viktor Barzin
b363032e42 sinks: feed /import/check enrichment into /import body
/import/check hydrates each ActivityImport with resolved assetId,
exchangeMic, quoteCcy, instrumentType, quoteMode. The /import endpoint
on Wealthfolio 3.2 does NOT re-resolve — passing an un-enriched row
returns 200 OK but silently drops the activity (activities=[] in the
response).

The first live run returned `imported=63 failed=0` but nothing reached
the database. Fixed by posting the hydrated rows from the check response
to /import instead of the original.
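
A sketch of the check-then-import flow with enrichment, under stated assumptions: `post` is an injected callable standing in for the HTTP client, and the check endpoint is assumed to accept the same `{"activities": [...]}` body as /import and to return a list of hydrated rows.

```python
from typing import Any, Callable

Row = dict[str, Any]
# post(path, json_body) -> decoded JSON response. Hypothetical signature.
Post = Callable[[str, dict[str, Any]], Any]


def import_with_enrichment(post: Post, rows: list[Row]) -> Any:
    """Run /import/check, then send the hydrated rows (not the originals)."""
    hydrated = post("/api/v1/activities/import/check", {"activities": rows})
    # The check response carries assetId, exchangeMic, quoteCcy, etc.
    # /import does not re-resolve them, so the originals must not be re-sent.
    return post("/api/v1/activities/import", {"activities": hydrated})
```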

The tests must now also return list-shaped check responses (matching
the upstream Json<Vec<ActivityImport>> signature on the Rust side).

poetry run pytest -q     70 passed
poetry run mypy          clean
poetry run ruff check    clean
2026-04-17 20:54:17 +00:00
Viktor Barzin
80ca009373 Match Wealthfolio accounts by providerAccountId, remap accountId on import
Context: Wealthfolio 3.2 generates its own UUIDs on POST /accounts, ignoring any
`id` we supply. Our logical Account.id lives on as `providerAccountId`, which
WF preserves verbatim.

Live run created six duplicate accounts because ensure_account looked up by
our `id`, never found it, and POSTed a new account on every attempt. Deleted
the duplicates manually via DELETE /accounts/{id}.

This change:
- ensure_account now returns Wealthfolio's UUID; matches existing via
  (provider, providerAccountId)
- pipeline remaps activity.account_id to the WF UUID at submission time
  but keeps dedup keyed on our stable id (WF resets must not blow away
  the whole dedup history)
- test updates to the new account-shape + dedup key expectations
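
The lookup and remapping above might look roughly like this. It is a sketch only: the dict-based activity shape, the `external_id` field, and all three function names are assumptions, not the project's actual types.

```python
from typing import Any, Optional


def find_wf_account_id(
    wf_accounts: list[dict[str, Any]],
    provider: str,
    provider_account_id: str,
) -> Optional[str]:
    """Match an existing Wealthfolio account by (provider, providerAccountId)."""
    for account in wf_accounts:
        if (account.get("provider") == provider
                and account.get("providerAccountId") == provider_account_id):
            return account["id"]  # Wealthfolio's own UUID
    return None


def dedup_key(activity: dict[str, Any]) -> tuple[str, str]:
    """Dedup stays keyed on our stable account id, never the WF UUID."""
    return (activity["account_id"], activity["external_id"])


def remap_for_submission(activity: dict[str, Any], wf_account_id: str) -> dict[str, Any]:
    """Return a copy pointing at the WF UUID; the original is left untouched."""
    return {**activity, "account_id": wf_account_id}
```

Because the remap returns a copy, the dedup key computed from the original activity survives a Wealthfolio reset that hands out new UUIDs.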

poetry run pytest -q    70 passed
poetry run mypy         clean
poetry run ruff check   clean
2026-04-17 20:44:32 +00:00
Viktor Barzin
ba672a1633 sinks: add required isDraft/isValid fields on ActivityImport
2026-04-17 20:37:38 +00:00
Viktor Barzin
1d23bf6ed7 sinks: switch Wealthfolio import to JSON body (not multipart CSV)
/api/v1/activities/import expects Content-Type: application/json with body
{"activities": [ActivityImport, ...]} where ActivityImport is camelCase:
{date, symbol, activityType, quantity?, unitPrice?, currency, fee?, amount?,
comment?, accountId?, ...}. Source: crates/core/src/activities/activities_model.rs

Live run failure was HTTP 415 Unsupported Media Type because we were uploading
a CSV in multipart/form-data; that endpoint is JSON-only.

Also handle two response shapes on import — older builds return a list, current
build returns {activities: [...]}.
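
Handling both response shapes can be sketched as below; `extract_activities` is a hypothetical helper name, and raising `TypeError` on anything else is an illustrative choice.

```python
from typing import Any


def extract_activities(payload: Any) -> list[dict[str, Any]]:
    """Normalise the /import response to a plain list of activities."""
    if isinstance(payload, list):
        # Older builds: the response body is the list itself.
        return payload
    if isinstance(payload, dict) and isinstance(payload.get("activities"), list):
        # Current build: {"activities": [...]}.
        return payload["activities"]
    raise TypeError(f"unexpected import response shape: {type(payload).__name__}")
```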

Test plan:
  poetry run pytest -q   →  70 passed
  poetry run mypy        →  clean
  poetry run ruff check  →  clean
2026-04-17 20:34:12 +00:00
Viktor Barzin
ea881e272b sinks: match Wealthfolio NewAccount camelCase schema + required booleans
Wealthfolio 3.2's POST /api/v1/accounts was 422ing on live traffic — its
NewAccount struct uses camelCase field names and requires isDefault +
isActive as booleans. Reference:
https://github.com/afadil/wealthfolio/blob/main/apps/server/src/models.rs#L~145

Sends trackingMode=TRANSACTIONS so Wealthfolio computes holdings from
our imported activities (vs HOLDINGS mode which requires periodic
holdings snapshots). Populates providerAccountId so the broker account
is traceable back to our sync's id scheme.
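
For illustration, a payload builder in the shape described above. Only `isDefault`, `isActive`, `trackingMode` and `providerAccountId` are taken from this message; the `name`, `currency` and `provider` keys, the chosen boolean values, and the function name are assumptions and should be checked against the upstream NewAccount struct.

```python
from typing import Any


def new_account_payload(
    name: str,
    currency: str,
    provider: str,
    provider_account_id: str,
) -> dict[str, Any]:
    """Build a camelCase body for POST /api/v1/accounts (sketch only)."""
    return {
        "name": name,                              # assumed field
        "currency": currency,                      # assumed field
        "provider": provider,                      # assumed field
        "providerAccountId": provider_account_id,  # our sync's own id
        "trackingMode": "TRANSACTIONS",            # holdings derived from activities
        "isDefault": False,                        # required boolean (value assumed)
        "isActive": True,                          # required boolean (value assumed)
    }
```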

Test plan:
  poetry run pytest -q   →  70 passed
  poetry run mypy        →  clean
  poetry run ruff check  →  clean

Live re-run of the backfill Job follows this commit's image rebuild.
2026-04-17 20:29:43 +00:00
Viktor Barzin
66cf0e0399 Fix live Wealthfolio login + Dockerfile poetry path
Context
-------
Two live-integration bugs surfaced during the Phase 0.5 auth-spike
run against the restored production Wealthfolio.

1. Wealthfolio 3.2's LoginRequest schema is `{ password: String }` —
   it rejects any request with an unknown `username` field as HTTP
   400 (empty body, hard to debug). Upstream source:
   https://github.com/afadil/wealthfolio/blob/main/apps/server/src/auth.rs#L86-L88

2. Dockerfile referenced `/opt/poetry/bin/poetry` but pip install
   puts poetry on the normal PATH; POETRY_HOME only affects the
   self-installer, not `pip install`. Exit 127 in GHA build.

This change
-----------
- WealthfolioSink.login() sends `{password}` only; kept `username`
  constructor arg as a stub for the day Wealthfolio adds multi-user.
- Dockerfile drops POETRY_HOME and uses `poetry` on PATH.
- Test: `_login_ok` now asserts body == {"password": "hunter2"}
  ("hunter2" is the XKCD placeholder — not a real credential).

Test plan
---------
## Automated
- poetry run pytest -q  →  70 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 29 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification (executed live)
```
kubectl -n wealthfolio port-forward svc/wealthfolio 18080:80 &
WF_BASE_URL=http://localhost:18080 WF_USERNAME=admin \
WF_PASSWORD=<from-vault> \
poetry run broker-sync auth-spike
→ "Logged in. 1 account(s) visible."
```
2026-04-17 20:17:24 +00:00
Viktor Barzin
e7da408a85 Add WealthfolioSink with CSV import + cookie reuse
Context
-------
This is the Phase 0.5 deliverable — the hardest-to-validate unknown
in the plan. Wealthfolio auth is JWT HttpOnly cookie with a 5-req/min
login rate limit. CronJob pods are ephemeral, so we persist cookies
to disk between runs (shared PVC in production).

Plan stress-test also flagged: use the CSV import path, not per-row
JSON POST. Wealthfolio's UI uses /activities/import and its dedup
logic is battle-tested; CSVs double as audit artefacts we can replay.

This change
-----------
- WealthfolioSink (httpx async): login with username/password, persists
  cookie dict to session_path on disk, attaches it as a Cookie header
  on subsequent calls.
- 401 on a non-login endpoint triggers a single re-login + retry.
- ensure_account() is idempotent — GETs the account list first, only
  POSTs /accounts if id is missing.
- import_activities() always runs /activities/import/check first; any
  non-2xx there raises ImportValidationError and we never touch the
  real import endpoint. Protects against half-written state when the
  broker emits a symbol Wealthfolio doesn't know.
- httpx.MockTransport-based tests cover: login persistence, 401 on
  login raises UnauthorizedError, session reuse from disk, 401 retry
  path, ensure_account idempotency + creation, import dry-run-then-real
  sequencing, halt on check failure.
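
A stdlib-only sketch of the cookie persistence and single re-login described above. It deliberately leaves out httpx: `send` and `login` are injected callables with hypothetical signatures, and `SessionStore` and `request_with_relogin` are illustrative names.

```python
import json
from pathlib import Path
from typing import Callable


class UnauthorizedError(Exception):
    """Raised when the server still answers 401 after a fresh login."""


class SessionStore:
    """Persists the cookie dict to disk so ephemeral pods can reuse it."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def load(self) -> dict[str, str]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def save(self, cookies: dict[str, str]) -> None:
        self.path.write_text(json.dumps(cookies))


def request_with_relogin(
    send: Callable[[dict[str, str]], tuple[int, str]],
    login: Callable[[], dict[str, str]],
    store: SessionStore,
) -> str:
    """Send once with stored cookies; on 401, log in again and retry once."""
    cookies = store.load()
    if not cookies:
        cookies = login()
        store.save(cookies)
    status, body = send(cookies)
    if status == 401:
        cookies = login()  # exactly one re-login, to respect the rate limit
        store.save(cookies)
        status, body = send(cookies)
        if status == 401:
            raise UnauthorizedError("still unauthorised after one re-login")
    return body
```

Capping the retry at one re-login keeps a bad password from burning through the 5-req/min login limit.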

Not yet covered (deferred):
- Multi-process file lock on session_path (single-process is enough for
  now; Phase 1 adds it when multiple CronJobs run concurrently).
- 429 jittered backoff (TBD when Wealthfolio actually rate-limits us).

Test plan
---------
## Automated
- poetry run pytest -q  →  31 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 17 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Live auth spike against https://wealthfolio.viktorbarzin.me deferred
until the password is seeded into Vault at secret/broker-sync/wealthfolio
in a follow-up commit (needs Viktor's Vault session).
2026-04-17 19:22:34 +00:00
Viktor Barzin
f306dc9605 Add Provider protocol and normaliser
Context
-------
Every broker connector needs a uniform shape so the orchestrator can
fan out without knowing provider-specific details. Normalisation (GBP
conversion) lives outside providers on purpose — keeping providers
native-currency-emitters means we can re-normalise historical activity
when HMRC rates land without re-fetching from the broker.

This change
-----------
- providers/base.py: Provider Protocol with `accounts()` and async
  `fetch(since, before)` iterator. No abstract base class — duck-typed
  Protocol so each concrete provider stays independent.
- normaliser.py: takes a native Activity + FxCache, returns a copy
  with amount_gbp/fx_rate_gbp/fx_rate_source filled in. Two modes:
  qty*price for BUY/SELL, amount for DIVIDEND/DEPOSIT/etc.
- Namespace packages for providers/, providers/parsers/, sinks/ so
  future modules slot in cleanly.
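
A rough sketch of the Protocol and the two normalisation modes above. The `Activity` field names other than amount_gbp/fx_rate_gbp/fx_rate_source, the `FxCache.rate_to_gbp` method, and the return type of `accounts()` are assumptions made so the example is self-contained.

```python
from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal
from typing import AsyncIterator, Optional, Protocol


@dataclass(frozen=True)
class Activity:
    kind: str                    # e.g. "BUY", "SELL", "DIVIDEND", "DEPOSIT"
    currency: str
    on: date
    quantity: Optional[Decimal] = None
    unit_price: Optional[Decimal] = None
    amount: Optional[Decimal] = None
    amount_gbp: Optional[Decimal] = None
    fx_rate_gbp: Optional[Decimal] = None
    fx_rate_source: Optional[str] = None


class Provider(Protocol):
    """Duck-typed provider shape; concrete providers need no base class."""

    def accounts(self) -> list[str]: ...
    def fetch(self, since: date, before: date) -> AsyncIterator[Activity]: ...


class FxCache(Protocol):
    # Hypothetical lookup: returns (rate to GBP, source label).
    def rate_to_gbp(self, currency: str, on: date) -> tuple[Decimal, str]: ...


def normalise(activity: Activity, fx: FxCache) -> Activity:
    """Return a copy with the GBP fields filled in; the input is unchanged."""
    if activity.kind in {"BUY", "SELL"}:
        if activity.quantity is None or activity.unit_price is None:
            raise ValueError("BUY/SELL needs quantity and unit_price")
        native = activity.quantity * activity.unit_price
    else:
        if activity.amount is None:
            raise ValueError(f"{activity.kind} needs amount")
        native = activity.amount
    rate, source = fx.rate_to_gbp(activity.currency, activity.on)
    return replace(activity, amount_gbp=native * rate,
                   fx_rate_gbp=rate, fx_rate_source=source)
```

Since `normalise` returns a copy, the native-currency activity can be re-normalised later with a different FxCache without re-fetching from the broker.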

Test plan
---------
## Automated
- poetry run pytest -q  →  23 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 14 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable at this layer.
2026-04-17 19:20:12 +00:00