Commit graph

33 commits

Author SHA1 Message Date
Viktor Barzin
1d1e20b72b schwab: detect vest-confirmation emails + emit VestEvent
Extends parse_schwab_email to handle Schwab's RSU Release Confirmation
emails alongside the existing trade confirmations. Adds:

- `VestEvent` dataclass in models.py — carries vest_date, ticker,
  shares_vested, shares_sold_to_cover, fmv_at_vest_usd, tax_withheld_usd.
  Written to payslip_ingest.rsu_vest_events by a postgres sink (pending
  a real email fixture + cross-service DB grant).
- `parse_schwab_email_full()` — new entry point returning both
  `list[Activity]` and `VestEvent | None`. The legacy
  `parse_schwab_email()` shape is preserved for existing callers.
- Vest-release dispatch heuristic: HTML body mentions "Release
  Confirmation" / "Award Vesting" / "RSU Release". On match, extract
  vest fields via label regexes; the full vest becomes a BUY Activity
  and the sell-to-cover slice becomes a SELL Activity at the same FMV
  (net zero cash on the day). Gross vest + sell-to-cover returned so
  Wealthfolio gets the full portfolio picture.
- Tests: 3 new (vest roundtrip, unparseable-vest safety, legacy shape
  preserved); existing 6 unchanged.
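
A minimal sketch of the dispatch heuristic and the BUY/SELL pairing described above. The three marker phrases come from this message; the `Leg` type, the function names, the case-insensitive match, and skipping the SELL leg when nothing was sold to cover are assumptions made for illustration, not the code in this commit.

```python
import re
from dataclasses import dataclass
from decimal import Decimal

# Marker phrases are from the commit message; matching them
# case-insensitively is an assumption.
_VEST_MARKERS = re.compile(
    r"Release Confirmation|Award Vesting|RSU Release", re.IGNORECASE
)


@dataclass(frozen=True)
class Leg:
    """Hypothetical stand-in for the project's Activity model."""
    kind: str  # "BUY" or "SELL"
    quantity: Decimal
    unit_price: Decimal


def is_vest_email(html: str) -> bool:
    """True when the HTML body mentions one of the vest-release markers."""
    return _VEST_MARKERS.search(html) is not None


def vest_legs(shares_vested: Decimal, shares_sold_to_cover: Decimal,
              fmv: Decimal) -> list[Leg]:
    """Gross vest as a BUY, sell-to-cover slice as a SELL at the same FMV."""
    legs = [Leg("BUY", shares_vested, fmv)]
    if shares_sold_to_cover > 0:  # assumption: no SELL leg for a zero slice
        legs.append(Leg("SELL", shares_sold_to_cover, fmv))
    return legs
```

Pricing both legs at the same FMV is what keeps the cash effect of the sell-to-cover slice neutral on the vest day.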

The regex heuristics will need tightening once a real email sample
exists — the HTML structure observed in public Schwab emails may
differ in material ways. For now, unmatched vest bodies return an
empty result (no Activity, no VestEvent) rather than crashing the
IMAP batch.

Part of: code-860
2026-04-19 18:27:58 +00:00
Viktor Barzin
6f3bcea23e ci: fix ruff E501 + mypy None-comparison warning
test_imap.py:49 — one-line comment ran past the 100-char line limit
introduced in commit c830856. Split the "£20,000 cap" note onto its
own line above the call.

test_fidelity_planviewer.py:108 — mypy flagged `offset.amount > 0`
where amount is typed Decimal | None. Added an explicit `is not None`
guard; runtime behaviour unchanged (we already check offset is not
None two lines earlier).

$ poetry run ruff check . → All checks passed!
$ poetry run mypy broker_sync tests → Success: no issues found in 43 source files
$ poetry run pytest -q → 133 passed, 1 skipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 22:52:38 +00:00
Viktor Barzin
6450201af0 pipeline: emit matching DEPOSIT/WITHDRAWAL for every BUY/SELL
## Context

The 2026-04-18 reconciliation ended with Wealthfolio's historical Net
Worth chart showing cliff-jumps on 5 dates — the single-day lump cash
offsets we'd posted to "zero out" phantom cash. An operational fix
replaced those 6 lumps with 231 per-BUY/SELL matched DEPOSIT/WITHDRAWAL
rows (see code-r9n note). That made the chart smooth — but only for
today's data. Any future broker-sync run would re-introduce phantom
cash because providers emit BUY/SELL only; nothing on the cash side.

This commit bakes the match into the pipeline so **future syncs
self-balance cash at import time** and the chart stays smooth.

## This change

- broker_sync/pipeline.py
  - New _matched_cash_flow(a): returns a DEPOSIT for a BUY (amount =
    qty * unit_price + fee) or a WITHDRAWAL for a SELL (amount =
    qty * unit_price - fee). Returns None for every other activity
    type — DEPOSIT/WITHDRAWAL/DIVIDEND/etc. already touch cash
    directly. The synthetic activity carries a deterministic
    external_id `cash-flow-match:<buy|sell>:<original external_id>`
    so SyncRecordStore dedup handles idempotency across runs.
  - New _with_cash_flow_match(a): expand helper — returns [a] or
    [a, match]. Pure, testable.
  - sync_provider_to_wealthfolio loops over the expansion, so each
    activity may now contribute up to two rows to the batch. `fetched`
    still counts provider-side activities only; `new_after_dedup` +
    `imported` + `failed` count expanded rows.
- tests/test_pipeline.py
  - Updated two existing pipeline integration tests to reflect the
    now-larger batch shape (3 BUYs become 6 rows after expansion).
  - 5 new unit tests for the helpers: BUY → DEPOSIT with fee,
    SELL → WITHDRAWAL net of fee, DEPOSIT/WITHDRAWAL/DIVIDEND pass
    through, zero-amount trades skipped, _with_cash_flow_match
    returns the right cardinality.
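
A minimal sketch of the matching rule described above. The amount formulas and the external_id format are taken from this message; the `Activity` dataclass is a hypothetical stand-in for the real model, and treating "zero-amount" as `amount == 0` is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Activity:
    """Hypothetical stand-in for the canonical Activity model."""
    external_id: str
    activity_type: str
    quantity: Decimal
    unit_price: Decimal
    fee: Decimal = Decimal("0")
    amount: Optional[Decimal] = None


def matched_cash_flow(a: Activity) -> Optional[Activity]:
    """DEPOSIT for a BUY, WITHDRAWAL for a SELL, None for anything else."""
    gross = a.quantity * a.unit_price
    if a.activity_type == "BUY":
        kind, amount = "DEPOSIT", gross + a.fee
    elif a.activity_type == "SELL":
        kind, amount = "WITHDRAWAL", gross - a.fee
    else:
        return None  # other types already touch cash directly
    if amount == 0:
        return None  # assumption: zero-amount trades get no match
    return Activity(
        external_id=f"cash-flow-match:{a.activity_type.lower()}:{a.external_id}",
        activity_type=kind,
        quantity=Decimal("0"),
        unit_price=Decimal("0"),
        amount=amount,
    )


def with_cash_flow_match(a: Activity) -> list[Activity]:
    """Expand one activity into [a] or [a, match]."""
    match = matched_cash_flow(a)
    return [a] if match is None else [a, match]
```

Because the synthetic external_id is derived only from the original one, re-running the expansion on the same activity produces the same key, which is what lets the dedup store treat it as already seen.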

## What is NOT in this change

- Provider-level opt-out (e.g., Provider.emits_matching_cash_flow =
  True). No current provider emits real cash flows alongside trades
  (Trading212 only calls /orders, not /transactions), so the default
  "always match" is safe. If we ever wire a provider that pulls real
  bank-transfer dates, add the opt-out then.
- Retroactive cleanup of already-imported WF accounts — already done
  operationally today.

## Verification

### Automated

$ poetry run pytest tests/test_pipeline.py -v
7 passed in 0.40s

$ poetry run pytest -q
133 passed, 1 skipped in 8.58s

$ poetry run mypy broker_sync/pipeline.py tests/test_pipeline.py
Success: no issues found in 2 source files

$ poetry run ruff check broker_sync/pipeline.py tests/test_pipeline.py
All checks passed!

### Manual — next sync

Once this image ships and broker-sync-trading212 / broker-sync-imap /
broker-sync-fidelity run, confirm:
1. kubectl -n broker-sync logs job/<next-run> → fetched=N new=2N
   imported=2N failed=0 (doubled due to matches).
2. WF /api/v1/holdings?accountId=<uuid> → cash ≈ £0 for every currency
   after import.
3. Net Worth chart has no new cliff-jumps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 19:12:49 +00:00
Viktor Barzin
804e6a89de fidelity-planviewer: wire provider to real PlanViewer session + JSON API
## Context

Prior commit 832732a scaffolded the provider with a stub fetch() that
raised FidelityProviderConfigError. This commit replaces the stub with
the end-to-end ingest flow, validated against the real PlanViewer site
during a live login session on 2026-04-18.

Fidelity UK PlanViewer mixes a legacy Struts2 HTML app
(www.planviewer.fidelity.co.uk) with a React SPA at
pv.planviewer.fidelity.co.uk. Authentication is PingFederate OAuth2 at
id.fidelity.co.uk — password + memorable word + SMS OTP, with a
remember-device cookie that keeps the session alive for weeks. The
transaction history is server-rendered HTML at DisplayMyPlanMemberTransHist.action;
current fund holdings come from the DisplayValuation.action JSON XHR.

Both live behind the same cookie jar, so one Playwright session (seeded
interactively once, kept alive via storage_state) can scrape both.

## This change

- broker_sync/providers/parsers/fidelity.py (NEW)
  - parse_transactions_html: extracts cash-impacting rows from the
    #myplan_member_transhist_support table, skips Bulk Switches (no cash
    movement), emits FidelityCashTx with deterministic external_id for
    dedup.
  - parse_valuation_json: lifts fund code + name + units + price +
    contribution-type breakdown from the JSON payload.
- broker_sync/providers/fidelity_planviewer.py (REWRITTEN)
  - FidelityPlanViewerProvider.fetch() now loads storage_state, boots
    headless Chromium, navigates landing → main page (to hydrate the
    SPA session + capture DisplayValuation XHR) → transactions page
    with a wide 01 Jan 1990 → today window. Raises FidelitySessionError
    if PlanViewer shows the 15-min idle page or redirects back to
    id.fidelity.co.uk.
  - _gains_offset_activity emits a synthetic DEPOSIT/WITHDRAWAL with a
    date-keyed external_id so WF Net Worth reconciles to the
    Fidelity-reported pot value without stacking duplicates across
    monthly runs.
  - Rolls storage_state back to disk after each run, extending session
    TTL.
- tests/providers/test_fidelity_planviewer.py (EXTENDED)
  - 8 tests against a real captured fixture: account shape, guard on
    missing storage_state, full-fixture round-trip (51 txs summing to
    £102,004.15), Bulk Switch filtered, deterministic external_id,
    valuation parse with fund-code resolution, gains-offset direction
    + skip-when-empty.
- tests/fixtures/fidelity/transactions-full.html + valuation.json (NEW)
  - Sanitised captures from the 2026-04-18 live session.

## What is NOT in this change

- CronJob + Vault secret wiring + Prometheus alert in
  infra/stacks/broker-sync/main.tf — next commit.
- Dockerfile Chromium install — next commit.
- The scrape-and-import was already done manually (51 activities +
  1 gains offset imported into WF account a7d6208d); this commit
  productionises the code path so the monthly cron can do the same.

## Verification

### Automated

$ poetry run pytest tests/providers/test_fidelity_planviewer.py -v
8 passed in 0.88s

$ poetry run pytest -q
128 passed, 1 skipped in 1.41s

$ poetry run mypy broker_sync/providers/fidelity_planviewer.py broker_sync/providers/parsers/fidelity.py
Success: no issues found in 2 source files

$ poetry run ruff check broker_sync/providers/fidelity_planviewer.py broker_sync/providers/parsers/fidelity.py
All checks passed!

### Manual verification (2026-04-18 live run)

1. poetry run broker-sync fidelity-seed (headed browser + SMS OTP) —
   captured storage_state, staged to Vault.
2. Inline import script hit the same code paths the provider now runs;
   52 activities imported into a new WF WORKPLACE_PENSION account, WF
   Net Worth jumped from £865,358 → £1,003,083.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:47:38 +00:00
Viktor Barzin
832732a419 fidelity-planviewer: scaffold provider + CLI (seed + stub ingest)
## Context

UK workplace pension at planviewer.fidelity.co.uk has no public API; the SPA
calls a private JSON backend at prd.wiciam.fidelity.co.uk/cvmfe/api/*. Viktor
confirmed in DevTools that an OPTIONS preflight lists auth headers
(ch, fid, rid, sid, tbid, theosreferer, ua). Full reverse-engineering of the
endpoint paths is pending Viktor's POST cURL paste for transactions +
holdings views.

Until those endpoints are captured, ship the scaffold: provider module, CLI
commands, tests, docs. This unblocks installing Playwright in the image and
lets Viktor run the one-off seed command on his laptop ahead of the data
integration.

## This change

- broker_sync/providers/fidelity_planviewer.py
  - FidelityCreds namedtuple (storage_state_path, plan_id).
  - FidelitySessionError (401 → re-seed), FidelityProviderConfigError.
  - FidelityPlanViewerProvider: .accounts() returns a single
    WORKPLACE_PENSION account, .fetch() raises until endpoints are wired.
- broker_sync/cli.py
  - fidelity-seed: launches headed Chromium so Viktor can log in and tick
    "Remember device", then dumps storage_state.json.
  - fidelity-ingest: stub matching the invest-engine / trading212 CLI
    shape; reads storage_state + plan_id, pipes through the shared pipeline.
- tests/providers/test_fidelity_planviewer.py
  - Asserts the single-account shape + the loud-failure guard.
- docs/providers/fidelity-planviewer.md
  - Architecture diagram, one-time seed procedure, backfill + monthly
    commands, alert runbook.
- pyproject.toml
  - playwright ^1.47 as a first-class dep (used only by fidelity-seed and
    later by the session-refresh step in fidelity-ingest).

## What is NOT in this change

- Endpoint wiring in provider.fetch() — blocked on DevTools POST cURL.
- Infra CronJob + Vault secret + Prometheus alert — lands once the first
  manual backfill succeeds and we know the Chromium image size is fine.
- Dockerfile Chromium install — same trigger.

## Verification

### Automated

$ poetry run pytest tests/providers/test_fidelity_planviewer.py -v
2 passed in 0.08s

$ poetry run pytest -q
122 passed, 1 skipped in 1.07s

$ poetry run mypy broker_sync/providers/fidelity_planviewer.py broker_sync/cli.py
Success: no issues found in 2 source files

$ poetry run ruff check broker_sync/providers/fidelity_planviewer.py broker_sync/cli.py tests/providers/test_fidelity_planviewer.py
All checks passed!

### Manual (Viktor, later)

1. poetry install && poetry run playwright install chromium
2. poetry run broker-sync fidelity-seed --out /tmp/state.json
3. Chromium opens → log in → tick "Remember device" → press Enter
4. vault kv patch secret/broker-sync fidelity_storage_state=@/tmp/state.json

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:09:04 +00:00
Viktor Barzin
c830856ba1 imap: route IE BUYs to ISA first-£20k / GIA overflow per UK tax year
## Context

Viktor's InvestEngine account has both an ISA and a GIA wrapper. Trade
confirmation emails (info@investengine.com) are identical between them —
subject "Here's how your portfolio looks now", body shows "Client name:
Viktor Barzin" with no portfolio/account type. That left the IMAP parser
hardcoded to route every IE BUY to the ISA (invest-engine-primary),
which produced a 2339-share over-count when 2023-24 GIA buys landed in
the ISA during the 2026-04-18 reconciliation.

Viktor's rule: from 6 April each tax year, BUYs fill ISA up to the
£20,000 cap, then overflow to GIA. This commit codifies that rule in a
standalone batch splitter and applies it at the ImapProvider boundary.

Also picks up a silent-drop bug surfaced during the same reconciliation:
WF's /import (unlike /import/check) rejects naive datetimes with
"Invalid date". The sink now coerces tzinfo=UTC defensively so every
provider gets the same guarantee.
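
A minimal sketch of the defensive coercion, assuming naive datetimes are meant as UTC; the helper name `ensure_utc` is hypothetical.

```python
from datetime import datetime, timezone


def ensure_utc(dt: datetime) -> datetime:
    """Attach UTC to naive datetimes; leave aware datetimes untouched."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt
```

For example, `ensure_utc(datetime(2022, 5, 24)).isoformat()` gives `2022-05-24T00:00:00+00:00`, which carries the offset that the naive value lacked.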

## This change

- `_split_ie_by_isa_cap(activities)` — sorts all IE-ISA BUYs by date and
  walks them once per UK tax year (6 April boundary). A BUY whose running
  tax-year total BEFORE it is strictly below £20k stays on the ISA;
  otherwise it flips to a new `invest-engine-gia` account_id. No
  fractional splits — boundary activities go whole to whichever bucket
  their pre-running-total dictates. Non-IE and non-BUY activities pass
  through unchanged.
- `ImapProvider.accounts()` gains an `invest-engine-gia` Account so the
  pipeline's `_ensure_accounts` can resolve both.
- `ImapProvider.fetch()` calls the splitter on the full batch before
  applying the `since`/`before` date filter — batch-level sort
  guarantees consistent routing regardless of the order IMAP returns
  messages.
- `WealthfolioSink._activity_to_import_row` coerces naive datetimes to
  UTC so the row passes WF /import validation.
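
A minimal sketch of the cap-routing walk described above. The `Buy` dataclass and the function names are hypothetical, and using a single `amount` field as the value counted against the cap is an assumption; the 6 April boundary, the £20,000 cap, the strictly-below test on the pre-running total, and the two account ids come from this message.

```python
from dataclasses import dataclass, replace
from datetime import date
from decimal import Decimal

ISA_CAP = Decimal("20000")


@dataclass(frozen=True)
class Buy:
    """Hypothetical stand-in for an InvestEngine BUY activity."""
    day: date
    amount: Decimal
    account_id: str = "invest-engine-primary"


def uk_tax_year_start(d: date) -> int:
    """Calendar year in which the UK tax year containing d began."""
    return d.year if (d.month, d.day) >= (4, 6) else d.year - 1


def split_by_isa_cap(buys: list[Buy]) -> list[Buy]:
    """Route each BUY whole to ISA or GIA based on the total before it."""
    running: dict[int, Decimal] = {}
    routed = []
    for b in sorted(buys, key=lambda x: x.day):
        year = uk_tax_year_start(b.day)
        before = running.get(year, Decimal("0"))
        if before < ISA_CAP:
            routed.append(b)  # stays on the ISA, even if it crosses the cap
        else:
            routed.append(replace(b, account_id="invest-engine-gia"))
        running[year] = before + b.amount
    return routed
```

Sorting before the walk is what makes the routing independent of the order in which the mailbox returns messages.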

## What is NOT in this change

- No retroactive re-routing of data already in WF. Historical
  finance-mysql rows (all lumped to `invest-engine-primary` or
  `invest-engine-gia` by the existing heuristic) keep their current
  account assignment. If a past tax-year was routed "wrong" under the
  new rule, that's corrected manually via the WF API, not here.
- No change to the Schwab or trading212 paths.

## Verification

### Automated

```
$ poetry run pytest tests/providers/test_imap.py -v
tests/providers/test_imap.py::test_uk_tax_year_start_before_april_6_rolls_back PASSED
tests/providers/test_imap.py::test_single_tax_year_under_cap_stays_isa PASSED
tests/providers/test_imap.py::test_overflow_past_cap_flips_to_gia PASSED
tests/providers/test_imap.py::test_tax_year_boundary_resets_cap PASSED
tests/providers/test_imap.py::test_out_of_order_activities_sorted_before_cap_applied PASSED
tests/providers/test_imap.py::test_non_ie_activities_passed_through_unchanged PASSED
6 passed in 0.36s

$ poetry run pytest -q --ignore=tests/test_cli.py
116 passed, 1 skipped in 2.76s

$ poetry run ruff check broker_sync/providers/imap.py broker_sync/sinks/wealthfolio.py
All checks passed!

$ poetry run mypy broker_sync/providers/imap.py broker_sync/sinks/wealthfolio.py
Success: no issues found in 2 source files
```

### Manual verification

The tzinfo fix was validated against the live WF instance during the
2026-04-18 reconciliation — before the fix, /import returned
`"errors": {"symbol": ["Invalid date '2022-05-24T00:00:00'."]}` for
every IMAP activity; after, the same payload imported cleanly.

The splitter was not exercised against live IMAP data because Viktor's
mailbox only has Apr 2022 → Feb 2024 emails, all inside finance.position's
existing coverage. Running IMAP ingest with `since=2024-04-06` yields
fetched=0. The unit tests cover the boundary arithmetic; a live run
will happen when newer emails are parsed (or when finance coverage is
re-scoped).

## Reproduce locally

1. `poetry install`
2. `poetry run pytest tests/providers/test_imap.py`
3. Expected: 6 passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:02:49 +00:00
Viktor Barzin
a190875f63 Add finance_mysql provider + CLI for historical backfill
finance.position (171 rows, 2020-06-07 to 2025-12-19) is the only source
of InvestEngine + Schwab trade history pre-dating the broker-sync project.
This provider reads it once and pushes every row into the correct WF
account (.L tickers → IE ISA, others → Schwab).

Dedup: external_id = 'finance-mysql:position:<PK>' — idempotent on re-run.
Auth: aiomysql as MySQL root (user-authorized) against the standalone
mysql:8.4 in-cluster service.

New CLI: broker-sync finance-mysql-import
New tests: 5 unit tests covering route, symbol normalise, BUY/SELL
detection.

poetry run pytest -q   →  114 passed, 1 skipped
poetry run mypy        →  clean (aiomysql shielded with type: ignore)
poetry run ruff check  →  clean
2026-04-17 22:38:21 +00:00
Viktor Barzin
6efd03570a Add imap-ingest CLI + ImapProvider: route emails to IE/Schwab parsers
Wires the IE + Schwab email parsers into an actual runnable sync. Walks
the IMAP mailbox, routes each message by sender domain:
  - *@investengine.com → invest_engine.parse_invest_engine_email
  - *@schwab.com       → schwab.parse_schwab_email
then pushes the resulting Activities through the shared pipeline.

broker-sync imap-ingest — new CLI command taking IMAP_HOST/USER/PASSWORD/
DIRECTORY (mirrors the old wealthfolio-sync image's env shape so the
Terraform CronJob's existing env wiring works unchanged).

Verified: poetry run pytest -q → 109 passed + 1 skipped; mypy strict
clean (37 files); ruff + yapf clean.
2026-04-17 22:12:05 +00:00
Viktor Barzin
f089b8b93a Add Schwab email parser (port from finance/)
Schwab's workplace-RSU confirmation emails have 5 data td elements
with class='dark-background-body' align='right': date, direction, qty,
ticker, price-with-currency-sign. One email → one Activity.

- parse_schwab_email(raw_html) -> list[Activity] (1-item or empty)
- Empty on any parse failure (IMAP batch shouldn't crash on one bad mail)
- Deterministic external_id ('schwab:date:ticker:type:qty'), stable
  across re-pulls so dedup works
- Hardcoded to account 'schwab-workplace' / AccountType.GIA / USD
- 6 unit tests: SELL + BUY happy path, malformed, missing cells,
  external-id stability, commas in price

Dropped from the original finance port:
- msg_timestamp-based external id (non-deterministic — would re-import
  on every IMAP walk). Replaced with a hash-stable key.
- Currency.from_sign() currency hack. Schwab US is USD-only; we'll add
  FX when that changes.

poetry run pytest -q   →  109 passed, 1 skipped
poetry run mypy        →  clean (added types-python-dateutil)
poetry run ruff check  →  clean
2026-04-17 22:08:40 +00:00
Viktor Barzin
1aa60ce348 Merge ie-email-parser: HTML + CSV fallbacks + failure-mode tests
# Conflicts:
#	broker_sync/providers/parsers/invest_engine.py
#	tests/providers/parsers/test_invest_engine.py
2026-04-17 22:06:29 +00:00
Viktor Barzin
87526898e6 Pin InvestEngine parser failure modes — empty-on-junk + partial-match
Context: The port's graceful-failure contract was implicit in the way
each strategy returns None/[] on malformed input, but without tests it
was an accidental property that could regress silently. Codify it.

Two invariants, each backed by a fixture:

1. Junk email → empty list, never raise.
   `unparseable.eml` is a pure-marketing IE newsletter with no order
   data. All three strategies try and fail; parse_invest_engine_email
   returns []. No exception leaks.

2. Partial HTML email → intact orders only.
   `html_partial_match.eml` has two nested summary tables: one with a
   valid VUAG order, one that is missing both the ticker and "Bought N
   @ £P" rows (simulates IE dropping content mid-render). The parser
   returns just the VUAG order.

No implementation change needed — the behaviour existed as a side
effect of _try_html_summary_table returning None on missing fields.
These tests lock it down so future refactors can't quietly break it.

Test plan:
  poetry run pytest tests/providers/parsers/ -q   →  8 passed in 0.19s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py   →  clean
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py   →  All checks passed!
  poetry run yapf --diff   →  clean (no diff)

Manual verification:
- Load unparseable.eml → parse returns [].
- Load html_partial_match.eml → parse returns exactly 1 activity (VUAG).
2026-04-17 22:02:48 +00:00
Viktor Barzin
020ba16723 Add CSV attachment fallback for InvestEngine email parser
Context: IE has not (yet) sent CSV-attached statements in production,
but the upstream parser had _extract_positions_csv as a third fallback
for exactly this case. Keeping the fallback preserves behaviour-parity
with the legacy parser and makes future statement support one fixture
away — the shape is documented by column set, not scraped live.

Unlike the upstream parser, which split the body on whitespace and broke
on any embedded commas in names, this port walks real MIME attachments
using Python's csv.DictReader. A part qualifies as CSV if:
- its Content-Type is text/csv / application/csv / application/vnd.ms-excel, OR
- its filename ends in .csv (defence against IE mis-labelling the part)

Rows missing required columns or containing unparseable numbers/dates
are skipped silently — consistent with the "partial match" contract:
a half-corrupt CSV yields whatever rows were intact. Required columns:
ticker, unit_price, quantity, date (YYYY-MM-DD), currency. Non-GBP
rows are filtered because the IE ISA is strictly sterling — flagging
this assumption in the review notes.
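
A minimal sketch of the part-detection and row-filtering rules above, using only the standard library. The function name `csv_rows` is hypothetical, and the number/date validation mentioned above is omitted here for brevity; the content types, the `.csv` filename fallback, the required columns, and the GBP filter come from this message.

```python
import csv
import io
from email import message_from_bytes, policy

CSV_TYPES = {"text/csv", "application/csv", "application/vnd.ms-excel"}
REQUIRED = ("ticker", "unit_price", "quantity", "date", "currency")


def csv_rows(raw_email: bytes) -> list[dict]:
    """Return intact GBP rows from every CSV-looking MIME part."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    rows = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        filename = part.get_filename() or ""
        looks_csv = (part.get_content_type() in CSV_TYPES
                     or filename.lower().endswith(".csv"))
        if not looks_csv:
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        text = payload.decode(charset, errors="replace")
        for row in csv.DictReader(io.StringIO(text)):
            if any(not row.get(col) for col in REQUIRED):
                continue  # skip rows with a missing or empty required column
            if row["currency"].upper() != "GBP":
                continue  # the IE ISA is sterling-only per this message
            rows.append(row)
    return rows
```

Checking the filename as well as the Content-Type means a part labelled `application/octet-stream` is still picked up as long as its name ends in `.csv`.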

This change:
- Adds `_parse_csv_attachment(raw_email)` as the third strategy after
  text/plain and text/html; it re-parses the raw email bytes so we can
  inspect Content-Type/filename on each part.
- Flags symbols/currencies, filters non-GBP, and runs each row through
  the shared `_build_activity` so external_id formation matches every
  other strategy (dedup stays consistent across strategies).
- Fixture `csv_attachment.eml` has three rows (VUAG, SWDA, VUSA) in a
  `text/csv` part with a `.csv` filename — covers both detection paths.

Test plan:
  poetry run pytest tests/providers/parsers/ -q   →  6 passed in 0.15s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py  →  clean
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py  →  All checks passed!
  poetry run yapf --diff  →  clean (no diff)

Manual verification: load csv_attachment.eml, call parse_invest_engine_email,
assert 3 activities each with symbol in {VUAG,SWDA,VUSA}, currency=GBP,
notes containing "csv".
2026-04-17 22:01:46 +00:00
Viktor Barzin
f49918c74d Add broker-sync invest-engine CLI subcommand
Context: Phase 2b wiring — hand the bearer-token InvestEngineProvider
into the existing sync pipeline (sync_provider_to_wealthfolio), mirroring
the trading212 subcommand.

Environment contract:

  WF_BASE_URL, WF_USERNAME, WF_PASSWORD, WF_SESSION_PATH  (shared with trading212)
  IE_BEARER_TOKEN                                         (devtools-pasted)
  IE_TOKEN_EXPIRES_AT                                     (ISO-8601; Viktor sets on paste)
  BROKER_SYNC_DATA_DIR                                    (sync.db + checkpoint state)

Exit codes:

  0 = clean run
  1 = some rows failed to import (mirrors trading212 behaviour)
  2 = token already expired per IE_TOKEN_EXPIRES_AT, or malformed ISO
      timestamp, or live 401 response from IE (InvestEngineTokenExpiredError),
      or unknown --mode flag

The pre-request expiry check is deliberate: a CronJob that runs during
the refresh window would otherwise waste a request on a dead token and
get the same 401 that we already know about from the clock. Exit 2
from the clock-only path also separates "token is old" from "wealthfolio
rejected a batch" in the CronJob alert pipeline.
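
A minimal sketch of the clock-only pre-check, assuming naive timestamps default to UTC as described in the change list below. The function names are hypothetical; exit code 2 for both the expired and the malformed case comes from this message.

```python
from datetime import datetime, timezone
from typing import Optional

EXIT_TOKEN_PROBLEM = 2


def parse_expiry(raw: str) -> datetime:
    """Parse an ISO-8601 timestamp; naive values are treated as UTC.

    Raises ValueError on malformed input. Note that Python versions
    before 3.11 do not accept a trailing "Z" in fromisoformat().
    """
    dt = datetime.fromisoformat(raw)
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def precheck_exit_code(raw_expiry: str, now: datetime) -> Optional[int]:
    """Return 2 if the token is unusable per the clock, else None."""
    try:
        expires_at = parse_expiry(raw_expiry)
    except ValueError:
        return EXIT_TOKEN_PROBLEM  # malformed IE_TOKEN_EXPIRES_AT
    if expires_at <= now:
        return EXIT_TOKEN_PROBLEM  # token already expired, skip the request
    return None
```

Passing `now` in as an argument keeps the check deterministic in tests.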

Mode defaults:

  --mode steady    → since = now - 30d  (bigger window than T212's 7d
                    because the IE sync only runs once a month in steady
                    state; 30d guarantees no gap even after a missed run)
  --mode backfill  → since = None       (full history)

This change:
 - `invest-engine` subcommand added to broker_sync/cli.py
 - Token-expiry pre-check (clock), IE_TOKEN_EXPIRES_AT ISO parsing with
   a UTC default for naive timestamps, and graceful handling of
   InvestEngineTokenExpiredError surfaced during pipeline run
 - 3 new tests in tests/test_cli.py covering the 3 exit-2 paths

## Automated

poetry run pytest tests/test_cli.py -v
======================== 4 passed in 0.28s =========================

poetry run pytest -q
98 passed, 1 skipped in 0.85s

poetry run mypy --strict .
Success: no issues found in 34 source files

poetry run ruff check .
All checks passed!

## Manual Verification

  1. Populate Vault keys per the docstring in
     broker_sync/providers/invest_engine.py (Viktor pastes token + sets
     expires_at to the Monday morning of next month).
  2. Set env:
       export WF_BASE_URL=https://wealthfolio.viktorbarzin.me
       export WF_USERNAME=viktor
       export WF_PASSWORD=<from Vault>
       export IE_BEARER_TOKEN=<from Vault>
       export IE_TOKEN_EXPIRES_AT=<from Vault>
       export BROKER_SYNC_DATA_DIR=/tmp/ie-smoke
  3. poetry run broker-sync invest-engine --mode backfill
     Expected: single line "invest-engine: fetched=N new=M imported=M failed=0"
     on success; exit 2 with "InvestEngine token expired..." if the clock
     or server disagrees; exit 2 with "IE_TOKEN_EXPIRES_AT not a valid
     ISO-8601 timestamp..." if the env var is malformed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:59:31 +00:00
Viktor Barzin
72d348e294 Add HTML table fallback for InvestEngine email parser
Context: Plain-text IE emails vanished around 2024-Q2 when IE switched to
an HTML-only template with per-order nested summary tables. The RFC 2822
line parser returns [] on those modern emails, so we need a fallback
that walks the HTML table structure.

Upstream _extract_from_html parsed a fixed DOM path (table[1].tr[10].
table) and only handled ONE order per email. The real IE HTML template
nests one summary <table> per ticker inside the second top-level table —
multiple orders in a single batched confirmation are common — so this
port walks every leaf table (no child <table>) and interprets each one
as an independent trade summary. Structural (non-leaf) tables are
skipped to avoid double-counting via get_text().
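
A minimal sketch of the leaf-table walk, written against the standard-library `html.parser` so it is self-contained; the actual port may use a different HTML library. The class and function names and the exact regex are assumptions; only the "Bought N @ £P" row shape and the leaf-versus-structural distinction come from this message.

```python
import re
from decimal import Decimal
from html.parser import HTMLParser

# Assumed pattern for the "Bought N @ £P" row.
_BOUGHT = re.compile(
    r"Bought\s+(\d[\d,]*(?:\.\d+)?)\s*@\s*£\s*(\d[\d,]*(?:\.\d+)?)"
)


class _LeafTables(HTMLParser):
    """Collect the text of every <table> that has no child <table>."""

    def __init__(self) -> None:
        super().__init__()
        self._stack: list[list] = []  # entries: [text_chunks, has_child]
        self.leaves: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._stack:
                self._stack[-1][1] = True  # parent is structural
            self._stack.append([[], False])

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            chunks, has_child = self._stack.pop()
            if not has_child:
                self.leaves.append(" ".join(chunks))

    def handle_data(self, data):
        # Text is credited only to the innermost open table.
        if self._stack and data.strip():
            self._stack[-1][0].append(data.strip())


def bought_rows(html: str) -> list[tuple[Decimal, Decimal]]:
    """Return (quantity, price) for each leaf table with a Bought row."""
    parser = _LeafTables()
    parser.feed(html)
    parser.close()
    out = []
    for text in parser.leaves:
        m = _BOUGHT.search(text)
        if m is None:
            continue  # leaf without an order row: skip, do not raise
        qty, price = (Decimal(g.replace(",", "")) for g in m.groups())
        out.append((qty, price))
    return out
```

Crediting text only to the innermost open table is what prevents an outer structural table from repeating the orders of its children.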

This change:
- `_parse_html_tables(body)` extracts the date once from the full text
  then walks leaf tables looking for "Bought N @ £P" rows.
- `_try_html_summary_table` parses one leaf; returns None on structural
  tables or missing ticker/qty/price — so a partial email yields only
  its intact orders (the "2 orders, 1 parseable → 1 returned" invariant
  works by construction without raising).
- `parse_invest_engine_email` now falls through text/plain → text/html
  in the multipart message, picking the first strategy that returns
  activities. Order matters: text/plain wins when both succeed because
  the RFC 2822 strategy is the more constrained grammar.
- Regexes are module-level constants so they compile once per process.

Fixture `html_two_orders.eml` is a minimal-but-realistic multipart email
with two nested summary tables (VUAG + SWDA), no personal data beyond
tickers/qty/price.

Test plan:
  poetry run pytest tests/providers/parsers/ -q
  → 5 passed in 0.16s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!
  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load html_two_orders.eml, call parse_invest_engine_email,
assert len == 2 with both expected tickers (VUAG, SWDA) and numbers,
dates set to 2026-04-01.
2026-04-17 21:58:15 +00:00
Viktor Barzin
9ec8ece2d9 Add InvestEngine email parser — RFC 2822 v1/v2 line format
Context: The old finance/ app had a 324-line IE message parser with four
line-based variants (v1/v2/v3/v4) plus an HTML strategy and a CSV
fallback. Port into broker-sync so we can consume IE trade confirmation
emails as a backup to the live HTTP client (Phase 2b) while IE's public
API remains Bearer-only.

The upstream parser emits storage.model.Position; we emit canonical
Activity with the broker-sync invariants: account_id="invest-engine-primary"
(sink remaps to Wealthfolio UUID), account_type=ISA, currency=GBP, and
external_id="invest-engine:<fingerprint>" where the fingerprint is a
SHA-256 of (date|symbol|quantity|unit_price) — deterministic so repeat
imports of the same email dedup at the sync-record layer.
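
A minimal sketch of the fingerprint, assuming the four fields are joined with `|` as plain strings and UTF-8 encoded before hashing; the exact serialisation and the function name are assumptions, while the SHA-256 choice, the field order, and the `invest-engine:` prefix come from this message.

```python
import hashlib


def invest_engine_external_id(date: str, symbol: str,
                              quantity: str, unit_price: str) -> str:
    """Deterministic external_id from (date|symbol|quantity|unit_price)."""
    payload = "|".join((date, symbol, quantity, unit_price))
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"invest-engine:{digest}"
```

Taking the numeric fields as strings avoids float formatting differences changing the hash between runs.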

This change:
- Top-level `parse_invest_engine_email(raw_email: bytes) -> list[Activity]`
  extracts the text/plain body from an RFC 2822 message and dispatches to
  the line-based parser.
- `_parse_rfc2822_lines(body)` tries the v2 layout first (newer IE format
  where `Date: DD Month` is on line 2 and the year on line 3), then the
  v1 layout (where the day alone is on line 2 and `Month YYYY` on line 3).
  The v3 and v4 variants will be re-added in a follow-up if we find
  fixtures where they matter; initial fixture coverage hits v2.
- Drops the upstream `_ticker_post_processing` VUAG→VUAG.L hack.
  Wealthfolio's /import/check endpoint resolves exchange suffixes; the
  Trading212 provider also emits suffix-free tickers (e.g. `VUAG`), so
  staying consistent avoids double-mapping.
- Notes field records the parse-strategy tag ("rfc2822-v2") plus the
  matched line for debugging.

Test plan:
  poetry run pytest tests/providers/parsers/ -q
  → 3 passed in 0.03s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!
  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load the fixture email, call the parser, inspect the
returned Activity has symbol=VUAG, quantity=59.539562, unit_price=60.46,
date=2023-01-17, external_id starts with invest-engine:.
2026-04-17 21:55:01 +00:00
Viktor Barzin
dc4d3f889d Add InvestEngineProvider — Bearer-token HTTP client
Context: InvestEngine has no public API. The web app uses an undocumented
Django REST backend at /api/v0.3X/*, which requires a Bearer token and
rolls its minor every 4-6 weeks. MFA (push-approval) is mandatory on
every login, so we do NOT automate login — Viktor logs in manually in
a browser, copies the Bearer out of devtools, and pastes it into Vault.
This provider consumes that token.

The response shape is UNVERIFIED (MFA blocks an unauthed probe, so the
research leading into Phase 2b could only confirm endpoint existence via
401 responses on v0.31 and v0.32). `_transaction_to_activity` is written
defensively:

 - accepts both `results`/`data` list wrappers and `next`/`meta.next_page`
   cursor fields for pagination;
 - accepts `symbol`/`ticker`, `price`/`unit_price`, `amount`/`value`,
   `date`/`created_at`/`timestamp` field-name variants;
 - maps exact type strings (BUY, SELL, DIVIDEND, INTEREST, DEPOSIT,
   WITHDRAWAL, FEE, TAX) and substring-matches DEPOSIT/WITHDRAWAL for
   variants like "CASH_DEPOSIT"; refuses to guess on anything else —
   unknown types log WARNING and return None (silent misclassification
   would corrupt tax reporting).
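
A minimal sketch of the type-mapping rule described above, not the actual
`_transaction_to_activity` code. The function name `map_activity_type` and
the two module-level constants are hypothetical; only the list of type
strings and the "substring-match DEPOSIT/WITHDRAWAL, otherwise WARN and
return None" behaviour come from this message.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# Exact type strings listed in the commit message.
_EXACT_TYPES = frozenset(
    {"BUY", "SELL", "DIVIDEND", "INTEREST", "DEPOSIT", "WITHDRAWAL", "FEE", "TAX"}
)
# Only these two are substring-matched (e.g. "CASH_DEPOSIT").
_SUBSTRING_TYPES = ("DEPOSIT", "WITHDRAWAL")


def map_activity_type(raw: str) -> Optional[str]:
    """Return the canonical type, or None (after a WARNING) when unknown."""
    if raw in _EXACT_TYPES:
        return raw
    for needle in _SUBSTRING_TYPES:
        if needle in raw:
            return needle
    # Refuse to guess: the caller is expected to skip this transaction.
    logger.warning("unknown InvestEngine transaction type: %r", raw)
    return None
```

Returning None instead of a best-effort guess keeps an unrecognised type
out of the import entirely, which is the behaviour the bullet above asks for.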

Version probe:

  _START_VERSION_MINOR=32 (research: v0.31/v0.32 live, v0.30 Gone)
  GET /api/v0.{n}/ → 410 ? advance : done
  cap at v0.60 so a misconfigured backend doesn't infinite-loop.

A 410 response on a data endpoint triggers exactly one re-probe + retry
against the newer version; the new version is cached on the instance for
the rest of the process.
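
The probe loop can be sketched roughly as follows. This is an illustration
under assumptions: `probe_version`, `VersionProbeError` and the injected
`get_status` callable are hypothetical names, and the real provider issues
the GET itself rather than taking a callable.

```python
from typing import Callable

_START_VERSION_MINOR = 32
_MAX_VERSION_MINOR = 60  # cap from the commit message


class VersionProbeError(RuntimeError):
    """Hypothetical stand-in for InvestEngineVersionError."""


def probe_version(
    get_status: Callable[[int], int], start: int = _START_VERSION_MINOR
) -> int:
    """Advance the minor version while the backend answers 410 Gone."""
    for minor in range(start, _MAX_VERSION_MINOR + 1):
        # Any non-410 status (including 401) means this version is live.
        if get_status(minor) != 410:
            return minor
    raise VersionProbeError(f"no live version between v0.{start} and v0.60")
```

Injecting `get_status` keeps the loop testable without a network; a caller
would pass something that performs GET /api/v0.{n}/ and returns the status.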

Token expiry is tracked at the Python layer:

 - constructor takes token_expires_at (set by Viktor when he pastes);
 - fetch() fails fast with InvestEngineTokenExpiredError if the clock
   says the token is already dead — cheaper than burning a request for
   a known 401;
 - a real 401 response also raises InvestEngineTokenExpiredError so the
   CLI/pipeline can alert Viktor to paste a new token.
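
A sketch of the clock-based fail-fast check, assuming a timezone-aware
`token_expires_at`. `ensure_token_fresh` and `TokenExpiredError` are
hypothetical names standing in for whatever fetch() does internally.

```python
from datetime import datetime, timezone
from typing import Optional


class TokenExpiredError(Exception):
    """Hypothetical stand-in for InvestEngineTokenExpiredError."""


def ensure_token_fresh(expires_at: datetime, now: Optional[datetime] = None) -> None:
    """Raise before any request is sent if the token is already dead."""
    current = now if now is not None else datetime.now(timezone.utc)
    if current >= expires_at:
        raise TokenExpiredError(f"token expired at {expires_at.isoformat()}")
```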

Vault schema expected (consumed by the CLI in the follow-up commit):

  secret/broker-sync
    investengine_bearer_token       <devtools-captured Bearer>
    investengine_token_expires_at   <ISO-8601 set at paste time>
    investengine_refresh_token      <optional, not used yet>

This module does NOT read Vault — the caller hands values in, keeping
the provider testable.

This change:
 - New `broker_sync/providers/invest_engine.py`:
   * InvestEngineProvider with .accounts(), .fetch(), .close()
   * _probe_version / _active_version with 410-retry + cache
   * _transaction_to_activity with defensive type + field-name mapping
   * InvestEngineError / InvestEngineTokenExpiredError / InvestEngineVersionError
 - New `tests/providers/test_invest_engine.py`: 22 tests covering version
   probe, expiry fail-fast, 401→TokenExpired, 410→reprobe, header
   shape, pagination variants, and the full txn→activity mapping. One
   @pytest.mark.skip integration stub for when Viktor has a live token.

Assumptions flagged for verification with a live token:
 - IE id field is castable to str (int or string)
 - Type strings match or fuzz-contain: BUY, SELL, DIVIDEND, INTEREST,
   DEPOSIT, WITHDRAWAL, FEE, TAX
 - Transactions carry numeric quantity/price/amount (Decimal-convertible)
 - Date field is one of: date / created_at / timestamp
 - Pagination shape is {results, next} OR {data, meta.next_page}
 - /transactions/ accepts ?portfolio=<id>&start=YYYY-MM-DD&end=YYYY-MM-DD

## Automated

poetry run pytest tests/providers/test_invest_engine.py -v
======================== 22 passed, 1 skipped in 0.26s =========================

poetry run pytest -q
95 passed, 1 skipped in 0.84s

poetry run mypy --strict .
Success: no issues found in 34 source files

poetry run ruff check .
All checks passed!

poetry run yapf --diff broker_sync/providers/invest_engine.py tests/providers/test_invest_engine.py
(clean)

## Manual Verification

Once Viktor pastes a live token:

  1. Export:
     export IE_BEARER_TOKEN='<paste>'
     export IE_TOKEN_EXPIRES_AT='2026-05-17T00:00:00+00:00'
  2. Unmark the @pytest.mark.skip on test_live_integration_smoke
  3. poetry run pytest tests/providers/test_invest_engine.py::test_live_integration_smoke -v
     Expected: a successful round-trip that returns an empty-or-populated
     list of Activity objects — prove the version probe + auth header +
     portfolio enumeration actually work against the real IE backend.
  4. Validate the Assumptions list above against the real transaction JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:52:26 +00:00
Viktor Barzin
ea15b80111 Add InvestEngine email parser — RFC 2822 v1/v2 line format
Context: The old finance/ app had a 324-line IE message parser with four
line-based variants (v1/v2/v3/v4) plus an HTML strategy and a CSV
fallback. Port into broker-sync so we can consume IE trade confirmation
emails as a backup to the live HTTP client (Phase 2b) while IE's public
API remains Bearer-only.

The upstream parser emits storage.model.Position; we emit canonical
Activity with the broker-sync invariants: account_id="invest-engine-primary"
(sink remaps to Wealthfolio UUID), account_type=ISA, currency=GBP, and
external_id="invest-engine:<fingerprint>" where the fingerprint is a
SHA-256 of (date|symbol|quantity|unit_price) — deterministic so repeat
imports of the same email dedup at the sync-record layer.
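
The fingerprint scheme above could look roughly like this. The field order
and the `|` separator come from this message; the exact serialisation (ISO
date, `str(Decimal)`) and the name `make_external_id` are assumptions.

```python
import hashlib
from datetime import date
from decimal import Decimal


def make_external_id(
    on_date: date, symbol: str, quantity: Decimal, unit_price: Decimal
) -> str:
    """Deterministic id: the same trade always hashes to the same value."""
    # Assumed serialisation: ISO date and str(Decimal), joined with "|".
    payload = f"{on_date.isoformat()}|{symbol}|{quantity}|{unit_price}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"invest-engine:{digest}"
```

One caveat of this sketch: `Decimal("60.46")` and `Decimal("60.460")`
serialise differently, so the parser would need to produce a stable
representation for the dedup to hold.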

This change:
- Top-level `parse_invest_engine_email(raw_email: bytes) -> list[Activity]`
  extracts the text/plain body from an RFC 2822 message and dispatches to
  the line-based parser.
- `_parse_rfc2822_lines(body)` tries the v2 layout first (newer IE format
  where `Date: DD Month` is on line 2 and the year on line 3), then the
  v1 layout (where the day alone is on line 2 and `Month YYYY` on line 3).
  v3 and v4 variants are re-added in a follow-up if we find fixtures
  where they matter — initial fixture coverage hits v2.
- Drops the upstream `_ticker_post_processing` VUAG→VUAG.L hack.
  Wealthfolio's /import/check endpoint resolves exchange suffixes; the
  Trading212 provider also emits suffix-free tickers (e.g. `VUAG`), so
  staying consistent avoids double-mapping.
- Notes field records the parse-strategy tag ("rfc2822-v2") plus the
  matched line for debugging.

Test plan:
  poetry run pytest tests/providers/parsers/ -q
  → 3 passed in 0.03s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!
  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load the fixture email, call the parser, inspect the
returned Activity has symbol=VUAG, quantity=59.539562, unit_price=60.46,
date=2023-01-17, external_id starts with invest-engine:.
2026-04-17 21:49:52 +00:00
Viktor Barzin
b363032e42 sinks: feed /import/check enrichment into /import body
/import/check hydrates each ActivityImport with resolved assetId,
exchangeMic, quoteCcy, instrumentType, quoteMode. The /import endpoint
on Wealthfolio 3.2 does NOT re-resolve — passing an un-enriched row
returns 200 OK but silently drops the activity (activities=[] in the
response).

The first live run returned `imported=63 failed=0` but nothing reached
the database. Fixed by posting the hydrated rows from the check response
to /import instead of the original.

The tests now also return list-shaped check responses (matching the
upstream Json<Vec<ActivityImport>> signature on the Rust side).

poetry run pytest -q     70 passed
poetry run mypy          clean
poetry run ruff check    clean
2026-04-17 20:54:17 +00:00
Viktor Barzin
80ca009373 Match Wealthfolio accounts by providerAccountId, remap accountId on import
Context: Wealthfolio 3.2 generates its own UUIDs on POST /accounts, ignoring any
`id` we supply. Our logical Account.id lives on as `providerAccountId`, which
WF preserves verbatim.

Live run created six duplicate accounts because ensure_account looked up by
our `id`, never found it, and POSTed a new account on every attempt. Deleted
the duplicates manually via DELETE /accounts/{id}.

This change:
- ensure_account now returns Wealthfolio's UUID; matches existing via
  (provider, providerAccountId)
- pipeline remaps activity.account_id to the WF UUID at submission time
  but keeps dedup keyed on our stable id (WF resets must not blow away
  the whole dedup history)
- test updates to the new account-shape + dedup key expectations

poetry run pytest -q    70 passed
poetry run mypy         clean
poetry run ruff check   clean
2026-04-17 20:44:32 +00:00
Viktor Barzin
ea881e272b sinks: match Wealthfolio NewAccount camelCase schema + required booleans
Wealthfolio 3.2's POST /api/v1/accounts was 422ing on live traffic — its
NewAccount struct uses camelCase field names and requires isDefault +
isActive as booleans. Reference:
https://github.com/afadil/wealthfolio/blob/main/apps/server/src/models.rs#L~145

Sends trackingMode=TRANSACTIONS so Wealthfolio computes holdings from
our imported activities (vs HOLDINGS mode which requires periodic
holdings snapshots). Populates providerAccountId so the broker account
is traceable back to our sync's id scheme.

Test plan:
  poetry run pytest -q   →  70 passed
  poetry run mypy        →  clean
  poetry run ruff check  →  clean

Live re-run of the backfill Job follows this commit's image rebuild.
2026-04-17 20:29:43 +00:00
Viktor Barzin
66cf0e0399 Fix live Wealthfolio login + Dockerfile poetry path
Context
-------
Two live-integration bugs surfaced during the Phase 0.5 auth-spike
run against the restored production Wealthfolio.

1. Wealthfolio 3.2's LoginRequest schema is `{ password: String }` —
   it rejects any request carrying an unknown `username` field with HTTP
   400 (empty body, hard to debug). Upstream source:
   https://github.com/afadil/wealthfolio/blob/main/apps/server/src/auth.rs#L86-L88

2. Dockerfile referenced `/opt/poetry/bin/poetry` but pip install
   puts poetry on the normal PATH; POETRY_HOME only affects the
   self-installer, not `pip install`. Exit 127 in GHA build.

This change
-----------
- WealthfolioSink.login() sends `{password}` only; kept `username`
  constructor arg as a stub for the day Wealthfolio adds multi-user.
- Dockerfile drops POETRY_HOME and uses `poetry` on PATH.
- Test: `_login_ok` now asserts body == {"password": "hunter2"}
  ("hunter2" is the XKCD placeholder — not a real credential).

Test plan
---------
## Automated
- poetry run pytest -q  →  70 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 29 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification (executed live)
```
kubectl -n wealthfolio port-forward svc/wealthfolio 18080:80 &
WF_BASE_URL=http://localhost:18080 WF_USERNAME=admin \
WF_PASSWORD=<from-vault> \
poetry run broker-sync auth-spike
→ "Logged in. 1 account(s) visible."
```
2026-04-17 20:17:24 +00:00
Viktor Barzin
6fc2ac5322 Add sync pipeline + trading212 CLI subcommand
Context
-------
Closes the gap between "Trading212 provider yields Activities" and
"activities land in Wealthfolio with dedup". One generic pipeline
function works for every provider (Phase 2 IMAP ingest and Phase 3
CSV drop will reuse it).

This change
-----------
- `broker_sync/pipeline.py` — sync_provider_to_wealthfolio():
  ensure accounts exist in Wealthfolio, fetch, dedup against the local
  SQLite store, batch into Wealthfolio's CSV import at 200 rows each,
  record successful imports in the dedup store with the returned
  Wealthfolio activity id. Failed batches don't touch the dedup store
  so the next run retries.
- Notes field stamped with `sync:<provider>:<external_id>` for human
  auditability — NOT used for dedup (the SQLite store owns that).
- `broker_sync/cli.py` — new `trading212` subcommand driven by
  T212_API_KEYS_JSON + WF_* + BROKER_SYNC_DATA_DIR env vars. Two modes:
  `steady` fetches last 7 days; `backfill` pulls all history. Exits 0
  on clean run, 1 if any batch failed, 2 on config errors.
- Pipeline tests with MockTransport: dedup-skip-then-import happy path
  (verifies imported CSV contains only the unseen rows and all three
  are recorded after the run); import-rejected path (verifies the
  failed row is NOT recorded so the next run retries).
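
The 200-row batching step can be illustrated with a small helper. The name
`batched` is hypothetical and the real pipeline may slice differently; only
the batch size of 200 comes from this message.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

_BATCH_SIZE = 200  # rows per Wealthfolio CSV import, per the commit message


def batched(rows: Sequence[T], size: int = _BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]
```

In the pipeline, each yielded batch would be imported on its own and
recorded in the dedup store only if that import succeeds.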

Test plan
---------
## Automated
- poetry run pytest -q  →  70 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 29 source files
- poetry run ruff check .  →  All checks passed!
- poetry run broker-sync trading212 --help  →  shows all env vars + mode flag

## Manual Verification
Live smoke test blocked on:
1. Vault secret/broker-sync seeded (wf_base_url, wf_username, wf_password,
   trading212_api_keys).
2. Terraform stack applied (infra/stacks/broker-sync/ — staged, not yet applied).
3. Image pushed to viktorbarzin/broker-sync on DockerHub via GHA.

Once those land:
    kubectl -n broker-sync create job t212-backfill \
      --from=cronjob/broker-sync-trading212 -- \
      broker-sync trading212 --mode=backfill
2026-04-17 19:45:43 +00:00
Viktor Barzin
1eb3f78ea5 Wire T212 pagination, retries, and click<8.2 pin
Context
-------
Closes out the Trading212 provider's retry + pagination surface so
the "Add Trading212Provider core fetch" commit has everything the
CronJob needs: cursor-based pagination, 429 honouring Retry-After,
jittered exponential backoff for 429-without-header and 5xx, bailout
after _MAX_RETRIES, and checkpoint-after-page semantics so a crashed
run resumes at the start of the unfinished page.
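
A sketch of the delay calculation, assuming "full jitter" (a uniform draw
between zero and the exponential ceiling). The name `retry_delay`, the base
of 1s and the 60s cap are assumptions, and only the delta-seconds form of
Retry-After is handled here.

```python
import random
from typing import Callable, Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str],
    *,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honour the server's hint when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling
```

The caller would stop after _MAX_RETRIES attempts rather than sleeping
indefinitely.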

Also pins click<8.2: typer 0.12 calls Parameter.make_metavar()
without a ctx argument, which click 8.2 made mandatory; `broker-sync --help`
was crashing with TypeError until this pin. typer 0.15+ would also
fix it; the pin is lower friction.

One test fix: test_checkpoint_advances_only_after_page_yielded had a
handler that unconditionally returned a next_path → infinite loop. The
assertion was always about "a cursor was saved after page 1", so I
changed the handler to return page 2 as empty-with-no-next, which
terminates the loop cleanly.

Test plan
---------
## Automated
- poetry run pytest -q  →  70 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 29 source files
- poetry run ruff check .  →  All checks passed!
- poetry run broker-sync --help  →  renders without crash; lists version + auth-spike

## Manual Verification
End-to-end against a live T212 key is in the next commit once the
CLI subcommand and pipeline land.
2026-04-17 19:45:23 +00:00
Viktor Barzin
7d2c1199a9 Add Trading212Provider core fetch
Context
-------
The Provider protocol is satisfied. This commit adds the first cut of
the concrete Trading212 implementation: one page of fills, mapped to
canonical Activities. Pagination, retries, and checkpointing on
resume are deliberately deferred to the next commit so this one stays
focused on the raw shape translation.

Design decisions
----------------
- One provider instance serves every T212 wrapper (ISA + Invest). T212
  exposes one API key per wrapper, so the caller hands over a list of
  (Account, api_key) pairs. `accounts()` returns only the Accounts —
  the keys never escape the provider.
- Auth: literal `Authorization: <api_key>`, NOT `Bearer <api_key>`.
  T212 quietly returns 401 for Bearer-prefixed keys. The test locks
  that in.
- Sell detection: T212 signs quantity (negative means closing a long
  or opening a short). We flip on the sign and store `abs(quantity)`,
  matching the Wealthfolio BUY/SELL convention.
- Null fills (cancelled orders) are silently dropped at parse time
  rather than surfacing to the caller.
- `external_id = t212:fill:<fill.id>` — the fill ID is stable per
  T212 docs and survives order cancellation/modification semantics.
- Ticker normalisation runs on ingress so downstream dedup + Wealthfolio
  see `VUAG` even though T212 reports `VUAGl_EQ`.
- `since` / `before` filter on `filledAt`. `before` is half-open
  (`< before`) so CronJobs can chain adjacent windows without
  double-counting the boundary.
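
Two of the decisions above, the sign flip and the half-open window, in
sketch form. Both function names are hypothetical, and treating `since` as
inclusive is an assumption (the message only states that `before` is
exclusive).

```python
from datetime import datetime
from decimal import Decimal
from typing import Tuple


def side_and_quantity(signed_quantity: Decimal) -> Tuple[str, Decimal]:
    """Negative T212 quantity means SELL; the stored quantity is positive."""
    side = "SELL" if signed_quantity < 0 else "BUY"
    return side, abs(signed_quantity)


def in_window(filled_at: datetime, since: datetime, before: datetime) -> bool:
    """Half-open [since, before) so adjacent windows never overlap."""
    return since <= filled_at < before
```

With this shape, a fill stamped exactly on the boundary belongs to the later
window only.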

Explicitly NOT in this change:
- Pagination (nextPagePath walk)
- 429 / 5xx retry
- Dividend / deposit endpoints (deferred — Phase 1.1, filed as
  beads follow-up if needed)

This change
-----------
- broker_sync/providers/trading212.py: `Trading212Provider` class +
  `Trading212Error` / `Trading212AuthError` exception hierarchy.
  `_item_to_activity` is pure and returns Optional so cancelled
  fills short-circuit without raising.
- tests/providers/test_trading212.py: MockTransport-driven tests for
  auth header shape, fill→Activity mapping (buy + sell sign flip),
  null-fill skip, since-filter, and both error types.

Test plan
---------
## Automated
- poetry run pytest -q  →  61 passed in 0.60s
- poetry run mypy broker_sync tests  →  Success: no issues found in 27 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Deferred to the CLI wiring commit — the live endpoint is 6 calls/min
and the full-volume dry run belongs with the env-driven command, not
the unit-level commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:34:03 +00:00
Viktor Barzin
56f3624344 Add ECB FX fetcher + cache population
Context
-------
Phase 1 needs live FX rates for USD-denominated RSU vestings (Schwab),
EUR-denominated deposits, and any multi-currency dividend. The FxCache
from an earlier commit stores (currency, date) → rate_to_gbp but was
intentionally left empty — this commit wires the ingestion path.

Design decisions
----------------
- ECB publishes EUR→X, not X→GBP. Everything pivots through EUR:
      rate(X→GBP) = rate(EUR→GBP) / rate(EUR→X)
  GBP goes into the result at 1.0 so callers iterating the dict get
  a consistent shape; `populate_fx_cache` then skips GBP because
  `convert_to_gbp` has a dedicated passthrough branch.
- `on_date` parameter is accepted for API symmetry with the future
  historical fetcher even though the daily endpoint only serves the
  most recent publication. The docstring calls this out explicitly.
- XML is parsed with stdlib `xml.etree.ElementTree`. No `lxml` —
  the file is 30 lines, no performance concern, and keeping deps
  minimal matters for the container image.
- The HTTP layer takes an optional `httpx.AsyncBaseTransport` the same
  way WealthfolioSink does — MockTransport drives all tests, the
  production caller just leaves it None.
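
The pivot formula above, worked as a small function. This is a sketch, not
the body of `fetch_ecb_rates`: the name `pivot_to_gbp` is hypothetical, and
adding an explicit EUR entry to the result is an assumption.

```python
from decimal import Decimal
from typing import Dict


def pivot_to_gbp(eur_rates: Dict[str, Decimal]) -> Dict[str, Decimal]:
    """Turn ECB EUR->X rates into X->GBP rates by pivoting through EUR."""
    if "GBP" not in eur_rates:
        raise RuntimeError("ECB payload has no GBP rate; cannot pivot")
    eur_to_gbp = eur_rates["GBP"]
    # GBP is included at 1 so callers see a consistent shape.
    result: Dict[str, Decimal] = {"GBP": Decimal("1")}
    # Assumption: EUR itself is exposed too (EUR->EUR is implicitly 1).
    result["EUR"] = eur_to_gbp
    for ccy, eur_to_x in eur_rates.items():
        if ccy != "GBP":
            # rate(X->GBP) = rate(EUR->GBP) / rate(EUR->X)
            result[ccy] = eur_to_gbp / eur_to_x
    return result
```

For example, with EUR->GBP at 0.8 and EUR->USD at 1.25, one USD is worth
0.8 / 1.25 = 0.64 GBP.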

This change
-----------
- broker_sync/fx_ecb.py:
  * `fetch_ecb_rates(on_date, *, transport=None)` — GETs the daily
    XML, parses, pivots through EUR, returns `{ccy: rate_to_gbp}`.
    Raises `RuntimeError` on non-2xx or if GBP is missing (cannot
    pivot). No retries — caller handles resilience.
  * `populate_fx_cache(cache, rates, on_date)` — writes every
    non-GBP rate with `FxRateSource.ECB_LIVE`.
  * `fetch_ecb_rates_historical(start, end)` — `NotImplementedError`
    stub; filed as beads task code-thw.2.2. Needed for backfilling
    years of T212 history (daily endpoint only covers today).
- tests/fixtures/ecb_2026-04-01.xml: realistic 5-currency ECB snapshot.
- tests/test_fx_ecb.py: fixture-driven tests covering the pivot math,
  the 503 failure path, the cache-skip-GBP rule, and the NotImplemented
  guard on the historical stub.

Test plan
---------
## Automated
- poetry run pytest -q  →  52 passed in 0.50s
- poetry run mypy broker_sync tests  →  Success: no issues found in 26 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Live daily endpoint hit deferred to the CLI integration commit — the
fetcher is pure + transport-injectable, so the unit tests cover the
parsing and pivot logic, and the CLI wiring will be the place where
the live call is exercised end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:32:23 +00:00
Viktor Barzin
43d2251159 Add per-account cursor Checkpoint helper
Context
-------
Trading212's `/equity/history/orders` is cursor-paginated via the
`nextPagePath` field in each response. Steady-state runs must
resume where the previous run finished, or we either miss fills (if
we start from 'now') or waste the 6/min rate limit walking history
we already imported (if we start from epoch).

A shared checkpoint store must live alongside the SyncRecordStore's
dedup DB on the /data PVC so CronJob pods can see progress from the
previous invocation. One file per (provider, account_id) because:

- T212 issues one API key per wrapper — ISA + Invest share no data.
- Plain JSON files are trivial to hand-edit during backfill if a
  resume cursor gets stuck at a bad point.

This change
-----------
- broker_sync/providers/_checkpoint.py: `Checkpoint(dir, provider,
  account_id)` with `load() -> str | None` and `save(cursor)`. Writes
  `{cursor, updated_at}` to `<provider>-<account_id>.json`. Creates
  parent directory lazily on first save so the PVC only needs a
  mountpoint, not a pre-seeded layout.
- Provider-agnostic: no T212 knowledge. Will be reused for
  InvestEngine in Phase 2.
- tests/providers/test_checkpoint.py: roundtrip, filename shape,
  overwrite, per-account isolation, parent-dir creation, and a
  malformed-file fallback (returns None rather than raising) so a
  manual edit gone wrong does not brick the CronJob.
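
A condensed sketch of the helper described above. The constructor
arguments, file naming and `{cursor, updated_at}` payload follow this
message; everything else (error handling details, the UTC ISO timestamp) is
an assumption and may differ from `_checkpoint.py`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


class Checkpoint:
    def __init__(self, directory: Path, provider: str, account_id: str) -> None:
        self._path = Path(directory) / f"{provider}-{account_id}.json"

    def load(self) -> Optional[str]:
        """Return the saved cursor, or None if absent or malformed."""
        try:
            payload = json.loads(self._path.read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            return None
        cursor = payload.get("cursor") if isinstance(payload, dict) else None
        return cursor if isinstance(cursor, str) else None

    def save(self, cursor: str) -> None:
        # Create the parent lazily so the PVC only needs a mountpoint.
        self._path.parent.mkdir(parents=True, exist_ok=True)
        payload = {
            "cursor": cursor,
            "updated_at": datetime.now(timezone.utc).isoformat(),
        }
        self._path.write_text(json.dumps(payload), encoding="utf-8")
```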

Test plan
---------
## Automated
- poetry run pytest -q  →  48 passed in 0.47s
- poetry run mypy broker_sync tests  →  Success: no issues found in 24 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable — pure local-filesystem helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:30:20 +00:00
Viktor Barzin
73b03b227e Add Trading212 ticker normalisation
Context
-------
Phase 1 kickoff: Trading212 tags every ticker with `_EQ`, sometimes
preceded by a lowercase exchange letter ("l" = LSE) or `_US`. Raw
symbols like `VUAGl_EQ` are an implementation detail that would leak
into Wealthfolio and diverge from other providers (InvestEngine and
Schwab emit `VUAG` / `META`). The canonical form has to match across
providers so portfolio aggregation lines up.

Unlike the finance/ reference code, we do NOT restrict to a
SUPPORTED_TICKERS allowlist here — Wealthfolio is the source of truth,
everything gets imported, and the user decides what to track.

This change
-----------
- broker_sync/providers/trading212.py: pure `_normalise_ticker`
  helper backed by a single regex that peels `(_US)?[a-z]?_EQ`. No
  lookup tables — the rule covers all observed shapes.
- tests/providers/test_trading212_ticker.py: parametrised cases for
  every mapping called out in the Phase 1 plan plus pass-through of
  already-canonical symbols.
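
The suffix rule, sketched with the regex quoted above. Anchoring the
pattern at the end of the string and the public name `normalise_ticker` are
assumptions; the real helper is `_normalise_ticker`.

```python
import re

# Optional "_US", optional lowercase exchange letter, then "_EQ" at the end.
_T212_SUFFIX = re.compile(r"(_US)?[a-z]?_EQ$")


def normalise_ticker(raw: str) -> str:
    """Strip the Trading212 suffix; already-canonical symbols pass through."""
    return _T212_SUFFIX.sub("", raw)
```

Under this sketch `VUAGl_EQ` becomes `VUAG`, `META_US_EQ` becomes `META`,
and `VUAG` is returned unchanged.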

Test plan
---------
## Automated
- poetry run pytest -q  →  41 passed in 0.46s
- poetry run mypy broker_sync tests  →  Success: no issues found in 22 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable — pure function, no external side effects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:29:23 +00:00
Viktor Barzin
0eb6feefa8 Add typer CLI + production Dockerfile
Context
-------
Closes Phase 0 scaffolding. Image must build and run so infra can
schedule an initial no-op CronJob (the plan's Phase 0 exit criterion)
while Phase 0.5 / 0.75 / 1 land.

This change
-----------
- broker_sync/cli.py: typer app with two commands.
  * `version` — prints __version__; used as the no-op CronJob
    liveness check.
  * `auth-spike` — Phase 0.5 end-to-end live probe: log in to
    Wealthfolio, list accounts, exit 0 on success. Credentials read
    from env (WF_BASE_URL/USERNAME/PASSWORD) so CronJob + ESO can
    inject them without CLI flags.
- Dockerfile: multi-stage, Python 3.12-slim, non-root user 10001
  with /data as the shared PVC mount. Poetry virtualenv baked into
  /app/.venv, entrypoint is `broker-sync`, default command `version`.
- CLI test via typer.testing.CliRunner.

Test plan
---------
## Automated
- poetry run pytest -q  →  32 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 19 source files
- poetry run ruff check .  →  All checks passed!
- poetry run broker-sync version  →  broker-sync 0.1.0

## Manual Verification
Docker build + run deferred — image will be built via GHA after the
repo is pushed to GitHub in a follow-up session; the pyproject install
has already been verified locally.
2026-04-17 19:23:54 +00:00
Viktor Barzin
e7da408a85 Add WealthfolioSink with CSV import + cookie reuse
Context
-------
This is the Phase 0.5 deliverable — the hardest-to-validate unknown
in the plan. Wealthfolio auth is JWT HttpOnly cookie with a 5-req/min
login rate limit. CronJob pods are ephemeral, so we persist cookies
to disk between runs (shared PVC in production).

Plan stress-test also flagged: use the CSV import path, not per-row
JSON POST. Wealthfolio's UI uses /activities/import and its dedup
logic is battle-tested; CSVs double as audit artefacts we can replay.

This change
-----------
- WealthfolioSink (httpx async): login with username/password, persists
  cookie dict to session_path on disk, attaches it as a Cookie header
  on subsequent calls.
- 401 on a non-login endpoint triggers a single re-login + retry.
- ensure_account() is idempotent — GETs the account list first, only
  POSTs /accounts if id is missing.
- import_activities() always runs /activities/import/check first; any
  non-2xx there raises ImportValidationError and we never touch the
  real import endpoint. Protects against half-written state when the
  broker emits a symbol Wealthfolio doesn't know.
- httpx.MockTransport-based tests cover: login persistence, 401 on
  login raises UnauthorizedError, session reuse from disk, 401 retry
  path, ensure_account idempotency + creation, import dry-run-then-real
  sequencing, halt on check failure.

Not yet covered (deferred):
- Multi-process file lock on session_path (single-process enough for
  now; Phase 1 adds it when multiple CronJobs run concurrently).
- 429 jittered backoff (TBD when Wealthfolio actually rate-limits us).

Test plan
---------
## Automated
- poetry run pytest -q  →  31 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 17 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Live auth spike against https://wealthfolio.viktorbarzin.me deferred
until the password is seeded into Vault at secret/broker-sync/wealthfolio
in a follow-up commit (needs Viktor's Vault session).
2026-04-17 19:22:34 +00:00
Viktor Barzin
f306dc9605 Add Provider protocol and normaliser
Context
-------
Every broker connector needs a uniform shape so the orchestrator can
fan out without knowing provider-specific details. Normalisation (GBP
conversion) lives outside providers on purpose — keeping providers
native-currency-emitters means we can re-normalise historical activity
when HMRC rates land without re-fetching from the broker.

This change
-----------
- providers/base.py: Provider Protocol with `accounts()` and async
  `fetch(since, before)` iterator. No abstract base class — duck-typed
  Protocol so each concrete provider stays independent.
- normaliser.py: takes a native Activity + FxCache, returns a copy
  with amount_gbp/fx_rate_gbp/fx_rate_source filled in. Two modes:
  qty*price for BUY/SELL, amount for DIVIDEND/DEPOSIT/etc.
- Namespace packages for providers/, providers/parsers/, sinks/ so
  future modules slot in cleanly.

Test plan
---------
## Automated
- poetry run pytest -q  →  23 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 14 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable at this layer.
2026-04-17 19:20:12 +00:00
Viktor Barzin
33810899c9 Add FxCache and convert_to_gbp core
Context
-------
FX for UK users has two lives: live ECB rates for portfolio display
(available same-day), and HMRC monthly/daily rates for CGT basis
(published after month-end). The plan keeps both in one cache table
with an upgradable `source` column, so a later reconciliation job can
replace ECB_LIVE values with HMRC_MONTHLY for the same date without
schema work.

This change
-----------
- FxCache: SQLite table (currency, on_date) -> (rate_gbp, source) with
  ON CONFLICT UPDATE semantics so reconciliation is a single put().
- convert_to_gbp(): GBP short-circuits to identity; any other currency
  must be in the cache (network fetch is the caller's responsibility,
  separately implemented by the ECB and HMRC fetchers).
- Explicit LookupError on cache miss — deliberate, we do NOT want a
  silent fallback that produces wrong cost-basis numbers.
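
A rough sketch of the cache and the conversion rule, assuming SQLite 3.24+
for the upsert syntax. The table name, column types, method signatures and
the `convert_to_gbp` argument order are assumptions; the key, the upsert
semantics and the LookupError on a miss follow this message.

```python
import sqlite3
from datetime import date
from decimal import Decimal


class FxCache:
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS fx_rates ("
            " currency TEXT NOT NULL, on_date TEXT NOT NULL,"
            " rate_gbp TEXT NOT NULL, source TEXT NOT NULL,"
            " PRIMARY KEY (currency, on_date))"
        )

    def put(self, currency: str, on_date: date, rate_gbp: Decimal, source: str) -> None:
        # A later, better source overwrites the earlier row for the same key.
        self._db.execute(
            "INSERT INTO fx_rates (currency, on_date, rate_gbp, source)"
            " VALUES (?, ?, ?, ?)"
            " ON CONFLICT(currency, on_date) DO UPDATE SET"
            " rate_gbp = excluded.rate_gbp, source = excluded.source",
            (currency, on_date.isoformat(), str(rate_gbp), source),
        )
        self._db.commit()

    def get(self, currency: str, on_date: date) -> Decimal:
        row = self._db.execute(
            "SELECT rate_gbp FROM fx_rates WHERE currency = ? AND on_date = ?",
            (currency, on_date.isoformat()),
        ).fetchone()
        if row is None:
            raise LookupError(f"no FX rate cached for {currency} on {on_date}")
        return Decimal(row[0])


def convert_to_gbp(amount: Decimal, currency: str, on_date: date, cache: FxCache) -> Decimal:
    if currency == "GBP":
        return amount  # identity: no cache lookup needed
    return amount * cache.get(currency, on_date)
```

Rates are stored as text here so that Decimal values round-trip exactly.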

Decisions deferred to later commits:
- Actual ECB daily reference-rate fetcher (eurofxref XML) — lands with
  the Trading212 provider in Phase 1 when non-GBP trades first appear.
- HMRC monthly-rate fetcher + reconciliation CronJob — Phase 1 tail.

Test plan
---------
## Automated
- poetry run pytest tests/test_fx.py -v  →  6 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 8 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable — no network yet.
2026-04-17 19:18:41 +00:00
Viktor Barzin
a66ef189f6 Add SyncRecordStore for authoritative dedup
Context
-------
Wealthfolio's activity `notes` field is user-editable via the UI, so
using it as the dedup key would let a single note-edit in Wealthfolio
cause the next sync to create a duplicate. Stress-testing the plan
flagged this as the top structural risk.

This change
-----------
- SQLite-backed store at `/data/broker_sync.db` in production; keyed on
  (provider, account, external_id) so each provider's id space is
  scoped to its own account.
- `INSERT OR IGNORE` makes record() idempotent — second call with the
  same key is a no-op and preserves the original wealthfolio_activity_id
  plus first_seen timestamp.
- `filter_new()` is the integration point: provider fetches activities,
  hands them to the store, gets back only the unseen subset to submit
  to the Wealthfolio sink.
- Wealthfolio activity id returned by the API is persisted alongside
  each record so the HMRC FX reconciliation job can later PATCH the
  original activity rather than creating a new one.
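
The idempotent-record behaviour can be sketched as below. The table layout
and method signatures are assumptions (the real store takes Activity
objects); the composite key and `INSERT OR IGNORE` come from this message.

```python
import sqlite3
from typing import List, Optional


class SyncRecordStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sync_records ("
            " provider TEXT NOT NULL, account TEXT NOT NULL,"
            " external_id TEXT NOT NULL, wealthfolio_activity_id TEXT,"
            " first_seen TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
            " PRIMARY KEY (provider, account, external_id))"
        )

    def record(self, provider: str, account: str, external_id: str,
               wealthfolio_activity_id: Optional[str]) -> None:
        # A second call with the same key is a no-op: the first row wins.
        self._db.execute(
            "INSERT OR IGNORE INTO sync_records"
            " (provider, account, external_id, wealthfolio_activity_id)"
            " VALUES (?, ?, ?, ?)",
            (provider, account, external_id, wealthfolio_activity_id),
        )
        self._db.commit()

    def seen(self, provider: str, account: str, external_id: str) -> bool:
        row = self._db.execute(
            "SELECT 1 FROM sync_records"
            " WHERE provider = ? AND account = ? AND external_id = ?",
            (provider, account, external_id),
        ).fetchone()
        return row is not None

    def filter_new(self, provider: str, account: str,
                   external_ids: List[str]) -> List[str]:
        return [e for e in external_ids if not self.seen(provider, account, e)]
```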

Test plan
---------
## Automated
- poetry run pytest tests/test_dedup.py -v  →  6 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 6 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable for this layer — full end-to-end verification happens
once a provider + sink land (Phase 1 Trading212 and the auth spike).
2026-04-17 19:17:12 +00:00
Viktor Barzin
a2aa7ec486 Initial scaffold + canonical Activity model
Context
-------
New connector suite that syncs UK brokerage activity (Trading212,
InvestEngine, Schwab email-parsed, CSV drop-folder) into Wealthfolio.
Lives outside finance/ intentionally — finance/ is untouched per the
plan at ~/.claude/plans/let-s-work-on-linking-temporal-valiant.md.

This change
-----------
- Poetry project with httpx, typer, bs4, dev tools (pytest, mypy strict,
  ruff, yapf).
- Canonical Activity + Account models with the 6 UK tax wrappers
  (ISA/SIPP/GIA/LISA/JISA/WORKPLACE_PENSION) and the 12 Wealthfolio
  activity types from docs/activities/activity-types.md on the upstream.
- Validation invariants: BUY/SELL need qty+price, DIVIDEND/DEPOSIT/etc
  need amount — raises early so providers can't silently emit broken
  rows.
- to_wealthfolio_csv_row() shape matches Wealthfolio's CSV import;
  primary sink path per the plan.
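
The validation invariants in the list above might be enforced along these
lines. This is a reduced sketch: the real Activity model has many more
fields, the field names here are assumptions, and only the two cash types
named in this message are checked.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

_TRADE_TYPES = frozenset({"BUY", "SELL"})
# Assumption: only the cash types named in the message are listed here.
_CASH_TYPES = frozenset({"DIVIDEND", "DEPOSIT"})


@dataclass(frozen=True)
class Activity:
    activity_type: str
    quantity: Optional[Decimal] = None
    unit_price: Optional[Decimal] = None
    amount: Optional[Decimal] = None

    def __post_init__(self) -> None:
        # Fail at construction time so a provider cannot emit a broken row.
        if self.activity_type in _TRADE_TYPES:
            if self.quantity is None or self.unit_price is None:
                raise ValueError(f"{self.activity_type} needs quantity and unit_price")
        elif self.activity_type in _CASH_TYPES and self.amount is None:
            raise ValueError(f"{self.activity_type} needs amount")
```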

Test plan
---------
## Automated
- poetry run pytest -q  →  7 passed in 0.03s
- poetry run mypy broker_sync tests  →  Success: no issues found in 4 source files
- poetry run ruff check .  →  All checks passed!
- poetry run yapf --diff --recursive broker_sync tests  →  no diff

## Manual Verification
Not applicable — pure data model, no runtime behaviour.

Closes: code-thw.1
2026-04-17 19:16:11 +00:00