Commit graph

12 commits

Author SHA1 Message Date
Viktor Barzin
f089b8b93a Add Schwab email parser (port from finance/)
Schwab's workplace-RSU confirmation emails have 5 data td elements
with class='dark-background-body' align='right': date, direction, qty,
ticker, price-with-currency-sign. One email → one Activity.

- parse_schwab_email(raw_html) -> list[Activity] (1-item or empty)
- Empty on any parse failure (IMAP batch shouldn't crash on one bad mail)
- Deterministic external_id ('schwab📅ticker:type:qty') — stable
  across re-pulls so dedup works
- Hardcoded to account 'schwab-workplace' / AccountType.GIA / USD
- 6 unit tests: SELL + BUY happy path, malformed, missing cells,
  external-id stability, commas in price

Dropped from the original finance port:
- msg_timestamp-based external id (non-deterministic — would re-import
  on every IMAP walk). Replaced with a hash-stable key.
- Currency.from_sign() currency hack. Schwab US is USD-only; we'll add
  FX when that changes.

poetry run pytest -q   →  109 passed, 1 skipped
poetry run mypy        →  clean (added types-python-dateutil)
poetry run ruff check  →  clean
2026-04-17 22:08:40 +00:00
Viktor Barzin
1aa60ce348 Merge ie-email-parser: HTML + CSV fallbacks + failure-mode tests
# Conflicts:
#	broker_sync/providers/parsers/invest_engine.py
#	tests/providers/parsers/test_invest_engine.py
2026-04-17 22:06:29 +00:00
Viktor Barzin
87526898e6 Pin InvestEngine parser failure modes — empty-on-junk + partial-match
Context: The port's graceful-failure contract was implicit in the way
each strategy returns None/[] on malformed input, but without tests it
was an accidental property that could regress silently. Codify it.

Two invariants, each backed by a fixture:

1. Junk email → empty list, never raise.
   `unparseable.eml` is a pure-marketing IE newsletter with no order
   data. All three strategies try and fail; parse_invest_engine_email
   returns []. No exception leaks.

2. Partial HTML email → intact orders only.
   `html_partial_match.eml` has two nested summary tables: one with a
   valid VUAG order, one that is missing both the ticker and "Bought N
   @ £P" rows (simulates IE dropping content mid-render). The parser
   returns just the VUAG order.

No implementation change needed — the behaviour existed as a side
effect of _try_html_summary_table returning None on missing fields.
These tests lock it down so future refactors can't quietly break it.

Test plan:
  poetry run pytest tests/providers/parsers/ -q   →  8 passed in 0.19s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py   →  clean
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py   →  All checks passed!
  poetry run yapf --diff   →  clean (no diff)

Manual verification:
- Load unparseable.eml → parse returns [].
- Load html_partial_match.eml → parse returns exactly 1 activity (VUAG).
2026-04-17 22:02:48 +00:00
Viktor Barzin
020ba16723 Add CSV attachment fallback for InvestEngine email parser
Context: IE has not (yet) sent CSV-attached statements in production,
but the upstream parser had _extract_positions_csv as a third fallback
for exactly this case. Keeping the fallback preserves behaviour-parity
with the legacy parser and makes future statement support one fixture
away — the shape is documented by column set, not scraped live.

Unlike the upstream which split the body on whitespace and broke on any
embedded commas in names, this port walks real MIME attachments using
Python's csv.DictReader. A part qualifies as CSV if:
- its Content-Type is text/csv / application/csv / application/vnd.ms-excel, OR
- its filename ends in .csv (defence against IE mis-labelling the part)

Rows missing required columns or containing unparseable numbers/dates
are skipped silently — consistent with the "partial match" contract:
a half-corrupt CSV yields whatever rows were intact. Required columns:
ticker, unit_price, quantity, date (YYYY-MM-DD), currency. Non-GBP
rows are filtered because the IE ISA is strictly sterling — flagging
this assumption in the review notes.

This change:
- Adds `_parse_csv_attachment(raw_email)` as the third strategy after
  text/plain and text/html; it re-parses the raw email bytes so we can
  inspect Content-Type/filename on each part.
- Flags symbols/currencies, filters non-GBP, and runs each row through
  the shared `_build_activity` so external_id formation matches every
  other strategy (dedup stays consistent across strategies).
- Fixture `csv_attachment.eml` has three rows (VUAG, SWDA, VUSA) in a
  `text/csv` part with a `.csv` filename — covers both detection paths.

Test plan:
  poetry run pytest tests/providers/parsers/ -q   →  6 passed in 0.15s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py  →  clean
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py  →  All checks passed!
  poetry run yapf --diff  →  clean (no diff)

Manual verification: load csv_attachment.eml, call parse_invest_engine_email,
assert 3 activities each with symbol in {VUAG,SWDA,VUSA}, currency=GBP,
notes containing "csv".
2026-04-17 22:01:46 +00:00
Viktor Barzin
72d348e294 Add HTML table fallback for InvestEngine email parser
Context: Plain-text IE emails vanished around 2024-Q2 when IE switched to
an HTML-only template with per-order nested summary tables. The RFC 2822
line parser returns [] on those modern emails, so we need a fallback
that walks the HTML table structure.

Upstream _extract_from_html parsed a fixed DOM path (table[1].tr[10].
table) and only handled ONE order per email. The real IE HTML template
nests one summary <table> per ticker inside the second top-level table —
multiple orders in a single batched confirmation are common — so this
port walks every leaf table (no child <table>) and interprets each one
as an independent trade summary. Structural (non-leaf) tables are
skipped to avoid double-counting via get_text().

This change:
- `_parse_html_tables(body)` extracts the date once from the full text
  then walks leaf tables looking for "Bought N @ £P" rows.
- `_try_html_summary_table` parses one leaf; returns None on structural
  tables or missing ticker/qty/price — so a partial email yields only
  its intact orders (the "2 orders, 1 parseable → 1 returned" invariant
  works by construction without raising).
- `parse_invest_engine_email` now falls through text/plain → text/html
  in the multipart message, picking the first strategy that returns
  activities. Order matters: text/plain wins when both succeed because
  the RFC 2822 strategy is the more constrained grammar.
- Regexes are module-level constants so they compile once per process.

Fixture `html_two_orders.eml` is a minimal-but-realistic multipart email
with two nested summary tables (VUAG + SWDA), no personal data beyond
tickers/qty/price.

Test plan:
  poetry run pytest tests/providers/parsers/ -q
  → 5 passed in 0.16s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!
  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load html_two_orders.eml, call parse_invest_engine_email,
assert len == 2 with both expected tickers (VUAG, SWDA) and numbers,
dates set to 2026-04-01.
2026-04-17 21:58:15 +00:00
Viktor Barzin
9ec8ece2d9 Add InvestEngine email parser — RFC 2822 v1/v2 line format
Context: The old finance/ app had a 324-line IE message parser with four
line-based variants (v1/v2/v3/v4) plus an HTML strategy and a CSV
fallback. Port into broker-sync so we can consume IE trade confirmation
emails as a backup to the live HTTP client (Phase 2b) while IE's public
API remains Bearer-only.

The upstream parser emits storage.model.Position; we emit canonical
Activity with the broker-sync invariants: account_id="invest-engine-primary"
(sink remaps to Wealthfolio UUID), account_type=ISA, currency=GBP, and
external_id="invest-engine:<fingerprint>" where the fingerprint is a
SHA-256 of (date|symbol|quantity|unit_price) — deterministic so repeat
imports of the same email dedup at the sync-record layer.

This change:
- Top-level `parse_invest_engine_email(raw_email: bytes) -> list[Activity]`
  extracts the text/plain body from an RFC 2822 message and dispatches to
  the line-based parser.
- `_parse_rfc2822_lines(body)` tries the v2 layout first (newer IE format
  where `Date: DD Month` is on line 2 and the year on line 3), then the
  v1 layout (where the day alone is on line 2 and `Month YYYY` on line 3).
  v3 and v4 variants are re-added in a follow-up if we find fixtures
  where they matter — initial fixture coverage hits v2.
- Drops the upstream `_ticker_post_processing` VUAG→VUAG.L hack.
  Wealthfolio's /import/check endpoint resolves exchange suffixes; the
  Trading212 provider also emits suffix-free tickers (e.g. `VUAG`), so
  staying consistent avoids double-mapping.
- Notes field records the parse-strategy tag ("rfc2822-v2") plus the
  matched line for debugging.

Test plan:
  poetry run pytest tests/providers/parsers/ -q
  → 3 passed in 0.03s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!
  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load the fixture email, call the parser, inspect the
returned Activity has symbol=VUAG, quantity=59.539562, unit_price=60.46,
date=2023-01-17, external_id starts with invest-engine:.
2026-04-17 21:55:01 +00:00
Viktor Barzin
dc4d3f889d Add InvestEngineProvider — Bearer-token HTTP client
Context: InvestEngine has no public API. The web app uses an undocumented
Django REST backend at /api/v0.3X/*, which requires a Bearer token and
rolls its minor every 4-6 weeks. MFA (push-approval) is mandatory on
every login, so we do NOT automate login — Viktor logs in manually in
a browser, copies the Bearer out of devtools, and pastes it into Vault.
This provider consumes that token.

The response shape is UNVERIFIED (MFA blocks an unauthed probe, so the
research leading into Phase 2b could only confirm endpoint existence via
401 responses on v0.31 and v0.32). `_transaction_to_activity` is written
defensively:

 - accepts both `results`/`data` list wrappers and `next`/`meta.next_page`
   cursor fields for pagination;
 - accepts `symbol`/`ticker`, `price`/`unit_price`, `amount`/`value`,
   `date`/`created_at`/`timestamp` field-name variants;
 - maps exact type strings (BUY, SELL, DIVIDEND, INTEREST, DEPOSIT,
   WITHDRAWAL, FEE, TAX) and substring-matches DEPOSIT/WITHDRAWAL for
   variants like "CASH_DEPOSIT"; refuses to guess on anything else —
   unknown types log WARNING and return None (silent misclassification
   would corrupt tax reporting).

Version probe:

  _START_VERSION_MINOR=32 (research: v0.31/v0.32 live, v0.30 Gone)
  GET /api/v0.{n}/ → 410 ? advance : done
  cap at v0.60 so a misconfigured backend doesn't infinite-loop.

A 410 response on a data endpoint triggers exactly one re-probe + retry
against the newer version; the new version is cached on the instance for
the rest of the process.

Token expiry is tracked at the Python layer:

 - constructor takes token_expires_at (set by Viktor when he pastes);
 - fetch() fails fast with InvestEngineTokenExpiredError if the clock
   says the token is already dead — cheaper than burning a request for
   a known 401;
 - a real 401 response also raises InvestEngineTokenExpiredError so the
   CLI/pipeline can alert Viktor to paste a new token.

Vault schema expected (consumed by the CLI in the follow-up commit):

  secret/broker-sync
    investengine_bearer_token       <devtools-captured Bearer>
    investengine_token_expires_at   <ISO-8601 set at paste time>
    investengine_refresh_token      <optional, not used yet>

This module does NOT read Vault — the caller hands values in, keeping
the provider testable.

This change:
 - New `broker_sync/providers/invest_engine.py`:
   * InvestEngineProvider with .accounts(), .fetch(), .close()
   * _probe_version / _active_version with 410-retry + cache
   * _transaction_to_activity with defensive type + field-name mapping
   * InvestEngineError / InvestEngineTokenExpiredError / InvestEngineVersionError
 - New `tests/providers/test_invest_engine.py`: 22 tests covering version
   probe, expiry fail-fast, 401→TokenExpired, 410→reprobe, header
   shape, pagination variants, and the full txn→activity mapping. One
   @pytest.mark.skip integration stub for when Viktor has a live token.

Assumptions flagged for verification with a live token:
 - IE id field is castable to str (int or string)
 - Type strings match or fuzz-contain: BUY, SELL, DIVIDEND, INTEREST,
   DEPOSIT, WITHDRAWAL, FEE, TAX
 - Transactions carry numeric quantity/price/amount (Decimal-convertible)
 - Date field is one of: date / created_at / timestamp
 - Pagination shape is {results, next} OR {data, meta.next_page}
 - /transactions/ accepts ?portfolio=<id>&start=YYYY-MM-DD&end=YYYY-MM-DD

## Automated

poetry run pytest tests/providers/test_invest_engine.py -v
======================== 22 passed, 1 skipped in 0.26s =========================

poetry run pytest -q
95 passed, 1 skipped in 0.84s

poetry run mypy --strict .
Success: no issues found in 34 source files

poetry run ruff check .
All checks passed!

poetry run yapf --diff broker_sync/providers/invest_engine.py tests/providers/test_invest_engine.py
(clean)

## Manual Verification

Once Viktor pastes a live token:

  1. Export:
     export IE_BEARER_TOKEN='<paste>'
     export IE_TOKEN_EXPIRES_AT='2026-05-17T00:00:00+00:00'
  2. Unmark the @pytest.mark.skip on test_live_integration_smoke
  3. poetry run pytest tests/providers/test_invest_engine.py::test_live_integration_smoke -v
     Expected: a successful round-trip that returns an empty-or-populated
     list of Activity objects — prove the version probe + auth header +
     portfolio enumeration actually work against the real IE backend.
  4. Validate the Assumptions list above against the real transaction JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:52:26 +00:00
Viktor Barzin
ea15b80111 Add InvestEngine email parser — RFC 2822 v1/v2 line format
Context: The old finance/ app had a 324-line IE message parser with four
line-based variants (v1/v2/v3/v4) plus an HTML strategy and a CSV
fallback. Port into broker-sync so we can consume IE trade confirmation
emails as a backup to the live HTTP client (Phase 2b) while IE's public
API remains Bearer-only.

The upstream parser emits storage.model.Position; we emit canonical
Activity with the broker-sync invariants: account_id="invest-engine-primary"
(sink remaps to Wealthfolio UUID), account_type=ISA, currency=GBP, and
external_id="invest-engine:<fingerprint>" where the fingerprint is a
SHA-256 of (date|symbol|quantity|unit_price) — deterministic so repeat
imports of the same email dedup at the sync-record layer.

This change:
- Top-level `parse_invest_engine_email(raw_email: bytes) -> list[Activity]`
  extracts the text/plain body from an RFC 2822 message and dispatches to
  the line-based parser.
- `_parse_rfc2822_lines(body)` tries the v2 layout first (newer IE format
  where `Date: DD Month` is on line 2 and the year on line 3), then the
  v1 layout (where the day alone is on line 2 and `Month YYYY` on line 3).
  v3 and v4 variants are re-added in a follow-up if we find fixtures
  where they matter — initial fixture coverage hits v2.
- Drops the upstream `_ticker_post_processing` VUAG→VUAG.L hack.
  Wealthfolio's /import/check endpoint resolves exchange suffixes; the
  Trading212 provider also emits suffix-free tickers (e.g. `VUAG`), so
  staying consistent avoids double-mapping.
- Notes field records the parse-strategy tag ("rfc2822-v2") plus the
  matched line for debugging.

Test plan:
  poetry run pytest tests/providers/parsers/ -q
  → 3 passed in 0.03s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!
  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load the fixture email, call the parser, inspect the
returned Activity has symbol=VUAG, quantity=59.539562, unit_price=60.46,
date=2023-01-17, external_id starts with invest-engine:.
2026-04-17 21:49:52 +00:00
Viktor Barzin
1eb3f78ea5 Wire T212 pagination, retries, and click<8.2 pin
Context
-------
Closes out the Trading212 provider's retry + pagination surface so
the "Add Trading212Provider core fetch" commit has everything the
CronJob needs: cursor-based pagination, 429 honouring Retry-After,
jittered exponential backoff for 429-without-header and 5xx, bailout
after _MAX_RETRIES, and checkpoint-after-page semantics so a crashed
run resumes at the start of the unfinished page.

Also pins click<8.2 — typer 0.12 calls Parameter.make_metavar()
without a ctx argument, which click 8.2 removed; `broker-sync --help`
was crashing with TypeError until this pin. typer 0.15+ would also
fix it; the pin is lower friction.

One test fix: test_checkpoint_advances_only_after_page_yielded had a
handler that unconditionally returned a next_path → infinite loop. The
assertion was always about "a cursor was saved after page 1", so I
changed the handler to return page 2 as empty-with-no-next, which
terminates the loop cleanly.

Test plan
---------
## Automated
- poetry run pytest -q  →  70 passed
- poetry run mypy broker_sync tests  →  Success: no issues found in 29 source files
- poetry run ruff check .  →  All checks passed!
- poetry run broker-sync --help  →  renders without crash; lists version + auth-spike

## Manual Verification
End-to-end against a live T212 key is in the next commit once the
CLI subcommand and pipeline land.
2026-04-17 19:45:23 +00:00
Viktor Barzin
7d2c1199a9 Add Trading212Provider core fetch
Context
-------
The Provider protocol is satisfied. This commit adds the first cut of
the concrete Trading212 implementation: one page of fills, mapped to
canonical Activities. Pagination, retries, and checkpointing on
resume are deliberately deferred to the next commit so this one stays
focused on the raw shape translation.

Design decisions
----------------
- One provider instance serves every T212 wrapper (ISA + Invest). T212
  exposes one API key per wrapper, so the caller hands over a list of
  (Account, api_key) pairs. `accounts()` returns only the Accounts —
  the keys never escape the provider.
- Auth: literal `Authorization: <api_key>`, NOT `Bearer <api_key>`.
  T212 quietly returns 401 for Bearer-prefixed keys. The test locks
  that in.
- Sell detection: T212 signs quantity (negative means closing a long
  or opening a short). We flip on the sign and store `abs(quantity)`,
  matching the Wealthfolio BUY/SELL convention.
- Null fills (cancelled orders) are silently dropped at parse time
  rather than surfacing to the caller.
- `external_id = t212:fill:<fill.id>` — the fill ID is stable per
  T212 docs and survives order cancellation/modification semantics.
- Ticker normalisation runs on ingress so downstream dedup + Wealthfolio
  see `VUAG` even though T212 reports `VUAGl_EQ`.
- `since` / `before` filter on `filledAt`. `before` is half-open
  (`< before`) so CronJobs can chain adjacent windows without
  double-counting the boundary.

Explicitly NOT in this change:
- Pagination (nextPagePath walk)
- 429 / 5xx retry
- Dividend / deposit endpoints (deferred — Phase 1.1, filed as
  beads follow-up if needed)

This change
-----------
- broker_sync/providers/trading212.py: `Trading212Provider` class +
  `Trading212Error` / `Trading212AuthError` exception hierarchy.
  `_item_to_activity` is pure and returns Optional so cancelled
  fills short-circuit without raising.
- tests/providers/test_trading212.py: MockTransport-driven tests for
  auth header shape, fill→Activity mapping (buy + sell sign flip),
  null-fill skip, since-filter, and both error types.

Test plan
---------
## Automated
- poetry run pytest -q  →  61 passed in 0.60s
- poetry run mypy broker_sync tests  →  Success: no issues found in 27 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Deferred to the CLI wiring commit — the live endpoint is 6 calls/min
and the full-volume dry run belongs with the env-driven command, not
the unit-level commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:34:03 +00:00
Viktor Barzin
43d2251159 Add per-account cursor Checkpoint helper
Context
-------
Trading212's `/equity/history/orders` is cursor-paginated via a
`nextPagePath` query-param in each response. Steady-state runs must
resume where the previous run finished, or we either miss fills (if
we start from 'now') or waste the 6/min rate limit walking history
we already imported (if we start from epoch).

A shared checkpoint store must live alongside the SyncRecordStore's
dedup DB on the /data PVC so CronJob pods can see progress from the
previous invocation. One file per (provider, account_id) because:

- T212 issues one API key per wrapper — ISA + Invest share no data.
- Plain JSON files are trivial to hand-edit during backfill if a
  resume cursor gets stuck at a bad point.

This change
-----------
- broker_sync/providers/_checkpoint.py: `Checkpoint(dir, provider,
  account_id)` with `load() -> str | None` and `save(cursor)`. Writes
  `{cursor, updated_at}` to `<provider>-<account_id>.json`. Creates
  parent directory lazily on first save so the PVC only needs a
  mountpoint, not a pre-seeded layout.
- Provider-agnostic: no T212 knowledge. Will be reused for
  InvestEngine in Phase 2.
- tests/providers/test_checkpoint.py: roundtrip, filename shape,
  overwrite, per-account isolation, parent-dir creation, and a
  malformed-file fallback (returns None rather than raising) so a
  manual edit gone wrong does not brick the CronJob.

Test plan
---------
## Automated
- poetry run pytest -q  →  48 passed in 0.47s
- poetry run mypy broker_sync tests  →  Success: no issues found in 24 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable — pure local-filesystem helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:30:20 +00:00
Viktor Barzin
73b03b227e Add Trading212 ticker normalisation
Context
-------
Phase 1 kickoff: Trading212 tags every ticker with `_EQ`, sometimes
preceded by a lowercase exchange letter ("l" = LSE) or `_US`. Raw
symbols like `VUAGl_EQ` are an implementation detail that would leak
into Wealthfolio and diverge from other providers (InvestEngine and
Schwab emit `VUAG` / `META`). The canonical form has to match across
providers so portfolio aggregation lines up.

Unlike the finance/ reference code, we do NOT restrict to a
SUPPORTED_TICKERS allowlist here — Wealthfolio is the source of truth,
everything gets imported, and the user decides what to track.

This change
-----------
- broker_sync/providers/trading212.py: pure `_normalise_ticker`
  helper backed by a single regex that peels `(_US)?[a-z]?_EQ`. No
  lookup tables — the rule covers all observed shapes.
- tests/providers/test_trading212_ticker.py: parametrised cases for
  every mapping called out in the Phase 1 plan plus pass-through of
  already-canonical symbols.

Test plan
---------
## Automated
- poetry run pytest -q  →  41 passed in 0.46s
- poetry run mypy broker_sync tests  →  Success: no issues found in 22 source files
- poetry run ruff check .  →  All checks passed!

## Manual Verification
Not applicable — pure function, no external side effects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 19:29:23 +00:00