Extends parse_schwab_email to handle Schwab's RSU Release Confirmation
emails alongside the existing trade confirmations. Adds:
- `VestEvent` dataclass in models.py — carries vest_date, ticker,
shares_vested, shares_sold_to_cover, fmv_at_vest_usd, tax_withheld_usd.
Written to payslip_ingest.rsu_vest_events by a postgres sink (pending
a real email fixture + cross-service DB grant).
- `parse_schwab_email_full()` — new entry point returning both
`list[Activity]` and `VestEvent | None`. The legacy
`parse_schwab_email()` shape is preserved for existing callers.
- Vest-release dispatch heuristic: HTML body mentions "Release
Confirmation" / "Award Vesting" / "RSU Release". On match, extract
vest fields via label regexes; the full vest becomes a BUY Activity
and the sell-to-cover slice becomes a SELL Activity at the same FMV
(net zero cash on the day). Gross vest + sell-to-cover returned so
Wealthfolio gets the full portfolio picture.
- Tests: 3 new (vest roundtrip, unparseable-vest safety, legacy shape
preserved); existing 6 unchanged.
The regex heuristics will need tightening once a real email sample
exists — the HTML structure observed in public Schwab emails may
differ in material ways. For now, unmatched vest bodies return
empty-result (no Activity, no VestEvent) rather than crashing the
IMAP batch.
Part of: code-860
Schwab's workplace-RSU confirmation emails have 5 data td elements
with class='dark-background-body' align='right': date, direction, qty,
ticker, price-with-currency-sign. One email → one Activity.
- parse_schwab_email(raw_html) -> list[Activity] (1-item or empty)
- Empty on any parse failure (IMAP batch shouldn't crash on one bad mail)
- Deterministic external_id ('schwab📅ticker:type:qty') — stable
across re-pulls so dedup works
- Hardcoded to account 'schwab-workplace' / AccountType.GIA / USD
- 6 unit tests: SELL + BUY happy path, malformed, missing cells,
external-id stability, commas in price
Dropped from the original finance port:
- msg_timestamp-based external id (non-deterministic — would re-import
on every IMAP walk). Replaced with a hash-stable key.
- Currency.from_sign() currency hack. Schwab US is USD-only; we'll add
FX when that changes.
poetry run pytest -q → 109 passed, 1 skipped
poetry run mypy → clean (added types-python-dateutil)
poetry run ruff check → clean
Context: The port's graceful-failure contract was implicit in the way
each strategy returns None/[] on malformed input, but without tests it
was an accidental property that could regress silently. Codify it.
Two invariants, each backed by a fixture:
1. Junk email → empty list, never raise.
`unparseable.eml` is a pure-marketing IE newsletter with no order
data. All three strategies try and fail; parse_invest_engine_email
returns []. No exception leaks.
2. Partial HTML email → intact orders only.
`html_partial_match.eml` has two nested summary tables: one with a
valid VUAG order, one that is missing both the ticker and "Bought N
@ £P" rows (simulates IE dropping content mid-render). The parser
returns just the VUAG order.
No implementation change needed — the behaviour existed as a side
effect of _try_html_summary_table returning None on missing fields.
These tests lock it down so future refactors can't quietly break it.
Test plan:
poetry run pytest tests/providers/parsers/ -q → 8 passed in 0.19s
poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py → clean
poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py → All checks passed!
poetry run yapf --diff → clean (no diff)
Manual verification:
- Load unparseable.eml → parse returns [].
- Load html_partial_match.eml → parse returns exactly 1 activity (VUAG).
Context: IE has not (yet) sent CSV-attached statements in production,
but the upstream parser had _extract_positions_csv as a third fallback
for exactly this case. Keeping the fallback preserves behaviour-parity
with the legacy parser and makes future statement support one fixture
away — the shape is documented by column set, not scraped live.
Unlike the upstream which split the body on whitespace and broke on any
embedded commas in names, this port walks real MIME attachments using
Python's csv.DictReader. A part qualifies as CSV if:
- its Content-Type is text/csv / application/csv / application/vnd.ms-excel, OR
- its filename ends in .csv (defence against IE mis-labelling the part)
Rows missing required columns or containing unparseable numbers/dates
are skipped silently — consistent with the "partial match" contract:
a half-corrupt CSV yields whatever rows were intact. Required columns:
ticker, unit_price, quantity, date (YYYY-MM-DD), currency. Non-GBP
rows are filtered because the IE ISA is strictly sterling — flagging
this assumption in the review notes.
This change:
- Adds `_parse_csv_attachment(raw_email)` as the third strategy after
text/plain and text/html; it re-parses the raw email bytes so we can
inspect Content-Type/filename on each part.
- Flags symbols/currencies, filters non-GBP, and runs each row through
the shared `_build_activity` so external_id formation matches every
other strategy (dedup stays consistent across strategies).
- Fixture `csv_attachment.eml` has three rows (VUAG, SWDA, VUSA) in a
`text/csv` part with a `.csv` filename — covers both detection paths.
Test plan:
poetry run pytest tests/providers/parsers/ -q → 6 passed in 0.15s
poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py → clean
poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py → All checks passed!
poetry run yapf --diff → clean (no diff)
Manual verification: load csv_attachment.eml, call parse_invest_engine_email,
assert 3 activities each with symbol in {VUAG,SWDA,VUSA}, currency=GBP,
notes containing "csv".
Context: Plain-text IE emails vanished around 2024-Q2 when IE switched to
an HTML-only template with per-order nested summary tables. The RFC 2822
line parser returns [] on those modern emails, so we need a fallback
that walks the HTML table structure.
Upstream _extract_from_html parsed a fixed DOM path (table[1].tr[10].
table) and only handled ONE order per email. The real IE HTML template
nests one summary <table> per ticker inside the second top-level table —
multiple orders in a single batched confirmation are common — so this
port walks every leaf table (no child <table>) and interprets each one
as an independent trade summary. Structural (non-leaf) tables are
skipped to avoid double-counting via get_text().
This change:
- `_parse_html_tables(body)` extracts the date once from the full text
then walks leaf tables looking for "Bought N @ £P" rows.
- `_try_html_summary_table` parses one leaf; returns None on structural
tables or missing ticker/qty/price — so a partial email yields only
its intact orders (the "2 orders, 1 parseable → 1 returned" invariant
works by construction without raising).
- `parse_invest_engine_email` now falls through text/plain → text/html
in the multipart message, picking the first strategy that returns
activities. Order matters: text/plain wins when both succeed because
the RFC 2822 strategy is the more constrained grammar.
- Regexes are module-level constants so they compile once per process.
Fixture `html_two_orders.eml` is a minimal-but-realistic multipart email
with two nested summary tables (VUAG + SWDA), no personal data beyond
tickers/qty/price.
Test plan:
poetry run pytest tests/providers/parsers/ -q
→ 5 passed in 0.16s
poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
→ Success: no issues found in 2 source files
poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
→ All checks passed!
poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
→ clean (no diff)
Manual verification: load html_two_orders.eml, call parse_invest_engine_email,
assert len == 2 with both expected tickers (VUAG, SWDA) and numbers,
dates set to 2026-04-01.
Context: The old finance/ app had a 324-line IE message parser with four
line-based variants (v1/v2/v3/v4) plus an HTML strategy and a CSV
fallback. Port into broker-sync so we can consume IE trade confirmation
emails as a backup to the live HTTP client (Phase 2b) while IE's public
API remains Bearer-only.
The upstream parser emits storage.model.Position; we emit canonical
Activity with the broker-sync invariants: account_id="invest-engine-primary"
(sink remaps to Wealthfolio UUID), account_type=ISA, currency=GBP, and
external_id="invest-engine:<fingerprint>" where the fingerprint is a
SHA-256 of (date|symbol|quantity|unit_price) — deterministic so repeat
imports of the same email dedup at the sync-record layer.
This change:
- Top-level `parse_invest_engine_email(raw_email: bytes) -> list[Activity]`
extracts the text/plain body from an RFC 2822 message and dispatches to
the line-based parser.
- `_parse_rfc2822_lines(body)` tries the v2 layout first (newer IE format
where `Date: DD Month` is on line 2 and the year on line 3), then the
v1 layout (where the day alone is on line 2 and `Month YYYY` on line 3).
v3 and v4 variants are re-added in a follow-up if we find fixtures
where they matter — initial fixture coverage hits v2.
- Drops the upstream `_ticker_post_processing` VUAG→VUAG.L hack.
Wealthfolio's /import/check endpoint resolves exchange suffixes; the
Trading212 provider also emits suffix-free tickers (e.g. `VUAG`), so
staying consistent avoids double-mapping.
- Notes field records the parse-strategy tag ("rfc2822-v2") plus the
matched line for debugging.
Test plan:
poetry run pytest tests/providers/parsers/ -q
→ 3 passed in 0.03s
poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
→ Success: no issues found in 2 source files
poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
→ All checks passed!
poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
→ clean (no diff)
Manual verification: load the fixture email, call the parser, inspect the
returned Activity has symbol=VUAG, quantity=59.539562, unit_price=60.46,
date=2023-01-17, external_id starts with invest-engine:.