Brokerage data sync (Trading 212, Schwab, Fidelity, IMAP-CSV) → Wealthfolio. Image is published as viktor/wealthfolio-sync per the wealthfolio stack convention.
Viktor Barzin 72d348e294 Add HTML table fallback for InvestEngine email parser
Context: Plain-text InvestEngine (IE) emails vanished around 2024-Q2, when
IE switched to an HTML-only template with nested per-order summary
tables. The RFC 2822 line parser returns [] on those modern emails, so we
need a fallback that walks the HTML table structure.

Upstream _extract_from_html parsed a fixed DOM path
(table[1].tr[10].table) and handled only ONE order per email. The real
IE HTML template nests one summary <table> per ticker inside the second
top-level table, and batched confirmations with several orders are
common, so this port walks every leaf table (a table with no child
<table>) and interprets each one as an independent trade summary.
Structural (non-leaf) tables are skipped because get_text() on a parent
table also returns the text of its nested tables, which would count
each order twice.
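A minimal sketch of the leaf-table walk, using only the stdlib `html.parser`. The commit's mention of get_text() suggests the real parser uses a third-party HTML library, so treat this as illustrative only. `LeafTableCollector` and `leaf_table_texts` are hypothetical names. Text is appended to every open table, mimicking how get_text() on a parent includes its descendants. Keeping only tables with no nested <table> therefore reports each order exactly once.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Optional


@dataclass
class _Frame:
    """One open <table>: its accumulated text and whether it nests a table."""
    has_child_table: bool = False
    text: list[str] = field(default_factory=list)


class LeafTableCollector(HTMLParser):
    """Collect the text of every <table> that contains no nested <table>."""

    def __init__(self) -> None:
        super().__init__()
        self._open: list[_Frame] = []
        self.leaves: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        if tag == "table":
            if self._open:
                # The enclosing table is now structural, not a leaf.
                self._open[-1].has_child_table = True
            self._open.append(_Frame())

    def handle_endtag(self, tag: str) -> None:
        if tag == "table" and self._open:
            frame = self._open.pop()
            if not frame.has_child_table:
                self.leaves.append(" ".join(frame.text))

    def handle_data(self, data: str) -> None:
        chunk = data.strip()
        if chunk:
            # Like get_text(), every open ancestor table sees this text,
            # which is why emitting non-leaf tables would double-count.
            for frame in self._open:
                frame.text.append(chunk)


def leaf_table_texts(html: str) -> list[str]:
    """Return the whitespace-joined text of each leaf table, in document order."""
    collector = LeafTableCollector()
    collector.feed(html)
    collector.close()
    return collector.leaves
```

With an outer layout table wrapping two order tables, only the two inner tables are returned. The outer table's accumulated text (which repeats both orders) is discarded.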

This change:
- `_parse_html_tables(body)` extracts the date once from the full text
  then walks leaf tables looking for "Bought N @ £P" rows.
- `_try_html_summary_table` parses one leaf and returns None for
  structural tables or when the ticker, qty, or price is missing, so a
  partial email yields only its intact orders (the "2 orders, 1
  parseable → 1 returned" invariant holds by construction, without
  raising).
- `parse_invest_engine_email` now falls through text/plain → text/html
  in the multipart message, picking the first strategy that returns
  activities. Order matters: text/plain wins when both succeed because
  the RFC 2822 strategy is the more constrained grammar.
- Regexes are module-level constants so they compile once per process.
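A minimal sketch of the per-leaf parsing described in the list: the date is extracted once from the full text, each leaf either yields a trade or None, and a batched email keeps only its intact orders. Several pieces are assumptions for illustration, since the commit does not show the module's real regexes, date layout, or Activity fields:

- the patterns themselves
- the hypothetical `Trade` dataclass
- the "1 April 2026" date format
- the names `extract_date`, `try_summary_table` and `parse_leaves`

```python
import re
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal
from typing import Optional

# Hypothetical patterns; the commit does not show the module's real ones.
# As module-level constants they are compiled once, at import time.
_BOUGHT_RE = re.compile(r"Bought\s+(\d+(?:\.\d+)?)\s*@\s*£\s*([\d,]+(?:\.\d+)?)")
_TICKER_RE = re.compile(r"\b([A-Z]{3,5})\b")  # assumes an all-caps ticker cell
_DATE_RE = re.compile(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b")  # assumes "1 April 2026"


@dataclass(frozen=True)
class Trade:
    """Hypothetical stand-in for the repo's canonical Activity model."""
    ticker: str
    quantity: Decimal
    unit_price: Decimal
    trade_date: date


def extract_date(full_text: str) -> Optional[date]:
    """Find the confirmation date once, from the whole email's text."""
    match = _DATE_RE.search(full_text)
    if match is None:
        return None
    # %B expects English month names under the default C locale.
    return datetime.strptime(match.group(1), "%d %B %Y").date()


def try_summary_table(leaf_text: str, trade_date: date) -> Optional[Trade]:
    """Parse one leaf table's text; return None instead of raising on gaps."""
    bought = _BOUGHT_RE.search(leaf_text)
    ticker = _TICKER_RE.search(leaf_text)
    if bought is None or ticker is None:
        return None
    qty, price = bought.groups()
    return Trade(ticker.group(1), Decimal(qty), Decimal(price.replace(",", "")), trade_date)


def parse_leaves(full_text: str, leaves: list[str]) -> list[Trade]:
    """Keep every intact order; unparseable leaves are silently dropped."""
    trade_date = extract_date(full_text)
    if trade_date is None:
        return []
    results = (try_summary_table(leaf, trade_date) for leaf in leaves)
    return [trade for trade in results if trade is not None]
```

For a dated email with leaves `["VUAG Bought 3 @ £80.00", "SWDA Bought ? @ £?"]`, `parse_leaves` returns a single `Trade`. The damaged leaf maps to None rather than raising, which mirrors the "2 orders, 1 parseable → 1 returned" invariant.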

Fixture `html_two_orders.eml` is a minimal but realistic multipart email
with two nested summary tables (VUAG and SWDA). It contains no personal
data beyond tickers, quantities, and prices.

Test plan:
  poetry run pytest tests/providers/parsers/ -q
  → 5 passed in 0.16s
  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files
  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!
  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load html_two_orders.eml, call
parse_invest_engine_email, and assert that it returns 2 activities with
the expected tickers (VUAG, SWDA), quantities, and prices, all dated
2026-04-01.
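The text/plain → text/html fall-through can be exercised the same way without the real parser. This is a minimal sketch using only the stdlib `email` package. It builds a multipart/alternative message in memory, tries each body subtype in order, and returns the first non-empty result, so text/plain wins when both strategies succeed. The names `parse_first_successful`, `plain_strategy`, `html_strategy` and `build_fixture` are hypothetical stand-ins. They are not the repo's RFC 2822 or HTML-table parsers.

```python
import re
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Callable, Sequence

# A strategy turns one decoded MIME body into a list of parsed items.
Strategy = Callable[[str], list[str]]

# Hypothetical row pattern shared by both stand-in strategies.
_ROW_RE = re.compile(r"Bought \d+(?:\.\d+)? @ £[\d,.]+")


def plain_strategy(text: str) -> list[str]:
    """Hypothetical line-based parser: one order per matching line."""
    return [line.strip() for line in text.splitlines() if _ROW_RE.fullmatch(line.strip())]


def html_strategy(html: str) -> list[str]:
    """Hypothetical HTML fallback: scan the markup for order rows."""
    return _ROW_RE.findall(html)


STRATEGIES: list[tuple[str, Strategy]] = [("plain", plain_strategy), ("html", html_strategy)]


def parse_first_successful(raw: bytes, strategies: Sequence[tuple[str, Strategy]]) -> list[str]:
    """Try each (subtype, strategy) in order; return the first non-empty result."""
    msg = message_from_bytes(raw, policy=policy.default)  # EmailMessage under this policy
    for subtype, strategy in strategies:
        part = msg.get_body(preferencelist=(subtype,))
        if part is None:
            continue  # this message has no text/<subtype> body
        items = strategy(part.get_content())
        if items:
            return items
    return []


def build_fixture(plain: str, html: str) -> bytes:
    """Build an in-memory multipart/alternative email (text/plain + text/html)."""
    msg = EmailMessage()
    msg["Subject"] = "Order confirmation"
    msg.set_content(plain)
    msg.add_alternative(html, subtype="html")
    return msg.as_bytes()
```

If the plain body carries no order lines, the HTML strategy supplies the result. If the plain body parses, the HTML part is never consulted.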
2026-04-17 21:58:15 +00:00
.github/workflows CI: build image from phase-0-scaffold branch too (bootstrap) 2026-04-17 19:51:09 +00:00
.woodpecker Add GHA build + Woodpecker deploy pipelines 2026-04-17 19:32:00 +00:00
broker_sync Add HTML table fallback for InvestEngine email parser 2026-04-17 21:58:15 +00:00
tests Add HTML table fallback for InvestEngine email parser 2026-04-17 21:58:15 +00:00
.gitignore Initial scaffold + canonical Activity model 2026-04-17 19:16:11 +00:00
Dockerfile Fix live Wealthfolio login + Dockerfile poetry path 2026-04-17 20:17:24 +00:00
poetry.lock Wire T212 pagination, retries, and click<8.2 pin 2026-04-17 19:45:23 +00:00
pyproject.toml Wire T212 pagination, retries, and click<8.2 pin 2026-04-17 19:45:23 +00:00