Add HTML table fallback for InvestEngine email parser
Context: Plain-text IE emails vanished around 2024-Q2 when IE switched to an HTML-only template with per-order nested summary tables. The RFC 2822 line parser returns [] on those modern emails, so we need a fallback that walks the HTML table structure. Upstream _extract_from_html parsed a fixed DOM path (table[1].tr[10]. table) and only handled ONE order per email. The real IE HTML template nests one summary <table> per ticker inside the second top-level table — multiple orders in a single batched confirmation are common — so this port walks every leaf table (no child <table>) and interprets each one as an independent trade summary. Structural (non-leaf) tables are skipped to avoid double-counting via get_text(). This change: - `_parse_html_tables(body)` extracts the date once from the full text then walks leaf tables looking for "Bought N @ £P" rows. - `_try_html_summary_table` parses one leaf; returns None on structural tables or missing ticker/qty/price — so a partial email yields only its intact orders (the "2 orders, 1 parseable → 1 returned" invariant works by construction without raising). - `parse_invest_engine_email` now falls through text/plain → text/html in the multipart message, picking the first strategy that returns activities. Order matters: text/plain wins when both succeed because the RFC 2822 strategy is the more constrained grammar. - Regexes are module-level constants so they compile once per process. Fixture `html_two_orders.eml` is a minimal-but-realistic multipart email with two nested summary tables (VUAG + SWDA), no personal data beyond tickers/qty/price. Test plan: poetry run pytest tests/providers/parsers/ -q → 5 passed in 0.16s poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py → Success: no issues found in 2 source files poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py → All checks passed! poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py → clean (no diff) Manual verification: load html_two_orders.eml, call parse_invest_engine_email, assert len == 2 with both expected tickers (VUAG, SWDA) and numbers, dates set to 2026-04-01.
This commit is contained in:
parent
9ec8ece2d9
commit
72d348e294
3 changed files with 198 additions and 15 deletions
55
tests/fixtures/invest_engine/html_two_orders.eml
vendored
Normal file
55
tests/fixtures/invest_engine/html_two_orders.eml
vendored
Normal file
|
|
@ -0,0 +1,55 @@
|
|||
From: InvestEngine <no-reply@investengine.com>
|
||||
To: viktorbarzin@example.com
|
||||
Subject: Your portfolio has been updated
|
||||
Date: Wed, 01 Apr 2026 09:15:00 +0000
|
||||
MIME-Version: 1.0
|
||||
Content-Type: multipart/alternative; boundary="----=_Part_1"
|
||||
|
||||
------=_Part_1
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
|
||||
(HTML-only view — your client does not render HTML emails.)
|
||||
|
||||
------=_Part_1
|
||||
Content-Type: text/html; charset=UTF-8
|
||||
|
||||
<html><head><title>InvestEngine</title></head><body>
|
||||
<table><tr><td>Header logo</td></tr></table>
|
||||
<table>
|
||||
<tr><td>Client name: Redacted</td></tr>
|
||||
<tr><td>Trading venue: London Stock Exchange</td></tr>
|
||||
<tr><td>Type: Market Order(s)</td></tr>
|
||||
<tr><td>Here's a summary of the trades we've made for you</td></tr>
|
||||
<tr>
|
||||
<td>a</td><td>b</td><td>c</td><td>d</td>
|
||||
<td> Date: 01 April 2026 </td>
|
||||
</tr>
|
||||
<tr><td>filler</td></tr>
|
||||
<tr><td>filler</td></tr>
|
||||
<tr><td>filler</td></tr>
|
||||
<tr><td>filler</td></tr>
|
||||
<tr><td>filler</td></tr>
|
||||
<tr>
|
||||
<td>
|
||||
<table>
|
||||
<tr><td>Vanguard S&P 500: VUAG</td></tr>
|
||||
<tr><td>Bought 10.5 @ £62.10 per share</td></tr>
|
||||
<tr><td>Total: £652.05</td></tr>
|
||||
<tr><td>ISIN: IE00BFMXXD54, Order ID: 300000/4000001, Traded at 9:05am GMT</td></tr>
|
||||
</table>
|
||||
</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>
|
||||
<table>
|
||||
<tr><td>iShares Core MSCI World: SWDA</td></tr>
|
||||
<tr><td>Bought 2.25 @ £85.40 per share</td></tr>
|
||||
<tr><td>Total: £192.15</td></tr>
|
||||
<tr><td>ISIN: IE00B4L5Y983, Order ID: 300000/4000002, Traded at 9:06am GMT</td></tr>
|
||||
</table>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
</body></html>
|
||||
|
||||
------=_Part_1--
|
||||
|
|
@ -42,3 +42,29 @@ def test_rfc2822_notes_record_parse_strategy() -> None:
|
|||
a = parse_invest_engine_email(_load("rfc2822_v2_single_buy.eml"))[0]
|
||||
assert a.notes is not None
|
||||
assert "rfc2822" in a.notes
|
||||
|
||||
|
||||
# -- HTML table body (multipart/alternative, two orders) --
|
||||
|
||||
|
||||
def test_html_body_parses_both_orders() -> None:
|
||||
activities = parse_invest_engine_email(_load("html_two_orders.eml"))
|
||||
assert len(activities) == 2
|
||||
a, b = activities
|
||||
assert a.symbol == "VUAG"
|
||||
assert a.quantity == Decimal("10.5")
|
||||
assert a.unit_price == Decimal("62.10")
|
||||
assert a.date == datetime(2026, 4, 1)
|
||||
assert a.account_id == "invest-engine-primary"
|
||||
assert a.account_type is AccountType.ISA
|
||||
assert a.activity_type is ActivityType.BUY
|
||||
assert b.symbol == "SWDA"
|
||||
assert b.quantity == Decimal("2.25")
|
||||
assert b.unit_price == Decimal("85.40")
|
||||
assert b.date == datetime(2026, 4, 1)
|
||||
|
||||
|
||||
def test_html_notes_record_html_strategy() -> None:
|
||||
a = parse_invest_engine_email(_load("html_two_orders.eml"))[0]
|
||||
assert a.notes is not None
|
||||
assert "html" in a.notes
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue