Context:

Plain-text IE emails vanished around 2024-Q2 when IE switched to an HTML-only template with per-order nested summary tables. The RFC 2822 line parser returns [] on those modern emails, so we need a fallback that walks the HTML table structure.

Upstream _extract_from_html parsed a fixed DOM path (table[1].tr[10].table) and only handled ONE order per email. The real IE HTML template nests one summary <table> per ticker inside the second top-level table, and multiple orders in a single batched confirmation are common. This port therefore walks every leaf table (no child <table>) and interprets each one as an independent trade summary. Structural (non-leaf) tables are skipped to avoid double-counting via get_text().

This change:

- `_parse_html_tables(body)` extracts the date once from the full text, then walks leaf tables looking for "Bought N @ £P" rows.
- `_try_html_summary_table` parses one leaf; it returns None on structural tables or on a missing ticker/qty/price, so a partial email yields only its intact orders (the "2 orders, 1 parseable → 1 returned" invariant works by construction without raising).
- `parse_invest_engine_email` now falls through text/plain → text/html in the multipart message, picking the first strategy that returns activities. Order matters: text/plain wins when both succeed because the RFC 2822 strategy is the more constrained grammar.
- Regexes are module-level constants so they compile once per process.

Fixture `html_two_orders.eml` is a minimal-but-realistic multipart email with two nested summary tables (VUAG + SWDA), no personal data beyond tickers/qty/price.

Test plan:

  poetry run pytest tests/providers/parsers/ -q
  → 5 passed in 0.16s

  poetry run mypy broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → Success: no issues found in 2 source files

  poetry run ruff check broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → All checks passed!

  poetry run yapf --diff broker_sync/providers/parsers/invest_engine.py tests/providers/parsers/test_invest_engine.py
  → clean (no diff)

Manual verification: load html_two_orders.eml, call parse_invest_engine_email, assert len == 2 with both expected tickers (VUAG, SWDA) and numbers, dates set to 2026-04-01.
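
The leaf-table walk described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this change: `LeafTableCollector`, `parse_html_tables`, the three regexes and the dict-shaped activities are all hypothetical names, and `html.parser` stands in for whatever HTML library the real parser uses. The sample contains two orders, one of which lacks its "Bought" row, so only the intact order comes back.

```python
import re
from datetime import datetime
from decimal import Decimal
from html.parser import HTMLParser

# Hypothetical module-level patterns, compiled once at import time.
DATE_RE = re.compile(r"Date:\s*(\d{1,2} [A-Za-z]+ \d{4})")
TICKER_RE = re.compile(r":\s*([A-Z0-9]{2,6})\s*$", re.MULTILINE)
TRADE_RE = re.compile(r"Bought\s+(\d+(?:\.\d+)?)\s*@\s*£(\d+(?:\.\d+)?)")


class LeafTableCollector(HTMLParser):
    """Collect the text of every <table> that has no child <table>."""

    def __init__(self):
        super().__init__()
        self.leaves = []     # text of each leaf table, in document order
        self.all_text = []   # every text node, used for the date lookup
        self._stack = []     # one [chunks, has_child] entry per open table

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._stack:
                self._stack[-1][1] = True  # the parent is structural
            self._stack.append([[], False])

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            chunks, has_child = self._stack.pop()
            if not has_child:
                self.leaves.append("\n".join(chunks))

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        self.all_text.append(text)
        if self._stack:
            self._stack[-1][0].append(text)  # innermost open table only


def parse_html_tables(body):
    """Return one activity dict per leaf table that parses completely."""
    collector = LeafTableCollector()
    collector.feed(body)
    collector.close()
    date_match = DATE_RE.search(" ".join(collector.all_text))
    if date_match is None:
        return []
    trade_date = datetime.strptime(date_match.group(1), "%d %B %Y").date()
    activities = []
    for leaf in collector.leaves:
        ticker = TICKER_RE.search(leaf)
        trade = TRADE_RE.search(leaf)
        if ticker is None or trade is None:
            continue  # incomplete or non-trade leaf: skip it, do not raise
        activities.append({
            "date": trade_date,
            "ticker": ticker.group(1),
            "quantity": Decimal(trade.group(1)),
            "price": Decimal(trade.group(2)),
        })
    return activities


SAMPLE = """
<table><tr><td>Header logo</td></tr></table>
<table>
<tr><td>Date: 01 April 2026</td></tr>
<tr><td><table>
<tr><td>Vanguard S&P 500: VUAG</td></tr>
<tr><td>Bought 10.5 @ £62.10 per share</td></tr>
</table></td></tr>
<tr><td><table>
<tr><td>iShares Core MSCI World: SWDA</td></tr>
<tr><td>Total: £192.15</td></tr>
</table></td></tr>
</table>
"""

activities = parse_html_tables(SAMPLE)
```

Each table's text is attributed only to the innermost open table, so the structural wrapper never sees the rows of its children. That is one way to get the "no double-counting" property without a DOM library.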
html_two_orders.eml (55 lines, 1.5 KiB, text)
From: InvestEngine <no-reply@investengine.com>
To: viktorbarzin@example.com
Subject: Your portfolio has been updated
Date: Wed, 01 Apr 2026 09:15:00 +0000
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_Part_1"

------=_Part_1
Content-Type: text/plain; charset=UTF-8

(HTML-only view — your client does not render HTML emails.)

------=_Part_1
Content-Type: text/html; charset=UTF-8

<html><head><title>InvestEngine</title></head><body>
<table><tr><td>Header logo</td></tr></table>
<table>
<tr><td>Client name: Redacted</td></tr>
<tr><td>Trading venue: London Stock Exchange</td></tr>
<tr><td>Type: Market Order(s)</td></tr>
<tr><td>Here's a summary of the trades we've made for you</td></tr>
<tr>
<td>a</td><td>b</td><td>c</td><td>d</td>
<td> Date: 01 April 2026 </td>
</tr>
<tr><td>filler</td></tr>
<tr><td>filler</td></tr>
<tr><td>filler</td></tr>
<tr><td>filler</td></tr>
<tr><td>filler</td></tr>
<tr>
<td>
<table>
<tr><td>Vanguard S&P 500: VUAG</td></tr>
<tr><td>Bought 10.5 @ £62.10 per share</td></tr>
<tr><td>Total: £652.05</td></tr>
<tr><td>ISIN: IE00BFMXXD54, Order ID: 300000/4000001, Traded at 9:05am GMT</td></tr>
</table>
</td>
</tr>
<tr>
<td>
<table>
<tr><td>iShares Core MSCI World: SWDA</td></tr>
<tr><td>Bought 2.25 @ £85.40 per share</td></tr>
<tr><td>Total: £192.15</td></tr>
<tr><td>ISIN: IE00B4L5Y983, Order ID: 300000/4000002, Traded at 9:06am GMT</td></tr>
</table>
</td>
</tr>
</table>
</body></html>

------=_Part_1--
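
A multipart message shaped like this fixture, with a placeholder text/plain part and the real content in text/html, is where the text/plain → text/html fallthrough matters. A minimal sketch of that ordering with the standard `email` package follows. `first_successful_parse` and the strategy callables are hypothetical names chosen for illustration, not the signatures used in `parse_invest_engine_email`. The message is parsed from bytes so that the non-ASCII `£` survives decoding with the part's declared charset.

```python
from email import message_from_bytes


def first_successful_parse(raw, strategies):
    """Try (content_type, strategy) pairs in priority order.

    Returns the first non-empty result, or [] if no strategy yields anything.
    """
    msg = message_from_bytes(raw)
    for content_type, strategy in strategies:
        for part in msg.walk():
            if part.get_content_type() != content_type:
                continue
            payload = part.get_payload(decode=True)  # bytes, or None for containers
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            activities = strategy(payload.decode(charset, errors="replace"))
            if activities:
                return activities
    return []


RAW = (
    'Content-Type: multipart/alternative; boundary="b1"\n'
    "MIME-Version: 1.0\n"
    "\n"
    "--b1\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "(HTML-only view)\n"
    "--b1\n"
    "Content-Type: text/html; charset=UTF-8\n"
    "\n"
    "<table><tr><td>Bought 10.5 @ £62.10 per share</td></tr></table>\n"
    "--b1--\n"
).encode("utf-8")


# Stand-in strategies: the plain-text one finds nothing in the placeholder part.
def plain_strategy(text):
    return []


def html_strategy(text):
    return ["html"] if "£62.10" in text else []


result = first_successful_parse(
    RAW, [("text/plain", plain_strategy), ("text/html", html_strategy)]
)
```

Listing text/plain first encodes the stated priority: when both strategies return activities, the more constrained plain-text grammar wins, and the HTML walk runs only when the plain-text part yields nothing.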