broker-sync/broker_sync/providers/invest_engine.py
Viktor Barzin dc4d3f889d Add InvestEngineProvider — Bearer-token HTTP client
Context: InvestEngine has no public API. The web app uses an undocumented
Django REST backend at /api/v0.3X/*, which requires a Bearer token and
rolls its minor version every 4-6 weeks. MFA (push-approval) is mandatory on
every login, so we do NOT automate login — Viktor logs in manually in
a browser, copies the Bearer out of devtools, and pastes it into Vault.
This provider consumes that token.

The response shape is UNVERIFIED (MFA blocks an unauthed probe, so the
research leading into Phase 2b could only confirm endpoint existence via
401 responses on v0.31 and v0.32). `_transaction_to_activity` is written
defensively:

 - accepts both `results`/`data` list wrappers and `next`/`meta.next_page`
   cursor fields for pagination;
 - accepts `symbol`/`ticker`, `price`/`unit_price`, `amount`/`value`,
   `date`/`created_at`/`timestamp` field-name variants;
 - maps exact type strings (BUY, SELL, DIVIDEND, INTEREST, DEPOSIT,
   WITHDRAWAL, FEE, TAX) and substring-matches DEPOSIT/WITHDRAWAL for
   variants like "CASH_DEPOSIT"; refuses to guess on anything else —
   unknown types log WARNING and return None (silent misclassification
   would corrupt tax reporting).
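
A minimal sketch of the mapping rules listed above, using plain strings in
place of the real ActivityType enum. The names `classify` and
`first_present` are hypothetical and only illustrate the idea; note that
`first_present` checks `is not None`, which is a slightly stricter variant
than chaining lookups with `or` (a numeric zero survives here).

```python
from __future__ import annotations

from typing import Any

# Hypothetical stand-in: plain strings replace the real ActivityType enum.
_EXACT_TYPES = {
    "BUY", "SELL", "DIVIDEND", "INTEREST", "DEPOSIT", "WITHDRAWAL", "FEE", "TAX",
}


def classify(raw_type: str) -> str | None:
    """Exact match first, then substring match for cash movements only."""
    upper = raw_type.upper()
    if upper in _EXACT_TYPES:
        return upper
    if "DEPOSIT" in upper:
        return "DEPOSIT"
    if "WITHDRAW" in upper:
        return "WITHDRAWAL"
    return None  # the caller logs a WARNING and skips the transaction


def first_present(raw: dict[str, Any], *names: str) -> Any:
    """Return the first non-None value among the field-name variants."""
    for name in names:
        value = raw.get(name)
        if value is not None:
            return value
    return None
```

For example, `classify("CASH_DEPOSIT")` resolves to the deposit type, while
an unrecognised string such as `"REBALANCE"` yields None instead of a guess.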

Version probe:

  _START_VERSION_MINOR=32 (research: v0.31/v0.32 live, v0.30 Gone)
  GET /api/v0.{n}/ → 410 ? advance : done
  cap at v0.60 so a misconfigured backend doesn't infinite-loop.
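
The probe loop can be sketched without any HTTP at all. In this
illustration `status_of` is a hypothetical callable standing in for
"GET /api/v0.{n}/ and return the status code", and `VersionNotFound` is a
stand-in for InvestEngineVersionError; the real module uses an async httpx
client instead.

```python
from typing import Callable


class VersionNotFound(Exception):
    """Hypothetical stand-in for InvestEngineVersionError."""


def probe(status_of: Callable[[int], int], start: int = 32, cap: int = 60) -> int:
    """Walk forward from `start`; the first non-410 status ends the walk."""
    for minor in range(start, cap + 1):
        if status_of(minor) != 410:
            return minor
    # Every version up to the cap answered 410 Gone: give up, don't loop.
    raise VersionNotFound(f"no live version between v0.{start} and v0.{cap}")


# Simulated backend: v0.32 and v0.33 are Gone, v0.34 asks for auth.
statuses = {32: 410, 33: 410, 34: 401}
live = probe(lambda minor: statuses.get(minor, 404))
```

With the simulated statuses above, the walk stops at minor 34.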

A 410 response on a data endpoint triggers exactly one re-probe + retry
against the newer version; the new version is cached on the instance for
the rest of the process.
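
The retry step only needs the request path rewritten to the newer minor. A
small sketch of that rewrite follows; `retarget` is a hypothetical helper
name, and the real code performs the same replacement inline.

```python
def retarget(path: str, old_minor: int, new_minor: int) -> str:
    """Swap the version segment of a request path, first occurrence only."""
    return path.replace(f"/v0.{old_minor}/", f"/v0.{new_minor}/", 1)


new_path = retarget("/api/v0.32/transactions/", 32, 34)
```

Limiting the replacement to the first occurrence keeps a query string or
later path segment that happens to contain the same text untouched.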

Token expiry is tracked at the Python layer:

 - constructor takes token_expires_at (set by Viktor when he pastes);
 - fetch() fails fast with InvestEngineTokenExpiredError if the clock
   says the token is already dead — cheaper than burning a request for
   a known 401;
 - a real 401 response also raises InvestEngineTokenExpiredError so the
   CLI/pipeline can alert Viktor to paste a new token.
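
The fail-fast check reduces to one comparison against the clock. The sketch
below assumes timezone-aware datetimes; `ensure_token_alive` and
`TokenExpired` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from datetime import datetime, timezone


class TokenExpired(Exception):
    """Hypothetical stand-in for InvestEngineTokenExpiredError."""


def ensure_token_alive(expires_at: datetime, now: datetime | None = None) -> None:
    """Raise before any HTTP request if the token is already past expiry."""
    current = now if now is not None else datetime.now(timezone.utc)
    if expires_at <= current:
        raise TokenExpired(f"token expired at {expires_at.isoformat()}")
```

Accepting `now` as a parameter is a choice made for this sketch so the
check can be exercised with fixed timestamps.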

Vault schema expected (consumed by the CLI in the follow-up commit):

  secret/broker-sync
    investengine_bearer_token       <devtools-captured Bearer>
    investengine_token_expires_at   <ISO-8601 set at paste time>
    investengine_refresh_token      <optional, not used yet>

This module does NOT read Vault — the caller hands values in, keeping
the provider testable.
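
As a rough illustration of that split, the caller could turn the Vault
key/value pairs into constructor arguments like this. `provider_kwargs` is
a hypothetical helper, and the secret is passed in as a plain dict; how the
follow-up CLI actually reads Vault is not shown here.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def provider_kwargs(secret: dict[str, str]) -> dict[str, Any]:
    """Map the Vault keys onto the provider's keyword arguments."""
    expires_raw = secret["investengine_token_expires_at"]
    return {
        "bearer_token": secret["investengine_bearer_token"],
        # Accept a trailing "Z" as well as an explicit +HH:MM offset.
        "token_expires_at": datetime.fromisoformat(expires_raw.replace("Z", "+00:00")),
    }


kwargs = provider_kwargs({
    "investengine_bearer_token": "<devtools-captured Bearer>",
    "investengine_token_expires_at": "2026-05-17T00:00:00Z",
})
```

The resulting dict can then be splatted into the constructor, for example
`InvestEngineProvider(**kwargs)`.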

This change:
 - New `broker_sync/providers/invest_engine.py`:
   * InvestEngineProvider with .accounts(), .fetch(), .close()
   * _probe_version / _active_version with 410-retry + cache
   * _transaction_to_activity with defensive type + field-name mapping
   * InvestEngineError / InvestEngineTokenExpiredError / InvestEngineVersionError
 - New `tests/providers/test_invest_engine.py`: 22 tests covering version
   probe, expiry fail-fast, 401→TokenExpired, 410→reprobe, header
   shape, pagination variants, and the full txn→activity mapping. One
   @pytest.mark.skip integration stub for when Viktor has a live token.

Assumptions flagged for verification with a live token:
 - IE id field is castable to str (int or string)
 - Type strings match exactly, or (for DEPOSIT/WITHDRAWAL only) contain as a
   substring, one of: BUY, SELL, DIVIDEND, INTEREST, DEPOSIT, WITHDRAWAL,
   FEE, TAX
 - Transactions carry numeric quantity/price/amount (Decimal-convertible)
 - Date field is one of: date / created_at / timestamp
 - Pagination shape is {results, next} OR {data, meta.next_page}
 - /transactions/ accepts ?portfolio=<id>&start=YYYY-MM-DD&end=YYYY-MM-DD

## Automated

poetry run pytest tests/providers/test_invest_engine.py -v
======================== 22 passed, 1 skipped in 0.26s =========================

poetry run pytest -q
95 passed, 1 skipped in 0.84s

poetry run mypy --strict .
Success: no issues found in 34 source files

poetry run ruff check .
All checks passed!

poetry run yapf --diff broker_sync/providers/invest_engine.py tests/providers/test_invest_engine.py
(clean)

## Manual Verification

Once Viktor pastes a live token:

  1. Export:
     export IE_BEARER_TOKEN='<paste>'
     export IE_TOKEN_EXPIRES_AT='2026-05-17T00:00:00+00:00'
  2. Remove the @pytest.mark.skip marker from test_live_integration_smoke
  3. poetry run pytest tests/providers/test_invest_engine.py::test_live_integration_smoke -v
     Expected: a successful round-trip that returns an empty-or-populated
     list of Activity objects, which proves that the version probe, auth
     header, and portfolio enumeration actually work against the real IE
     backend.
  4. Validate the Assumptions list above against the real transaction JSON.
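
Step 4 could be partly scripted. The sketch below checks one captured
transaction dict against a few of the listed assumptions and returns the
ones that fail; `check_assumptions` is a hypothetical helper, not part of
this commit.

```python
from __future__ import annotations

from decimal import Decimal, InvalidOperation
from typing import Any


def check_assumptions(txn: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems: list[str] = []
    if txn.get("id") is None:
        problems.append("missing id")
    if not any(isinstance(txn.get(k), str) for k in ("date", "created_at", "timestamp")):
        problems.append("no string date field (date/created_at/timestamp)")
    for key in ("quantity", "price", "unit_price", "amount", "value"):
        raw = txn.get(key)
        if raw is None:
            continue
        try:
            Decimal(str(raw))
        except InvalidOperation:
            problems.append(f"{key} is not Decimal-convertible: {raw!r}")
    return problems
```

Running it over each item of a saved /transactions/ response would show
quickly which field-name variants the real backend uses.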

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:52:26 +00:00


"""InvestEngine Bearer-token provider.

InvestEngine (https://investengine.com) has no public API and requires MFA
(push-approval via the IE mobile app) on every login. We work around that
by having Viktor log in manually in a browser, copy the Bearer token out
of devtools, and paste it into Vault. This module consumes that token.

## Vault schema — `secret/broker-sync`

The following keys are expected in Vault. The caller (CLI/pipeline) reads
them and hands the values to the constructor. This module does NOT read
Vault directly — it stays testable.

- ``investengine_bearer_token`` — the Bearer string Viktor pastes from
  devtools. Expires on IE's refresh schedule (~monthly based on observed
  behaviour).
- ``investengine_token_expires_at`` — ISO-8601 timestamp Viktor sets WHEN
  HE PASTES the token. Used to alert 3 days before expiry and to fail
  fast before making a request with a known-dead token.
- ``investengine_refresh_token`` *(optional)* — if Viktor's devtools
  capture included a refresh token, we may attempt auto-refresh in a
  future iteration. Not used yet.

## Version probing

IE rolls the ``/api/v0.3X/`` version every 4-6 weeks. ``v0.29`` and
``v0.30`` were ``410 Gone`` at research time; ``v0.31`` and ``v0.32``
were live (401 without auth — the correct "auth required" signal);
``v0.33`` and later returned 404 (not yet created). The probe starts
at ``v0.32`` and walks forward on 410, stopping at the first version
that returns anything other than 410 (401/404/200/etc).
"""
from __future__ import annotations

import logging
from collections.abc import AsyncIterator
from datetime import UTC, datetime
from decimal import Decimal
from typing import Any

import httpx

from broker_sync.models import Account, AccountType, Activity, ActivityType

log = logging.getLogger(__name__)

_DEFAULT_BASE_URL = "https://investengine.com"
_USER_AGENT = "broker-sync/0.1 (+https://github.com/ViktorBarzin/broker-sync)"

# Version probe starts here. If the live version moves past this minor,
# update the constant rather than relying on a re-probe in every process.
_START_VERSION_MINOR = 32

# Hard cap on how far forward we probe before giving up. IE has never
# skipped versions; this is defence against a runaway loop.
_MAX_VERSION_MINOR = 60

# One logical account — IE has only ever had a single ISA per user.
_ACCOUNT_ID = "invest-engine-primary"
_ACCOUNT = Account(
    id=_ACCOUNT_ID,
    name="InvestEngine ISA",
    account_type=AccountType.ISA,
    currency="GBP",
    provider="invest-engine",
)

# Type-string → ActivityType. Exact IE strings are UNVERIFIED — we match
# case-insensitively and fall back to substring checks for the common
# variants ("DEPOSIT", "WITHDRAWAL").
_EXACT_TYPE_MAP: dict[str, ActivityType] = {
    "BUY": ActivityType.BUY,
    "SELL": ActivityType.SELL,
    "DIVIDEND": ActivityType.DIVIDEND,
    "INTEREST": ActivityType.INTEREST,
    "DEPOSIT": ActivityType.DEPOSIT,
    "WITHDRAWAL": ActivityType.WITHDRAWAL,
    "FEE": ActivityType.FEE,
    "TAX": ActivityType.TAX,
}


class InvestEngineError(Exception):
    """Any non-retryable InvestEngine API failure."""


class InvestEngineTokenExpiredError(InvestEngineError):
    """Bearer token rejected by IE (401). Viktor must paste a new token."""


class InvestEngineVersionError(InvestEngineError):
    """Could not find a live /api/v0.3X/ version on the IE backend."""


def _version_path(minor: int) -> str:
    return f"/api/v0.{minor}/"


async def _probe_version(
    client: httpx.AsyncClient,
    *,
    start_minor: int = _START_VERSION_MINOR,
    max_minor: int = _MAX_VERSION_MINOR,
) -> int:
    """Walk forward from ``start_minor`` looking for a live API version.

    Returns the minor number of the first version that is NOT 410 Gone.
    A live version is one IE currently accepts; a 401 response is the
    expected "auth required" signal and confirms the version is serving
    traffic. Any other non-410 status (including 404) also ends the walk
    and is returned as-is. Raises :class:`InvestEngineVersionError` if
    every version up to and including ``max_minor`` answers 410.
    """
    for minor in range(start_minor, max_minor + 1):
        resp = await client.get(_version_path(minor))
        if resp.status_code != 410:
            return minor
    raise InvestEngineVersionError(
        f"No live /api/v0.3X/ between v0.{start_minor} and v0.{max_minor}")


def _parse_iso(ts: str) -> datetime:
    """Accept both ``...Z`` and ``...+HH:MM`` suffixes."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _opt_decimal(raw: Any) -> Decimal | None:
    if raw is None:
        return None
    return Decimal(str(raw))


def _classify_type(raw_type: str) -> ActivityType | None:
    """Map a raw IE type string to ActivityType, or None if unknown."""
    upper = raw_type.upper()
    exact = _EXACT_TYPE_MAP.get(upper)
    if exact is not None:
        return exact
    if "DEPOSIT" in upper:
        return ActivityType.DEPOSIT
    if "WITHDRAWAL" in upper or "WITHDRAW" in upper:
        return ActivityType.WITHDRAWAL
    return None


def _transaction_to_activity(raw: dict[str, Any]) -> Activity | None:
    """Turn one IE transaction dict into a canonical Activity.

    The IE response shape is UNVERIFIED — these assumptions WILL need
    review once Viktor pastes a live token:

    - ``id``: string or int (cast to str for ``external_id``)
    - ``type``: upper-case string — ``BUY``/``SELL``/``DIVIDEND``/etc.
    - ``symbol``: ticker string (may be missing on cash events)
    - ``quantity`` / ``price`` / ``amount``: numeric-in-a-string or float
    - ``currency``: ISO code; default ``GBP`` for an ISA
    - ``date``: ISO-8601 with ``Z`` or offset suffix

    Unknown type strings are logged at WARNING and skipped — silent
    misclassification would corrupt tax reporting, so we refuse to guess.
    """
    raw_type = str(raw.get("type", ""))
    activity_type = _classify_type(raw_type)
    if activity_type is None:
        log.warning(
            "invest-engine: skipping transaction id=%s with unknown type=%r",
            raw.get("id"),
            raw_type,
        )
        return None

    txn_id = raw.get("id")
    if txn_id is None:
        log.warning("invest-engine: skipping transaction with missing id: %r", raw)
        return None

    currency = str(raw.get("currency") or "GBP")
    date_str = raw.get("date") or raw.get("created_at") or raw.get("timestamp")
    if not isinstance(date_str, str):
        log.warning("invest-engine: skipping txn id=%s — no parseable date", txn_id)
        return None

    quantity = _opt_decimal(raw.get("quantity"))
    unit_price = _opt_decimal(raw.get("price") or raw.get("unit_price"))
    amount = _opt_decimal(raw.get("amount") or raw.get("value"))
    fee = _opt_decimal(raw.get("fee")) or Decimal("0")
    symbol_raw = raw.get("symbol") or raw.get("ticker")
    symbol = str(symbol_raw) if symbol_raw else None

    return Activity(
        external_id=f"invest-engine:{txn_id}",
        account_id=_ACCOUNT_ID,
        account_type=AccountType.ISA,
        date=_parse_iso(date_str),
        activity_type=activity_type,
        currency=currency,
        symbol=symbol,
        quantity=quantity,
        unit_price=unit_price,
        amount=amount,
        fee=fee,
    )


def _extract_list(page: dict[str, Any]) -> list[dict[str, Any]]:
    """Handle both ``{results: [...]}`` and ``{data: [...]}`` shapes."""
    for key in ("results", "data"):
        items = page.get(key)
        if isinstance(items, list):
            return [i for i in items if isinstance(i, dict)]
    return []


def _extract_next(page: dict[str, Any]) -> str | None:
    """Pull the next-page URL from either DRF ``next`` or JSON:API ``meta.next_page``."""
    nxt = page.get("next")
    if isinstance(nxt, str) and nxt:
        return nxt
    meta = page.get("meta")
    if isinstance(meta, dict):
        meta_nxt = meta.get("next_page") or meta.get("next")
        if isinstance(meta_nxt, str) and meta_nxt:
            return meta_nxt
    return None


class InvestEngineProvider:
    """Concrete Provider for InvestEngine.

    Only one logical account per user (one ISA, GBP). The token expiry is
    tracked at the Python layer: the CLI alerts 3 days before expiry, and
    this class fails fast if the clock says the token is already dead
    (cheaper than burning a request for the same 401).
    """

    name = "invest-engine"

    def __init__(
        self,
        *,
        bearer_token: str,
        token_expires_at: datetime,
        base_url: str = _DEFAULT_BASE_URL,
        transport: httpx.AsyncBaseTransport | None = None,
    ) -> None:
        self._token = bearer_token
        self._token_expires_at = token_expires_at
        self._client = httpx.AsyncClient(
            base_url=base_url,
            timeout=30.0,
            transport=transport,
            headers={
                "Authorization": f"Bearer {bearer_token}",
                "User-Agent": _USER_AGENT,
            },
        )
        self._version_minor: int | None = None

    def accounts(self) -> list[Account]:
        return [_ACCOUNT]

    async def close(self) -> None:
        await self._client.aclose()

    async def _active_version(self, *, force: bool = False) -> int:
        """Cache the live API minor. Set ``force=True`` after a 410 to re-probe."""
        if self._version_minor is not None and not force:
            return self._version_minor
        start = (self._version_minor + 1) if force and self._version_minor else _START_VERSION_MINOR
        minor = await _probe_version(self._client, start_minor=start)
        self._version_minor = minor
        return minor

    async def _request_json(self, path: str, params: dict[str, str] | None = None) -> Any:
        """GET ``path`` (relative), return JSON. Handles one 410 re-probe retry."""
        resp = await self._client.get(path, params=params)
        if resp.status_code == 401:
            raise InvestEngineTokenExpiredError(
                f"InvestEngine rejected Bearer token (HTTP 401 on {path}); "
                f"token_expires_at={self._token_expires_at.isoformat()}. "
                f"Viktor must paste a new token into Vault.")
        if resp.status_code == 410:
            # Version rolled mid-session. Re-probe, retarget path, retry once.
            old_minor = self._version_minor
            new_minor = await self._active_version(force=True)
            if old_minor is not None and new_minor != old_minor:
                new_path = path.replace(f"/v0.{old_minor}/", f"/v0.{new_minor}/", 1)
                retry = await self._client.get(new_path, params=params)
                if retry.status_code == 401:
                    raise InvestEngineTokenExpiredError(f"InvestEngine 401 on retry {new_path}")
                if retry.status_code != 200:
                    raise InvestEngineError(
                        f"InvestEngine {new_path} HTTP {retry.status_code} after re-probe")
                return retry.json()
            raise InvestEngineError(f"InvestEngine 410 on {path} and no newer version")
        if resp.status_code != 200:
            raise InvestEngineError(
                f"InvestEngine {path} HTTP {resp.status_code}: {resp.text[:200]}")
        return resp.json()

    async def fetch(
        self,
        *,
        since: datetime | None = None,
        before: datetime | None = None,
    ) -> AsyncIterator[Activity]:
        # Fail fast if the token is already known-dead.
        if self._token_expires_at <= datetime.now(UTC):
            # Note the trailing ". " so the two message fragments do not run together.
            raise InvestEngineTokenExpiredError(
                f"InvestEngine token expired at {self._token_expires_at.isoformat()}. "
                f"Viktor must paste a new token.")
        version = await self._active_version()
        portfolio_ids = await self._list_portfolio_ids(version)
        for pid in portfolio_ids:
            async for activity in self._fetch_portfolio(version, pid, since, before):
                yield activity

    async def _list_portfolio_ids(self, version: int) -> list[str]:
        """Walk `/portfolios/` pagination and return the list of ids."""
        ids: list[str] = []
        path: str | None = f"/api/v0.{version}/portfolios/"
        while path is not None:
            page = await self._request_json(path)
            if not isinstance(page, dict):
                log.warning("invest-engine: /portfolios/ returned non-dict %r", type(page))
                break
            for item in _extract_list(page):
                pid = item.get("id")
                if pid is None:
                    continue
                ids.append(str(pid))
            path = _next_page_path(_extract_next(page), current=path)
        return ids

    async def _fetch_portfolio(
        self,
        version: int,
        portfolio_id: str,
        since: datetime | None,
        before: datetime | None,
    ) -> AsyncIterator[Activity]:
        params: dict[str, str] = {"portfolio": portfolio_id}
        if since is not None:
            params["start"] = since.date().isoformat()
        if before is not None:
            params["end"] = before.date().isoformat()
        path: str | None = f"/api/v0.{version}/transactions/"
        while path is not None:
            page = await self._request_json(
                path, params=params if path.endswith("/transactions/") else None)
            if not isinstance(page, dict):
                break
            for raw in _extract_list(page):
                activity = _transaction_to_activity(raw)
                if activity is None:
                    continue
                if since is not None and activity.date < since:
                    continue
                if before is not None and activity.date >= before:
                    continue
                yield activity
            path = _next_page_path(_extract_next(page), current=path)


def _next_page_path(raw_next: str | None, *, current: str) -> str | None:
    """Normalise a ``next`` URL to a request path.

    IE might emit full URLs (``https://investengine.com/api/v0.32/...``)
    or relative paths. We return the path component only — the httpx
    client holds the base URL.
    """
    if raw_next is None:
        return None
    if raw_next.startswith("http://") or raw_next.startswith("https://"):
        from urllib.parse import urlparse
        parsed = urlparse(raw_next)
        path = parsed.path
        if parsed.query:
            path = f"{path}?{parsed.query}"
        return path
    return raw_next