broker-sync/docs/specs/2026-05-26-ibkr-ingest-design.md
Viktor Barzin 30af5fe2c9
Some checks failed
CI / test (push) Waiting to run
CI / build (push) Blocked by required conditions
CI / deploy (push) Blocked by required conditions
ci/woodpecker/push/build Pipeline was canceled
docs(ibkr): change Flex date range from Last Business Day → Last 90 Days
Trailing window backed by SyncRecordStore dedup is strictly better than
a single-day window — a single missed cron run with Last Business Day
loses that day's activity permanently. SyncRecordStore is keyed by
ibkr:trade:<tradeID> / ibkr:cash:<transactionID>, so overlapping pulls
are no-ops.

Caught during the brainstorming review.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-27 09:11:57 +00:00

328 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# IBKR Flex Ingestion — Design
**Date:** 2026-05-26
**Status:** Approved (brainstorming session 2026-05-26)
**Author:** Viktor + Claude (Opus 4.7)
**Implementation plan:** TBD (will be written next session via writing-plans skill)
## Context
Adds Interactive Brokers (IBKR UK / IE — stocks/ETFs only) as a new
broker-sync provider, pushing activities to Wealthfolio on a daily
schedule alongside the existing Trading 212 / InvestEngine / Fidelity
pipelines.
The user's IBKR account is **currently empty** (no positions, no trades).
This design covers the integration as it will run once the account is
funded and active. The initial backfill step in the setup checklist is a
no-op until the first IBKR trade.
This work is the structural follow-on from the 2026-05-26 Wealthfolio
dedup session, in which £252k of duplicated InvestEngine positions
accumulated silently in WF because the IMAP and API ingestion paths
emitted different `external_id` schemes and never reconciled against
broker-reported truth. The IBKR design bakes in **broker-vs-WF position
reconciliation from day one** — the missing capability that allowed the
IE drift to grow undetected.
## Decisions
### D1 — Use IBKR Flex Web Service (not Client Portal API / TWS)
Flex Web Service is a token-authenticated REST endpoint returning XML
statements. Suits unattended cron because:
- One-year token validity (no daily re-auth, unlike Client Portal Gateway).
- No sidecar / GUI / Java runtime needed.
- Designed for periodic batch reporting — the exact shape of our pipeline.
Client Portal Web API + `ibind` was considered and rejected: its Gateway
sidecar requires browser-based re-auth roughly every 24 hours, which is
incompatible with unattended scheduling.
### D2 — Library: `ibflex` (`csingley/ibflex` on PyPI)
Adds `ibflex = "^0.16"` to `pyproject.toml`. The library provides:
- `client.download(token, query_id) -> bytes` — handles Flex's 2-step
async API (`SendRequest``GetStatement` polling).
- `parser.parse(xml) -> FlexQueryResponse` — typed dataclasses for
`Trades`, `CashTransactions`, `OpenPositions`, `SecuritiesInfo`.
Fallback (Approach B): if `ibflex` proves to lag IBKR schema changes, drop
in raw `httpx` + `xml.etree`. Same provider shape; only the parsing
internals change.
### D3 — One CronJob, daily 02:00 UK, in `broker-sync` namespace
Matches the existing `broker-sync-trading212` cadence and placement. No
new namespace, no new image.
### D4 — Reconciliation is mandatory, not optional
Every run computes a per-asset quantity from the Flex
`OpenPositions` section and compares against WF's computed quantity from
activities. Drift is published as a Pushgateway metric. Cross-checking
broker truth is the line of defense against the IE-style silent
divergence we saw on 2026-05-26.
### D5 — One account, one query
Single Flex Activity Query covering Trades + Cash + Open Positions +
Securities. Single `Interactive Brokers (UK)` account in Wealthfolio.
Multiple accounts can be added later by parameterising the CLI command;
not in scope now.
## Architecture
```
broker-sync K8s namespace
├── CronJob broker-sync-ibkr (schedule: 0 2 * * *)
│ ├── env from broker-sync-secrets:
│ │ IBKR_FLEX_TOKEN, IBKR_FLEX_QUERY_ID, IBKR_ACCOUNT_ID,
│ │ WF_BASE_URL, WF_USERNAME, WF_PASSWORD
│ ├── PVC broker-sync-data-encrypted (shared with other broker-sync jobs)
│ └── image viktorbarzin/broker-sync:<tag> command = ["broker-sync", "ibkr"]
│ External calls
│ ├── HTTPS → ndcdyn.interactivebrokers.com (Flex Web Service)
│ ├── HTTP → wealthfolio.wealthfolio.svc (activities import + position read)
│ └── HTTP → pushgateway.monitoring.svc (drift + last-success metrics)
```
The provider is structurally identical to `broker-sync-trading212` and
the IE bearer-token path — same Vault → CronJob → provider → pipeline →
WF flow. Existing alerting (CronJob-failed, ExternalSecret-stale,
WF-sync-stale) applies transitively; we only add IBKR-specific alerts on
top.
## Components
| Path | Action | Description |
|---|---|---|
| `broker_sync/providers/ibkr.py` | NEW | `IBKRProvider` class implementing the `Provider` protocol. Maps Flex XML to `Activity[]`. ~200 LOC. |
| `broker_sync/cli.py` | MODIFY | New `@app.command("ibkr")` typer command, parallel to `trading212` and `invest-engine`. ~60 LOC. |
| `pyproject.toml` | MODIFY | Add `ibflex = "^0.16"` dependency. |
| `tests/providers/test_ibkr.py` | NEW | Fixture-based parsing tests, sign-conventions, position-drift math, account-id guard. |
| `infra/stacks/broker-sync/main.tf` | MODIFY | New `kubernetes_cron_job_v1.ibkr` resource. |
| Vault `secret/broker-sync` | MODIFY | Add `ibkr_flex_token`, `ibkr_flex_query_id`, `ibkr_account_id`. |
| Wealthfolio (one-time, manual) | NEW data | Create `Interactive Brokers (UK)` account; record its UUID in Vault. |
| `docs/providers/ibkr.md` | NEW | Production-facing provider docs (setup, query design, troubleshooting). Written after first successful run. |
## Data flow (per CronJob run)
1. **02:00 UK** — CronJob fires, pod starts with env from `broker-sync-secrets`.
2. **Download**`ibflex.client.download(token, query_id)` calls Flex
Web Service `SendRequest` + `GetStatement`. Typical 520 s. Library
handles retry/polling.
3. **Parse**`ibflex.parser.parse(xml)` produces a
`FlexQueryResponse`.
4. **Account guard** — two distinct identifiers exist:
- **IBKR_ACCOUNT_ID_UPSTREAM**: the IBKR-side account number
(e.g. `U12345678`), used to validate that the Flex report belongs to
the right account.
- **IBKR_ACCOUNT_ID** (alias: `ibkr_account_id` in Vault): the
Wealthfolio account UUID (e.g. `8a3f...`), used when posting
activities to WF.
Validate `stmt.accountId == os.environ["IBKR_ACCOUNT_ID_UPSTREAM"]`.
Refuse to ingest on mismatch — prevents wrong-account writes from a
misconfigured query.
5. **Map Trades → Activities**:
| Flex | Activity | Notes |
|---|---|---|
| `Trade.tradeID` | `external_id = "ibkr:trade:" + tradeID` | dedup key |
| `Trade.tradeDate + tradeTime` | `date` (UTC) | timezone normalised |
| `Trade.symbol` | `symbol` | canonicalised — LSE tickers get `.L` suffix |
| `Trade.buySell` (BUY / SELL) | `activity_type` | direct |
| `Trade.quantity` | `quantity` | always positive (broker-sync convention) |
| `Trade.tradePrice` | `unit_price` | |
| `Trade.currency` | `currency` | per-trade, multi-ccy supported |
| `Trade.ibCommission` | `fee = abs(ibCommission)` | always positive |
| `Trade.assetCategory` | (sanity check; skip if not in {STK, ETF}) |
6. **Map CashTransactions → Activities**:
| Flex `CashTransaction.type` | Activity `activity_type` | Notes |
|---|---|---|
| `Dividends` | `DIVIDEND` | |
| `Withholding Tax` | `FEE` | tag with `notes="wht:..."` |
| `Broker Interest Paid` | `FEE` | negative direction |
| `Broker Interest Received` | `DIVIDEND` | interest treated as income |
| `Deposits & Withdrawals` | `DEPOSIT` (amount > 0) or `WITHDRAWAL` (amount < 0) | |
| `Commission Adjustments` | `FEE` | |
| anything else | skip + log WARNING with the unknown type | refuse to guess, same convention as IE provider |
external_id = `"ibkr:cash:" + transactionID`.
7. **Cash-flow match** `_with_cash_flow_match(a)` from the shared
pipeline emits a matching DEPOSIT for every BUY (and WITHDRAWAL for
every SELL) so WF cash balance stays consistent. This is the existing
pattern used by T212 + IE; IBKR slots in identically.
8. **Dedup** `SyncRecordStore(/data/sync.db)` skips any `external_id`
already synced. Idempotent re-runs are safe.
9. **Import** `WealthfolioSink.import_activities(...)` POSTs to
`/api/v1/activities/import`. Existing 401 retry logic applies.
10. **Reconciliation** for each `OpenPositions` row:
```python
# compute_wf_position_qty: NEW helper in WealthfolioSink.
# Queries POST /api/v1/activities/search filtered by accountId, sums
# BUY/SELL/ADD_HOLDING/REMOVE_HOLDING quantities per asset.
wf_qty_by_asset = wf_sink.compute_position_qty(IBKR_ACCOUNT_ID)
for pos in flex_response.OpenPositions:
symbol = canonical_symbol(pos.symbol)
drift = float(pos.position) - wf_qty_by_asset.get(symbol, Decimal(0))
push_metric(
"ibkr_position_drift_shares",
labels={"symbol": symbol, "account": "ibkr-uk"},
value=float(drift),
)
push_metric("ibkr_sync_last_success_timestamp_seconds", time.time())
```
11. **Exit 0** on success, non-zero on any unrecoverable error.
## Error handling
| Failure | Detection | Response | Alert |
|---|---|---|---|
| Token expired (Flex code 1003) | `ibflex.client.ResponseCodeError` | Exit non-zero with explicit log | `IBKRFlexTokenExpired` Loki rule + stale-success Prom alert |
| Statement generation timeout | `ibflex.client.StatementGenerationTimeout` | Retry once after 60 s, then exit non-zero | Stale-success alert catches it after 24 h |
| Empty report (quiet day) | Zero Trades + zero CashTxns | Log "no new activity", still update success timestamp, still reconcile | (none — happy path) |
| WF API 401 | HTTP status | Re-login via `WealthfolioSink` (existing logic) | (existing) |
| WF rejects an activity row | `summary.skipped > 0` | Log per-row + exit non-zero | `IBKRImportRejected` Loki rule |
| Network / DNS fail | httpx exception | Retry once with 30 s backoff | `KubeJobFailed` (existing) |
| **Position drift > 0.01 share for >24h** | Pushgateway non-zero across runs | Prom alert `IBKRPositionDrift{symbol}` warning → Slack `#security` | **NEW capability** |
| Account ID mismatch | Flex `accountId` != env var | Exit 2 immediately, write nothing | `IBKRAccountMismatch` urgent Loki rule |
## Setup checklist (one-time)
### Step 1 — IBKR Client Portal (manual, ~5 min)
1. Sign in at `https://www.interactivebrokers.co.uk/` → **Account
Settings**.
2. **Reports → Settings → Flex Web Service** → Enable → copy the
one-time-displayed **Token** (1 year validity).
3. **Reports → Flex Queries → Activity Flex Query → Create New**:
- Name: `broker-sync-activity`
- Sections: `Account Information`, `Trades`, `Cash Transactions`,
`Open Positions`, `Securities Information`
- Date Format: `yyyy-MM-dd` · Time Format: `HH:mm:ss TimeZone`
- Date Range: `Last 90 Days` — trailing window so a missed cron run
(failed pod, outage, vacation) doesn't lose data. SyncRecordStore
keys on `ibkr:trade:<tradeID>` / `ibkr:cash:<transactionID>`, so
overlapping pulls are no-ops. `Last Business Day` was the original
choice but creates a "single missed run = permanent data loss"
failure mode — rejected in favour of dedup-backed resync window.
- Format: XML
- Trade fields: ensure `tradeID`, `tradeDate`, `tradeTime`, `symbol`,
`buySell`, `quantity`, `tradePrice`, `currency`, `ibCommission`,
`assetCategory` selected.
- CashTransaction fields: `transactionID`, `dateTime`, `type`,
`amount`, `currency`, `description`.
- OpenPositions fields: `symbol`, `position`, `markPrice`, `currency`,
`assetCategory`.
- Save → copy the **Query ID** (57 digit number).
### Step 2 — Vault
```bash
vault kv patch secret/broker-sync \
ibkr_flex_token='YOUR_TOKEN' \
ibkr_flex_query_id='YOUR_QUERY_ID' \
ibkr_account_id='WF_UUID_FROM_STEP_3' \
ibkr_account_id_upstream='YOUR_IBKR_ACCOUNT_NUMBER'
```
### Step 3 — Create WF account (script + paste UUID back)
```bash
# Login → POST /accounts → capture id
curl -sS -c /tmp/wf-jar -X POST "$WF_BASE_URL/api/v1/auth/login" \
-H 'Content-Type: application/json' -d "{\"password\":\"$WF_PASSWORD\"}"
curl -sS -b /tmp/wf-jar -X POST "$WF_BASE_URL/api/v1/accounts" \
-H 'Content-Type: application/json' \
-d '{"name":"Interactive Brokers (UK)","accountType":"GIA","currency":"GBP","isActive":true}' \
| jq -r '.id'
# Paste the UUID back into Vault under ibkr_account_id
```
### Step 4 — Initial backfill (skip while account is empty)
When the IBKR account first holds positions, the daily CronJob will
backfill automatically up to the 90-day trailing window. For older
history, temporarily switch the Flex query Date Range to
`Year to Date` (or `Custom Date Range` with a 1-year window), run the
CronJob manually once, verify WF totals match the broker app, then
switch the Flex query back to `Last 90 Days` for daily incremental.
Dedup makes the temporary widening safe — already-synced rows are no-ops.
### Step 5 — Deploy
1. Push to broker-sync `main` (direct push — personal repo convention,
no PR) → GHA builds `viktorbarzin/broker-sync:latest`.
2. `cd infra/stacks/broker-sync && scripts/tg apply` creates the new
CronJob.
3. Wait for the 02:00 UK run, or trigger manually:
`kubectl -n broker-sync create job --from=cronjob/broker-sync-ibkr broker-sync-ibkr-test-1`.
4. Verify in WF UI: account exists, activities present (if any),
reconciliation drift metric showing zero.
## Testing
**Unit tests** in `tests/providers/test_ibkr.py`:
- `test_parse_trades_maps_to_activities` — canned 3-trade XML, verify
external_id, symbol mapping, quantity sign, fee sign.
- `test_parse_dividend_maps_to_dividend_activity`.
- `test_parse_unknown_cash_type_logs_warning_and_skips`.
- `test_account_id_mismatch_raises` — Flex returns a different
`accountId` than env, refuse to ingest.
- `test_position_drift_computed_correctly` — three-asset scenario, two
match, one drifts.
- `test_canonical_symbol_lse_suffix` — `VUAG` → `VUAG.L`,
`AAPL` → `AAPL` (US, no suffix), etc.
All tests mock `ibflex.client.download` to avoid network.
**Integration test** (manual, post-deploy):
- Trigger CronJob manually.
- Inspect logs.
- Verify in WF UI and Pushgateway.
## Acceptance criteria
- [ ] `broker-sync ibkr` command runs end-to-end against the real Flex Web
Service with the user's token.
- [ ] WF accepts the resulting activity imports (no `summary.skipped`).
- [ ] `ibkr_position_drift_shares` is published for every asset; drift = 0
on a steady-state run.
- [ ] Re-running the command is idempotent — no duplicate activities
written to WF.
- [ ] CronJob completes successfully on its schedule for 7 consecutive days
before the design is marked Done.
## Out of scope
- Multi-account support (only one IBKR account designed in).
- Real-time data / order placement (Flex is batch-only).
- Stock split / corporate action handling — IBKR reports these in the
Flex `CorporateActions` section but we're not enabling that section
yet; revisit if it becomes needed.
- Multi-currency FX conversion math — we record per-trade currency
faithfully and let Wealthfolio do FX. If WF's FX handling proves
inadequate, a separate spec covers that.
## Open questions
(None at design-approval time. Captured here for future amendments.)
## References
- `ibflex` library docs (csingley/ibflex)
- Existing patterns in `broker_sync/providers/trading212.py` and
`broker_sync/providers/invest_engine.py`
- `~/code/infra/stacks/broker-sync/main.tf` (CronJob structure to mirror)
- 2026-05-26 Wealthfolio dedup session (motivates the reconciliation step)