imap: skip InvestEngine by default; opt back in via INCLUDE env

Post-mortem 2026-05-27: 39 IMAP-source IE BUYs + their cash-flow
DEPOSITs were re-inserted into Wealthfolio at 09:22:18 UTC, exactly
the rows the £252k dedup removed the previous day. The cron's
BROKER_SYNC_IMAP_EXCLUDE_PROVIDERS=invest-engine env var did its job
(cron logged ie_skipped=53), but some other entry point — kubectl run,
poetry run on the devvm, or a sibling agent session — ran the IMAP
ingest WITHOUT that env. The opt-out was a foot-gun.

This change makes the IE-via-IMAP safety STRUCTURAL: `invest-engine`
is in the default exclude set inside _resolve_excluded_providers().
Any code path now skips IE unless the caller explicitly sets
`BROKER_SYNC_IMAP_INCLUDE_PROVIDERS=invest-engine`. The
`BROKER_SYNC_IMAP_EXCLUDE_PROVIDERS` env still works (additive) for
forward-compat in case Schwab etc. ever need similar treatment.

INCLUDE wins over both the default exclude set and EXCLUDE env.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-27 17:24:54 +00:00
parent 3427f5c9e1
commit 0d23487608
2 changed files with 94 additions and 17 deletions

@@ -151,14 +151,41 @@ def _fetch_all(creds: ImapCreds) -> Iterator[bytes]:
         yield raw
 
 
-def fetch_activities(creds: ImapCreds) -> list[Activity]:
-    out: list[Activity] = []
-    ie_parsed = schwab_parsed = ie_skipped = skipped = 0
-    exclude = {
-        p.strip().lower()
+def _resolve_excluded_providers() -> set[str]:
+    """Return the set of providers the IMAP fetcher must skip.
+
+    Default-exclude list is structural: `invest-engine` is ALWAYS skipped
+    unless explicitly opted back in via `BROKER_SYNC_IMAP_INCLUDE_PROVIDERS`.
+    This protects against accidental re-ingestion via any code path that
+    doesn't set the cron's env (e.g. `kubectl run --rm`, devvm `poetry run`,
+    a sibling agent session). See post-mortem 2026-05-27: the IMAP path
+    re-inserted 39 IE BUYs that had been deduped the previous day, because
+    the safety lived only on the cronjob spec.
+
+    Additional providers can be excluded via
+    `BROKER_SYNC_IMAP_EXCLUDE_PROVIDERS`. `INCLUDE` always wins over
+    `EXCLUDE` and the default skip-list.
+    """
+    _DEFAULT_EXCLUDED = {"invest-engine", "invest_engine"}
+    extra = {
+        p.strip().lower().replace("_", "-")
         for p in os.environ.get("BROKER_SYNC_IMAP_EXCLUDE_PROVIDERS", "").split(",")
         if p.strip()
     }
+    include = {
+        p.strip().lower().replace("_", "-")
+        for p in os.environ.get("BROKER_SYNC_IMAP_INCLUDE_PROVIDERS", "").split(",")
+        if p.strip()
+    }
+    # Canonicalise the default set under the same key normalisation.
+    canonical = {p.replace("_", "-") for p in _DEFAULT_EXCLUDED}
+    return (canonical | extra) - include
+
+
+def fetch_activities(creds: ImapCreds) -> list[Activity]:
+    out: list[Activity] = []
+    ie_parsed = schwab_parsed = ie_skipped = skipped = 0
+    exclude = _resolve_excluded_providers()
     for raw in _fetch_all(creds):
         try:
             msg = email.message_from_bytes(raw)
@@ -167,7 +194,7 @@ def fetch_activities(creds: ImapCreds) -> list[Activity]:
             continue
         sender = _extract_sender(msg)
         if sender in _IE_SENDERS or sender.endswith("@investengine.com"):
-            if "invest-engine" in exclude or "invest_engine" in exclude:
+            if "invest-engine" in exclude:
                 ie_skipped += 1
                 continue
             out.extend(ie_parser.parse_invest_engine_email(raw))