Viktor asked to forward arbitrary emails with PDF attachments into paperless-ngx, with the forwarding sender mapping 1:1 to the paperless account that owns the document. paperless-ngx's built-in IMAP consumer already does the sender->owner mapping, so the infra half is a dedicated real mailbox docs@viktorbarzin.me: an explicit self-alias (the @domain catch-all would otherwise divert it into the TripIt-swept spam@ mailbox, whose sweeper LLM-parses and auto-replies to mail from linked senders) plus a per-user Dovecot sieve that discards non-family senders at delivery (chosen behaviour for unmatched senders: ignore and delete; also keeps spam out of the guessable address). The mailbox credential was added to Vault secret/platform.mailserver_accounts. Paperless-side mail account + 5 per-sender rules are DB state, configured via the API per the new runbook docs/runbooks/paperless-mail-ingest.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
116 lines
5.9 KiB
Markdown
# Paperless-ngx Mail Ingest (docs@viktorbarzin.me)

Last updated: 2026-07-03 (initial build)

Forward any email with document attachments to **`docs@viktorbarzin.me`** and
paperless-ngx ingests the attachments, owned by the paperless account mapped
from the **sender** (From) address. Built entirely from existing parts: a
docker-mailserver mailbox + Dovecot sieve, and paperless-ngx's native mail
consumer (the same machinery as the `utility:` rules).

## Flow

```
family member forwards email ──> MX ──> docker-mailserver
│ postfix virtual: docs@ has an explicit self-alias (extra/aliases.txt),
│ so the @domain catch-all (→ spam@, swept by TripIt) does NOT apply
▼
Dovecot LMTP delivery to docs@
│ per-user sieve (docs@viktorbarzin.me.dovecot.sieve): sender NOT in
│ allowlist → discard (decision 2026-07-03: unmatched = ignore & delete)
▼
docs@ INBOX ── paperless-ngx mail task (every 10 min, PAPERLESS_EMAIL_TASK_CRON
│ default) applies mail rules in order: filter_from = <sender>
│ → consume attachments (attachments-only: inline images like
│ signature logos are skipped), owner = mapped user,
│ tag = email-ingest, title = mail subject
▼
consumed mail is MOVED to the "Processed" IMAP folder (audit trail);
INBOX stays empty in steady state
```

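The two decision points in the flow (delivery gate, then per-sender rule) can be sketched in Python as below. This is only an illustrative model, not the deployed logic: the real gate is the Dovecot sieve and the real owner map is the set of paperless mail rules. The names `ALLOWED_SENDERS`, `OWNER_BY_SENDER`, `Outcome` and `route` are hypothetical, exact lowercase matching is a simplification, and the "left in INBOX" outcome for an allowlisted sender without a rule is an assumption about how a mismatch would show up.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative copy of the sender map from this runbook. The sources of truth
# remain the sieve file (gate) and the paperless mail rules (owner map).
OWNER_BY_SENDER = {
    "me@viktorbarzin.me": "root",
    "vbarzin@gmail.com": "root",
    "viktorbarzin@meta.com": "root",
    "ancaelena98@gmail.com": "anca",
    "emil.barzin@gmail.com": "emo",
}
# In steady state the gate allowlist and the owner map cover the same senders.
ALLOWED_SENDERS = set(OWNER_BY_SENDER)


@dataclass
class Outcome:
    stage: str                   # "discarded", "left_in_inbox" or "consumed"
    owner: Optional[str] = None  # paperless user that would own the documents


def route(from_address: str) -> Outcome:
    """Model what happens to a mail sent to docs@ from `from_address`."""
    sender = from_address.strip().lower()

    # Stage 1: delivery gate. Senders outside the allowlist are discarded
    # before the message ever reaches the mailbox.
    if sender not in ALLOWED_SENDERS:
        return Outcome("discarded")

    # Stage 2: paperless mail rules. Assumption: with no matching rule the
    # message simply stays in INBOX, which is why the two lists must agree.
    owner = OWNER_BY_SENDER.get(sender)
    if owner is None:
        return Outcome("left_in_inbox")

    # Attachments are consumed and the mail is moved to "Processed".
    return Outcome("consumed", owner)
```

For example, `route("stranger@example.com")` yields a `discarded` outcome, while `route("ancaelena98@gmail.com")` yields `consumed` with owner `anca`.
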
## Sender → paperless account map (as built)

| Sender (From)         | Paperless user | Rule                    |
|-----------------------|----------------|-------------------------|
| me@viktorbarzin.me    | root (id 3)    | forward: Viktor (me@)   |
| vbarzin@gmail.com     | root (id 3)    | forward: Viktor (gmail) |
| viktorbarzin@meta.com | root (id 3)    | forward: Viktor (meta)  |
| ancaelena98@gmail.com | anca (id 4)    | forward: Anca           |
| emil.barzin@gmail.com | emo (id 7)     | forward: Emo            |

The map lives in **two places by design**; keep them in sync:

1. **Delivery gate (infra, Terraform):**
   `stacks/mailserver/modules/mailserver/extra/docs-at-viktorbarzin.me.dovecot.sieve`.
   Senders not listed here are discarded at delivery (spam control + the
   "ignore and delete unmatched" behaviour; paperless cannot express
   "delete without ingesting", so this must happen before the mailbox).
2. **Owner map (paperless DB, via API/UI):** one mail rule per sender on the
   `docs@viktorbarzin.me` mail account. This is DB state, like workflows, NOT
   Terraform.

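Because the two copies of the map can drift apart, a small check along the following lines may help. It is a sketch under assumptions: the sieve allowlist is assumed to hold the addresses as double-quoted strings, and `mail_rules` is assumed to be the list of rule objects for the `docs@` account as returned by `GET /api/mail_rules/`, each with a `filter_from` field. The function names are hypothetical.

```python
import re

# Loose email shape, only used to tell addresses apart from other quoted
# sieve strings such as extension names or header names.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def sieve_senders(sieve_text: str) -> set[str]:
    """Collect every double-quoted email address found in the sieve script."""
    quoted = re.findall(r'"([^"]*)"', sieve_text)
    return {q.lower() for q in quoted if EMAIL_RE.fullmatch(q)}


def rule_senders(mail_rules: list[dict]) -> set[str]:
    """Collect the filter_from value of every rule that has one."""
    return {r["filter_from"].lower() for r in mail_rules if r.get("filter_from")}


def drift(sieve_text: str, mail_rules: list[dict]) -> dict[str, set[str]]:
    """Report senders present in only one of the two places."""
    gate = sieve_senders(sieve_text)
    rules = rule_senders(mail_rules)
    return {
        # Allowed at delivery but no paperless rule would pick the mail up.
        "gate_only": gate - rules,
        # A rule exists but the mail would be discarded before it matters.
        "rules_only": rules - gate,
    }
```

Both sets being empty means the gate and the owner map agree. Pass only the rules that belong to the `docs@` mail account, otherwise unrelated rules (for example the `utility:` ones) would show up under `rules_only`.
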
## Add a family member / sender

1. Add the address to the sieve allowlist file above; commit; apply the
   `mailserver` stack (a normal apply is enough: the sieve CM key is not under
   `ignore_changes`, and Reloader restarts the pod).
2. Clone an existing `forward:` mail rule in the paperless admin UI
   (Mail → Rules) or via API, changing `filter_from` and the rule **owner**
   (documents are owned by the rule owner: `assign_owner_from_rule=true`).
   Keep: action = Move to `Processed`, attachment type = attachments-only,
   consumption scope = attachments only, tag `email-ingest`, order after the
   existing rules.

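When cloning via the API, step 2 amounts to copying an existing rule object and overriding a handful of fields. The sketch below builds such a payload from a rule fetched with `GET /api/mail_rules/`; it does not send anything. The helper name `clone_forward_rule` and the `READ_ONLY_FIELDS` tuple are hypothetical, and the assumption is that the fetched object exposes `name`, `filter_from`, `owner`, `order` and `assign_owner_from_rule` under those names.

```python
import copy

# Assumption: "id" is server-assigned and must not be sent when creating a rule.
READ_ONLY_FIELDS = ("id",)


def clone_forward_rule(template: dict, *, name: str, sender: str,
                       owner_id: int, order: int) -> dict:
    """Return a payload for a new rule based on an existing forward: rule.

    Everything not overridden here (action, target folder, attachment type,
    consumption scope, tags) is inherited unchanged from the template.
    """
    payload = copy.deepcopy(template)  # never mutate the fetched rule
    for field in READ_ONLY_FIELDS:
        payload.pop(field, None)
    payload.update(
        name=name,
        filter_from=sender,
        owner=owner_id,               # documents are owned by the rule owner
        order=order,                  # place it after the existing rules
        assign_owner_from_rule=True,
    )
    return payload
```

The result would then be sent as the JSON body of a create request against the mail rules endpoint with the admin token; checking the new rule in Mail → Rules afterwards is a cheap sanity test.
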
## Operations

- **Trigger a fetch immediately** (instead of waiting ≤10 min):
  `kubectl -n paperless-ngx exec deploy/paperless-ngx -c paperless-ngx -- python3 manage.py mail_fetcher`
- **Mailbox credentials:** Vault `secret/platform` → `mailserver_accounts`
  JSON, key `docs@viktorbarzin.me` (also used by the paperless mail account).
- **Inspect the mailbox:** connect over IMAPS (for example with a
  `python3 -c` one-liner) to `mailserver.mailserver.svc.cluster.local:993`
  (in-cluster, from a pod) or `mail.viktorbarzin.me:993` (externally / devvm).
- **Paperless-side logs:** `kubectl -n paperless-ngx logs deploy/paperless-ngx | grep -i mail`
  (also Loki, ns `paperless-ngx`). Rule/account state: `GET /api/mail_rules/`,
  `GET /api/mail_accounts/` with the admin token
  (k8s secret `paperless-ngx-secrets`, field `api_token`).
- **Account/mailbox provisioning:** adding/rotating anything in
  `mailserver_accounts` requires the ConfigMap replace workaround,
  `scripts/tg apply mailserver -- -replace=module.mailserver.kubernetes_config_map.mailserver_config`,
  because `postfix-accounts.cf` is under `ignore_changes`
  (non-deterministic bcrypt; see the module comment).

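For the mailbox inspection, a slightly longer form of the `python3` one-liner might look like the sketch below, using the standard-library `imaplib`. The helper names are hypothetical, and passing the password in as an argument (rather than reading it from Vault) is a simplification. In steady state INBOX should report 0 messages and `Processed` should grow over time.

```python
import imaplib


def folder_counts(conn, folders=("INBOX", "Processed")):
    """Return {folder: message count} using read-only SELECTs.

    A folder that cannot be selected (e.g. it does not exist yet) maps to None.
    """
    counts = {}
    for folder in folders:
        # readonly=True avoids changing any message flags while inspecting.
        status, data = conn.select(folder, readonly=True)
        counts[folder] = int(data[0]) if status == "OK" else None
    return counts


def inspect_docs_mailbox(password, host="mail.viktorbarzin.me", port=993):
    """Log in as docs@ over IMAPS and report per-folder message counts."""
    conn = imaplib.IMAP4_SSL(host, port)
    try:
        conn.login("docs@viktorbarzin.me", password)
        return folder_counts(conn)
    finally:
        conn.logout()
```

Calling `inspect_docs_mailbox(password)` from the devvm uses the external hostname by default; from a pod, pass the in-cluster service name as `host` instead.
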
## Design notes / caveats

- **Why not the catch-all?** Mail to unknown `@viktorbarzin.me` addresses
  lands in `spam@`, which the TripIt `ingest-plans` CronJob sweeps every
  15 min: it marks everything `\Seen`, LLM-parses mail from linked senders and
  replies with ack/failure emails. Forwarded bank statements would get
  "couldn't parse a trip" replies. `docs@` being a real mailbox bypasses that
  path entirely; TripIt, the `smoke-test@` roundtrip probe, and `dmarc@` are
  untouched.
- **Spoofing:** the sender match is on the From header. Rspamd verifies
  SPF/DKIM/DMARC on inbound mail, but gmail.com publishes DMARC `p=none`, so
  a failed check does not request rejection and a crafted spoof could ingest
  documents into a family member's account. Accepted risk (worst case:
  unwanted documents appear, visible + deletable in paperless).
- **Not PDF-only:** any real attachment type paperless supports is consumed
  (PDF, images, Office via the existing tika+gotenberg pipeline). Inline
  images are excluded by `attachment_type=1`.
- **No dedicated alerting** (deliberate, 2026-07-03): mail-task errors surface
  in paperless logs; the mailserver inbound path is covered by
  `email-roundtrip-monitor`. Revisit if forwards start silently failing.
- **Workflows:** the global `payslip-webhook` +
  `claude-mcp-readers auto-permission` workflows fire for mail-ingested docs
  like any other consumption source (verified pre-build; payslip receiver does
  its own filtering).

## Rollback

1. Disable/delete the 5 `forward:` mail rules + the `docs@` mail account
   (paperless admin UI or API).
2. Revert the infra commit (aliases.txt entry, sieve file, CM key + mount).
3. Remove `docs@viktorbarzin.me` from Vault `mailserver_accounts`, then apply
   with the `-replace` workaround above. Mail to docs@ then falls back to the
   catch-all (spam@) like any unknown address.