infra/docs/runbooks/paperless-mail-ingest.md
Viktor Barzin 68b9858eff
paperless-mail-ingest runbook: manual mail_fetcher must drop to the paperless user
A root-run kubectl exec mail_fetcher downloads attachments root-owned into
the scratch dir and the celery consumer (uid 1000) fails with
PermissionError — found during the build E2E. Document s6-setuidgid usage
and the recovery step.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 14:26:12 +00:00


# Paperless-ngx Mail Ingest (docs@viktorbarzin.me)

Last updated: 2026-07-03 (initial build)

Forward any email with document attachments to **`docs@viktorbarzin.me`** and
paperless-ngx ingests the attachments, owned by the paperless account mapped
from the **sender** (From) address. Built entirely from existing parts: a
docker-mailserver mailbox + Dovecot sieve, and paperless-ngx's native mail
consumer (the same machinery as the `utility:` rules).
## Flow
```
family member forwards email ──> MX ──> docker-mailserver
│ postfix virtual: docs@ has an explicit self-alias (extra/aliases.txt),
│ so the @domain catch-all (→ spam@, swept by TripIt) does NOT apply
Dovecot LMTP delivery to docs@
│ per-user sieve (docs@viktorbarzin.me.dovecot.sieve): sender NOT in
│ allowlist → discard (decision 2026-07-03: unmatched = ignore & delete)
docs@ INBOX ── paperless-ngx mail task (every 10 min, PAPERLESS_EMAIL_TASK_CRON
│ default) applies mail rules in order: filter_from = <sender>
│ → consume attachments (attachments-only: inline images like
│ signature logos are skipped), owner = mapped user,
│ tag = email-ingest, title = mail subject
consumed mail is MOVED to the "Processed" IMAP folder (audit trail);
INBOX stays empty in steady state
```
## Sender → paperless account map (as built)
| Sender (From)         | Paperless user | Rule                    |
|-----------------------|----------------|-------------------------|
| me@viktorbarzin.me    | root (id 3)    | forward: Viktor (me@)   |
| vbarzin@gmail.com     | root (id 3)    | forward: Viktor (gmail) |
| viktorbarzin@meta.com | root (id 3)    | forward: Viktor (meta)  |
| ancaelena98@gmail.com | anca (id 4)    | forward: Anca           |
| emil.barzin@gmail.com | emo (id 7)     | forward: Emo            |

The map lives in **two places by design**; keep them in sync:
1. **Delivery gate (infra, Terraform):**
`stacks/mailserver/modules/mailserver/extra/docs-at-viktorbarzin.me.dovecot.sieve`
— senders not listed here are discarded at delivery (spam control + the
"ignore and delete unmatched" behaviour; paperless cannot express
"delete without ingesting", so this must happen before the mailbox).
2. **Owner map (paperless DB, via API/UI):** one mail rule per sender on the
`docs@viktorbarzin.me` mail account. Like workflows, these rules are database
state and are NOT managed by Terraform.
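
Because the two copies can drift, a small comparison helps when auditing. The following is a minimal sketch, not part of the deployed setup. It assumes the allowlisted senders appear in the sieve script as quoted string literals, and that each mail rule from `GET /api/mail_rules/` carries a `filter_from` field (the response is assumed to be paginated under a `results` key). The `sample_sieve` text is illustrative only and is not the real file.

```python
import re

# Assumption: allowlisted senders are quoted string literals containing "@".
# Adjust the pattern if the real sieve file is laid out differently.
ADDRESS_RE = re.compile(r'"([^"\s@]+@[^"\s@]+)"')


def sieve_senders(sieve_text: str) -> set[str]:
    """Return the lower-cased addresses quoted in a sieve script."""
    return {match.lower() for match in ADDRESS_RE.findall(sieve_text)}


def rule_senders(mail_rules: list[dict]) -> set[str]:
    """Return the lower-cased filter_from values of the given mail rules."""
    return {
        rule["filter_from"].lower()
        for rule in mail_rules
        if rule.get("filter_from")
    }


def map_drift(sieve_text: str, mail_rules: list[dict]) -> tuple[list[str], list[str]]:
    """Return (gated_without_rule, rule_without_gate), both sorted."""
    gate = sieve_senders(sieve_text)
    rules = rule_senders(mail_rules)
    return sorted(gate - rules), sorted(rules - gate)


# Illustrative input only; the real sieve file may look different.
sample_sieve = '''
require ["envelope"];
if not address :is "from" ["me@viktorbarzin.me", "vbarzin@gmail.com"] {
    discard;
    stop;
}
'''
sample_rules = [
    {"name": "forward: Viktor (me@)", "filter_from": "me@viktorbarzin.me"},
    {"name": "forward: Emo", "filter_from": "emil.barzin@gmail.com"},
]
gate_only, rule_only = map_drift(sample_sieve, sample_rules)
```

A sender in `gate_only` is delivered to the mailbox but matches no rule, so its mail would sit in INBOX. A sender in `rule_only` has a rule that can never fire, because the sieve discards the mail first.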
## Add a family member / sender
1. Add the address to the sieve allowlist file above; commit; apply the
`mailserver` stack (normal apply is enough — the sieve CM key is not under
`ignore_changes`; Reloader restarts the pod).
2. Clone an existing `forward:` mail rule in the paperless admin UI
(Mail → Rules) or via API, changing `filter_from` and the rule **owner**
(documents are owned by the rule owner — `assign_owner_from_rule=true`).
Keep: action = Move to `Processed`, attachment type = attachments-only,
consumption scope = attachments only, tag `email-ingest`, order after the
existing rules.
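
For the API route in step 2, the clone can be prepared as a plain payload before it is posted. This is a hedged sketch: the field names (`id`, `user_can_change`, `owner`, `order`, `action_parameter`) are assumptions about the mail-rule JSON and should be checked against an actual `GET /api/mail_rules/` response. The new sender, owner id, and order value are hypothetical.

```python
import copy

# Assumption: these fields are assigned by the server and must not be sent
# back when creating a new rule.
SERVER_FIELDS = ("id", "user_can_change")


def clone_rule(existing: dict, name: str, filter_from: str,
               owner_id: int, order: int) -> dict:
    """Copy an existing forward: rule, changing only sender, owner, name, order.

    Everything else (action, target folder, attachment type, tags) is kept
    from the template rule, which is what the step above asks for.
    """
    payload = copy.deepcopy(existing)  # leave the template untouched
    for field in SERVER_FIELDS:
        payload.pop(field, None)
    payload.update(name=name, filter_from=filter_from,
                   owner=owner_id, order=order)
    return payload


# Template values taken from this runbook; the layout is an assumption.
template = {
    "id": 12,  # hypothetical rule id
    "name": "forward: Anca",
    "filter_from": "ancaelena98@gmail.com",
    "owner": 4,
    "order": 4,  # hypothetical order value
    "action_parameter": "Processed",
    "attachment_type": 1,
    "assign_owner_from_rule": True,
}

# Hypothetical new family member.
new_rule = clone_rule(template, name="forward: New member",
                      filter_from="new.member@example.com",
                      owner_id=9, order=6)
```

The resulting dict would be sent as the JSON body of a `POST` to the mail rules endpoint with the admin token. That request is left out here on purpose, since the exact set of required fields should be confirmed first.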
## Operations
- **Trigger a fetch immediately** (instead of waiting ≤10 min):
`kubectl -n paperless-ngx exec deploy/paperless-ngx -c paperless-ngx -- s6-setuidgid paperless python3 manage.py mail_fetcher`
The `s6-setuidgid paperless` prefix is **required**. `kubectl exec` runs as
root, and a root-run fetcher writes the downloaded attachments into the scratch
dir as root-owned files that the celery consumer (uid 1000) cannot read. The
consume task then ends in FAILURE with a `PermissionError` on
`/tmp/paperless/paperless-mail-*/...` (hit during the 2026-07-03 build E2E).
The mail correctly stays in INBOX for retry, because the move action is a chord
callback that runs only after successful consumption. To recover, run
`rm -rf /tmp/paperless/paperless-mail-*` as root and let the next scheduled
fetch re-process the mail.
- **Mailbox credentials:** Vault `secret/platform` → `mailserver_accounts`
JSON, key `docs@viktorbarzin.me` (also used by the paperless mail account).
- **Inspect the mailbox:** connect over IMAPS with a `python3 -c` one-liner to
`mailserver.mailserver.svc.cluster.local:993` (in-cluster, from a pod) or
`mail.viktorbarzin.me:993` (externally / devvm).
- **Paperless-side logs:** `kubectl -n paperless-ngx logs deploy/paperless-ngx | grep -i mail`
(also Loki, ns `paperless-ngx`). Rule/account state: `GET /api/mail_rules/`,
`GET /api/mail_accounts/` with the admin token
(k8s secret `paperless-ngx-secrets`, field `api_token`).
- **Account/mailbox provisioning:** adding/rotating anything in
`mailserver_accounts` requires the ConfigMap replace workaround —
`scripts/tg apply mailserver -- -replace=module.mailserver.kubernetes_config_map.mailserver_config`
— because `postfix-accounts.cf` is under `ignore_changes`
(non-deterministic bcrypt; see the module comment).
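
For the mailbox inspection above, a slightly longer form than a `python3 -c` one-liner may be easier to reuse. The sketch below uses only the standard-library `imaplib` and counts messages in `INBOX` and `Processed` with read-only selects, so nothing is marked `\Seen`. The function names are hypothetical, and certificate verification against the in-cluster service name is an assumption that may not hold if the certificate only covers the public hostname.

```python
import imaplib
import ssl


def folder_counts(conn, folders=("INBOX", "Processed")):
    """Return {folder: message_count} using read-only SELECTs.

    A folder that cannot be selected (for example, it does not exist yet)
    maps to None instead of raising.
    """
    counts = {}
    for folder in folders:
        status, data = conn.select(folder, readonly=True)
        counts[folder] = int(data[0]) if status == "OK" else None
    return counts


def connect(host, user, password, port=993):
    """Open an IMAPS session with default certificate verification."""
    context = ssl.create_default_context()
    conn = imaplib.IMAP4_SSL(host, port, ssl_context=context)
    conn.login(user, password)
    return conn
```

A typical call would be `folder_counts(connect("mail.viktorbarzin.me", "docs@viktorbarzin.me", password))`, followed by `conn.logout()`. In steady state `INBOX` should report 0, and a growing `INBOX` count suggests the paperless mail task is failing or no rule matches the sender.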
## Design notes / caveats
- **Why not the catch-all?** Mail to unknown `@viktorbarzin.me` addresses
lands in `spam@`, which the TripIt `ingest-plans` CronJob sweeps every
15 min: it marks everything `\Seen`, LLM-parses mail from linked senders and
replies with ack/failure emails. Forwarded bank statements would get
"couldn't parse a trip" replies. `docs@` being a real mailbox bypasses that
path entirely; TripIt, the `smoke-test@` roundtrip probe, and `dmarc@` are
untouched.
- **Spoofing:** the sender match is on the From header. Rspamd verifies
SPF/DKIM/DMARC on inbound mail, but gmail.com publishes `p=none`, so a
crafted spoof could ingest documents into a family member's account. Accepted
risk (worst case: unwanted documents appear, visible + deletable in
paperless).
- **Not PDF-only:** any real attachment type paperless supports is consumed
(PDF, images, Office via the existing tika+gotenberg pipeline). Inline
images are excluded by `attachment_type=1`.
- **No dedicated alerting** (deliberate, 2026-07-03): mail-task errors surface
in paperless logs; the mailserver inbound path is covered by
`email-roundtrip-monitor`. Revisit if forwards start silently failing.
- **Workflows:** the global `payslip-webhook` + `claude-mcp-readers
auto-permission` workflows fire for mail-ingested docs like any other
consumption source (verified pre-build; payslip receiver does its own
filtering).
## Rollback
1. Disable/delete the 5 `forward:` mail rules + the `docs@` mail account
(paperless admin UI or API).
2. Revert the infra commit (aliases.txt entry, sieve file, CM key + mount).
3. Remove `docs@viktorbarzin.me` from Vault `mailserver_accounts`, then apply
with the `-replace` workaround above. Mail to docs@ then falls back to the
catch-all (spam@) like any unknown address.