infra/docs/runbooks/paperless-mail-ingest.md
Viktor Barzin 77fcb08e8e
mailserver: add docs@ paperless ingest mailbox (sieve sender allowlist)
Viktor asked to forward arbitrary emails with PDF attachments into
paperless-ngx, with the forwarding sender mapping 1:1 to the paperless
account that owns the document. paperless-ngx's built-in IMAP consumer
already does the sender->owner mapping, so the infra half is a dedicated
real mailbox docs@viktorbarzin.me: an explicit self-alias (the @domain
catch-all would otherwise divert it into the TripIt-swept spam@ mailbox,
whose sweeper LLM-parses and auto-replies to mail from linked senders)
plus a per-user Dovecot sieve that discards non-family senders at
delivery (chosen behaviour for unmatched senders: ignore and delete;
also keeps spam out of the guessable address). The mailbox credential
was added to Vault secret/platform.mailserver_accounts. Paperless-side
mail account + 5 per-sender rules are DB state, configured via the API
per the new runbook docs/runbooks/paperless-mail-ingest.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 14:06:19 +00:00

Paperless-ngx Mail Ingest (docs@viktorbarzin.me)

Last updated: 2026-07-03 (initial build)

Forward any email with document attachments to docs@viktorbarzin.me and paperless-ngx ingests the attachments, owned by the paperless account mapped from the sender (From) address. Built entirely from existing parts: a docker-mailserver mailbox + Dovecot sieve, and paperless-ngx's native mail consumer (the same machinery as the utility: rules).

Flow

family member forwards email ──> MX ──> docker-mailserver
    │  postfix virtual: docs@ has an explicit self-alias (extra/aliases.txt),
    │  so the @domain catch-all (→ spam@, swept by TripIt) does NOT apply
    ▼
Dovecot LMTP delivery to docs@
    │  per-user sieve (docs@viktorbarzin.me.dovecot.sieve): sender NOT in
    │  allowlist → discard (decision 2026-07-03: unmatched = ignore & delete)
    ▼
docs@ INBOX ── paperless-ngx mail task (every 10 min, PAPERLESS_EMAIL_TASK_CRON
    │          default) applies mail rules in order: filter_from = <sender>
    │          → consume attachments (attachments-only: inline images like
    │          signature logos are skipped), owner = mapped user,
    │          tag = email-ingest, title = mail subject
    ▼
consumed mail is MOVED to the "Processed" IMAP folder (audit trail);
INBOX stays empty in steady state
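
The "attachments-only" step in the flow above can be illustrated with Python's standard email package. This is a minimal sketch of the distinction between a real attachment and an inline part, not paperless-ngx's actual implementation; the helper names and the fake file bytes are hypothetical.

```python
from email.message import EmailMessage


def real_attachments(msg):
    """Return filenames of parts whose Content-Disposition is 'attachment'.

    Inline parts (e.g. signature logos) and the body text are skipped,
    which approximates the attachments-only behaviour described above.
    """
    names = []
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            names.append(part.get_filename() or "(unnamed)")
    return names


def build_sample():
    """Build a forwarded mail with one inline logo and one PDF (fake bytes)."""
    msg = EmailMessage()
    msg["From"] = "me@viktorbarzin.me"
    msg["To"] = "docs@viktorbarzin.me"
    msg["Subject"] = "Fwd: statement"
    msg.set_content("see attached")
    # Inline image: carries a filename but is NOT a real attachment.
    msg.add_attachment(b"\x89PNG fake", maintype="image", subtype="png",
                       filename="logo.png", disposition="inline")
    # Real attachment: this is the part that would be consumed.
    msg.add_attachment(b"%PDF-1.4 fake", maintype="application",
                       subtype="pdf", filename="statement.pdf")
    return msg


if __name__ == "__main__":
    print(real_attachments(build_sample()))
```

Running the same helper over a raw message saved from the docs@ mailbox is a quick way to predict which parts a forward would contribute.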

Sender → paperless account map (as built)

| Sender (From) | Paperless user | Rule |
|---|---|---|
| me@viktorbarzin.me | root (id 3) | forward: Viktor (me@) |
| vbarzin@gmail.com | root (id 3) | forward: Viktor (gmail) |
| viktorbarzin@meta.com | root (id 3) | forward: Viktor (meta) |
| ancaelena98@gmail.com | anca (id 4) | forward: Anca |
| emil.barzin@gmail.com | emo (id 7) | forward: Emo |

The map lives in two places by design — keep them in sync:

  1. Delivery gate (infra, Terraform): stacks/mailserver/modules/mailserver/extra/docs-at-viktorbarzin.me.dovecot.sieve — senders not listed here are discarded at delivery (spam control + the "ignore and delete unmatched" behaviour; paperless cannot express "delete without ingesting", so this must happen before the mailbox).
  2. Owner map (paperless DB, via API/UI): one mail rule per sender on the docs@viktorbarzin.me mail account. DB-state like workflows — NOT Terraform.
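
Because the two copies can drift, a small consistency check may help. The sketch below assumes the sieve file lists the allowed senders as double-quoted addresses (the real file's layout is not shown in this runbook) and that the mail rules are available as a list of dicts with a filter_from field, as returned by the mail rules API. The function names are hypothetical.

```python
import re

# Assumption: allowlisted senders appear as double-quoted addresses
# somewhere in the sieve script, e.g. inside a string list.
_QUOTED_ADDR = re.compile(r'"([^"\s@]+@[^"\s@]+)"')


def sieve_addresses(sieve_text):
    """Collect every quoted e-mail address found in the sieve script."""
    return {addr.lower() for addr in _QUOTED_ADDR.findall(sieve_text)}


def rule_senders(rules):
    """Collect the filter_from value of every mail rule that has one."""
    return {r["filter_from"].lower() for r in rules if r.get("filter_from")}


def sync_report(sieve_text, rules):
    """Report senders present in one copy of the map but not the other."""
    allowed = sieve_addresses(sieve_text)
    mapped = rule_senders(rules)
    return {
        # Passes the delivery gate, but no rule consumes it: mail piles up.
        "delivered_but_unowned": sorted(allowed - mapped),
        # Has a rule, but the sieve discards the mail before paperless sees it.
        "owned_but_discarded": sorted(mapped - allowed),
    }
```

An empty list under both keys means the delivery gate and the owner map agree.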

Add a family member / sender

  1. Add the address to the sieve allowlist file above; commit; apply the mailserver stack (normal apply is enough — the sieve CM key is not under ignore_changes; Reloader restarts the pod).
  2. Clone an existing forward: mail rule in the paperless admin UI (Mail → Rules) or via API, changing filter_from and the rule owner (documents are owned by the rule owner — assign_owner_from_rule=true). Keep: action = Move to Processed, attachment type = attachments-only, consumption scope = attachments only, tag email-ingest, order after the existing rules.
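
For the API route in step 2, cloning can be sketched as a pure transformation of an existing rule's JSON. The field names id, name, owner and order are assumptions about the mail rule representation and should be checked against a real GET /api/mail_rules/ response; filter_from and assign_owner_from_rule are the fields named above. The sender address and ids in the usage example are placeholders.

```python
import copy


def clone_rule(existing, *, name, sender, owner_id, order):
    """Derive a new mail rule payload from an existing 'forward:' rule.

    Everything not overridden (action, folder, attachment type,
    consumption scope, tags) is carried over from the template rule.
    """
    new_rule = copy.deepcopy(existing)
    new_rule.pop("id", None)  # assumed server-assigned; must not be reused
    new_rule.update(
        name=name,
        filter_from=sender,
        owner=owner_id,  # documents end up owned by the rule owner
        order=order,     # place after the existing rules
        assign_owner_from_rule=True,
    )
    return new_rule


if __name__ == "__main__":
    # Hypothetical template; a real one comes from the API.
    template = {"id": 12, "name": "forward: Emo", "order": 5, "owner": 7,
                "filter_from": "emil.barzin@gmail.com", "attachment_type": 1}
    print(clone_rule(template, name="forward: Example",
                     sender="someone@example.com", owner_id=9, order=6))
```

The resulting dict is what would be submitted to the mail rules endpoint to create the new rule; the template itself is left unmodified.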

Operations

  • Trigger a fetch immediately (instead of waiting ≤10 min): kubectl -n paperless-ngx exec deploy/paperless-ngx -c paperless-ngx -- python3 manage.py mail_fetcher
  • Mailbox credentials: Vault secret/platform.mailserver_accounts JSON, key docs@viktorbarzin.me (also used by the paperless mail account).
  • Inspect the mailbox: connect over IMAPS (for example with a python3 -c imaplib one-liner) to mailserver.mailserver.svc.cluster.local:993 (in-cluster, from a pod) or mail.viktorbarzin.me:993 (externally / devvm).
  • Paperless-side logs: kubectl -n paperless-ngx logs deploy/paperless-ngx | grep -i mail (also Loki, ns paperless-ngx). Rule/account state: GET /api/mail_rules/, GET /api/mail_accounts/ with the admin token (k8s secret paperless-ngx-secrets, field api_token).
  • Account/mailbox provisioning: adding/rotating anything in mailserver_accounts requires the ConfigMap replace workaround — scripts/tg apply mailserver -- -replace=module.mailserver.kubernetes_config_map.mailserver_config — because postfix-accounts.cf is under ignore_changes (non-deterministic bcrypt; see the module comment).
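
The mailbox inspection above can be done with Python's standard imaplib. A minimal sketch, assuming the folder names INBOX and Processed from the flow and credentials taken from Vault; the function names are hypothetical, and certificate verification may need adjusting depending on which hostname the server certificate covers.

```python
import imaplib


def folder_counts(conn, folders=("INBOX", "Processed")):
    """Return {folder: message count}, or None for folders that fail to open.

    Folders are opened read-only so inspection never changes message flags.
    """
    counts = {}
    for folder in folders:
        typ, data = conn.select(folder, readonly=True)
        ok = typ == "OK" and data and data[0] is not None
        counts[folder] = int(data[0]) if ok else None
    return counts


def inspect_mailbox(host, user, password):
    """Log in over IMAPS (port 993) and count messages in the key folders."""
    with imaplib.IMAP4_SSL(host, 993) as conn:
        conn.login(user, password)
        return folder_counts(conn)


if __name__ == "__main__":
    import getpass
    print(inspect_mailbox("mail.viktorbarzin.me", "docs@viktorbarzin.me",
                          getpass.getpass("docs@ password: ")))
```

In steady state INBOX should report 0; a growing INBOX count suggests mail is passing the sieve but matching no paperless rule.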

Design notes / caveats

  • Why not the catch-all? Mail to unknown @viktorbarzin.me addresses lands in spam@, which the TripIt ingest-plans CronJob sweeps every 15 min: it marks everything \Seen, LLM-parses mail from linked senders and replies with ack/failure emails. Forwarded bank statements would get "couldn't parse a trip" replies. docs@ being a real mailbox bypasses that path entirely; TripIt, the smoke-test@ roundtrip probe, and dmarc@ are untouched.
  • Spoofing: the sender match is on the From header. Rspamd verifies SPF/DKIM/DMARC on inbound mail, but gmail.com publishes p=none, which asks receivers to take no action on DMARC failures, so a crafted spoof of a gmail.com sender could ingest documents into a family member's account. Accepted risk (worst case: unwanted documents appear, visible + deletable in paperless).
  • Not PDF-only: any real attachment type paperless supports is consumed (PDF, images, Office via the existing tika+gotenberg pipeline). Inline images are excluded by attachment_type=1.
  • No dedicated alerting (deliberate, 2026-07-03): mail-task errors surface in paperless logs; the mailserver inbound path is covered by email-roundtrip-monitor. Revisit if forwards start silently failing.
  • Workflows: the global payslip-webhook + claude-mcp-readers auto-permission workflows fire for mail-ingested docs like any other consumption source (verified pre-build; payslip receiver does its own filtering).
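
To revisit the spoofing caveat for a given sender domain, the published DMARC policy can be read from its _dmarc TXT record. The sketch below only parses a record string that has already been fetched (for example with dig); the function names are hypothetical, and whether Rspamd acts on the policy is a separate question this does not answer.

```python
def dmarc_policy(record):
    """Return the p= tag of a DMARC TXT record, or None if it is not DMARC."""
    tags = {}
    for field in record.split(";"):
        key, sep, value = field.partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip().lower()
    if tags.get("v") != "dmarc1":
        return None
    return tags.get("p")


def requests_enforcement(policy):
    """True only when the domain asks receivers to quarantine or reject."""
    return policy in ("quarantine", "reject")


if __name__ == "__main__":
    # Example record text, not a live lookup.
    record = "v=DMARC1; p=none; rua=mailto:reports@example.com"
    policy = dmarc_policy(record)
    print(policy, requests_enforcement(policy))
```

Allowlisted senders on domains where requests_enforcement is False are the ones the accepted risk applies to.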

Rollback

  1. Disable/delete the 5 forward: mail rules + the docs@ mail account (paperless admin UI or API).
  2. Revert the infra commit (aliases.txt entry, sieve file, CM key + mount).
  3. Remove docs@viktorbarzin.me from Vault mailserver_accounts, then apply with the -replace workaround above. Mail to docs@ then falls back to the catch-all (spam@) like any unknown address.