Without this, each extraction took 5-10 minutes because the base64-encoded
PDF expanded to ~300KB of prompt text. poppler-utils ships pdftotext, which
turns a 200KB PDF into ~3KB of plain text in milliseconds. Claude (Haiku)
then processes the text in seconds.

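As a rough check on the size figures: standard base64 emits 4 ASCII characters for every 3 input bytes. A 200KB PDF therefore becomes about 267KB of text before any prompt wrapping, which is in line with the ~300KB quoted. Only the byte count matters here. The zero-filled buffer is a stand-in, not a real PDF.

```python
import base64
import math


def base64_size(n_bytes: int) -> int:
    """Length of the standard (padded) base64 encoding of n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)


# Stand-in for a 200KB PDF; the contents do not affect the encoded length.
pdf_bytes = bytes(200_000)
encoded = base64.b64encode(pdf_bytes)
assert len(encoded) == base64_size(len(pdf_bytes))
print(len(encoded))  # 266668 characters, i.e. ~267KB
```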
- Dockerfile installs poppler-utils in the runtime stage (one-liner).
- _build_prompt() tries pdftotext -layout first; falls back to base64 if
pdftotext is missing (local dev) or the PDF is unreadable (scanned image).
- Agent file documents the PAYSLIP_TEXT fast path; it still handles
PDF_BASE64 as the fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extracted from the /home/wizard/code monorepo into its own repo so
Woodpecker CI can watch it. Content is identical to /home/wizard/code
commit e426028.
See README.md for an overview, env vars, and the Paperless workflow config.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>