Auto-tagging silently no-op'd: the container env vars set in the deployment
shadowed the app's own /app/data/.env, because paperless-ai's dotenv loader
does not override process.env. A stale PROCESS_PREDEFINED_DOCUMENTS=yes (with
no TAGS) made the scan select zero documents.
Strip the wizard-owned behavioural config (Paperless URL, AI provider, model,
scan interval, tagging flags) from the container env, keeping only
infrastructural env (PUID/PGID/port/RAG/HF cache) and the Vault-sourced
secret refs. The app's setup-written .env on the PVC is now authoritative,
so processing runs and tags all documents. Qwen3 thinking is disabled via
SYSTEM_PROMPT=/no_think in that .env to keep the model's JSON output parseable.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor wanted real semantic search over his ~300 Paperless documents and
preferred a ready-made solution over building one. paperless-ai provides
local-embedding RAG (ChromaDB + sentence-transformers, GPU-free) plus
LLM-driven auto-analysis/tagging.
Wiring:
- LLM (chat answers + tagging) -> in-cluster llama-swap qwen3-8b
(OpenAI-compatible); embeddings + vector store are local on the PVC.
- Reads Paperless over the internal service via a dedicated `paperless-ai`
superuser token (Vault secret/paperless-ai); app-admin creds also in Vault.
- Encrypted PVC for /app/data (SQLite + ChromaDB + model cache).
- Ingress paperless-ai.viktorbarzin.me behind Authentik (auth=required).
- Third-party image pinned (docker.io/clusterzx/paperless-ai:3.0.9), no Keel.
Runtime config persists to the PVC .env via the app's one-time setup; the
deployment env vars are pre-fill/documentation only.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>