paperless-ai: make the PVC .env the single source of config truth

Auto-tagging silently no-op'd: the container env vars set in the deployment
shadowed the app's own /app/data/.env, because paperless-ai's dotenv loader
does not override process.env. A stale PROCESS_PREDEFINED_DOCUMENTS=yes (with
no TAGS) made the scan select zero documents.
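
The shadowing can be illustrated with a minimal TypeScript sketch of
dotenv-style loading without override. It models the behaviour described
above and is not paperless-ai's actual loader: parseDotenv,
loadWithoutOverride, and the sample .env values are hypothetical.

```typescript
// Sketch only: models "dotenv does not override existing env" semantics.
// All names and sample values here are hypothetical.
type Env = Record<string, string | undefined>;

// Parse KEY=VALUE lines, skipping blanks and # comments.
function parseDotenv(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq < 1) continue; // no key before '=' (or no '=' at all)
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Merge file values into env, but never replace a key that is already set.
function loadWithoutOverride(env: Env, fileText: string): Env {
  const merged: Env = { ...env };
  for (const [key, value] of Object.entries(parseDotenv(fileText))) {
    if (merged[key] === undefined) merged[key] = value;
  }
  return merged;
}

// Stale container env vs. the wizard-written .env on the PVC (assumed values).
const containerEnv: Env = { PROCESS_PREDEFINED_DOCUMENTS: "yes" };
const pvcDotenv = "PROCESS_PREDEFINED_DOCUMENTS=no\nSCAN_INTERVAL=*/30 * * * *\n";

const effective = loadWithoutOverride(containerEnv, pvcDotenv);
console.log(effective["PROCESS_PREDEFINED_DOCUMENTS"]); // prints "yes"
```

With the variable removed from the container env, the same call returns the
.env value, which is the state this commit establishes.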

Strip the wizard-owned behavioural config (Paperless URL, AI provider, model,
scan interval, tagging flags) from the container env, keeping only
infrastructural env (PUID/PGID/port/RAG/HF cache) and the Vault-sourced
secret refs. The app's setup-written .env on the PVC is now authoritative,
so processing runs and tags all documents. Qwen3 thinking is disabled via
SYSTEM_PROMPT=/no_think in that .env to keep the model's JSON output parseable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-18 06:41:29 +00:00
parent aeee0d02e2
commit 4977153dfb

@@ -142,13 +142,16 @@ resource "kubernetes_deployment" "paperless_ai" {
name = "rag"
}
# NOTE on configuration model: paperless-ai persists its RUNTIME
# config (Paperless URL/token, AI provider, processing flags) plus
# the app-admin account to /app/data/.env + SQLite on the PVC,
# written once via its setup flow (POST /setup). The env vars below
# are consumed by the Node layer and serve as setup-form pre-fill;
# the authoritative runtime config is the PVC's .env. App-admin
# creds + the Paperless token live in Vault secret/paperless-ai.
# Configuration model: paperless-ai persists ALL behavioural config
# (Paperless URL, AI provider, scan interval, tagging flags) + the
# app-admin account to /app/data/.env + SQLite on the PVC, written
# once via its setup flow. The PVC .env is the SINGLE source of truth
# for behaviour; we deliberately do NOT set those as container env,
# because the image's dotenv loader does NOT override process.env, so
# a container env silently shadows the .env (PROCESS_PREDEFINED_DOCUMENTS
# set here once forced the scan to no-op). Only infrastructural env +
# the Vault-sourced secrets (which mirror the .env copies) are set.
# App-admin creds + Paperless token live in Vault secret/paperless-ai.
env {
name = "PUID"
value = "1000"
@@ -182,15 +185,7 @@ resource "kubernetes_deployment" "paperless_ai" {
value = "/app/data/st-cache"
}
# --- Paperless-ngx connection (internal service, no edge hop) ---
env {
name = "PAPERLESS_API_URL"
value = "http://paperless-ngx.paperless-ngx.svc.cluster.local/api"
}
env {
name = "PAPERLESS_USERNAME"
value = "paperless-ai"
}
# Vault-sourced secrets (mirror the .env copies the setup flow wrote).
env {
name = "PAPERLESS_API_TOKEN"
value_from {
@@ -201,19 +196,6 @@ resource "kubernetes_deployment" "paperless_ai" {
}
}
# --- LLM backend: in-cluster llama-swap (OpenAI-compatible) ---
env {
name = "AI_PROVIDER"
value = "custom"
}
env {
name = "CUSTOM_BASE_URL"
value = "http://llama-swap.llama-cpp.svc.cluster.local:8080/v1"
}
env {
name = "CUSTOM_MODEL"
value = "qwen3-8b"
}
env {
name = "CUSTOM_API_KEY"
value_from {
@@ -235,24 +217,6 @@ resource "kubernetes_deployment" "paperless_ai" {
}
}
# --- Processing: auto-analyze + tag every document ---
env {
name = "SCAN_INTERVAL"
value = "*/30 * * * *"
}
env {
name = "PROCESS_PREDEFINED_DOCUMENTS"
value = "yes"
}
env {
name = "ADD_AI_PROCESSED_TAG"
value = "yes"
}
env {
name = "AI_PROCESSED_TAG_NAME"
value = "ai-processed"
}
volume_mount {
name = "data"
mount_path = "/app/data"