paperless-ngx: scale Gotenberg x3 + Tika x2, 4 workers, skip-archive — speed the Emo import
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Bottleneck found: a single Gotenberg replica 503s under concurrent workers (office docs failing + slow). The cluster is otherwise idle (sdc 0.5% util, etcd ~1/min), so:

- Gotenberg 1->3 + Tika 1->2 (the Service load-balances across replicas; fixes the 503s and parallelizes office conversion).
- paperless TASK_WORKERS 2->4, THREADS_PER_WORKER 2->1, mem limit 4->8Gi (avoids OOM with 4 concurrent OCR jobs). Requests kept low to stay within the tier quota (requests.memory 3840/4096Mi).
- PAPERLESS_OCR_SKIP_ARCHIVE_FILE=with_text: skip the redundant archive file for born-digital/office docs (big IO saver for the work-doc set).

Guard + etcd watch stay in place; revert to defaults after the import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent d6bd9486e3
commit 2cb37d51d4
1 changed file with 15 additions and 6 deletions
@@ -225,11 +225,18 @@ resource "kubernetes_deployment" "paperless-ngx" {
 # it degrades. Revert both to defaults once the import is done.
 env {
 name = "PAPERLESS_TASK_WORKERS"
-value = "2"
+value = "4"
 }
 env {
 name = "PAPERLESS_THREADS_PER_WORKER"
-value = "2"
+value = "1"
+}
+# Skip the redundant OCR'd archive PDF for inputs that already carry a
+# text layer (born-digital PDFs + office->PDF via Gotenberg). Big
+# speed/IO saver for emo's work-doc set; scanned docs still OCR+archive.
+env {
+name = "PAPERLESS_OCR_SKIP_ARCHIVE_FILE"
+value = "with_text"
 }
 volume_mount {
 name = "data"
@@ -242,7 +249,7 @@ resource "kubernetes_deployment" "paperless-ngx" {
 memory = "2Gi"
 }
 limits = {
-memory = "4Gi"
+memory = "8Gi"
 }
 }
@@ -299,7 +306,9 @@ resource "kubernetes_service" "paperless-ngx" {
 # --- Tika + Gotenberg: Office/email -> text/PDF conversion for paperless ---
 # Apache Tika extracts text+metadata; Gotenberg renders Office formats to PDF.
 # Paperless routes Office/email docs through these (PAPERLESS_TIKA_* above).
-# Stateless (no PVC), pinned images, single replica — bulk import is serial.
+# Stateless (no PVC), pinned images. 3 replicas during the bulk import: a
+# single LibreOffice instance 503s under concurrent paperless workers; the
+# Service load-balances office conversions across the replicas.
 resource "kubernetes_deployment" "gotenberg" {
 metadata {
 name = "gotenberg"
@@ -310,7 +319,7 @@ resource "kubernetes_deployment" "gotenberg" {
 }
 }
 spec {
-replicas = 1
+replicas = 3
 selector {
 match_labels = {
 app = "gotenberg"
@@ -395,7 +404,7 @@ resource "kubernetes_deployment" "tika" {
 }
 }
 spec {
-replicas = 1
+replicas = 2
 selector {
 match_labels = {
 app = "tika"
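The commit message says to revert these overrides once the import is done. A minimal sketch of one way to make that revert a one-line change, assuming a hypothetical `local.bulk_import` flag that does not exist in this module. The before/after values are the ones from the hunks in this commit, and each fragment would sit inside the existing resource named in its comment. It uses only standard Terraform features (conditional expressions and a `dynamic` block).

```hcl
# Hypothetical toggle (not in the original module): true while the Emo
# import runs, false to restore the pre-commit values from this diff.
locals {
  bulk_import = true
}

# Inside kubernetes_deployment "gotenberg" -> spec:
replicas = local.bulk_import ? 3 : 1

# Inside kubernetes_deployment "tika" -> spec:
replicas = local.bulk_import ? 2 : 1

# Inside the paperless-ngx container:
env {
  name  = "PAPERLESS_TASK_WORKERS"
  value = local.bulk_import ? "4" : "2"
}
env {
  name  = "PAPERLESS_THREADS_PER_WORKER"
  value = local.bulk_import ? "1" : "2"
}
# Emit the skip-archive override only during the import; when the flag is
# false the env var is absent and paperless falls back to its own default.
dynamic "env" {
  for_each = local.bulk_import ? [1] : []
  content {
    name  = "PAPERLESS_OCR_SKIP_ARCHIVE_FILE"
    value = "with_text"
  }
}

# Inside the paperless-ngx container's resources block:
limits = {
  memory = local.bulk_import ? "8Gi" : "4Gi"
}
```

Setting `bulk_import = false` and re-applying would restore the values that existed before this commit. Whether those pre-commit values are the "defaults" the commit message refers to is an assumption.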