feat(tts): Chatterbox TTS stack + off-peak T4 gate, wire tripit narration [ci skip]

New `infra/stacks/tts/` deploys devnen/Chatterbox-TTS-Server (OpenAI-compatible /v1/audio/speech) as ClusterIP `chatterbox-tts.tts.svc:8000` (server listens on 8004; Service remaps), requesting ONE T4 time-slice. Mirrors stacks/llama-cpp/.

Option A off-peak control (no VRAM isolation on the time-sliced T4; see post-mortem 2026-06-02): Deployment sits at replicas=0; three Europe/London CronJobs own the replica count:
- `chatterbox-window-up` scales to 1 at 02:00 ONLY IF a free-VRAM preflight passes (sum gpu_pod_memory_used_bytes from gpu-pod-exporter; free = 16GiB - used >= floor).
- `chatterbox-vram-guard` yields the card mid-window if a resident wakes.
- `chatterbox-window-down` scales to 0 at 06:00.

tripit's bake is best-effort + cached-forever (ADR-0002/0004), so a skipped/aborted window backfills next time. SA+Role+RoleBinding grant the CronJobs deployments/scale (nextcloud-watchdog pattern).

Polite-tenant hardening: kyverno `inject-gpu-workload-priority` now excludes the `tts` namespace (new `gpu_priority_excluded_namespaces` local) so Chatterbox keeps tier-2-gpu priority (600k) and is always evicted first under GPU pressure, never immich-ml/frigate/llama-swap. The LimitRange-fallback policy still uses the base exclude list (tts untouched there).

tripit: add TTS_MODE=openai_compatible, TTS_BASE_URL, TTS_MODEL=chatterbox to local.app_env (no token; ClusterIP only). No tripit code change.

Image build is documented in stacks/tts/README.md (devnen cu128 target -> forgejo.viktorbarzin.me/viktor/chatterbox-tts); building inline is impractical (large CUDA image + needs the upstream repo).

NOT APPLIED: review branch only. Free-VRAM floor (var.vram_free_floor_bytes, default 6GiB) must be set from the measured chatterbox-multilingual T4 peak during the first bake.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
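
The free-VRAM preflight described above (free = 16GiB - used >= floor) could be sketched roughly as follows. This is an illustration only, not the script the CronJob actually runs: it assumes gpu-pod-exporter serves `gpu_pod_memory_used_bytes` in Prometheus text format with one sample per pod, and the function names, label names, and sample values are hypothetical.

```python
T4_TOTAL_BYTES = 16 * 1024**3       # 16 GiB, the card total used in the commit message
DEFAULT_FLOOR_BYTES = 6 * 1024**3   # default of var.vram_free_floor_bytes (6 GiB)
METRIC = "gpu_pod_memory_used_bytes"


def used_bytes(exposition: str) -> float:
    """Sum every sample of METRIC found in Prometheus text-format output."""
    total = 0.0
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        name = line.split("{", 1)[0].split()[0]
        if name != METRIC:
            continue
        # Drop the optional {labels} part; the first remaining token is the value.
        rest = line.rsplit("}", 1)[-1] if "{" in line else line[len(METRIC):]
        total += float(rest.split()[0])
    return total


def preflight_passes(exposition: str,
                     floor: int = DEFAULT_FLOOR_BYTES,
                     total: int = T4_TOTAL_BYTES) -> bool:
    """True when free VRAM (total minus summed per-pod usage) meets the floor."""
    free = total - used_bytes(exposition)
    return free >= floor


# Hypothetical scrape: two residents holding 3 GiB and 2 GiB.
QUIET = """\
# TYPE gpu_pod_memory_used_bytes gauge
gpu_pod_memory_used_bytes{namespace="immich",pod="immich-ml"} 3221225472
gpu_pod_memory_used_bytes{namespace="frigate",pod="frigate"} 2147483648
"""
# Same scrape plus a resident holding 9 GiB.
BUSY = QUIET + 'gpu_pod_memory_used_bytes{namespace="llama",pod="llama-swap"} 9663676416\n'

if __name__ == "__main__":
    print(preflight_passes(QUIET))  # 11 GiB free against a 6 GiB floor
    print(preflight_passes(BUSY))   # 2 GiB free against a 6 GiB floor
```

With the default floor, the first scrape leaves 11 GiB free and would let `chatterbox-window-up` scale to 1; the second leaves 2 GiB and would skip the window. The same comparison is presumably what `chatterbox-vram-guard` re-evaluates mid-window, though the commit does not spell that out.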
2026-06-09 07:30:19 +00:00 · 1bc5c92622
commit 1bc5c92622
parent 05b50d2b96
5 changed files with 681 additions and 1 deletions
--- a/stacks/tripit/main.tf
+++ b/stacks/tripit/main.tf
@@ -65,6 +65,15 @@ locals {
    SMTP_USER       = "spam@viktorbarzin.me"
    SMTP_FROM       = "plans@viktorbarzin.me"
    PUBLIC_BASE_URL = "https://tripit.viktorbarzin.me"
+    # Narrator audio (ADR-0004): Chatterbox via the in-cluster `tts` stack.
+    # OpenAI-compatible /v1/audio/speech; the bake POSTs best-effort synth
+    # requests, so a down/Pending Chatterbox is a clean skip (browser-TTS
+    # fallback), never a bake error. ClusterIP-only → no token. Note: the mode
+    # is `openai_compatible` (tripit renamed it from `chatterbox`); TTS_MODEL is
+    # still the `chatterbox` family string tripit sends as the OpenAI `model`.
+    TTS_MODE     = "openai_compatible"
+    TTS_BASE_URL = "http://chatterbox-tts.tts.svc.cluster.local:8000"
+    TTS_MODEL    = "chatterbox"
  }
 }
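
For reviewers unfamiliar with the contract these three variables imply, here is a minimal sketch of a best-effort synth call against an OpenAI-compatible `/v1/audio/speech` endpoint. It is not tripit's code (this commit makes no tripit change): the function names, the `voice` value, and the timeout are assumptions, and only the URL path, the `model` string, and the "failure is a clean skip" behaviour come from the comments in the hunk above.

```python
import http.client
import json
import urllib.request
from typing import Optional

TTS_BASE_URL = "http://chatterbox-tts.tts.svc.cluster.local:8000"
TTS_MODEL = "chatterbox"


def build_request(text: str, base_url: str = TTS_BASE_URL,
                  model: str = TTS_MODEL,
                  voice: str = "default") -> urllib.request.Request:
    """Build the POST for /v1/audio/speech. `voice` is a placeholder name."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},  # no token: ClusterIP only
        method="POST",
    )


def synth_best_effort(text: str, base_url: str = TTS_BASE_URL,
                      timeout: float = 30.0) -> Optional[bytes]:
    """Return audio bytes, or None if the service is down, Pending, or erroring."""
    try:
        with urllib.request.urlopen(build_request(text, base_url), timeout=timeout) as resp:
            return resp.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return None  # clean skip: the caller falls back to browser TTS
```

Returning `None` instead of raising keeps a scaled-to-zero Deployment from failing the bake, which matches the best-effort behaviour the comment describes; a later window can then backfill the missing audio.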