The devnen server runs chunked synthesis as a blocking call inside its
async handler, so the event loop (and every HTTP probe) hangs for the
whole multi-minute story. Kubelet's http liveness probe (1s timeout)
then killed the container mid-story (exit 137, twice within 10 min of
the first real drain), which reset the engine, so every following pass
started cold and tripit's 120s synthesis budget could never be met —
the queue would never drain.
TCP probes keep the meaning that matters: uvicorn binds 8004 only
after the model finishes loading in the lifespan hook, so readiness
still gates 'model loaded', while a GPU-busy server is left alive.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
First live drain failed all 27 queued narrations with 404 'Voice file
'Emily' not found': tripit's catalog sends bare stems (Emily) but the
devnen server resolves the voice as a literal filename (Emily.wav) in
predefined_voices_path then reference_audio — no stem fallback exists
upstream (HEAD == our pinned sha), and symlinks can't bridge it because
safe_resolve_within() resolves them out of the containment check.
New initContainer on the chatterbox deployment copies the 28 bundled
voices to /data/reference_audio/<stem> on the PVC (second lookup path).
Same image as the main container so no extra pull; idempotent; ~15 MB.
Verified live before committing: an extension-less copy synthesizes
200 audio/mp3 (5.3s warm) where voice=Emily 404'd.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The demand-gate script defaulted an unreadable/unparseable tts-queue
response to QUEUED=0, which the scale-down arm reads as 'queue empty'.
One transient curl failure at 20:30 UTC today idled chatterbox-tts to 0
the very minute the pod first went Ready, with 27 narrations still
queued (tripit kept logging tts_unreachable). Probe failure now exits
without touching replicas: scale-up still needs a real count > 0, and
scale-down now needs an explicitly parsed 0. Worst case after this
change is a stale-up deployment idling until the 06:00 window-down.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The deployment's lifecycle.ignore_changes still ignored the container
image (copied from the keel-managed tripit pattern), which would have
made the previous commit's GHCR switch a silent no-op on apply. Keel
cannot poll the private GHCR repo anyway; the pinned sha tag is
terraform's to manage.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor reports the voice still isn't from the TTS service — correct:
zero story_audio rows exist; the pod has sat in ImagePullBackOff since
the first window because the 2026-06-09 Forgejo-registry push has a
corrupt layer blob (HEAD 500s; pushed from a 94%-full disk) and identical
digests can't heal corrupt registry storage. The off-infra GHA rebuild
(tripit build-chatterbox.yml, devnen 915ae289, succeeded 03:23 UTC) now
lives in private GHCR: switch the image there, pin the upstream-sha tag,
and add the vault-backed ghcr-credentials pull secret (mirrors
stacks/tripit). tripit's drain loop has 27 narrations queued and picks
them up the moment the pod goes Ready.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
First real window (2026-06-12 02:00): the chatterbox pod sat in
ContainerCreating with MountVolume exit 32 x19 — /srv/nfs-ssd is exported
whole-tree but the chatterbox SUBDIR never existed on the host (the
go-live runbook step needed NFS-host shell nobody doing the apply had).
One-shot busybox Job mounts the export root and mkdir -p's the subtree;
kubelet's mount retry then self-heals the pod. Audio queue (27 items)
drains as soon as the model loads.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked 'can't we make it live? why the cronjob?' — the overnight
window guaranteed VRAM room on the shared T4, but immich/frigate models
idle-unload during the day so the card often has room (measured 10.3 GiB
free at 01:20). New 'demand' action every 3 min: scale Chatterbox up when
tripit's audio queue is non-empty AND free VRAM >= floor; idle it back to
0 when the queue empties (also frees the card early inside the nightly
window). Failed metrics scrape fail-safes to no-scale-up, same as the
window preflight. The guard moves to all-day */5 — live synthesis can
hold the card at any hour, so the yield-on-pressure watchdog must watch
at any hour. tripit exposes the unauthenticated in-cluster queue count;
a 404 from an older image reads as queued=0 (no-op). The 02:00 window-up
stays as the guaranteed nightly catch-up.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Woodpecker default pipeline selects stacks via git diff HEAD~1 HEAD;
on a merge commit that is the first-parent diff, which contained only the
concurrently-landed files — stacks/tts never got applied (namespace still
absent) and the kyverno re-trigger push got no pipeline at all. Single
non-merge commit touching both stacks so the detector sees them; the
sorted loop applies kyverno before tts, the order tripit#26 requires.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor's tour-guide redo (tripit#26): 87702bdc committed this stack with
[ci skip] so it was never applied — prod tripit has been pointing at a
nonexistent chatterbox-tts service since. This commit triggers the apply
and fixes the voices path: config pointed predefined_voices_path at the
NFS PVC (/data/voices), which nobody can seed without NFS-host shell
access and which would leave /v1/audio/voices empty (it gates readiness).
Use the 28 voices bundled in the image at /app/voices instead; /data
keeps reference audio (future cloning) and the HF model cache.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
New `infra/stacks/tts/` deploys devnen/Chatterbox-TTS-Server (OpenAI-compatible
/v1/audio/speech) as ClusterIP `chatterbox-tts.tts.svc:8000` (server listens on
8004; Service remaps), requesting ONE T4 time-slice. Mirrors stacks/llama-cpp/.
Option A off-peak control (no VRAM isolation on the time-sliced T4 — see
post-mortem 2026-06-02): Deployment sits at replicas=0; three Europe/London
CronJobs own the replica count — `chatterbox-window-up` scales to 1 at 02:00
ONLY IF a free-VRAM preflight passes (sum gpu_pod_memory_used_bytes from
gpu-pod-exporter; free = 16GiB - used >= floor), `chatterbox-vram-guard` yields
the card mid-window if a resident wakes, `chatterbox-window-down` scales to 0 at
06:00. tripit's bake is best-effort + cached-forever (ADR-0002/0004) so a
skipped/aborted window backfills next time. SA+Role+RoleBinding grant the
CronJobs deployments/scale (nextcloud-watchdog pattern).
Polite-tenant hardening: kyverno `inject-gpu-workload-priority` now excludes the
`tts` namespace (new `gpu_priority_excluded_namespaces` local) so Chatterbox
keeps tier-2-gpu priority (600k) and is always evicted first under GPU pressure
— never immich-ml/frigate/llama-swap. The LimitRange-fallback policy still uses
the base exclude list (tts untouched there).
tripit: add TTS_MODE=openai_compatible, TTS_BASE_URL, TTS_MODEL=chatterbox to
local.app_env (no token — ClusterIP only). No tripit code change.
Image build is documented in stacks/tts/README.md (devnen cu128 target ->
forgejo.viktorbarzin.me/viktor/chatterbox-tts) — build is impractical inline
(large CUDA image + needs the upstream repo). NOT APPLIED — review branch only.
Free-VRAM floor (var.vram_free_floor_bytes, default 6GiB) must be set from the
measured chatterbox-multilingual T4 peak during the first bake.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>