Commit graph

5 commits

Author SHA1 Message Date
Viktor Barzin
39a22b352e tts: bootstrap the chatterbox NFS subdir — first-window mount failed forever
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
First real window (2026-06-12 02:00): the chatterbox pod sat in
ContainerCreating with 19 MountVolume failures (exit 32). /srv/nfs-ssd is
exported as a whole tree, but the chatterbox SUBDIR never existed on the
host (the go-live runbook step needed NFS-host shell access, which nobody
doing the apply had). A one-shot busybox Job mounts the export root and
runs mkdir -p on the subtree; kubelet's mount retry then self-heals the
pod. The audio queue (27 items) drains as soon as the model loads.
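A minimal sketch of the bootstrap Job as a Kubernetes manifest dict. It mounts the export root, which exists, instead of the missing subdir, then creates the subtree.

Everything beyond that is an assumption, not the actual stack: the Job name, namespace, NFS server address, mount path, backoffLimit and the helper name bootstrap_job are all hypothetical.

```python
import shlex


def bootstrap_job(nfs_server: str, export_root: str, subdirs: list[str]) -> dict:
    """Build a one-shot Job that mkdir -p's subdirs under an NFS export root.

    Hypothetical helper: names and values are illustrative placeholders.
    """
    # Mount the export ROOT (present on the host), not the absent subdir,
    # so this mount cannot fail the way the chatterbox volume did.
    targets = " ".join(shlex.quote(f"/export/{d}") for d in subdirs)
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": "chatterbox-nfs-bootstrap", "namespace": "tts"},
        "spec": {
            "backoffLimit": 3,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "mkdir",
                        "image": "busybox",
                        # mkdir -p is idempotent, so re-running the Job is harmless.
                        "command": ["sh", "-c", f"mkdir -p {targets}"],
                        "volumeMounts": [{"name": "root", "mountPath": "/export"}],
                    }],
                    "volumes": [{
                        "name": "root",
                        "nfs": {"server": nfs_server, "path": export_root},
                    }],
                },
            },
        },
    }
```

For example, bootstrap_job("nfs.example.internal", "/srv/nfs-ssd", ["chatterbox"]) yields a Job whose container runs `mkdir -p /export/chatterbox`. Once the directory exists, kubelet's next mount retry for the stuck pod succeeds with no further action.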

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 02:51:14 +00:00
Viktor Barzin
d3d37a15ec tts: GPU-gated live narration — demand-gate CronJob + all-day VRAM guard
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
ci/woodpecker/push/build-cli Pipeline was canceled
Viktor asked 'can't we make it live? why the cronjob?' — the overnight
window guaranteed VRAM room on the shared T4, but immich/frigate models
idle-unload during the day so the card often has room (measured 10.3 GiB
free at 01:20). New 'demand' action every 3 min: scale Chatterbox up when
tripit's audio queue is non-empty AND free VRAM >= floor; idle it back to
0 when the queue empties (also frees the card early inside the nightly
window). A failed metrics scrape fails safe to no scale-up, the same as
the window preflight. The guard moves to all-day */5: live synthesis can
hold the card at any hour, so the yield-on-pressure watchdog must run at
any hour too. tripit exposes the unauthenticated in-cluster queue count;
a 404 from an older image reads as queued=0 (no-op). The 02:00 window-up
stays as the guaranteed nightly catch-up.
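A sketch of the decision rule the 'demand' action applies on each 3-minute run, under these assumptions:

- The helper names are hypothetical, and the real CronJob may well be a shell script.
- The queue endpoint returns a bare integer on 200.
- A failed scrape is passed in as free_vram=None.

```python
from typing import Optional

GIB = 1024 ** 3


def parse_queue_count(status: int, body: str) -> int:
    """Map tripit's queue-count response to an int (assumed bare-integer body)."""
    if status == 404:
        # Older tripit image without the endpoint: read as an empty queue.
        return 0
    if status != 200:
        # Assumption: any other status aborts this run rather than guessing.
        raise ValueError(f"unexpected status {status}")
    return int(body.strip())


def demand_replicas(queued: int, free_vram: Optional[int],
                    floor: int, current: int) -> int:
    """Desired Chatterbox replicas; free_vram is None if the scrape failed."""
    if queued == 0:
        # Queue empty: idle back to 0, freeing the card early.
        return 0
    if current >= 1:
        # Already running; yielding under pressure is the guard's job.
        return current
    if free_vram is None:
        # Failed metrics scrape fails safe: no scale-up.
        return 0
    return 1 if free_vram >= floor else 0
```

With the 27-item queue, 10.3 GiB free and a 6 GiB floor, demand_replicas returns 1. With the same queue but a failed scrape it returns 0, so the card is never claimed blind.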

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 00:25:35 +00:00
Viktor Barzin
798b025580 tts+kyverno: non-merge apply trigger (merge-commit diff hid stacks/tts from the stack detector)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
The Woodpecker default pipeline selects stacks via git diff HEAD~1 HEAD;
on a merge commit that is the first-parent diff, which contained only the
concurrently-landed files — stacks/tts never got applied (namespace still
absent) and the kyverno re-trigger push got no pipeline at all. This is a
single non-merge commit touching both stacks, so the detector sees both;
its sorted loop applies kyverno before tts, the order tripit#26 requires.
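A hypothetical sketch of a diff-based stack detector, assuming stacks live under a `stacks/<name>/` directory as stacks/tts and stacks/kyverno do. The real pipeline's detector is not shown here, so treat the function names and path matching as illustrative.

```python
import subprocess


def changed_paths(base: str = "HEAD~1", head: str = "HEAD") -> list[str]:
    """List files changed between base and head via `git diff --name-only`."""
    # On a merge commit HEAD~1 is the FIRST parent, so this is exactly the
    # first-parent diff that hid stacks/tts.
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def stacks_from_paths(paths: list[str]) -> list[str]:
    """Return sorted, de-duplicated stack names for .../stacks/<name>/<file>."""
    stacks = set()
    for path in paths:
        parts = path.split("/")
        # Require at least one path component after the stack name.
        for i, part in enumerate(parts[:-2]):
            if part == "stacks":
                stacks.add(parts[i + 1])
                break
    return sorted(stacks)
```

Sorting is what gives the apply order: "kyverno" sorts before "tts", so a loop over stacks_from_paths(changed_paths()) applies kyverno first.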

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:08:23 +00:00
Viktor Barzin
4a8c4f9a14 tts: first apply of Chatterbox stack; predefined voices from the image, not the unseeded PVC
Viktor's tour-guide redo (tripit#26): 87702bdc committed this stack with
[ci skip] so it was never applied — prod tripit has been pointing at a
nonexistent chatterbox-tts service since. This commit triggers the apply
and fixes the voices path: config pointed predefined_voices_path at the
NFS PVC (/data/voices), which nobody can seed without NFS-host shell
access and which would leave /v1/audio/voices empty (it gates readiness).
Use the 28 voices bundled in the image at /app/voices instead; /data
keeps reference audio (future cloning) and the HF model cache.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 18:27:44 +00:00
Viktor Barzin
87702bdce8 feat(tts): Chatterbox TTS stack + off-peak T4 gate, wire tripit narration [ci skip]
New `infra/stacks/tts/` deploys devnen/Chatterbox-TTS-Server (OpenAI-compatible
/v1/audio/speech) as ClusterIP `chatterbox-tts.tts.svc:8000` (server listens on
8004; Service remaps), requesting ONE T4 time-slice. Mirrors stacks/llama-cpp/.
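A minimal client sketch against the OpenAI-compatible /v1/audio/speech endpoint through the ClusterIP Service, assuming:

- The standard OpenAI speech request body (model, input, voice).
- The voice name is a placeholder; pick one of the bundled voices listed at /v1/audio/voices.
- speech_request and synthesize are hypothetical helpers, not tripit code.

```python
import json
import urllib.request

# Service port 8000; the Service remaps it to the server's 8004.
BASE_URL = "http://chatterbox-tts.tts.svc:8000"


def speech_request(text: str, voice: str, model: str = "chatterbox",
                   base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/audio/speech."""
    body = json.dumps({"model": model, "input": text, "voice": voice}).encode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice: str, timeout: float = 300) -> bytes:
    """Send the request and return the raw audio bytes (only works in-cluster)."""
    with urllib.request.urlopen(speech_request(text, voice), timeout=timeout) as resp:
        return resp.read()
```

No token is sent, matching the ClusterIP-only setup described under the tripit changes.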

Option A off-peak control (the time-sliced T4 has no VRAM isolation; see
post-mortem 2026-06-02): the Deployment sits at replicas=0 and three
Europe/London CronJobs own the replica count:
- `chatterbox-window-up` scales to 1 at 02:00 ONLY IF a free-VRAM preflight
  passes (sum gpu_pod_memory_used_bytes from gpu-pod-exporter;
  free = 16GiB - used >= floor).
- `chatterbox-vram-guard` yields the card mid-window if a resident wakes.
- `chatterbox-window-down` scales to 0 at 06:00.

tripit's bake is best-effort and cached forever (ADR-0002/0004), so a
skipped or aborted window backfills next time. SA+Role+RoleBinding grant the
CronJobs deployments/scale (nextcloud-watchdog pattern).
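A sketch of the preflight arithmetic (free = 16GiB - used >= floor), assuming:

- The input is a Prometheus HTTP API /api/v1/query response for `sum(gpu_pod_memory_used_bytes)`.
- An empty result counts as a failed scrape. This is the conservative choice, not confirmed by the stack.
- The helper names are hypothetical.

```python
import json
from typing import Optional

GIB = 1024 ** 3
T4_TOTAL_BYTES = 16 * GIB  # card size used by the preflight formula


def used_bytes_from_query(payload: str) -> Optional[float]:
    """Extract the summed used bytes from an instant-vector response."""
    try:
        doc = json.loads(payload)
        if doc.get("status") != "success":
            return None
        result = doc["data"]["result"]
        if not result:
            return None  # no series: treated as a failed scrape (assumption)
        # Instant-vector samples are [timestamp, "value-as-string"].
        return float(result[0]["value"][1])
    except (ValueError, KeyError, IndexError, TypeError, AttributeError):
        return None


def preflight_ok(payload: str, floor_bytes: int = 6 * GIB) -> bool:
    """True only if the scrape succeeded and free VRAM meets the floor."""
    used = used_bytes_from_query(payload)
    if used is None:
        return False  # failed scrape fails safe: stay at replicas=0
    return T4_TOTAL_BYTES - used >= floor_bytes
```

The default floor mirrors var.vram_free_floor_bytes (6GiB). The commit notes that this value still has to be set from the measured peak.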

Polite-tenant hardening: kyverno `inject-gpu-workload-priority` now excludes the
`tts` namespace (new `gpu_priority_excluded_namespaces` local) so Chatterbox
keeps tier-2-gpu priority (600k) and is always evicted first under GPU
pressure, ahead of immich-ml, frigate or llama-swap. The LimitRange-fallback
policy still uses the base exclude list (tts is untouched there).

tripit: add TTS_MODE=openai_compatible, TTS_BASE_URL, TTS_MODEL=chatterbox to
local.app_env (no token — ClusterIP only). No tripit code change.

Image build is documented in stacks/tts/README.md (devnen cu128 target ->
forgejo.viktorbarzin.me/viktor/chatterbox-tts) — build is impractical inline
(large CUDA image + needs the upstream repo). NOT APPLIED — review branch only.
Free-VRAM floor (var.vram_free_floor_bytes, default 6GiB) must be set from the
measured chatterbox-multilingual T4 peak during the first bake.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:41:53 +00:00