infra

Author	SHA1	Message	Date
Viktor Barzin	39a22b352e	tts: bootstrap the chatterbox NFS subdir — first-window mount failed forever First real window (2026-06-12 02:00): the chatterbox pod sat in ContainerCreating with MountVolume exit 32 x19 — /srv/nfs-ssd is exported whole-tree but the chatterbox SUBDIR never existed on the host (the go-live runbook step needed NFS-host shell nobody doing the apply had). One-shot busybox Job mounts the export root and mkdir -p's the subtree; kubelet's mount retry then self-heals the pod. Audio queue (27 items) drains as soon as the model loads. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 02:51:14 +00:00
Viktor Barzin	d3d37a15ec	tts: GPU-gated live narration — demand-gate CronJob + all-day VRAM guard Viktor asked 'can't we make it live? why the cronjob?' — the overnight window guaranteed VRAM room on the shared T4, but immich/frigate models idle-unload during the day so the card often has room (measured 10.3 GiB free at 01:20). New 'demand' action every 3 min: scale Chatterbox up when tripit's audio queue is non-empty AND free VRAM >= floor; idle it back to 0 when the queue empties (also frees the card early inside the nightly window). Failed metrics scrape fail-safes to no-scale-up, same as the window preflight. The guard moves to all-day */5 — live synthesis can hold the card at any hour, so the yield-on-pressure watchdog must watch at any hour. tripit exposes the unauthenticated in-cluster queue count; a 404 from an older image reads as queued=0 (no-op). The 02:00 window-up stays as the guaranteed nightly catch-up. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 00:25:35 +00:00
Viktor Barzin	798b025580	tts+kyverno: non-merge apply trigger (merge-commit diff hid stacks/tts from the stack detector) The Woodpecker default pipeline selects stacks via git diff HEAD~1 HEAD; on a merge commit that is the first-parent diff, which contained only the concurrently-landed files — stacks/tts never got applied (namespace still absent) and the kyverno re-trigger push got no pipeline at all. Single non-merge commit touching both stacks so the detector sees them; the sorted loop applies kyverno before tts, the order tripit#26 requires. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 19:08:23 +00:00
Viktor Barzin	4a8c4f9a14	tts: first apply of Chatterbox stack; predefined voices from the image, not the unseeded PVC Viktor's tour-guide redo (tripit#26): `87702bdc` committed this stack with [ci skip] so it was never applied — prod tripit has been pointing at a nonexistent chatterbox-tts service since. This commit triggers the apply and fixes the voices path: config pointed predefined_voices_path at the NFS PVC (/data/voices), which nobody can seed without NFS-host shell access and which would leave /v1/audio/voices empty (it gates readiness). Use the 28 voices bundled in the image at /app/voices instead; /data keeps reference audio (future cloning) and the HF model cache. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-11 18:27:44 +00:00
Viktor Barzin	87702bdce8	feat(tts): Chatterbox TTS stack + off-peak T4 gate, wire tripit narration [ci skip] New `infra/stacks/tts/` deploys devnen/Chatterbox-TTS-Server (OpenAI-compatible /v1/audio/speech) as ClusterIP `chatterbox-tts.tts.svc:8000` (server listens on 8004; Service remaps), requesting ONE T4 time-slice. Mirrors stacks/llama-cpp/. Option A off-peak control (no VRAM isolation on the time-sliced T4 — see post-mortem 2026-06-02): Deployment sits at replicas=0; three Europe/London CronJobs own the replica count — `chatterbox-window-up` scales to 1 at 02:00 ONLY IF a free-VRAM preflight passes (sum gpu_pod_memory_used_bytes from gpu-pod-exporter; free = 16GiB - used >= floor), `chatterbox-vram-guard` yields the card mid-window if a resident wakes, `chatterbox-window-down` scales to 0 at 06:00. tripit's bake is best-effort + cached-forever (ADR-0002/0004) so a skipped/aborted window backfills next time. SA+Role+RoleBinding grant the CronJobs deployments/scale (nextcloud-watchdog pattern). Polite-tenant hardening: kyverno `inject-gpu-workload-priority` now excludes the `tts` namespace (new `gpu_priority_excluded_namespaces` local) so Chatterbox keeps tier-2-gpu priority (600k) and is always evicted first under GPU pressure — never immich-ml/frigate/llama-swap. The LimitRange-fallback policy still uses the base exclude list (tts untouched there). tripit: add TTS_MODE=openai_compatible, TTS_BASE_URL, TTS_MODEL=chatterbox to local.app_env (no token — ClusterIP only). No tripit code change. Image build is documented in stacks/tts/README.md (devnen cu128 target -> forgejo.viktorbarzin.me/viktor/chatterbox-tts) — build is impractical inline (large CUDA image + needs the upstream repo). NOT APPLIED — review branch only. Free-VRAM floor (var.vram_free_floor_bytes, default 6GiB) must be set from the measured chatterbox-multilingual T4 peak during the first bake. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 21:41:53 +00:00

5 commits
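
The 'demand' action described in d3d37a15ec, combined with the preflight formula from 87702bdce8 (free = 16GiB - used >= floor), can be sketched as a pure decision function. This is a minimal illustration, not the code in stacks/tts: the function name, its parameters, and the choice to leave the replica count unchanged when the scrape fails or the card is full are assumptions. Only the 16GiB total, the 6GiB default floor, and the scale-up / idle-to-0 / fail-safe rules come from the commit messages.

```python
from typing import Optional

GIB = 1024 ** 3
T4_TOTAL_BYTES = 16 * GIB      # total used in the preflight formula (free = 16GiB - used)
DEFAULT_FLOOR_BYTES = 6 * GIB  # stated default of var.vram_free_floor_bytes


def desired_replicas(
    queued: int,
    used_bytes: Optional[int],
    current: int,
    floor_bytes: int = DEFAULT_FLOOR_BYTES,
) -> int:
    """Hypothetical helper: replica count the demand action should set.

    queued     -- audio queue length reported by tripit (a 404 is read as 0)
    used_bytes -- summed GPU memory in use, or None if the metrics scrape failed
    current    -- replica count the Deployment has right now
    """
    if queued <= 0:
        # Queue is empty: idle back to 0 and free the card.
        return 0
    if used_bytes is None:
        # Failed scrape: fail safe, never scale up (assumed: leave as-is).
        return current
    free_bytes = T4_TOTAL_BYTES - used_bytes
    if free_bytes >= floor_bytes:
        # Work is waiting and there is room on the card.
        return 1
    # Not enough room: do not scale up here (assumed: an already running
    # pod is left for the separate VRAM guard to yield).
    return current


if __name__ == "__main__":
    # 27 queued items, 6 GiB in use (10 GiB free), Deployment at 0.
    print(desired_replicas(queued=27, used_bytes=6 * GIB, current=0))
```

Keeping the decision free of I/O means the scale-up, idle, and fail-safe branches can be checked without a cluster; the metrics scrape and the scale call would wrap around it.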