tts: TCP probes, since http liveness killed the server mid-synthesis
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful

The devnen server runs chunked synthesis as a blocking call inside its
async handler, so the event loop (and with it every HTTP probe) is
stalled for the whole multi-minute story. Kubelet's http liveness probe
(1s timeout) therefore failed and killed the container mid-story (exit
137, i.e. SIGKILL, twice within 10 min of the first real drain). Each
kill reset the engine, so every following pass started cold, tripit's
120s synthesis budget could never be met, and the queue would never
drain.
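The following is a minimal asyncio sketch of this failure mode. It rests on a few assumptions. The stand-in `synthesize_blocking()` is hypothetical, and `time.sleep` plays the role of the GPU call. The timings are scaled down from kubelet's real 1s timeout. A trivial health check that starts while the blocking handler runs has to wait for the entire call. The `handler_offloaded` variant shows the usual alternative, `asyncio.to_thread`, which this commit does not adopt.

```python
import asyncio
import time

# Scaled-down stand-ins (assumptions): kubelet's probe timeout is 1s in the
# commit; these shorter values only keep the demo fast.
PROBE_TIMEOUT_S = 0.25
WORK_S = 0.6


def synthesize_blocking(seconds: float) -> bytes:
    """Hypothetical stand-in for a GPU synthesis call that never yields."""
    time.sleep(seconds)
    return b"audio"


async def handler_blocking(seconds: float) -> bytes:
    # The pattern described above: the blocking call runs on the loop thread.
    return synthesize_blocking(seconds)


async def handler_offloaded(seconds: float) -> bytes:
    # Alternative, NOT what the server does: run the call in a worker thread.
    return await asyncio.to_thread(synthesize_blocking, seconds)


async def probe_wait(handler) -> float:
    """Start a 'story', then time how long a trivial health check waits."""
    story = asyncio.create_task(handler(WORK_S))  # scheduled to run first
    start = time.monotonic()
    await asyncio.sleep(0)  # the health check: needs one turn of the loop
    waited = time.monotonic() - start
    await story
    return waited


if __name__ == "__main__":
    for handler in (handler_blocking, handler_offloaded):
        waited = asyncio.run(probe_wait(handler))
        verdict = "FAIL" if waited > PROBE_TIMEOUT_S else "ok"
        print(f"{handler.__name__}: probe waited {waited:.2f}s -> {verdict}")
```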

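TCP probes keep the meaning that matters: uvicorn binds port 8004 only
after the lifespan hook has finished loading the model, so readiness
still gates on 'model loaded', while a GPU-busy server is left alive.
A TCP probe only needs the kernel to complete the handshake on the
listening socket, and the kernel does that from the listen backlog
(as long as it is not full) even while the blocked event loop never
calls accept().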
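Here is a rough sketch of why the switch works. It uses plain sockets on localhost rather than kubelet itself, and all names are hypothetical. `tcp_probe` only checks that a connection opens, while `http_probe` needs the application to actually answer. Two sockets stand in for the server's states. A bound but non-listening socket represents the port before uvicorn starts serving. A listening socket whose owner never calls accept() represents the stalled event loop.

```python
import socket

HOST = "127.0.0.1"  # assumption: probe a local port, not the real pod


def tcp_probe(port: int, timeout: float) -> bool:
    """Pass if a TCP connection can be opened (the gist of a tcp_socket probe)."""
    try:
        with socket.create_connection((HOST, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(port: int, timeout: float) -> bool:
    """Pass only if the application answers in time (the gist of http_get)."""
    request = (b"GET /v1/audio/voices HTTP/1.1\r\n"
               b"Host: localhost\r\nConnection: close\r\n\r\n")
    try:
        with socket.create_connection((HOST, port), timeout=timeout) as conn:
            conn.sendall(request)
            return conn.recv(1) != b""
    except OSError:  # connection refused, or socket.timeout while waiting
        return False


def demo(timeout: float = 0.3) -> tuple[bool, bool, bool]:
    # Bound but not listening: connections are refused, like before the
    # model has loaded and uvicorn has started serving.
    loading = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    loading.bind((HOST, 0))
    # Listening, but nobody ever calls accept(): the kernel still completes
    # handshakes into the backlog, like a server whose event loop is stuck.
    stalled = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    stalled.bind((HOST, 0))
    stalled.listen(16)
    try:
        before_bind = tcp_probe(loading.getsockname()[1], timeout)
        busy_tcp = tcp_probe(stalled.getsockname()[1], timeout)
        busy_http = http_probe(stalled.getsockname()[1], timeout)
        return before_bind, busy_tcp, busy_http
    finally:
        loading.close()
        stalled.close()


if __name__ == "__main__":
    print(demo())  # expected on Linux: (False, True, False)
```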

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor Barzin 2026-06-12 20:57:27 +00:00
parent 30ff8f2db3
commit bd0cb71f17


@@ -440,12 +440,18 @@ resource "kubernetes_deployment" "chatterbox" {
             mount_path = "/data"
           }
-          # /v1/audio/voices is cheap and only 200s once the model is loaded
-          # so it gates real readiness. First start downloads the model, which
-          # is slow; the generous failure_threshold absorbs that.
+          # TCP probes, deliberately NOT http: the server synthesizes chunks
+          # as a BLOCKING call inside its async handler, so the event loop
+          # and any HTTP probe hangs for the whole multi-minute story. The
+          # http liveness probe killed the container mid-synthesis (exit 137,
+          # observed 2026-06-12 20:48-20:53: every drain pass then faced a
+          # cold engine and timed out forever). TCP keeps the original
+          # semantics where it matters: uvicorn only binds 8004 AFTER the
+          # lifespan hook finishes loading the model ("Application startup
+          # complete" precedes "Uvicorn running"), so a TCP readiness pass
+          # still means "model loaded", while a GPU-busy server stays alive.
           readiness_probe {
-            http_get {
-              path = "/v1/audio/voices"
+            tcp_socket {
               port = 8004
             }
             initial_delay_seconds = 20
@@ -453,8 +459,7 @@ resource "kubernetes_deployment" "chatterbox" {
             failure_threshold = 12
           }
           liveness_probe {
-            http_get {
-              path = "/v1/audio/voices"
+            tcp_socket {
               port = 8004
             }
             initial_delay_seconds = 120