tts: demand gate treats a failed queue probe as no-action, not queue-empty

The demand-gate script defaulted an unreadable/unparseable tts-queue
response to QUEUED=0, which the scale-down arm reads as 'queue empty'.
One transient curl failure at 20:30 UTC today idled chatterbox-tts to 0
the very minute the pod first went Ready, with 27 narrations still
queued (tripit kept logging tts_unreachable). Probe failure now exits
without touching replicas: scale-up still needs a real count > 0, and
scale-down now needs an explicitly parsed 0. Worst case after this
change is a stale-up deployment idling until the 06:00 window-down.
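
The decision rule described above can be sketched as a small pure function.
This is a minimal illustration, not the script itself: gate_action and its
arguments are hypothetical names, and the replicas check on the scale-down
arm is an assumption, since only the scale-up condition is visible in the
diff.

```shell
# Hypothetical helper (not in the repo): map probe outcome to a gate action.
#   $1 = exit status of the curl probe
#   $2 = parsed queue count, or empty if the body was unparseable
#   $3 = current replica count of the deployment
gate_action() {
  probe_rc="$1"; queued="$2"; replicas="$3"

  # Failed or unparseable probe: fail-safe is to leave replicas untouched.
  if [ "$probe_rc" -ne 0 ] || [ -z "$queued" ]; then
    echo "none"; return 0
  fi

  if [ "$queued" -gt 0 ] && [ "$replicas" = "0" ]; then
    echo "scale-up"      # needs a real count > 0
  elif [ "$queued" -eq 0 ] && [ "$replicas" != "0" ]; then
    echo "scale-down"    # needs an explicitly parsed 0 (replicas check assumed)
  else
    echo "none"
  fi
}
```

Under this sketch, gate_action 22 "" 1 takes the fail-safe path and leaves
the running pod alone, where the old default of QUEUED=0 would have taken
the scale-down path.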

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor Barzin 2026-06-12 20:35:02 +00:00
parent 18f524c265
commit 87a8a393fe


@@ -152,9 +152,19 @@ locals {
 # down when the queue empties (even inside the nightly window done is
 # done, free the card early). The 02:00 window-up stays the guaranteed
 # nightly catch-up for days the daytime card never had room.
-QUEUED="$(curl -sf -m 10 "$${QUEUE_URL}" \
-  | sed -n 's/.*"queued"[^0-9]*\([0-9][0-9]*\).*/\1/p')" || QUEUED=""
-QUEUED="$${QUEUED:-0}"
+# A FAILED probe must not read as "queue empty": defaulting to 0 idled
+# the deployment the very minute it first went Ready (2026-06-12 20:30
+# UTC one transient curl failure, 27 items still queued). Fail-safe
+# is NO ACTION; worst case a stale-up deployment idles until the 06:00
+# window-down. (This also covers a 404 from an older tripit image.)
+if ! QBODY="$(curl -sf -m 10 "$${QUEUE_URL}")"; then
+  echo "demand: queue probe failed -> no action (fail-safe)"; exit 0
+fi
+QUEUED="$(printf '%s\n' "$${QBODY}" \
+  | sed -n 's/.*"queued"[^0-9]*\([0-9][0-9]*\).*/\1/p')"
+if [ -z "$${QUEUED}" ]; then
+  echo "demand: unparseable queue response -> no action (fail-safe)"; exit 0
+fi
 REPLICAS="$(kubectl -n tts get deploy/chatterbox-tts -o jsonpath='{.spec.replicas}')"
 echo "demand: queued=$${QUEUED} replicas=$${REPLICAS}"
 if [ "$${QUEUED}" -gt 0 ] && [ "$${REPLICAS}" = "0" ]; then
@@ -198,7 +208,8 @@ locals {
 }
 # tripit's unauthenticated in-cluster queue probe (count only, non-sensitive).
-# A 404 from an older tripit image yields QUEUED=0 -> the gate no-ops.
+# Probe failures (incl. a 404 from an older tripit image) make the demand
+# gate take NO action only an explicit parsed count scales anything.
 tripit_queue_url = "http://tripit.tripit.svc.cluster.local:8080/api/tour/tts-queue"
 }