tts: demand gate treats a failed queue probe as no-action, not queue-empty
The demand-gate script defaulted an unreadable/unparseable tts-queue response to QUEUED=0, which the scale-down arm reads as 'queue empty'. One transient curl failure at 20:30 UTC today idled chatterbox-tts to 0 the very minute the pod first went Ready, with 27 narrations still queued (tripit kept logging tts_unreachable). Probe failure now exits without touching replicas: scale-up still needs a real count > 0, and scale-down now needs an explicitly parsed 0. Worst case after this change is a stale-up deployment idling until the 06:00 window-down. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
parent
18f524c265
commit
87a8a393fe
1 changed files with 15 additions and 4 deletions
|
|
@ -152,9 +152,19 @@ locals {
|
||||||
# down when the queue empties (even inside the nightly window — done is
|
# down when the queue empties (even inside the nightly window — done is
|
||||||
# done, free the card early). The 02:00 window-up stays the guaranteed
|
# done, free the card early). The 02:00 window-up stays the guaranteed
|
||||||
# nightly catch-up for days the daytime card never had room.
|
# nightly catch-up for days the daytime card never had room.
|
||||||
QUEUED="$(curl -sf -m 10 "$${QUEUE_URL}" \
|
# A FAILED probe must not read as "queue empty": defaulting to 0 idled
|
||||||
| sed -n 's/.*"queued"[^0-9]*\([0-9][0-9]*\).*/\1/p')" || QUEUED=""
|
# the deployment the very minute it first went Ready (2026-06-12 20:30
|
||||||
QUEUED="$${QUEUED:-0}"
|
# UTC — one transient curl failure, 27 items still queued). Fail-safe
|
||||||
|
# is NO ACTION; worst case a stale-up deployment idles until the 06:00
|
||||||
|
# window-down. (This also covers a 404 from an older tripit image.)
|
||||||
|
if ! QBODY="$(curl -sf -m 10 "$${QUEUE_URL}")"; then
|
||||||
|
echo "demand: queue probe failed -> no action (fail-safe)"; exit 0
|
||||||
|
fi
|
||||||
|
QUEUED="$(printf '%s\n' "$${QBODY}" \
|
||||||
|
| sed -n 's/.*"queued"[^0-9]*\([0-9][0-9]*\).*/\1/p')"
|
||||||
|
if [ -z "$${QUEUED}" ]; then
|
||||||
|
echo "demand: unparseable queue response -> no action (fail-safe)"; exit 0
|
||||||
|
fi
|
||||||
REPLICAS="$(kubectl -n tts get deploy/chatterbox-tts -o jsonpath='{.spec.replicas}')"
|
REPLICAS="$(kubectl -n tts get deploy/chatterbox-tts -o jsonpath='{.spec.replicas}')"
|
||||||
echo "demand: queued=$${QUEUED} replicas=$${REPLICAS}"
|
echo "demand: queued=$${QUEUED} replicas=$${REPLICAS}"
|
||||||
if [ "$${QUEUED}" -gt 0 ] && [ "$${REPLICAS}" = "0" ]; then
|
if [ "$${QUEUED}" -gt 0 ] && [ "$${REPLICAS}" = "0" ]; then
|
||||||
|
|
@ -198,7 +208,8 @@ locals {
|
||||||
}
|
}
|
||||||
|
|
||||||
# tripit's unauthenticated in-cluster queue probe (count only, non-sensitive).
|
# tripit's unauthenticated in-cluster queue probe (count only, non-sensitive).
|
||||||
# A 404 from an older tripit image yields QUEUED=0 -> the gate no-ops.
|
# Probe failures (incl. a 404 from an older tripit image) make the demand
|
||||||
|
# gate take NO action — only an explicit parsed count scales anything.
|
||||||
tripit_queue_url = "http://tripit.tripit.svc.cluster.local:8080/api/tour/tts-queue"
|
tripit_queue_url = "http://tripit.tripit.svc.cluster.local:8080/api/tour/tts-queue"
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue