From 5ebd3a81c327a9a2b5116ecabc74e465fc4fcb07 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Thu, 23 Apr 2026 07:47:41 +0000
Subject: [PATCH] tuya-bridge: liveness probe hits /health so k8s restarts silently-hung bridge
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The bridge was down 10h 40m on 2026-04-22 without being restarted: the
liveness probe hit `/` (a trivial Flask handler), which kept passing
while the actual Tuya-cloud call path was stuck. /health now reports
Tuya cloud reachability via a background probe in the app; point both
probes at it.

Liveness: 60s initial delay, then 6x30s = 3min of 503s before restart.
Readiness: 2x15s = 30s of 503s before removal from service.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 stacks/tuya-bridge/main.tf | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/stacks/tuya-bridge/main.tf b/stacks/tuya-bridge/main.tf
index 4d87f8aa..574ed95d 100644
--- a/stacks/tuya-bridge/main.tf
+++ b/stacks/tuya-bridge/main.tf
@@ -118,6 +118,26 @@ resource "kubernetes_deployment" "tuya-bridge" {
               }
             }
           }
+          liveness_probe {
+            http_get {
+              path = "/health"
+              port = 8080
+            }
+            initial_delay_seconds = 60
+            period_seconds = 30
+            timeout_seconds = 5
+            failure_threshold = 6
+          }
+          readiness_probe {
+            http_get {
+              path = "/health"
+              port = 8080
+            }
+            initial_delay_seconds = 10
+            period_seconds = 15
+            timeout_seconds = 5
+            failure_threshold = 2
+          }
           resources {
             requests = {
               cpu = "10m"
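
Reviewer note: the timing figures in the commit message can be re-derived from the probe settings in the diff. The sketch below is only a rough model, assuming a probe fires once per period and acts after `failure_threshold` consecutive failures. It ignores `timeout_seconds` and kubelet scheduling jitter, so real timings can differ by up to about one period. The `Probe` class and its method names are hypothetical and are not part of the bridge or of Kubernetes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical holder for the probe settings shown in the diff."""
    initial_delay_seconds: int
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int

    def failure_window_seconds(self) -> int:
        # Consecutive failing checks, one per period: the "6x30s" and
        # "2x15s" figures from the commit message.
        return self.failure_threshold * self.period_seconds

    def startup_budget_seconds(self) -> int:
        # Approximate upper bound for a container that never becomes
        # healthy: initial delay plus the full failure window.
        return self.initial_delay_seconds + self.failure_window_seconds()


# Values copied from the liveness_probe and readiness_probe blocks.
liveness = Probe(initial_delay_seconds=60, period_seconds=30,
                 timeout_seconds=5, failure_threshold=6)
readiness = Probe(initial_delay_seconds=10, period_seconds=15,
                  timeout_seconds=5, failure_threshold=2)

if __name__ == "__main__":
    print("liveness failure window (s):", liveness.failure_window_seconds())
    print("readiness failure window (s):", readiness.failure_window_seconds())
    print("liveness startup budget (s):", liveness.startup_budget_seconds())
```

Under this model a bridge that hangs after startup is restarted after roughly 180s of failing /health checks and leaves service after roughly 30s. A container that never becomes healthy is restarted about 240s after start.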