From 31b8104b4321b0cbbac3b380c0abf5f08241cd7e Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Thu, 4 Jun 2026 08:18:54 +0000
Subject: [PATCH] =?UTF-8?q?cluster-health:=20uptime=5Fkuma=20check=20?=
 =?UTF-8?q?=E2=80=94=20only=20count=20status=3D=3D0=20as=20down?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

check_uptime_kuma flagged a monitor as down whenever its last heartbeat
status != 1, and it also treated "no beats" as down. But in uptime-kuma,
status 2 = PENDING (mid-retry) and 3 = MAINTENANCE are not outages, and
no beats means no data. So a monitor caught in a momentary pending/retry
state at check time produced a false "internal/external down(N)" WARN.
This was observed twice on 2026-06-04 (Novelapp, then ha-sofia), for
monitors against which uptime-kuma itself logged ZERO downs over 24h
(0/2880 and 0/288 beats).

Count a monitor as down ONLY on an explicit DOWN beat (status==0);
pending, maintenance, and no-data are treated as not-down. Real outages
still flag, because uptime-kuma persists status==0 beats for genuine
downs.
---
 scripts/cluster_healthcheck.sh | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/scripts/cluster_healthcheck.sh b/scripts/cluster_healthcheck.sh
index a63e8caa..f0246eb5 100755
--- a/scripts/cluster_healthcheck.sh
+++ b/scripts/cluster_healthcheck.sh
@@ -819,16 +819,20 @@ try:
             continue
 
         beats = heartbeats.get(mid, [])
+        status = None
         if beats:
             last_beat = beats[-1]
             if isinstance(last_beat, list):
                 last_beat = last_beat[-1] if last_beat else {}
-            status = last_beat.get("status", 0) if isinstance(last_beat, dict) else 0
+            status = last_beat.get("status") if isinstance(last_beat, dict) else None
             if hasattr(status, "value"):
                 status = status.value
-            is_up = (status == 1)
-        else:
-            is_up = False
+        # Only an explicit DOWN (status==0) counts as down. PENDING (2,
+        # mid-retry) and MAINTENANCE (3) are NOT outages, and no beats = no
+        # data (not an outage). Counting pending/no-data as down caused
+        # recurring false WARNs (Novelapp, ha-sofia 2026-06-04) for monitors
+        # uptime-kuma itself logged 0 downs over 24h.
+        is_up = (status != 0)
 
         if is_external:
             if is_up:
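A standalone sketch of the old and new rules side by side may make the behavior change easier to review. It assumes, as the hunk does, that a monitor's heartbeats arrive as a list whose last element is either a dict with a `status` key or a nested list of such dicts. It uses the status meanings stated in this commit message (0 = DOWN, 1 = UP, 2 = PENDING, 3 = MAINTENANCE). The names `BeatStatus`, `last_status`, `is_up_old`, and `is_up_new` are hypothetical; they do not exist in `cluster_healthcheck.sh`.

```python
from enum import IntEnum
from typing import Any


class BeatStatus(IntEnum):
    """Hypothetical enum for the uptime-kuma status codes named in this commit."""
    DOWN = 0
    UP = 1
    PENDING = 2
    MAINTENANCE = 3


def last_status(beats: list) -> Any:
    """Return the most recent heartbeat status, or None when there is no data.

    Mirrors the unwrapping in the patched hunk: the last element may itself be
    a list of beats, and a status may be an enum member exposing `.value`.
    """
    if not beats:
        return None
    last_beat = beats[-1]
    if isinstance(last_beat, list):
        last_beat = last_beat[-1] if last_beat else {}
    status = last_beat.get("status") if isinstance(last_beat, dict) else None
    if hasattr(status, "value"):
        status = status.value
    return status


def is_up_old(beats: list) -> bool:
    # Pre-patch rule (hypothetical helper): anything except an explicit UP (1)
    # counts as down, including PENDING, MAINTENANCE, a missing status, and
    # no beats at all.
    return last_status(beats) == BeatStatus.UP


def is_up_new(beats: list) -> bool:
    # Post-patch rule (hypothetical helper): only an explicit DOWN (0) counts
    # as down; None (no data), PENDING, and MAINTENANCE are all not-down.
    return last_status(beats) != BeatStatus.DOWN


if __name__ == "__main__":
    cases = {
        "up": [{"status": 1}],
        "down": [{"status": 0}],
        "pending": [{"status": 2}],
        "maintenance": [{"status": 3}],
        "no beats": [],
        "nested list": [[{"status": 1}, {"status": 2}]],
        "enum status": [{"status": BeatStatus.DOWN}],
    }
    for name, beats in cases.items():
        print(f"{name}: old_up={is_up_old(beats)} new_up={is_up_new(beats)}")
```

Under the new rule, only the explicit DOWN beat is reported as down. The pending, maintenance, and empty-history cases flip from down to not-down, which covers the pending/retry false WARNs described in the commit message.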