cluster-health: uptime_kuma check — only count status==0 as down

check_uptime_kuma flagged a monitor as down whenever its last heartbeat
status != 1, and treated "no beats" as down too. But uptime-kuma status 2 =
PENDING (mid-retry) and 3 = MAINTENANCE are not outages, and no-beats = no
data. So a monitor caught in a momentary pending/retry state at check time
produced a false "internal/external down(N)" WARN — observed twice on
2026-06-04 (Novelapp, then ha-sofia) for monitors uptime-kuma itself logged
ZERO downs against over 24h (0/2880 and 0/288 beats).
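
To make the false positive concrete, the two predicates can be compared across every status value named above. This is only a sketch: `BeatStatus`, `old_is_up`, and `new_is_up` are hypothetical names, not identifiers from the script, and DOWN = 0 / UP = 1 are inferred from the old `status == 1` check and the new `status == 0` check.

```python
from enum import IntEnum
from typing import Optional


class BeatStatus(IntEnum):
    """Hypothetical enum mirroring the status codes named in this commit."""
    DOWN = 0
    UP = 1
    PENDING = 2
    MAINTENANCE = 3


def old_is_up(status: Optional[int]) -> bool:
    # Previous behaviour: anything other than UP (including no data) was down.
    return status == 1


def new_is_up(status: Optional[int]) -> bool:
    # New behaviour: only an explicit DOWN beat counts as down.
    return status != 0


# Print a small truth table; None stands for "no beats recorded".
for s in [*BeatStatus, None]:
    name = s.name if s is not None else "NO_DATA"
    print(f"{name:12} old_is_up={old_is_up(s)!s:5} new_is_up={new_is_up(s)}")
```

Under these assumptions, PENDING, MAINTENANCE, and no-data are the three rows where the two predicates disagree, which is the class of false WARN described above.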

Count a monitor as down ONLY on an explicit DOWN beat (status==0); pending,
maintenance, and no-data are not-down. Real outages still flag (uptime-kuma
persists status==0 beats for genuine downs).
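
As a rough, standalone illustration of that rule combined with the last-beat extraction shown in the diff below, the check could be expressed as follows. `last_status`, `down_monitors`, and the `sample` data are hypothetical and exist only for this sketch; the real check works inline on `heartbeats` and `mid`.

```python
from typing import Any, Optional


def last_status(beats: Any) -> Optional[int]:
    """Return the status of the most recent beat, or None when there is no data."""
    if not beats:
        return None
    last = beats[-1]
    if isinstance(last, list):
        # The diff also handles a beat list nested one level deeper.
        last = last[-1] if last else {}
    status = last.get("status") if isinstance(last, dict) else None
    # Unwrap enum-like statuses that carry the integer in `.value`.
    return getattr(status, "value", status)


def down_monitors(heartbeats: dict) -> list:
    """Monitor ids whose latest beat is an explicit DOWN (status == 0)."""
    return [mid for mid, beats in heartbeats.items() if last_status(beats) == 0]


# Made-up sample data, not taken from any real monitor.
sample = {
    1: [{"status": 1}, {"status": 0}],  # explicit DOWN: counted
    2: [{"status": 1}, {"status": 2}],  # PENDING: not counted
    3: [],                              # no beats: not counted
    4: [[{"status": 3}]],               # nested list, MAINTENANCE: not counted
}
print(down_monitors(sample))
```

Only monitor 1 is reported here, since it is the only one whose latest beat carries status 0.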
Viktor Barzin 2026-06-04 08:18:54 +00:00
parent bf6ede2b9e
commit 31b8104b43

@@ -819,16 +819,20 @@ try:
                 continue
             beats = heartbeats.get(mid, [])
+            status = None
             if beats:
                 last_beat = beats[-1]
                 if isinstance(last_beat, list):
                     last_beat = last_beat[-1] if last_beat else {}
-                status = last_beat.get("status", 0) if isinstance(last_beat, dict) else 0
+                status = last_beat.get("status") if isinstance(last_beat, dict) else None
                 if hasattr(status, "value"):
                     status = status.value
-                is_up = (status == 1)
-            else:
-                is_up = False
+            # Only an explicit DOWN (status==0) counts as down. PENDING (2,
+            # mid-retry) and MAINTENANCE (3) are NOT outages, and no beats = no
+            # data (not an outage). Counting pending/no-data as down caused
+            # recurring false WARNs (Novelapp, ha-sofia 2026-06-04) for monitors
+            # uptime-kuma itself logged 0 downs over 24h.
+            is_up = (status != 0)
             if is_external:
                 if is_up: