cluster-health: uptime_kuma check — only count status==0 as down
check_uptime_kuma flagged a monitor as down whenever its last heartbeat status != 1, and treated "no beats" as down too. But uptime-kuma status 2 = PENDING (mid-retry) and 3 = MAINTENANCE are not outages, and no beats means no data, not an outage.

So a monitor caught in a momentary pending/retry state at check time produced a false "internal/external down(N)" WARN. This was observed twice on 2026-06-04 (Novelapp, then ha-sofia) for monitors that uptime-kuma itself logged ZERO downs against over 24h (0/2880 and 0/288 beats).

Count a monitor as down ONLY on an explicit DOWN beat (status==0); pending, maintenance, and no-data are treated as not down. Real outages still flag, because uptime-kuma persists status==0 beats for genuine downs.
parent bf6ede2b9e
commit 31b8104b43
1 changed file with 8 additions and 4 deletions
@@ -819,16 +819,20 @@ try:
                 continue
 
             beats = heartbeats.get(mid, [])
             status = None
             if beats:
                 last_beat = beats[-1]
                 if isinstance(last_beat, list):
                     last_beat = last_beat[-1] if last_beat else {}
-                status = last_beat.get("status", 0) if isinstance(last_beat, dict) else 0
+                status = last_beat.get("status") if isinstance(last_beat, dict) else None
                 if hasattr(status, "value"):
                     status = status.value
-                is_up = (status == 1)
-            else:
-                is_up = False
+            # Only an explicit DOWN (status==0) counts as down. PENDING (2,
+            # mid-retry) and MAINTENANCE (3) are NOT outages, and no beats = no
+            # data (not an outage). Counting pending/no-data as down caused
+            # recurring false WARNs (Novelapp, ha-sofia 2026-06-04) for monitors
+            # uptime-kuma itself logged 0 downs over 24h.
+            is_up = (status != 0)
+
             if is_external:
                 if is_up:
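The rule this change introduces can be exercised in isolation. The following is a minimal sketch, not the code in the repository: the helper names `last_status` and `is_down` and the `MonitorStatus` enum are hypothetical, introduced only for illustration, and the heartbeat shapes (a list of dicts, or a list of lists of dicts) are assumed from the checks visible in the diff.

```python
from enum import IntEnum
from typing import Any, Optional, Sequence


class MonitorStatus(IntEnum):
    """Status codes as described in the commit message (hypothetical enum)."""
    DOWN = 0
    UP = 1
    PENDING = 2
    MAINTENANCE = 3


def last_status(beats: Sequence[Any]) -> Optional[int]:
    """Return the status of the most recent beat, or None when there is no data."""
    if not beats:
        return None
    last_beat = beats[-1]
    # Some payloads nest beats one level deeper; unwrap the last inner entry.
    if isinstance(last_beat, list):
        last_beat = last_beat[-1] if last_beat else {}
    status = last_beat.get("status") if isinstance(last_beat, dict) else None
    # Enum-like statuses expose the raw code via .value.
    if hasattr(status, "value"):
        status = status.value
    return status


def is_down(beats: Sequence[Any]) -> bool:
    """Only an explicit DOWN beat counts; pending, maintenance, no data do not."""
    return last_status(beats) == MonitorStatus.DOWN


if __name__ == "__main__":
    samples = {
        "explicit down": [{"status": 0}],
        "pending": [{"status": 2}],
        "maintenance": [{"status": 3}],
        "no beats": [],
        "beat without status": [{}],
    }
    for label, beats in samples.items():
        print(f"{label}: down={is_down(beats)}")
```

Under these assumptions, only the "explicit down" sample reports as down. A trade-off of this rule is that a monitor which never reports any beats is also treated as not down, so a separate staleness check would be needed if missing data should be surfaced.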