Commit graph

5 commits

Author SHA1 Message Date
Viktor Barzin
1742a79fb2
[ci skip] Fix Uptime Kuma false-down reports: use bulk heartbeat API instead of per-monitor calls 2026-02-22 01:37:28 +00:00
Viktor Barzin
86d1d50ad0
[ci skip] Extend cluster healthcheck from 14 to 24 checks
Add 10 new checks covering gaps discovered during incident response:
ResourceQuota pressure, StatefulSets, node disk usage, Helm release
health, Kyverno policy engine, NFS connectivity, DNS resolution,
TLS certificate expiry, GPU health, and Cloudflare tunnel status.
2026-02-21 23:57:04 +00:00
Viktor Barzin
6a20aa2f3e
[ci skip] Fix health check false positives for completed CronJob pods 2026-02-21 19:56:39 +00:00
Viktor Barzin
7ef23470cd
Add Uptime Kuma monitor check to cluster health script [ci skip]
Adds check #14 that queries Uptime Kuma API for application-level
monitor status, complementing the kubectl-level checks with HTTP/ping
health data. Reports down monitors by name with PASS/WARN/FAIL thresholds.
2026-02-15 17:49:40 +00:00
Viktor Barzin
8867769a75
Add cluster health check script with 13 diagnostic sections [ci skip] 2026-02-15 17:34:22 +00:00