Commit graph

4 commits

Author SHA1 Message Date
Viktor Barzin
98b711ff8d [ci skip] Extend cluster healthcheck from 14 to 24 checks
Add 10 new checks covering gaps discovered during incident response:
ResourceQuota pressure, StatefulSets, node disk usage, Helm release
health, Kyverno policy engine, NFS connectivity, DNS resolution,
TLS certificate expiry, GPU health, and Cloudflare tunnel status.
2026-02-21 23:57:04 +00:00
Viktor Barzin
038d4434c4 [ci skip] Fix health check false positives for completed CronJob pods 2026-02-21 19:56:39 +00:00
Viktor Barzin
2bae6ccce3 Add Uptime Kuma monitor check to cluster health script [ci skip]
Adds check #14 that queries Uptime Kuma API for application-level
monitor status, complementing the kubectl-level checks with HTTP/ping
health data. Reports down monitors by name with PASS/WARN/FAIL thresholds.
2026-02-15 17:49:40 +00:00
Viktor Barzin
9c4ff21d58 Add cluster health check script with 13 diagnostic sections [ci skip] 2026-02-15 17:34:22 +00:00