Commit graph

12 commits

Author SHA1 Message Date
Viktor Barzin
1742a79fb2
[ci skip] Fix Uptime Kuma false-down reports: use bulk heartbeat API instead of per-monitor calls 2026-02-22 01:37:28 +00:00
Viktor Barzin
86d1d50ad0
[ci skip] Extend cluster healthcheck from 14 to 24 checks
Add 10 new checks covering gaps discovered during incident response:
ResourceQuota pressure, StatefulSets, node disk usage, Helm release
health, Kyverno policy engine, NFS connectivity, DNS resolution,
TLS certificate expiry, GPU health, and Cloudflare tunnel status.
2026-02-21 23:57:04 +00:00
Viktor Barzin
6a20aa2f3e
[ci skip] Fix health check false positives for completed CronJob pods 2026-02-21 19:56:39 +00:00
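Note: the commit does not say how the false positives were removed. One plausible approach, sketched here in Python as an assumption (the real script may well be shell), is to treat the Kubernetes pod phase `Succeeded` as healthy, since pods from finished Jobs and CronJobs end in that phase. The function name and the input shape (parsed output of `kubectl get pods -A -o json`) are illustrative only.

```python
from typing import Any


def unhealthy_pods(pod_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for pods that are neither Running nor Succeeded.

    `pod_list` is assumed to be the parsed JSON of a Kubernetes PodList,
    e.g. the output of `kubectl get pods -A -o json`.
    """
    problems = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        # Completed Job/CronJob pods sit in phase "Succeeded"; that is a
        # normal terminal state, so it must not be reported as a failure.
        if phase in ("Running", "Succeeded"):
            continue
        meta = pod.get("metadata", {})
        problems.append(f"{meta.get('namespace', 'default')}/{meta.get('name', '?')}")
    return problems


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the real cluster.
    sample = {
        "items": [
            {"metadata": {"namespace": "media", "name": "web-1"}, "status": {"phase": "Running"}},
            {"metadata": {"namespace": "media", "name": "backup-cron-1"}, "status": {"phase": "Succeeded"}},
            {"metadata": {"namespace": "media", "name": "worker-2"}, "status": {"phase": "Failed"}},
        ]
    }
    print(unhealthy_pods(sample))
```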
Viktor Barzin
7ef23470cd
Add Uptime Kuma monitor check to cluster health script [ci skip]
Adds check #14 that queries Uptime Kuma API for application-level
monitor status, complementing the kubectl-level checks with HTTP/ping
health data. Reports down monitors by name with PASS/WARN/FAIL thresholds.
2026-02-15 17:49:40 +00:00
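Note: the PASS/WARN/FAIL thresholds mentioned in the commit above could work roughly as follows. This is a sketch only; the threshold values (`warn_at`, `fail_at`), the function name, and the message format are assumptions, since the commit does not state them.

```python
def classify_down_monitors(down: list[str], warn_at: int = 1, fail_at: int = 3) -> tuple[str, str]:
    """Map a list of down monitor names to a (status, message) pair.

    warn_at / fail_at are hypothetical thresholds: at least `fail_at`
    down monitors is FAIL, at least `warn_at` is WARN, otherwise PASS.
    """
    count = len(down)
    if count >= fail_at:
        status = "FAIL"
    elif count >= warn_at:
        status = "WARN"
    else:
        status = "PASS"

    if count == 0:
        message = "all monitors up"
    else:
        # Report the down monitors by name, as the commit describes.
        message = f"{count} monitor(s) down: {', '.join(sorted(down))}"
    return status, message


if __name__ == "__main__":
    # Hypothetical monitor names for illustration.
    print(classify_down_monitors([]))
    print(classify_down_monitors(["grafana"]))
    print(classify_down_monitors(["grafana", "frigate", "nextcloud"]))
```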
Viktor Barzin
8867769a75
Add cluster health check script with 13 diagnostic sections [ci skip] 2026-02-15 17:34:22 +00:00
Viktor Barzin
36d32b49e7
[ci skip] Fix pull-through cache for all registries
Replace deprecated wildcard containerd mirror with per-registry
config_path approach. Add proxy containers for ghcr.io, quay.io,
registry.k8s.io, and reg.kyverno.io on the docker-registry VM.
Set static IP for docker-registry VM to avoid DHCP issues.
2026-02-15 14:35:52 +00:00
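Note: with containerd's `config_path` approach, each upstream registry gets its own directory containing a `hosts.toml` that names the mirror to try first. The sketch below renders such files for the four registries listed in the commit above. The mirror URLs and ports are placeholders, not the real docker-registry VM addresses, and the helper names are illustrative.

```python
def render_hosts_toml(upstream: str, mirror_url: str) -> str:
    """Render a containerd hosts.toml that pulls through `mirror_url` for `upstream`."""
    return (
        f'server = "https://{upstream}"\n'
        "\n"
        f'[host."{mirror_url}"]\n'
        '  capabilities = ["pull", "resolve"]\n'
    )


def plan_mirror_files(mirrors: dict[str, str]) -> dict[str, str]:
    """Map "<registry>/hosts.toml" (relative to config_path) to file contents."""
    return {
        f"{registry}/hosts.toml": render_hosts_toml(registry, url)
        for registry, url in mirrors.items()
    }


if __name__ == "__main__":
    # Placeholder mirror endpoints; the real proxy addresses are not in the commit.
    mirrors = {
        "ghcr.io": "http://registry.example.internal:5001",
        "quay.io": "http://registry.example.internal:5002",
        "registry.k8s.io": "http://registry.example.internal:5003",
        "reg.kyverno.io": "http://registry.example.internal:5004",
    }
    for path, content in plan_mirror_files(mirrors).items():
        print(f"# {path}\n{content}")
```

One file per registry replaces the single wildcard mirror entry, so each upstream can point at its own proxy container.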
Viktor Barzin
9df9ab1654
[ci skip] Add extend-vm-storage script and skills
- Script to automate K8s node VM disk expansion (drain, shutdown, resize, boot, expand FS, uncordon)
- Skill docs for the workflow and troubleshooting pitfalls (growpart, macOS grep -P, drain timeouts)
- Successfully tested on k8s-node2, k8s-node3, k8s-node4 (64G → 128G)
2026-02-13 22:08:46 +00:00
Viktor Barzin
2377045630
[ci skip] sync tfstate and add frigate helper scripts 2026-02-12 23:11:23 +00:00
Viktor Barzin
579c128e8f
upgrade to k8s 1.34.2 [ci skip] 2025-12-18 12:37:14 +00:00
Viktor Barzin
38eb7e66bd
scale down calibre-web-automated instead of calibre [ci skip] 2025-12-06 22:04:41 +00:00
Viktor Barzin
4ac543413a
some nits on the registry manager script - note it is still not working correctly [ci skip] 2025-10-17 19:23:43 +00:00
Viktor Barzin
9b29c78742
move helper scripts into scripts dir [ci skip] 2025-10-11 17:14:59 +00:00