infra

OpenClaw f30c62ee5c feat(health-check): Add Prometheus-based CPU and power monitoring	2026-03-17 16:51:02 +00:00

SECTIONS ADDED:
- Section 25: Advanced CPU Monitoring (Prometheus node_exporter metrics)
- Section 26: Power Monitoring (DCGM GPU power + host power)

FEATURES:
- 5-minute CPU usage averages (more accurate than kubectl top)
- Tesla T4 GPU power consumption monitoring
- CPU thresholds: 70% warn, 85% critical
- GPU power thresholds: 50W active, 65W high
- Maps IP addresses to friendly node names
- Integrates with existing health check infrastructure

CURRENT STATUS:
- All nodes have healthy disk usage (~10%)
- k8s-node4 flagged at 87% CPU (explains resource pressure)
- GPU operating normally at 30.9W
- Enhanced monitoring prevents issues like node2 containerd corruption

Total health check sections: 26 (was 24)
Addresses node2 incident prevention requirements
agents	[ci skip] refactor claude files: compact CLAUDE.md, clean memory, remove generic agents	2026-03-06 23:27:46 +00:00
commands	[ci skip] update kubectl skill to use local kubeconfig	2026-02-07 13:42:35 +00:00
reference	resource quota review: fix OOM risks, close quota gaps, add HA protections	2026-03-08 18:17:46 +00:00
skills	[ci skip] claudeception: extract 2 skills from today's session	2026-03-07 15:46:36 +00:00
calendar-query.py	add claude [ci skip]	2026-02-06 20:10:02 +00:00
CLAUDE.md	[ci skip] add sealed secrets convention: fileset + kubernetes_manifest pattern	2026-03-08 20:03:50 +00:00
cluster-health.sh	feat(health-check): Add Prometheus-based CPU and power monitoring	2026-03-17 16:51:02 +00:00
home-assistant-sofia.py	[ci skip] Add ha-sofia Home Assistant deployment to skills	2026-02-07 21:26:05 +00:00
home-assistant.py	add claude [ci skip]	2026-02-06 20:10:02 +00:00
internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK	add claude [ci skip]	2026-02-06 20:10:02 +00:00
pfsense.py	[ci skip] Add pfSense firewall management skill	2026-02-14 12:42:10 +00:00
settings.json	add claude files [ci skip]	2026-01-18 15:40:43 +00:00
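The threshold logic described in the latest commit (CPU 70% warn / 85% critical, GPU power 50W active / 65W high, IP-to-name mapping) can be sketched roughly as below. This is not the contents of `cluster-health.sh`, which is a shell script not shown here. It is a minimal Python sketch under stated assumptions: the PromQL expressions, the dcgm-exporter metric name `DCGM_FI_DEV_POWER_USAGE`, the use of `>=` at each boundary, and the `NODE_NAMES` map (including the `10.0.0.14` address) are all illustrative guesses, not taken from the repository.

```python
import json
import urllib.parse
import urllib.request

# Assumed PromQL: 5-minute average CPU busy percentage per node_exporter instance.
CPU_QUERY = '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
# Assumed dcgm-exporter metric reporting GPU power draw in watts.
GPU_POWER_QUERY = "DCGM_FI_DEV_POWER_USAGE"

CPU_WARN, CPU_CRIT = 70.0, 85.0      # percent, per the commit message
GPU_ACTIVE, GPU_HIGH = 50.0, 65.0    # watts, per the commit message

# Hypothetical IP-to-name map; the real node addresses are not in the listing.
NODE_NAMES = {"10.0.0.14": "k8s-node4"}


def friendly_name(instance: str) -> str:
    """Translate a Prometheus 'instance' label into a node name if known."""
    host = instance.rsplit(":", 1)[0]  # drop the exporter port, e.g. ":9100"
    return NODE_NAMES.get(host, host)


def cpu_status(pct: float) -> str:
    """Classify a CPU busy percentage (boundary handling assumed to be >=)."""
    if pct >= CPU_CRIT:
        return "CRITICAL"
    if pct >= CPU_WARN:
        return "WARN"
    return "OK"


def gpu_power_status(watts: float) -> str:
    """Classify GPU power draw in watts (boundary handling assumed to be >=)."""
    if watts >= GPU_HIGH:
        return "HIGH"
    if watts >= GPU_ACTIVE:
        return "ACTIVE"
    return "NORMAL"


def parse_vector(payload: dict) -> dict:
    """Map instance label -> float value from a Prometheus instant-vector response."""
    return {
        r["metric"].get("instance", "?"): float(r["value"][1])
        for r in payload["data"]["result"]
    }


def query(prom_url: str, promql: str) -> dict:
    """Run an instant query against Prometheus's /api/v1/query endpoint."""
    url = prom_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))
```

A caller would loop over `query(prom_url, CPU_QUERY).items()` and print `friendly_name(instance)` next to `cpu_status(value)`. Under these assumptions, the 87% reading on k8s-node4 would classify as `CRITICAL` and the 30.9W GPU reading as `NORMAL`, which matches the status reported in the commit.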