infra/.claude
OpenClaw 2d5f44d1b3 feat(monitoring): Enhance disk monitoring and containerd GC after node2 incident
IMMEDIATE CHANGES (Active Now):
- Lower disk usage thresholds: WARN at 70%, FAIL at 85% (previously 80%/90%)
- More aggressive alerting to prevent containerd corruption
- Enhanced cluster health check disk monitoring
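
The threshold change above can be illustrated with a minimal Python sketch. It is not the code in cluster-health.sh; the function names, the inclusive (>=) comparison, and the used/total calculation are assumptions made for illustration. Only the 70/85 values and the previous 80/90 values come from the commit message.

```python
import shutil

# Thresholds taken from the commit message (previously 80 / 90).
WARN_PCT = 70.0
FAIL_PCT = 85.0


def classify_disk_usage(used_pct, warn=WARN_PCT, fail=FAIL_PCT):
    """Map a disk usage percentage to OK / WARN / FAIL.

    Assumption: a threshold is hit when usage is greater than or equal to it.
    """
    if used_pct >= fail:
        return "FAIL"
    if used_pct >= warn:
        return "WARN"
    return "OK"


def used_percent(path="/"):
    """Return used space as a percentage of total space for a mount point.

    Assumption: used/total is close enough for illustration; tools such as
    df compute the percentage slightly differently.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    pct = used_percent("/")
    print(f"/ is {pct:.1f}% used: {classify_disk_usage(pct)}")
```

Under these assumptions, a node at 75% usage reports WARN with the new thresholds, whereas the previous 80%/90% values would still have reported OK.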

INFRASTRUCTURE CHANGES (Requires Terraform Apply):
- Add containerd garbage collection configuration (30-minute intervals)
- More aggressive kubelet eviction policies (15%/20%, previously 10%/15%)
- Enhanced disk space protection to prevent node2-type failures

Root Cause: node2 disk exhaustion corrupted containerd image store
Prevention: Proactive monitoring + aggressive cleanup policies

[ci skip] - Infrastructure changes require SOPS access for apply
2026-03-17 16:51:02 +00:00
agents [ci skip] refactor claude files: compact CLAUDE.md, clean memory, remove generic agents 2026-03-06 23:27:46 +00:00
commands [ci skip] update kubectl skill to use local kubeconfig 2026-02-07 13:42:35 +00:00
reference resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
skills [ci skip] claudeception: extract 2 skills from today's session 2026-03-07 15:46:36 +00:00
calendar-query.py add claude [ci skip] 2026-02-06 20:10:02 +00:00
CLAUDE.md [ci skip] add sealed secrets convention: fileset + kubernetes_manifest pattern 2026-03-08 20:03:50 +00:00
cluster-health.sh feat(monitoring): Enhance disk monitoring and containerd GC after node2 incident 2026-03-17 16:51:02 +00:00
home-assistant-sofia.py [ci skip] Add ha-sofia Home Assistant deployment to skills 2026-02-07 21:26:05 +00:00
home-assistant.py add claude [ci skip] 2026-02-06 20:10:02 +00:00
internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK add claude [ci skip] 2026-02-06 20:10:02 +00:00
pfsense.py [ci skip] Add pfSense firewall management skill 2026-02-14 12:42:10 +00:00
settings.json add claude files [ci skip] 2026-01-18 15:40:43 +00:00