infra/.claude
Viktor Barzin 1639910043 ingress latency: add histogram buckets, fix restarts, right-size memory
- Traefik: add fine-grained Prometheus histogram buckets (0.01-30s) for meaningful P50/P99
- Calibre: relax liveness probe (timeout 5→10s, threshold 3→6) to stop NFS-caused restarts
- Novelapp: increase memory 128Mi/256Mi → 640Mi/640Mi (confirmed OOMKilled, VPA upper 505Mi)
- Forgejo: increase memory 256Mi → 384Mi (at 80% of limit, VPA upper 311Mi)
- ActualBudget: add explicit resources to prevent silent LimitRange defaults
- Docs: update Nextcloud note from 4Gi → 8Gi limit (Apache spike history)
2026-03-23 10:52:43 +02:00
..
commands [ci skip] update kubectl skill to use local kubeconfig 2026-02-07 13:42:35 +00:00
post-mortems add NFS/CSI cascade failure post-mortem [ci skip] 2026-03-23 02:25:11 +02:00
reference add infrastructure agent team: 8 specialized agents + 14 diagnostic scripts 2026-03-15 02:01:07 +00:00
scripts post-mortem v2: pipeline team architecture with 4-stage agents [ci skip] 2026-03-16 21:59:34 +00:00
skills remove duplicate deploy-app skill, now global agent [ci skip] 2026-03-23 00:17:53 +02:00
calendar-query.py sync regenerated providers.tf + upstream changes 2026-03-22 02:56:04 +02:00
CLAUDE.md ingress latency: add histogram buckets, fix restarts, right-size memory 2026-03-23 10:52:43 +02:00
cluster-health.sh update claude knowledge: OpenClaw deployment and tg wrapper learnings [ci skip] 2026-03-14 23:42:17 +00:00
home-assistant-sofia.py [ci skip] Add ha-sofia Home Assistant deployment to skills 2026-02-07 21:26:05 +00:00
home-assistant.py add claude [ci skip] 2026-02-06 20:10:02 +00:00
internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK add claude [ci skip] 2026-02-06 20:10:02 +00:00
pfsense.py [ci skip] Add pfSense firewall management skill 2026-02-14 12:42:10 +00:00
settings.json add claude files [ci skip] 2026-01-18 15:40:43 +00:00