add infrastructure agent team: 8 specialized agents + 14 diagnostic scripts

Agents: devops-engineer, dba, security-engineer, sre, network-engineer, platform-engineer, observability-engineer, home-automation-engineer. Scripts: deploy-status, db-health, backup-verify, tls-check, crowdsec-status, authentik-audit, oom-investigator, resource-report, dns-check, network-health, nfs-health, truenas-status, platform-status, monitoring-health. Also: known-issues.md suppression list, cluster-health-checker port-forward fix.
2026-03-15 02:01:07 +00:00 · 2026-03-15 02:01:07 +00:00 · ff83ec3325
commit ff83ec3325
parent fca4e02c54
24 changed files with 3153 additions and 1 deletions
--- a/.claude/agents/cluster-health-checker.md
+++ b/.claude/agents/cluster-health-checker.md
@ -25,7 +25,7 @@ Run the cluster healthcheck script and interpret the results. If issues are foun
   - **Problematic pods**: `kubectl describe pod`, `kubectl logs --previous`
   - **Failed deployments**: check rollout status, events
   - **StatefulSet issues**: check pod readiness, GR status for MySQL
-   - **Prometheus alerts**: query via port-forward to prometheus-server
+   - **Prometheus alerts**: query via kubectl exec into prometheus-server
 4. Apply safe auto-fixes:
   - Delete evicted/failed pods: `kubectl delete pods -A --field-selector=status.phase=Failed`
   - Delete stale failed jobs: `kubectl delete jobs -n <ns> --field-selector=status.successful=0`