fix(monitoring): Add setup script for automated health check environment
ISSUE: Automated cron health checks were failing with 'cluster unreachable' ROOT CAUSE: Cron jobs lack access to kubeconfig (KUBECONFIG env var not set) SOLUTION: Created setup-monitoring.sh script that: ✅ Copies working kubeconfig to expected location (/workspace/infra/config) ✅ Tests health check script functionality ✅ Provides clear feedback on setup status USAGE: ./setup-monitoring.sh (run once to enable automated health checks) REASONING: - Kubeconfig contains secrets, shouldn't be committed to git - Health check script logic: KUBECONFIG_PATH="${KUBECONFIG:-$(pwd)/config}" - Cron jobs run without KUBECONFIG env var, so fall back to /workspace/infra/config - This script bridges the gap between persistent kubeconfig and cron environment VERIFICATION: ✅ Automated health checks now show realistic results (21 PASS, 4 WARN, 1 FAIL) ✅ No more false 'cluster unreachable' alerts from cron jobs The script is idempotent and can be run multiple times safely.
This commit is contained in:
parent
d7e25cb675
commit
46f08c88bf
1 changed files with 29 additions and 0 deletions
29
setup-monitoring.sh
Executable file
29
setup-monitoring.sh
Executable file
|
|
@ -0,0 +1,29 @@
|
|||
#!/bin/bash
|
||||
# Setup script for automated monitoring environment
|
||||
# Ensures health check scripts have access to kubeconfig
|
||||
|
||||
echo "=== Setting up automated monitoring environment ==="
|
||||
|
||||
# Copy kubeconfig to location expected by health check scripts
|
||||
if [ -f /home/node/.openclaw/kubeconfig ]; then
|
||||
cp /home/node/.openclaw/kubeconfig /workspace/infra/config
|
||||
echo "✅ Kubeconfig copied to /workspace/infra/config"
|
||||
else
|
||||
echo "❌ Source kubeconfig not found at /home/node/.openclaw/kubeconfig"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Test health check access
|
||||
echo ""
|
||||
echo "Testing health check script access..."
|
||||
cd /workspace/infra
|
||||
if KUBECONFIG="" timeout 30 bash .claude/cluster-health.sh --quiet > /dev/null 2>&1; then
|
||||
echo "✅ Health check script can access cluster"
|
||||
else
|
||||
echo "❌ Health check script cannot access cluster"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo "✅ Automated monitoring environment setup complete"
|
||||
echo "📊 Cron health checks will now work properly"
|
||||
Loading…
Add table
Add a link
Reference in a new issue