ISSUE: Automated cron health checks were failing with 'cluster unreachable'

ROOT CAUSE: Cron jobs lack access to kubeconfig (KUBECONFIG env var not set)

SOLUTION: Created setup-monitoring.sh script that:
✅ Copies working kubeconfig to expected location (/workspace/infra/config)
✅ Tests health check script functionality
✅ Provides clear feedback on setup status

USAGE: ./setup-monitoring.sh (run once to enable automated health checks)

REASONING:
- Kubeconfig contains secrets, shouldn't be committed to git
- Health check script logic: KUBECONFIG_PATH="${KUBECONFIG:-$(pwd)/config}"
- Cron jobs run without KUBECONFIG env var, so fall back to /workspace/infra/config
- This script bridges the gap between persistent kubeconfig and cron environment

VERIFICATION:
✅ Automated health checks now show realistic results (21 PASS, 4 WARN, 1 FAIL)
✅ No more false 'cluster unreachable' alerts from cron jobs

The script is idempotent and can be run multiple times safely.
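
The fallback quoted in the REASONING notes can be tried in isolation. The sketch below is a minimal illustration, not the health check itself: `resolve_kubeconfig_path` is a hypothetical helper that wraps the quoted expansion, and `/tmp/example-kubeconfig` and `/tmp` are placeholder values chosen for the demo.

```shell
#!/bin/bash
# Hypothetical helper wrapping the expansion quoted in the commit message.
resolve_kubeconfig_path() {
    # ${VAR:-default} yields the default when VAR is unset OR empty.
    echo "${KUBECONFIG:-$(pwd)/config}"
}

# Interactive-style call: KUBECONFIG is set, so its value is used.
from_env=$(KUBECONFIG=/tmp/example-kubeconfig resolve_kubeconfig_path)

# Cron-style call: KUBECONFIG is unset, so the path falls back to $(pwd)/config.
from_unset=$(unset KUBECONFIG; cd /tmp && resolve_kubeconfig_path)

# An empty KUBECONFIG takes the same fallback because of the colon in ":-".
from_empty=$(cd /tmp && KUBECONFIG="" resolve_kubeconfig_path)

printf 'set:   %s\n' "$from_env"
printf 'unset: %s\n' "$from_unset"
printf 'empty: %s\n' "$from_empty"
```

The empty-value case matters because the setup script tests the health check with `KUBECONFIG=""`. Under the `:-` form that takes the same fallback path as a cron job with no KUBECONFIG at all.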
29 lines
No EOL
955 B
Bash
Executable file
#!/bin/bash
# Setup script for automated monitoring environment
# Ensures health check scripts have access to kubeconfig

echo "=== Setting up automated monitoring environment ==="

# Copy kubeconfig to location expected by health check scripts
if [ -f /home/node/.openclaw/kubeconfig ]; then
    cp /home/node/.openclaw/kubeconfig /workspace/infra/config
    echo "✅ Kubeconfig copied to /workspace/infra/config"
else
    echo "❌ Source kubeconfig not found at /home/node/.openclaw/kubeconfig"
    exit 1
fi

# Test health check access
echo ""
echo "Testing health check script access..."
cd /workspace/infra
if KUBECONFIG="" timeout 30 bash .claude/cluster-health.sh --quiet > /dev/null 2>&1; then
    echo "✅ Health check script can access cluster"
else
    echo "❌ Health check script cannot access cluster"
    exit 1
fi

echo ""
echo "✅ Automated monitoring environment setup complete"
echo "📊 Cron health checks will now work properly"
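
To reproduce the original failure mode without waiting for cron, a command can be run with a stripped-down environment. This is a rough sketch only: `run_like_cron` is a hypothetical helper, and the `PATH`, `HOME`, and `SHELL` values are assumptions about a typical cron environment, not values taken from this repository.

```shell
#!/bin/bash
# Hypothetical helper: run a command with a nearly empty environment,
# roughly approximating what a cron job sees.
run_like_cron() {
    # env -i starts from an empty environment; only the variables listed here are passed on.
    env -i HOME="${HOME:-/}" PATH=/usr/bin:/bin SHELL=/bin/sh "$@"
}

# Even though KUBECONFIG is exported in this shell, the child process does not see it.
export KUBECONFIG=/tmp/example-kubeconfig
run_like_cron sh -c 'echo "KUBECONFIG=${KUBECONFIG:-<unset>}"'
```

With the helper defined, running `run_like_cron bash .claude/cluster-health.sh --quiet` from /workspace/infra should exercise the same fallback path that the cron jobs take.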