From cdc851fc639a855f74936bd4753655a67f9db95d Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Fri, 17 Apr 2026 18:29:43 +0000
Subject: [PATCH] [alerts] Fix status-page-pusher crash + Prometheus backup push
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## status-page-pusher (ExternalAccessDivergence false positive)

The pusher was crashing with
`AttributeError: 'list' object has no attribute 'get'` at line 122
because the uptime-kuma-api library changed the heartbeats return
format. Fixed by making beat flattening more robust: accept a mix of
dicts and lists of dicts in the heartbeat data, fall back to the string
form of the monitor ID when looking up heartbeats, and add an
`isinstance` check before calling `.get()` on the latest beat.

## Prometheus backup (PrometheusBackupNeverRun)

The backup sidecar's Pushgateway push was silently failing because
`wget --post-file=-` needs `--header="Content-Type: text/plain"` for
Pushgateway to accept the Prometheus exposition format. Added the
header. Also manually pushed the metric to clear the `absent()` alert
immediately.

Note: ExternalAccessDivergence still fires because 5 services (ollama,
pdf, poison, dns, travel) ARE genuinely externally unreachable but
internally up. This is a real issue (likely Cloudflare tunnel routing),
not a false positive.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .../monitoring/prometheus_chart_values.tpl |  2 +-
 stacks/status-page/main.tf                 | 15 ++++++++++-----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
index e3c8ce7d..f4859c7d 100755
--- a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
+++ b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
@@ -246,7 +246,7 @@ server:
  ls -t /backup/prometheus_*.tar.gz 2>/dev/null | tail -n +3 | xargs rm -f 2>/dev/null

  # Push success metric to Pushgateway for alerting
- echo "prometheus_backup_last_success_timestamp $(date +%s)" | wget -qO- --post-file=- http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/prometheus-backup 2>/dev/null
+ printf "prometheus_backup_last_success_timestamp %s\n" "$(date +%s)" | wget -qO- --header="Content-Type: text/plain" --post-file=- http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/prometheus-backup 2>/dev/null

  echo "$(date) Backup complete. Files in /backup:"
  ls -lh /backup/prometheus_*.tar.gz 2>/dev/null || echo " (none)"
diff --git a/stacks/status-page/main.tf b/stacks/status-page/main.tf
index 2ef663e0..49f062f7 100644
--- a/stacks/status-page/main.tf
+++ b/stacks/status-page/main.tf
@@ -212,13 +212,18 @@ for m in monitors:
     else:
         # Get latest heartbeat for current status
         mid = m["id"]
-        mon_beats = heartbeats.get(mid, [])
+        mon_beats = heartbeats.get(mid, heartbeats.get(str(mid), []))
         if mon_beats:
-            # Flatten if nested lists
-            if mon_beats and isinstance(mon_beats[0], list):
-                mon_beats = [b for sublist in mon_beats for b in sublist]
+            # Flatten nested lists (API format varies by version)
+            flat = []
+            for item in mon_beats:
+                if isinstance(item, list):
+                    flat.extend(item)
+                elif isinstance(item, dict):
+                    flat.append(item)
+            mon_beats = flat if flat else mon_beats
             latest = mon_beats[-1] if mon_beats else None
-            if latest and beat_status_is_up(latest.get("status", 0)):
+            if latest and isinstance(latest, dict) and beat_status_is_up(latest.get("status", 0)):
                 status = "up"
             else:
                 status = "down"
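
The heartbeat handling in the `stacks/status-page/main.tf` hunk above can be sketched as a standalone helper, which may be easier to unit-test than the inline loop. This is a minimal sketch, not the code in the repository: `flatten_beats`, `latest_status`, and `is_up` are hypothetical names, and `is_up` is only a stand-in for the script's `beat_status_is_up`, whose definition is not part of this patch (the sketch assumes a status of `1` means "up"). Unlike the one-level loop in the hunk, this version walks lists nested to any depth.

```python
from typing import Any


def is_up(status: Any) -> bool:
    # ASSUMPTION: stand-in for beat_status_is_up(); treats status 1 as "up".
    return status == 1


def flatten_beats(raw: Any) -> list[dict]:
    """Collect heartbeat dicts from arbitrarily nested lists, in order."""
    flat: list[dict] = []
    stack = [raw]
    while stack:
        item = stack.pop()
        if isinstance(item, dict):
            flat.append(item)
        elif isinstance(item, list):
            # Push children reversed so they are popped in original order.
            stack.extend(reversed(item))
        # Any other type (None, str, int, ...) is skipped.
    return flat


def latest_status(heartbeats: dict, monitor_id: int) -> str:
    """Return "up" or "down" from the most recent heartbeat of a monitor."""
    # Keys may be ints or strings depending on how the data was produced.
    raw = heartbeats.get(monitor_id, heartbeats.get(str(monitor_id), []))
    beats = flatten_beats(raw)
    latest = beats[-1] if beats else None
    if latest is not None and is_up(latest.get("status", 0)):
        return "up"
    return "down"
```

Because `flatten_beats` only ever returns dicts, the `.get()` call on the latest beat cannot hit the `AttributeError` described in the commit message; a monitor with no usable heartbeats is reported as "down".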