monitoring: add AncaElementsMirror{Stale,Failing} alerts

Layer 3a (anca-elements local mirror) now has the same alert coverage
as offsite-sync-backup:
- AncaElementsMirrorStale fires if time() - last_run_timestamp > 16d
  (2 weekly cycles, matches the 8d → 9d slack used elsewhere)
- AncaElementsMirrorFailing fires if last_status != 0
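
The 16d threshold is 16 × 86400 s = 1382400 s, which is the literal used in the rule. As a sketch only (not what this commit ships), the same rule could spell out the arithmetic in PromQL so the threshold is self-documenting; the metric name and labels are taken from the diff, and the rewritten expression is an assumption about style, not a behavior change:

```yaml
- alert: AncaElementsMirrorStale
  # 16 days * 86400 s/day = 1382400 s, identical to the literal threshold in the diff.
  # Multiplication binds tighter than the comparison, so no extra parentheses are needed.
  expr: (time() - anca_elements_mirror_last_run_timestamp{job="anca-elements-mirror"}) > 16 * 86400
  for: 30m
  labels:
    severity: warning
```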

BackupDiskFull (existing) covers the sda fill-up risk at 85%.

Not applied in this commit; will be picked up on the next monitoring stack apply.
Viktor Barzin 2026-05-24 11:55:19 +00:00
parent 6db64fe060
commit 416c2a0468

@@ -1562,6 +1562,20 @@ serverFiles:
     severity: warning
   annotations:
     summary: "Offsite backup sync is {{ $value | humanizeDuration }} old (threshold: 9d)"
+- alert: AncaElementsMirrorStale
+  expr: (time() - anca_elements_mirror_last_run_timestamp{job="anca-elements-mirror"}) > 1382400
+  for: 30m
+  labels:
+    severity: warning
+  annotations:
+    summary: "anca-elements mirror is {{ $value | humanizeDuration }} old (threshold: 16d / 2 weekly cycles)"
+- alert: AncaElementsMirrorFailing
+  expr: anca_elements_mirror_last_status{job="anca-elements-mirror"} != 0
+  for: 0m
+  labels:
+    severity: warning
+  annotations:
+    summary: "anca-elements mirror last run failed (status={{ $value }})"
 - alert: BackupDiskFull
   expr: (1 - node_filesystem_avail_bytes{job="proxmox-host", mountpoint="/mnt/backup"} / node_filesystem_size_bytes{job="proxmox-host", mountpoint="/mnt/backup"}) > 0.85
   for: 15m
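
Since the rules are not applied in this commit, one way to check AncaElementsMirrorFailing before the next apply is a promtool unit test. The following is a minimal sketch under assumptions: the file name `alerting_rules.yml` is hypothetical and stands for the alert rules extracted from `serverFiles` into a standalone rules file, and the input series values are made up for the test.

```yaml
# Hypothetical promtool unit test file, e.g. anca_mirror_test.yml
rule_files:
  - alerting_rules.yml   # assumed: rules extracted from serverFiles into a standalone file

evaluation_interval: 1m

tests:
  # Case 1: the mirror job reports a non-zero status, so the alert should fire.
  - interval: 1m
    input_series:
      - series: 'anca_elements_mirror_last_status{job="anca-elements-mirror"}'
        values: '1x5'   # status 1 at every sample
    alert_rule_test:
      - eval_time: 1m   # "for: 0m" means the alert fires on the first evaluation
        alertname: AncaElementsMirrorFailing
        exp_alerts:
          - exp_labels:
              severity: warning
              job: anca-elements-mirror
            exp_annotations:
              summary: "anca-elements mirror last run failed (status=1)"

  # Case 2: status 0 means the last run succeeded, so no alert is expected.
  - interval: 1m
    input_series:
      - series: 'anca_elements_mirror_last_status{job="anca-elements-mirror"}'
        values: '0x5'
    alert_rule_test:
      - eval_time: 1m
        alertname: AncaElementsMirrorFailing
        # exp_alerts omitted: no firing alerts expected
```

Such a file would be run with `promtool test rules anca_mirror_test.yml`. The staleness rule is left out of the sketch because it depends on `time()`, which promtool evaluates relative to the test's own clock.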