monitoring: add AncaElementsMirror{Stale,Failing} alerts
Layer 3a (anca-elements local mirror) now has the same alert coverage as offsite-sync-backup:

- AncaElementsMirrorStale fires if last_run_timestamp is more than 16d old (2 weekly cycles, matches the 8d → 9d slack used elsewhere)
- AncaElementsMirrorFailing fires if last_status != 0

BackupDiskFull (existing) covers the sda fill-up risk at 85%.

Not applied in this commit; it will be picked up on the next monitoring stack apply.
parent 6db64fe060
commit 416c2a0468
1 changed file with 14 additions and 0 deletions
@@ -1562,6 +1562,20 @@ serverFiles:
       severity: warning
     annotations:
       summary: "Offsite backup sync is {{ $value | humanizeDuration }} old (threshold: 9d)"
+  - alert: AncaElementsMirrorStale
+    expr: (time() - anca_elements_mirror_last_run_timestamp{job="anca-elements-mirror"}) > 1382400
+    for: 30m
+    labels:
+      severity: warning
+    annotations:
+      summary: "anca-elements mirror is {{ $value | humanizeDuration }} old (threshold: 16d / 2 weekly cycles)"
+  - alert: AncaElementsMirrorFailing
+    expr: anca_elements_mirror_last_status{job="anca-elements-mirror"} != 0
+    for: 0m
+    labels:
+      severity: warning
+    annotations:
+      summary: "anca-elements mirror last run failed (status={{ $value }})"
   - alert: BackupDiskFull
     expr: (1 - node_filesystem_avail_bytes{job="proxmox-host", mountpoint="/mnt/backup"} / node_filesystem_size_bytes{job="proxmox-host", mountpoint="/mnt/backup"}) > 0.85
     for: 15m