actualbudget+monitoring: per-account bank-sync metrics, drop noisy alert
The bank-sync CronJob was posting to /accounts/banksync, which fans out to
all accounts in a single call. GoCardless enforces a PSD2 quota of 4
successful pulls per account per 24h, so a single rate-limited account
would make the whole call return HTTP 500, and `bank_sync_success` would
flip to 0 even though the data itself was still flowing through manual UI
syncs. Result: BankSyncFailing fired routinely whenever the user had been
active in the UI that day, a structural false positive.
Fix:
* CronJob: enumerate accounts via GET /accounts, POST per-account
  /accounts/{id}/banksync, emit bank_sync_account_success and
  bank_sync_account_last_success_timestamp labelled by account name.
  Roll up bank_sync_success = 1 iff any account succeeded.
* Alerts: drop BankSyncFailing (noise generator). Keep BankSyncStale
  at 48h (global drought). Add BankSyncAccountStale at 72h (catches
  single-account auth expiry, the real signal we wanted).
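
The per-account loop and roll-up described above can be sketched roughly as follows. This is a minimal illustration, not the CronJob's actual script: the function name `sync_accounts` and the injected `sync_one` callable (standing in for the POST to /accounts/{id}/banksync) are hypothetical, and the output is assumed to be Prometheus text format destined for a Pushgateway. How the real job preserves the previous timestamp of an account that failed this run is not shown in the commit; here the line is simply omitted.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def sync_accounts(
    accounts: Iterable[Tuple[str, str]],   # (account id, account name) pairs
    sync_one: Callable[[str], bool],       # hypothetical: True if the per-account sync succeeded
    now: Optional[float] = None,
) -> str:
    """Sync each account on its own and render the resulting metrics as text."""
    now = time.time() if now is None else now
    lines = []
    any_ok = False
    for account_id, name in accounts:
        try:
            ok = bool(sync_one(account_id))
        except Exception:
            # One rate-limited or broken account must not fail the others.
            ok = False
        any_ok = any_ok or ok
        # Escape backslashes and quotes so the label value stays valid.
        label = name.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'bank_sync_account_success{{account="{label}"}} {int(ok)}')
        if ok:
            lines.append(
                f'bank_sync_account_last_success_timestamp{{account="{label}"}} {int(now)}'
            )
    # Roll-up: 1 iff at least one account succeeded.
    lines.append(f"bank_sync_success {int(any_ok)}")
    if any_ok:
        lines.append(f"bank_sync_last_success_timestamp {int(now)}")
    return "\n".join(lines) + "\n"
```

Injecting `sync_one` keeps the sketch free of any assumption about the HTTP client or authentication the real job uses.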
Verified: manual run on bank-sync-viktor pushes 6 per-account success +
timestamp series; roll-up bank_sync_success=1; no firing alerts.
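
For reference, the two staleness thresholds in the alert rules are expressed in seconds: 48h is 172800 and 72h is 259200. The sketch below mirrors the comparison used in the alert expressions; the helper name `stale_accounts` and its dictionary input are illustrative assumptions, and the `for: 1h` hold period of the real rules is not modelled.

```python
from typing import Dict, List

GLOBAL_STALE_SECONDS = 48 * 3600    # 172800, used by BankSyncStale
ACCOUNT_STALE_SECONDS = 72 * 3600   # 259200, used by BankSyncAccountStale


def stale_accounts(
    last_success: Dict[str, float],  # account name -> last success (unix seconds)
    now: float,
    threshold: int = ACCOUNT_STALE_SECONDS,
) -> List[str]:
    """Return accounts whose last success is older than the threshold."""
    return sorted(
        name for name, ts in last_success.items() if now - ts > threshold
    )
```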
parent 7b6eee49c4
commit 665b6b2934
2 changed files with 92 additions and 43 deletions
@@ -2152,20 +2152,24 @@ serverFiles:
       severity: warning
     annotations:
       summary: "Mail server has no available replicas - mail may not be received"
-  - alert: BankSyncFailing
-    expr: bank_sync_success == 0
-    for: 6h
-    labels:
-      severity: warning
-    annotations:
-      summary: "Bank sync failing. Accounts may need GoCardless reauthorization. Check Pushgateway for which instance."
+  # Note: no BankSyncFailing alert — GoCardless enforces per-account
+  # PSD2 quotas (4 successful pulls per account per 24h). Manual UI
+  # syncs consume the same quota, so the nightly cron routinely hits
+  # rate-limits without any real outage. Alert only on staleness.
   - alert: BankSyncStale
     expr: (time() - bank_sync_last_success_timestamp) > 172800
     for: 1h
     labels:
       severity: warning
     annotations:
-      summary: "Bank sync has not succeeded in more than 48h. Check CronJob and account auth."
+      summary: "Bank sync (instance {{ $labels.instance }}): NO account has synced in over 48h. Likely a real outage — check CronJob, http-api logs, and GoCardless re-auth."
+  - alert: BankSyncAccountStale
+    expr: (time() - bank_sync_account_last_success_timestamp) > 259200
+    for: 1h
+    labels:
+      severity: warning
+    annotations:
+      summary: "Bank sync (instance {{ $labels.instance }}): account {{ $labels.account }} has not synced in over 72h. GoCardless requisition may have expired — re-link in Settings → Bank Sync."
   - alert: EmailRoundtripFailing
     expr: email_roundtrip_success{job="email-roundtrip-monitor"} == 0
     for: 60m