Commit graph

22 commits

Author SHA1 Message Date
viktorbarzin
c87376b670
remove 1gb limit for tsdb to confirm it was the root cause for memory issues [ci skip] 2023-04-21 23:04:39 +01:00
viktorbarzin
64491f9028
attempt to reduce prometheus memory by setting --storage.tsdb.retention.size; also add metrics-api which is not working atm [ci skip] 2023-04-17 01:28:03 +01:00
viktorbarzin
a75e647b48
add alert for unhandled exceptions [ci skip] 2023-04-05 00:15:10 +01:00
viktorbarzin
9fe57e2ec6
reword finance app webhook exception alerting [ci skip] 2023-04-03 23:35:33 +01:00
viktorbarzin
6838d319a2
add counter for overall webhook failures 2023-04-03 22:37:59 +01:00
viktorbarzin
e9bf46caf8
add alert to monitor free memory on nodes [ci skip] 2023-03-26 17:22:04 +01:00
viktorbarzin
cbdd5d57ce
update idrac dns record to fix resolving issue [ci skip] 2023-02-15 21:24:17 +00:00
viktorbarzin
f8900ccf92
add openwrt to monitored guests 2023-01-23 21:38:46 +00:00
viktorbarzin
6c425b5573
reduce high power usage alert sensitivity 2022-01-11 20:24:14 +00:00
viktorbarzin
0b20fc1e73
add slack to notifications and update alert definitions after upgrade [ci skip] 2022-01-06 20:09:20 +00:00
viktorbarzin
b6f8160183
update home prometheus path 2021-09-05 18:39:18 +01:00
viktorbarzin
755942ee73
update home address [CI SKIP] 2021-08-17 23:25:13 +01:00
viktorbarzin
ae5cb2349b
do not use pvcs for alertmanager and allow overlapping blocks in prometheus [ci skip] 2021-05-03 12:14:56 +01:00
viktorbarzin
28cb35da94
fix summary in alerts [ci skip] 2021-04-11 19:18:56 +01:00
viktorbarzin
b9c9d82a03
add more alerts for services being down 2021-04-10 18:28:26 +01:00
viktorbarzin
0c77660e93
add prometheus check for mailserver down 2021-04-10 18:16:04 +01:00
viktorbarzin
12e46fad2a
add redfish exporter [CI SKIP] 2021-04-05 15:07:29 +01:00
viktorbarzin
81e3f47f18
add config for chatbot and add alert for high mem openwrt [CI SKIP] 2021-03-07 23:14:43 +00:00
viktorbarzin
2b6de2113f
add value to alert summaries 2021-02-19 18:58:36 +00:00
viktorbarzin
8a9fff6799
revert debug value for node load alert 2021-02-17 18:33:52 +00:00
viktorbarzin
95750c6949
fix typo from templating which caused missing metrics and add alerts to prevent that from happening again 2021-02-10 23:14:09 +00:00
viktorbarzin
4caa987213
initial 2021-02-08 20:02:17 +00:00