Commit graph

53 commits

Author SHA1 Message Date
Viktor Barzin
534fcdbfe3
adjust battery low alert to fire only when there is no power [ci skip] 2025-03-22 15:47:30 +00:00
Viktor Barzin
987fc402b5
disable alert for pods less than in spec [ci skip] 2025-03-16 18:27:13 +00:00
Viktor Barzin
72bedfdd6e
disable perms errors and server errors for grafana and nextcloud ingresses as they were too noisy [ci skip] 2025-03-15 17:53:24 +00:00
Viktor Barzin
f7eff3cb74
add alert for ups low battery remaining [ci skip] 2025-03-02 20:48:07 +00:00
Viktor Barzin
095624a337
increase low voltage alert to 10 min [ci skip] 2025-03-01 14:28:56 +00:00
Viktor Barzin
5ef9ba5917
increase interval for 500 alerts to 20m [ci skip] 2025-01-10 20:47:25 +00:00
Viktor Barzin
aeee71751f
move prometheus alerts to different channel and move high cpu period [ci skip] 2025-01-04 14:27:48 +00:00
Viktor Barzin
3473f64670
increase idle power threshold to 130w [ci skip] 2025-01-03 17:49:24 +00:00
Viktor Barzin
4b725b02a6
add alert status to message [ci skip] 2025-01-02 21:13:09 +00:00
Viktor Barzin
c7113fa495
update prometheus alerts to be correctly grouped and sent to slack and deprecate some old ones [ci skip] 2025-01-02 20:33:55 +00:00
Viktor Barzin
9b0d686873
update prometheus chart values to get slack notifications to work and add alerts for 4xx and 5xx on ingress [ci skip] 2025-01-01 11:39:16 +00:00
Viktor Barzin
40f4354316
fix monitoring stack [ci skip] 2024-12-31 17:15:06 +00:00
Viktor Barzin
ce90629b54
add low voltage alert to prometheus and update some dashboards [ci skip] 2024-12-23 18:21:01 +00:00
Viktor Barzin
fbe305a891
add ups snmp exporter to prometheus [ci skip] 2024-12-15 18:13:33 +00:00
Viktor Barzin
185a944acd
replace oauth proxy with authentik auth [ci skip] 2024-11-18 22:06:31 +00:00
Viktor Barzin
64f81621c8
add homepage module and some more integrations [ci skip] 2024-10-20 13:05:03 +00:00
Viktor Barzin
b54fbf72fd
add meshcentral and diun [ci skip] 2024-08-18 18:14:22 +00:00
Viktor Barzin
506b4a2f87
reduce prometheus storage retention from 12w -> 8w to save ~30gb [ci skip] 2024-08-07 20:18:13 +00:00
Viktor Barzin
828f3f115a
update old prometheus alert detectors and upgrade immich to 101 [ci skip] 2024-04-12 21:15:31 +00:00
Viktor Barzin
8afbec0d23
remove hack for london openwrt monitoring after having tailscale now [ci skip] 2024-03-30 18:28:11 +00:00
Viktor Barzin
e5061dec27
update openwrt london prometheus target address [ci skip] 2024-03-29 22:20:29 +00:00
Viktor Barzin
215deb5568
add monitoring jobs to p8s for istiod and the service mesh [ci skip] 2024-01-07 17:47:36 +00:00
Viktor Barzin
15bade148c
upgrade prometheus helm chart [ci skip] 2023-12-25 21:40:19 +00:00
Viktor Barzin
e3a8cd16b4
add baseurl to prometheus helm chart so alertmanager sends correct links with prometheus public url instead of podname [ci skip] 2023-12-25 13:48:19 +00:00
Viktor Barzin
3019f1cca8
add prometheus monitoring to crowdsec [ci skip] 2023-11-25 13:34:16 +00:00
Viktor Barzin
73d63b7713
add alert if node memory exceeds 90% [ci skip] 2023-11-10 22:48:45 +00:00
Viktor Barzin
1afb83e426
use .lan domain for idrac metrics scrape [ci skip] 2023-11-01 20:44:17 +00:00
Viktor Barzin
3c394e0e82
update redfish exporter to new implementation [ci skip] 2023-10-24 11:44:19 +00:00
Viktor Barzin
50b57e1373
replace tls client cert auth with oauth and add localai stub [ci skip] 2023-10-22 14:07:18 +00:00
Viktor Barzin
9b5ed514cd
add alert on new client registration and update dns to use pfsense [ci skip] 2023-09-18 08:03:50 +00:00
Viktor Barzin
cd47f924b7
disable email notifications as they are spammy and using sendgrid quota [ci skip] 2023-06-20 14:04:02 +00:00
viktorbarzin
c87376b670
remove 1gb limit for tsdb to confirm it was the root cause for memory issues [ci skip] 2023-04-21 23:04:39 +01:00
viktorbarzin
64491f9028
attempt to reduce prometheus memory by setting --storage.tsdb.retention.size; also add metrics-api which is not working atm [ci skip] 2023-04-17 01:28:03 +01:00
viktorbarzin
a75e647b48
add alert for unhandled exceptions [ci skip] 2023-04-05 00:15:10 +01:00
viktorbarzin
9fe57e2ec6
reword finance app webhook exception alerting [ci skip] 2023-04-03 23:35:33 +01:00
viktorbarzin
6838d319a2
add counter for overall webhook failures 2023-04-03 22:37:59 +01:00
viktorbarzin
e9bf46caf8
add alert to monitor free memory on nodes [ci skip] 2023-03-26 17:22:04 +01:00
viktorbarzin
cbdd5d57ce
update idrac dns record to fix resolving issue [ci skip] 2023-02-15 21:24:17 +00:00
viktorbarzin
f8900ccf92
add openwrt to monitored guests 2023-01-23 21:38:46 +00:00
viktorbarzin
6c425b5573
reduce high power usage alert sensitivity 2022-01-11 20:24:14 +00:00
viktorbarzin
0b20fc1e73
add slack to notifications and update alert definitions after upgrade [ci skip] 2022-01-06 20:09:20 +00:00
viktorbarzin
b6f8160183
update home prometheus path 2021-09-05 18:39:18 +01:00
viktorbarzin
755942ee73
update home address [CI SKIP] 2021-08-17 23:25:13 +01:00
viktorbarzin
ae5cb2349b
do not use pvcs for alertmanager and allow overlapping blocks in prometheus [ci skip] 2021-05-03 12:14:56 +01:00
viktorbarzin
28cb35da94
fix summary in alerts [ci skip] 2021-04-11 19:18:56 +01:00
viktorbarzin
b9c9d82a03
add more alerts for services being down 2021-04-10 18:28:26 +01:00
viktorbarzin
0c77660e93
add prometheus check for mailserver down 2021-04-10 18:16:04 +01:00
viktorbarzin
12e46fad2a
add redfish exporter [CI SKIP] 2021-04-05 15:07:29 +01:00
viktorbarzin
81e3f47f18
add config for chatbot and add alert for high mem openwrt [CI SKIP] 2021-03-07 23:14:43 +00:00
viktorbarzin
2b6de2113f
add value to alert summaries 2021-02-19 18:58:36 +00:00