Commit graph

44 commits

Author SHA1 Message Date
Viktor Barzin
53d8b2d2c6 update prometheus alerts to be correctly grouped and sent to slack and deprecate some old ones [ci skip] 2025-01-02 20:33:55 +00:00
Viktor Barzin
48a0deb283 update prometheus chart values to get slack notifications to work and add alerts for 4xx and 5xx on ingress [ci skip] 2025-01-01 11:39:16 +00:00
Viktor Barzin
7336e7c033 fix monitoring stack [ci skip] 2024-12-31 17:15:06 +00:00
Viktor Barzin
5ee5e59e61 add low voltage alert to prometheus and update some dashboards [ci skip] 2024-12-23 18:21:01 +00:00
Viktor Barzin
c987301c48 add ups snmp exporter to prometheus [ci skip] 2024-12-15 18:13:33 +00:00
Viktor Barzin
72d780c26f replace oauth proxy with authentik auth [ci skip] 2024-11-18 22:06:31 +00:00
Viktor Barzin
cf39034bdf add homepage module and some more integrations [ci skip] 2024-10-20 13:05:03 +00:00
Viktor Barzin
ead57fe29b add meshcentral and diun [ci skip] 2024-08-18 18:14:22 +00:00
Viktor Barzin
c05d088598 reduce prometheus storage retention from 12w -> 8w to save ~30gb [ci skip] 2024-08-07 20:18:13 +00:00
Viktor Barzin
84b707fcf8 update old prometheus alert detectors and upgrade immich to 101 [ci skip] 2024-04-12 21:15:31 +00:00
Viktor Barzin
ffd5d970e0 remove hack for london openwrt monitoring after having tailscale now [ci skip] 2024-03-30 18:28:11 +00:00
Viktor Barzin
09d07639c9 update openwrt london prometheus target address [ci skip] 2024-03-29 22:20:29 +00:00
Viktor Barzin
1f8ef28435 add monitoring jobs to p8s for istiod and the service mesh [ci skip] 2024-01-07 17:47:36 +00:00
Viktor Barzin
6e38cb420d upgrade prometheus helm chart [ci skip] 2023-12-25 21:40:19 +00:00
Viktor Barzin
2f61744528 add baseurl to prometheus helm chart so alertmanager sends correct links with prometheus public url instead of podname [ci skip] 2023-12-25 13:48:19 +00:00
Viktor Barzin
3e022b918d add prometheus monitoring to crowdsec [ci skip] 2023-11-25 13:34:16 +00:00
Viktor Barzin
6b30a0e533 add alert if node memory exceeds 90% [ci skip] 2023-11-10 22:48:45 +00:00
Viktor Barzin
2fe0644c03 use .lan domain for idrac metrics scrape [ci skip] 2023-11-01 20:44:17 +00:00
Viktor Barzin
af53a5d42d update redfish exporter to new implementation [ci skip] 2023-10-24 11:44:19 +00:00
Viktor Barzin
4efa47172c replace tls client cert auth with oauth and add localai stub [ci skip] 2023-10-22 14:07:18 +00:00
Viktor Barzin
68a3580d20 add alert on new client registration and update dns to use pfsense [ci skip] 2023-09-18 08:03:50 +00:00
Viktor Barzin
3047ff1d9b disable email notifications as they are spammy and using sendgrid quota [ci skip] 2023-06-20 14:04:02 +00:00
viktorbarzin
ace5fa27be remove 1gb limit for tsdb to confirm it was the root cause for memory issues [ci skip] 2023-04-21 23:04:39 +01:00
viktorbarzin
e0ff80e217 attempt to reduce prometheus memory by setting --storage.tsdb.retention.size; also add metrics-api which is not working atm [ci skip] 2023-04-17 01:28:03 +01:00
viktorbarzin
38297c2809 add alert for unhandled exceptions [ci skip] 2023-04-05 00:15:10 +01:00
viktorbarzin
609435c8bd reword finance app webhook exception alerting [ci skip] 2023-04-03 23:35:33 +01:00
viktorbarzin
a93aa03f72 add counter for overall webhook failures 2023-04-03 22:37:59 +01:00
viktorbarzin
86df04d96f add alert to monitor free memory on nodes [ci skip] 2023-03-26 17:22:04 +01:00
viktorbarzin
44ec092ef0 update idrac dns record to fix resolving issue [ci skip] 2023-02-15 21:24:17 +00:00
viktorbarzin
41b61ac42b add openwrt to monitored guests 2023-01-23 21:38:46 +00:00
viktorbarzin
fb767033fa reduce high power usage alert sensitivity 2022-01-11 20:24:14 +00:00
viktorbarzin
2dfa48c2e1 add slack to notifications and update alert definitions after upgrade [ci skip] 2022-01-06 20:09:20 +00:00
viktorbarzin
a249205d9d update home prometheus path 2021-09-05 18:39:18 +01:00
viktorbarzin
9ce05b739c update home address [CI SKIP] 2021-08-17 23:25:13 +01:00
viktorbarzin
0e9624e78e do not use pvcs for alertmanager and allow overlapping blocks in prometheus [ci skip] 2021-05-03 12:14:56 +01:00
viktorbarzin
b94768f00b fix summary in alerts [ci skip] 2021-04-11 19:18:56 +01:00
viktorbarzin
543c647835 add more alerts for services being down 2021-04-10 18:28:26 +01:00
viktorbarzin
f92f4f685c add prometheus check for mailserver down 2021-04-10 18:16:04 +01:00
viktorbarzin
a596ad9792 add redfish exporter [CI SKIP] 2021-04-05 15:07:29 +01:00
viktorbarzin
0cfc001cdf add config for chatbot and add alert for high mem openwrt [CI SKIP] 2021-03-07 23:14:43 +00:00
viktorbarzin
18fe4f89e1 add value to alert summaries 2021-02-19 18:58:36 +00:00
viktorbarzin
005f02d902 revert debug value for node load alert 2021-02-17 18:33:52 +00:00
viktorbarzin
7120a80696 fix typo from templating which caused missing metrics and add alerts to prevent that from happening again 2021-02-10 23:14:09 +00:00
viktorbarzin
7a7bc34ae3 initial 2021-02-08 20:02:17 +00:00