Commit graph

65 commits

Author SHA1 Message Date
Viktor Barzin
53d8b2d2c6 update prometheus alerts to be correctly grouped and sent to slack and deprecate some old ones [ci skip] 2025-01-02 20:33:55 +00:00
Viktor Barzin
48a0deb283 update prometheus chart values to get slack notifications to work and add alerts for 4xx and 5xx on ingress [ci skip] 2025-01-01 11:39:16 +00:00
Viktor Barzin
7336e7c033 fix monitoring stack [ci skip] 2024-12-31 17:15:06 +00:00
Viktor Barzin
55ac2f3707 add all grafana dashboards models [ci skip] 2024-12-24 13:48:21 +00:00
Viktor Barzin
5ee5e59e61 add low voltage alert to prometheus and update some dashboards [ci skip] 2024-12-23 18:21:01 +00:00
Viktor Barzin
9826f8546c fix typo in idrac voltage to be in volts not watts [ci skip] 2024-12-17 19:35:27 +00:00
Viktor Barzin
f587b2737c update ups grafana [ci skip] 2024-12-17 19:22:08 +00:00
Viktor Barzin
01f5f304d4 update idrac refresh rate to 1m [ci skip] 2024-12-17 19:05:57 +00:00
Viktor Barzin
67d72fb7c0 add idrac grafana dashboard to repo [ci skip] 2024-12-16 22:36:00 +00:00
Viktor Barzin
3899dca6e6 add grafana dashboard for ups [ci skip] 2024-12-15 20:58:01 +00:00
Viktor Barzin
c987301c48 add ups snmp exporter to prometheus [ci skip] 2024-12-15 18:13:33 +00:00
Viktor Barzin
2a9560d4de move grafana and k8s dashboard to use authentik instead of oauth proxy [ci skip] 2024-11-22 00:47:00 +00:00
Viktor Barzin
72d780c26f replace oauth proxy with authentik auth [ci skip] 2024-11-18 22:06:31 +00:00
Viktor Barzin
cf39034bdf add homepage module and some more integrations [ci skip] 2024-10-20 13:05:03 +00:00
Viktor Barzin
ead57fe29b add meshcentral and diun [ci skip] 2024-08-18 18:14:22 +00:00
Viktor Barzin
c05d088598 reduce prometheus storage retention from 12w -> 8w to save ~30gb [ci skip] 2024-08-07 20:18:13 +00:00
Viktor Barzin
84b707fcf8 update old prometheus alert detectors and upgrade immich to 101 [ci skip] 2024-04-12 21:15:31 +00:00
Viktor Barzin
ffd5d970e0 remove hack for london openwrt monitoring after having tailscale now [ci skip] 2024-03-30 18:28:11 +00:00
Viktor Barzin
09d07639c9 update openwrt london prometheus target address [ci skip] 2024-03-29 22:20:29 +00:00
Viktor Barzin
1f8ef28435 add monitoring jobs to p8s for istiod and the service mesh [ci skip] 2024-01-07 17:47:36 +00:00
Viktor Barzin
6e38cb420d upgrade prometheus helm chart [ci skip] 2023-12-25 21:40:19 +00:00
Viktor Barzin
2f61744528 add baseurl to prometheus helm chart so alertmanager sends correct links with prometheus public url instead of podname [ci skip] 2023-12-25 13:48:19 +00:00
Viktor Barzin
3e022b918d add prometheus monitoring to crowdsec [ci skip] 2023-11-25 13:34:16 +00:00
Viktor Barzin
33c1f786e5 move grafana to nfs [ci skip] 2023-11-11 00:16:58 +00:00
Viktor Barzin
6b30a0e533 add alert if node memory exceeds 90% [ci skip] 2023-11-10 22:48:45 +00:00
Viktor Barzin
5fa0c93f00 use nfs to prometheus [ci skip] 2023-11-10 22:20:25 +00:00
Viktor Barzin
2fe0644c03 use .lan domain for idrac metrics scrape [ci skip] 2023-11-01 20:44:17 +00:00
Viktor Barzin
98799f00ad add repo for the dockerfile for the redfish exporter [ci skip] 2023-10-24 11:46:18 +00:00
Viktor Barzin
af53a5d42d update redfish exporter to new implementation [ci skip] 2023-10-24 11:44:19 +00:00
Viktor Barzin
f9976e2c5e make dashy publicly accessible [ci skip] 2023-10-23 22:05:56 +00:00
Viktor Barzin
4efa47172c replace tls client cert auth with oauth and add localai stub [ci skip] 2023-10-22 14:07:18 +00:00
Viktor Barzin
68a3580d20 add alert on new client registration and update dns to use pfsense [ci skip] 2023-09-18 08:03:50 +00:00
Viktor Barzin
3047ff1d9b disable email notifications as they are spammy and using sendgrid quota [ci skip] 2023-06-20 14:04:02 +00:00
viktorbarzin
ace5fa27be remove 1gb limit for tsdb to confirm it was the root cause for memory issues [ci skip] 2023-04-21 23:04:39 +01:00
viktorbarzin
e0ff80e217 attempt to reduce prometheus memory by setting --storage.tsdb.retention.size; also add metrics-api which is not working atm [ci skip] 2023-04-17 01:28:03 +01:00
viktorbarzin
38297c2809 add alert for unhandled exceptions [ci skip] 2023-04-05 00:15:10 +01:00
viktorbarzin
609435c8bd reword finance app webhook exception alerting [ci skip] 2023-04-03 23:35:33 +01:00
viktorbarzin
25d602ce3b add prometheus pv and pvc [ci skip] 2023-04-03 23:21:48 +01:00
viktorbarzin
a93aa03f72 add counter for overall webhook failures 2023-04-03 22:37:59 +01:00
viktorbarzin
86df04d96f add alert to monitor free memory on nodes [ci skip] 2023-03-26 17:22:04 +01:00
viktorbarzin
44ec092ef0 update idrac dns record to fix resolving issue [ci skip] 2023-02-15 21:24:17 +00:00
viktorbarzin
41b61ac42b add openwrt to monitored guests 2023-01-23 21:38:46 +00:00
viktorbarzin
8b2a453f2f update tf to work with k8s 1.25.0 2022-08-31 22:04:09 +01:00
viktorbarzin
fb767033fa reduce high power usage alert sensitivity 2022-01-11 20:24:14 +00:00
viktorbarzin
2dfa48c2e1 add slack to notifications and update alert definitions after upgrade [ci skip] 2022-01-06 20:09:20 +00:00
viktorbarzin
fd6d15c598 fix k8s upgrade issues [ci skip] 2022-01-06 00:07:48 +00:00
viktorbarzin
a249205d9d update home prometheus path 2021-09-05 18:39:18 +01:00
viktorbarzin
9ce05b739c update home address [CI SKIP] 2021-08-17 23:25:13 +01:00
viktorbarzin
3ba28a4594 remove kubectl manifests bc drone is not happy running them :/ 2021-05-08 14:03:34 +01:00
viktorbarzin
59199606fb move grafana behind client tls until auth is setup [ci skip] 2021-05-08 01:19:24 +01:00