Commit graph

51 commits

Author SHA1 Message Date
Viktor Barzin
ead57fe29b add meshcentral and diun [ci skip] 2024-08-18 18:14:22 +00:00
Viktor Barzin
c05d088598 reduce prometheus storage retention from 12w -> 8w to save ~30gb [ci skip] 2024-08-07 20:18:13 +00:00
Viktor Barzin
84b707fcf8 update old prometheus alert detectors and upgrade immich to 101 [ci skip] 2024-04-12 21:15:31 +00:00
Viktor Barzin
ffd5d970e0 remove hack for london openwrt monitoring now that tailscale is set up [ci skip] 2024-03-30 18:28:11 +00:00
Viktor Barzin
09d07639c9 update openwrt london prometheus target address [ci skip] 2024-03-29 22:20:29 +00:00
Viktor Barzin
1f8ef28435 add monitoring jobs to p8s for istiod and the service mesh [ci skip] 2024-01-07 17:47:36 +00:00
Viktor Barzin
6e38cb420d upgrade prometheus helm chart [ci skip] 2023-12-25 21:40:19 +00:00
Viktor Barzin
2f61744528 add baseurl to prometheus helm chart so alertmanager sends links with the prometheus public url instead of the pod name [ci skip] 2023-12-25 13:48:19 +00:00
Viktor Barzin
3e022b918d add prometheus monitoring to crowdsec [ci skip] 2023-11-25 13:34:16 +00:00
Viktor Barzin
33c1f786e5 move grafana to nfs [ci skip] 2023-11-11 00:16:58 +00:00
Viktor Barzin
6b30a0e533 add alert if node memory exceeds 90% [ci skip] 2023-11-10 22:48:45 +00:00
Viktor Barzin
5fa0c93f00 use nfs for prometheus [ci skip] 2023-11-10 22:20:25 +00:00
Viktor Barzin
2fe0644c03 use .lan domain for idrac metrics scrape [ci skip] 2023-11-01 20:44:17 +00:00
Viktor Barzin
98799f00ad add repo for the dockerfile for the redfish exporter [ci skip] 2023-10-24 11:46:18 +00:00
Viktor Barzin
af53a5d42d update redfish exporter to new implementation [ci skip] 2023-10-24 11:44:19 +00:00
Viktor Barzin
f9976e2c5e make dashy publicly accessible [ci skip] 2023-10-23 22:05:56 +00:00
Viktor Barzin
4efa47172c replace tls client cert auth with oauth and add localai stub [ci skip] 2023-10-22 14:07:18 +00:00
Viktor Barzin
68a3580d20 add alert on new client registration and update dns to use pfsense [ci skip] 2023-09-18 08:03:50 +00:00
Viktor Barzin
3047ff1d9b disable email notifications as they are spammy and use up sendgrid quota [ci skip] 2023-06-20 14:04:02 +00:00
viktorbarzin
ace5fa27be remove 1gb limit for tsdb to confirm it was the root cause for memory issues [ci skip] 2023-04-21 23:04:39 +01:00
viktorbarzin
e0ff80e217 attempt to reduce prometheus memory by setting --storage.tsdb.retention.size; also add metrics-api which is not working atm [ci skip] 2023-04-17 01:28:03 +01:00
viktorbarzin
38297c2809 add alert for unhandled exceptions [ci skip] 2023-04-05 00:15:10 +01:00
viktorbarzin
609435c8bd reword finance app webhook exception alerting [ci skip] 2023-04-03 23:35:33 +01:00
viktorbarzin
25d602ce3b add prometheus pv and pvc [ci skip] 2023-04-03 23:21:48 +01:00
viktorbarzin
a93aa03f72 add counter for overall webhook failures 2023-04-03 22:37:59 +01:00
viktorbarzin
86df04d96f add alert to monitor free memory on nodes [ci skip] 2023-03-26 17:22:04 +01:00
viktorbarzin
44ec092ef0 update idrac dns record to fix resolving issue [ci skip] 2023-02-15 21:24:17 +00:00
viktorbarzin
41b61ac42b add openwrt to monitored guests 2023-01-23 21:38:46 +00:00
viktorbarzin
8b2a453f2f update tf to work with k8s 1.25.0 2022-08-31 22:04:09 +01:00
viktorbarzin
fb767033fa reduce high power usage alert sensitivity 2022-01-11 20:24:14 +00:00
viktorbarzin
2dfa48c2e1 add slack to notifications and update alert definitions after upgrade [ci skip] 2022-01-06 20:09:20 +00:00
viktorbarzin
fd6d15c598 fix k8s upgrade issues [ci skip] 2022-01-06 00:07:48 +00:00
viktorbarzin
a249205d9d update home prometheus path 2021-09-05 18:39:18 +01:00
viktorbarzin
9ce05b739c update home address [CI SKIP] 2021-08-17 23:25:13 +01:00
viktorbarzin
3ba28a4594 remove kubectl manifests bc drone is not happy running them :/ 2021-05-08 14:03:34 +01:00
viktorbarzin
59199606fb move grafana behind client tls until auth is setup [ci skip] 2021-05-08 01:19:24 +01:00
viktorbarzin
0e9624e78e do not use pvcs for alertmanager and allow overlapping blocks in prometheus [ci skip] 2021-05-03 12:14:56 +01:00
viktorbarzin
6d5556ca75 add shlink and parts of dbaas [ci skip] 2021-04-17 19:19:04 +01:00
viktorbarzin
b94768f00b fix summary in alerts [ci skip] 2021-04-11 19:18:56 +01:00
viktorbarzin
543c647835 add more alerts for services being down 2021-04-10 18:28:26 +01:00
viktorbarzin
f92f4f685c add prometheus check for mailserver down 2021-04-10 18:16:04 +01:00
viktorbarzin
a596ad9792 add redfish exporter [CI SKIP] 2021-04-05 15:07:29 +01:00
viktorbarzin
e918c6fdfd add cronjob to monitor prometheus and init correct config for wireguard ui [CI SKIP] 2021-04-02 23:14:47 +01:00
viktorbarzin
0cfc001cdf add config for chatbot and add alert for high mem openwrt [CI SKIP] 2021-03-07 23:14:43 +00:00
viktorbarzin
5b742e5b70 website www host, dns ipv6 2021-02-25 21:55:00 +00:00
viktorbarzin
18fe4f89e1 add value to alert summaries 2021-02-19 18:58:36 +00:00
viktorbarzin
b0f4616689 add missing mailserver terraform items 2021-02-18 22:26:36 +00:00
viktorbarzin
3a37fc181d make tls crt and keys optional params to the create_tls_secret module 2021-02-17 19:36:30 +00:00
viktorbarzin
005f02d902 revert debug value for node load alert 2021-02-17 18:33:52 +00:00
viktorbarzin
7120a80696 fix typo from templating which caused missing metrics and add alerts to prevent that from happening again 2021-02-10 23:14:09 +00:00