Commit graph

33 commits

Author SHA1 Message Date
viktorbarzin
c87376b670
remove 1gb limit for tsdb to confirm it was the root cause for memory issues [ci skip] 2023-04-21 23:04:39 +01:00
viktorbarzin
64491f9028
attempt to reduce prometheus memory by setting --storage.tsdb.retention.size; also add metrics-api which is not working atm [ci skip] 2023-04-17 01:28:03 +01:00
viktorbarzin
a75e647b48
add alert for unhandled exceptions [ci skip] 2023-04-05 00:15:10 +01:00
viktorbarzin
9fe57e2ec6
reword finance app webhook exception alerting [ci skip] 2023-04-03 23:35:33 +01:00
viktorbarzin
6e7de7e195
add prometheus pv and pvc [ci skip] 2023-04-03 23:21:48 +01:00
viktorbarzin
6838d319a2
add counter for overall webhook failures 2023-04-03 22:37:59 +01:00
viktorbarzin
e9bf46caf8
add alert to monitor free memory on nodes [ci skip] 2023-03-26 17:22:04 +01:00
viktorbarzin
cbdd5d57ce
update idrac dns record to fix resolving issue [ci skip] 2023-02-15 21:24:17 +00:00
viktorbarzin
f8900ccf92
add openwrt to monitored guests 2023-01-23 21:38:46 +00:00
viktorbarzin
695d002eef
update tf to work with k8s 1.25.0 2022-08-31 22:04:09 +01:00
viktorbarzin
6c425b5573
reduce high power usage alert sensitivity 2022-01-11 20:24:14 +00:00
viktorbarzin
0b20fc1e73
add slack to notifications and update alert definitions after upgrade [ci skip] 2022-01-06 20:09:20 +00:00
viktorbarzin
6870cee492
fix k8s upgrade issues [ci skip] 2022-01-06 00:07:48 +00:00
viktorbarzin
b6f8160183
update home prometheus path 2021-09-05 18:39:18 +01:00
viktorbarzin
755942ee73
update home address [CI SKIP] 2021-08-17 23:25:13 +01:00
viktorbarzin
c722825630
update home dns [ci skip] 2021-05-23 19:26:52 +01:00
viktorbarzin
9292b3285f
remove kubectl manifests bc drone is not happy running them :/ 2021-05-08 14:03:34 +01:00
viktorbarzin
ead58dfc99
move grafana behind client tls until auth is setup [ci skip] 2021-05-08 01:19:24 +01:00
viktorbarzin
ae5cb2349b
do not use pvcs for alertmanager and allow overlapping blocks in prometheus [ci skip] 2021-05-03 12:14:56 +01:00
viktorbarzin
eca78feb51
add shlink and parts of dbaas [ci skip] 2021-04-17 19:19:04 +01:00
viktorbarzin
28cb35da94
fix summary in alerts [ci skip] 2021-04-11 19:18:56 +01:00
viktorbarzin
b9c9d82a03
add more alerts for services being down 2021-04-10 18:28:26 +01:00
viktorbarzin
0c77660e93
add prometheus check for mailserver down 2021-04-10 18:16:04 +01:00
viktorbarzin
12e46fad2a
add redfish exporter [CI SKIP] 2021-04-05 15:07:29 +01:00
viktorbarzin
cccc49378e
add cronjob to monitor prometheus and init correct config for wireguard ui [CI SKIP] 2021-04-02 23:14:47 +01:00
viktorbarzin
81e3f47f18
add config for chatbot and add alert for high mem openwrt [CI SKIP] 2021-03-07 23:14:43 +00:00
viktorbarzin
f95fcc26c3
website www host, dns ipv6 2021-02-25 21:55:00 +00:00
viktorbarzin
2b6de2113f
add value to alert summaries 2021-02-19 18:58:36 +00:00
viktorbarzin
2673c16d98
add missing mailserver terraform items 2021-02-18 22:26:36 +00:00
viktorbarzin
40faa5dc0e
make tls crt and keys optional params to the create_tls_secret module 2021-02-17 19:36:30 +00:00
viktorbarzin
8a9fff6799
revert debug value for node load alert 2021-02-17 18:33:52 +00:00
viktorbarzin
95750c6949
fix typo from templating which caused missing metrics and add alerts to prevent that from happening again 2021-02-10 23:14:09 +00:00
viktorbarzin
4caa987213
initial 2021-02-08 20:02:17 +00:00