infra/stacks/monitoring
Viktor Barzin 91242b0b40 feat(monitoring): add comprehensive hardware exporter alerts
Added 21 new alerts across 3 rule groups:

Power (8 new):
- UPSAlarmsActive, UPSBatteryDegraded, UPSOverloaded, UPSOutputVoltageAbnormal
- ATSFault, ATSPowerFault, ATSOverload, ATSInputVoltageAbnormal

Server Health (10 new):
- iDRACSystemUnhealthy, iDRACPowerSupplyUnhealthy, iDRACMemoryUnhealthy
- iDRACStorageDriveUnhealthy, iDRACSSDWearCritical/Warning
- iDRACServerPoweredOff, ProxmoxExporterDown
- FuseMainFault, FuseGarageFault

Metric Staleness (3 new):
- FuseMainMetricsMissing, FuseGarageMetricsMissing, ProxmoxMetricsMissing

Plus 4 new inhibition rules for alert cascade protection.
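
For readers unfamiliar with cascade protection, here is a rough sketch of what one Alertmanager inhibition rule of this kind might look like. The commit message does not list the four rules, so the source/target pairing and the `instance` label below are assumptions chosen for illustration, not the rules actually added.

```yaml
# Alertmanager configuration fragment (illustrative only).
# ASSUMPTION: this pairing is hypothetical; the commit does not say which
# alerts inhibit which.
inhibit_rules:
  - source_matchers:
      # While this alert is firing...
      - 'alertname = "iDRACServerPoweredOff"'
    target_matchers:
      # ...suppress notifications for these downstream health alerts.
      - 'alertname =~ "iDRAC(System|PowerSupply|Memory|StorageDrive)Unhealthy"'
    # Only inhibit when source and target share the same value for this label.
    # ASSUMPTION: the exporter attaches an "instance" label to both alerts.
    equal: ["instance"]
```

The idea is that a powered-off server would otherwise also trip every per-component health alert for the same host; inhibiting those leaves a single notification that points at the root cause.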
2026-04-06 15:31:50 +03:00
modules/monitoring feat(monitoring): add comprehensive hardware exporter alerts 2026-04-06 15:31:50 +03:00
main.tf add TrueNAS Cloud Sync monitor CronJob and bump Prometheus Helm timeout 2026-03-23 02:24:39 +02:00
secrets extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
terragrunt.hcl extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
tiers.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00