infra/stacks/monitoring/modules/monitoring
Viktor Barzin cd13b9d062 monitoring: drop PVAutoExpanding alert — info-only noise, not actionable
PVAutoExpanding fired at >80% used (info severity), but pvc-autoresizer's
threshold is 10% free (= 90% used) — the alert always fired ~10 points
before any action would have been taken, and there was nothing for an
operator to do during that window either. It was a "heads up" that
didn't surface a problem.

Real failure modes are already covered:
  * PVFillingUp (critical, >95% for 10m) — autoresizer didn't keep up
  * PVPredictedFull (warning, predict_linear 24h) — trend toward exhaustion

Sharpened PVFillingUp's annotation to spell out the likely causes
(storage_limit reached, expansion failing, or missing autoresizer
annotations) so the responder doesn't have to recall the runbook.
2026-05-22 14:16:44 +00:00
..
dashboards monitoring(wealth): paint declining segments red on portfolio chart 2026-05-22 14:16:44 +00:00
server-power-cycle Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
alloy.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
Dockerfile extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
goflow2.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
grafana.tf monitoring(grafana): swap python3 for jq in folder-ACL local-exec 2026-05-22 14:16:41 +00:00
grafana_chart_values.yaml monitoring: protect grafana ingress with authentik + disable anonymous 2026-05-22 14:16:41 +00:00
idrac.tf infra: document auth = "app|none" tier on every legacy ingress 2026-05-22 14:16:44 +00:00
k8s-monitoring-values.yaml cleanup: remove calibre and audiobookshelf stacks after ebooks migration [ci skip] 2026-03-25 23:56:07 +02:00
loki.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
loki.yaml [infra] TrueNAS decommission — remove active references from Terraform + configs 2026-04-19 16:57:05 +00:00
main.tf [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
prometheus.tf fix: HA Sofia REST sensors + PVC drift safety 2026-05-22 14:16:43 +00:00
prometheus_chart_values.tpl monitoring: drop PVAutoExpanding alert — info-only noise, not actionable 2026-05-22 14:16:44 +00:00
prometheus_snmp_chart_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
pve_exporter.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
snmp_exporter.tf infra: document auth = "app|none" tier on every legacy ingress 2026-05-22 14:16:44 +00:00
ups_snmp_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00