infra/stacks/monitoring/modules/monitoring
Viktor Barzin 9d5da4d8e0 fix: restore pvc-autoresizer by allow-listing kubelet_volume_stats_available_bytes
The Prometheus scrape config for the kubernetes-nodes job kept
capacity_bytes + used_bytes but dropped available_bytes. pvc-autoresizer
computes utilization from available/capacity, so without that metric it
was silent for every PVC in the cluster — including mailserver, which
filled to 89% (1.7G/2.0G) and started rejecting all inbound mail with
'452 4.3.1 Insufficient system storage' (15+ hours, all real senders:
Brevo, Gmail, Facebook).

Also bumps the floors of mailserver (2Gi -> 5Gi, limit 10Gi) and forgejo
(15Gi -> 30Gi) PVCs to recover from the immediate outage, and adds
ignore_changes on requests.storage so future autoresizer expansions
don't cause TF drift.
2026-05-09 10:49:17 +00:00
..
dashboards monitoring(wealth): monthly contrib-vs-mkt as line chart, not bars 2026-05-07 22:41:46 +00:00
server-power-cycle Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
alloy.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
Dockerfile extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
goflow2.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
grafana.tf openclaw: realtime usage dashboard via Prometheus exporter sidecar 2026-05-07 09:04:25 +00:00
grafana_chart_values.yaml [monitoring] UK Payslip v3.2 — stacked YTD panels, YTD-cumulative rate, Sankey 2026-04-19 13:42:27 +00:00
idrac.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
k8s-monitoring-values.yaml cleanup: remove calibre and audiobookshelf stacks after ebooks migration [ci skip] 2026-03-25 23:56:07 +02:00
loki.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
loki.yaml [infra] TrueNAS decommission — remove active references from Terraform + configs 2026-04-19 16:57:05 +00:00
main.tf [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 18:30:02 +00:00
prometheus.tf [monitoring] Tuya Cloud root-cause alert + cascade suppression 2026-04-23 09:59:48 +00:00
prometheus_chart_values.tpl fix: restore pvc-autoresizer by allow-listing kubelet_volume_stats_available_bytes 2026-05-09 10:49:17 +00:00
prometheus_snmp_chart_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
pve_exporter.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
snmp_exporter.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
ups_snmp_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00