infra/stacks/monitoring/modules/monitoring
Viktor Barzin bc33cd5ac4 monitoring: NodeFilesystemFull 90%->95% + Synology storage runbook
The Synology offsite backup target (/mnt/synology-backup, surfaced via
the PVE host NFS mount) sits at ~94% full by design, so
NodeFilesystemFull was firing continuously. Per user request, raise the
threshold from 90% to 95% used (<5% free). NOTE: NodeFilesystemFull is a
global node-filesystem rule, so this also loosens the warning on k8s
node/system disks; BackupDiskFull (sda, /mnt/backup) stays at 85%.
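
For illustration only, a 95%-used threshold could be expressed as a Prometheus alerting rule roughly like the sketch below. This is not the rule from prometheus_chart_values.tpl, which is not shown here. The group name, the fstype filter, the `for` duration and the severity label are all assumptions; only the alert name and the 5%-free threshold come from the commit message. The metrics are the standard node_exporter filesystem gauges.

```yaml
groups:
  - name: node-filesystem            # hypothetical group name
    rules:
      - alert: NodeFilesystemFull
        # Fires when less than 5% of a filesystem is available,
        # i.e. usage is above 95%. The fstype filter is an assumption.
        expr: |
          (
            node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
          ) * 100 < 5
        for: 15m                     # assumed hold time
        labels:
          severity: warning          # assumed severity
```

Because the expression has no mountpoint selector, it matches every filesystem node_exporter reports, which is why the change affects node and system disks as well as the Synology mount.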

Also adds docs/runbooks/synology-storage.md: how to assess Synology
usage WITHOUT du (Storage Analyzer weekly CSVs, df/btrfs/qgroup),
btrfs async/snapshot-pinned reclaim, the 2026-06-05 capacity assessment
(94% full; Backup share 4.42TiB), and ~500GiB of homelab cleanup
candidates (redundant gphotos Takeout, old laptop VM images, archives).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 18:18:31 +00:00
dashboards monitoring(grafana): add professional "Cluster Logs" dashboard (Logs folder) 2026-06-05 17:03:45 +00:00
server-power-cycle Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
alloy.yaml monitoring(alloy): drop goflow2 + vpa logs from Loki to cut sdc write wear 2026-06-05 17:44:47 +00:00
authentik_walloff_probe.tf Reapply "tripit: Gmail ingest (12-month) + vbarzin owner + plans@ forward-to-parse" 2026-06-03 10:24:25 +00:00
Dockerfile extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
goflow2.tf monitoring: KEEL/tier ignore_changes on 5 exporters [ci skip] 2026-05-31 15:33:30 +00:00
grafana.tf monitoring(grafana): add professional "Cluster Logs" dashboard (Logs folder) 2026-06-05 17:03:45 +00:00
grafana_chart_values.yaml monitoring: protect grafana ingress with authentik + disable anonymous 2026-05-10 17:01:50 +00:00
idrac.tf monitoring: migrate R730 iDRAC scraping to SNMP (fast primary) + thin Redfish remnant 2026-06-05 16:33:20 +00:00
k8s-monitoring-values.yaml cleanup: remove calibre and audiobookshelf stacks after ebooks migration [ci skip] 2026-03-25 23:56:07 +02:00
loki.tf monitoring: KEEL/tier ignore_changes on 5 exporters [ci skip] 2026-05-31 15:33:30 +00:00
loki.yaml monitoring: right-size loki memory request 3Gi->1Gi (quota 89%->79%) 2026-06-05 09:19:11 +00:00
loki_ingress.tf monitoring: fix ingress auth-comment guard for loki-write-ingress 2026-06-05 13:36:43 +00:00
main.tf cluster-health: emergency-stop Keel + roll back image downgrades + quota raises 2026-05-26 18:48:50 +00:00
prometheus.tf monitoring: add local-only prometheus-query.lan ingress for ha-sofia SNMP sensors 2026-06-05 17:25:06 +00:00
prometheus_chart_values.tpl monitoring: NodeFilesystemFull 90%->95% + Synology storage runbook 2026-06-05 18:18:31 +00:00
prometheus_snmp_chart_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
pve_exporter.tf monitoring: KEEL/tier ignore_changes on 5 exporters [ci skip] 2026-05-31 15:33:30 +00:00
snmp_exporter.tf monitoring: KEEL/tier ignore_changes on 5 exporters [ci skip] 2026-05-31 15:33:30 +00:00
ups_snmp_values.yaml monitoring: migrate R730 iDRAC scraping to SNMP (fast primary) + thin Redfish remnant 2026-06-05 16:33:20 +00:00