infra/stacks/monitoring/modules/monitoring
Viktor Barzin 375a3d91d5 [monitoring] Exclude websocket protocol from HighServiceLatency alert
Traefik records websocket connection lifetimes (minutes to hours) as
"request duration." When websockets close, the full lifetime pollutes
the average latency metric — Authentik showed 6.7s avg (201s websocket
avg) vs 0.065s actual HTTP avg. This caused ~90 false alerts/day across
12 services (Authentik, Vaultwarden, Terminal, HA, etc.).

Changes:
- Add protocol!="websocket" filter to HighServiceLatency alert expr
- Raise minimum traffic threshold from 0.01 to 0.05 rps to filter
  statistical noise from services with <3 req/min
- Remove .githooks/pre-commit file-size hook (blocked state commits)

Validated against 7-day historical data: 637 breaches → ~2 with both
filters applied (99.7% reduction).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:51:19 +00:00
..
dashboards monitoring + proxmox-csi: LVM snapshot RBAC, pushgateway NodePort, backup dashboard 2026-04-06 11:57:41 +03:00
server-power-cycle extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
alloy.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
Dockerfile extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
goflow2.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
grafana.tf truenas deprecation: migrate all non-immich storage to proxmox NFS 2026-04-12 14:35:39 +01:00
grafana_chart_values.yaml feat: organize Grafana dashboards into folders 2026-03-28 16:23:49 +02:00
idrac.tf fix(monitoring): use patched idrac exporter with PSU input voltage metric 2026-03-23 22:07:36 +02:00
k8s-monitoring-values.yaml cleanup: remove calibre and audiobookshelf stacks after ebooks migration [ci skip] 2026-03-25 23:56:07 +02:00
loki.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
loki.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
main.tf deprecate TrueNAS: migrate Immich NFS to Proxmox, remove all 10.0.10.15 references [ci skip] 2026-04-13 14:42:07 +00:00
prometheus.tf fix(post-mortem): add NFSHighRPCRetransmissions alert + migrate alertmanager to proxmox-lvm-encrypted [PM-2026-04-14] 2026-04-14 18:05:33 +00:00
prometheus_chart_values.tpl [monitoring] Exclude websocket protocol from HighServiceLatency alert 2026-04-15 21:51:19 +00:00
prometheus_snmp_chart_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
pve_exporter.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
snmp_exporter.tf security(monitoring): remove public SNMP exporter ingress 2026-04-06 15:23:56 +03:00
ups_snmp_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00