infra

Viktor Barzin 17065304dc Fix NFSServerUnresponsive false positives		2026-03-14 11:28:17 +00:00

Root cause: sum(rate(node_nfs_requests_total[5m])) == 0 was too fragile:
- rate() returns nothing after Prometheus restarts (needs 2 scrapes)
- Individual nodes show zero NFS rate during scrape gaps or low activity
- The sum() could hit zero during quiet hours + scrape gaps

New expression uses:
- changes() instead of rate(): works with a single scrape
- Per-instance aggregation: count nodes with any NFS counter change
- Threshold < 2 nodes: single-node restarts won't trigger, real NFS outage (all nodes affected) will
- Prometheus startup guard: skip first 15m after restart to avoid false positives from empty TSDB
- Wider 15m changes() window to smooth out scrape gaps
authentik	Right-size CPU requests cluster-wide and remove missed CPU limits	2026-03-14 09:22:24 +00:00
cloudflared	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
cnpg	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
crowdsec	Right-size CPU requests cluster-wide and remove missed CPU limits	2026-03-14 09:22:24 +00:00
dbaas	Right-size CPU requests cluster-wide and remove missed CPU limits	2026-03-14 09:22:24 +00:00
headscale	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
infra-maintenance	[ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup	2026-03-06 19:54:21 +00:00
iscsi-csi	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
k8s-portal	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
kyverno	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
mailserver	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
metallb	[ci skip] Move Terraform modules into stack directories	2026-02-22 14:38:14 +00:00
metrics-server	[ci skip] Move Terraform modules into stack directories	2026-02-22 14:38:14 +00:00
monitoring	Fix NFSServerUnresponsive false positives	2026-03-14 11:28:17 +00:00
nfs-csi	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
nvidia	Right-size CPU requests cluster-wide and remove missed CPU limits	2026-03-14 09:22:24 +00:00
rbac	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
redis	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
reverse_proxy	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
sealed-secrets	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
technitium	Right-size CPU requests cluster-wide and remove missed CPU limits	2026-03-14 09:22:24 +00:00
traefik	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
uptime-kuma	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
vaultwarden	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
vpa	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
wireguard	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
xray	Remove all CPU limits cluster-wide to eliminate CFS throttling	2026-03-14 08:51:45 +00:00
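
The alert logic described in the latest commit message (changes() per node, a threshold of fewer than 2 active nodes, and a 15m startup guard) can be sketched in Python as a rough model. This is not the actual PromQL rule, which is not shown in this listing; the function names, the per-node sample layout, and the simplified `rate` (no counter-reset handling or extrapolation) are all assumptions made for illustration.

```python
from typing import Mapping, Optional, Sequence

GUARD_SECONDS = 15 * 60   # startup guard from the commit message (15m)
MIN_ACTIVE_NODES = 2      # alert when fewer than 2 nodes show NFS activity


def rate(samples: Sequence[float], window_s: float) -> Optional[float]:
    """Simplified stand-in for rate(): no result with fewer than 2 samples."""
    if len(samples) < 2:
        return None
    return (samples[-1] - samples[0]) / window_s


def changes(samples: Sequence[float]) -> Optional[int]:
    """Stand-in for changes(): counts value changes; a single sample yields 0."""
    if not samples:
        return None
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def active_nodes(per_node: Mapping[str, Sequence[float]]) -> int:
    """Count nodes whose NFS request counter changed within the window."""
    return sum(1 for s in per_node.values() if (changes(s) or 0) > 0)


def nfs_unresponsive(per_node: Mapping[str, Sequence[float]],
                     prometheus_uptime_s: float) -> bool:
    """Hypothetical alert condition: fires only after the startup guard has
    passed and fewer than MIN_ACTIVE_NODES nodes show any counter change."""
    if prometheus_uptime_s < GUARD_SECONDS:
        return False
    return active_nodes(per_node) < MIN_ACTIVE_NODES


# Hypothetical counter samples per node over a 15m window.
single_node_restart = {"node1": [10], "node2": [1, 2], "node3": [3, 4]}
full_outage = {"node1": [4, 4, 4], "node2": [9, 9], "node3": [2, 2, 2]}
```

In this model a node with only one scrape still contributes a result from `changes`, whereas `rate` yields nothing for it, which is the fragility the commit describes. The per-node count is what keeps a single restarted node from triggering the alert while a full outage still does.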