infra/stacks/monitoring/modules/monitoring
Viktor Barzin fc5a4b66ad monitoring: exclude catchall-error-pages from HighService4xxRate
The catchall-error-pages IngressRoute matches HostRegexp(^(.+\.)?
viktorbarzin\.me$) at priority=1 — it's the wildcard handler that
returns 404 for any unmatched hostname (typos + scanner traffic).
By design its 4xx rate sits at ~100%, so HighService4xxRate was a
permanent false positive for traefik-catchall-error-pages-*@kubernetescrd.

Same exclusion pattern as nextcloud/grafana/linkwarden/claude-memory
(services with legitimately high 4xx counts).
2026-05-27 19:46:40 +00:00
..
dashboards fire-planner: COL refresh CronJob + Grafana Cost-of-Living dashboard 2026-05-22 14:15:38 +00:00
server-power-cycle Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
alloy.yaml alloy: move resources to alloy.* (chart key bug); 1Gi limit fixes IO storm 2026-05-26 02:08:35 +00:00
Dockerfile extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
goflow2.tf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
grafana.tf fire-planner: COL refresh CronJob + Grafana Cost-of-Living dashboard 2026-05-22 14:15:38 +00:00
grafana_chart_values.yaml monitoring: protect grafana ingress with authentik + disable anonymous 2026-05-10 17:01:50 +00:00
idrac.tf infra: document auth = "app|none" tier on every legacy ingress 2026-05-11 19:25:48 +00:00
k8s-monitoring-values.yaml cleanup: remove calibre and audiobookshelf stacks after ebooks migration [ci skip] 2026-03-25 23:56:07 +02:00
loki.tf alloy: move resources to alloy.* (chart key bug); 1Gi limit fixes IO storm 2026-05-26 02:08:35 +00:00
loki.yaml monitoring/loki: bump memory request 2Gi → 3Gi (close gap to 4Gi limit) 2026-05-24 01:10:55 +00:00
main.tf cluster-health: emergency-stop Keel + roll back image downgrades + quota raises 2026-05-26 18:48:50 +00:00
prometheus.tf fix: HA Sofia REST sensors + PVC drift safety 2026-05-10 21:48:29 +00:00
prometheus_chart_values.tpl monitoring: exclude catchall-error-pages from HighService4xxRate 2026-05-27 19:46:40 +00:00
prometheus_snmp_chart_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
pve_exporter.tf monitoring: route proxmox-exporter to scrape_slow job (fix flapping alerts) 2026-05-27 18:36:11 +00:00
snmp_exporter.tf infra: document auth = "app|none" tier on every legacy ingress 2026-05-11 19:25:48 +00:00
ups_snmp_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00