infra/stacks/monitoring/modules/monitoring
Viktor Barzin 947f8ace54 [monitoring] Remove stale MySQL InnoDB Cluster alerts
MySQL migrated from InnoDB Cluster (Bitnami chart + mysql-operator) to
a standalone StatefulSet on 2026-04-16. Two Prometheus alerts still
referenced the old topology and were firing falsely against resources
that no longer exist:

- MySQLDown: queried kube_statefulset_status_replicas_ready{statefulset="mysql-cluster"}
  — that StatefulSet was deleted as part of Phase 1 of the migration.
- MySQLOperatorDown: queried kube_deployment_status_replicas_available{namespace="mysql-operator"}
  — the operator Deployment was removed in Phase 1.

Replacement availability monitoring for the standalone MySQL pod will
be handled via an Uptime Kuma MySQL-connection monitor (out of scope
for this change — no Prometheus replacement alert is being added, per
the migration plan's "simpler is better" principle).

MySQLBackupStale and MySQLBackupNeverSucceeded are retained — they
query the mysql-backup CronJob which is unchanged by the migration.

Also removes MySQLDown from the two inhibition rules (NodeDown and
NFSServerUnresponsive) that previously suppressed it during cascade
outages — the alert no longer exists so the reference became dead.

Closes: code-3sa

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:03:58 +00:00
..
dashboards [infra] Fix rewrite-body plugin + cleanup TrueNAS + version bumps 2026-04-17 05:51:52 +00:00
server-power-cycle Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
alloy.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
Dockerfile extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
goflow2.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
grafana.tf truenas deprecation: migrate all non-immich storage to proxmox NFS 2026-04-12 14:35:39 +01:00
grafana_chart_values.yaml feat: organize Grafana dashboards into folders 2026-03-28 16:23:49 +02:00
idrac.tf fix(monitoring): use patched idrac exporter with PSU input voltage metric 2026-03-23 22:07:36 +02:00
k8s-monitoring-values.yaml cleanup: remove calibre and audiobookshelf stacks after ebooks migration [ci skip] 2026-03-25 23:56:07 +02:00
loki.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
loki.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
main.tf [infra] Fix rewrite-body plugin + cleanup TrueNAS + version bumps 2026-04-17 05:51:52 +00:00
prometheus.tf fix(post-mortem): add NFSHighRPCRetransmissions alert + migrate alertmanager to proxmox-lvm-encrypted [PM-2026-04-14] 2026-04-14 18:05:33 +00:00
prometheus_chart_values.tpl [monitoring] Remove stale MySQL InnoDB Cluster alerts 2026-04-18 10:03:58 +00:00
prometheus_snmp_chart_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
pve_exporter.tf extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
snmp_exporter.tf security(monitoring): remove public SNMP exporter ingress 2026-04-06 15:23:56 +03:00
ups_snmp_values.yaml extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00