[monitoring] Remove stale MySQL InnoDB Cluster alerts

MySQL migrated from InnoDB Cluster (Bitnami chart + mysql-operator) to a standalone StatefulSet on 2026-04-16. Two Prometheus alerts still referenced the old topology and were firing falsely against resources that no longer exist: - MySQLDown: queried kube_statefulset_status_replicas_ready{statefulset="mysql-cluster"} — that StatefulSet was deleted as part of Phase 1 of the migration. - MySQLOperatorDown: queried kube_deployment_status_replicas_available{namespace="mysql-operator"} — the operator Deployment was removed in Phase 1. Replacement availability monitoring for the standalone MySQL pod will be handled via an Uptime Kuma MySQL-connection monitor (out of scope for this change — no Prometheus replacement alert is being added, per the migration plan's "simpler is better" principle). MySQLBackupStale and MySQLBackupNeverSucceeded are retained — they query the mysql-backup CronJob which is unchanged by the migration. Also removes MySQLDown from the two inhibition rules (NodeDown and NFSServerUnresponsive) that previously suppressed it during cascade outages — the alert no longer exists so the reference became dead. Closes: code-3sa Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:03:58 +00:00 · 2026-04-18 10:03:58 +00:00 · 947f8ace54
commit 947f8ace54
parent 99688bbb02
1 changed files with 2 additions and 16 deletions
--- a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
+++ b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
@ -73,12 +73,12 @@ alertmanager:
      - source_matchers:
          - alertname = NodeDown
        target_matchers:
-          - alertname =~ "NodeNotReady|NodeConditionBad|PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|NodeLowFreeMemory|PostgreSQLDown|MySQLDown|RedisDown|HeadscaleDown|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|NodeExporterDown|DockerRegistryDown|HomeAssistantDown|CloudflaredDown|TechnitiumDNSDown|iDRACRedfishMetricsMissing|iDRACSNMPMetricsMissing|HomeAssistantMetricsMissing"
+          - alertname =~ "NodeNotReady|NodeConditionBad|PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|NodeLowFreeMemory|PostgreSQLDown|RedisDown|HeadscaleDown|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|NodeExporterDown|DockerRegistryDown|HomeAssistantDown|CloudflaredDown|TechnitiumDNSDown|iDRACRedfishMetricsMissing|iDRACSNMPMetricsMissing|HomeAssistantMetricsMissing"
      # NFS down causes mass pod failures and NFS-dependent service outages
      - source_matchers:
          - alertname = NFSServerUnresponsive
        target_matchers:
-          - alertname =~ "PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|PostgreSQLDown|MySQLDown|RedisDown|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|HomeAssistantDown"
+          - alertname =~ "PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|PostgreSQLDown|RedisDown|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|HomeAssistantDown"
      # Traefik down makes service-level alerts noise
      - source_matchers:
          - alertname = TraefikDown
@ -1340,13 +1340,6 @@ serverFiles:
              severity: critical
            annotations:
              summary: "PostgreSQL pod {{ $labels.pod }} is not ready"
-          - alert: MySQLDown
-            expr: kube_statefulset_status_replicas_ready{namespace="dbaas", statefulset="mysql-cluster"} < 1
-            for: 5m
-            labels:
-              severity: critical
-            annotations:
-              summary: "MySQL InnoDB Cluster has no ready replicas"
          - alert: RedisDown
            expr: kube_statefulset_status_replicas_ready{namespace="redis", statefulset="redis-node"} < 1
            for: 5m
@ -1391,13 +1384,6 @@ serverFiles:
              severity: warning
            annotations:
              summary: "CNPG operator down — PostgreSQL failover/management degraded"
-          - alert: MySQLOperatorDown
-            expr: (kube_deployment_status_replicas_available{namespace="mysql-operator", deployment="mysql-operator"} or on() vector(0)) < 1
-            for: 10m
-            labels:
-              severity: warning
-            annotations:
-              summary: "MySQL operator down — InnoDB Cluster management degraded"
      - name: Cluster
        rules:
          - alert: NodeDown