[uptime-kuma] Codify MySQL monitor (id=663) via idempotent sync CronJob
## Context

Monitor id 663 "MySQL Standalone (dbaas)" was created manually yesterday via the `uptime-kuma-api` Python library when the dbaas stack migrated from InnoDB Cluster to standalone MySQL. It worked and was UP, but lived only in Uptime Kuma's MariaDB: if UK's DB were wiped or restored from an older backup, the monitor would be lost.

## This change

Adds declarative, self-healing management for internal-service monitors (databases, non-HTTP endpoints) that can't be discovered from ingress annotations. Modelled on the existing `external-monitor-sync` CronJob.

- `local.internal_monitors`: list of desired monitors (name, type, connection string, Vault password key, interval, retries). Seeded with the MySQL Standalone monitor. Add new entries here to manage more.
- `kubernetes_secret.internal_monitor_sync`: pulls admin password and all referenced DB passwords from Vault `secret/viktor` at apply time. Secret key names are derived from monitor name (`DB_PASSWORD_<upper_snake>`).
- `kubernetes_config_map_v1.internal_monitor_targets`: renders the target list to JSON for the sync container.
- `kubernetes_cron_job_v1.internal_monitor_sync`: runs every 10 min, looks up monitors by name, creates if missing, patches if drifted, leaves id and history untouched when already in desired state.

## Why this approach (Option B, not a Terraform provider)

The `louislam/uptime-kuma` Terraform provider does NOT exist in the public registry (verified: only a CLI tool of the same name). Option A from the task brief was therefore unavailable. Option B (idempotent K8s CronJob) matches the established pattern in the same module for `external-monitor-sync`, so no new machinery is introduced.

## Monitor 663: no-op on first sync

Manual import was not possible (no provider → no state to import). The sync job correctly identifies the existing monitor by name and reports:

    Monitor MySQL Standalone (dbaas) (id=663) already in desired state
    Internal monitor sync complete

DB heartbeats confirm monitor 663 stayed UP throughout with `status=1` and `Rows: 1` responses every 60s, with no disruption.

## Vault key: left manual (by design)

`secret/viktor` is not Terraform-managed anywhere in the repo (only read via `data "vault_kv_secret_v2"`). It is a user-edited Vault entry holding 135 keys. The `uptimekuma_db_password` key was added manually yesterday; this change does NOT codify it. Codifying the whole `secret/viktor` entry is out of scope for this task (would need a separate migration + rotation story). The sync job reads the existing value at apply time, so if the value is ever rotated in Vault, the next sync picks it up.

## Plan + apply

    Plan: 3 to add, 0 to change, 0 to destroy.
    Apply complete! Resources: 3 added, 0 changed, 0 destroyed.
    Re-plan: No changes. Your infrastructure matches the configuration.

Also updated `.claude/skills/uptime-kuma/SKILL.md` with the new pattern.

Closes: code-ed2
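
The `DB_PASSWORD_<upper_snake>` naming described under "This change" can be reproduced outside Terraform, for example to predict which environment variable the sync container will read for a new monitor. The following is a minimal Python sketch of that derivation. `password_env_name` is a hypothetical helper, not part of the module; it assumes the same rule as the HCL expression `upper(replace(m.name, "/[^A-Za-z0-9]/", "_"))` used in the diff.

```python
import re


def password_env_name(monitor_name: str) -> str:
    """Hypothetical helper mirroring the HCL key derivation in the module."""
    # Every character outside [A-Za-z0-9] becomes "_", then the result is uppercased.
    return "DB_PASSWORD_" + re.sub(r"[^A-Za-z0-9]", "_", monitor_name).upper()


print(password_env_name("MySQL Standalone (dbaas)"))
# prints DB_PASSWORD_MYSQL_STANDALONE__DBAAS_
```

Because every non-alphanumeric character collapses to `_`, two monitor names that differ only in punctuation (say `Redis (prod)` and `Redis [prod]`) would map to the same key, so names in `local.internal_monitors` are best kept distinct in their alphanumeric parts.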
parent d3bdf87676
commit 50e8184d99
2 changed files with 206 additions and 0 deletions
@@ -155,3 +155,19 @@ Common port is 80. Exceptions:
 3. Add `time.sleep(0.3)` between bulk operations to avoid overloading
 4. Homepage dashboard widget slug: `cluster-internal`
 5. Cloudflare-proxied at `uptime.viktorbarzin.me`
+
+## Terraform-Managed Monitors
+
+There is NO `louislam/uptime-kuma` Terraform provider. Two patterns exist for
+declarative monitor management in this stack:
+
+- **External HTTPS monitors** — auto-discovered from ingress annotations by the
+  `external-monitor-sync` CronJob (`*/10 * * * *`). Opt-out via
+  `uptime.viktorbarzin.me/external-monitor: "false"` on the ingress.
+- **Internal monitors (DBs, non-HTTP)** — declared in the
+  `local.internal_monitors` list in `stacks/uptime-kuma/modules/uptime-kuma/main.tf`
+  and synced by the `internal-monitor-sync` CronJob. To add one, append to the
+  list (provide `name`, `type`, `database_connection_string`,
+  `database_password_vault_key`, `interval`, `retry_interval`, `max_retries`)
+  and `scripts/tg apply`. The sync is idempotent — looks up by name, creates
+  if missing, patches if drifted. Existing monitors keep their id and history.
@@ -520,3 +520,193 @@ PYEOF
     ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
   }
 }
+
+# =============================================================================
+# Internal Monitor Sync
+# Declaratively manages monitors for internal services (databases, non-HTTP
+# endpoints) that can't be discovered from ingress annotations. Idempotent:
+# looks up monitors by name, creates if missing, patches if drifted.
+#
+# Why a CronJob and not a one-shot Job:
+# - louislam/uptime-kuma has no Terraform provider (only a CLI tool).
+# - UK v2 stores monitors in MariaDB (`uptimekuma` on mysql.dbaas); if the DB
+#   is wiped/restored we must re-create them.
+# - CronJob self-heals drift (manual UI edits, UK restarts, DB restores).
+#
+# Managed monitors (name -> desired spec) are defined in local.internal_monitors
+# below. Add new internal-service monitors there.
+# =============================================================================
+
+locals {
+  internal_monitors = [
+    {
+      name = "MySQL Standalone (dbaas)"
+      type = "mysql"
+      database_connection_string = "mysql://uptimekuma@mysql.dbaas.svc.cluster.local:3306"
+      database_password_vault_key = "uptimekuma_db_password"
+      interval = 60
+      retry_interval = 60
+      max_retries = 2
+    },
+  ]
+}
+
+resource "kubernetes_secret" "internal_monitor_sync" {
+  metadata {
+    name = "internal-monitor-sync"
+    namespace = kubernetes_namespace.uptime-kuma.metadata[0].name
+  }
+  data = merge(
+    { UPTIME_KUMA_PASSWORD = data.vault_kv_secret_v2.viktor.data["uptime_kuma_admin_password"] },
+    {
+      for m in local.internal_monitors :
+      "DB_PASSWORD_${upper(replace(m.name, "/[^A-Za-z0-9]/", "_"))}" =>
+      data.vault_kv_secret_v2.viktor.data[m.database_password_vault_key]
+    },
+  )
+}
+
+resource "kubernetes_config_map_v1" "internal_monitor_targets" {
+  metadata {
+    name = "internal-monitor-targets"
+    namespace = kubernetes_namespace.uptime-kuma.metadata[0].name
+  }
+  data = {
+    "targets.json" = jsonencode([
+      for m in local.internal_monitors : {
+        name = m.name
+        type = m.type
+        database_connection_string = m.database_connection_string
+        password_env = "DB_PASSWORD_${upper(replace(m.name, "/[^A-Za-z0-9]/", "_"))}"
+        interval = m.interval
+        retry_interval = m.retry_interval
+        max_retries = m.max_retries
+      }
+    ])
+  }
+}
+
+resource "kubernetes_cron_job_v1" "internal_monitor_sync" {
+  metadata {
+    name = "internal-monitor-sync"
+    namespace = kubernetes_namespace.uptime-kuma.metadata[0].name
+  }
+  spec {
+    concurrency_policy = "Forbid"
+    failed_jobs_history_limit = 3
+    successful_jobs_history_limit = 3
+    schedule = "*/10 * * * *"
+    job_template {
+      metadata {}
+      spec {
+        backoff_limit = 1
+        ttl_seconds_after_finished = 300
+        template {
+          metadata {}
+          spec {
+            container {
+              name = "sync"
+              image = "docker.io/library/python:3.12-alpine"
+              command = ["/bin/sh", "-c", <<-EOT
+                pip install --quiet --disable-pip-version-check uptime-kuma-api
+                python3 << 'PYEOF'
+                import json, os, time
+                from uptime_kuma_api import UptimeKumaApi, MonitorType
+
+                UPTIME_KUMA_URL = "http://uptime-kuma.uptime-kuma.svc.cluster.local"
+                UPTIME_KUMA_PASS = os.environ["UPTIME_KUMA_PASSWORD"]
+
+                with open("/config/targets.json") as f:
+                    targets = json.load(f)
+
+                api = UptimeKumaApi(UPTIME_KUMA_URL, timeout=120, wait_events=0.2)
+                api.login("admin", UPTIME_KUMA_PASS)
+
+                existing = {m["name"]: m for m in api.get_monitors()}
+
+                for t in targets:
+                    name = t["name"]
+                    password = os.environ[t["password_env"]]
+                    # MYSQL monitors use `databaseConnectionString` + `radiusPassword`
+                    # (UK v2 re-uses the radiusPassword field for mysql auth — backwards compat).
+                    desired = {
+                        "type": MonitorType(t["type"]),
+                        "name": name,
+                        "databaseConnectionString": t["database_connection_string"],
+                        "radiusPassword": password,
+                        "interval": t["interval"],
+                        "retryInterval": t["retry_interval"],
+                        "maxretries": t["max_retries"],
+                    }
+                    if name not in existing:
+                        print(f"Creating monitor: {name}")
+                        api.add_monitor(**desired)
+                        continue
+                    m = existing[name]
+                    drifted = (
+                        m.get("databaseConnectionString") != desired["databaseConnectionString"]
+                        or m.get("radiusPassword") != desired["radiusPassword"]
+                        or m.get("interval") != desired["interval"]
+                        or m.get("retryInterval") != desired["retryInterval"]
+                        or m.get("maxretries") != desired["maxretries"]
+                    )
+                    if drifted:
+                        print(f"Updating monitor {name} (id={m['id']})")
+                        api.edit_monitor(
+                            m["id"],
+                            databaseConnectionString=desired["databaseConnectionString"],
+                            radiusPassword=desired["radiusPassword"],
+                            interval=desired["interval"],
+                            retryInterval=desired["retryInterval"],
+                            maxretries=desired["maxretries"],
+                        )
+                    else:
+                        print(f"Monitor {name} (id={m['id']}) already in desired state")
+                    time.sleep(0.3)
+
+                api.disconnect()
+                print("Internal monitor sync complete")
+                PYEOF
+              EOT
+              ]
+              env_from {
+                secret_ref {
+                  name = kubernetes_secret.internal_monitor_sync.metadata[0].name
+                }
+              }
+              volume_mount {
+                name = "config"
+                mount_path = "/config"
+                read_only = true
+              }
+              resources {
+                requests = {
+                  memory = "128Mi"
+                  cpu = "10m"
+                }
+                limits = {
+                  memory = "256Mi"
+                }
+              }
+            }
+            volume {
+              name = "config"
+              config_map {
+                name = kubernetes_config_map_v1.internal_monitor_targets.metadata[0].name
+              }
+            }
+            dns_config {
+              option {
+                name = "ndots"
+                value = "2"
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+  }
+}
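
The reconcile step in the sync script above (look up by name, create if missing, patch if drifted, otherwise leave alone) can be isolated as a pure function, which makes the decision testable without a running Uptime Kuma instance. This is a minimal sketch under stated assumptions: monitors are plain dicts carrying the same fields the script compares, and `plan_action` and `DRIFT_FIELDS` are hypothetical names that belong neither to this commit nor to `uptime-kuma-api`.

```python
# Fields the sync script compares when checking for drift.
DRIFT_FIELDS = (
    "databaseConnectionString",
    "radiusPassword",
    "interval",
    "retryInterval",
    "maxretries",
)


def plan_action(existing: dict, desired: dict) -> tuple:
    """Hypothetical helper: decide what the sync would do for one target.

    `existing` maps monitor name -> monitor dict (as built from get_monitors()).
    Returns ("create", None), ("update", changed_fields) or ("noop", None).
    """
    current = existing.get(desired["name"])
    if current is None:
        return ("create", None)
    # Collect only the fields whose live value differs from the desired one.
    changed = {f: desired[f] for f in DRIFT_FIELDS if current.get(f) != desired[f]}
    if changed:
        return ("update", changed)
    return ("noop", None)


# Illustrative data only; the password value is a placeholder.
desired = {
    "name": "MySQL Standalone (dbaas)",
    "databaseConnectionString": "mysql://uptimekuma@mysql.dbaas.svc.cluster.local:3306",
    "radiusPassword": "example-password",
    "interval": 60,
    "retryInterval": 60,
    "maxretries": 2,
}
existing = {desired["name"]: {**desired, "id": 663, "interval": 120}}

print(plan_action(existing, desired))
# prints ('update', {'interval': 60})
```

One difference from the committed script: the sketch returns only the fields that changed, whereas the script resends all five fields on any drift. Either choice keeps the monitor id and history intact, since the update targets the existing id.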