## Context

Monitor id 663 "MySQL Standalone (dbaas)" was created manually yesterday via the `uptime-kuma-api` Python library when the dbaas stack migrated from InnoDB Cluster to standalone MySQL. It worked and was UP, but lived only in Uptime Kuma's MariaDB — if UK's DB were wiped or restored from an older backup, the monitor would be lost.

## This change

Adds declarative, self-healing management for internal-service monitors (databases, non-HTTP endpoints) that can't be discovered from ingress annotations. Modelled on the existing `external-monitor-sync` CronJob.

- `local.internal_monitors` — list of desired monitors (name, type, connection string, Vault password key, interval, retries). Seeded with the MySQL Standalone monitor. Add new entries here to manage more.
- `kubernetes_secret.internal_monitor_sync` — pulls the admin password and all referenced DB passwords from Vault `secret/viktor` at apply time. Secret key names are derived from the monitor name (`DB_PASSWORD_<upper_snake>`).
- `kubernetes_config_map_v1.internal_monitor_targets` — renders the target list to JSON for the sync container.
- `kubernetes_cron_job_v1.internal_monitor_sync` — runs every 10 min, looks up monitors by name, creates if missing, patches if drifted, and leaves id and history untouched when already in the desired state.

## Why this approach (Option B, not a Terraform provider)

The `louislam/uptime-kuma` Terraform provider does NOT exist in the public registry (verified — only a CLI tool of the same name). Option A from the task brief was therefore unavailable. Option B (idempotent K8s CronJob) matches the established pattern in the same module for `external-monitor-sync` — no new machinery introduced.

## Monitor 663: no-op on first sync

Manual import was not possible (no provider → no state to import). The sync job correctly identifies the existing monitor by name and reports:

```
Monitor MySQL Standalone (dbaas) (id=663) already in desired state
Internal monitor sync complete
```

DB heartbeats confirm monitor 663 stayed UP throughout, with `status=1` and `Rows: 1` responses every 60s — no disruption.

## Vault key — left manual (by design)

`secret/viktor` is not Terraform-managed anywhere in the repo (only read via `data "vault_kv_secret_v2"`). It is a user-edited Vault entry holding 135 keys. The `uptimekuma_db_password` key was added manually yesterday; this change does NOT codify it. Codifying the whole `secret/viktor` entry is out of scope for this task (it would need a separate migration + rotation story). The sync job reads the existing value at apply time — so if the value is ever rotated in Vault, the next sync picks it up.

## Plan + apply

```
Plan: 3 to add, 0 to change, 0 to destroy.
Apply complete! Resources: 3 added, 0 changed, 0 destroyed.

Re-plan: No changes. Your infrastructure matches the configuration.
```

Also updated `.claude/skills/uptime-kuma/SKILL.md` with the new pattern.

Closes: code-ed2
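For illustration only, a small Python sketch of the `DB_PASSWORD_<upper_snake>` key derivation and the shape of the JSON the ConfigMap renders. The field names and the exact normalisation rule are assumptions, not copied from the module:

```python
import json
import re

def secret_key(monitor_name: str) -> str:
    """Assumed reading of the DB_PASSWORD_<upper_snake> rule described above."""
    return "DB_PASSWORD_" + re.sub(r"[^A-Za-z0-9]+", "_", monitor_name).strip("_").upper()

# Hypothetical target entry; keys mirror local.internal_monitors but the schema is assumed.
target = {
    "name": "MySQL Standalone (dbaas)",
    "type": "mysql",
    "database_password_vault_key": "uptimekuma_db_password",
    "interval": 60,
    "max_retries": 3,
}

print(secret_key(target["name"]))      # DB_PASSWORD_MYSQL_STANDALONE_DBAAS
print(json.dumps([target], indent=2))  # shape of the rendered targets JSON
```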
| name | description | author | version | date |
|---|---|---|---|---|
| uptime-kuma | Manage Uptime Kuma monitoring via the Python API. Use when: (1) User asks to add, remove, or list monitors, (2) User asks about service uptime or monitoring status, (3) User asks to check what's being monitored, (4) User deploys a new service and needs monitoring added, (5) User mentions "uptime", "monitoring", "health check", or "uptime kuma". Uptime Kuma v2 running in Kubernetes, managed via uptime-kuma-api Python library. | Claude Code | 1.0.0 | 2026-02-14 |
# Uptime Kuma Monitoring Management

## Overview

- URL: `https://uptime.viktorbarzin.me`
- Internal: `uptime-kuma.uptime-kuma.svc.cluster.local:80`
- Image: `louislam/uptime-kuma:2`
- Storage: NFS at `/mnt/main/uptime-kuma` -> `/app/data`
- API Library: `uptime-kuma-api` (pip, available via `PYTHONPATH`)
- Credentials: admin / (from `UPTIME_KUMA_PASSWORD` env var)
## Python API Access

### Connection Pattern

```python
import os
from uptime_kuma_api import UptimeKumaApi, MonitorType

api = UptimeKumaApi('https://uptime.viktorbarzin.me')
api.login('admin', os.environ.get('UPTIME_KUMA_PASSWORD', ''))
# ... operations ...
api.disconnect()
```

### Execution

```bash
python3 -c "
import os
from uptime_kuma_api import UptimeKumaApi, MonitorType
api = UptimeKumaApi('https://uptime.viktorbarzin.me')
api.login('admin', os.environ.get('UPTIME_KUMA_PASSWORD', ''))
# ... your code ...
api.disconnect()
"
```
## Common Operations

### List All Monitors

```python
monitors = api.get_monitors()
for m in monitors:
    print(f'{m["id"]:3d} | {m["name"]:30s} | {m["type"]:15s} | interval={m["interval"]}s')
```

### Add HTTP Monitor

```python
api.add_monitor(
    type=MonitorType.HTTP,
    name="Service Name",
    url="http://service.namespace.svc.cluster.local",
    interval=120,
    maxretries=2,
)
```

### Add PING Monitor

```python
api.add_monitor(
    type=MonitorType.PING,
    name="Host Name",
    hostname="10.0.20.1",
    interval=30,
    maxretries=3,
)
```

### Add PORT Monitor

```python
api.add_monitor(
    type=MonitorType.PORT,
    name="Service Port",
    hostname="service.namespace.svc.cluster.local",
    port=8080,
    interval=120,
    maxretries=2,
)
```

### Edit Monitor

```python
api.edit_monitor(monitor_id, interval=120, maxretries=2)
```

### Delete Monitor

```python
api.delete_monitor(monitor_id)
```

### Pause/Resume Monitor

```python
api.pause_monitor(monitor_id)
api.resume_monitor(monitor_id)
```
## Monitor Types

- `MonitorType.HTTP` — HTTP(S) endpoint check
- `MonitorType.PING` — ICMP ping
- `MonitorType.PORT` — TCP port check
- `MonitorType.POSTGRES` — PostgreSQL connection
- `MonitorType.REDIS` — Redis connection
- `MonitorType.DNS` — DNS resolution check
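The list above is not exhaustive; for database monitors like the MySQL one managed by the internal sync job, a sketch follows. `MonitorType.MYSQL`, the `database_connection_string` parameter, and the example hostname are assumptions about the installed library version and the cluster, so check `help(api.add_monitor)` before relying on them:

```python
# Sketch only: MonitorType.MYSQL and database_connection_string are assumed to be
# supported by the installed uptime-kuma-api version; the hostname is a placeholder.
api.add_monitor(
    type=MonitorType.MYSQL,
    name="MySQL Standalone (dbaas)",
    database_connection_string="mysql://monitor:<password>@mysql.dbaas.svc.cluster.local:3306/mysql",
    interval=60,
    maxretries=3,
)
```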
## Tiered Monitoring System
Monitors use tiered intervals to balance responsiveness with resource usage:
| Tier | Interval | Retries | Use For |
|---|---|---|---|
| 1 - Critical | 30s | 3 | Core infra (DNS, gateway, ingress, NFS, K8s API, auth, mail) |
| 2 - Important | 120s | 2 | Actively used services (Nextcloud, Immich, Vaultwarden, etc.) |
| 3 - Standard | 300s | 1 | Auxiliary/optional services (blog, games, tools) |
### Tier Assignment Guidelines
- Tier 1: If it goes down, multiple other services fail or the cluster is unreachable
- Tier 2: User-facing services that are actively used daily
- Tier 3: Nice-to-have services, tools, dashboards
### When Adding a New Service
Match the tier to the service's DEFCON level from CLAUDE.md:
- DEFCON 1-2 → Tier 1 (30s)
- DEFCON 3-4 → Tier 2 (120s)
- DEFCON 5 → Tier 3 (300s)
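If you prefer to pick these values in code, a small illustrative mapping; the numbers restate the tier table above, and the dict itself is not from any existing script:

```python
# DEFCON level (from CLAUDE.md) -> monitor settings, mirroring the tier table above.
TIER_BY_DEFCON = {
    1: {"interval": 30, "maxretries": 3},   # Tier 1 - Critical
    2: {"interval": 30, "maxretries": 3},
    3: {"interval": 120, "maxretries": 2},  # Tier 2 - Important
    4: {"interval": 120, "maxretries": 2},
    5: {"interval": 300, "maxretries": 1},  # Tier 3 - Standard
}

api.add_monitor(
    type=MonitorType.HTTP,
    name="New Service",
    url="http://new-service.namespace.svc.cluster.local",
    **TIER_BY_DEFCON[3],  # DEFCON 3 -> Tier 2 (120s, 2 retries)
)
```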
## Internal Service URL Pattern

Most K8s services follow: `http://<service-name>.<namespace>.svc.cluster.local:<port>`

Common port is 80. Exceptions:

- Homepage: port 3000
- Ollama: port 11434
- Loki: port 3100 (use `/ready` endpoint)
- Traefik dashboard: port 8080 (use `/dashboard/` path)
- K8s API: `https://10.0.20.100:6443`
- Immich: port 2283 (use `/api/server/ping`)
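A tiny helper capturing this convention; the port map just restates the exceptions above, and the namespace in the usage comment is a placeholder:

```python
# Known non-80 service ports, restating the exceptions listed above.
KNOWN_PORTS = {"homepage": 3000, "ollama": 11434, "loki": 3100, "immich": 2283}

def internal_url(service: str, namespace: str, path: str = "/") -> str:
    """Build the cluster-local URL for a service, defaulting to port 80."""
    port = KNOWN_PORTS.get(service, 80)
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"

# e.g. internal_url("loki", "loki", "/ready") -- namespace is a placeholder, verify before use
```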
## Notes

- Uptime Kuma uses Socket.IO (WebSocket) for its API, not REST
- The `uptime-kuma-api` Python library wraps Socket.IO
- Add `time.sleep(0.3)` between bulk operations to avoid overloading (see the sketch below)
- Homepage dashboard widget slug: `cluster-internal`
- Cloudflare-proxied at `uptime.viktorbarzin.me`
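A minimal sketch of the bulk-operation pattern; the filter (tightening retries on every 300s monitor) is only an example:

```python
import time

# Throttle bulk edits so the Socket.IO server isn't flooded.
for m in api.get_monitors():
    if m["interval"] == 300 and m["maxretries"] != 1:
        api.edit_monitor(m["id"], maxretries=1)
    time.sleep(0.3)  # pause between operations, as recommended above
```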
## Terraform-Managed Monitors

There is NO louislam/uptime-kuma Terraform provider. Two patterns exist for declarative monitor management in this stack:

- External HTTPS monitors — auto-discovered from ingress annotations by the `external-monitor-sync` CronJob (`*/10 * * * *`). Opt out via `uptime.viktorbarzin.me/external-monitor: "false"` on the ingress.
- Internal monitors (DBs, non-HTTP) — declared in the `local.internal_monitors` list in `stacks/uptime-kuma/modules/uptime-kuma/main.tf` and synced by the `internal-monitor-sync` CronJob. To add one, append to the list (provide `name`, `type`, `database_connection_string`, `database_password_vault_key`, `interval`, `retry_interval`, `max_retries`) and run `scripts/tg apply`. The sync is idempotent — it looks up by name, creates if missing, patches if drifted. Existing monitors keep their id and history (see the sketch below).
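The CronJob script itself lives in the module; the following is only a rough Python sketch of the idempotent behaviour it implements. The targets path, the drift fields checked, and the type mapping are assumptions, and connection-string/password plumbing is omitted:

```python
import json
import os

from uptime_kuma_api import UptimeKumaApi, MonitorType

TARGETS_FILE = os.environ.get("TARGETS_FILE", "/config/targets.json")  # assumed mount path

api = UptimeKumaApi('https://uptime.viktorbarzin.me')
api.login('admin', os.environ['UPTIME_KUMA_PASSWORD'])

existing = {m["name"]: m for m in api.get_monitors()}
with open(TARGETS_FILE) as f:
    targets = json.load(f)

for t in targets:
    desired = {"interval": t["interval"], "maxretries": t["max_retries"]}
    current = existing.get(t["name"])
    if current is None:
        # Connection string / password wiring omitted; enum lookup by type name is assumed.
        api.add_monitor(type=MonitorType[t["type"].upper()], name=t["name"], **desired)
        print(f"Created monitor {t['name']}")
    elif any(current.get(k) != v for k, v in desired.items()):
        # edit_monitor patches fields in place, so the id and history survive.
        api.edit_monitor(current["id"], **desired)
        print(f"Patched monitor {t['name']} (id={current['id']})")
    else:
        print(f"Monitor {t['name']} (id={current['id']}) already in desired state")

print("Internal monitor sync complete")
api.disconnect()
```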