---
name: uptime-kuma
description: |
  Manage Uptime Kuma monitoring via the Python API. Use when:
  (1) User asks to add, remove, or list monitors,
  (2) User asks about service uptime or monitoring status,
  (3) User asks to check what's being monitored,
  (4) User deploys a new service and needs monitoring added,
  (5) User mentions "uptime", "monitoring", "health check", or "uptime kuma".
  Uptime Kuma v2 running in Kubernetes, managed via the uptime-kuma-api Python library.
author: Claude Code
version: 1.0.0
date: 2026-02-14
---

# Uptime Kuma Monitoring Management

## Overview

- **URL**: `https://uptime.viktorbarzin.me`
- **Internal**: `uptime-kuma.uptime-kuma.svc.cluster.local:80`
- **Image**: `louislam/uptime-kuma:2`
- **Storage**: NFS at `/mnt/main/uptime-kuma` -> `/app/data`
- **API Library**: `uptime-kuma-api` (pip, available via PYTHONPATH)
- **Credentials**: admin / (from `UPTIME_KUMA_PASSWORD` env var)

## Python API Access

### Connection Pattern

```python
import os
from uptime_kuma_api import UptimeKumaApi, MonitorType

api = UptimeKumaApi('https://uptime.viktorbarzin.me')
api.login('admin', os.environ.get('UPTIME_KUMA_PASSWORD', ''))

# ... operations ...

api.disconnect()
```
### Execution

```bash
python3 -c "
import os
from uptime_kuma_api import UptimeKumaApi, MonitorType

api = UptimeKumaApi('https://uptime.viktorbarzin.me')
api.login('admin', os.environ.get('UPTIME_KUMA_PASSWORD', ''))
# ... your code ...
api.disconnect()
"
```
### Common Operations

#### List All Monitors

```python
monitors = api.get_monitors()
for m in monitors:
    print(f'{m["id"]:3d} | {m["name"]:30s} | {m["type"]:15s} | interval={m["interval"]}s')
```
#### Add HTTP Monitor

```python
api.add_monitor(
    type=MonitorType.HTTP,
    name="Service Name",
    url="http://service.namespace.svc.cluster.local",
    interval=120,
    maxretries=2,
)
```
#### Add PING Monitor

```python
api.add_monitor(
    type=MonitorType.PING,
    name="Host Name",
    hostname="10.0.20.1",
    interval=30,
    maxretries=3,
)
```
#### Add PORT Monitor

```python
api.add_monitor(
    type=MonitorType.PORT,
    name="Service Port",
    hostname="service.namespace.svc.cluster.local",
    port=8080,
    interval=120,
    maxretries=2,
)
```
#### Edit Monitor

```python
api.edit_monitor(monitor_id, interval=120, maxretries=2)
```

#### Delete Monitor

```python
api.delete_monitor(monitor_id)
```

#### Pause/Resume Monitor

```python
api.pause_monitor(monitor_id)
api.resume_monitor(monitor_id)
```
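The edit, delete, and pause operations above all take a monitor id, but callers usually start from a name. A small lookup helper, sketched under the assumptions that monitor names are unique and that `get_monitors()` returns dicts with `id` and `name` keys (as in the listing example above):

```python
def find_monitor_by_name(monitors, name):
    """Return the first monitor dict whose 'name' matches, or None.

    `monitors` is the list returned by api.get_monitors(); names are
    assumed unique (the sync CronJobs rely on the same assumption).
    """
    return next((m for m in monitors if m["name"] == name), None)

# Against a live connection (monitor name hypothetical):
# target = find_monitor_by_name(api.get_monitors(), "MySQL Standalone (dbaas)")
# if target:
#     api.edit_monitor(target["id"], interval=120, maxretries=2)
```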
## Monitor Types

- `MonitorType.HTTP` — HTTP(S) endpoint check
- `MonitorType.PING` — ICMP ping
- `MonitorType.PORT` — TCP port check
- `MonitorType.POSTGRES` — PostgreSQL connection
- `MonitorType.REDIS` — Redis connection
- `MonitorType.DNS` — DNS resolution check
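Database monitor types (POSTGRES, REDIS) take a connection string instead of a URL. A hedged sketch: the `database_connection_string` parameter name is assumed from the field names used by the internal-monitor sync; the monitor name and connection string are hypothetical — check the uptime-kuma-api docs before relying on the exact keyword.

```python
# Desired-state entry for a database monitor (all values hypothetical;
# the database_connection_string key is assumed, not verified).
db_monitor = {
    "name": "PostgreSQL (dbaas)",
    "database_connection_string": "postgres://monitor:secret@postgres.dbaas.svc.cluster.local:5432/postgres",
    "interval": 120,
    "maxretries": 2,
}

# Against a live connection:
# api.add_monitor(type=MonitorType.POSTGRES, **db_monitor)
```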
## Tiered Monitoring System

Monitors use tiered intervals to balance responsiveness with resource usage:

| Tier | Interval | Retries | Use For |
|------|----------|---------|---------|
| **1 - Critical** | 30s | 3 | Core infra (DNS, gateway, ingress, NFS, K8s API, auth, mail) |
| **2 - Important** | 120s | 2 | Actively used services (Nextcloud, Immich, Vaultwarden, etc.) |
| **3 - Standard** | 300s | 1 | Auxiliary/optional services (blog, games, tools) |

### Tier Assignment Guidelines

- **Tier 1**: If it goes down, multiple other services fail or the cluster is unreachable
- **Tier 2**: User-facing services that are actively used daily
- **Tier 3**: Nice-to-have services, tools, dashboards
### When Adding a New Service

Match the tier to the service's DEFCON level from CLAUDE.md:

- DEFCON 1-2 → Tier 1 (30s)
- DEFCON 3-4 → Tier 2 (120s)
- DEFCON 5 → Tier 3 (300s)
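The DEFCON-to-tier mapping above can be captured in a small helper (a sketch; interval and retry values come from the tier table, while the DEFCON levels themselves are defined in CLAUDE.md):

```python
def tier_for_defcon(defcon: int) -> dict:
    """Map a service's DEFCON level (1-5) to monitor parameters,
    following the tier table: 30s/3, 120s/2, 300s/1."""
    if defcon in (1, 2):
        return {"interval": 30, "maxretries": 3}
    if defcon in (3, 4):
        return {"interval": 120, "maxretries": 2}
    if defcon == 5:
        return {"interval": 300, "maxretries": 1}
    raise ValueError(f"unknown DEFCON level: {defcon}")

# Usage when adding a monitor (service name/URL hypothetical):
# api.add_monitor(type=MonitorType.HTTP, name="New Service",
#                 url="http://new-service.ns.svc.cluster.local",
#                 **tier_for_defcon(3))
```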
## Internal Service URL Pattern

Most K8s services follow: `http://<service-name>.<namespace>.svc.cluster.local:<port>`

The common port is 80. Exceptions:

- Homepage: port 3000
- Ollama: port 11434
- Loki: port 3100 (use `/ready` endpoint)
- Traefik dashboard: port 8080 (use `/dashboard/` path)
- K8s API: `https://10.0.20.100:6443`
- Immich: port 2283 (use `/api/server/ping`)
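The pattern and its exceptions can be wrapped in a small URL builder (a sketch; the exception map copies a few entries from the list above, and the service key names are assumed — adjust them to the actual K8s service names):

```python
# Port/path exceptions from the list above; everything else defaults to :80.
# Keys are assumed service names, not verified against the cluster.
EXCEPTIONS = {
    "homepage": (3000, ""),
    "ollama": (11434, ""),
    "loki": (3100, "/ready"),
    "immich": (2283, "/api/server/ping"),
}

def internal_url(service: str, namespace: str) -> str:
    """Build the in-cluster URL for a service, applying known exceptions."""
    port, path = EXCEPTIONS.get(service, (80, ""))
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"
```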
## Notes

1. Uptime Kuma uses Socket.IO (WebSocket) for its API, not REST
2. The `uptime-kuma-api` Python library wraps Socket.IO
3. Add `time.sleep(0.3)` between bulk operations to avoid overloading
4. Homepage dashboard widget slug: `cluster-internal`
5. Cloudflare-proxied at `uptime.viktorbarzin.me`
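Note 3's rate-limit advice can be packaged as a helper for bulk changes (a sketch; `op` is any per-monitor call such as `api.edit_monitor` or `api.pause_monitor`):

```python
import time

def bulk_apply(op, monitor_ids, delay=0.3, **kwargs):
    """Call op(monitor_id, **kwargs) for each id, sleeping between calls
    to avoid overloading the Socket.IO API (see note 3)."""
    results = []
    for mid in monitor_ids:
        results.append(op(mid, **kwargs))
        time.sleep(delay)
    return results

# Against a live connection (ids hypothetical):
# bulk_apply(api.edit_monitor, [12, 13, 14], interval=300, maxretries=1)
```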

## Terraform-Managed Monitors

There is NO `louislam/uptime-kuma` Terraform provider. Two patterns exist for
declarative monitor management in this stack:

- **External HTTPS monitors** — auto-discovered from ingress annotations by the
  `external-monitor-sync` CronJob (`*/10 * * * *`). Opt out via
  `uptime.viktorbarzin.me/external-monitor: "false"` on the ingress.
- **Internal monitors (DBs, non-HTTP)** — declared in the
  `local.internal_monitors` list in `stacks/uptime-kuma/modules/uptime-kuma/main.tf`
  and synced by the `internal-monitor-sync` CronJob. To add one, append to the
  list (provide `name`, `type`, `database_connection_string`,
  `database_password_vault_key`, `interval`, `retry_interval`, `max_retries`)
  and run `scripts/tg apply`. The sync is idempotent — it looks up monitors by
  name, creates them if missing, and patches them if drifted. Existing monitors
  keep their id and history.
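An appended `local.internal_monitors` entry might look like the following. This is a hypothetical sketch: the field names come from the list above, but all values (monitor name, connection string, Vault key, `retry_interval`) are placeholders, and the exact shape must match how the local is actually declared in `main.tf`.

```hcl
locals {
  internal_monitors = [
    # ... existing entries (e.g. the MySQL Standalone monitor) ...
    {
      name                        = "PostgreSQL Standalone (dbaas)" # hypothetical
      type                        = "postgres"
      database_connection_string  = "postgres://monitor@postgres.dbaas.svc.cluster.local:5432/postgres"
      database_password_vault_key = "postgres_db_password" # hypothetical Vault key
      interval                    = 120
      retry_interval              = 60 # assumed value
      max_retries                 = 2
    },
  ]
}
```

After editing, run `scripts/tg apply`; the next `internal-monitor-sync` run picks the entry up.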