infra/stacks/uptime-kuma
Viktor Barzin f6812fe69f [uptime-kuma] Support per-ingress probe path annotation
## Context

The `external-monitor-sync` CronJob probed `https://<host>/` for every
`*.viktorbarzin.me` ingress. Homepages frequently return 200 (or an
allow-listed 30x/40x) even when the backend or DB is broken, which
produces false negatives. The Forgejo outage on 2026-04-17 went
undetected for this reason: `/` returned a login page while
`/api/healthz` returned 503 from the DB probe.

Manual monitor edits don't stick: the sync is create-if-missing only,
so a deleted monitor is recreated pointing at `/` on the next run.

## This change

The sync now does three things:

1. **Reads a new annotation** `uptime.viktorbarzin.me/external-monitor-path`.
   The annotation value is appended as the probe path; default `/`
   preserves today's behaviour for every ingress that hasn't opted in.
2. **Tightens accepted status codes** when an explicit path is set:
   `['200-299']` (strict — we expect a real healthz). The default `/`
   path keeps the existing lenient set `['200-299','300-399','400-499']`
   because homepages routinely 30x redirect or 40x on missing auth.
3. **Updates existing monitors** when the target URL or accepted
   status codes drift. Previously the loop was create-if-missing only,
   so annotating an already-monitored ingress had no effect until the
   monitor was deleted. Now re-running the sync after changing the
   annotation converges the live monitor.
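
Points 1 and 2 can be sketched as a pure function. This is a minimal
illustration, not the code embedded in the CronJob: `probe_target` and
the constant names are hypothetical, and treating any explicitly set
annotation (even `/`) as strict is an assumption the description above
does not settle.

```python
from typing import Mapping

ANNOTATION = "uptime.viktorbarzin.me/external-monitor-path"
LENIENT_CODES = ["200-299", "300-399", "400-499"]
STRICT_CODES = ["200-299"]


def probe_target(host: str, annotations: Mapping[str, str]) -> tuple[str, list[str]]:
    """Return (url, accepted_statuscodes) for one ingress host."""
    path = annotations.get(ANNOTATION)
    if path is None:
        # Ingress has not opted in: probe "/" with the lenient code set.
        return f"https://{host}/", list(LENIENT_CODES)
    # Assumption: the annotation value is appended verbatim and is
    # expected to start with "/", e.g. "/api/healthz".
    return f"https://{host}{path}", list(STRICT_CODES)
```

For example, `probe_target("git.example.org", {ANNOTATION: "/api/healthz"})`
yields `https://git.example.org/api/healthz` with `['200-299']`; the
hostname here is a placeholder.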

## What is NOT in this change

- No change to the Ingress annotations on any individual stack. Each
  service that wants a non-`/` probe path opts in separately.
- No change to the ConfigMap fallback payload shape — legacy entries
  still get the lenient status codes.
- Monitor DB state in Uptime Kuma's SQLite is untouched at plan time;
  the sync CronJob is what reconciles state on each run.

## Flow

```
  ingress annotation           CronJob Python
  ------------------           --------------
  (none)                 -->   url = https://host/        codes = lenient
  external-monitor-path  -->   url = https://host<path>   codes = strict ['200-299']
  e.g. "/api/healthz"    -->   url = https://host/api/healthz   codes = strict ['200-299']

  existing monitor + drifted url or codes  -->  api.edit_monitor(id, url=..., accepted_statuscodes=...)
```
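
The drift rule in the last line of the flow can be sketched as a pure
decision function. The name `plan_monitor_action` and the shape of the
`existing` dict (keys `url` and `accepted_statuscodes`) are assumptions
for illustration; the real sync applies the result through
`api.edit_monitor` as shown above.

```python
from typing import Any, Optional


def plan_monitor_action(
    existing: Optional[dict[str, Any]],
    url: str,
    codes: list[str],
) -> tuple[str, dict[str, Any]]:
    """Decide what the sync should do for one monitor.

    Returns (action, fields) where action is "create", "update" or
    "unchanged", and fields holds the values to send to the API.
    """
    desired = {"url": url, "accepted_statuscodes": codes}
    if existing is None:
        # No monitor yet: same behaviour as the old create-if-missing loop.
        return "create", desired
    drifted = any(existing.get(key) != value for key, value in desired.items())
    if drifted:
        # URL or accepted codes differ from the live monitor: converge it.
        return "update", desired
    return "unchanged", {}
```

Counting the returned actions across all ingresses would give the
created, updated and unchanged totals that the sync reports at the end
of a run.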

## Test Plan

### Automated

- `terraform fmt -check -recursive stacks/uptime-kuma` — exit 0.
- `scripts/tg plan` on `stacks/uptime-kuma` — `Plan: 0 to add, 1 to
  change, 0 to destroy`. The single in-place change is the CronJob
  command (Python heredoc re-rendered). No other resources drift.
- Embedded Python compiles: extracted the `PYEOF` block and ran
  `python3 -m py_compile` — OK.
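
The compile check can be reproduced with a small script along these
lines. It is a sketch under assumptions: `extract_heredoc` and
`check_embedded_python` are hypothetical helpers, the built-in
`compile()` stands in for `python3 -m py_compile`, and the heredoc body
is presumed to contain no Terraform interpolation that would need
rendering first.

```python
import re
import textwrap


def extract_heredoc(text: str, tag: str = "PYEOF") -> str:
    """Return the dedented body of the first `<<TAG ... TAG` heredoc."""
    pattern = re.compile(
        r"<<-?[ \t]*['\"]?" + re.escape(tag) + r"['\"]?[^\n]*\n"  # opening line
        r"(.*?)\n"                                                # body (lazy)
        r"[ \t]*" + re.escape(tag) + r"[ \t]*$",                  # closing tag line
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(text)
    if match is None:
        raise ValueError(f"no {tag} heredoc found")
    # The block is usually indented inside the Terraform file.
    return textwrap.dedent(match.group(1))


def check_embedded_python(text: str) -> None:
    """Raise SyntaxError if the embedded Python does not compile."""
    compile(extract_heredoc(text), "<PYEOF>", "exec")
```

Calling `check_embedded_python(open(path).read())` on the module's
Terraform file would then fail loudly on a syntax error; `path` is
whichever file holds the CronJob command.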

### Manual Verification

1. Annotate an ingress: `kubectl annotate ingress/<name> -n <ns> uptime.viktorbarzin.me/external-monitor-path=/api/healthz`
2. Trigger sync early: `kubectl -n uptime-kuma create job --from=cronjob/external-monitor-sync external-monitor-sync-manual`
3. Expected log line:
   `Updating monitor [External] <name>: https://host/ -> https://host/api/healthz (codes ['200-299','300-399','400-499'] -> ['200-299'])`
4. Inspect the monitor in the Uptime Kuma UI: URL and accepted status
   codes reflect the annotation.
5. Final summary line includes updated count:
   `Sync complete: 0 created, 1 updated, 0 deleted, N unchanged`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 22:06:23 +00:00
modules/uptime-kuma [uptime-kuma] Support per-ingress probe path annotation 2026-04-17 22:06:23 +00:00
main.tf feat: add external monitoring for all Cloudflare-proxied services 2026-04-14 19:04:45 +00:00
secrets extract remaining 19 modules from platform, complete stack split [ci skip] 2026-03-17 21:42:16 +00:00
terragrunt.hcl extract remaining 19 modules from platform, complete stack split [ci skip] 2026-03-17 21:42:16 +00:00