fix: cluster healthcheck fixes + Authentik upgrade to 2026.2.2

- Authentik: upgrade 2025.10.3 → 2025.12.4 → 2026.2.2 with DB restore
  and stepped migration. Switch to existingSecret, PgBouncer session mode.
- Mailserver: migrate email roundtrip probe from Mailgun to Brevo API
- Redis: fix HAProxy tcp-check regex (rstring), faster health intervals
- Nextcloud: fix Redis fallback to HAProxy service, update dependency
- MeshCentral: fix TLSOffload + certUrl init container for first-run
- Monitoring: remove authentik from latency alert exclusion
- Diun: simplify to webhook notifier, remove git auto-update

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Viktor Barzin 2026-04-15 06:41:56 +00:00
parent d31bbc9a18
commit bd41bb9230
11 changed files with 115 additions and 282 deletions
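The Redis item mentions moving the HAProxy `tcp-check` to `rstring`, but the commit view does not show the actual HAProxy config or pattern. As a rough illustration only, the Python sketch below mimics the two HAProxy match modes. With `expect string`, the literal text must appear in the response buffer. With `expect rstring`, an unanchored regex is run over the buffer. The sample `INFO replication` replies and the `expect_string`/`expect_rstring` helpers are hypothetical. The sketch uses Python's `re` module, and HAProxy's regex engine may behave differently in edge cases.

```python
import re

# Hypothetical replies to `INFO replication` from a master and from a replica.
MASTER_REPLY = "# Replication\r\nrole:master\r\nconnected_slaves:2\r\n"
REPLICA_REPLY = "# Replication\r\nrole:slave\r\nmaster_host:10.0.0.5\r\n"


def expect_string(buffer: str, literal: str) -> bool:
    """Mimic `tcp-check expect string`: the literal must appear in the buffer."""
    return literal in buffer


def expect_rstring(buffer: str, regex: str) -> bool:
    """Mimic `tcp-check expect rstring`: unanchored regex search over the buffer.

    Uses Python's `re`; HAProxy's regex engine may differ in edge cases.
    """
    return re.search(regex, buffer) is not None


if __name__ == "__main__":
    # Only the master should pass a "role:master" check.
    print(expect_rstring(MASTER_REPLY, r"role:master"))   # True
    print(expect_rstring(REPLICA_REPLY, r"role:master"))  # False
    # "+" is a regex metacharacter: escape it for rstring, not for string.
    print(expect_string("+PONG\r\n", "+PONG"))            # True
    print(expect_rstring("+PONG\r\n", r"\+PONG"))         # True
```

One consequence of the two modes: a literal such as `+PONG` works with `string`, but in Python's `re` an unescaped leading `+` is rejected as a regex. It has to be written `\+PONG` when used as a pattern.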

@@ -1594,10 +1594,10 @@ serverFiles:
 - alert: HighServiceLatency
 expr: |
 (
-sum(rate(traefik_service_request_duration_seconds_sum{service!~".*idrac.*|.*headscale.*|.*authentik.*"}[5m])) by (service)
-/ sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*|.*authentik.*"}[5m])) by (service)
+sum(rate(traefik_service_request_duration_seconds_sum{service!~".*idrac.*|.*headscale.*"}[5m])) by (service)
+/ sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*"}[5m])) by (service)
 ) > 10
-and sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*|.*authentik.*"}[5m])) by (service) > 0.01
+and sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*"}[5m])) by (service) > 0.01
 and on() (time() - process_start_time_seconds{job="prometheus"}) > 900
 for: 5m
 labels:
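The Python sketch below evaluates the same alert condition at a single instant. Mean latency per Traefik service is `rate(..._sum[5m]) / rate(..._count[5m])`. The service needs more than 0.01 req/s of traffic, and the Prometheus process must have been up for more than 900 s. Services that fully match the exclusion regex are dropped, because Prometheus label regex matchers are fully anchored. The sketch assumes the per-service rates are already computed and does not model the `for: 5m` pending period. The `high_latency_services` function and the service names are hypothetical.

```python
import re

# Exclusion after this commit (authentik removed). Prometheus label regexes are
# fully anchored, so re.fullmatch mirrors `service!~".*idrac.*|.*headscale.*"`.
EXCLUDED_SERVICES = re.compile(r".*idrac.*|.*headscale.*")


def high_latency_services(
    rates: dict[str, tuple[float, float]],
    prometheus_uptime_s: float,
    threshold_s: float = 10.0,
    min_rps: float = 0.01,
) -> list[str]:
    """Return services whose mean request latency exceeds threshold_s.

    rates maps service -> (sum_rate, count_rate): per-second rates of
    traefik_service_request_duration_seconds_sum and _count over 5m.
    Evaluates one instant only; the `for: 5m` pending period is not modeled.
    """
    # `and on() (time() - process_start_time_seconds{job="prometheus"}) > 900`
    if prometheus_uptime_s <= 900:
        return []
    firing = []
    for service, (sum_rate, count_rate) in rates.items():
        if EXCLUDED_SERVICES.fullmatch(service):
            continue  # dropped by the service!~ matcher
        if count_rate <= min_rps:
            continue  # `and ... > 0.01`: too little traffic to alert on
        if sum_rate / count_rate > threshold_s:  # `( ... ) > 10`
            firing.append(service)
    return sorted(firing)


if __name__ == "__main__":
    rates = {
        "authentik-server@kubernetes": (24.0, 2.0),  # 12 s mean latency
        "idrac-web@kubernetes": (50.0, 1.0),         # excluded by regex
        "nextcloud@kubernetes": (1.0, 2.0),          # 0.5 s mean latency
        "quiet@kubernetes": (1.0, 0.005),            # below traffic floor
    }
    print(high_latency_services(rates, prometheus_uptime_s=3600))
    # → ['authentik-server@kubernetes']
```

The exclusion regex no longer contains `.*authentik.*`. A slow Authentik service therefore now fires this alert like any other service.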