Viktor Barzin
bd41bb9230
fix: cluster healthcheck fixes + Authentik upgrade to 2026.2.2
...
- Authentik: upgrade 2025.10.3 → 2025.12.4 → 2026.2.2 with DB restore
and stepped migration. Switch to existingSecret, PgBouncer session mode.
- Mailserver: migrate email roundtrip probe from Mailgun to Brevo API
- Redis: fix HAProxy tcp-check regex (rstring), faster health intervals
- Nextcloud: fix Redis fallback to HAProxy service, update dependency
- MeshCentral: fix TLSOffload + certUrl init container for first-run
- Monitoring: remove authentik from latency alert exclusion
- Diun: simplify to webhook notifier, remove git auto-update
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 06:41:56 +00:00
Viktor Barzin
78dec8f0ad
add e2e email roundtrip monitoring
...
CronJob (every 30 min) sends test email via Mailgun API to
smoke-test@viktorbarzin.me , verifies IMAP delivery in spam@ catch-all,
deletes test email, pushes metrics to Pushgateway + Uptime Kuma.
Prometheus alerts: EmailRoundtripFailing, EmailRoundtripStale,
EmailRoundtripNeverRun. Uptime Kuma: SMTP/IMAP port checks + E2E push.
2026-03-25 22:50:22 +02:00
Viktor Barzin
ae36dc253b
extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip]
...
Phase 2 of platform stack split. 5 more modules extracted into
independent stacks. All applied successfully with zero destroys.
Cloudflared now reads k8s_users from Vault directly to compute
user_domains. Woodpecker pipeline runs all 8 extracted stacks
in parallel. Memory bumped to 6Gi for 9 concurrent TF processes.
Platform reduced from 27 to 19 modules.
2026-03-17 21:34:11 +00:00