Adds check #14 that queries Uptime Kuma API for application-level
monitor status, complementing the kubectl-level checks with HTTP/ping
health data. Reports down monitors by name with PASS/WARN/FAIL thresholds.
Replace deprecated wildcard containerd mirror with per-registry
config_path approach. Add proxy containers for ghcr.io, quay.io,
registry.k8s.io, and reg.kyverno.io on the docker-registry VM.
Set static IP for docker-registry VM to avoid DHCP issues.
Add new Kubernetes service for OpenClaw gateway connected to in-cluster
Ollama, with kubectl/terraform/git access for infrastructure management.
Protected behind Authentik SSO.
Fires after 5m if the haos Prometheus scrape target is unreachable.
Covers the HTTP API endpoint which shares the same process as the
WebSocket API used by the mobile app.
- meshcentral: rename port from "https" to "http" — MeshCentral serves
plain HTTP when REVERSE_PROXY=true, but Traefik inferred HTTPS from the
port name, causing 100% 5xx errors
- osm-routing/otp: scale to 0 — TfL GTFS data expired, OTP crash-loops
trying to build graph with no valid transit trips
- wireguard: add prometheus.io/port=9586 annotation — without it,
Prometheus tried scraping all container ports (51820 UDP, 80)
- travel-blog: remove stale prometheus.io annotations and dead port 9113
— nginx-exporter sidecar was commented out but annotations remained
- dawarich: remove prometheus.io annotations — exporter env vars are
commented out so nothing listens on port 9394
- monitoring: raise CPU temp threshold 60°C→75°C (E5-2699 v4 Tcase is
79°C), lower registry cache threshold 50%→25%, add minimum traffic
floor (>0.1 req/s) to 4xx/5xx rate alerts to prevent false positives
on low-traffic services
- Switch acquisition from ingress-nginx to traefik namespace/pods
- Change collection from crowdsecurity/nginx to crowdsecurity/traefik
- Add Slack notification plugin for ban/captcha decisions
- Wire alertmanager_slack_api_url through to CrowdSec module
- Add color coding (red/green) to Slack alerts, show alertname in title
- Use summary annotation in Slack text (description was always empty)
- Format all alert summaries consistently: value with units and threshold
- Fix ratio expressions (CPU/memory) to display as percentages
- Fix "failiure" typo, capitalize Tailscale
The packruler/rewrite-body plugin (used for rybbit analytics injection)
fails to decompress gzip responses with "flate: corrupt input before
offset 5", corrupting the response body. This broke HA Companion app's
external_auth flow and WebSocket connections on ha-sofia.
Fix: add a strip-accept-encoding middleware that removes Accept-Encoding
from requests when rybbit is active, forcing backends to send uncompressed
responses that the plugin can safely process.
Also add extra_middlewares variable to reverse_proxy factory for
extensibility.
Enhance ingress_factory with full_host, extra_middlewares, and
skip_default_rate_limit variables. Fix TLS hosts bug to use
effective_host. Migrate 13 services from raw kubernetes_ingress_v1
resources to centralized ingress_factory module calls, removing
manual rybbit middleware CRDs where the factory now handles them.
When deploying a new service, the cloudflared module must also be applied
to create the Cloudflare DNS record. Updated CLAUDE.md and setup-project skill.
Add grampsweb module with web app + Celery worker in a single pod,
using shared Redis (DB 2/3), NFS storage, email via mailserver,
and Ollama AI integration. Available at family.viktorbarzin.me.
Apple Health data visualization app (Svelte + FastAPI + Caddy).
Uses shared PostgreSQL via DBaaS, NFS storage for uploads,
accessible at health.viktorbarzin.me.