From 6ee283c2f05f5268cc3616f6020ec41a59a76076 Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Sun, 19 Apr 2026 15:19:06 +0000 Subject: [PATCH] [docs] Document external-monitor opt-out mechanism in monitoring.md The doc said monitors were created for everything in cloudflare_proxied_names, but since the k8s-api discovery rewrite the ConfigMap is a fallback only. Describe the opt-OUT semantics and how external_monitor=false on a factory call translates to the sync script's skip annotation. --- docs/architecture/monitoring.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/architecture/monitoring.md b/docs/architecture/monitoring.md index 5a7a120d..774a3356 100644 --- a/docs/architecture/monitoring.md +++ b/docs/architecture/monitoring.md @@ -75,7 +75,9 @@ Prometheus scrapes metrics from all cluster components and applications using Se ### External Monitoring -The `external-monitor-sync` CronJob (every 10min, `stacks/uptime-kuma/`) ensures Uptime Kuma has `[External] ` monitors for every service in `cloudflare_proxied_names`. These monitors test the full external access path (DNS → Cloudflare → Tunnel → Traefik → Service) from inside the cluster. The status-page-pusher groups them as "External Reachability" and pushes a `external_internal_divergence_count` metric to Pushgateway when services are externally down but internally up. Alert `ExternalAccessDivergence` fires after 15min of divergence. +The `external-monitor-sync` CronJob (every 10min, `stacks/uptime-kuma/`) ensures Uptime Kuma has `[External] ` monitors for externally-reachable ingresses. Discovery is **opt-OUT**: the script lists every ingress via the K8s API and creates a monitor for any host ending in `.viktorbarzin.me`, skipping only those annotated `uptime.viktorbarzin.me/external-monitor: "false"`. Both `ingress_factory` and the `reverse-proxy` factory emit that annotation when the caller sets `external_monitor = false`; leaving it null keeps the opt-in default (important for helm-provisioned ingresses that don't go through our factories). The legacy `cloudflare_proxied_names` ConfigMap is a fallback if the K8s API discovery fails. + +These monitors test the full external access path (DNS → Cloudflare → Tunnel → Traefik → Service) from inside the cluster. The status-page-pusher groups them as "External Reachability" and pushes a `external_internal_divergence_count` metric to Pushgateway when services are externally down but internally up. Alert `ExternalAccessDivergence` fires after 15min of divergence. Data flows from targets through Prometheus storage to Grafana dashboards. Applications emit logs to stdout/stderr which are aggregated by Loki and queryable through Grafana's log viewer.