[docs] Document external-monitor opt-out mechanism in monitoring.md
The doc said monitors were created for everything in cloudflare_proxied_names, but since the k8s-api discovery rewrite the ConfigMap is a fallback only. Describe the opt-OUT semantics and how external_monitor=false on a factory call translates to the sync script's skip annotation.
parent af6574a006
commit 6ee283c2f0
1 changed file with 3 additions and 1 deletion
@@ -75,7 +75,9 @@ Prometheus scrapes metrics from all cluster components and applications using Se
### External Monitoring
-The `external-monitor-sync` CronJob (every 10min, `stacks/uptime-kuma/`) ensures Uptime Kuma has `[External] <service>` monitors for every service in `cloudflare_proxied_names`. These monitors test the full external access path (DNS → Cloudflare → Tunnel → Traefik → Service) from inside the cluster. The status-page-pusher groups them as "External Reachability" and pushes a `external_internal_divergence_count` metric to Pushgateway when services are externally down but internally up. Alert `ExternalAccessDivergence` fires after 15min of divergence.
+The `external-monitor-sync` CronJob (every 10 min, `stacks/uptime-kuma/`) ensures Uptime Kuma has `[External] <service>` monitors for externally reachable ingresses. Discovery is **opt-OUT**: the script lists every ingress via the K8s API and creates a monitor for any host ending in `.viktorbarzin.me`, skipping only those annotated `uptime.viktorbarzin.me/external-monitor: "false"`. Both `ingress_factory` and the `reverse-proxy` factory emit that annotation when the caller sets `external_monitor = false`; leaving it null keeps the default, in which the ingress is monitored (important for helm-provisioned ingresses that don't go through our factories). The legacy `cloudflare_proxied_names` ConfigMap is used only as a fallback if K8s API discovery fails.
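The opt-out discovery described here can be sketched in a few lines of Python. This is a minimal illustration, not the actual sync script: it assumes ingresses arrive as plain dicts shaped like the K8s API JSON, and the function names (`hosts_to_monitor`, `discover`), the callable used to list ingresses, and the sample host names are all hypothetical. Only the annotation key and the domain suffix come from the text.

```python
from typing import Any, Callable, Dict, Iterable, List

# Annotation key and domain suffix as described in the doc text.
SKIP_ANNOTATION = "uptime.viktorbarzin.me/external-monitor"
DOMAIN_SUFFIX = ".viktorbarzin.me"


def hosts_to_monitor(ingresses: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the sorted, de-duplicated hosts that should get a monitor."""
    hosts = set()
    for ing in ingresses:
        annotations = ing.get("metadata", {}).get("annotations") or {}
        # Opt-out: only an explicit "false" skips the ingress.
        # A missing annotation means the ingress is monitored.
        if annotations.get(SKIP_ANNOTATION) == "false":
            continue
        for rule in ing.get("spec", {}).get("rules") or []:
            host = rule.get("host") or ""
            if host.endswith(DOMAIN_SUFFIX):
                hosts.add(host)
    return sorted(hosts)


def discover(
    list_ingresses: Callable[[], Iterable[Dict[str, Any]]],
    fallback_names: Iterable[str],
) -> List[str]:
    """Use API discovery; fall back to the legacy name list if it fails."""
    try:
        return hosts_to_monitor(list_ingresses())
    except Exception:
        # Hypothetical stand-in for reading cloudflare_proxied_names.
        return sorted(fallback_names)


# Hypothetical sample data: one default ingress, one opted out, one off-domain.
sample = [
    {"metadata": {"name": "app"},
     "spec": {"rules": [{"host": "app.viktorbarzin.me"}]}},
    {"metadata": {"name": "private",
                  "annotations": {SKIP_ANNOTATION: "false"}},
     "spec": {"rules": [{"host": "private.viktorbarzin.me"}]}},
    {"metadata": {"name": "other"},
     "spec": {"rules": [{"host": "example.com"}]}},
]

print(hosts_to_monitor(sample))
```

With this sample only `app.viktorbarzin.me` is returned: the annotated ingress is skipped and the off-domain host never matches the suffix. An ingress with no annotations at all, such as one created by a helm chart, is still picked up, which is the point of the opt-out default.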
+These monitors test the full external access path (DNS → Cloudflare → Tunnel → Traefik → Service) from inside the cluster. The status-page-pusher groups them as "External Reachability" and pushes an `external_internal_divergence_count` metric to Pushgateway when services are externally down but internally up. Alert `ExternalAccessDivergence` fires after 15 min of divergence.
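To make the divergence condition concrete, here is a minimal Python sketch of how such a count could be derived. It is an assumption about the logic, not the status-page-pusher's code: the function name `divergence_count`, the two status maps keyed by service name, and the sample services are hypothetical. Only the rule "externally down but internally up" comes from the text.

```python
from typing import Dict


def divergence_count(external_up: Dict[str, bool],
                     internal_up: Dict[str, bool]) -> int:
    """Count services that are down externally while up internally."""
    return sum(
        1
        for service, ext_ok in external_up.items()
        # A service with no internal monitor is not counted as divergent.
        if not ext_ok and internal_up.get(service, False)
    )


# Hypothetical statuses: "a" diverges, "b" is healthy, "c" is down everywhere.
external = {"a": False, "b": True, "c": False}
internal = {"a": True, "b": True, "c": False}

print(divergence_count(external, internal))
```

Here only `a` counts. A service that is down on both paths is an ordinary outage rather than a divergence, so it does not contribute to the metric in this sketch.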
Data flows from targets through Prometheus storage to Grafana dashboards. Applications emit logs to stdout/stderr which are aggregated by Loki and queryable through Grafana's log viewer.