[monitoring] Scrape mailserver Dovecot exporter + near-limit alerts
Port 9166 (`dovecot-metrics`) is exposed on the mailserver Service, but nothing was scraping it. Added a static `mailserver-dovecot` scrape job to `extraScrapeConfigs` (we run `prometheus-community/prometheus`, not `kube-prometheus-stack`, so no ServiceMonitor CRDs are available).

Two alerts in a new `Mailserver Dovecot` rule group:
- `DovecotConnectionsNearLimit` fires at ≥42/50 IMAP connections for 5m (roughly 85% of `mail_max_userip_connections = 50`).
- `DovecotExporterDown` fires if the scrape target is unreachable for 10m (catches pod restarts and network issues).

Originally drafted as `kubernetes_manifest` ServiceMonitor + PrometheusRule on the `mailserver-beta1` branch; that commit is abandoned because the CRDs aren't installed. This path is functionally equivalent and plans cleanly.

Closes: code-61v
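The 42/50 threshold and the 5m hold time can be checked offline with `promtool test rules`. The following is a minimal sketch, not part of this commit. It assumes the `Mailserver Dovecot` group has been copied out of the Helm values into a standalone rule file; the name `dovecot-rules.yml` is hypothetical, and the `job` label on the input series is only illustrative, since `max()` drops it.

```yaml
# dovecot-near-limit.test.yml (hypothetical file name)
# Run with: promtool test rules dovecot-near-limit.test.yml
rule_files:
  - dovecot-rules.yml   # assumption: standalone copy of the rule group

evaluation_interval: 1m

tests:
  # At exactly 42 connections the alert goes pending at t=0 and fires once
  # the 5m `for` window has elapsed.
  - interval: 1m
    input_series:
      - series: 'dovecot_imap_connected_users{job="mailserver-dovecot"}'
        values: '42+0x10'   # 42 held constant for 10 minutes
    alert_rule_test:
      - eval_time: 6m
        alertname: DovecotConnectionsNearLimit
        exp_alerts:
          - exp_labels:
              severity: warning
            exp_annotations:
              summary: "Dovecot IMAP connections near cap (42 / 50) — review mail_max_userip_connections or investigate noisy client"

  # One connection below the threshold: no alert is expected.
  - interval: 1m
    input_series:
      - series: 'dovecot_imap_connected_users{job="mailserver-dovecot"}'
        values: '41+0x10'
    alert_rule_test:
      - eval_time: 6m
        alertname: DovecotConnectionsNearLimit
```

Leaving `exp_alerts` out of the second case asserts that nothing is firing at that evaluation time.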
parent 6a75ed4809
commit c36b41eabc
1 changed file with 29 additions and 0 deletions
@@ -1977,6 +1977,26 @@ serverFiles:
               severity: warning
             annotations:
               summary: "Authentik outpost restarted {{ $value | printf \"%.0f\" }} times in 30m — check for OOM or crash loop"
+      - name: Mailserver Dovecot
+        # Dovecot exporter on mailserver:9166 exposes connection-count gauges.
+        # The Dovecot IMAP login service is capped by `mail_max_userip_connections`
+        # (50 per user-IP in the deployed config); fire at 85% so we can tune
+        # before real users get ECONNREFUSED.
+        rules:
+          - alert: DovecotConnectionsNearLimit
+            expr: max(dovecot_imap_connected_users) >= 42
+            for: 5m
+            labels:
+              severity: warning
+            annotations:
+              summary: "Dovecot IMAP connections near cap ({{ $value | printf \"%.0f\" }} / 50) — review mail_max_userip_connections or investigate noisy client"
+          - alert: DovecotExporterDown
+            expr: up{job="mailserver-dovecot"} == 0
+            for: 10m
+            labels:
+              severity: warning
+            annotations:
+              summary: "Dovecot exporter unreachable for 10m — check mailserver pod health + port 9166"
       - name: Infrastructure Drift
         # Metrics pushed by .woodpecker/drift-detection.yml after each cron run.
         # See Wave 7 of the state-drift consolidation plan.
@@ -2011,6 +2031,15 @@ serverFiles:
               summary: "{{ $value | printf \"%.0f\" }} stacks drifting — likely a systemic cause (new admission webhook, provider upgrade). Check the most recent drift-detection run in Woodpecker."
 
 extraScrapeConfigs: |
+  - job_name: 'mailserver-dovecot'
+    # Dovecot exporter lives on the mailserver pod; port 9166 is exposed by
+    # the mailserver Service (`dovecot-metrics`). Kube-prometheus-stack (with
+    # ServiceMonitor CRDs) isn't deployed here, so we scrape by service DNS.
+    static_configs:
+      - targets:
+          - "mailserver.mailserver.svc.cluster.local:9166"
+    metrics_path: '/metrics'
+    scrape_interval: 30s
   - job_name: 'proxmox-host'
     static_configs:
       - targets:
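`DovecotExporterDown` matches on `up{job="mailserver-dovecot"}`, so it only works while the `job_name` in `extraScrapeConfigs` stays in sync with the rule. A hedged sketch of a `promtool test rules` case that exercises this coupling is shown here. It is not part of this commit, and it assumes the rule group is available as a standalone file (`dovecot-rules.yml` is a hypothetical name).

```yaml
# dovecot-exporter-down.test.yml (hypothetical file name)
# Run with: promtool test rules dovecot-exporter-down.test.yml
rule_files:
  - dovecot-rules.yml   # assumption: standalone copy of the rule group

evaluation_interval: 1m

tests:
  - interval: 30s       # mirrors scrape_interval in the scrape job
    input_series:
      # `up` is 0 for every scrape across 15 minutes (31 samples).
      - series: 'up{job="mailserver-dovecot", instance="mailserver.mailserver.svc.cluster.local:9166"}'
        values: '0+0x30'
    alert_rule_test:
      # Pending from t=0, firing once the 10m `for` window has elapsed.
      - eval_time: 11m
        alertname: DovecotExporterDown
        exp_alerts:
          - exp_labels:
              severity: warning
              job: mailserver-dovecot
              instance: "mailserver.mailserver.svc.cluster.local:9166"
            exp_annotations:
              summary: "Dovecot exporter unreachable for 10m — check mailserver pod health + port 9166"
```

If the scrape job were renamed without updating the rule, the `up` series would carry a different `job` label and this test would fail, because the alert would never fire.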