[mailserver] Retire Dovecot exporter + scrape + alerts [ci skip]
## Context
code-vnc confirmed that `viktorbarzin/dovecot_exporter` cannot produce
real metrics against docker-mailserver 15.0.0's Dovecot 2.3.19. The
exporter speaks the pre-2.3 `old_stats` FIFO protocol, which Dovecot
2.3 deprecated in favour of `service stats` + `doveadm-server` with
a different wire format. The scrape only ever returned
`dovecot_up{scope="user"} 0`.

code-1ik listed two paths: (a) switch to a Dovecot 2.3+ exporter, or
(b) retire the exporter + scrape + alerts. Picking (b): carrying a
no-op exporter + scrape + alert group taxes cluster resources,
clutters Prometheus /targets, and keeps an alert group that can never
fire correctly. If a future session needs real Dovecot stats, reach
for a known-good exporter (e.g., jtackaberry/dovecot_exporter) and
rebuild this scaffolding.
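
For reference, a minimal sketch of how the "no-op exporter" symptom could be detected from a scrape body. It assumes the exposition text arrives on stdin and treats any `dovecot_`-prefixed sample other than `dovecot_up` as a real metric. The function name `exporter_is_noop` is hypothetical and not part of this change.

```shell
# Hypothetical helper: exit 0 when the scrape body contains a
# `dovecot_up ... 0` sample and no other dovecot_* samples.
exporter_is_noop() {
  awk '
    /^#/ || NF == 0 { next }                      # skip HELP/TYPE comments and blank lines
    /^dovecot_up[{ ]/ { if ($NF + 0 == 0) down = 1; next }
    /^dovecot_/ { other = 1 }                     # any other dovecot_* sample counts as real data
    END { exit !(down && !other) }
  '
}
```

Piping the exporter's `/metrics` output into the function would then report whether the exporter is only emitting the "down" marker.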
## This change
### mailserver stack
- Removes the `dovecot-exporter` container from
`kubernetes_deployment.mailserver` (was ~28 lines). Pod now
runs a single `docker-mailserver` container.
- Removes `kubernetes_service.mailserver_metrics` (ClusterIP Service
added in code-izl). The `mailserver` LoadBalancer (ports 25, 465,
587, 993) is unaffected.
- Drops the dovecot.cf comment documenting the failed code-vnc
attempt — the documentation survives here + in bd code-vnc /
code-1ik.
### monitoring stack
- Removes `job_name: 'mailserver-dovecot'` from `extraScrapeConfigs`.
- Removes the `Mailserver Dovecot` PrometheusRule group
(`DovecotConnectionsNearLimit`, `DovecotExporterDown`).
- Inline comments in both files point future work at code-1ik's
decision record.

Prometheus configmap-reload picked up the change; the scrape target
set now has zero entries for `mailserver-dovecot`. The pod rolled
cleanly to 1/1 Running.
## What is NOT in this change
- No replacement exporter; this is deliberate. The two removed alerts
could only ever give a false signal, so dropping them returns cluster
alerting to a correct, lower-noise state.
- mailserver MetalLB Service + SMTP/IMAP ports — unchanged.
- `auth_failure_delay`, `mail_max_userip_connections` — stay; those
are unrelated to stats export.
## Test Plan
### Automated
```
$ kubectl get pod -n mailserver -l app=mailserver
NAME                          READY   STATUS    RESTARTS   AGE
mailserver-78589bfd95-swz6h   1/1     Running   0          49s
$ kubectl get svc -n mailserver
NAME            TYPE           PORT(S)
mailserver      LoadBalancer   25/TCP,465/TCP,587/TCP,993/TCP
roundcubemail   ClusterIP      80/TCP
# mailserver-metrics gone
$ kubectl exec -n monitoring <prom-pod> -c prometheus-server -- \
wget -qO- 'http://localhost:9090/api/v1/targets?scrapePool=mailserver-dovecot'
{"status":"success","data":{"activeTargets":[]}}
```
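
The target check above could be scripted roughly as follows. This is a sketch only: it assumes the JSON body of `/api/v1/targets` is piped in on stdin, and the function name `targets_empty` is hypothetical.

```shell
# Hypothetical helper: exit 0 when the targets response read from stdin
# reports an empty activeTargets array, non-zero otherwise.
targets_empty() {
  # Strip spaces and newlines so pretty-printed JSON matches too.
  tr -d ' \n' | grep -q '"activeTargets":\[\]'
}

# Possible usage (pod name is a placeholder, as in the transcript above):
#   kubectl exec -n monitoring PROM_POD -c prometheus-server -- \
#     wget -qO- 'http://localhost:9090/api/v1/targets?scrapePool=mailserver-dovecot' \
#     | targets_empty && echo "mailserver-dovecot scrape pool is empty"
```

A plain `grep` is used here instead of a JSON parser to avoid assuming extra tooling in the Prometheus container.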
### Manual Verification
1. E2E probe `email-roundtrip-monitor` keeps succeeding (20-min cadence)
2. `EmailRoundtripFailing` stays green — proves IMAP is healthy even
without the exporter signal
3. Prometheus `/alerts` page no longer shows DovecotConnectionsNearLimit
or DovecotExporterDown
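
Step 3 could also be checked without the UI. A minimal sketch, assuming the body of Prometheus' `/api/v1/rules` endpoint is piped in on stdin; the function name `dovecot_alerts_absent` is hypothetical.

```shell
# Hypothetical helper: exit 0 only if neither retired alert name
# appears anywhere in the rules JSON read from stdin.
dovecot_alerts_absent() {
  if grep -Eq 'DovecotConnectionsNearLimit|DovecotExporterDown'; then
    return 1   # at least one retired alert is still loaded
  fi
  return 0
}

# Possible usage (pod name is a placeholder):
#   kubectl exec -n monitoring PROM_POD -c prometheus-server -- \
#     wget -qO- 'http://localhost:9090/api/v1/rules' | dovecot_alerts_absent
```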
Closes: code-1ik
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
commit a5df175a67 (parent 137404a6a2)
2 changed files with 16 additions and 97 deletions
@@ -139,17 +139,6 @@ resource "kubernetes_config_map" "mailserver_config" {
 # attempt waits 5s before responding, stretching a 1000-password
 # dictionary attack from <1s to ~85min. Addresses code-9mi.
 auth_failure_delay = 5s
-# NOTE (code-vnc 2026-04-19): `viktorbarzin/dovecot_exporter`
-# expects the legacy old_stats FIFO wire protocol. Dovecot 2.3 still
-# supports the `old_stats` plugin, but docker-mailserver 15.0.0
-# ships `service stats` (new architecture) as the default. Mixing
-# the two — enabling old_stats + declaring `service old-stats
-# unix_listener stats-reader` — makes `doveadm stats dump` fail
-# with "Failed to read VERSION line" and the exporter loops on
-# "Input does not provide any columns". A real fix requires either
-# a newer exporter that speaks Dovecot 2.3 `doveadm-server` /
-# HTTP stats, or retiring the exporter entirely. Tracked as a
-# follow-up task.
 EOF
 fail2ban_conf = <<-EOF
 [DEFAULT]
@@ -467,33 +456,6 @@ resource "kubernetes_deployment" "mailserver" {
 
 }
 
-container {
-  name = "dovecot-exporter"
-  image = "viktorbarzin/dovecot_exporter@sha256:1114224c9bf0261ca8e9949a6b42d3c5a2c923d34ca4593f6b62f034daf14fc5"
-  command = [
-    "/dovecot_exporter/exporter",
-    "--dovecot.socket-path=/var/run/dovecot/stats-reader"
-  ]
-  image_pull_policy = "IfNotPresent"
-  port {
-    name = "dovecotexporter"
-    container_port = 9166
-    protocol = "TCP"
-  }
-  volume_mount {
-    name = "var-run-dovecot"
-    mount_path = "/var/run/dovecot"
-  }
-  resources {
-    requests = {
-      cpu = "10m"
-      memory = "32Mi"
-    }
-    limits = {
-      memory = "32Mi"
-    }
-  }
-}
 
 volume {
   name = "config"
@@ -597,35 +559,13 @@ resource "kubernetes_service" "mailserver" {
   }
 }
 
-# Split the Dovecot metrics port off the public LB and onto its own
-# ClusterIP Service. Port 9166 was only LAN-routable via 10.0.20.202
-# but was over-exposed for a Prometheus-internal metric. Addresses
-# code-izl. Prometheus scrape target follows in
-# stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
-# (updated to `mailserver-metrics.mailserver.svc.cluster.local:9166`).
-resource "kubernetes_service" "mailserver_metrics" {
-  metadata {
-    name = "mailserver-metrics"
-    namespace = kubernetes_namespace.mailserver.metadata[0].name
-    labels = {
-      app = "mailserver"
-    }
-  }
-
-  spec {
-    type = "ClusterIP"
-    selector = {
-      app = "mailserver"
-    }
-
-    port {
-      name = "dovecot-metrics"
-      protocol = "TCP"
-      port = 9166
-      target_port = 9166
-    }
-  }
-}
+# The `mailserver-metrics` ClusterIP Service (formerly split from the
+# main LB in code-izl) was retired in code-1ik when the Dovecot
+# exporter was removed — the exporter spoke the pre-Dovecot-2.3
+# old_stats protocol which docker-mailserver 15.0.0 no longer
+# emits, so the scrape was a no-op. If a working exporter is ever
+# re-introduced, add back: ClusterIP Service exposing port 9166
+# with selector app=mailserver.
 
 # =============================================================================
 # E2E Email Roundtrip Monitor
@@ -1977,26 +1977,10 @@ serverFiles:
         severity: warning
       annotations:
         summary: "Authentik outpost restarted {{ $value | printf \"%.0f\" }} times in 30m — check for OOM or crash loop"
-- name: Mailserver Dovecot
-  # Dovecot exporter on mailserver:9166 exposes connection-count gauges.
-  # The Dovecot IMAP login service is capped by `mail_max_userip_connections`
-  # (50 per user-IP in the deployed config); fire at 85% so we can tune
-  # before real users get ECONNREFUSED.
-  rules:
-    - alert: DovecotConnectionsNearLimit
-      expr: max(dovecot_imap_connected_users) >= 42
-      for: 5m
-      labels:
-        severity: warning
-      annotations:
-        summary: "Dovecot IMAP connections near cap ({{ $value | printf \"%.0f\" }} / 50) — review mail_max_userip_connections or investigate noisy client"
-    - alert: DovecotExporterDown
-      expr: up{job="mailserver-dovecot"} == 0
-      for: 10m
-      labels:
-        severity: warning
-      annotations:
-        summary: "Dovecot exporter unreachable for 10m — check mailserver pod health + port 9166"
+# Mailserver Dovecot alerts were removed with the exporter in
+# code-1ik (viktorbarzin/dovecot_exporter incompatible with
+# Dovecot 2.3 stats architecture). Re-add the rule group if a
+# working exporter is introduced.
 - name: Infrastructure Drift
   # Metrics pushed by .woodpecker/drift-detection.yml after each cron run.
   # See Wave 7 of the state-drift consolidation plan.
@@ -2031,16 +2015,11 @@ serverFiles:
         summary: "{{ $value | printf \"%.0f\" }} stacks drifting — likely a systemic cause (new admission webhook, provider upgrade). Check the most recent drift-detection run in Woodpecker."
 
 extraScrapeConfigs: |
-  - job_name: 'mailserver-dovecot'
-    # Dovecot exporter lives on the mailserver pod; port 9166 is exposed by
-    # the dedicated ClusterIP Service `mailserver-metrics` (split from the
-    # public LB in code-izl). Kube-prometheus-stack (with ServiceMonitor
-    # CRDs) isn't deployed here, so we scrape by service DNS.
-    static_configs:
-      - targets:
-          - "mailserver-metrics.mailserver.svc.cluster.local:9166"
-    metrics_path: '/metrics'
-    scrape_interval: 30s
+  # The `mailserver-dovecot` scrape job was retired in code-1ik together
+  # with the Dovecot exporter. docker-mailserver 15.0.0's Dovecot 2.3
+  # doesn't emit the old_stats protocol the exporter expected, so the
+  # scrape only ever returned `dovecot_up{scope="user"} 0`. Re-add here
+  # if a working exporter is introduced.
   - job_name: 'proxmox-host'
     static_configs:
       - targets: