infra/stacks/mailserver
Viktor Barzin a5df175a67 [mailserver] Retire Dovecot exporter + scrape + alerts [ci skip]
## Context

code-vnc confirmed `viktorbarzin/dovecot_exporter` cannot produce real
metrics against docker-mailserver 15.0.0's Dovecot 2.3.19 — the
exporter speaks the pre-2.3 `old_stats` FIFO protocol, which Dovecot
2.3 deprecated in favour of `service stats` + `doveadm-server` with
a different wire format. The scrape only ever returned
`dovecot_up{scope="user"} 0`.
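A minimal sketch of how the "no real metrics" condition could be checked mechanically. It reads Prometheus text exposition output on stdin and succeeds only if some `dovecot_*` sample other than `dovecot_up` is present. The function name `has_real_dovecot_metrics` is hypothetical and is not part of this repo; the exporter address in the usage comment is a placeholder.

```shell
# Hypothetical helper, not part of this repo.
# Reads Prometheus text exposition format on stdin.
# Exit 0: at least one dovecot_* sample besides dovecot_up exists.
# Exit 1: only dovecot_up (or nothing) was exported.
has_real_dovecot_metrics() {
  grep -E '^dovecot_' \
    | grep -Ev '^dovecot_up(\{| )' \
    | grep -q .
}

# Usage (address is a placeholder, fill in for your cluster):
#   wget -qO- 'http://EXPORTER_HOST:PORT/metrics' | has_real_dovecot_metrics \
#     || echo "exporter only reports dovecot_up"
```

Comment lines (`# HELP`, `# TYPE`) are skipped because they never start with `dovecot_`.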

code-1ik listed two paths: (a) switch to a Dovecot 2.3+ exporter, or
(b) retire the exporter + scrape + alerts. Picking (b) — carrying a
no-op exporter + scrape + alert group taxes cluster resources,
clutters Prometheus /targets, and tees up an alert that can never
fire correctly. If a future session needs real Dovecot stats, reach
for a known-good exporter (e.g., jtackaberry/dovecot_exporter) and
rebuild this scaffolding.

## This change

### mailserver stack
- Removes the `dovecot-exporter` container from
  `kubernetes_deployment.mailserver` (was ~28 lines). Pod now
  runs a single `docker-mailserver` container.
- Removes `kubernetes_service.mailserver_metrics` (ClusterIP Service
  added in code-izl). The `mailserver` LoadBalancer (ports 25, 465,
  587, 993) is unaffected.
- Drops the dovecot.cf comment documenting the failed code-vnc
  attempt — the documentation survives here + in bd code-vnc /
  code-1ik.

### monitoring stack
- Removes `job_name: 'mailserver-dovecot'` from `extraScrapeConfigs`.
- Removes the `Mailserver Dovecot` PrometheusRule group
  (`DovecotConnectionsNearLimit`, `DovecotExporterDown`).
- Inline comments in both files point future work at code-1ik's
  decision record.

Prometheus configmap-reload picked up the change; scrape target set
now has zero entries for `mailserver-dovecot`. Pod rolled cleanly to
1/1 Running.

## What is NOT in this change

- No replacement exporter. This is deliberate: the removed alert
  group could only produce false signals, so dropping it returns
  cluster alerting to a correct, lower-noise state.
- mailserver MetalLB Service + SMTP/IMAP ports — unchanged.
- `auth_failure_delay`, `mail_max_userip_connections` — stay; those
  are unrelated to stats export.

## Test Plan

### Automated
```
$ kubectl get pod -n mailserver -l app=mailserver
NAME                          READY  STATUS   RESTARTS  AGE
mailserver-78589bfd95-swz6h   1/1    Running  0         49s

$ kubectl get svc -n mailserver
NAME            TYPE          PORT(S)
mailserver      LoadBalancer  25/TCP,465/TCP,587/TCP,993/TCP
roundcubemail   ClusterIP     80/TCP
# mailserver-metrics gone

$ kubectl exec -n monitoring <prom-pod> -c prometheus-server -- \
    wget -qO- 'http://localhost:9090/api/v1/targets?scrapePool=mailserver-dovecot'
{"status":"success","data":{"activeTargets":[]}}
```
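The last check above can be turned into an exit code rather than an eyeball comparison. The sketch below assumes the targets API returns compact JSON, as in the output shown; `scrape_pool_is_empty` is a hypothetical helper name, not something in this repo.

```shell
# Hypothetical helper, not part of this repo.
# Reads a /api/v1/targets response on stdin.
# Exit 0 when the activeTargets array is empty, non-zero otherwise.
# Assumes compact JSON (no whitespace between ':' and '[').
scrape_pool_is_empty() {
  grep -q '"activeTargets":\[\]'
}

# Usage: pipe the wget output from the command above into the helper, e.g.
#   ... wget -qO- 'http://localhost:9090/api/v1/targets?scrapePool=mailserver-dovecot' \
#     | scrape_pool_is_empty && echo "mailserver-dovecot: no active targets"
```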

### Manual Verification
1. E2E probe `email-roundtrip-monitor` keeps succeeding (20-min cadence)
2. `EmailRoundtripFailing` stays green — proves IMAP is healthy even
   without the exporter signal
3. Prometheus `/alerts` page no longer shows DovecotConnectionsNearLimit
   or DovecotExporterDown
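Step 3 could also be checked from the command line instead of the `/alerts` page. This is a sketch under two assumptions: the rule list is fetched from the Prometheus `/api/v1/rules` endpoint, and the response is compact JSON. `dovecot_rules_absent` is a hypothetical helper name.

```shell
# Hypothetical helper, not part of this repo.
# Reads a /api/v1/rules response on stdin.
# Exit 0 when neither retired alert is still loaded, non-zero otherwise.
# Assumes compact JSON ("name":"..." with no surrounding whitespace).
dovecot_rules_absent() {
  ! grep -Eq '"name":"(DovecotConnectionsNearLimit|DovecotExporterDown)"'
}

# Usage: run the same kubectl exec + wget as in the Automated section,
# pointed at 'http://localhost:9090/api/v1/rules', and pipe it in:
#   ... | dovecot_rules_absent && echo "Dovecot alert rules are gone"
```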

Closes: code-1ik

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:01:07 +00:00
modules/mailserver [mailserver] Retire Dovecot exporter + scrape + alerts [ci skip] 2026-04-19 11:01:07 +00:00
main.tf [mailserver] Move probe secrets to ExternalSecret via ESO [ci skip] 2026-04-18 23:39:06 +00:00
secrets extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
terragrunt.hcl extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00