technitium: mirror mail-auth records into internal zone; fix redfish check [ci skip]

Two fixes from the post-DNS-internalization health sweep:

1. The internal viktorbarzin.me zone served only ingress A/CNAME records.
   Since the mailserver pods now resolve the domain through it (CoreDNS
   viktorbarzin.me:53 -> Technitium, 59a531b8), rspamd's SPF checks on
   inbound @viktorbarzin.me mail saw SPF=none and quarantined it, so the
   Brevo email-roundtrip probe failed from the 16:20 run onward
   (EmailRoundtripFailing/Stale). The ingress-dns-sync CronJob now also
   idempotently maintains the static mail-auth records (SPF, brevo-code
   TXT, MX; DMARC and DKIM were already present). Principle: the
   internal zone must be a SUPERSET of the public zone for every record
   type internal clients consume. Verified in-pod: all four types
   resolve; roundtrip re-probe green.

2. cluster_healthcheck #30 queried instant `up`, which comes back empty
   for ~5 of every 10 minutes for the redfish-idrac remnant job: it is
   deliberately scraped every 10m, longer than the 5m staleness window
   -> intermittent false "redfish-idrac=missing". Now uses
   last_over_time(up[15m]), which gives the same answers for fast jobs.
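
The superset principle from item 1 can be checked mechanically. Below is a minimal bash sketch, not the CronJob's actual code: the helper name `missing_from_internal` and the "TYPE value" line format are assumptions for illustration, and the record values in the example are placeholders. It takes the public and internal record sets (gathered, for example, with `dig +short @<server> <name> <TYPE>` per consumed type and prefixed with the type) and prints every public record the internal zone lacks.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print every record in the first set (public zone)
# that is absent from the second set (internal zone).
# Each argument is a newline-separated list of "TYPE value" lines.
missing_from_internal() {
  local public="$1" internal="$2"
  # Blank lines are dropped; comm needs both inputs sorted under the same
  # collation, so everything is pinned to the C locale.
  # -23 suppresses lines unique to the internal set and lines in both,
  # leaving only records the internal zone is missing.
  LC_ALL=C comm -23 \
    <(printf '%s\n' "$public" | sed '/^$/d' | LC_ALL=C sort -u) \
    <(printf '%s\n' "$internal" | sed '/^$/d' | LC_ALL=C sort -u)
}

# Example (hypothetical record values):
#   missing_from_internal $'MX 10 mx.example.invalid.\nTXT "v=spf1 -all"' \
#                         $'TXT "v=spf1 -all"'
# prints: MX 10 mx.example.invalid.
```

Passing the desired static records as the first argument instead yields the set an idempotent sync would have to add. An empty result means there is nothing to do, so repeated runs are no-ops.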

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor Barzin 2026-06-10 17:46:37 +00:00
parent e7fbf986fb
commit 00bc1e052d
2 changed files with 37 additions and 2 deletions

@@ -2026,11 +2026,16 @@ check_hardware_exporters() {
 fi
 done
-# Check Prometheus scrape targets for hardware exporters
+# Check Prometheus scrape targets for hardware exporters.
+# last_over_time(up[15m]) instead of instant `up`: the redfish-idrac
+# remnant scrapes every 10m (> the 5m staleness window), so an instant
+# query returns it EMPTY ~half the time -> intermittent false "missing"
+# (observed 2026-06-10). 15m covers the slowest job; identical answers
+# for the 1-2m jobs.
 local prom_jobs=("snmp-idrac" "snmp-ups" "redfish-idrac" "proxmox-host")
 local up_result
 up_result=$($KUBECTL exec -n monitoring deploy/prometheus-server -- \
-wget -q -O- 'http://localhost:9090/api/v1/query?query=up' 2>/dev/null || true)
+wget -q -O- 'http://localhost:9090/api/v1/query?query=last_over_time(up%5B15m%5D)' 2>/dev/null || true)
 if [[ -n "$up_result" ]]; then
 for job in "${prom_jobs[@]}"; do
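
The "~half the time" arithmetic in the comment above can be sketched in bash. Assumptions: minute granularity, a target scraped exactly every `interval` minutes, and a sample counted as visible only while its age is strictly below the look-back window (Prometheus's exact boundary handling may shift this by one step). `visible_minutes` is a hypothetical helper, not part of cluster_healthcheck.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for a target scraped every $1 minutes, count how many
# evaluation minutes in one scrape cycle still find a sample inside a
# look-back window of $2 minutes.
# Simplified model (assumption): the newest sample's age runs 0..interval-1
# minutes across the cycle, and it is visible while age < window.
visible_minutes() {
  local interval="$1" window="$2" age visible=0
  for (( age = 0; age < interval; age++ )); do
    if (( age < window )); then
      visible=$(( visible + 1 ))
    fi
  done
  echo "$visible"
}

# Instant `up` (default 5m look-back) vs last_over_time(up[15m]):
echo "redfish-idrac (10m), instant up:          $(visible_minutes 10 5)/10 minutes"
echo "redfish-idrac (10m), last_over_time[15m]: $(visible_minutes 10 15)/10 minutes"
```

With Prometheus's default 5m look-back (`--query.lookback-delta`), `visible_minutes 10 5` prints 5, matching the observed gap, while `visible_minutes 10 15` prints 10. A 1m or 2m job is fully visible under either window, which is why the 15m range changes nothing for the fast jobs.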