technitium: mirror mail-auth records into internal zone; fix redfish check [ci skip]
Two fixes from the post-DNS-internalization health sweep:
1. The internal viktorbarzin.me zone served only ingress A/CNAME records.
Since the mailserver pods now resolve the domain through it (CoreDNS
viktorbarzin.me:53 -> Technitium, 59a531b8), rspamd's SPF checks on
inbound @viktorbarzin.me mail saw SPF=none and quarantined it, so the
Brevo email-roundtrip probe failed from the 16:20 run onward
(EmailRoundtripFailing/Stale). The ingress-dns-sync CronJob now also
idempotently maintains the static mail-auth records (SPF, brevo-code
TXT, MX; DMARC + DKIM were already present). Principle: the
internal zone must be a SUPERSET of the public zone for every record
type internal clients consume. Verified in-pod: all four types
resolve; roundtrip re-probe green.
2. cluster_healthcheck #30 queried instant `up`. The deliberately slow
redfish-idrac remnant job scrapes every 10m, longer than the 5m
staleness window, so its series is absent for ~5 of every 10 minutes
-> intermittent false "redfish-idrac=missing". Now uses
last_over_time(up[15m]), which gives the same answers for the fast
(1-2m) jobs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
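You can spot-check item 1's superset principle from any host that can reach both resolvers. The sketch below assumes `dig` is installed and that you supply the public and internal resolver addresses yourself. `missing_records` and `check_zone_superset` are hypothetical helpers and are not part of this commit:

```shell
#!/usr/bin/env bash
# Sketch: find records the public zone serves that the internal zone lacks.
# HYPOTHETICAL helpers, not part of the commit.

# missing_records PUBLIC INTERNAL
# Both arguments are newline-separated record sets (e.g. `dig +short` output).
# Prints each PUBLIC line absent from INTERNAL; empty output means the
# internal set is a superset of the public one.
missing_records() {
  local public="$1" internal="$2"
  # comm needs sorted input; -23 keeps only lines unique to the first set.
  comm -23 <(printf '%s\n' "$public" | sed '/^$/d' | sort -u) \
           <(printf '%s\n' "$internal" | sed '/^$/d' | sort -u)
}

# check_zone_superset NAME PUBLIC_NS INTERNAL_NS TYPE...
# Queries both resolvers (addresses supplied by the caller) for each TYPE
# and returns 1 if any public answer is missing internally.
check_zone_superset() {
  local name="$1" public_ns="$2" internal_ns="$3"
  shift 3
  local rtype missing rc=0
  for rtype in "$@"; do
    missing=$(missing_records \
      "$(dig +short "@$public_ns" "$name" "$rtype")" \
      "$(dig +short "@$internal_ns" "$name" "$rtype")")
    if [[ -n "$missing" ]]; then
      echo "$name $rtype missing from internal zone:" >&2
      sed 's/^/  /' <<<"$missing" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `check_zone_superset viktorbarzin.me <public-ns> <internal-ns> TXT MX` would exit non-zero and list any TXT or MX answers the internal zone lacks. DMARC and DKIM records live under their own names (`_dmarc.` and the DKIM selector's `_domainkey` name), so each needs a separate call. Comparing `dig +short` output line by line is deliberately crude: it ignores TTLs and flags a TXT record that is split into different string chunks as a mismatch.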
parent e7fbf986fb
commit 00bc1e052d
2 changed files with 37 additions and 2 deletions
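The idempotent record maintenance in item 1 comes down to a check-then-add step per record, so re-running the CronJob changes nothing once the zone is correct. This page does not show the Technitium API calls the CronJob actually makes. The minimal sketch of the pattern below therefore uses `list_records` and `add_record` as hypothetical stand-ins backed by an in-memory list. The SPF value in the usage is illustrative only:

```shell
#!/usr/bin/env bash
# Sketch of idempotent record maintenance (check, then add only if absent).
# list_records/add_record are HYPOTHETICAL stand-ins backed by an in-memory
# list so the example runs offline; a real job would call the DNS server API.

ZONE_RECORDS=""   # one "name|type|value" entry per line

# list_records NAME TYPE: print stored values for NAME/TYPE, one per line.
list_records() {
  local name="$1" rtype="$2" n t v
  while IFS='|' read -r n t v; do
    if [[ "$n" == "$name" && "$t" == "$rtype" ]]; then
      printf '%s\n' "$v"
    fi
  done <<<"$ZONE_RECORDS"
}

# add_record NAME TYPE VALUE: append a record to the in-memory zone.
add_record() {
  ZONE_RECORDS+="$1|$2|$3"$'\n'
}

# ensure_record NAME TYPE VALUE: add the record only when an identical
# value is not already present, so repeated runs are no-ops.
ensure_record() {
  local name="$1" rtype="$2" value="$3"
  if list_records "$name" "$rtype" | grep -qxF -- "$value"; then
    echo "ok $rtype $name"
  else
    add_record "$name" "$rtype" "$value"
    echo "added $rtype $name"
  fi
}
```

Calling `ensure_record viktorbarzin.me TXT "v=spf1 -all"` twice prints an `added` line and then an `ok` line, leaving one record. Because the comparison is an exact match, a changed value is added alongside the old one rather than replacing it. A sync that also corrected drift would need a delete or update step.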
@@ -2026,11 +2026,16 @@ check_hardware_exporters() {
     fi
   done
 
-  # Check Prometheus scrape targets for hardware exporters
+  # Check Prometheus scrape targets for hardware exporters.
+  # last_over_time(up[15m]) instead of instant `up`: the redfish-idrac
+  # remnant scrapes every 10m (> the 5m staleness window), so an instant
+  # query returns it EMPTY ~half the time -> intermittent false "missing"
+  # (observed 2026-06-10). 15m covers the slowest job; identical answers
+  # for the 1-2m jobs.
   local prom_jobs=("snmp-idrac" "snmp-ups" "redfish-idrac" "proxmox-host")
   local up_result
   up_result=$($KUBECTL exec -n monitoring deploy/prometheus-server -- \
-    wget -q -O- 'http://localhost:9090/api/v1/query?query=up' 2>/dev/null || true)
+    wget -q -O- 'http://localhost:9090/api/v1/query?query=last_over_time(up%5B15m%5D)' 2>/dev/null || true)
 
   if [[ -n "$up_result" ]]; then
     for job in "${prom_jobs[@]}"; do
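Here is the arithmetic behind the new comment. With perfectly periodic scrapes, an instant query finds a series only while its newest sample is younger than the lookback window. That covers min(window, interval) seconds out of every interval. The sketch below assumes no scrape jitter or duration and uses 300s for the 5m staleness window and 900s for the new 15m range. `visible_pct` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# visible_pct INTERVAL_S WINDOW_S
# Percentage of time at which the newest sample of a series scraped every
# INTERVAL_S seconds is at most WINDOW_S seconds old, i.e. visible to a query
# that looks back WINDOW_S. Assumes perfectly periodic scrapes (no jitter).
visible_pct() {
  local interval="$1" window="$2"
  local covered=$(( window < interval ? window : interval ))
  echo $(( covered * 100 / interval ))
}

# 60s/120s stand in for the 1-2m jobs, 600s for the 10m redfish-idrac job;
# 300s is the 5m instant-query window, 900s the new 15m range.
for interval in 60 120 600; do
  printf 'interval=%ss instant=%s%% last_over_time[15m]=%s%%\n' \
    "$interval" "$(visible_pct "$interval" 300)" "$(visible_pct "$interval" 900)"
done
```

For the 10m job, the instant query covers 300 of every 600 seconds (50%), which matches the failures seen about half the time. The 15m range covers that job fully, and the 1-2m jobs stay at 100% under either query.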
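The hunk stops at the top of the per-job loop, so the check itself isn't visible. The sketch below shows one plausible shape. It assumes a job counts as missing when no series in the response carries its `job` label. `job_status` and the sample response are hypothetical:

```shell
#!/usr/bin/env bash
# job_status RESPONSE JOB
# HYPOTHETICAL helper: prints "JOB=present" if any series in the Prometheus
# query RESPONSE carries job="JOB", else "JOB=missing". Assumes the compact
# JSON encoding, where labels appear as "job":"<name>" with no whitespace.
# The closing quote in the pattern prevents prefix matches (snmp-idrac vs
# snmp-idrac-foo).
job_status() {
  local response="$1" job="$2"
  if grep -qF "\"job\":\"$job\"" <<<"$response"; then
    echo "$job=present"
  else
    echo "$job=missing"
  fi
}

# Illustrative response trimmed to two series (values are made up).
sample='{"status":"success","data":{"resultType":"vector","result":[{"metric":{"job":"snmp-ups"},"value":[1700000000,"1"]},{"metric":{"job":"proxmox-host"},"value":[1700000000,"1"]}]}}'

prom_jobs=("snmp-idrac" "snmp-ups" "redfish-idrac" "proxmox-host")
for job in "${prom_jobs[@]}"; do
  job_status "$sample" "$job"
done
```

This tests only whether the job is present, not whether `up` is 1. Checking the value reliably calls for a real JSON parser such as `jq` rather than `grep`.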