fix: HA Sofia REST sensors + PVC drift safety
Two real issues found while triaging HomeAssistantCriticalSensorUnavailable
alerts and the prometheus + technitium PVC Terminating-but-in-use
state from the earlier session.
1. idrac-redfish-exporter + snmp-exporter ingresses: auth=required →
auth=none. HA Sofia REST sensors scrape these endpoints
programmatically; with Authentik forward-auth in front, every
request got a 302 to authentik.viktorbarzin.me and the REST
sensors parsed the HTML login page instead of metrics — leaving
the R730, UPS, and ~20 other sensors permanently unavailable.
The allow_local_access_only IP allowlist (192.168.0.0/16 +
10.0.0.0/8) already gates external access, so authentik on top
was breaking machine-to-machine traffic for no security gain.
2. prometheus_server_pvc + technitium primary_config_encrypted:
add lifecycle.ignore_changes = [spec[0].resources[0].requests].
The autoresizer expands these PVCs, and PVCs can't shrink. Without
the ignore, every TF apply tried to revert the live size back
to the TF spec value, hit K8s's shrink-forbidden rule, and
force-replaced the PVC. Because the pod still mounted it, the
PVC sat in Terminating-but-protected limbo: harmless while the
pod kept running, but the next pod restart would have orphaned
the volume. Root cause of the 2026-05-10 PVC Terminating incident.
Bonus: the prometheus_server_pvc threshold was still the inverted
"90%" (the same bug the bulk fecfa211 sweep fixed elsewhere; my regex
only matched "80%", so this one slipped through). It is now "10%".
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
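For reference, the prometheus side of fix 2 probably looks much like the technitium hunk in this commit. This is a minimal sketch only, not the actual file contents: the claim name comes from the comment in the technitium hunk, while the namespace, access mode and storage size are placeholders assumed for illustration.

```hcl
# Hypothetical sketch; namespace, access mode and size are assumptions.
resource "kubernetes_persistent_volume_claim" "prometheus_server_pvc" {
  metadata {
    name      = "prometheus-data-proxmox" # assumed to be this resource's claim name
    namespace = "monitoring"              # assumption
  }

  spec {
    access_modes = ["ReadWriteOnce"] # assumption
    resources {
      requests = {
        # Initial size only. After creation the autoresizer owns the live value.
        storage = "50Gi" # assumption
      }
    }
  }

  lifecycle {
    # Stop TF from planning a shrink (and therefore a replace) once the
    # autoresizer has grown the volume past the value written above.
    ignore_changes = [spec[0].resources[0].requests]
  }
}
```

With the ignore in place, a plan run after an autoresize should show no change for this resource. The trade-off is that later edits to the storage value in TF are ignored too, so a deliberate resize has to be made on the live PVC.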
parent 2db8bdac0d
commit 5c59429182
4 changed files with 36 additions and 5 deletions
@@ -116,6 +116,13 @@ resource "kubernetes_persistent_volume_claim" "primary_config_encrypted" {
       }
     }
   }
+  lifecycle {
+    # Autoresizer expands; PVCs can't shrink. Without this, TF apply
+    # plans destroy+recreate which leaves the PVC in Terminating while
+    # the technitium primary pod still uses it. See incident on
+    # 2026-05-10 (both prometheus-data-proxmox + this PVC).
+    ignore_changes = [spec[0].resources[0].requests]
+  }
 }
 
 resource "kubernetes_deployment" "technitium" {