- Add PrometheusRule: NFSHighRPCRetransmissions fires when node_nfs_rpc_retransmissions_total rate exceeds 5/s for 5m — catches NFS server degradation before pod failures cascade - Migrate alertmanager PV from NFS (192.168.1.127:/srv/nfs/alertmanager) to proxmox-lvm-encrypted eliminating the circular dependency where alertmanager couldn't alert about NFS failures - Set force_update=true on prometheus helm_release to handle StatefulSet volumeClaimTemplate changes Co-Authored-By: postmortem-todo-resolver <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| modules/monitoring | ||
| main.tf | ||
| secrets | ||
| terragrunt.hcl | ||
| tiers.tf | ||