infra/stacks/monitoring
Viktor Barzin 134d6b9a82 vault runbook + raft/HA stuck-leader alerts
Post-2026-04-22 Step 5 deliverables:
- docs/runbooks/vault-raft-leader-deadlock.md — safe pod-restart
  sequence that avoids zombie containerd-shim + kernel NFS
  corruption, qm reset no-op gotcha, boot-order gotcha.
- prometheus_chart_values.tpl — VaultRaftLeaderStuck +
  VaultHAStatusUnavailable. Silent until vault telemetry
  scraping lands (tracked as beads code-vkpn).

Epic for moving vault off NFS tracked as beads code-gy7h.
2026-04-22 12:44:46 +00:00
..
modules/monitoring vault runbook + raft/HA stuck-leader alerts 2026-04-22 12:44:46 +00:00
main.tf [registry] Stop recurring orphan OCI-index incidents — detection + prevention + recovery 2026-04-19 17:08:28 +00:00
secrets extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00
terragrunt.hcl extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip] 2026-03-17 21:34:11 +00:00