infra/stacks/vault
Viktor Barzin 2f1f9107f8 vault: add fsGroupChangePolicy=OnRootMismatch + 2026-04-22 post-mortem
The 2026-04-22 Vault outage caught kubelet in a 2-minute chown loop that
never exited because the default fsGroupChangePolicy (Always) walks every
file on the NFS-backed data PVC. With retrans=3,timeo=30 NFS options and
a 1GB audit log, the recursive chown outlasted the deadline and restarted
forever — blocking raft quorum recovery. OnRootMismatch makes chown a
no-op when the volume root is already correct, which it always is after
initial setup.

The breakglass fix was applied live via kubectl patch at 10:54 UTC; this
commit persists it in Terraform so the next apply doesn't revert.

The post-mortem also documents the upstream raft stuck-leader pattern,
NFS kernel client corruption after force-kill, and the path to migrate
Vault off NFS to proxmox-lvm-encrypted.
2026-04-22 11:12:19 +00:00
..
backend.tf chore: sync terraform state after nfsvers=4 convergence 2026-04-14 11:20:18 +00:00
main.tf vault: add fsGroupChangePolicy=OnRootMismatch + 2026-04-22 post-mortem 2026-04-22 11:12:19 +00:00
providers.tf [ci,vault] Fix Tier-1 apply silently failing in Woodpecker 2026-04-19 14:25:52 +00:00
secrets chore: add untracked stacks, scripts, and agent configs 2026-04-15 09:33:06 +00:00
terragrunt.hcl Add Vault OIDC authentication via Authentik 2026-03-14 13:53:05 +00:00