vault: migrate vault-0 storage to proxmox-lvm-encrypted

Phase 2 of migrating NFS-hostile workloads off NFS: the data and audit
storageClass in the vault Helm release switch from nfs-proxmox to
proxmox-lvm-encrypted, followed by a per-pod rolling swap with a 24h
soak between pods.

vault-0 has been swapped; vault-1 and vault-2 are still on NFS. The
rolling swap is what makes this safe: the two healthy pods maintain
raft quorum while the third is replaced.
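The quorum arithmetic behind the rolling swap, as a minimal sketch
assuming a 3-voter raft cluster (one voter per pod) and that the pod
being replaced stays in the raft configuration while it is down.
`quorum` and `tolerates_outage` are illustrative helpers, not Vault APIs:

```python
def quorum(voters: int) -> int:
    """Votes a raft cluster needs for a majority."""
    return voters // 2 + 1


def tolerates_outage(voters: int, down: int) -> bool:
    """True if `down` unavailable voters still leave a majority."""
    return voters - down >= quorum(voters)


if __name__ == "__main__":
    # Assumed 3-voter cluster (vault-0/1/2): check 0, 1 and 2 pods down.
    for down in (0, 1, 2):
        print(down, "down ->", "quorum kept" if tolerates_outage(3, down) else "quorum lost")
```

Under these assumptions one pod out at a time is fine, but fault
tolerance drops to zero for the duration of each swap: a second failure
in that window would lose quorum.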

Also restores the chart-default pod securityContext fields. The
previous `statefulSet.securityContext.pod = {fsGroupChangePolicy = "..."}`
block REPLACED (rather than merged with) the chart's defaults, so
fsGroup, runAsGroup, runAsUser and runAsNonRoot were all silently
dropped. The NFS exports were permissive enough to mask the missing
fsGroup, but the root of the new ext4 LV is owned by root:root, so the
vault user (UID 100) couldn't open vault.db and the pod went into
CrashLoopBackOff. Fix: set all five fields explicitly, so they no
longer depend on chart defaults and survive future chart bumps.
vault-1 and vault-2 kept their correct securityContext because their
pod specs were written to etcd before the partial customization
landed; the bug only surfaces when a pod is recreated.
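A toy model of the failure, assuming (as described above) that the
chart falls back to its built-in pod securityContext only when the
user-supplied `pod` map is empty, and simplifying kubelet's fsGroup
handling to "make the volume root group-owned by fsGroup and
group-writable". The 0755 mode on the fresh LV root and the image's
default GID 1000 are illustrative assumptions; `rendered_pod_context`,
`volume_root` and `can_create_file` are hypothetical helpers, not chart
or kubelet APIs:

```python
# Hypothetical model; names and simplifications are illustrative only.
CHART_DEFAULTS = {"fsGroup": 1000, "runAsGroup": 1000, "runAsUser": 100, "runAsNonRoot": True}

IMAGE_UID, IMAGE_GID = 100, 1000  # assumed identity baked into the vault image


def rendered_pod_context(user_pod: dict) -> dict:
    """Any non-empty user map is used verbatim: replace, not merge."""
    return dict(user_pod) if user_pod else dict(CHART_DEFAULTS)


def volume_root(ctx: dict, uid: int = 0, gid: int = 0, mode: int = 0o755):
    """(owner, group, mode) of the volume root after kubelet mounts it.

    Simplified: with fsGroup set, the root becomes group-owned by fsGroup
    and gains group rwx; without it, the fresh root:root dir is untouched.
    """
    fs_group = ctx.get("fsGroup")
    if fs_group is None:
        return uid, gid, mode
    return uid, fs_group, mode | 0o070


def can_create_file(ctx: dict, root) -> bool:
    """Creating vault.db needs write and search (w+x) on the directory."""
    proc_uid = ctx.get("runAsUser", IMAGE_UID)
    proc_gids = {ctx.get("runAsGroup", IMAGE_GID)}
    if "fsGroup" in ctx:
        proc_gids.add(ctx["fsGroup"])  # fsGroup is added as a supplementary group
    dir_uid, dir_gid, mode = root
    if proc_uid == dir_uid:
        bits = (mode >> 6) & 0o7
    elif dir_gid in proc_gids:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return (bits & 0o3) == 0o3


if __name__ == "__main__":
    before = rendered_pod_context({"fsGroupChangePolicy": "OnRootMismatch"})
    after = rendered_pod_context({"fsGroupChangePolicy": "OnRootMismatch", **CHART_DEFAULTS})
    for name, ctx in (("before", before), ("after", after)):
        print(name, sorted(ctx), can_create_file(ctx, volume_root(ctx)))
```

In this model the partial map renders without fsGroup, the LV root
stays root:root, and UID 100 falls through to the "other" bits, which
lack write permission; restoring fsGroup makes the root group-writable
for GID 1000.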

Pre-flight raft snapshot saved at /tmp/vault-pre-migration-*.snap
(recovery anchor).
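If the migration has to be rolled back, the anchor can be located and
handed to the Vault CLI's `vault operator raft snapshot restore`
command. A minimal sketch, assuming the most recently modified file
matching the glob is the one wanted and that VAULT_ADDR/VAULT_TOKEN are
already set for the CLI; `latest_snapshot` and `restore_argv` are
hypothetical helpers:

```python
import glob
import os


def latest_snapshot(pattern: str = "/tmp/vault-pre-migration-*.snap") -> str:
    """Newest file (by mtime) matching the pre-flight snapshot glob."""
    matches = glob.glob(pattern)
    if not matches:
        raise FileNotFoundError(f"no raft snapshot matches {pattern}")
    return max(matches, key=os.path.getmtime)


def restore_argv(snapshot: str) -> list:
    """argv for restoring a raft snapshot with the Vault CLI."""
    return ["vault", "operator", "raft", "snapshot", "restore", snapshot]
```

Usage, e.g. `subprocess.run(restore_argv(latest_snapshot()), check=True)`;
building the argv separately keeps the destructive call easy to review
before it runs.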

Refs: code-gy7h

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-04-25 16:19:49 +00:00
parent 08b13858dd
commit 288efa89b3
3 changed files with 80 additions and 27 deletions

@@ -72,13 +72,13 @@ resource "helm_release" "vault" {
 dataStorage = {
   enabled = true
   size = "2Gi"
-  storageClass = "nfs-proxmox" # Proxmox host NFS (was nfs-truenas)
+  storageClass = "proxmox-lvm-encrypted" # Migrated 2026-04-25 from nfs-proxmox; raft fsync is NFS-hostile (post-mortems/2026-04-22-vault-raft-leader-deadlock.md)
 }
 auditStorage = {
   enabled = true
   size = "2Gi"
-  storageClass = "nfs-proxmox" # Proxmox host NFS (was nfs-truenas)
+  storageClass = "proxmox-lvm-encrypted" # Migrated 2026-04-25 from nfs-proxmox
 }
 standalone = { enabled = false }
@@ -120,10 +120,17 @@ resource "helm_release" "vault" {
 # fsGroupChangePolicy=OnRootMismatch skips recursive chown on restart.
 # Without this, kubelet walks every file over NFS each restart; during
 # 2026-04-22 outage this looped for 10m+ and blocked quorum recovery.
+# The other four fields restore the chart defaults providing pod{}
+# replaces them, and missing fsGroup left vault unable to write to
+# the freshly-formatted ext4 PVC during the 2026-04-25 migration.
 statefulSet = {
   securityContext = {
     pod = {
       fsGroupChangePolicy = "OnRootMismatch"
+      fsGroup = 1000
+      runAsGroup = 1000
+      runAsUser = 100
+      runAsNonRoot = true
     }
   }
 }
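The fsGroupChangePolicy comment in the second hunk is why OnRootMismatch
matters on restart. A rough model of the decision, assuming a simplified
root check (group owner equals fsGroup and the group rwx bits are set);
the real kubelet check is more detailed, and `root_matches` and
`files_to_chown` are hypothetical helpers:

```python
def root_matches(root_gid: int, root_mode: int, fs_group: int) -> bool:
    """Simplified OnRootMismatch test: right group owner and group rwx."""
    return root_gid == fs_group and (root_mode & 0o070) == 0o070


def files_to_chown(policy: str, n_files: int, root_gid: int, root_mode: int,
                   fs_group: int) -> int:
    """How many files kubelet would walk when (re)mounting the volume."""
    if policy == "OnRootMismatch" and root_matches(root_gid, root_mode, fs_group):
        return 0  # root already looks right: skip the recursive walk
    return n_files  # "Always" (the Kubernetes default) walks every file
```

Under this model the first mount of a freshly formatted LV (root:root)
still pays for one walk, which is cheap on an empty volume; later
restarts find the root already matching and skip it.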