vault: migrate vault-0 storage to proxmox-lvm-encrypted

Phase 2 of the NFS-hostile migration: the data and audit storageClass
on the vault helm release switch from nfs-proxmox to
proxmox-lvm-encrypted, followed by a per-pod rolling swap with a 24h
soak between pods.

The vault-0 swap is done; vault-1 and vault-2 are still on NFS. The
rolling approach is what makes this safe: raft quorum is maintained by
two healthy pods while the third is replaced.

Also restores the chart-default pod securityContext fields. The previous
`statefulSet.securityContext.pod = {fsGroupChangePolicy = "..."}`
block REPLACED (not merged with) the chart's defaults: fsGroup,
runAsGroup, runAsUser, and runAsNonRoot were all silently dropped. NFS
exports were permissive enough to mask the missing fsGroup, but the
root of the ext4 LV volume is owned by root:root, so the vault user
(UID 100) couldn't open vault.db and the pod went into
CrashLoopBackOff. Fix: provide all five fields explicitly, so the
settings survive future chart bumps. vault-1 and vault-2 retained their
correct securityContext from when their pod specs were written to
etcd, before the partial customization landed; the bug only surfaces
when a pod is recreated.

Pre-flight raft snapshot saved at /tmp/vault-pre-migration-*.snap
(recovery anchor).

Refs: code-gy7h
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

parent 08b13858dd
commit 288efa89b3
3 changed files with 80 additions and 27 deletions

@@ -72,13 +72,13 @@ resource "helm_release" "vault" {
   dataStorage = {
     enabled = true
     size = "2Gi"
-    storageClass = "nfs-proxmox" # Proxmox host NFS (was nfs-truenas)
+    storageClass = "proxmox-lvm-encrypted" # Migrated 2026-04-25 from nfs-proxmox; raft fsync is NFS-hostile (post-mortems/2026-04-22-vault-raft-leader-deadlock.md)
   }
 
   auditStorage = {
     enabled = true
     size = "2Gi"
-    storageClass = "nfs-proxmox" # Proxmox host NFS (was nfs-truenas)
+    storageClass = "proxmox-lvm-encrypted" # Migrated 2026-04-25 from nfs-proxmox
   }
 
   standalone = { enabled = false }
@@ -120,10 +120,17 @@ resource "helm_release" "vault" {
   # fsGroupChangePolicy=OnRootMismatch skips recursive chown on restart.
   # Without this, kubelet walks every file over NFS each restart; during
   # 2026-04-22 outage this looped for 10m+ and blocked quorum recovery.
+  # The other four fields restore the chart defaults — providing pod{}
+  # replaces them, and missing fsGroup left vault unable to write to
+  # the freshly-formatted ext4 PVC during the 2026-04-25 migration.
   statefulSet = {
     securityContext = {
       pod = {
         fsGroupChangePolicy = "OnRootMismatch"
+        fsGroup = 1000
+        runAsGroup = 1000
+        runAsUser = 100
+        runAsNonRoot = true
       }
     }
   }
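
Since a non-empty pod{} block replaces the chart defaults wholesale, one possible way to keep the restored fields visible is to hold them in a local and layer overrides on top with Terraform's built-in `merge()`. This is a minimal sketch, not part of this commit; the local names `vault_pod_security_defaults` and `vault_pod_security_context` are hypothetical, and the values are copied from the pod{} block above.

```hcl
# Hypothetical refactor: names below are illustrative and not in the repo.
locals {
  # The four chart-default fields restored in this commit.
  vault_pod_security_defaults = {
    fsGroup      = 1000
    runAsGroup   = 1000
    runAsUser    = 100
    runAsNonRoot = true
  }

  # merge() is shallow; keys from later arguments win on conflict.
  vault_pod_security_context = merge(
    local.vault_pod_security_defaults,
    { fsGroupChangePolicy = "OnRootMismatch" }
  )
}
```

The `pod` attribute under `statefulSet.securityContext` could then be set to `local.vault_pod_security_context`. This does not track changes to the chart's own defaults; it only makes explicit which fields are defaults and which are local overrides.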