The initial recovery at 11:03 was premature; vault-1's audit writes over NFS started hanging ~15 min later and the cluster regressed to 503. Full recovery required rebooting node4 (to free vault-0's stuck NFS mount and shed PVE NFS thread contention) and a second reboot of node3 (to clear another round of kernel NFS client degradation). Final recovery at 11:43:28 UTC with vault-2 as active leader on the quorum vault-0 + vault-2. vault-1 remains stuck in ContainerCreating on node2 — a third node2 reboot is required for full 3/3 quorum, but 2/3 is operationally sufficient, so that's deferred. |
||
|---|---|---|
| .. | ||
| 2026-03-16-kured-containerd-cascade-outage.html | ||
| 2026-03-16-nfs-csi-cascade-failure.md | ||
| 2026-04-14-nfs-fsid0-dns-vault-outage.md | ||
| 2026-04-14-postmortem-pipeline-test.md | ||
| 2026-04-18-authentik-outpost-shm-full.md | ||
| 2026-04-19-registry-orphan-index.md | ||
| 2026-04-22-vault-raft-leader-deadlock.md | ||
| index.html | ||