Runbook: Restore PVC from LVM Thin Snapshot
Last updated: 2026-04-06
When to Use
- Rolling back a PVC to a previous state after a bad migration, data corruption, or accidental deletion
- Pre-upgrade safety: snapshot before upgrade, restore if upgrade fails
- Fast recovery for data changed within the last 7 days
Prerequisites
- SSH access to PVE host (192.168.1.127)
- The lvm-pvc-snapshot script at /usr/local/bin/lvm-pvc-snapshot
- kubectl configured on PVE host (/root/.kube/config)
Snapshot Retention
- Daily snapshots: Created at 03:00 via systemd timer
- Retention: 7 days (older snapshots automatically pruned)
- Coverage: All proxmox-lvm PVCs except those in the dbaas and monitoring namespaces
If you need data older than 7 days, see "Alternative: Restore from sda Backup" below.
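As a rough illustration of the retention rule above, the choice between the two restore paths can be written as a small helper. The function name restore_source and its output labels are hypothetical; the 7-day threshold comes from the retention stated above, and a snapshot that is exactly 7 days old may already have been pruned, so treat the boundary as approximate.

```shell
# Hypothetical helper: choose a restore source from the age (in whole days)
# of the state you want to recover. Assumes the 7-day snapshot retention
# described in this runbook.
restore_source() {
    age_days=$1
    if [ "$age_days" -le 7 ]; then
        echo "lvm-snapshot"   # within snapshot retention: use the procedure below
    else
        echo "sda-backup"     # older than retention: use the weekly file-level backup
    fi
}
```

For example, restore_source 3 prints lvm-snapshot, while restore_source 10 prints sda-backup.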
Procedure
1. List Available Snapshots
ssh root@192.168.1.127 lvm-pvc-snapshot list
Output shows all snapshots with their original LV, age, and data divergence percentage.
2. Identify the PVC LV Name
Find the LV name for your PVC:
# From your workstation (with kubectl):
kubectl get pv -o custom-columns='PV:.metadata.name,PVC:.spec.claimRef.name,NS:.spec.claimRef.namespace,HANDLE:.spec.csi.volumeHandle'
# The HANDLE column shows "local-lvm:<lv-name>"
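If you want to script this lookup, the LV name can be cut out of the handle with plain shell parameter expansion. This is a minimal sketch that assumes the handle really has the "local-lvm:<lv-name>" shape described above; the function name lv_from_handle is hypothetical and is not part of the lvm-pvc-snapshot script.

```shell
# Hypothetical helper: strip everything up to and including the last colon
# from a volume handle, leaving only the LV name.
# A handle without a colon is returned unchanged.
lv_from_handle() {
    printf '%s\n' "${1##*:}"
}
```

Pass it the HANDLE value from the kubectl output above, for example lv_from_handle "local-lvm:<lv-name>".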
3. Run the Restore
ssh root@192.168.1.127
lvm-pvc-snapshot restore <pvc-lv-name> <snapshot-lv-name>
The script will:
- Look up the K8s PV/PVC/workload for the LV
- Show a dry-run of all actions
- Ask for confirmation (type yes)
- Scale down the workload (Deployment or StatefulSet)
- Rename the current LV to <name>_pre_restore_<timestamp>
- Rename the snapshot LV to the original name
- Scale the workload back up
- Wait for the pod to become Ready
4. Verify
# Check pod is running
kubectl get pods -n <namespace> -l app=<workload>
# Check the application is working correctly
# (service-specific verification)
5. Clean Up
Once you've verified the restore is correct, remove the pre-restore backup:
ssh root@192.168.1.127 lvremove -f pve/<original-lv>_pre_restore_<timestamp>
Manual Restore (if script fails)
If the automated restore fails, perform these steps manually:
# 1. Scale down the workload
kubectl scale deployment/<name> -n <ns> --replicas=0
# or for StatefulSets:
kubectl scale statefulset/<name> -n <ns> --replicas=0
# 2. Wait for pods to terminate
kubectl wait --for=delete pod -l app=<name> -n <ns> --timeout=120s
# 3. SSH to PVE host
ssh root@192.168.1.127
# 4. Verify LV is inactive
lvs -o lv_name,lv_active pve | grep <lv-name>
# 5. Rename LVs
lvrename pve <original-lv> <original-lv>_pre_restore_$(date +%Y%m%d_%H%M)
lvrename pve <snapshot-lv> <original-lv>
# 6. Scale back up (use statefulset/<name> for a StatefulSet; restore the original replica count if it was not 1)
kubectl scale deployment/<name> -n <ns> --replicas=1
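Because the two renames in step 5 are easy to get backwards, it can help to print them for review before running anything. The sketch below only prints the commands and never executes them. The function name plan_lv_swap is hypothetical; the naming scheme mirrors step 5 above.

```shell
# Hypothetical helper: print (do not run) the two lvrename commands for a
# manual restore, so the plan can be reviewed first.
#   $1 = volume group, $2 = original LV, $3 = snapshot LV, $4 = timestamp
plan_lv_swap() {
    vg=$1
    original=$2
    snapshot=$3
    stamp=$4
    # First move the current LV out of the way, keeping it as a fallback.
    printf 'lvrename %s %s %s_pre_restore_%s\n' "$vg" "$original" "$original" "$stamp"
    # Then give the snapshot the original LV name.
    printf 'lvrename %s %s %s\n' "$vg" "$snapshot" "$original"
}
```

A call such as plan_lv_swap pve <original-lv> <snapshot-lv> "$(date +%Y%m%d_%H%M)" prints both commands with the same timestamp format used in step 5.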
Database-Specific Notes
- MySQL InnoDB: After restore, InnoDB will replay redo logs automatically on startup. Check SHOW ENGINE INNODB STATUS for recovery progress.
- PostgreSQL: WAL replay happens automatically. Check pg_is_in_recovery() and PostgreSQL logs.
- Redis: Redis loads the RDB file on startup. Check INFO persistence for load status.
For databases, prefer the app-level backup restore (see restore-mysql.md, restore-postgresql.md) unless you need a very recent point-in-time that predates the last dump.
Alternative: Restore from sda Backup
If no LVM snapshot reaches back far enough (the data was lost more than 7 days ago) or the snapshots are missing, use the weekly file-level backup on sda:
Location: /mnt/backup/pvc-data/<YYYY-WW>/<namespace>/<pvc-name>/ on PVE host
Retention: 4 weekly versions (weeks 0-3)
Procedure
# 1. List available backup weeks
ssh root@192.168.1.127
ls -l /mnt/backup/pvc-data/
# 2. Identify the PVC backup directory
ls -l /mnt/backup/pvc-data/2026-14/<namespace>/
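# 3. Scale down the workload and wait for its pods to terminate
kubectl scale deployment/<name> -n <ns> --replicas=0
kubectl wait --for=delete pod -l app=<name> -n <ns> --timeout=120s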
# 4. Mount the live PVC LV on PVE host
lvchange -ay pve/<pvc-lv-name>
mkdir -p /mnt/restore-temp
mount /dev/pve/<pvc-lv-name> /mnt/restore-temp
# 5. Restore from backup
rsync -avP --delete /mnt/backup/pvc-data/2026-14/<namespace>/<pvc-name>/ /mnt/restore-temp/
# 6. Unmount and scale up
umount /mnt/restore-temp
lvchange -an pve/<pvc-lv-name>
kubectl scale deployment/<name> -n <ns> --replicas=1
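Steps 1 and 2 above pick the backup week by eye. If you prefer to script it, the newest week can be selected by name. This is a minimal sketch, assuming the week directories are zero-padded <YYYY-WW> names so that lexical order matches chronological order; the function name latest_backup_week is hypothetical.

```shell
# Hypothetical helper: print the newest <YYYY-WW> directory name under a
# backup root such as /mnt/backup/pvc-data. Returns non-zero if none exist.
latest_backup_week() {
    root=$1
    latest=
    for dir in "$root"/[0-9][0-9][0-9][0-9]-[0-9][0-9]; do
        [ -d "$dir" ] || continue   # skips the literal pattern when nothing matches
        latest=${dir##*/}           # glob results are sorted, so the last match is newest
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

On the PVE host this would be called as latest_backup_week /mnt/backup/pvc-data. The newest week is not always the right one: if the data loss happened before the latest weekly run, pick an earlier directory by hand.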
See restore-pvc-from-backup.md for a detailed walkthrough.
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| "Another instance is running" | Concurrent snapshot/restore | Wait for timer to finish: systemctl status lvm-pvc-snapshot.service |
| LV still active after scale-down | Proxmox CSI hasn't detached | Wait 30s, or lvchange -an pve/<lv> |
| Pod stuck in ContainerCreating | Volume not attached to node | Run kubectl describe pod and check events for attach errors |
| No PV found for volume handle | LV name doesn't match any PV | Check kubectl get pv -o yaml for the correct volumeHandle format |