# Runbook: Restore PVC from LVM Thin Snapshot Last updated: 2026-04-06 ## When to Use - Rolling back a PVC to a previous state after a bad migration, data corruption, or accidental deletion - Pre-upgrade safety: snapshot before upgrade, restore if upgrade fails - Fast recovery for data changed within the last 7 days ## Prerequisites - SSH access to PVE host (192.168.1.127) - The `lvm-pvc-snapshot` script at `/usr/local/bin/lvm-pvc-snapshot` - kubectl configured on PVE host (`/root/.kube/config`) ## Snapshot Retention - **Daily snapshots**: Created at 03:00 via systemd timer - **Retention**: 7 days (older snapshots automatically pruned) - **Coverage**: All proxmox-lvm PVCs except `dbaas` and `monitoring` namespaces **If you need data older than 7 days**, see "Alternative: Restore from sda Backup" below. ## Procedure ### 1. List Available Snapshots ```bash ssh root@192.168.1.127 lvm-pvc-snapshot list ``` Output shows all snapshots with their original LV, age, and data divergence percentage. ### 2. Identify the PVC LV Name Find the LV name for your PVC: ```bash # From your workstation (with kubectl): kubectl get pv -o custom-columns='PV:.metadata.name,PVC:.spec.claimRef.name,NS:.spec.claimRef.namespace,HANDLE:.spec.csi.volumeHandle' # The HANDLE column shows "local-lvm:" ``` ### 3. Run the Restore ```bash ssh root@192.168.1.127 lvm-pvc-snapshot restore ``` The script will: 1. Look up the K8s PV/PVC/workload for the LV 2. Show a dry-run of all actions 3. Ask for confirmation (type `yes`) 4. Scale down the workload (Deployment or StatefulSet) 5. Rename the current LV to `_pre_restore_` 6. Rename the snapshot LV to the original name 7. Scale the workload back up 8. Wait for pod to become Ready ### 4. Verify ```bash # Check pod is running kubectl get pods -n -l app= # Check the application is working correctly # (service-specific verification) ``` ### 5. Clean Up Once you've verified the restore is correct, remove the pre-restore backup: ```bash ssh root@192.168.1.127 lvremove -f pve/_pre_restore_ ``` ## Manual Restore (if script fails) If the automated restore fails, perform these steps manually: ```bash # 1. Scale down the workload kubectl scale deployment/ -n --replicas=0 # or for StatefulSets: kubectl scale statefulset/ -n --replicas=0 # 2. Wait for pods to terminate kubectl wait --for=delete pod -l app= -n --timeout=120s # 3. SSH to PVE host ssh root@192.168.1.127 # 4. Verify LV is inactive lvs -o lv_name,lv_active pve | grep # 5. Rename LVs lvrename pve _pre_restore_$(date +%Y%m%d_%H%M) lvrename pve # 6. Scale back up kubectl scale deployment/ -n --replicas=1 ``` ## Database-Specific Notes - **MySQL InnoDB**: After restore, InnoDB will replay redo logs automatically on startup. Check `SHOW ENGINE INNODB STATUS` for recovery progress. - **PostgreSQL**: WAL replay happens automatically. Check `pg_is_in_recovery()` and PostgreSQL logs. - **Redis**: Redis loads the RDB file on startup. Check `INFO persistence` for load status. For databases, prefer the app-level backup restore (see `restore-mysql.md`, `restore-postgresql.md`) unless you need a very recent point-in-time that predates the last dump. ## Alternative: Restore from sda Backup If LVM snapshots are too old or missing (data lost >7 days ago), use the weekly file-level backup on sda: **Location**: `/mnt/backup/pvc-data////` on PVE host **Retention**: 4 weekly versions (weeks 0-3) ### Procedure ```bash # 1. List available backup weeks ssh root@192.168.1.127 ls -l /mnt/backup/pvc-data/ # 2. Identify the PVC backup directory ls -l /mnt/backup/pvc-data/2026-14// # 3. Scale down the workload kubectl scale deployment/ -n --replicas=0 # 4. Mount the live PVC LV on PVE host lvchange -ay pve/ mkdir -p /mnt/restore-temp mount /dev/pve/ /mnt/restore-temp # 5. Restore from backup rsync -avP --delete /mnt/backup/pvc-data/2026-14/// /mnt/restore-temp/ # 6. Unmount and scale up umount /mnt/restore-temp lvchange -an pve/ kubectl scale deployment/ -n --replicas=1 ``` See `restore-pvc-from-backup.md` for detailed walkthrough. ## Troubleshooting | Problem | Cause | Fix | |---------|-------|-----| | "Another instance is running" | Concurrent snapshot/restore | Wait for timer to finish: `systemctl status lvm-pvc-snapshot.service` | | LV still active after scale-down | Proxmox CSI hasn't detached | Wait 30s, or `lvchange -an pve/` | | Pod stuck in ContainerCreating | Volume not attached to node | `kubectl describe pod` — check events for attach errors | | No PV found for volume handle | LV name doesn't match any PV | Check `kubectl get pv -o yaml` for the correct volumeHandle format |