From d9ea7812f51ab754183a9eada7caf026b95fdafb Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Wed, 10 Jun 2026 09:04:57 +0000
Subject: [PATCH] nfs-mirror: exclude /vzdump/ — it was reaping the new VM-image backups nightly
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

nfs-mirror does `rsync -rlt --delete /srv/nfs/ -> /mnt/backup/`; any
/mnt/backup dir with no /srv/nfs counterpart is an orphan and gets
--delete'd. vzdump-vms (added yesterday) writes /mnt/backup/vzdump/,
which wasn't excluded — so the 02:00 nfs-mirror run silently deleted
both successful 40G devvm images (verified: dir gone, 40G freed,
despite status=0 success logs).

Add --exclude='/vzdump/' alongside the existing
pvc-data/pfsense/pve-config/sqlite-backup excludes that exist for
exactly this reason. TDD-proven with an isolated rsync --delete -n -v.
backup-dr.md notes the dependency.

[ci skip]

Co-Authored-By: Claude Opus 4.8
---
 docs/architecture/backup-dr.md | 1 +
 scripts/nfs-mirror.sh          | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/architecture/backup-dr.md b/docs/architecture/backup-dr.md
index 93c292dc..df30dea2 100644
--- a/docs/architecture/backup-dr.md
+++ b/docs/architecture/backup-dr.md
@@ -365,6 +365,7 @@ The hand-managed Linux VMs are **intentionally not in Terraform** (telmate/bpg p
 **Mode**: `vzdump --mode snapshot` — live, no downtime. devvm has the qemu guest agent enabled (`agent: 1`), so the snapshot is **filesystem-consistent** (fs-freeze) rather than merely crash-consistent. Runs `Nice=10` + `IOSchedulingClass=idle` + `--ionice 7` so it never starves etcd on the contended sdc IO domain.
 **Scope**: VMIDs in `VZDUMP_VMIDS` (default `102` = devvm). Add VMIDs there to image other hand-managed VMs.
 **Retention**: `KEEP=3` newest dumps per VMID on sda (`/mnt/backup/vzdump/`); each devvm image is ~35-50 GB zstd.
+**Critical dependency**: `nfs-mirror` MUST keep `--exclude='/vzdump/'`. Its nightly `rsync -rlt --delete /srv/nfs/ → /mnt/backup/` treats any `/mnt/backup` dir with no `/srv/nfs` counterpart as an orphan and deletes it — this silently reaped the first two vzdump images at 02:00 on 2026-06-10 before the exclude was added (same reason `pvc-data`/`pfsense`/`pve-config`/`sqlite-backup` are excluded).
 **Offsite**: deliberately **NOT** appended to the incremental offsite manifest — it never deletes, so daily multi-GB images would accumulate unbounded on Synology. Instead the **monthly offsite-sync full pass (days 1-7)** mirrors all of `/mnt/backup` (including `vzdump/`) to Synology with `--delete`, bounded to local retention. So Copy 2 (sda) refreshes **daily**; Copy 3 (Synology) refreshes **monthly**.
 **Monitoring**: pushes `vzdump_last_run_timestamp` / `vzdump_last_status` / `vzdump_last_success_timestamp` to Pushgateway job `vzdump-backup`. A `VzdumpBackupStale` / `VzdumpBackupFailing` alert in `stacks/monitoring` (mirroring the LVM/pfSense backup alerts) is the recommended next addition.
 **Restore**: on the PVE host, `qmrestore /mnt/backup/vzdump/vzdump-qemu--.vma.zst ` — restore to a spare VMID first if the original still exists, then swap disks; or use the PVE UI (add `/mnt/backup` as a dir storage with content=backup → Restore).
diff --git a/scripts/nfs-mirror.sh b/scripts/nfs-mirror.sh
index 2e322ede..882a8a9c 100644
--- a/scripts/nfs-mirror.sh
+++ b/scripts/nfs-mirror.sh
@@ -54,11 +54,14 @@ PUSHGATEWAY="${NFS_MIRROR_PUSHGATEWAY:-http://10.0.20.100:30091}"
 PUSHGATEWAY_JOB=nfs-mirror
 
 EXCLUDES=(
-  # ---- /mnt/backup subtrees owned by daily-backup — leave alone ----
+  # ---- /mnt/backup subtrees owned by OTHER backup jobs — leave alone ----
+  # Without these, the top-level `rsync --delete /srv/nfs/ → /mnt/backup/` below
+  # reaps any /mnt/backup dir that has no /srv/nfs counterpart.
   --exclude='/pvc-data/'
   --exclude='/sqlite-backup/'
   --exclude='/pfsense/'
   --exclude='/pve-config/'
+  --exclude='/vzdump/'  # VM images from vzdump-vms — NOT a /srv/nfs svc (else --delete reaps them nightly)
   --exclude='/lost+found/'
 
   # ---- state files used by other backup jobs ----
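The commit message describes how the top-level `rsync --delete` reaps any `/mnt/backup` directory that has no `/srv/nfs` counterpart. That failure mode can be approximated outside rsync. The following is a minimal bash sketch that uses a hypothetical helper, `orphans_to_delete`, which is not part of `nfs-mirror.sh`. The helper lists top-level directories that exist under the destination but not under the source and are not protected by an anchored `--exclude='/<name>/'`. It is only an approximation of rsync's filter semantics: it ignores nested paths, hidden entries, and wildcard patterns. The `some-svc` directory in the demo is a made-up stand-in for a real `/srv/nfs` service.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT part of nfs-mirror.sh): approximate which top-level
# directories under DST a `rsync -r --delete SRC/ DST/` would remove.
# Usage: orphans_to_delete SRC DST [protected-name...]
# Each protected name stands for an anchored --exclude='/<name>/'.
orphans_to_delete() {
  local src=$1 dst=$2
  shift 2                         # remaining args: protected top-level names
  local entry name ex protected
  for entry in "$dst"/*/; do      # trailing slash: directories only; dotfiles skipped
    [[ -d $entry ]] || continue   # glob matched nothing
    name=${entry%/}
    name=${name##*/}              # "dst/vzdump/" -> "vzdump"
    if [[ -e $src/$name ]]; then
      continue                    # has a source counterpart: mirrored, not reaped
    fi
    protected=0
    for ex in "$@"; do
      if [[ $name == "$ex" ]]; then protected=1; break; fi
    done
    if (( protected == 0 )); then
      printf '%s\n' "$name"       # rsync --delete would remove this subtree
    fi
  done
}

# Demo on a throwaway tree standing in for /srv/nfs (src) and /mnt/backup (dst).
demo=$(mktemp -d)
mkdir -p "$demo/src/some-svc" "$demo/dst/some-svc" "$demo/dst/pvc-data" "$demo/dst/vzdump"
echo "reaped without the vzdump exclude:"
orphans_to_delete "$demo/src" "$demo/dst" pvc-data          # prints: vzdump
echo "reaped with it:"
orphans_to_delete "$demo/src" "$demo/dst" pvc-data vzdump   # prints nothing
rm -rf "$demo"
```

To confirm the result against rsync itself, you can reuse the commit's own approach. Run `rsync -rlt --delete -n -v` over the same throwaway tree, once with `--exclude='/vzdump/'` and once without it. The dry run should report `vzdump/` among its `deleting` lines only when the exclude is missing.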