backup: image-level vzdump of hand-managed VMs (devvm) — close no-VM-backup DR gap

The hand-managed Linux VMs (not in Terraform) were never imaged: the
PVC/NFS/pfSense/PVE-config scripts cover cluster data but no VM disk. A lost
devvm disk = unrecoverable home dirs + local-only git repos (monorepo root has
no remote).

vzdump-vms.{sh,service,timer}: daily 01:00 live `vzdump --mode snapshot` of
VZDUMP_VMIDS (default 102=devvm) -> /mnt/backup/vzdump (Copy 2), keep 3; the
monthly offsite-sync full pass mirrors it to Synology (Copy 3). Guest agent
enabled -> fs-consistent. Nice/idle-ionice so it never starves etcd.
Pushgateway job vzdump-backup.

Deployed live to PVE + timer enabled. Docs updated: backup-dr.md (new VM-image
layer + protection matrix), infra CLAUDE.md, AGENTS.md.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-09 21:22:34 +00:00
parent 7fc4caefe3
commit 83f418159a
6 changed files with 177 additions and 2 deletions


@@ -0,0 +1,16 @@
[Unit]
Description=vzdump image backup of hand-managed VMs (devvm, …) to /mnt/backup
Documentation=https://forgejo.viktorbarzin.me/viktor/infra/src/branch/main/docs/architecture/backup-dr.md
After=network-online.target
Wants=network-online.target
RequiresMountsFor=/mnt/backup
[Service]
Type=oneshot
ExecStart=/usr/local/bin/vzdump-vms
# Be gentle on the contended PVE IO domain (sdc) — backup must never starve etcd.
Nice=10
IOSchedulingClass=idle
# Reading a ~77 GB disk + zstd can run long under IO contention; well above
# normal (~15-30 min) but bounded so a hung run can't wedge the timer forever.
TimeoutStartSec=4h