Commit graph

2 commits

Author SHA1 Message Date
Viktor Barzin
bc37b16815 backup: fix vzdump-vms exit code — EXIT-trap && short-circuit falsely failed OK runs
First live run produced a valid 40G dump and logged status=0, but the service
exited 1/FAILURE: cleanup() used `[ -n "$KILLED" ] && push_metrics 2 0`, and a
bash EXIT trap whose LAST command returns non-zero overrides the script's
`exit 0`. With KILLED empty the && short-circuits -> returns 1 -> a successful
backup is marked failed (would trip a vzdump staleness/failure alert). Switch to
daily-backup's `if…fi` idiom (returns 0 when not killed). Bug reproduced + fix
verified locally; redeployed to PVE + reset-failed.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:30:19 +00:00
Viktor Barzin
83f418159a backup: image-level vzdump of hand-managed VMs (devvm) — close no-VM-backup DR gap
The hand-managed Linux VMs (not in Terraform) were never imaged: the
PVC/NFS/pfSense/PVE-config scripts cover cluster data but no VM disk. A lost
devvm disk = unrecoverable home dirs + local-only git repos (monorepo root has
no remote).

vzdump-vms.{sh,service,timer}: daily 01:00 live `vzdump --mode snapshot` of
VZDUMP_VMIDS (default 102=devvm) -> /mnt/backup/vzdump (Copy 2), keep 3; the
monthly offsite-sync full pass mirrors it to Synology (Copy 3). Guest agent
enabled -> fs-consistent. Nice/idle-ionice so it never starves etcd.
Pushgateway job vzdump-backup.

Deployed live to PVE + timer enabled. Docs updated: backup-dr.md (new VM-image
layer + protection matrix), infra CLAUDE.md, AGENTS.md.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:30:19 +00:00