backup: image-level vzdump of hand-managed VMs (devvm) — close no-VM-backup DR gap

The hand-managed Linux VMs (not in Terraform) were never imaged: the
PVC/NFS/pfSense/PVE-config scripts cover cluster data but no VM disk. A lost
devvm disk = unrecoverable home dirs + local-only git repos (monorepo root has
no remote).

vzdump-vms.{sh,service,timer}: daily 01:00 live `vzdump --mode snapshot` of
VZDUMP_VMIDS (default 102=devvm) -> /mnt/backup/vzdump (Copy 2), keep 3; the
monthly offsite-sync full pass mirrors it to Synology (Copy 3). Guest agent
enabled -> fs-consistent. Nice/idle-ionice so it never starves etcd.
Pushgateway job vzdump-backup.

Deployed live to PVE + timer enabled. Docs updated: backup-dr.md (new VM-image
layer + protection matrix), infra CLAUDE.md, AGENTS.md.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 7fc4caefe3
commit 83f418159a
6 changed files with 177 additions and 2 deletions
@@ -293,6 +293,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
 - `/usr/local/bin/daily-backup` — Daily 05:00. Mounts LVM thin snapshots ro → rsyncs FILES to `/mnt/backup/pvc-data/<YYYY-WW>/<ns>/<pvc>/` with `--link-dest` versioning (4 weeks). Auto SQLite backup (magic number check, `?mode=ro`). Also backs up pfSense (config.xml + tar), PVE config. Prunes snapshots >7d. **Skip-list (2026-06-01)**: `nextcloud/nextcloud-data-proxmox` (orphaned pre-encryption PV).
 - `/usr/local/bin/offsite-sync-backup` — Daily 06:00 (After=daily-backup). Step 1: sda → Synology `pve-backup/` (incremental via manifest; monthly full `rsync --delete` days 1–7). Step 2: NFS direct → Synology — **immich-only on BOTH `nfs/` and `nfs-ssd/` (2026-06-01)**; ollama/llamacpp on the SSD no longer ship offsite.
 - `/usr/local/bin/lvm-pvc-snapshot` — Daily 03:00. Thin snapshots of all PVCs except dbaas+monitoring. 7-day retention. Instant restore: `lvm-pvc-snapshot restore <lv> <snap>`.
+- `/usr/local/bin/vzdump-vms` — Daily 01:00. Live `vzdump --mode snapshot` of hand-managed VMs (the ones NOT in Terraform) → `/mnt/backup/vzdump/`, keep 3 per VMID. `VZDUMP_VMIDS` default `102` (devvm) — **the only VM imaged today** (its per-user home dirs + local-only git repos, incl. the no-remote monorepo root, are otherwise irreplaceable). devvm has the guest agent (`agent: 1`) so dumps are fs-consistent. Deliberately NOT in the incremental offsite manifest (would balloon Synology); the monthly offsite full pass (days 1-7) mirrors `/mnt/backup/vzdump/`. Pushgateway job `vzdump-backup`. Added 2026-06-09 (closed the silent "VMs never imaged" DR gap). Restore: `qmrestore /mnt/backup/vzdump/vzdump-qemu-<vmid>-<ts>.vma.zst <vmid>`.
 - `nfs-change-tracker.service` — Continuous inotifywait on `/srv/nfs` + `/srv/nfs-ssd`. Logs changed file paths to `/mnt/backup/.nfs-changes.log`. Consumed by offsite-sync-backup for incremental rsync (completes in seconds instead of 30+ minutes).
 
 **Synology layout** (`192.168.1.13:/volume1/Backup/Viki/`):
@@ -109,7 +109,8 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
 - **SQLite on NFS is unreliable** (fsync issues) — always use proxmox-lvm or local disk for databases.
 - **NFS mount options**: Always `soft,timeo=30,retrans=3` to prevent uninterruptible sleep (D state).
 - **NFS export directory must exist** on the Proxmox host before Terraform can create the PV.
-- **Backup (3-2-1)**: Copy 1 = live PVCs on sdc. Copy 2 = sda `/mnt/backup` (PVC file backups, auto SQLite backups, pfSense, PVE config). Copy 3 = Synology offsite (two-tier: sda→`pve-backup/`, NFS→`nfs/`+`nfs-ssd/` via inotify change tracking).
+- **Backup (3-2-1)**: Copy 1 = live PVCs on sdc. Copy 2 = sda `/mnt/backup` (PVC file backups, auto SQLite backups, pfSense, PVE config, **VM images via `vzdump-vms`**). Copy 3 = Synology offsite (two-tier: sda→`pve-backup/`, NFS→`nfs/`+`nfs-ssd/` via inotify change tracking).
+- **vzdump-vms** (Daily 01:00): live `vzdump --mode snapshot` of hand-managed VMs (NOT in TF) → `/mnt/backup/vzdump/`, keep 3/VMID. `VZDUMP_VMIDS` default `102` (devvm) — the only VM imaged today; before this (2026-06-09) no VM was ever imaged. NOT in the incremental offsite manifest; monthly full pass mirrors it. See `docs/architecture/backup-dr.md`.
 - **daily-backup** (Daily 05:00): Auto-discovered BACKUP_DIRS (glob), auto SQLite backup (magic number + `?mode=ro`), pfSense, PVE config. No NFS mirror step (NFS syncs directly to Synology via inotify).
 - **offsite-sync-backup** (Daily 06:00): Step 1: sda→Synology `pve-backup/`. Step 2: NFS→Synology `nfs/`+`nfs-ssd/` via `rsync --files-from` (inotify change log). Monthly full `--delete`.
 - **nfs-change-tracker.service**: inotifywait on `/srv/nfs` + `/srv/nfs-ssd`, logs to `/mnt/backup/.nfs-changes.log`. Incremental syncs complete in seconds.
@@ -77,6 +77,8 @@ The **bypass list** (leg 2) is just `/srv/nfs/immich/` — too big for sda (1.5
 - `Synology/Backup/Viki/nfs/` — immich only (post-2026-05-26)
 - `Synology/Backup/Viki/nfs-ssd/` — **immich-ML only (2026-06-01)**; ollama/llamacpp dropped (re-pullable models, live-only on the SSD)
 
+**VM image backups (added 2026-06-09)**: the hand-managed Linux VMs (those NOT in Terraform — see `compute.md`) were historically **not imaged at all** — only their *contents* reached backup if they happened to host a PVC/NFS path. `vzdump-vms` now takes a daily live `vzdump --mode snapshot` of each configured VMID → `/mnt/backup/vzdump/` (Copy 2), carried offsite by the monthly offsite-sync full pass (Copy 3). **Currently enabled for VMID 102 (devvm)** — the shared workstation, whose per-user home dirs + local-only git repos are otherwise irreplaceable. Extend via `VZDUMP_VMIDS` in the unit. See "VM Image Backups (vzdump)" under How It Works.
+
 ## Architecture Diagram
 
 ### Data Routing — where each path goes (post-2026-05-26)
@@ -208,13 +210,14 @@ graph LR
 T0000["00:00 LVM thin snapshots<br/>(lvm-pvc-snapshot)<br/>sdc PVCs CoW"]
 T0015["00:15 PostgreSQL per-DB dumps<br/>(CronJob)"]
 T0045["00:45 MySQL per-DB dumps<br/>(CronJob)"]
+T0100["01:00 vzdump-vms<br/>live image of hand-managed VMs<br/>(devvm) → sda /mnt/backup/vzdump/"]
 T0200["02:00 nfs-mirror (daily)<br/>sdc /srv/nfs/* → sda /mnt/backup/<svc>/<br/>~10-20 min steady state"]
 T0500["05:00 daily-backup<br/>mount LVM snapshots ro<br/>rsync PVC files → /mnt/backup/pvc-data/<br/>+ sqlite + pfsense + pve-config"]
 T0600["06:00 offsite-sync-backup<br/>Step 1: sda → Synology /Viki/pve-backup/<br/>Step 2: sdc/immich + nfs-ssd → /Viki/nfs[-ssd]/"]
 T1200["12:00 LVM thin snapshots (midday)<br/>second daily snapshot"]
 end
 
-T0000 --> T0015 --> T0045 --> T0200 --> T0500 --> T0600 --> T1200
+T0000 --> T0015 --> T0045 --> T0100 --> T0200 --> T0500 --> T0600 --> T1200
 INO -.->|change events feed Step 2| T0600
 
 style Nightly fill:#ffe0b2
@@ -322,6 +325,7 @@ graph LR
 | NFS Change Tracker | Continuous (inotifywait) | PVE host: `nfs-change-tracker.service` | Logs changed NFS file paths to `/mnt/backup/.nfs-changes.log` |
 | pfSense Backup | Daily 05:00 + daily-backup | PVE host: SSH + API | config.xml + full filesystem tar |
 | Offsite Sync | Daily 06:00 (after daily-backup) | PVE host: `offsite-sync-backup` | Two-step: sda→pve-backup + NFS→nfs/nfs-ssd via inotify |
+| VM Image Backup (vzdump) | Daily 01:00, keep 3 | PVE host: `vzdump-vms` | Live `vzdump` of hand-managed VMs (devvm) → `/mnt/backup/vzdump/` |
 | PostgreSQL Backup (full) | Daily 00:00, 14d retention | CronJob in `dbaas` namespace | pg_dumpall for all databases |
 | PostgreSQL Backup (per-db) | Daily 00:15, 14d retention | CronJob in `dbaas` namespace | pg_dump -Fc per database → `/backup/per-db/<db>/` |
 | MySQL Backup (full) | Daily 00:30, 14d retention | CronJob in `dbaas` namespace | mysqldump --all-databases |
@@ -352,6 +356,19 @@ Native LVM thin snapshots provide crash-consistent point-in-time recovery for 62
 
 **Restore**: `lvm-pvc-snapshot restore <pvc-lv> <snapshot-lv>` — auto-discovers K8s workload, scales down, swaps LVs, scales back up. See `docs/runbooks/restore-lvm-snapshot.md`.
 
+### VM Image Backups (vzdump)
+
+The hand-managed Linux VMs are **intentionally not in Terraform** (telmate/bpg provider bugs — see `compute.md`) and were historically **not imaged at all**: nothing took a whole-disk backup of the VM itself. For most that is acceptable — k8s nodes are reprovisioned from cloud-init and their data lives in PVCs covered above. But **devvm** (the shared multi-user Claude Code workstation, VMID 102) holds irreplaceable state that lives nowhere else: per-user home dirs (`~/.claude`, `~/.t3`, shell history), manually-installed tooling, and **local-only git repos** — the monorepo root at `/home/wizard/code` has no git remote. A lost devvm disk = unrecoverable.
+
+**Script**: `/usr/local/bin/vzdump-vms` on PVE host (source: `infra/scripts/vzdump-vms.sh`). Deploy: `scp infra/scripts/vzdump-vms.sh root@192.168.1.127:/usr/local/bin/vzdump-vms` + `scp infra/scripts/vzdump-vms.{service,timer} root@192.168.1.127:/etc/systemd/system/`, then `systemctl daemon-reload && systemctl enable --now vzdump-vms.timer`.
+**Schedule**: Daily 01:00 via systemd timer — ahead of the other backup jobs so the fresh image is on sda before offsite-sync runs.
+**Mode**: `vzdump --mode snapshot` — live, no downtime. devvm has the qemu guest agent enabled (`agent: 1`), so the snapshot is **filesystem-consistent** (fs-freeze) rather than merely crash-consistent. Runs `Nice=10` + `IOSchedulingClass=idle` + `--ionice 7` so it never starves etcd on the contended sdc IO domain.
+**Scope**: VMIDs in `VZDUMP_VMIDS` (default `102` = devvm). Add VMIDs there to image other hand-managed VMs.
+**Retention**: `KEEP=3` newest dumps per VMID on sda (`/mnt/backup/vzdump/`); each devvm image is ~35-50 GB zstd.
+**Offsite**: deliberately **NOT** appended to the incremental offsite manifest — the incremental pass never deletes, so daily multi-GB images would accumulate unbounded on Synology. Instead the **monthly offsite-sync full pass (days 1-7)** mirrors all of `/mnt/backup` (including `vzdump/`) to Synology with `--delete`, bounded to local retention. So Copy 2 (sda) refreshes **daily**; Copy 3 (Synology) refreshes **monthly**.
+**Monitoring**: pushes `vzdump_last_run_timestamp` / `vzdump_last_status` / `vzdump_last_success_timestamp` to Pushgateway job `vzdump-backup`. A `VzdumpBackupStale` / `VzdumpBackupFailing` alert in `stacks/monitoring` (mirroring the LVM/pfSense backup alerts) is the recommended next addition.
+**Restore**: on the PVE host, `qmrestore /mnt/backup/vzdump/vzdump-qemu-<vmid>-<ts>.vma.zst <vmid>` — restore to a spare VMID first if the original still exists, then swap disks; or use the PVE UI (add `/mnt/backup` as a dir storage with content=backup → Restore).
+
 ### Layer 2: Weekly File-Level Backup (sda Backup Disk)
 
 **Backup disk**: sda (1.1TB RAID1 SAS) → VG `backup` → LV `data` → ext4 → mounted at `/mnt/backup` on PVE host. Dedicated backup disk, independent of live storage.
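The Restore step in the "VM Image Backups (vzdump)" section above needs the path of a specific dump. A minimal sketch of picking the newest archive for a VMID follows. `newest_dump` is a hypothetical helper, not part of `vzdump-vms`, and it assumes the `vzdump-qemu-<vmid>-<YYYY_MM_DD>-<HH_MM_SS>.vma.zst` naming shown in this commit, so that a plain name sort is also a time sort.

```shell
#!/usr/bin/env bash
# Sketch only: print the newest vzdump archive for one VMID.
# newest_dump is a hypothetical helper, not shipped by this commit.
newest_dump() {
    local dumpdir="$1" vmid="$2" f newest=""
    # Glob results are sorted by name; the fixed-width timestamp in the
    # filename means the last match is the most recent dump.
    for f in "${dumpdir}"/vzdump-qemu-"${vmid}"-*.vma.zst; do
        [ -e "${f}" ] || continue   # pattern matched nothing
        newest="${f}"
    done
    # Non-zero exit status when no dump exists for this VMID.
    [ -n "${newest}" ] && printf '%s\n' "${newest}"
}
```

On the PVE host this could feed the documented restore command, for example `qmrestore "$(newest_dump /mnt/backup/vzdump 102)" 9102`, where `9102` stands in for whatever spare VMID is free.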
@@ -527,12 +544,16 @@ The btrfs cleaner thread reclaims async — `df` may lag the snapshot-delete by
 | `/usr/local/bin/lvm-pvc-snapshot` | PVE host: LVM snapshot creation + restore |
 | `/usr/local/bin/daily-backup` | PVE host: PVC file copy + auto SQLite backup + pfSense |
 | `/usr/local/bin/offsite-sync-backup` | PVE host: two-step rsync to Synology (sda + NFS via inotify) |
+| `/usr/local/bin/vzdump-vms` | PVE host: daily live `vzdump` image of hand-managed VMs (devvm) → `/mnt/backup/vzdump/` |
 | `/mnt/backup/` | PVE host: sda mount point (1.1TB backup disk) |
+| `/mnt/backup/vzdump/` | PVE host: vzdump VM images (keep 3 per VMID), mirrored offsite monthly |
 | `/mnt/backup/.nfs-changes.log` | NFS change log from inotifywait, consumed by offsite-sync |
 | `/etc/systemd/system/nfs-change-tracker.service` | inotifywait watcher for `/srv/nfs` + `/srv/nfs-ssd` |
 | `/etc/systemd/system/lvm-pvc-snapshot.timer` | Daily 03:00 (LVM snapshots) |
 | `/etc/systemd/system/daily-backup.timer` | Daily 05:00 (file backup) |
 | `/etc/systemd/system/offsite-sync-backup.timer` | Daily 06:00 (offsite sync) |
+| `/etc/systemd/system/vzdump-vms.timer` | Daily 01:00 (VM image backup) |
+| `/etc/systemd/system/vzdump-vms.service` | oneshot: `vzdump-vms` (source `infra/scripts/vzdump-vms.{sh,service,timer}`) |
 | `/usr/local/bin/nfs-mirror` | PVE host: daily 02:00 mirror of /srv/nfs/* → sda /mnt/backup/<svc>/ (Layer 3a) |
 | `/etc/systemd/system/nfs-mirror.timer` | Daily 02:00 (NFS local mirror to sda) |
 | `stacks/dbaas/` | Terraform: PostgreSQL/MySQL backup CronJobs |
@@ -911,6 +932,9 @@ the 2026-04-22 backup_offsite_sync FAIL (node3 kubelet hiccup at
 | Uptime Kuma | ✓ | ✓ | — | ✓ | proxmox-lvm |
 | **Other apps not enumerated above** | ✓¹ | ✓¹ | varies | ✓ | proxmox-lvm / proxmox-lvm-encrypted |
 | **Postiz** (bundled bitnami PG on local-path) | — | — | ✓ daily pg_dump → NFS | ✓ | local-path + NFS |
+| **Hand-managed VMs (not in Terraform)** |
+| devvm (workstation, VMID 102) | — | — | ✓ daily vzdump image | ✓ monthly | local-lvm (sdc) |
+| Other hand-managed VMs (HA 103, registry 220, k8s nodes) | — | — | — gap² | — | local-lvm — see note² |
 | **Media (NFS)** |
 | Immich (~800GB) | — | — | — | ✓ | NFS |
 | Audiobookshelf | — | — | — | ✓ | NFS |
@@ -924,6 +948,8 @@ the 2026-04-22 backup_offsite_sync FAIL (node3 kubelet hiccup at
 
 **Note**: All proxmox-lvm and proxmox-lvm-encrypted PVCs get LVM snapshots (except `dbaas` and `monitoring` namespaces, excluded for write-amplification reasons) + file-level backup. NFS-backed media syncs directly to Synology `nfs/` and `nfs-ssd/` via inotify change tracking.
 
+² **Hand-managed VMs** — only **devvm (102)** is imaged today (`vzdump-vms`, `VZDUMP_VMIDS=102`). The k8s nodes are deliberately uncovered (reprovisioned from cloud-init; their data lives in the PVCs already backed up above). **home-assistant (103) and docker-registry (220) are a documented gap** — add their VMIDs to `VZDUMP_VMIDS` to image them (registry content is also re-pullable from upstreams; HA has its own add-on backups). pfSense (101) is covered separately by `daily-backup` (config.xml + weekly tar).
+
 ¹ **"Other apps not enumerated above"** — the table only enumerates services worth calling out. The default backup posture for any service using `proxmox-lvm` or `proxmox-lvm-encrypted` (outside `dbaas`/`monitoring`) is **automatic** Layer 1 (LVM thin snapshots, 7d retention) + Layer 2 (file backup, 4 weekly versions on sda) + Layer 3 (offsite to Synology). Auto-discovery is by LV name pattern (`vm-*-pvc-*`), so adding a new service to the cluster gets it covered without any explicit registration. Run `ssh root@192.168.1.127 lvs --noheadings -o lv_name pve | grep '^vm-.*-pvc-' | grep -v _snap_ | wc -l` to see the live count.
 
 **Known gaps** — services with PVCs not on the proxmox-lvm path lose Layer 1+2:
16
scripts/vzdump-vms.service
Normal file
@@ -0,0 +1,16 @@
+[Unit]
+Description=vzdump image backup of hand-managed VMs (devvm, …) to /mnt/backup
+Documentation=https://forgejo.viktorbarzin.me/viktor/infra/src/branch/main/docs/architecture/backup-dr.md
+After=network-online.target
+Wants=network-online.target
+RequiresMountsFor=/mnt/backup
+
+[Service]
+Type=oneshot
+ExecStart=/usr/local/bin/vzdump-vms
+# Be gentle on the contended PVE IO domain (sdc) — backup must never starve etcd.
+Nice=10
+IOSchedulingClass=idle
+# Reading a ~77 GB disk + zstd can run long under IO contention; well above
+# normal (~15-30 min) but bounded so a hung run can't wedge the timer forever.
+TimeoutStartSec=4h
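The docs in this commit say to extend coverage via `VZDUMP_VMIDS` "in the unit", but the unit above sets no `Environment=` line, so the script falls back to its default of `102`. One way to supply the variable without editing the shipped unit is a systemd drop-in. The sketch below only renders the drop-in text; `render_vmids_dropin` and the file name `vmids.conf` are hypothetical, and 103 and 220 are the VMIDs the docs list as the current gap.

```shell
#!/usr/bin/env bash
# Sketch only: print a systemd drop-in that sets VZDUMP_VMIDS for
# vzdump-vms.service. render_vmids_dropin is a hypothetical helper.
render_vmids_dropin() {
    local id
    [ "$#" -gt 0 ] || { echo "usage: render_vmids_dropin VMID..." >&2; return 1; }
    for id in "$@"; do
        case "${id}" in
            ''|*[!0-9]*) echo "invalid VMID: ${id}" >&2; return 1 ;;
        esac
    done
    # The script expects a space-separated list; "$*" joins the arguments
    # with spaces, and the quotes keep the list as one Environment= value.
    printf '[Service]\nEnvironment="VZDUMP_VMIDS=%s"\n' "$*"
}
```

A possible use on the PVE host: `mkdir -p /etc/systemd/system/vzdump-vms.service.d && render_vmids_dropin 102 103 220 > /etc/systemd/system/vzdump-vms.service.d/vmids.conf`, followed by `systemctl daemon-reload`. Each extra VMID adds its own dumps to sda, so the 1.1TB backup disk's free space is worth checking first.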
117
scripts/vzdump-vms.sh
Normal file
@@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+# vzdump-vms — image-level backup of hand-managed Proxmox VMs (NOT in Terraform).
+# Deploy to PVE host at /usr/local/bin/vzdump-vms (strip the .sh).
+# Schedule: Daily 01:00 via systemd timer.
+#
+# WHY: the hand-managed Linux VMs (devvm, …) have NO image backup. nfs-mirror /
+# daily-backup / offsite-sync cover cluster PVCs, NFS, pfSense and PVE config —
+# but never the VM disks themselves. A lost devvm disk = unrecoverable home dirs
+# + local-only git repos (the monorepo root has no remote). This takes a live
+# `vzdump --mode snapshot` of each configured VMID to /mnt/backup/vzdump (sda =
+# Copy 2). The monthly offsite-sync full pass (days 1-7) mirrors /mnt/backup —
+# including this dir — to Synology with --delete (Copy 3), bounded to local
+# retention. We deliberately do NOT append to the incremental manifest: that pass
+# never deletes, so daily multi-GB images would accumulate unbounded on Synology.
+#
+# RESTORE: pick a dump under /mnt/backup/vzdump, then on the PVE host:
+#   qmrestore /mnt/backup/vzdump/vzdump-qemu-<vmid>-<ts>.vma.zst <new-or-same-vmid>
+# (restore to a fresh VMID first if the original still exists, then swap), or use
+# the PVE UI (Datacenter → Storage → upload dir → Restore). See backup-dr.md.
+set -euo pipefail
+
+# systemd oneshot units get a minimal PATH (/usr/bin:/bin) — qm and vzdump live
+# in /usr/sbin, so set an explicit PATH or the script silently can't find them.
+export PATH="/usr/sbin:/usr/bin:/sbin:/bin:${PATH:-}"
+
+# --- Configuration ---
+VMIDS="${VZDUMP_VMIDS:-102}"            # space-separated. 102 = devvm. Add VMIDs here.
+DUMPDIR="${VZDUMP_DUMPDIR:-/mnt/backup/vzdump}"
+KEEP="${VZDUMP_KEEP:-3}"                # retain N newest dumps per VMID on sda
+COMPRESS="${VZDUMP_COMPRESS:-zstd}"
+BACKUP_ROOT="/mnt/backup"
+PUSHGATEWAY="${VZDUMP_PUSHGATEWAY:-http://10.0.20.100:30091}"
+PUSHGATEWAY_JOB="vzdump-backup"
+LOCKFILE="/run/vzdump-vms.lock"
+
+# --- Logging ---
+log()  { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
+warn() { log "WARN: $*" >&2; }
+
+# --- Metrics (always returns 0 so it never trips set -e) ---
+push_metrics() {
+    local status="${1:-0}" bytes="${2:-0}" now
+    now=$(date +%s)
+    {
+        echo "vzdump_last_run_timestamp ${now}"
+        echo "vzdump_last_status ${status}"
+        echo "vzdump_last_bytes ${bytes}"
+        [ "${status}" -eq 0 ] && echo "vzdump_last_success_timestamp ${now}"
+    } | curl -s --connect-timeout 5 --max-time 10 --data-binary @- \
+        "${PUSHGATEWAY}/metrics/job/${PUSHGATEWAY_JOB}" 2>/dev/null || true
+    return 0
+}
+
+# --- Locking (push a non-success metric if systemd kills us mid-run) ---
+KILLED=""
+cleanup() {
+    [ "$(cat "${LOCKFILE}" 2>/dev/null)" = "$$" ] && rm -f "${LOCKFILE}"  # only remove a lock we own; the "already running" exit below must not delete the other instance's lock
+    [ -n "${KILLED}" ] && push_metrics 2 0
+}
+trap cleanup EXIT
+trap 'KILLED=1; exit 143' TERM INT
+
+if ! ( set -o noclobber; echo $$ > "${LOCKFILE}" ) 2>/dev/null; then
+    warn "Another instance running (PID $(cat "${LOCKFILE}" 2>/dev/null || echo unknown)) — exiting"
+    exit 0
+fi
+
+# --- Preconditions ---
+if ! mountpoint -q "${BACKUP_ROOT}"; then
+    warn "${BACKUP_ROOT} not mounted — aborting"; push_metrics 1 0; exit 1
+fi
+mkdir -p "${DUMPDIR}"
+
+# --- Main ---
+log "=== vzdump-vms starting (VMIDs: ${VMIDS}, keep ${KEEP}) ==="
+STATUS=0
+TOTAL_BYTES=0
+
+for vmid in ${VMIDS}; do
+    if ! qm status "${vmid}" >/dev/null 2>&1; then
+        warn "VMID ${vmid} not found on this node — skipping"
+        STATUS=1
+        continue
+    fi
+
+    log "--- vzdump ${vmid} ($(qm config "${vmid}" 2>/dev/null | sed -n 's/^name: //p')) ---"
+    if vzdump "${vmid}" \
+        --dumpdir "${DUMPDIR}" \
+        --mode snapshot \
+        --compress "${COMPRESS}" \
+        --ionice 7 \
+        --quiet 1; then
+        newest=$(ls -t "${DUMPDIR}"/vzdump-qemu-"${vmid}"-*.vma.* 2>/dev/null | grep -v '\.notes$' | head -1 || true)
+        if [ -n "${newest}" ]; then
+            sz=$(stat -c%s "${newest}" 2>/dev/null || echo 0)
+            TOTAL_BYTES=$((TOTAL_BYTES + sz))
+            log "  OK: $(basename "${newest}") ($(numfmt --to=iec "${sz}" 2>/dev/null || echo "${sz}B"))"
+        fi
+    else
+        warn "vzdump ${vmid} failed (rc=$?)"
+        STATUS=1
+    fi
+
+    # Retention: keep newest ${KEEP} per VMID (archive + its .log + .notes siblings).
+    mapfile -t archives < <(ls -t "${DUMPDIR}"/vzdump-qemu-"${vmid}"-*.vma.* 2>/dev/null | grep -v '\.notes$' || true)
+    if [ "${#archives[@]}" -gt "${KEEP}" ]; then
+        for old in "${archives[@]:${KEEP}}"; do
+            prefix="${old%.vma.*}"   # …/vzdump-qemu-<vmid>-<YYYY_MM_DD>-<HH_MM_SS>
+            log "  prune: $(basename "${prefix}")"
+            rm -f "${prefix}".vma.* "${prefix}".log 2>/dev/null || true
+        done
+    fi
+done
+
+log "=== vzdump-vms complete (status=${STATUS}, $(numfmt --to=iec "${TOTAL_BYTES}" 2>/dev/null || echo "${TOTAL_BYTES}B")) ==="
+push_metrics "${STATUS}" "${TOTAL_BYTES}"
+exit "${STATUS}"
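Before raising or lowering `VZDUMP_KEEP`, it can help to see which archives the retention step in the script above would remove. The following is a dry-run sketch under stated assumptions: `prune_candidates` is a hypothetical helper, it only prints and never deletes, and it orders dumps by the timestamp in the filename, whereas the script orders them by mtime via `ls -t`. The two agree as long as file mtimes have not been altered.

```shell
#!/usr/bin/env bash
# Sketch only: list the archives that a keep-N retention pass would prune.
# prune_candidates is a hypothetical helper; it deletes nothing.
prune_candidates() {
    local dumpdir="$1" vmid="$2" keep="$3" f
    local -a archives=()
    # Glob results are sorted by name, so the array runs oldest to newest.
    for f in "${dumpdir}"/vzdump-qemu-"${vmid}"-*.vma.*; do
        [ -e "${f}" ] || continue                # pattern matched nothing
        case "${f}" in *.notes) continue ;; esac # skip .notes siblings
        archives+=("${f}")
    done
    local n="${#archives[@]}"
    [ "${n}" -gt "${keep}" ] || return 0         # nothing to prune
    # Everything except the newest ${keep} entries.
    printf '%s\n' "${archives[@]:0:$((n - keep))}"
}
```

For example, `prune_candidates /mnt/backup/vzdump 102 3` on the PVE host would print nothing while three or fewer devvm dumps exist.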
14
scripts/vzdump-vms.timer
Normal file
@@ -0,0 +1,14 @@
+[Unit]
+Description=Daily vzdump image backup of hand-managed VMs (devvm, …)
+Documentation=https://forgejo.viktorbarzin.me/viktor/infra/src/branch/main/docs/architecture/backup-dr.md
+
+[Timer]
+# 01:00 — ahead of nfs-mirror (02:00), lvm-pvc-snapshot (03:00), daily-backup
+# (05:00) and offsite-sync (06:00), so the fresh image is on sda before the
+# monthly full offsite pass mirrors /mnt/backup to Synology.
+OnCalendar=*-*-* 01:00:00
+RandomizedDelaySec=10min
+Persistent=true
+
+[Install]
+WantedBy=timers.target