From 4473b469e319a833f694d9472d2f4962084253b0 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Mon, 29 Jun 2026 12:59:03 +0000
Subject: [PATCH] lvm-pvc-snapshot: cut retention 7->3 days (reduce sdc thin-pool CoW IOPS + free ~1TB)

Part of the sdc IOPS-reduction work (code-oflt). 462 daily thin
snapshots (66 PVCs x 7d) drive ~10-34 w/s of thin-pool metadata (tmeta)
CoW writes on the contended sdc spindle and pin ~2TB in the 70%-full
pool. Cutting retention to 3 days (66 PVCs x 3d = 198 snapshots)
roughly halves both.

Instant-restore window shrinks 7->3d; daily-backup still keeps 4 weeks
of file-level PVC history, so DR coverage is unchanged.

Deployed to the PVE host via scp (these host scripts are scp-deployed,
not TF-managed). Doc updated in .claude/CLAUDE.md.

Refs: code-oflt.
Co-Authored-By: Claude Opus 4.8
---
 .claude/CLAUDE.md           | 2 +-
 scripts/lvm-pvc-snapshot.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index bf51ef57..e89b44f6 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -357,7 +357,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
 - `/usr/local/bin/nfs-mirror` — Daily 02:00. `rsync --delete /srv/nfs// → /mnt/backup//` (sda leg 1), appends transferred paths to `/mnt/backup/.changed-files` for offsite Step 1. **EXCLUDES**: immich (too big — direct leg), frigate/temp (no backup), anca-elements (in Immich), and **(2026-06-01) ollama, prometheus-backup, audiblez, ebook2audiobook** — regenerable, live-only on sdc, kept off the space-constrained offsite. Does NOT mirror `/srv/nfs-ssd`.
 - `/usr/local/bin/daily-backup` — Daily 05:00. Mounts LVM thin snapshots ro → rsyncs FILES to `/mnt/backup/pvc-data////` with `--link-dest` versioning (4 weeks). Auto SQLite backup (magic number check, `?mode=ro`). Also backs up pfSense (config.xml + tar), PVE config. Prunes snapshots >7d. **Skip-list (2026-06-01)**: `nextcloud/nextcloud-data-proxmox` (orphaned pre-encryption PV).
 - `/usr/local/bin/offsite-sync-backup` — Daily 06:00 (After=daily-backup). Step 1: sda → Synology `pve-backup/` (incremental via manifest; monthly full `rsync --delete` days 1–7). Step 2: NFS direct → Synology — **immich-only on BOTH `nfs/` and `nfs-ssd/` (2026-06-01)**; ollama/llamacpp on the SSD no longer ship offsite.
-- `/usr/local/bin/lvm-pvc-snapshot` — Daily 03:00. Thin snapshots of all PVCs except dbaas+monitoring. 7-day retention. Instant restore: `lvm-pvc-snapshot restore `.
+- `/usr/local/bin/lvm-pvc-snapshot` — Daily 03:00. Thin snapshots of all PVCs except dbaas+monitoring. 3-day retention (cut from 7 on 2026-06-29, code-oflt: fewer thin snapshots = less thin-pool tmeta CoW write-IOPS on sdc + ~1TB freed; `daily-backup` keeps 4wk file-level history). Instant restore: `lvm-pvc-snapshot restore `.
 - `/usr/local/bin/vzdump-vms` — Daily 01:00. Live `vzdump --mode snapshot` of hand-managed VMs (the ones NOT in Terraform) → `/mnt/backup/vzdump/`, keep 3 per VMID. `VZDUMP_VMIDS` default `102` (devvm) — **the only VM imaged today** (its per-user home dirs + local-only git repos, incl. the no-remote monorepo root, are otherwise irreplaceable). devvm has the guest agent (`agent: 1`) so dumps are fs-consistent. Deliberately NOT in the incremental offsite manifest (would balloon Synology); the monthly offsite full pass (days 1-7) mirrors `/mnt/backup/vzdump/`. Pushgateway job `vzdump-backup`. Added 2026-06-09 (closed the silent "VMs never imaged" DR gap). Restore: `qmrestore /mnt/backup/vzdump/vzdump-qemu--.vma.zst `.
 - `nfs-change-tracker.service` — Continuous inotifywait on `/srv/nfs` + `/srv/nfs-ssd`. Logs changed file paths to `/mnt/backup/.nfs-changes.log`. Consumed by offsite-sync-backup for incremental rsync (completes in seconds instead of 30+ minutes).
diff --git a/scripts/lvm-pvc-snapshot.sh b/scripts/lvm-pvc-snapshot.sh
index 6ec5dc34..3c86447c 100755
--- a/scripts/lvm-pvc-snapshot.sh
+++ b/scripts/lvm-pvc-snapshot.sh
@@ -7,7 +7,7 @@ set -euo pipefail
 VG="pve"
 THINPOOL="data"
 SNAP_SUFFIX_FORMAT="%Y%m%d_%H%M"
-RETENTION_DAYS=7
+RETENTION_DAYS=3 # 7->3 (2026-06-29, code-oflt): fewer thin snapshots = less tmeta CoW write-IOPS on sdc + ~1TB freed; daily-backup keeps 4wk file-level history
 MIN_FREE_PCT=10
 PUSHGATEWAY="${LVM_SNAP_PUSHGATEWAY:-http://10.0.20.100:30091}"
 PUSHGATEWAY_JOB="lvm-pvc-snapshot"
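
Review note: a rough sketch of what the retention change implies, not code from `lvm-pvc-snapshot.sh`. It recomputes the steady-state snapshot counts from the commit message (66 PVCs) and shows one way an age check against a `SNAP_SUFFIX_FORMAT`-style timestamp (`%Y%m%d_%H%M`) could work. The helper names `snapshot_count` and `is_expired`, the example suffixes, and the UTC assumption are all hypothetical; the script's real pruning logic is not visible in this patch. GNU `date` is assumed for `-d`.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: steady-state number of daily snapshots
# for a given PVC count and retention window (in days).
snapshot_count() {
  echo $(( $1 * $2 ))
}

# Hypothetical helper: succeeds if a snapshot suffix in %Y%m%d_%H%M form
# is older than the retention window, measured against a given epoch.
# Assumes GNU date and treats timestamps as UTC for simplicity.
is_expired() {
  exp_suffix=$1
  exp_days=$2
  exp_now=$3
  exp_day=${exp_suffix%%_*}                         # e.g. 20260625
  exp_hm=${exp_suffix##*_}                          # e.g. 0300
  exp_hh=$(printf '%s' "$exp_hm" | cut -c1-2)
  exp_mm=$(printf '%s' "$exp_hm" | cut -c3-4)
  exp_epoch=$(date -u -d "$exp_day $exp_hh:$exp_mm" +%s)
  [ $(( exp_now - exp_epoch )) -gt $(( exp_days * 86400 )) ]
}

echo "7-day retention: $(snapshot_count 66 7) snapshots"   # 462
echo "3-day retention: $(snapshot_count 66 3) snapshots"   # 198
```

Under these assumptions, a snapshot suffixed `20260625_0300` would count as expired at 03:00 UTC on 2026-06-29 with a 3-day window, while `20260627_0300` would be kept.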