Three immediate fixes surfaced by the backup-pipeline audit:
1. **S1 silent-loss race fix** (daily-backup.sh:142): remove the
`> "${MANIFEST}"` truncation at the start of daily-backup. Truncation
already lives in offsite-sync-backup at line 159, gated on a successful
sync. With both scripts truncating, an offsite-sync failure followed by
the next morning's daily-backup would silently wipe yesterday's
unconsumed manifest entries — those files would only reach Synology
via the monthly full sync (1st-7th of month). Now only offsite-sync
truncates, and only on success.
2. **Missing alert OffsiteBackupSyncFailing**: documented in backup-dr.md
but was never added to prometheus_chart_values.tpl. Step 1 or Step 2
failure pushes offsite_sync_last_status=1 but nothing read it. Added.
3. **wear: drop `-z` from local-only rsyncs** (daily-backup.sh:218 PVC
snapshot rsync + line 347 /etc/pve sync). Both are local-to-sda
transfers — compression wastes CPU and yields nothing (gigabit local
path, intermediate disk doesn't benefit).
Bonus cleanups (zero functional impact):
- "Weekly backup starting/complete" → "daily-backup starting/complete"
(the timer is daily, not weekly — legacy from earlier monthly-rotation
schedule).
- "--- Step 2: PVC file copy ---" → "Step 1:" (was numbered from 2 with no
Step 1 above).
- **wear: pfSense full filesystem tar now Sunday-only** instead of daily.
config.xml stays daily (it's the primary restore artifact and tiny).
Full tar is forensic recovery only — re-tarring ~100MB+ daily writes
~3G/month to sda + Synology for unchanged content. Weekly is plenty.
docs/architecture/backup-dr.md: rewritten Overview + 3-2-1 breakdown to
reflect today's two-leg architecture; added a "2026-05-24 session"
changelog summary at the top; added a "Synology snapshot management"
subsection with the sudo + `synosharesnapshot` recipe (DSM API is gated
by 2FA so this is the only programmatic path); updated Key Files table
with nfs-mirror + the Synology SSH access notes.
Open follow-ups from the audit (S2 — file as beads if pursued):
- Factor two-leg invariant into /etc/backup-skip-list.conf sourced by
both nfs-mirror.sh and offsite-sync-backup.sh.
- Manifest write-collision flock between nfs-mirror Mon 04:11 and
daily-backup Mon 05:00.
- Unbounded manifest cap (force full sync if > 500k lines).
- Synology free-space scraper + alert.
- LVM thin pool meta-pool fill alert.
- nfs-change-tracker.service heartbeat to Pushgateway.
- Synology config drift TF surface (snap retention, share defs).