rename weekly-backup → daily-backup across scripts, timers, services, and docs [ci skip]
Reflects the schedule change from weekly to daily. All references updated:
- scripts/weekly-backup.{sh,timer,service} → daily-backup.*
- Pushgateway job name: weekly-backup → daily-backup
- Prometheus metric names: weekly_backup_* → daily_backup_*
- All docs, runbooks, AGENTS.md, CLAUDE.md, proxmox-inventory
- offsite-sync dependency: After=daily-backup.service
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parent ca5039f8aa
commit 82f674a0b4
13 changed files with 72 additions and 72 deletions
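The message says all references were updated. One way to audit a rename like this is a case-insensitive search that also catches the prose spelling. `find_stale_refs` below is a hypothetical helper, not part of the repository. Run against the hunks shown here, it would still report lines such as the unchanged `# LAYER 2: Weekly Backup (sda)` comment and the retained "No weekly backup metric" message text.

```shell
#!/usr/bin/env bash
# Hypothetical audit helper, not part of the repository.
# Lists lines that still mention the old name in any of its spellings:
# weekly-backup (units, job name), weekly_backup (metrics, functions) and
# the prose form "weekly backup", case-insensitively.
find_stale_refs() {
    local root="${1:-.}"
    # -r recurse, -n line numbers, -I skip binary files, -i ignore case,
    # -E extended regex; .git is excluded so history objects are not scanned.
    grep -rnIiE --exclude-dir=.git 'weekly[-_ ]backup' -- "$root" || true
}
```

Usage: `find_stale_refs infra/` prints `path:line:text` for every remaining hit and prints nothing when the rename is complete.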
@@ -188,8 +188,8 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" {
 **Copy 3**: Synology NAS offsite (two-tier: sda + NFS)
 
 **PVE host scripts** (source: `infra/scripts/`):
-- `/usr/local/bin/weekly-backup` — Sunday 05:00. Mounts LVM thin snapshots ro → rsyncs FILES to `/mnt/backup/pvc-data/<YYYY-WW>/<ns>/<pvc>/` with `--link-dest` versioning (4 weeks). Auto SQLite backup (magic number check, `?mode=ro`). Auto-discovered BACKUP_DIRS (glob, not hardcoded). Also backs up pfSense (config.xml + tar), PVE config. Prunes snapshots >7d.
-- `/usr/local/bin/offsite-sync-backup` — Sunday 08:00 (After=weekly-backup). Step 1: sda → Synology `pve-backup/` (PVC snapshots, pfSense, PVE config). Step 2: NFS → Synology `nfs/` + `nfs-ssd/` via inotify change-tracked `rsync --files-from`. Monthly full `rsync --delete` on 1st Sunday.
+- `/usr/local/bin/daily-backup` — Daily 05:00. Mounts LVM thin snapshots ro → rsyncs FILES to `/mnt/backup/pvc-data/<YYYY-WW>/<ns>/<pvc>/` with `--link-dest` versioning (4 weeks). Auto SQLite backup (magic number check, `?mode=ro`). Auto-discovered BACKUP_DIRS (glob, not hardcoded). Also backs up pfSense (config.xml + tar), PVE config. Prunes snapshots >7d.
+- `/usr/local/bin/offsite-sync-backup` — Daily 06:00 (After=daily-backup). Step 1: sda → Synology `pve-backup/` (PVC snapshots, pfSense, PVE config). Step 2: NFS → Synology `nfs/` + `nfs-ssd/` via inotify change-tracked `rsync --files-from`. Monthly full `rsync --delete` on 1st Sunday.
 - `/usr/local/bin/lvm-pvc-snapshot` — Daily 03:00. Thin snapshots of all PVCs except dbaas+monitoring. 7-day retention. Instant restore: `lvm-pvc-snapshot restore <lv> <snap>`.
 - `nfs-change-tracker.service` — Continuous inotifywait on `/srv/nfs` + `/srv/nfs-ssd`. Logs changed file paths to `/mnt/backup/.nfs-changes.log`. Consumed by offsite-sync-backup for incremental rsync (completes in seconds instead of 30+ minutes).
 
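The `daily-backup` entry mentions an automatic SQLite backup gated by a magic number check. Every SQLite 3 database file starts with the 16-byte header `SQLite format 3\0`, so such a check can be sketched as below. The function names and the walk over a snapshot mount are assumptions for illustration; the actual script is not part of this diff.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not the actual daily-backup implementation.

# A SQLite 3 database file begins with the 16-byte header "SQLite format 3\0".
# Bash variables cannot hold NUL bytes, so compare only the first 15 bytes and
# delete any NULs before command substitution to avoid shell warnings.
is_sqlite_db() {
    local f="$1"
    [ -f "$f" ] && [ -r "$f" ] || return 1
    [ "$(head -c 15 -- "$f" | tr -d '\000')" = "SQLite format 3" ]
}

# Print every SQLite database found under a directory, for example a
# read-only snapshot mount (the mount location is an assumption).
find_sqlite_dbs() {
    local root="$1" f
    while IFS= read -r -d '' f; do
        if is_sqlite_db "$f"; then
            printf '%s\n' "$f"
        fi
    done < <(find "$root" -type f -print0)
}
```

Checking the header rather than a file extension catches databases stored as `*.sqlite`, `*.db` or with no extension at all. The `?mode=ro` in the description is SQLite's URI filename parameter for opening a database read-only, which suits files read from a read-only snapshot.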
@@ -118,8 +118,8 @@ Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB
 | Unit | Type | Schedule | Purpose |
 |------|------|----------|---------|
 | `lvm-pvc-snapshot.timer` | Timer | Daily 03:00 | LVM thin snapshots of all PVCs (7-day retention) |
-| `weekly-backup.timer` | Timer | Sunday 05:00 | PVC file backup, auto SQLite backup, pfSense, PVE config |
-| `offsite-sync-backup.timer` | Timer | Sunday 08:00 | Two-step rsync to Synology (sda + NFS via inotify) |
+| `daily-backup.timer` | Timer | Daily 05:00 | PVC file backup, auto SQLite backup, pfSense, PVE config |
+| `offsite-sync-backup.timer` | Timer | Daily 06:00 | Two-step rsync to Synology (sda + NFS via inotify) |
 | `nfs-change-tracker.service` | Service | Continuous | inotifywait on `/srv/nfs` + `/srv/nfs-ssd`, logs to `/mnt/backup/.nfs-changes.log` |
 
 ## GPU Node (k8s-node1)
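After the change, the table runs the three timers at 03:00, 05:00 and 06:00 every day. A `daily-backup.timer` with that schedule could look like the output of the hypothetical generator below. The real unit files under `infra/scripts/` are not shown in this diff, and `Persistent=true` (run a missed job at the next boot) is an assumed choice, not something the diff confirms.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a daily systemd timer such as daily-backup.timer.
# Usage: make_daily_timer <unit-name> <HH:MM>
make_daily_timer() {
    local unit="$1" at="$2"
    cat <<EOF
[Unit]
Description=Run ${unit}.service daily at ${at}

[Timer]
# *-*-* means every year, month and day; seconds are pinned to :00.
OnCalendar=*-*-* ${at}:00
# Assumption: catch up on a run missed while the host was down.
Persistent=true
Unit=${unit}.service

[Install]
WantedBy=timers.target
EOF
}
```

For example, `make_daily_timer daily-backup 05:00 > daily-backup.timer`. Before installing, `systemd-analyze calendar '*-*-* 05:00:00'` prints the next elapse time for the expression, which is a quick way to confirm the schedule.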
@@ -148,15 +148,15 @@ check_lvm_snapshot_timer() {
 # LAYER 2: Weekly Backup (sda)
 # ============================================================
 
-check_weekly_backup_freshness() {
-if $DRY_RUN; then add_check "weekly-backup-freshness" "ok" "DRY RUN"; return; fi
-if ! $PVE_REACHABLE; then add_check "weekly-backup-freshness" "fail" "PVE unreachable"; return; fi
+check_daily_backup_freshness() {
+if $DRY_RUN; then add_check "daily-backup-freshness" "ok" "DRY RUN"; return; fi
+if ! $PVE_REACHABLE; then add_check "daily-backup-freshness" "fail" "PVE unreachable"; return; fi
 
 local ts
-ts=$($PVE_SSH "curl -s http://10.0.20.100:30091/metrics 2>/dev/null | grep '^weekly_backup_last_run_timestamp' | head -1 | awk '{print \$2}'" 2>/dev/null) || true
+ts=$($PVE_SSH "curl -s http://10.0.20.100:30091/metrics 2>/dev/null | grep '^daily_backup_last_run_timestamp' | head -1 | awk '{print \$2}'" 2>/dev/null) || true
 
 if [ -z "$ts" ]; then
-add_check "weekly-backup-freshness" "fail" "No weekly backup metric — may have never run"
+add_check "daily-backup-freshness" "fail" "No weekly backup metric — may have never run"
 return
 fi
 
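The freshness and status checks extract one value with `grep '^daily_backup_last_run_timestamp' | head -1 | awk '{print $2}'`. An anchored prefix grep also matches any longer metric name that begins with the same text. The hypothetical parser below compares the exact metric name and ignores any `{...}` label set. It assumes label values contain no spaces (as with a `job="daily-backup"` label), because fields are split on whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text exposition on stdin and print the
# value of the first sample whose metric name is exactly $1. Label sets such
# as {job="daily-backup"} are ignored; label values are assumed to contain no
# spaces, because fields are split on whitespace.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                     # skip # HELP and # TYPE lines
        NF >= 2 {
            n = $1
            i = index(n, "{")
            if (i > 0) n = substr(n, 1, i - 1)   # drop the label set
            if (n == name) { print $2; exit }
        }'
}
```

Usage against the endpoint the checks already query: `curl -s http://10.0.20.100:30091/metrics | metric_value daily_backup_last_run_timestamp`. An empty result still means "metric not found", so the existing `[ -z "$ts" ]` branch keeps working unchanged.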
@@ -165,44 +165,44 @@ check_weekly_backup_freshness() {
 age_h=$(python3 -c "print(f'{($now - $ts) / 3600:.1f}')" 2>/dev/null)
 
 if python3 -c "exit(0 if ($now - $ts) < 777600 else 1)" 2>/dev/null; then # 9d
-add_check "weekly-backup-freshness" "ok" "Last run ${age_h}h ago"
+add_check "daily-backup-freshness" "ok" "Last run ${age_h}h ago"
 else
-add_check "weekly-backup-freshness" "fail" "Weekly backup stale: ${age_h}h ago (threshold: 9d)"
+add_check "daily-backup-freshness" "fail" "Daily backup stale: ${age_h}h ago (threshold: 9d)"
 fi
 }
 
-check_weekly_backup_status() {
-if $DRY_RUN; then add_check "weekly-backup-status" "ok" "DRY RUN"; return; fi
-if ! $PVE_REACHABLE; then add_check "weekly-backup-status" "fail" "PVE unreachable"; return; fi
+check_daily_backup_status() {
+if $DRY_RUN; then add_check "daily-backup-status" "ok" "DRY RUN"; return; fi
+if ! $PVE_REACHABLE; then add_check "daily-backup-status" "fail" "PVE unreachable"; return; fi
 
 local status
-status=$($PVE_SSH "curl -s http://10.0.20.100:30091/metrics 2>/dev/null | grep '^weekly_backup_last_status' | head -1 | awk '{print \$2}'" 2>/dev/null) || true
+status=$($PVE_SSH "curl -s http://10.0.20.100:30091/metrics 2>/dev/null | grep '^daily_backup_last_status' | head -1 | awk '{print \$2}'" 2>/dev/null) || true
 
 if [ "$status" = "0" ] || [ "$status" = "0.0" ]; then
-add_check "weekly-backup-status" "ok" "Last weekly backup succeeded"
+add_check "daily-backup-status" "ok" "Last weekly backup succeeded"
 elif [ -z "$status" ]; then
-add_check "weekly-backup-status" "warn" "No status metric found"
+add_check "daily-backup-status" "warn" "No status metric found"
 else
-add_check "weekly-backup-status" "fail" "Last weekly backup failed (status=$status)"
+add_check "daily-backup-status" "fail" "Last weekly backup failed (status=$status)"
 fi
 }
 
-check_weekly_backup_timer() {
-if $DRY_RUN; then add_check "weekly-backup-timer" "ok" "DRY RUN"; return; fi
-if ! $PVE_REACHABLE; then add_check "weekly-backup-timer" "fail" "PVE unreachable"; return; fi
+check_daily_backup_timer() {
+if $DRY_RUN; then add_check "daily-backup-timer" "ok" "DRY RUN"; return; fi
+if ! $PVE_REACHABLE; then add_check "daily-backup-timer" "fail" "PVE unreachable"; return; fi
 
 local active enabled
-active=$($PVE_SSH "systemctl is-active weekly-backup.timer 2>/dev/null" 2>/dev/null) || active="unknown"
-enabled=$($PVE_SSH "systemctl is-enabled weekly-backup.timer 2>/dev/null" 2>/dev/null) || enabled="unknown"
+active=$($PVE_SSH "systemctl is-active daily-backup.timer 2>/dev/null" 2>/dev/null) || active="unknown"
+enabled=$($PVE_SSH "systemctl is-enabled daily-backup.timer 2>/dev/null" 2>/dev/null) || enabled="unknown"
 
 if [ "$active" = "active" ] && [ "$enabled" = "enabled" ]; then
-add_check "weekly-backup-timer" "ok" "Timer active and enabled"
+add_check "daily-backup-timer" "ok" "Timer active and enabled"
 else
-add_check "weekly-backup-timer" "fail" "Timer: active=$active enabled=$enabled"
+add_check "daily-backup-timer" "fail" "Timer: active=$active enabled=$enabled"
 if $FIX; then
-$PVE_SSH "systemctl enable --now weekly-backup.timer" 2>/dev/null && \
-add_check "weekly-backup-timer-fix" "ok" "AUTO-FIX: Timer re-enabled" || \
-add_check "weekly-backup-timer-fix" "fail" "AUTO-FIX: Failed to re-enable timer"
+$PVE_SSH "systemctl enable --now daily-backup.timer" 2>/dev/null && \
+add_check "daily-backup-timer-fix" "ok" "AUTO-FIX: Timer re-enabled" || \
+add_check "daily-backup-timer-fix" "fail" "AUTO-FIX: Failed to re-enable timer"
 fi
 fi
 }
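The renamed freshness check keeps the `777600` second threshold (`# 9d`), which was sized for a Sunday-only schedule. With a daily timer, the check would stay green through more than a week of missed runs. The sketch below evaluates freshness with `awk` instead of `python3` (awk parses float values such as `1.7e+09` directly). It uses an assumed 36 h (`129600` s) window, which fails about 12 h after a missed run. The constant and helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the current health-check script.

# Assumed daily threshold: 36 h tolerates a late run but flags a missed day.
DAILY_MAX_AGE_S=129600

# Returns 0 if (now - ts) < max_age_s, 1 otherwise (including an empty ts).
is_fresh() {
    local now="$1" ts="$2" max_age_s="$3"
    [ -n "$ts" ] || return 1
    awk -v now="$now" -v ts="$ts" -v max="$max_age_s" \
        'BEGIN { exit !((now - ts) < max) }'
}

# Age in hours with one decimal, matching the "${age_h}h ago" messages.
age_hours() {
    awk -v now="$1" -v ts="$2" 'BEGIN { printf "%.1f\n", (now - ts) / 3600 }'
}
```

In the check itself this would read `if is_fresh "$now" "$ts" "$DAILY_MAX_AGE_S"; then ...`, with the failure message reporting `(threshold: 36h)` so the text matches the number.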
@@ -529,9 +529,9 @@ check_lvm_thinpool_free
 check_lvm_snapshot_timer
 
 # Layer 2: Weekly Backup (sda)
-check_weekly_backup_freshness
-check_weekly_backup_status
-check_weekly_backup_timer
+check_daily_backup_freshness
+check_daily_backup_status
+check_daily_backup_timer
 check_sda_mount
 check_sda_disk_usage
 check_pvc_data_freshness
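These call sites only invoke the check functions. `add_check`, which collects each result, is defined elsewhere in the script and is not part of this diff. The minimal, hypothetical stand-in below uses the same `name status message` argument shape to show how the Layer 2 results could be folded into a single exit code.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for add_check; the real definition is not in this diff.
declare -a CHECK_LINES=()
FAIL_COUNT=0

# add_check <name> <ok|warn|fail> <message>
add_check() {
    local name="$1" status="$2" message="$3"
    CHECK_LINES+=("$(printf '%-5s %s: %s' "$status" "$name" "$message")")
    if [ "$status" = "fail" ]; then
        FAIL_COUNT=$((FAIL_COUNT + 1))
    fi
}

# Print all recorded results; return non-zero if any check failed.
report_checks() {
    printf '%s\n' "${CHECK_LINES[@]}"
    [ "$FAIL_COUNT" -eq 0 ]
}
```

With this shape, a `warn` (such as a missing status metric) is reported but does not fail the run, while any `fail` does.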