3 commits

Author SHA1 Message Date
Viktor Barzin
82f674a0b4 rename weekly-backup → daily-backup across scripts, timers, services, and docs [ci skip]
Reflects the schedule change from weekly to daily. All references updated:
- scripts/weekly-backup.{sh,timer,service} → daily-backup.*
- Pushgateway job name: weekly-backup → daily-backup
- Prometheus metric names: weekly_backup_* → daily_backup_*
- All docs, runbooks, AGENTS.md, CLAUDE.md, proxmox-inventory
- offsite-sync dependency: After=daily-backup.service

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:37:04 +00:00
Viktor Barzin
1c7998163d extend backup-verify.sh with full 3-2-1 health inspection
18 checks covering all backup layers:
- Layer 1: LVM snapshot freshness/status/count, thin pool free, timer
- Layer 2: Weekly backup freshness/status/timer, sda mount/usage,
  PVC data/NFS mirror/pfsense freshness
- Layer 3: Offsite sync freshness/status/timer
- DB CronJobs: age check with per-service thresholds
- CNPG backups: existing check preserved
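The freshness checks listed above all reduce to comparing the age of a last-success timestamp against a threshold. A minimal sketch of that idea follows; the function name `check_freshness`, its arguments, and its output format are assumptions for illustration and are not taken from backup-verify.sh.

```shell
#!/bin/sh
# Hypothetical helper (not from backup-verify.sh): report whether a
# last-success timestamp is within a per-check age threshold.
check_freshness() {
  last_run_epoch="$1"          # epoch seconds of the last successful run
  max_age_seconds="$2"         # threshold for this particular check
  now_epoch="${3:-$(date +%s)}" # overridable so the check can be tested
  age=$(( now_epoch - last_run_epoch ))
  if [ "$age" -le "$max_age_seconds" ]; then
    echo "OK age=${age}s"
  else
    echo "STALE age=${age}s max=${max_age_seconds}s"
    return 1
  fi
}
```

Passing the threshold per call is one way to support per-service limits such as those mentioned for the DB CronJobs; the actual thresholds are not given in the commit message.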

New --fix flag for conservative auto-remediation:
- Re-enable disabled timers
- Clear stale lockfiles
- Mount /mnt/backup if unmounted
2026-04-09 22:48:49 +01:00
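The conservative `--fix` behaviour described in commit 1c7998163d could be structured roughly as below: report by default, and only run a remediation command when the flag is given. This is a sketch under assumptions; the `remediate` wrapper, the `FIX` variable, and the message wording are hypothetical and not taken from the script.

```shell
#!/bin/sh
# Hypothetical sketch (not from backup-verify.sh): gate every remediation
# behind an explicit --fix flag so the default run is read-only.
FIX="${FIX:-0}"
for arg in "$@"; do
  case "$arg" in
    --fix) FIX=1 ;;
  esac
done

remediate() {
  description="$1"
  shift
  if [ "$FIX" = "1" ]; then
    echo "FIXING: $description"
    "$@"   # run the remediation command exactly as passed
  else
    echo "WOULD FIX: $description (re-run with --fix)"
  fi
}

# Example calls, left commented out because they change system state:
# remediate "re-enable backup timer" systemctl enable --now daily-backup.timer
# remediate "mount /mnt/backup" mount /mnt/backup
```

Routing every fix through one wrapper keeps the read-only and remediating code paths identical up to the final command, which makes it easier to see what `--fix` would do before enabling it.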
Viktor Barzin
ff83ec3325 add infrastructure agent team: 8 specialized agents + 14 diagnostic scripts
Agents: devops-engineer, dba, security-engineer, sre, network-engineer,
platform-engineer, observability-engineer, home-automation-engineer.
Scripts: deploy-status, db-health, backup-verify, tls-check, crowdsec-status,
authentik-audit, oom-investigator, resource-report, dns-check, network-health,
nfs-health, truenas-status, platform-status, monitoring-health.
Also: known-issues.md suppression list, cluster-health-checker port-forward fix.
2026-03-15 02:01:07 +00:00