Commit graph

4 commits

Author SHA1 Message Date
Viktor Barzin
fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:45:33 +00:00
Viktor Barzin
6d224861c4 stem95su: scheduled Drive->site sync CronJob (every 10m)
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.

Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 08:42:26 +00:00
Viktor Barzin
37d88ce50e nfs-mirror: weekly Mon 04:00 → daily 02:00
Steady-state delta runs in 10-20 min and the weekly cadence left a
real RPO gap: app data under /srv/nfs/<svc>/ that isn't a PVC
(captured by daily-backup) or a *-backup CronJob (captured daily by
the CronJob writing to /srv/nfs/<svc>-backup/) was on a 7-day worst
case for off-disk durability. Affected paths include nextcloud shared
files, audiobookshelf library, mailserver Maildir, calibre, servarr
metadata, real-estate-crawler scraped data, openclaw agent state.
Daily cadence drops their RPO to ~24h at negligible cost.

Slot: 02:00, 3h ahead of daily-backup (05:00) so the manifest is
populated before offsite-sync reads it at 06:00.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 19:00:10 +00:00
Viktor Barzin
4d756be4f5 backup: consolidate to one local-mirror script + invert offsite filter
Some checks failed
ci/woodpecker/push/build-cli Pipeline failed
ci/woodpecker/push/default Pipeline failed
Before this commit, the in-flight design split anca-elements (its own
mirror script + timer) from the rest of /srv/nfs (still going to
Synology via inotify-tracked offsite-sync). It also meant Synology
received some bytes via both paths (sda → Synology AND direct NFS →
Synology), which doubled consumption.

This commit collapses both into a clean 3-2-1:

  Copy 1 (sdc):       live /srv/nfs/* + cluster block PVCs
  Copy 2 (sda):       /mnt/backup/{pvc-data,sqlite-backup,pfsense,
                                   pve-config,<critical-nfs>/}
                      ← daily-backup + nfs-mirror (one script each)
  Copy 3 (Synology):  /Backup/Viki/{pve-backup,nfs,nfs-ssd}
                      ← offsite-sync-backup Step 1 (sda → Synology)
                        + Step 2 (sda-BYPASS paths only → Synology direct)

scripts/nfs-mirror.{sh,service,timer}:
  New consolidated weekly mirror. Replaces anca-elements-mirror (to be
  removed in a follow-up after the current in-flight rsync completes,
  parity-verified, and Synology source-of-truth is deleted). Single
  rsync /srv/nfs/ → /mnt/backup/ with an explicit EXCLUDES list that
  drops paths not worth a local 2nd copy: immich (1.2T — too big),
  frigate (14d ring), prometheus/loki (rebuildable), ollama/llamacpp/
  audiblez/ebook2audiobook (re-fetchable), *-backup (already backups),
  temp/alertmanager (transient). Nice=10, IOSchedulingClass=idle.

scripts/offsite-sync-backup.sh:
  Step 2 (NFS → Synology) filter inverted: instead of `--exclude=
  anca-elements/`, it now `--include`s only the sda-BYPASS paths
  (immich, frigate, prometheus, *-backup, …). The bypass-include
  regex MUST stay in lockstep with nfs-mirror's EXCLUDES — they are
  complementary and any drift creates either gaps or duplication on
  Synology. Comment in the script flags this.

monitoring alerts: renamed AncaElementsMirror{Stale,Failing} to
NfsMirror{Stale,Failing} matching the new metric job name
`nfs-mirror`. Thresholds unchanged.

docs/architecture/backup-dr.md: rewritten Step 1/Step 2 sections and
added the bypass-list rationale + cross-reference between scripts.

NOT YET DEPLOYED — gated on the in-flight anca-elements-mirror rsync
finishing + parity verification + Synology /volume1/Backup/Anca/
Elements deletion. The old scripts (anca-elements-{mirror,sync.sh})
remain on the PVE host until then, and will be removed in a cleanup
commit.
2026-05-24 12:49:20 +00:00