Viktor Barzin af2222fce8 backup & DR: add alerting, fix rotation, secure MySQL password, add runbooks

Phase 1: Add 12 PrometheusRules for backup health alerting
- PostgreSQL, MySQL, Vault, Vaultwarden, Redis staleness + never-succeeded alerts
- CSIDriverCrashLoop alert for nfs-csi/iscsi-csi namespaces
- Generic BackupCronJobFailed alert

Phase 2: Fix backup rotation
- etcd: timestamped snapshots instead of overwriting single file
- Redis: timestamped RDB files with 7-day retention purge
- PostgreSQL: retention increased from 7 to 14 days

Phase 3: Fix MySQL password exposure
- Move root password from command line arg to MYSQL_PWD env var via secretKeyRef

Phase 5: Add restore runbooks
- PostgreSQL, MySQL, Vault, etcd, Vaultwarden, full cluster rebuild

2026-03-19 20:34:33 +00:00


Restore Vaultwarden

Prerequisites

kubectl access to the cluster
Backup available on NFS at /mnt/main/vaultwarden-backup/

Backup Location

NFS: /mnt/main/vaultwarden-backup/YYYY_MM_DD_HH_MM/ (directory per backup)
Each backup contains: db.sqlite3, rsa_key.pem, rsa_key.pub.pem, attachments/, sends/, config.json
Replicated to Synology NAS (192.168.1.13) via TrueNAS ZFS replication
Retention: 30 days
Schedule: Daily at 00:00

Backup Contents

| File | Purpose | Critical? |
| --- | --- | --- |
| `db.sqlite3` | All passwords, TOTP seeds, org data | Yes |
| `rsa_key.pem` / `rsa_key.pub.pem` | JWT signing keys | Yes; without these, all existing sessions are invalidated |
| `attachments/` | File attachments on vault items | Yes |
| `sends/` | Bitwarden Send files | No |
| `config.json` | Server configuration | No; can be recreated |
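Before restoring, it can help to confirm that the chosen backup directory actually holds the files marked critical above. The following is a minimal sketch, assuming it runs somewhere the backup is mounted. The function name `check_backup_complete` is hypothetical and not part of the existing tooling, and treating a missing `attachments/` directory as a warning rather than an error is an assumption (a vault with no attachments may not have one).

```shell
#!/bin/sh
# Hypothetical helper: verify that a backup directory contains the files
# marked as critical in the table above. Returns non-zero if any is
# missing or empty.
check_backup_complete() {
  dir="$1"
  missing=0
  for f in db.sqlite3 rsa_key.pem rsa_key.pub.pem; do
    if [ ! -s "$dir/$f" ]; then
      echo "MISSING or empty: $dir/$f" >&2
      missing=1
    fi
  done
  # Assumption: a vault without attachments may lack this directory,
  # so only warn instead of failing.
  if [ ! -d "$dir/attachments" ]; then
    echo "WARNING: no attachments/ directory in $dir" >&2
  fi
  return "$missing"
}
```

For example, `check_backup_complete /backup/YYYY_MM_DD_HH_MM` inside a pod that mounts the backup volume at `/backup`.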

Restore Procedure

1. Identify the backup to restore

# List available backups (newest first, by modification time)
# --restart=Never runs the pod once instead of restarting it after ls exits
kubectl run vw-ls --rm -it --restart=Never --image=alpine \
  --overrides='{"spec":{"volumes":[{"name":"backup","persistentVolumeClaim":{"claimName":"vaultwarden-backup"}}],"containers":[{"name":"vw-ls","image":"alpine","volumeMounts":[{"name":"backup","mountPath":"/backup"}],"command":["ls","-lt","/backup/"]}]}}' \
  -n vaultwarden
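Backup directory names are zero-padded `YYYY_MM_DD_HH_MM` strings, so a plain lexicographic sort also orders them chronologically, independent of file modification times. A small sketch of picking the most recent one from a listing; `latest_backup` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read directory names on stdin, keep only those that
# match YYYY_MM_DD_HH_MM, and print the most recent one.
latest_backup() {
  grep -E '^[0-9]{4}(_[0-9]{2}){4}$' | sort | tail -n 1
}
```

It could be fed from a plain `ls /backup/` run in the same kind of helper pod; entries that do not match the naming pattern are ignored.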

2. Scale down Vaultwarden

kubectl scale deployment vaultwarden -n vaultwarden --replicas=0
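The database should not be overwritten while a Vaultwarden pod still has it open, so it may be worth waiting for the old pod to terminate before restoring. A sketch, assuming the pods carry the `app=vaultwarden` label used elsewhere in this runbook; the function name and the 60-second timeout are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: block until no Vaultwarden pod remains, or time out.
wait_for_vaultwarden_stopped() {
  kubectl wait --for=delete pod -l app=vaultwarden -n vaultwarden --timeout=60s
}
```

If the pod is already gone by the time the command runs, some kubectl versions report that no matching resources were found; that outcome also means it is safe to continue.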

3. Restore the backup

BACKUP_DIR="YYYY_MM_DD_HH_MM"  # Set to the backup directory chosen in step 1

# Copy the critical files into the data volume. Stale SQLite WAL/SHM files are
# removed first so they are not replayed against the restored database.
# "Restore complete" is printed only if every required copy succeeds;
# attachments/ is copied only when the backup contains it.
kubectl run vw-restore --rm -it --restart=Never --image=alpine \
  --overrides='{"spec":{"volumes":[{"name":"backup","persistentVolumeClaim":{"claimName":"vaultwarden-backup"}},{"name":"data","persistentVolumeClaim":{"claimName":"vaultwarden-data"}}],"containers":[{"name":"vw-restore","image":"alpine","volumeMounts":[{"name":"backup","mountPath":"/backup"},{"name":"data","mountPath":"/data"}],"command":["/bin/sh","-c","set -e; cd /backup/'"$BACKUP_DIR"'; rm -f /data/db.sqlite3-wal /data/db.sqlite3-shm; cp db.sqlite3 rsa_key.pem rsa_key.pub.pem /data/; if [ -d attachments ]; then cp -a attachments /data/; fi; echo Restore complete"]}]}}' \
  -n vaultwarden
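A truncated copy of `db.sqlite3` would otherwise only surface later as a startup error, so one option is to compare checksums of the backup and restored files while both volumes are mounted. A minimal sketch, assuming `sha256sum` is available (Alpine's BusyBox provides it); `same_checksum` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both files exist and have the same
# SHA-256 digest. Reading via stdin keeps file names out of the output,
# so the two digests can be compared as plain strings.
same_checksum() {
  a=$(sha256sum < "$1") || return 1
  b=$(sha256sum < "$2") || return 1
  [ "$a" = "$b" ]
}
```

Inside a pod with the same `/backup` and `/data` mounts as the restore pod, this could be used as `same_checksum "/backup/$BACKUP_DIR/db.sqlite3" /data/db.sqlite3 && echo "db.sqlite3 verified"`.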

4. Scale up Vaultwarden

kubectl scale deployment vaultwarden -n vaultwarden --replicas=1

# Wait for the rollout to finish (also works if the new pod has not been created yet)
kubectl rollout status deployment/vaultwarden -n vaultwarden --timeout=120s

5. Verify restoration

# Check pod logs for startup errors
kubectl logs -n vaultwarden -l app=vaultwarden --tail=20

# Test web UI access (expect HTTP 200)
curl -s -o /dev/null -w "%{http_code}\n" https://vaultwarden.viktorbarzin.me/
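The web UI may take a few seconds to answer after the pod starts, so a single request can return a non-200 code. A sketch of a small retry loop around the same `curl` call; the function name, attempt count, and delay are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it returns HTTP 200 or attempts run out.
# Usage: wait_for_http_200 URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http_200() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # curl prints 000 when no HTTP response was received at all
    code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
    if [ "$code" = "200" ]; then
      echo "OK after $i attempt(s)"
      return 0
    fi
    echo "Attempt $i: HTTP $code" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example: `wait_for_http_200 https://vaultwarden.viktorbarzin.me/ 10 3`.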

6. Test login

You can log in with your account
Vault items are present and readable
Attachments are accessible
TOTP codes are generating correctly

Estimated Time

Restore: ~5 minutes
Verification: ~5 minutes
