infra/docs/runbooks/restore-vaultwarden.md
Viktor Barzin af2222fce8 backup & DR: add alerting, fix rotation, secure MySQL password, add runbooks
Phase 1: Add 12 PrometheusRules for backup health alerting
- PostgreSQL, MySQL, Vault, Vaultwarden, Redis staleness + never-succeeded alerts
- CSIDriverCrashLoop alert for nfs-csi/iscsi-csi namespaces
- Generic BackupCronJobFailed alert

Phase 2: Fix backup rotation
- etcd: timestamped snapshots instead of overwriting single file
- Redis: timestamped RDB files with 7-day retention purge
- PostgreSQL: retention increased from 7 to 14 days

Phase 3: Fix MySQL password exposure
- Move root password from command line arg to MYSQL_PWD env var via secretKeyRef

Phase 5: Add restore runbooks
- PostgreSQL, MySQL, Vault, etcd, Vaultwarden, full cluster rebuild
2026-03-19 20:34:33 +00:00

Restore Vaultwarden

Prerequisites

  • kubectl access to the cluster
  • Backup available on NFS at /mnt/main/vaultwarden-backup/

Backup Location

  • NFS: /mnt/main/vaultwarden-backup/YYYY_MM_DD_HH_MM/ (directory per backup)
  • Each backup contains: db.sqlite3, rsa_key.pem, rsa_key.pub.pem, attachments/, sends/, config.json
  • Replicated to Synology NAS (192.168.1.13) via TrueNAS ZFS replication
  • Retention: 30 days
  • Schedule: Daily at 00:00
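
Because each backup directory is named YYYY_MM_DD_HH_MM, sorting the names lexicographically also sorts them chronologically. The following is a minimal sketch of picking the newest backup that way. `latest_backup` is a hypothetical helper, not part of the existing tooling, and it assumes the backup root is readable from wherever it runs (for example the NFS path above, or `/backup` inside a helper pod).

```shell
#!/bin/sh
# latest_backup ROOT
# Hypothetical helper: prints the name of the newest backup directory under ROOT.
# Assumes directory names follow YYYY_MM_DD_HH_MM, so lexicographic order
# equals chronological order. Prints nothing if no backup directory is found.
latest_backup() {
  root="$1"
  for d in "$root"/[0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]_[0-9][0-9]/; do
    # Skip the unexpanded pattern when nothing matches.
    [ -d "$d" ] || continue
    basename "$d"
  done | sort | tail -n 1
}
```

For example, `BACKUP_DIR=$(latest_backup /mnt/main/vaultwarden-backup)` would select the most recent directory. Entries that do not match the naming pattern are ignored.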

Backup Contents

| File | Purpose | Critical? |
|---|---|---|
| db.sqlite3 | All passwords, TOTP seeds, org data | Yes |
| rsa_key.pem / rsa_key.pub.pem | JWT signing keys | Yes: without these, all sessions are invalidated |
| attachments/ | File attachments on vault items | Yes |
| sends/ | Bitwarden Send files | No |
| config.json | Server configuration | No: can be recreated |
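
Before restoring, it can be worth confirming that the chosen backup actually contains the critical files listed above. The following is a sketch under stated assumptions: `check_backup` is a hypothetical helper, and treating a missing attachments/ directory as a warning rather than an error is an assumption (a vault with no attachments may not have one).

```shell
#!/bin/sh
# check_backup DIR
# Hypothetical helper: returns 0 if DIR contains a non-empty db.sqlite3 and both
# RSA key files, 1 otherwise. Problems are reported on stderr.
check_backup() {
  dir="$1"
  missing=0
  for f in db.sqlite3 rsa_key.pem rsa_key.pub.pem; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  # Assumption: attachments/ may legitimately be absent, so only warn about it.
  [ -d "$dir/attachments" ] || echo "note: no attachments/ directory in $dir" >&2
  return "$missing"
}
```

A typical call would be `check_backup "/backup/$BACKUP_DIR" || echo "do not restore from this backup"`.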

Restore Procedure

1. Identify the backup to restore

# List available backups (directories sorted by date)
kubectl run vw-ls --rm -it --image=alpine \
  --overrides='{"spec":{"volumes":[{"name":"backup","persistentVolumeClaim":{"claimName":"vaultwarden-backup"}}],"containers":[{"name":"vw-ls","image":"alpine","volumeMounts":[{"name":"backup","mountPath":"/backup"}],"command":["ls","-lt","/backup/"]}]}}' \
  -n vaultwarden

2. Scale down Vaultwarden

kubectl scale deployment vaultwarden -n vaultwarden --replicas=0
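
Scaling to zero returns before the pod has actually terminated, and the database should not be overwritten while Vaultwarden still has it open. The following is a minimal sketch of waiting until no matching pods remain. `wait_for_no_pods` is a hypothetical helper. The `app=vaultwarden` label is taken from the commands later in this runbook, and the `KUBECTL` and `POLL_INTERVAL` variables are illustrative overrides.

```shell
#!/bin/sh
# wait_for_no_pods NAMESPACE SELECTOR [TRIES]
# Hypothetical helper: polls until no pods match SELECTOR in NAMESPACE.
# Returns 0 when none remain, 1 on timeout, 2 if kubectl itself fails.
wait_for_no_pods() {
  ns="$1"
  selector="$2"
  tries="${3:-30}"
  while [ "$tries" -gt 0 ]; do
    # "-o name" prints one line per pod and nothing when there are none.
    pods=$(${KUBECTL:-kubectl} get pods -n "$ns" -l "$selector" -o name 2>/dev/null) || return 2
    [ -z "$pods" ] && return 0
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-2}"
  done
  return 1
}
```

Usage: `wait_for_no_pods vaultwarden app=vaultwarden` before moving on to the restore step.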

3. Restore the backup

BACKUP_DIR="YYYY_MM_DD_HH_MM"  # Set to the directory name chosen in step 1

# Copy the database, RSA keys, and attachments from the backup PVC to the data PVC.
# Stale SQLite WAL/SHM files are removed first so they are not replayed against the restored database.
# "set -e" aborts on the first failed copy, so "Restore complete" is printed only on success.
kubectl run vw-restore --rm -it --image=alpine \
  --overrides='{"spec":{"volumes":[{"name":"backup","persistentVolumeClaim":{"claimName":"vaultwarden-backup"}},{"name":"data","persistentVolumeClaim":{"claimName":"vaultwarden-data"}}],"containers":[{"name":"vw-restore","image":"alpine","volumeMounts":[{"name":"backup","mountPath":"/backup"},{"name":"data","mountPath":"/data"}],"command":["/bin/sh","-c","set -e; B=/backup/'$BACKUP_DIR'; rm -f /data/db.sqlite3-wal /data/db.sqlite3-shm; cp $B/db.sqlite3 /data/db.sqlite3; cp $B/rsa_key.pem $B/rsa_key.pub.pem /data/; if [ -d $B/attachments ]; then cp -a $B/attachments /data/; fi; echo Restore complete"]}]}}' \
  -n vaultwarden
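
Before scaling back up, the copied files can be compared byte for byte against the backup. This only makes sense while Vaultwarden is stopped, because db.sqlite3 changes once the server starts. The following is a sketch: `verify_restore` is a hypothetical helper that assumes both volumes are mounted in the same container, as in the restore pod above (`/backup/$BACKUP_DIR` and `/data`).

```shell
#!/bin/sh
# verify_restore BACKUP_DIR DATA_DIR
# Hypothetical helper: compares the critical files in the backup with the
# restored copies. Returns 0 only if all of them are identical.
verify_restore() {
  src="$1"
  dst="$2"
  rc=0
  for f in db.sqlite3 rsa_key.pem rsa_key.pub.pem; do
    if cmp -s "$src/$f" "$dst/$f"; then
      echo "ok: $f"
    else
      echo "MISMATCH: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

`cmp -s` is used rather than checksums because it is available in the Alpine image's BusyBox and stops at the first differing byte.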

4. Scale up Vaultwarden

kubectl scale deployment vaultwarden -n vaultwarden --replicas=1

# Wait for the rollout to finish (unlike kubectl wait on a label selector, this does not fail if the pod has not been created yet)
kubectl rollout status deployment/vaultwarden -n vaultwarden --timeout=120s

5. Verify restoration

# Check pod logs for startup errors
kubectl logs -n vaultwarden -l app=vaultwarden --tail=20

# Test web UI access
curl -s -o /dev/null -w "%{http_code}" https://vaultwarden.viktorbarzin.me/
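
The ingress may take a few seconds to route traffic after the pod becomes ready, so a single curl can report a transient error. The following is a minimal retry sketch. `wait_for_http_200` is a hypothetical helper, treating HTTP 200 as the success condition is an assumption about how the web UI responds at `/`, and `CURL` and `POLL_INTERVAL` are illustrative overrides.

```shell
#!/bin/sh
# wait_for_http_200 URL [TRIES]
# Hypothetical helper: retries until URL answers with HTTP 200.
# Returns 0 on success, 1 if every attempt failed.
wait_for_http_200() {
  url="$1"
  tries="${2:-10}"
  while [ "$tries" -gt 0 ]; do
    # On connection failure, fall back to code 000.
    code=$(${CURL:-curl} -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code=000
    [ "$code" = "200" ] && return 0
    echo "got HTTP $code, retrying" >&2
    tries=$((tries - 1))
    sleep "${POLL_INTERVAL:-5}"
  done
  return 1
}
```

Usage: `wait_for_http_200 https://vaultwarden.viktorbarzin.me/ && echo "web UI reachable"`.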

6. Test login

Log in to the Vaultwarden web UI and verify:

  • Can log in with your account
  • Vault items are present and readable
  • Attachments are accessible
  • TOTP codes are generating correctly

Estimated Time

  • Restore: ~5 minutes
  • Verification: ~5 minutes