[ci skip] add storage reliability design: DB replication + SQLite consolidation
parent 2bbf52ac9b
commit 6381bcee40
1 changed file with 224 additions and 0 deletions

docs/plans/2026-02-28-storage-reliability-design.md
@@ -0,0 +1,224 @@

# Storage Reliability: Database Replication + SQLite Consolidation

**Date**: 2026-02-28
**Status**: Approved
**Goal**: Eliminate data corruption risk from NFS outages by moving databases off NFS

## Problem

All 70+ services store data on a single TrueNAS VM (10.0.10.15) via NFS. When this VM crashes or hangs:

- **22 services** risk **data corruption** (databases with WAL/fsync requirements on NFS)
- **12 services** experience downtime but no corruption (media, configs)
- The shared PostgreSQL alone backs 12 services; a single NFS hiccup can corrupt data for all of them

SQLite over NFS is fundamentally unsafe: advisory file locking is unreliable on NFS, and WAL mode depends on shared memory, which only works when every process accessing the database runs on the same host.

## Constraints

- Zero cost: all self-hosted, OSS
- Must preserve backup workflow (consolidate to TrueNAS → rsync to backup NAS)
- Stop-and-verify after each service migration
- No data loss tolerance

## Design

### Strategy Overview

```
BEFORE: All services → NFS (TrueNAS VM) → single point of failure

AFTER:  Databases     → local disk + native replication (HA, proper fsync)
        SQLite apps   → migrated to shared PostgreSQL where possible
        Media/configs → NFS (TrueNAS, non-critical path)
        Backups       → all consolidate to NFS → rsync to backup NAS
```

### Component 1: PostgreSQL HA via CloudNativePG

**Current**: Single PostgreSQL 16 pod on NFS (`/mnt/main/postgresql/data`)
**Target**: CloudNativePG operator with 3-instance cluster on local disk

CloudNativePG is a CNCF K8s operator that manages PostgreSQL clusters with:
- Automatic primary/replica failover
- Streaming replication across nodes
- Continuous WAL archiving (to NFS for backup)
- Local PVCs for data (proper fsync)

Architecture:
```
CloudNativePG Cluster
├── Primary (node A): local PVC
├── Replica (node B): local PVC, streaming replication
├── Replica (node C): local PVC, streaming replication
└── WAL archive → NFS:/mnt/main/postgresql-wal-archive/ (backup only)
```

Dependent services (unchanged connection, new reliable backend):
authentik, n8n, dawarich, tandoor, linkwarden, netbox, woodpecker,
rybbit, affine, health, resume, trading-bot

Resource overhead: ~3GB RAM total, ~50GB local disk per node
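
As a rough illustration of the target, the sketch below builds a minimal CloudNativePG `Cluster` manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. The cluster name `pg-main` and namespace `databases` are hypothetical, and the `50Gi` size and `local-path` StorageClass are assumptions taken from the resource estimate and Phase 0. WAL archiving and pgvector are deliberately left out.

```python
import json


def cnpg_cluster_manifest(name, namespace, instances=3,
                          storage_size="50Gi", storage_class="local-path"):
    """Build a minimal CloudNativePG Cluster manifest as a plain dict.

    The name, namespace, size and storage class are illustrative
    assumptions, not values fixed by this design.
    """
    return {
        "apiVersion": "postgresql.cnpg.io/v1",
        "kind": "Cluster",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # One primary plus (instances - 1) streaming replicas.
            "instances": instances,
            # Each instance gets its own PVC from the local StorageClass.
            "storage": {"size": storage_size, "storageClass": storage_class},
        },
    }


manifest = cnpg_cluster_manifest("pg-main", "databases")
print(json.dumps(manifest, indent=2))
```

Generating the manifest from code is only a convenience here; a hand-written YAML file with the same fields would work equally well.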

### Component 2: MySQL HA (or Migration to PostgreSQL)

**Current**: Single MySQL pod on NFS (`/mnt/main/mysql`)
**Target**: Either MySQL Operator (InnoDB Cluster) or migrate MySQL services to PostgreSQL

Services on MySQL: hackmd, speedtest, onlyoffice, crowdsec,
paperless-ngx, real-estate-crawler, url-shortener, grafana

Many of these support PostgreSQL. Consolidating to one DB engine
reduces operational complexity. Evaluate per-service during implementation.

### Component 3: Redis HA via Sentinel

**Current**: Single redis-stack pod on NFS (`/mnt/main/redis`)
**Target**: Redis Sentinel (3 instances) with data on local disk

Architecture:
```
Redis Sentinel (3 instances)
├── Primary (node A): local PVC, RDB + AOF persistence
├── Replica (node B): local PVC
├── Replica (node C): local PVC
└── Sentinel monitors (3): automatic failover
```

Resource overhead: ~1.5GB RAM total, ~2GB local disk per node
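
The choice of three Sentinel monitors follows from majority arithmetic: a failover has to be authorized by a majority of the known Sentinels. The sketch below works that out; the function names are illustrative only.

```python
def sentinel_majority(sentinels: int) -> int:
    """Smallest number of Sentinels that forms a strict majority."""
    return sentinels // 2 + 1


def tolerated_sentinel_failures(sentinels: int) -> int:
    """How many Sentinels can be down while a majority is still reachable."""
    return sentinels - sentinel_majority(sentinels)


for n in (1, 2, 3, 5):
    print(n, sentinel_majority(n), tolerated_sentinel_failures(n))
```

With three Sentinels the majority is two, so one node can fail and failover still proceeds. Two Sentinels tolerate no failures at all, which is why three is the practical minimum.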

### Component 4: Immich PostgreSQL

**Current**: Dedicated PostgreSQL + pgvector on NFS
**Target**: Migrate to CloudNativePG cluster (separate database in same cluster, or dedicated cluster with pgvector extension)

### Component 5: ClickHouse HA (Rybbit)

**Current**: Single ClickHouse on NFS (`/mnt/main/clickhouse`)
**Target**: ClickHouse with native replication via ClickHouse Keeper, or accept risk (analytics data, rebuildable)

### Component 6: SQLite App Consolidation to PostgreSQL

Apps that support PostgreSQL (migrate to shared CloudNativePG cluster):

| App | Config mechanism | Priority |
|-----|------------------|----------|
| Vaultwarden | `DATABASE_URL` env var | P0 (password vault) |
| Headscale | `db_type: postgres` in config | P0 (VPN) |
| Forgejo | `[database]` section in app.ini | P1 |
| Open WebUI | `DATABASE_URL` env var | P2 |
| Meshcentral | config.json `db` section | P2 |
| FreshRSS | `db` config | P2 |
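
For the apps configured through `DATABASE_URL`, the value is a standard PostgreSQL connection URL. The sketch below assembles one and percent-encodes the credentials, since characters such as `@` or `/` in a password would otherwise break URL parsing. The host `pg-main-rw.databases.svc` assumes a hypothetical cluster named `pg-main` in a `databases` namespace (CloudNativePG exposes the primary through a `<cluster>-rw` service); the user, password and database name are placeholders.

```python
from urllib.parse import quote


def postgres_url(user, password, host, database, port=5432):
    """Build a postgresql:// URL with percent-encoded credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Placeholder values for illustration only.
url = postgres_url("vaultwarden", "p@ss/word",
                   "pg-main-rw.databases.svc", "vaultwarden")
print(url)
# postgresql://vaultwarden:p%40ss%2Fword@pg-main-rw.databases.svc:5432/vaultwarden
```

In practice the resulting string would live in a Kubernetes Secret rather than in a manifest or shell history.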

Apps stuck on SQLite or another embedded store (accept risk or use Litestream for backup):

| App | Storage engine | Mitigation |
|-----|----------------|------------|
| Uptime Kuma | SQLite only | Litestream or accept |
| Navidrome | SQLite only | Litestream or accept |
| Audiobookshelf | SQLite only | Litestream or accept |
| Calibre-Web | SQLite (Calibre format) | Accept (format constraint) |
| Wealthfolio | SQLite only | Litestream or accept |
| Diun | BoltDB only | Accept (rebuildable state) |

### Component 7: Monitoring Stack

Prometheus, Loki, and Alertmanager use specialized storage (TSDB, BoltDB)
and cannot migrate to PostgreSQL. Options:
- Move to local disk (emptyDir or local PVC)
- Accept NFS risk (metrics/logs are ephemeral; loss is annoying, not catastrophic)

The Prometheus WAL is already on tmpfs, so it is not exposed to NFS outages.

Recommendation: Move to local PVCs. Losing metrics history on node
failure is acceptable for a homelab.

### Component 8: What Stays on NFS (unchanged)

All ~35 LOW risk services: media files, configs, caches, static content.
Immich photos, Jellyfin media, Audiobookshelf audiobooks, Calibre ebooks,
Frigate recordings, downloads, backups, model caches, etc.

NFS failure for these = temporary unavailability, not corruption.

## Backup Strategy

```
CloudNativePG   → continuous WAL archiving → NFS:/mnt/main/postgresql-wal-archive/
MySQL (if kept) → automated mysqldump      → NFS:/mnt/main/mysql-backup/
Redis Sentinel  → periodic RDB snapshots   → NFS:/mnt/main/redis-backup/
Litestream      → continuous SQLite backup → NFS:/mnt/main/litestream/
Media/configs   → already on NFS

NFS (TrueNAS)   → rsync → Backup NAS (unchanged)
```

All backups still consolidate to TrueNAS. The rsync-to-backup-NAS
workflow is completely unchanged.
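
Because every backup stream now lands on NFS from a different producer, a periodic freshness check is one way to notice a stream that has silently stopped. The following is a minimal sketch, assuming it runs on a host that mounts the backup directories; the function name and the age threshold are illustrative.

```python
import os
import time


def stale_backup_dirs(dirs, max_age_seconds, now=None):
    """Return the directories whose newest file is older than max_age_seconds.

    Directories that are missing or contain no files count as stale.
    """
    now = time.time() if now is None else now
    stale = []
    for directory in dirs:
        newest = 0.0
        for root, _subdirs, files in os.walk(directory):
            for name in files:
                mtime = os.path.getmtime(os.path.join(root, name))
                newest = max(newest, mtime)
        if now - newest > max_age_seconds:
            stale.append(directory)
    return stale
```

For example, `stale_backup_dirs(["/mnt/main/mysql-backup", "/mnt/main/redis-backup"], 24 * 3600)` would list any of those directories that received no new file in the last day.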

## Migration Order (Safety-First)

Each phase: backup → migrate → verify → user confirms → next phase.
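
The gate between steps can be sketched as a small runner that executes steps in order and stops at the first one that does not succeed. This is an illustration of the stop-and-verify rule, not tooling that exists in the repo; the step names and callables are placeholders.

```python
from typing import Callable, Iterable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_phase(steps: Iterable[Step]) -> List[str]:
    """Run steps in order and stop at the first one that returns False.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # Later steps are never started.
        completed.append(name)
    return completed


# Placeholder callables; real ones would wrap the actual commands.
example = run_phase([
    ("backup", lambda: True),
    ("migrate", lambda: True),
    ("verify", lambda: False),   # Simulated verification failure.
    ("user confirms", lambda: True),
])
print(example)  # ['backup', 'migrate']
```

The important property is that a failed verification leaves the phase unfinished, so the next phase cannot start by accident.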

### Phase 0: Infrastructure Prerequisites
- Install CloudNativePG operator
- Add local virtual disks to K8s nodes (via Proxmox)
- Set up local-path StorageClass
- Install Redis Sentinel

### Phase 1: PostgreSQL Migration (highest impact)
1. Full pg_dumpall backup to NFS
2. Deploy CloudNativePG cluster (empty)
3. Restore backup into CloudNativePG
4. Verify all 12 dependent services work
5. Decommission old PostgreSQL pod + NFS volume
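
One cheap check before decommissioning is to compare per-table row counts between the old instance and the restored cluster. The sketch below assumes the counts have already been collected into dicts (for example with `SELECT count(*)` per table on each side); how they are gathered is outside its scope, and the table names are made up.

```python
def row_count_mismatches(before: dict, after: dict) -> dict:
    """Return {table: (old_count, new_count)} for every table that differs.

    A table present on only one side is reported with None for the other.
    """
    mismatches = {}
    for table in sorted(set(before) | set(after)):
        old, new = before.get(table), after.get(table)
        if old != new:
            mismatches[table] = (old, new)
    return mismatches


# Hypothetical counts for illustration.
old_counts = {"users": 10, "items": 5}
new_counts = {"users": 10, "items": 4, "extra": 1}
print(row_count_mismatches(old_counts, new_counts))
# {'extra': (None, 1), 'items': (5, 4)}
```

Matching counts do not prove the data is identical, but a mismatch is a clear signal to stop before step 5.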

### Phase 2: Redis Migration
1. RDB snapshot backup
2. Deploy Redis Sentinel cluster
3. Restore data
4. Update service connection strings
5. Verify all 11 dependent services

### Phase 3: Critical SQLite Apps → PostgreSQL
Migrate one at a time, verify after each:
3a. Vaultwarden (password vault, most critical)
3b. Headscale (VPN coordination)

### Phase 4: MySQL Migration
Either deploy MySQL Operator or migrate services to PostgreSQL.
One service at a time.

### Phase 5: Immich PostgreSQL
Migrate Immich's dedicated PostgreSQL to CloudNativePG.

### Phase 6: Remaining SQLite Apps → PostgreSQL
One at a time: Forgejo, Open WebUI, Meshcentral, FreshRSS

### Phase 7: Monitoring Stack
Move Prometheus, Loki, Alertmanager to local PVCs.

### Phase 8: ClickHouse + Remaining
ClickHouse replication or accept risk.
Litestream for SQLite-only apps (optional).

## Resource Budget

| Component | RAM | Local Disk |
|-----------|-----|------------|
| CloudNativePG (3 instances) | ~3GB | ~50GB/node |
| Redis Sentinel (3 instances) | ~1.5GB | ~2GB/node |
| MySQL Operator (if kept) | ~2GB | ~20GB/node |
| Litestream (up to 4 apps) | ~300MB | None |
| **Total new** | **~7GB** | **~72GB/node** |

The current cluster has 88GB RAM total. The TrueNAS VM (16GB) could be
downsized once it no longer serves database workloads, partially
offsetting the new overhead.
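
The totals in the table can be reproduced with a few lines of arithmetic. The figures below are copied from the table rows; only the variable names are new.

```python
# (RAM in GB, local disk in GB per node), taken from the budget table.
budget = {
    "CloudNativePG": (3.0, 50),
    "Redis Sentinel": (1.5, 2),
    "MySQL Operator": (2.0, 20),
    "Litestream": (0.3, 0),
}

total_ram_gb = sum(ram for ram, _disk in budget.values())
total_disk_gb = sum(disk for _ram, disk in budget.values())
ram_share = total_ram_gb / 88  # 88GB is the cluster total stated above.

print(f"RAM: {total_ram_gb:.1f} GB ({ram_share:.1%} of cluster)")
print(f"Local disk: {total_disk_gb} GB per node")
```

The sum comes to 6.8GB of RAM, which the table rounds to ~7GB, and 72GB of local disk per node.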

## Success Criteria

- [ ] No database runs on NFS
- [ ] TrueNAS VM restart causes zero data corruption
- [ ] TrueNAS VM restart only affects media/config services (temporary unavailability)
- [ ] All backups still consolidate to TrueNAS for rsync to backup NAS
- [ ] Each migrated service verified working before proceeding to next
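
The first criterion can be spot-checked by looking up which filesystem backs a database's data directory. The sketch below parses `/proc/mounts`-style text and picks the longest matching mount point; it is a simplified check that assumes it runs where the data directory is visible (on the node or inside the pod) and ignores escaped characters in mount paths.

```python
import os


def filesystem_type(path: str, mounts_text: str):
    """Return the fstype of the most specific mount containing path, or None."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, fstype = os.path.normpath(fields[1]), fields[2]
        prefix = mount if mount.endswith("/") else mount + "/"
        inside = path == mount or path.startswith(prefix)
        # Prefer the longest mount point; later entries win ties.
        if inside and len(mount) >= len(best_mount):
            best_mount, best_type = mount, fstype
    return best_type


def is_on_nfs(path: str, mounts_text: str) -> bool:
    """True if the path is backed by an NFS mount."""
    return filesystem_type(path, mounts_text) in ("nfs", "nfs4")
```

On a live system the second argument would come from `open("/proc/mounts").read()`, and a result of `True` for any database data directory means the criterion is not yet met.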