The redis-v2 Sentinel cluster split-brained: redis-v2-0 booted during a network
partition, hit the init script's deterministic "pod-0 = bootstrap master"
fallback, and became a SECOND master alongside the sentinel-elected redis-v2-2.
HAProxy's `expect rstring role:master` matched both and round-robined client
connections across the two diverging masters, so Immich enqueued BullMQ jobs on
one while its workers blocked-popped on the other -> every queue wedged and
new-upload thumbnails 404'd cluster-wide. Third Sentinel-class incident in ~6
weeks (after the 2026-04-19 PM quorum drift and 2026-04-22 flap cascade).
Revert to a single standalone instance: replicas=1; drop Sentinel + HAProxy +
init bootstrap configmap + both PDBs; redis container only (+ exporter).
maxmemory-policy allkeys-lru -> volatile-lru so one shared instance serves both
workload classes correctly: evict only TTL'd cache keys, never TTL-less Immich
BullMQ / Celery job keys. redis-master service name/DNS unchanged -> no consumer
edits; collapsed onto redis-v2-0's existing dataset (queued jobs preserved).
Applied via tg (Tier 1 / PG-authoritative state); this commit syncs source +
docs only, hence [ci skip].
Monitoring: drop RedisReplicationLagHigh + RedisReplicasMissing (no replicas
now; the latter would false-fire), RedisMemoryPressure 85%->80% volatile-lru backstop.
Docs: rewrite databases.md Redis section (single-instance design + incident
history); add post-mortem 2026-05-30-redis-split-brain.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>