infra/.claude/reference/service-catalog.md
Viktor Barzin b6cd83f85a [redis] Phase 3-7: cutover to redis-v2, Nextcloud HAProxy-only
Phase 3 — replication chain (old → v2):
 - Discovered the v2 cluster was running redis:7.4-alpine, but the
   Bitnami old master ships redis 8.6.2 which writes RDB format 13 —
   the 7.4 replicas rejected the stream with "Can't handle RDB format
   version 13". Bumped v2 image to redis:8-alpine (also 8.6.2) to
   restore PSYNC compatibility.
 - Discovered that sentinel on BOTH v2 and old Bitnami clusters
   auto-discovered the cross-cluster replication chain when v2-0
   REPLICAOF'd the old master, triggering a failover that reparented
   old-master to a v2 replica and took HAProxy's backend offline.
   Mitigation: `SENTINEL REMOVE mymaster` on all 5 sentinels (both
   clusters) during the REPLICAOF surgery, then re-MONITOR after
   cutover. This must be done on the OLD sentinels too, not just v2 —
   they're the ones that kept fighting our REPLICAOF.
 - Set up the chain: v2-0 REPLICAOF old-master; v2-{1,2} REPLICAOF v2-0.
    All 114 keys (db0:76, db1:22, db4:16) synced, including the
    `immich_bull:*` BullMQ queues and `_kombu.*` Celery queues, which are
    the user-stated must-survive data class.
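
The ordering above (silence every sentinel first, then build the chain) can be sketched as a small plan generator. This is only an illustration, not the tooling that was actually used: the function name, the sentinel port 26379 (the Redis default) and the short hostnames in the usage note are assumptions. Only `SENTINEL REMOVE mymaster` and the shape of the `REPLICAOF` chain come from the notes above.

```python
from typing import List, Sequence, Tuple

SENTINEL_PORT = 26379  # assumption: Sentinel's default port, not stated in the notes
REDIS_PORT = 6379      # assumption: the old master listens on the default port too

Step = Tuple[str, int, str]  # (host, port, command to send)


def replication_surgery_plan(
    old_master: str,
    old_sentinels: Sequence[str],
    v2_pods: Sequence[str],
    master_name: str = "mymaster",
) -> List[Step]:
    """Return the ordered commands for chaining the v2 pods behind the old master."""
    if not v2_pods:
        raise ValueError("need at least one v2 pod")
    plan: List[Step] = []
    # 1. Detach every sentinel (old cluster included) so that none of them
    #    can trigger a failover while the replication topology is changing.
    for host in list(old_sentinels) + list(v2_pods):
        plan.append((host, SENTINEL_PORT, f"SENTINEL REMOVE {master_name}"))
    # 2. The first v2 pod replicates from the old master ...
    head, tail = v2_pods[0], v2_pods[1:]
    plan.append((head, REDIS_PORT, f"REPLICAOF {old_master} {REDIS_PORT}"))
    # 3. ... and the remaining v2 pods replicate from that first pod.
    for pod in tail:
        plan.append((pod, REDIS_PORT, f"REPLICAOF {head} {REDIS_PORT}"))
    return plan
```

Calling `replication_surgery_plan("redis-node-0", ["redis-node-0", "redis-node-1"], ["redis-v2-0", "redis-v2-1", "redis-v2-2"])` yields five `SENTINEL REMOVE` steps followed by three `REPLICAOF` steps. Which old node was the master is not stated above, so `redis-node-0` here is hypothetical.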

Phase 4 — HAProxy cutover:
 - Updated `kubernetes_config_map.haproxy` to point at
   `redis-v2-{0,1,2}.redis-v2-headless` for both redis_master and
   redis_sentinel backends (removed redis-node-{0,1}).
 - Promoted v2-0 (`REPLICAOF NO ONE`) at the same time as the
   ConfigMap apply so HAProxy's 1s health-check interval found a
   role:master within a few seconds. Cutover disruption on HAProxy
   rollout was brief; old clients naturally moved to new HAProxy pods
   within the rolling update window.
 - Re-enabled sentinel monitoring on v2 with `SENTINEL MONITOR
   mymaster <hostname> 6379 2` after verifying `resolve-hostnames yes`
   + `announce-hostnames yes` were active — this ensures sentinel
   stores the hostname (not resolved IP) in its rewritten config, so
   pod-IP churn on restart doesn't break failover.
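
The cutover depends on a health check that only treats a backend as up when it reports `role:master`. The HAProxy check definition is not shown here, so the following is just a sketch of that selection logic applied to `INFO replication` replies; `parse_info` and `find_masters` are hypothetical helper names.

```python
from typing import Dict, List, Mapping


def parse_info(text: str) -> Dict[str, str]:
    """Parse the `key:value` lines of a Redis INFO reply into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and section headers such as "# Replication".
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key] = value
    return fields


def find_masters(replies: Mapping[str, str]) -> List[str]:
    """Return the backends whose INFO reply says role:master, sorted by name."""
    return sorted(
        host for host, text in replies.items()
        if parse_info(text).get("role") == "master"
    )
```

Right after `REPLICAOF NO ONE` on v2-0, exactly one backend should be returned. Two or more would indicate a split topology, and zero would mean the promotion has not landed yet.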

Phase 5 — chaos:
 - Round 1: killed master v2-0 mid-probe. First run exposed the
   sentinel IP-storage issue (stored 10.10.107.222, went stale on
   restart) — ~12s probe disruption. Fixed hostname persistence and
   re-MONITORed.
 - Round 2: killed new master v2-2 with hostnames correctly stored.
    Sentinel elected v2-0, HAProxy re-routed, 1/40 probe failures over
    60s (roughly one 1.5s probe interval), within the <3s target for
    actual user-visible disruption.
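
The probe arithmetic can be made explicit. The sketch below assumes evenly spaced probes and estimates the worst outage as the longest streak of failed probes multiplied by the probe interval. The function names are illustrative and the real probe script is not shown here.

```python
from typing import Sequence


def longest_failure_run(results: Sequence[bool]) -> int:
    """Length of the longest streak of failed probes (False = failure)."""
    longest = current = 0
    for ok in results:
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest


def estimated_disruption_s(results: Sequence[bool], window_s: float) -> float:
    """Approximate the worst outage as failed-streak length times probe interval."""
    if not results:
        raise ValueError("no probe results")
    interval_s = window_s / len(results)  # assumes evenly spaced probes
    return longest_failure_run(results) * interval_s
```

With Round 2's numbers (40 probes over 60s, one failure) the interval is 1.5s and the estimate is 1.5s, under the 3s target. Probes only sample the endpoint, so the real outage may be somewhat shorter or longer than one interval.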

Phase 6 — Nextcloud simplification:
 - `zzz-redis.config.php` no longer queries sentinel in-process —
   just points at `redis-master.redis.svc.cluster.local`. Removed 20
   lines of PHP. HAProxy handles master tracking transparently now
   that it's scaled to 3 + PDB minAvailable=2.
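
With the in-process sentinel lookup gone, a client only needs a plain connection to the HAProxy endpoint. As a language-neutral illustration (this is not the Nextcloud PHP config), here is a stdlib-only sketch that encodes a RESP `PING` and sends it to that endpoint. Port 6379 and the absence of `AUTH` are assumptions, and the helper names are hypothetical.

```python
import socket
from typing import Sequence

REDIS_HOST = "redis-master.redis.svc.cluster.local"  # HAProxy endpoint from the notes
REDIS_PORT = 6379  # assumption: the HAProxy frontend listens on the default port


def encode_resp(args: Sequence[str]) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def ping(host: str = REDIS_HOST, port: int = REDIS_PORT, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers PING with +PONG (no auth assumed)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_resp(["PING"]))
        return sock.recv(64).startswith(b"+PONG")
```

The client never learns which pod is master. It talks to one stable name and leaves master tracking to HAProxy, which is the simplification this phase describes.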

Phase 7 step 1:
 - `kubectl scale statefulset/redis-node --replicas=0` (transient —
   TF removal in a 24h follow-up). Old PVCs `redis-data-redis-node-{0,1}`
   preserved as cold rollback.

Docs:
 - Rewrote `databases.md` Redis section to reflect post-cutover reality
   and the sentinel hostname gotcha (so future sessions don't relearn it).
 - `.claude/reference/service-catalog.md` entry updated.

The parallel-bootstrap race documented in the previous commit is still
worth watching — the init container now defaults to pod-0 as master
when no peer reports role:master-with-slaves, so fresh boots land in
a deterministic topology.
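
The default described above can be sketched as a pure decision function. The init container's real code is not shown here, so the names and the input shape (parsed `INFO replication` fields per peer) are assumptions. "role:master-with-slaves" is read as `role == "master"` with `connected_slaves > 0`.

```python
from typing import Mapping

DEFAULT_MASTER = "redis-v2-0"  # pod-0, the documented fallback


def choose_bootstrap_master(
    peers: Mapping[str, Mapping[str, str]],
    default: str = DEFAULT_MASTER,
) -> str:
    """Pick the pod to follow at boot.

    `peers` maps pod name to its parsed INFO replication fields. A peer only
    counts as an established master if it already has replicas attached.
    Otherwise every fresh pod would look like a master and the result would
    depend on boot order.
    """
    for pod in sorted(peers):
        info = peers[pod]
        if info.get("role") == "master" and int(info.get("connected_slaves", "0")) > 0:
            return pod
    return default
```

On a cold start every pod reports `role:master` with zero replicas, so all of them fall through to pod-0 and converge on the same topology.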

Closes: code-7n4
Closes: code-9y6
Closes: code-cnf
Closes: code-tc4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:13:43 +00:00


Service Catalog

Auto-maintained reference. See .claude/CLAUDE.md for operational guidance.

Critical - Network & Auth (Tier: core)

| Service | Description | Stack |
|---|---|---|
| wireguard | VPN server | wireguard |
| technitium | DNS server (10.0.20.201, query logging on PostgreSQL via custom PG plugin) | technitium |
| headscale | Tailscale control server | headscale |
| traefik | Ingress controller (Helm) | traefik |
| xray | Proxy/tunnel | platform |
| authentik | Identity provider (SSO) | authentik |
| cloudflared | Cloudflare tunnel | cloudflared |
| authelia | Auth middleware (may be merged into ebooks or removed) | platform |
| monitoring | Prometheus/Grafana/Loki stack | monitoring |

Storage & Security (Tier: cluster)

| Service | Description | Stack |
|---|---|---|
| vaultwarden | Bitwarden-compatible password manager | platform |
| redis | Shared Redis 8.x via HAProxy at redis-master.redis.svc.cluster.local; 3-pod raw StatefulSet redis-v2 (redis+sentinel+exporter per pod), quorum=2. Clients use HAProxy only, no sentinel fallback. | redis |
| immich | Photo management (GPU) | immich |
| nvidia | GPU device plugin | nvidia |
| metrics-server | K8s metrics | metrics-server |
| uptime-kuma | Status monitoring | uptime-kuma |
| crowdsec | Security/WAF (PostgreSQL backend) | crowdsec |
| kyverno | Policy engine | kyverno |

Admin

| Service | Description | Stack |
|---|---|---|
| k8s-dashboard | Kubernetes dashboard | k8s-dashboard |
| reverse-proxy | Generic reverse proxy | reverse-proxy |

Active Use

| Service | Description | Stack |
|---|---|---|
| mailserver | Email (docker-mailserver) | mailserver |
| shadowsocks | Proxy | shadowsocks |
| webhook_handler | Webhook processing | webhook_handler |
| tuya-bridge | Smart home bridge | tuya-bridge |
| dawarich | Location history | dawarich |
| owntracks | Location tracking | owntracks |
| nextcloud | File sync/share | nextcloud |
| calibre | E-book management (may be merged into ebooks stack) | calibre |
| onlyoffice | Document editing | onlyoffice |
| f1-stream | F1 streaming | f1-stream |
| rybbit | Analytics | rybbit |
| isponsorblocktv | SponsorBlock for TV | isponsorblocktv |
| actualbudget | Budgeting (factory pattern) | actualbudget |
| insta2spotify | Instagram reel song ID to Spotify playlist | insta2spotify |
| trading-bot | Event-driven trading with sentiment analysis | trading-bot |
| claude-memory | Persistent memory MCP server | claude-memory |
| council-complaints | Islington civic reporting pilot | council-complaints |

Optional

| Service | Description | Stack |
|---|---|---|
| blog | Personal blog | blog |
| descheduler | Pod descheduler | descheduler |
| hackmd | Collaborative markdown | hackmd |
| kms | Key management | kms |
| privatebin | Encrypted pastebin | privatebin |
| vault | HashiCorp Vault | vault |
| reloader | ConfigMap/Secret reloader | reloader |
| city-guesser | Game | city-guesser |
| echo | Echo server | echo |
| url | URL shortener | url |
| excalidraw | Whiteboard | excalidraw |
| travel_blog | Travel blog | travel_blog |
| dashy | Dashboard | dashy |
| send | Firefox Send | send |
| ytdlp | YouTube downloader | ytdlp |
| wealthfolio | Finance tracking | wealthfolio |
| audiobookshelf | Audiobook server (may be merged into ebooks stack) | audiobookshelf |
| paperless-ngx | Document management | paperless-ngx |
| jsoncrack | JSON visualizer | jsoncrack |
| servarr | Media automation (Sonarr/Radarr/etc) | servarr |
| ntfy | Push notifications | ntfy |
| cyberchef | Data transformation | cyberchef |
| diun | Docker image update notifier: detects new versions, fires webhook to n8n upgrade agent | diun |
| meshcentral | Remote management | meshcentral |
| homepage | Dashboard/startpage | homepage |
| matrix | Matrix chat server | matrix |
| linkwarden | Bookmark manager | linkwarden |
| changedetection | Web change detection | changedetection |
| tandoor | Recipe manager | tandoor |
| n8n | Workflow automation | n8n |
| real-estate-crawler | Property crawler | real-estate-crawler |
| tor-proxy | Tor proxy | tor-proxy |
| forgejo | Git forge | forgejo |
| freshrss | RSS reader | freshrss |
| navidrome | Music streaming | navidrome |
| networking-toolbox | Network tools | networking-toolbox |
| stirling-pdf | PDF tools | stirling-pdf |
| speedtest | Speed testing | speedtest |
| freedify | Music streaming (factory pattern) | freedify |
| phpipam | IP Address Management (IPAM) + auto-discovery | phpipam |
| netbox | Network documentation (disabled, replaced by phpipam) | netbox |
| infra-maintenance | Maintenance jobs | infra-maintenance |
| ollama | LLM server (GPU) | ollama |
| frigate | NVR/camera (GPU) | frigate |
| ebook2audiobook | E-book to audio (GPU) | ebook2audiobook |
| affine | Visual canvas/whiteboard (PostgreSQL + Redis) | affine |
| health | Apple Health data dashboard (PostgreSQL) | health |
| whisper | Wyoming Faster Whisper STT (CPU on GPU node) | whisper |
| grampsweb | Genealogy web app (Gramps Web) | grampsweb |
| openclaw | AI agent gateway (OpenClaw) | openclaw |
| poison-fountain | Anti-AI scraping (tarpit + poison) | poison-fountain |
| priority-pass | Boarding pass color transformer | priority-pass |
| status-page | Status page | status-page |
| plotting-book | Book plotting/world-building app | plotting-book |

Cloudflare Domains

Proxied (CDN + WAF enabled)

blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send,
audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden,
changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser,
travel, netbox, phpipam

Non-Proxied (Direct DNS)

mail, wg, headscale, immich, calibre, vaultwarden,
mailserver-antispam, mailserver-admin, webhook, uptime,
owntracks, dawarich, tuya, meshcentral, nextcloud, actualbudget,
onlyoffice, forgejo, freshrss, navidrome, ollama, openwebui,
isponsorblocktv, speedtest, freedify, rybbit, paperless,
servarr, prowlarr, bazarr, radarr, sonarr, flaresolverr,
jellyfin, jellyseerr, tdarr, affine, health, family, openclaw

Special Subdomains

  • *.viktor.actualbudget - Actualbudget factory instances
  • *.freedify - Freedify factory instances
  • mailserver.* - Mail server components (antispam, admin)