Phase 3 — replication chain (old → v2):
- Discovered the v2 cluster was running redis:7.4-alpine, while the
old Bitnami master runs redis 8.6.2, which writes RDB format 13.
The 7.4 replicas rejected the stream with "Can't handle RDB format
version 13". Bumped the v2 image to redis:8-alpine (also 8.6.2) to
restore PSYNC compatibility.
- Discovered that sentinel on BOTH v2 and old Bitnami clusters
auto-discovered the cross-cluster replication chain when v2-0
REPLICAOF'd the old master, triggering a failover that reparented
old-master to a v2 replica and took HAProxy's backend offline.
Mitigation: `SENTINEL REMOVE mymaster` on all 5 sentinels (both
clusters) during the REPLICAOF surgery, then re-MONITOR after
cutover. This must be done on the OLD sentinels too, not just v2 —
they're the ones that kept fighting our REPLICAOF.
- Set up the chain: v2-0 REPLICAOF old-master; v2-{1,2} REPLICAOF v2-0.
All 114 keys (db0:76, db1:22, db4:16) synced, including the
`immich_bull:*` BullMQ queues and `_kombu.*` Celery queues, which are
the user-stated must-survive data class.
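The ordering of the Phase 3 surgery can be sketched as a dry-run plan that only prints commands. This is a minimal illustration, not the commands as actually run: the old-cluster hostnames (`redis-node-{0,1}.redis-headless`), the `old_master` argument, the helper name `surgery_plan`, and the sentinel port 26379 are assumptions. Only the v2 hostnames, port 6379, and the master name `mymaster` come from this commit message.

```python
"""Dry-run plan for the REPLICAOF surgery: builds command strings, runs nothing."""

V2_PODS = [f"redis-v2-{i}.redis-v2-headless" for i in range(3)]
# Hypothetical names for the two old Bitnami pods; the real service name may differ.
OLD_PODS = [f"redis-node-{i}.redis-headless" for i in range(2)]
SENTINEL_PORT = 26379  # Redis Sentinel's default port (assumed unchanged here)
REDIS_PORT = 6379


def surgery_plan(old_master: str, master_name: str = "mymaster") -> list[str]:
    plan = []
    # 1. Stop every sentinel in BOTH clusters from monitoring the master,
    #    so none of them can trigger a failover mid-surgery.
    for host in V2_PODS + OLD_PODS:
        plan.append(
            f"redis-cli -h {host} -p {SENTINEL_PORT} SENTINEL REMOVE {master_name}"
        )
    # 2. Build the chain: v2-0 follows the old master ...
    plan.append(
        f"redis-cli -h {V2_PODS[0]} -p {REDIS_PORT} REPLICAOF {old_master} {REDIS_PORT}"
    )
    # ... and v2-1 / v2-2 follow v2-0.
    for host in V2_PODS[1:]:
        plan.append(
            f"redis-cli -h {host} -p {REDIS_PORT} REPLICAOF {V2_PODS[0]} {REDIS_PORT}"
        )
    return plan


if __name__ == "__main__":
    for cmd in surgery_plan(old_master=OLD_PODS[0]):
        print(cmd)
```

Keeping the sentinel removal as a separate first step makes it harder to forget the old cluster's sentinels, which were the ones undoing the REPLICAOF.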
Phase 4 — HAProxy cutover:
- Updated `kubernetes_config_map.haproxy` to point at
`redis-v2-{0,1,2}.redis-v2-headless` for both redis_master and
redis_sentinel backends (removed redis-node-{0,1}).
- Promoted v2-0 (`REPLICAOF NO ONE`) at the same time as the
ConfigMap apply so HAProxy's 1s health-check interval found a
role:master within a few seconds. Cutover disruption on HAProxy
rollout was brief; old clients naturally moved to new HAProxy pods
within the rolling update window.
- Re-enabled sentinel monitoring on v2 with `SENTINEL MONITOR
mymaster <hostname> 6379 2` after verifying `resolve-hostnames yes`
+ `announce-hostnames yes` were active — this ensures sentinel
stores the hostname (not resolved IP) in its rewritten config, so
pod-IP churn on restart doesn't break failover.
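The hostname check described above could be automated along these lines. This is a rough sketch assuming the rewritten sentinel config is available as text; `check_sentinel_conf` and `is_ip` are hypothetical helpers, not part of this change. It looks for the two `sentinel ...-hostnames yes` directives and flags a `sentinel monitor` line that stores an IP address.

```python
import ipaddress


def is_ip(host: str) -> bool:
    """True if host parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def check_sentinel_conf(text: str, master_name: str = "mymaster") -> list[str]:
    """Return a list of problems found in a sentinel config; empty means OK."""
    flags = {"resolve-hostnames": None, "announce-hostnames": None}
    monitored = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "sentinel":
            continue
        if parts[1] in flags:
            flags[parts[1]] = parts[2]
        elif parts[1] == "monitor" and parts[2] == master_name and len(parts) >= 4:
            monitored = parts[3]  # host field of "sentinel monitor <name> <host> <port> <quorum>"

    problems = []
    for name, value in flags.items():
        if value != "yes":
            problems.append(f"sentinel {name} is not 'yes'")
    if monitored is None:
        problems.append(f"no 'sentinel monitor {master_name}' line")
    elif is_ip(monitored):
        problems.append(f"master stored as IP {monitored}, not a hostname")
    return problems
```

Run against each sentinel's config after `SENTINEL MONITOR`, an empty result suggests the hostname survived the config rewrite.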
Phase 5 — chaos:
- Round 1: killed master v2-0 mid-probe. First run exposed the
sentinel IP-storage issue (stored 10.10.107.222, went stale on
restart) — ~12s probe disruption. Fixed hostname persistence and
re-MONITORed.
- Round 2: killed new master v2-2 with hostnames correctly stored.
Sentinel elected v2-0, HAProxy re-routed, 1/40 probe failures over
60s. With probes roughly 1.5s apart, a single miss bounds the outage
below 3s, within the <3s target for user-visible disruption.
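The Round 2 probe numbers can be turned into a bound on the outage length. The sketch below assumes probes are evenly spaced and effectively instantaneous, which the notes do not state; `outage_bounds` is a hypothetical helper for illustration.

```python
def outage_bounds(
    probes: int, window_s: float, consecutive_failures: int
) -> tuple[float, float]:
    """Bound the outage length (seconds) implied by a run of failed probes.

    With probes every `interval` seconds, k consecutive failures mean the
    outage covered k probe instants but not the successful probes on either
    side, so its length lies between (k - 1) and (k + 1) intervals.
    """
    interval = window_s / probes
    if consecutive_failures == 0:
        # An outage shorter than one interval could hide between probes.
        return (0.0, interval)
    lower = (consecutive_failures - 1) * interval
    upper = (consecutive_failures + 1) * interval
    return (lower, upper)


if __name__ == "__main__":
    # Round 2: 40 probes over 60s, one failure.
    print(outage_bounds(40, 60, 1))
```

For Round 2 this gives an interval of 1.5s and an outage somewhere between 0s and 3s.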
Phase 6 — Nextcloud simplification:
- `zzz-redis.config.php` no longer queries sentinel in-process —
just points at `redis-master.redis.svc.cluster.local`. Removed 20
lines of PHP. HAProxy handles master tracking transparently now
that it's scaled to 3 + PDB minAvailable=2.
Phase 7 step 1:
- `kubectl scale statefulset/redis-node --replicas=0` (transient —
TF removal in a 24h follow-up). Old PVCs `redis-data-redis-node-{0,1}`
preserved as cold rollback.
Docs:
- Rewrote `databases.md` Redis section to reflect post-cutover reality
and the sentinel hostname gotcha (so future sessions don't relearn it).
- `.claude/reference/service-catalog.md` entry updated.
The parallel-bootstrap race documented in the previous commit is still
worth watching, though the init container now defaults to pod-0 as
master when no peer reports role:master with connected slaves, so
fresh boots land in a deterministic topology.
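The init container's fallback rule can be illustrated as follows. This is a sketch of the decision only, not the actual init script: it reads "role:master-with-slaves" as `role:master` plus `connected_slaves > 0` from `INFO replication`, and `PeerInfo` and `pick_master` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PeerInfo:
    host: str
    role: str              # "master" or "slave", as reported by INFO replication
    connected_slaves: int  # connected_slaves field from INFO replication


def pick_master(peers: list[PeerInfo], default: str = "redis-v2-0") -> str:
    """Follow an established master if one exists, else fall back to pod-0."""
    for peer in peers:
        # A bare role:master is not enough: every freshly booted pod reports
        # role:master until told otherwise, which is what causes the race.
        if peer.role == "master" and peer.connected_slaves > 0:
            return peer.host
    return default
```

When all three pods boot in parallel, each reports `role:master` with zero slaves, so every pod falls through to the same default and the topology converges on pod-0.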
Closes: code-7n4
Closes: code-9y6
Closes: code-cnf
Closes: code-tc4
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
139 lines
6.2 KiB
Markdown
# Service Catalog

> Auto-maintained reference. See `.claude/CLAUDE.md` for operational guidance.

## Critical - Network & Auth (Tier: core)
| Service | Description | Stack |
|---------|-------------|-------|
| wireguard | VPN server | wireguard |
| technitium | DNS server (10.0.20.201, query logging on PostgreSQL via custom PG plugin) | technitium |
| headscale | Tailscale control server | headscale |
| traefik | Ingress controller (Helm) | traefik |
| xray | Proxy/tunnel | platform |
| authentik | Identity provider (SSO) | authentik |
| cloudflared | Cloudflare tunnel | cloudflared |
| authelia | Auth middleware (may be merged into ebooks or removed) | platform |
| monitoring | Prometheus/Grafana/Loki stack | monitoring |

## Storage & Security (Tier: cluster)
| Service | Description | Stack |
|---------|-------------|-------|
| vaultwarden | Bitwarden-compatible password manager | platform |
| redis | Shared Redis 8.x via HAProxy at `redis-master.redis.svc.cluster.local`: 3-pod raw StatefulSet `redis-v2` (redis+sentinel+exporter per pod), quorum=2. Clients use HAProxy only, no sentinel fallback. | redis |
| immich | Photo management (GPU) | immich |
| nvidia | GPU device plugin | nvidia |
| metrics-server | K8s metrics | metrics-server |
| uptime-kuma | Status monitoring | uptime-kuma |
| crowdsec | Security/WAF (PostgreSQL backend) | crowdsec |
| kyverno | Policy engine | kyverno |

## Admin
| Service | Description | Stack |
|---------|-------------|-------|
| k8s-dashboard | Kubernetes dashboard | k8s-dashboard |
| reverse-proxy | Generic reverse proxy | reverse-proxy |

## Active Use
| Service | Description | Stack |
|---------|-------------|-------|
| mailserver | Email (docker-mailserver) | mailserver |
| shadowsocks | Proxy | shadowsocks |
| webhook_handler | Webhook processing | webhook_handler |
| tuya-bridge | Smart home bridge | tuya-bridge |
| dawarich | Location history | dawarich |
| owntracks | Location tracking | owntracks |
| nextcloud | File sync/share | nextcloud |
| calibre | E-book management (may be merged into ebooks stack) | calibre |
| onlyoffice | Document editing | onlyoffice |
| f1-stream | F1 streaming | f1-stream |
| rybbit | Analytics | rybbit |
| isponsorblocktv | SponsorBlock for TV | isponsorblocktv |
| actualbudget | Budgeting (factory pattern) | actualbudget |
| insta2spotify | Instagram reel song ID to Spotify playlist | insta2spotify |
| trading-bot | Event-driven trading with sentiment analysis | trading-bot |
| claude-memory | Persistent memory MCP server | claude-memory |
| council-complaints | Islington civic reporting pilot | council-complaints |

## Optional
| Service | Description | Stack |
|---------|-------------|-------|
| blog | Personal blog | blog |
| descheduler | Pod descheduler | descheduler |
| hackmd | Collaborative markdown | hackmd |
| kms | Key management | kms |
| privatebin | Encrypted pastebin | privatebin |
| vault | HashiCorp Vault | vault |
| reloader | ConfigMap/Secret reloader | reloader |
| city-guesser | Game | city-guesser |
| echo | Echo server | echo |
| url | URL shortener | url |
| excalidraw | Whiteboard | excalidraw |
| travel_blog | Travel blog | travel_blog |
| dashy | Dashboard | dashy |
| send | Firefox Send | send |
| ytdlp | YouTube downloader | ytdlp |
| wealthfolio | Finance tracking | wealthfolio |
| audiobookshelf | Audiobook server (may be merged into ebooks stack) | audiobookshelf |
| paperless-ngx | Document management | paperless-ngx |
| jsoncrack | JSON visualizer | jsoncrack |
| servarr | Media automation (Sonarr/Radarr/etc) | servarr |
| ntfy | Push notifications | ntfy |
| cyberchef | Data transformation | cyberchef |
| diun | Docker image update notifier: detects new versions, fires webhook to n8n upgrade agent | diun |
| meshcentral | Remote management | meshcentral |
| homepage | Dashboard/startpage | homepage |
| matrix | Matrix chat server | matrix |
| linkwarden | Bookmark manager | linkwarden |
| changedetection | Web change detection | changedetection |
| tandoor | Recipe manager | tandoor |
| n8n | Workflow automation | n8n |
| real-estate-crawler | Property crawler | real-estate-crawler |
| tor-proxy | Tor proxy | tor-proxy |
| forgejo | Git forge | forgejo |
| freshrss | RSS reader | freshrss |
| navidrome | Music streaming | navidrome |
| networking-toolbox | Network tools | networking-toolbox |
| stirling-pdf | PDF tools | stirling-pdf |
| speedtest | Speed testing | speedtest |
| freedify | Music streaming (factory pattern) | freedify |
| phpipam | IP Address Management (IPAM) + auto-discovery | phpipam |
| ~~netbox~~ | ~~Network documentation~~ (disabled, replaced by phpipam) | netbox |
| infra-maintenance | Maintenance jobs | infra-maintenance |
| ollama | LLM server (GPU) | ollama |
| frigate | NVR/camera (GPU) | frigate |
| ebook2audiobook | E-book to audio (GPU) | ebook2audiobook |
| affine | Visual canvas/whiteboard (PostgreSQL + Redis) | affine |
| health | Apple Health data dashboard (PostgreSQL) | health |
| whisper | Wyoming Faster Whisper STT (CPU on GPU node) | whisper |
| grampsweb | Genealogy web app (Gramps Web) | grampsweb |
| openclaw | AI agent gateway (OpenClaw) | openclaw |
| poison-fountain | Anti-AI scraping (tarpit + poison) | poison-fountain |
| priority-pass | Boarding pass color transformer | priority-pass |
| status-page | Status page | status-page |
| plotting-book | Book plotting/world-building app | plotting-book |

## Cloudflare Domains

### Proxied (CDN + WAF enabled)
```
blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send,
audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden,
changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser,
travel, netbox, phpipam
```

### Non-Proxied (Direct DNS)
```
mail, wg, headscale, immich, calibre, vaultwarden,
mailserver-antispam, mailserver-admin, webhook, uptime,
owntracks, dawarich, tuya, meshcentral, nextcloud, actualbudget,
onlyoffice, forgejo, freshrss, navidrome, ollama, openwebui,
isponsorblocktv, speedtest, freedify, rybbit, paperless,
servarr, prowlarr, bazarr, radarr, sonarr, flaresolverr,
jellyfin, jellyseerr, tdarr, affine, health, family, openclaw
```

### Special Subdomains
- `*.viktor.actualbudget` - Actualbudget factory instances
- `*.freedify` - Freedify factory instances
- `mailserver.*` - Mail server components (antispam, admin)