infra

Viktor Barzin 8787d361dc (CI: ci/woodpecker/push/default passed)	2026-06-18 09:13:36 +00:00

claude-memory: HA (replicas 2 + PDB) to stop recurring MCP disconnects

The claude-memory MCP backend ran as a single replica with no PDB, so every voluntary disruption took it to zero for ~30-90s. Those outages surfaced as the memory MCP "keeps getting disconnected" problem.

Disruption sources hitting the lone pod:
- the descheduler (a CronJob every 5 minutes running LowNodeUtilization; it was caught evicting the pod live)
- Keel image bumps
- Reloader restarts on the 7-day DB-password rotation
- node drains
- CI deploys

The local stdio MCP subprocess itself was proven healthy (fast non-blocking startup, stderr suppressed, graceful degradation), so the fault was purely backend availability, not the MCP plumbing.

Fix:
- run 2 replicas (the backend is stateless FastAPI over shared CNPG Postgres and already has hostname anti-affinity);
- restore the PDB at minAvailable=1 (safe now: the drain deadlock that justified removing it only existed at 1 replica);
- set descheduler evict=false to stop the needless 5-minute churn.

All five disruption sources become zero-downtime rolling events.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
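The replicas-plus-PDB fix in the claude-memory commit can be sketched in Terraform with the `hashicorp/kubernetes` provider. This is a hypothetical sketch, not the contents of this directory's `main.tf`. The resource names, namespace, `app` label, container name and image are all placeholders. The commit does not say whether its hostname anti-affinity is required or preferred, so the preferred form used here is an assumption. The descheduler opt-out is deliberately left out because the listing does not show which annotation key and value the commit set.

```hcl
# Hypothetical sketch: names, namespace, labels and image are placeholders,
# not the values used in this repo's main.tf.
locals {
  app_labels = { app = "claude-memory" }
}

resource "kubernetes_deployment_v1" "claude_memory" {
  metadata {
    name      = "claude-memory"
    namespace = "claude-memory"
  }

  spec {
    # Two stateless replicas: one can be evicted or restarted while the other serves.
    replicas = 2

    selector {
      match_labels = local.app_labels
    }

    template {
      metadata {
        labels = local.app_labels
      }

      spec {
        affinity {
          pod_anti_affinity {
            # Soft rule (assumption): spread replicas across nodes when possible,
            # but still allow a surge pod to schedule during a rollout.
            preferred_during_scheduling_ignored_during_execution {
              weight = 100
              pod_affinity_term {
                label_selector {
                  match_labels = local.app_labels
                }
                topology_key = "kubernetes.io/hostname"
              }
            }
          }
        }

        container {
          name  = "api"                                  # placeholder
          image = "example.invalid/claude-memory:latest" # placeholder
        }
      }
    }
  }
}

resource "kubernetes_pod_disruption_budget_v1" "claude_memory" {
  metadata {
    name      = "claude-memory"
    namespace = "claude-memory"
  }

  spec {
    # With 2 healthy replicas this permits 1 voluntary eviction at a time.
    # With 1 replica it would permit 0, which is what blocked node drains before.
    min_available = 1

    selector {
      match_labels = local.app_labels
    }
  }
}
```

With `replicas = 2` and the Deployment's default RollingUpdate strategy (maxSurge 25% rounded up to 1, maxUnavailable 25% rounded down to 0), a Keel bump, Reloader restart or CI deploy starts the new pod before it removes an old one. A soft anti-affinity rule lets that surge pod schedule even when only two nodes are eligible. On a two-node cluster, a required rule would leave the surge pod Pending. While both pods are healthy, the PDB lets the eviction API remove only one of them at a time, so a node drain or a descheduler eviction always leaves one replica serving.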
main.tf	claude-memory: HA (replicas 2 + PDB) to stop recurring MCP disconnects	2026-06-18 09:13:36 +00:00
providers.tf	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
secrets	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
terragrunt.hcl	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00