Claude Memory MCP
Give Claude persistent memory that survives across sessions, context compactions, and machines.
claude plugins install github:ViktorBarzin/claude-memory-mcp
That's it. Claude now remembers things.
What It Does
You: "remember I prefer Svelte for all new frontend apps"
Claude: Stored. ✓
── 3 weeks later, new session, different machine ──
You: "build me a dashboard for this API"
Claude: I'll set up a SvelteKit project since that's your preference...
No configuration needed for single-machine use. Memories are stored in a local SQLite database with full-text search. For multi-machine sync, point it at an API server (details below).
Features
Slash Commands
- `/remember <fact>` — store a fact, preference, decision, or person detail
- `/recall <query>` — search memories by topic
Automatic Behaviors (via hooks)
- Auto-recall — before responding, Claude checks stored memories for relevant context (preferences, past corrections, decisions). Completely invisible to the user.
- Auto-learn — after each response, a background process analyzes the conversation for corrections, preferences, and decisions worth remembering. Uses haiku-as-judge for conservative extraction.
- Compaction survival — when Claude's context window compacts, key memories are saved to a marker file and re-injected on the next prompt. No knowledge is lost.
- Auto-approve — memory tool calls are approved automatically, no permission prompts.
MCP Tools
Claude has direct access to these tools during conversation:
| Tool | Description |
|---|---|
| `memory_store` | Store a fact with category, tags, importance, and semantic search keywords |
| `memory_recall` | Search memories using full-text search with expanded query terms |
| `memory_list` | List recent memories, optionally filtered by category |
| `memory_delete` | Delete a memory by ID |
| `secret_get` | Retrieve the decrypted content of a sensitive memory |
| `memory_count` | Get memory counts by category and sync status diagnostics |
Memory Categories
Memories are organized into: facts, preferences, projects, people, decisions
Sensitive Memory Support
Mark memories as sensitive with force_sensitive: true. When Vault is configured, sensitive content is encrypted at rest and only decryptable via secret_get.
Architecture
```
Claude Code Session
┌──────────────────────────────────────────────────────┐
│  Hooks                     MCP Server                │
│  ┌──────────────────┐      ┌────────────────────┐    │
│  │ auto-recall      │─────▶│ memory_store       │    │
│  │ auto-learn       │      │ memory_recall      │    │
│  │ compaction       │      │ memory_list        │    │
│  │ auto-approve     │      │ memory_delete      │    │
│  └──────────────────┘      │ secret_get         │    │
│                            │ memory_count       │    │
│                            └─────────┬──────────┘    │
│                                      │               │
│  ┌───────────────────────────────────┼──────────┐    │
│  │ Local SQLite                      │          │    │
│  │ (cache + FTS5)               SyncEngine      │    │
│  │   ◄──── always used ────(background, 60s)    │    │
│  │                          push queued writes  │    │
│  │                          pull server changes │    │
│  │   pending_ops ────────────────────┘          │    │
│  │   (offline write queue)                      │    │
│  └──────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────┘
                     │
        optional, for multi-machine sync
                     │
          ┌──────────▼───────────┐
          │ API Server (FastAPI) │
          │ Docker / Kubernetes  │
          └──────────┬───────────┘
                     │
          ┌──────────▼───────────┐
          │      PostgreSQL      │
          │   (authoritative)    │
          └──────────────────────┘
```
Sync Resilience
The SyncEngine is designed to handle failures gracefully:
- Decoupled push/pull — push failures never block pull. Remote changes from other agents always flow in.
- Auth failure detection — on 401/403, the engine sets an auth-failed flag, logs a clear warning, and degrades to SQLite-only mode. A periodic health check detects when auth is restored.
- Startup full resync — on MCP server start, a full cache replacement runs to catch drift from other agents, deleted records, and schema changes.
- Periodic full resync — every ~10 minutes, a full resync replaces incremental sync to catch any drift.
- Retry cap — individual pending ops are retried up to 5 times, then permanently skipped to prevent queue buildup.
- Orphan reconciliation — local records that never synced are deduplicated against server content before push.
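As an illustration, the retry-cap bookkeeping amounts to a small filter over the queue. This is a sketch with hypothetical data shapes; the real queue is the `pending_ops` SQLite table described later:

```python
MAX_RETRIES = 5  # the retry cap described above

def ops_to_push(pending_ops):
    """Return queued ops still eligible for a push attempt.

    Each op carries a 'retry_count'; ops that have failed MAX_RETRIES
    times are skipped permanently so one bad record cannot stall the
    queue. (Illustrative sketch only.)
    """
    return [op for op in pending_ops if op["retry_count"] < MAX_RETRIES]

def record_failure(op):
    """Increment the retry counter after a failed push."""
    op["retry_count"] += 1
    return op
```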
Search Algorithm
Memory recall uses two different full-text search backends depending on the operating mode. Both follow the same query building pattern: the context and expanded_query fields from the user's recall request are concatenated into a single search string, then processed into a backend-specific query.
Query Construction
When memory_recall is called, the tool receives:
- `context` — the topic or question being asked about
- `expanded_query` — REQUIRED, minimum 5 space-separated semantically related terms generated by Claude
These are concatenated: all_terms = f"{context} {expanded_query}", then split into individual words for the search backend.
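In code, the construction is a simple concatenate-and-split. A sketch (the actual tool also enforces the minimum-terms requirement on `expanded_query`):

```python
def build_search_terms(context: str, expanded_query: str) -> list[str]:
    """Concatenate context and expanded_query, then split into
    individual words for the search backend (FTS5 or tsquery)."""
    all_terms = f"{context} {expanded_query}"
    return all_terms.split()
```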
SQLite: FTS5 with BM25
The local SQLite database uses an FTS5 virtual table for full-text search with the BM25 ranking algorithm.
Schema:
```sql
CREATE VIRTUAL TABLE memories_fts USING fts5(
    content, category, tags, expanded_keywords,
    content='memories', content_rowid='id'
);
```
This is a content table backed by the memories table — FTS5 reads column data from memories on demand rather than duplicating it. Triggers on INSERT, UPDATE, and DELETE keep the FTS index synchronized:
```sql
-- Insert trigger
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;

-- Delete trigger (FTS5 special 'delete' command)
CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, category, tags, expanded_keywords)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
END;

-- Update trigger (delete old + insert new)
CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, ...)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;
```
Query building: Each word is quoted and joined with OR:
```python
words = all_terms.split()
fts_query = " OR ".join(f'"{w}"' for w in words if w)
# Example: "kubernetes" OR "deployment" OR "pod" OR "scaling" OR "replicas"
```
Ranking: Two sort modes:
- `sort_by="relevance"` — `ORDER BY bm25(memories_fts), importance DESC` — FTS5's built-in BM25 relevance score, with importance as tiebreaker
- `sort_by="importance"` (default) — `ORDER BY importance DESC, created_at DESC` — user-assigned importance score, with recency as tiebreaker
Fallback: If the FTS5 MATCH query fails (e.g. syntax error from special characters), the search falls back to a LIKE %query% scan against content and tags columns.
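Putting the pieces together, here is a minimal self-contained sketch of the FTS5 path, including the LIKE fallback, using Python's bundled `sqlite3` module. The table names (`notes`, `notes_fts`) are illustrative rather than the project's actual schema, and an FTS5-enabled SQLite build is assumed:

```python
import sqlite3

def search(db, terms):
    """OR-join quoted terms for an FTS5 MATCH; fall back to LIKE on error."""
    fts_query = " OR ".join(f'"{w}"' for w in terms if w)
    try:
        return db.execute(
            "SELECT content FROM notes_fts WHERE notes_fts MATCH ? "
            "ORDER BY bm25(notes_fts)",  # BM25: lower score = better match
            (fts_query,),
        ).fetchall()
    except sqlite3.OperationalError:
        # Fallback: plain LIKE scan when the MATCH syntax fails
        like = f"%{' '.join(terms)}%"
        return db.execute(
            "SELECT content FROM notes WHERE content LIKE ?", (like,)
        ).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes(id INTEGER PRIMARY KEY, content TEXT)")
db.execute(
    "CREATE VIRTUAL TABLE notes_fts USING fts5("
    "content, content='notes', content_rowid='id')"
)
db.execute("INSERT INTO notes(content) VALUES ('prefers svelte for frontend work')")
# External-content FTS tables need the index populated explicitly (or via triggers)
db.execute("INSERT INTO notes_fts(rowid, content) SELECT id, content FROM notes")
rows = search(db, ["svelte", "framework"])  # matches via the OR'd "svelte" term
```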
PostgreSQL: tsvector with Weighted Ranking
The API server uses PostgreSQL's native full-text search with a generated tsvector column and weighted ranking.
Schema (from migration 001):
```sql
CREATE TABLE memories (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category VARCHAR(50) DEFAULT 'facts',
    tags TEXT DEFAULT '',
    expanded_keywords TEXT DEFAULT '',
    importance REAL DEFAULT 0.5,
    -- ...
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(content, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(expanded_keywords, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(tags, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'D')
    ) STORED
);

CREATE INDEX idx_memories_search ON memories USING GIN(search_vector);
```
The search_vector is a generated column that PostgreSQL maintains automatically on every INSERT/UPDATE — no triggers needed. It combines four fields with different weights:
| Weight | Field | Purpose |
|---|---|---|
| A (highest) | `content` | The actual memory text — most important for matching |
| B | `expanded_keywords` | Semantic search terms generated by Claude at store time |
| C | `tags` | User-provided comma-separated tags |
| D (lowest) | `category` | Category name (facts, preferences, etc.) |
A GIN index on search_vector enables fast lookup without sequential scans.
Query execution:
```sql
SELECT id, content, category, tags, importance, is_sensitive,
       ts_rank(search_vector, query) AS rank,
       created_at, updated_at
FROM memories, plainto_tsquery('english', $2) query
WHERE user_id = $1
  AND deleted_at IS NULL
  AND (search_vector @@ query OR $2 = '')
ORDER BY importance DESC, ts_rank(search_vector, query) DESC
LIMIT $3
```
plainto_tsquery processes the concatenated query text through PostgreSQL's English text search dictionary — it handles stemming (e.g. "running" → "run"), stop word removal, and normalization. The @@ operator matches the resulting tsquery against the stored tsvector.
ts_rank scores each match considering the weight of the matched field — a match in content (weight A) scores higher than a match in tags (weight C).
Sort modes:
- `sort_by="importance"` (default) — `ORDER BY importance DESC, ts_rank(search_vector, query) DESC`
- `sort_by="relevance"` — `ORDER BY ts_rank(search_vector, query) DESC`
- `sort_by="recency"` — `ORDER BY created_at DESC`
Semantic Expansion at Store Time
When Claude stores a memory via memory_store, it generates an expanded_keywords field — at least 5 space-separated semantically related terms. For example, storing "User prefers Svelte for frontends" might produce:
expanded_keywords: "svelte frontend framework ui web sveltekit javascript component reactive"
These keywords are indexed alongside the memory content. When a future memory_recall searches for "what framework should I use for the dashboard?", the expanded keywords create additional matching surface that pure content matching would miss. This is a form of query expansion performed at write time by the LLM.
SQLite vs PostgreSQL Comparison
| Feature | SQLite (FTS5) | PostgreSQL (tsvector) |
|---|---|---|
| Tokenization | ICU/unicode61 tokenizer | English text search dictionary |
| Stemming | No (exact word matching) | Yes (Porter stemmer via english config) |
| Stop words | No | Yes (removes "the", "is", "at", etc.) |
| Ranking | BM25 (term frequency + inverse document frequency) | ts_rank (weighted field matching) |
| Field weighting | No (all FTS5 columns equal) | Yes (A/B/C/D weights per column) |
| Index type | B-tree on FTS content | GIN (Generalized Inverted Index) |
| Query syntax | Quoted OR terms (`"word1" OR "word2"`) | `plainto_tsquery` with implicit AND |
| Fallback | LIKE scan on failure | None needed (robust parser) |
| Maintenance | Triggers sync content ↔ FTS table | Generated column, auto-maintained |
Operating Modes
| Mode | When | What happens |
|---|---|---|
| SQLite-only | No env vars set | Everything local. Zero config. Works offline. |
| Hybrid | `MEMORY_API_KEY` set | Local SQLite for reads + background sync to API. Writes queue if API is down. |
| HTTP-only | `MEMORY_API_KEY` + `MEMORY_SYNC_DISABLE=1` | Direct API calls, no local cache. Legacy mode. |
| Full | Any mode + Vault configured | Above + Vault for encrypting sensitive memories at rest. |
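The mode decision reduces to a couple of environment checks. A sketch (the function name is hypothetical; the real detection lives in the MCP server startup):

```python
import os

def operating_mode(env=os.environ) -> str:
    """Pick the operating mode from the table above (illustrative sketch)."""
    if not env.get("MEMORY_API_KEY"):
        return "sqlite-only"   # zero config, fully local
    if env.get("MEMORY_SYNC_DISABLE") == "1":
        return "http-only"     # legacy: direct API calls, no cache
    return "hybrid"            # local reads + background sync
```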
Setup
Option 1: Plugin Install (recommended)
claude plugins install github:ViktorBarzin/claude-memory-mcp
Works immediately with SQLite-only mode. To enable multi-machine sync:
```shell
export MEMORY_API_URL="https://your-server.example.com"
export MEMORY_API_KEY="your-api-key"
```
Option 2: Manual MCP Config
Add to ~/.claude.json under mcpServers:
```json
{
  "mcpServers": {
    "claude_memory": {
      "type": "stdio",
      "command": "python3",
      "args": ["/path/to/claude-memory-mcp/src/claude_memory/mcp_server.py"],
      "env": {
        "PYTHONPATH": "/path/to/claude-memory-mcp/src",
        "MEMORY_API_URL": "https://your-server.example.com",
        "MEMORY_API_KEY": "your-api-key"
      }
    }
  }
}
```
Omit MEMORY_API_URL and MEMORY_API_KEY for SQLite-only mode.
Verify
You: /remember "I prefer dark mode in all applications"
You: /recall "UI preferences"
Running the API Server
Only needed for multi-machine sync. Skip this if you're using SQLite-only mode.
Docker Compose
```shell
cd docker
docker compose up -d
```
Starts the API server + PostgreSQL. Optionally add Vault:
docker compose --profile vault up -d
Manual
```shell
pip install claude-memory-mcp[api]
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
export API_KEY="your-secret-key"
alembic upgrade head
uvicorn claude_memory.api.app:app --host 0.0.0.0 --port 8000
```
Kubernetes
Designed for high availability: 2 replicas with pod anti-affinity, PodDisruptionBudget, and startup probes.
```shell
helm install claude-memory deploy/helm/claude-memory \
  --set env.DATABASE_URL="postgresql://user:pass@host:5432/db" \
  --set env.API_KEY="your-key" \
  --set ingress.host="claude-memory.yourdomain.com"
```
Or raw manifests:
```shell
kubectl apply -f deploy/kubernetes/namespace.yaml
kubectl create secret generic claude-memory-secrets \
  -n claude-memory \
  --from-literal=database-url="postgresql://user:pass@host:5432/db" \
  --from-literal=api-key="your-key"
kubectl apply -f deploy/kubernetes/
```
Multi-User Setup
For team deployments, use API_KEYS with a JSON mapping:
export API_KEYS='{"alice": "key-alice-xxx", "bob": "key-bob-yyy"}'
Each user gets isolated memory storage. Users cannot see each other's memories.
Adding a user:
- Generate a key: `openssl rand -base64 32`
- Add it to the `API_KEYS` JSON
- Restart the API server
- Share the key with the user
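Key-to-user resolution on the server is a reverse lookup over that JSON map. A sketch (the function name is hypothetical; the real logic lives in `api/auth.py`):

```python
import json

def resolve_user(api_keys_json: str, presented_key: str):
    """Map a presented Bearer key back to its user id, or None.

    api_keys_json is the API_KEYS value, e.g. '{"alice": "key-alice-xxx"}'.
    All storage is then isolated by the resolved user id.
    """
    keys = json.loads(api_keys_json)
    for user_id, key in keys.items():
        if key == presented_key:
            return user_id
    return None
```

A production implementation would compare keys with `hmac.compare_digest` rather than `==` to avoid timing side channels.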
Sensitive Memory & Secrets
Sensitive data is handled through a multi-layer system:
Automatic Detection
credential_detector.py scans all incoming content for secrets using regex patterns with confidence scores (0.5–0.95):
| Confidence | Pattern Types |
|---|---|
| 0.95 | Private keys (RSA/EC/DSA/OPENSSH), AWS access keys (AKIA/ASIA), GitHub tokens (ghp_/gho_/etc.) |
| 0.85–0.90 | Connection strings (postgres/mysql/mongodb/redis/amqp), Basic auth headers, Bearer tokens |
| 0.75–0.80 | api_key=, password=, token= assignments |
| 0.60–0.65 | Generic secret=/credential= patterns, hex keys |
Detected content is redacted in the main DB (e.g., [REDACTED:private_key]) and the original stored in Vault or encrypted storage.
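A toy version of the detector, with just three illustrative patterns (the real `credential_detector.py` covers far more, and its exact regexes, labels, and confidences may differ):

```python
import re

# (pattern, label, confidence) — confidences loosely mirror the table above
PATTERNS = [
    (re.compile(r"\b(AKIA|ASIA)[0-9A-Z]{16}\b"), "aws_access_key", 0.95),
    (re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "github_token", 0.95),
    (re.compile(r"\bpassword\s*=\s*\S+", re.I), "password_assignment", 0.80),
]

def detect_and_redact(text: str):
    """Return (redacted_text, findings); redaction mirrors [REDACTED:type]."""
    findings = []
    for pattern, label, confidence in PATTERNS:
        if pattern.search(text):
            findings.append((label, confidence))
            text = pattern.sub(f"[REDACTED:{label}]", text)
    return text, findings
```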
Encryption at Rest
Three tiers, checked in order:
- HashiCorp Vault (KV v2) — if `VAULT_ADDR` + `VAULT_TOKEN` set. Secrets stored at `claude-memory/{user_id}/mem-{id}`. Supports Kubernetes Service Account auto-authentication.
- AES-256-GCM — if `MEMORY_ENCRYPTION_KEY` set. Key can be hex-encoded 32 bytes or a passphrase (derived via SHA-256). Stored as `nonce (12 bytes) + ciphertext + GCM tag (16 bytes)` in the `encrypted_content` column.
- Plaintext fallback — content stored as-is (still redacted in the main column if detected as sensitive).
Retrieve the original via secret_get(id=<memory_id>).
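The key-normalization step for the AES tier can be sketched with the stdlib. One assumption here: passphrases are UTF-8 encoded before hashing; the project's actual derivation may differ in detail:

```python
import hashlib

def derive_key(value: str) -> bytes:
    """Normalize MEMORY_ENCRYPTION_KEY to 32 raw bytes.

    A 64-char hex string decodes directly to 32 bytes; anything else
    is treated as a passphrase and hashed with SHA-256.
    """
    try:
        raw = bytes.fromhex(value)
        if len(raw) == 32:
            return raw
    except ValueError:
        pass  # not valid hex: treat as a passphrase
    return hashlib.sha256(value.encode("utf-8")).digest()
```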
Plugin Hooks
| Hook | Event | What it does |
|---|---|---|
| `pre-compact-backup.sh` | PreCompact | Saves top 20 memories to a marker file before context compaction |
| `post-compact-recovery.sh` | UserPromptSubmit | Re-injects saved memories after compaction (one-time, then deletes marker) |
| `user-prompt-recall.py` | UserPromptSubmit | Tells Claude to check memory_recall before responding to each message |
| `auto-learn.py` | Stop (async) | Runs haiku-as-judge on the last exchange to extract durable facts worth storing |
| `auto-allow-memory-tools.py` | PermissionRequest | Auto-approves memory_store, memory_recall, memory_list, memory_delete, secret_get |
Auto-Learn Details
The auto-learn hook runs asynchronously after each Claude response. It operates in two modes:
- Single-turn (every turn) — analyzes just the last user/assistant exchange. Cheap and fast. Extracts corrections, preferences, decisions, and facts.
- Deep multi-turn (every 5 turns) — analyzes the last 5 exchanges as a window. Also extracts debugging insights, workarounds, and operational knowledge.
Judge fallback chain: Claude CLI (haiku model) → local Ollama (qwen2.5:3b/llama3.2:3b/gemma2:2b/phi3:mini) → heuristic pattern matching (keyword-based extraction from user messages).
Extracted events are deduplicated via SHA-256 content hashing. Each event is stored to the MCP memory database.
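Content-hash dedup takes only a few lines. A sketch with a hypothetical helper (the hook presumably persists the seen-set between runs, which this sketch does not):

```python
import hashlib

def dedup_events(events: list[str], seen: set[str]) -> list[str]:
    """Keep only events whose SHA-256 content hash has not been seen."""
    fresh = []
    for event in events:
        digest = hashlib.sha256(event.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(event)
    return fresh
```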
Debug
```shell
# See what hooks are doing
export DEBUG_CLAUDE_MEMORY_HOOKS=1

# Disable auto-approve to see permission prompts
export DISABLE_CLAUDE_MEMORY_AUTO_APPROVE=1
```
Background Sync Engine
In hybrid mode, a SyncEngine runs in a daemon thread with its own SQLite connection (thread-safe via threading.Lock).
Push cycle: Reads the pending_ops queue (SQLite table) and sends each operation to the API. On success, removes the operation from the queue and updates the local server_id mapping. On failure, the operation stays queued for the next cycle.
Pull cycle: Calls GET /api/memories/sync?since=<last_sync_ts> for incremental changes. Server wins on conflicts — remote updates overwrite local rows (matched by server_id). Soft-deleted rows on the server are hard-deleted locally.
Write path: Every memory_store first writes to local SQLite (instant), then attempts an immediate sync to the API. If the immediate sync fails, the write is enqueued in pending_ops for the next background cycle.
Sync interval: Configurable via MEMORY_SYNC_INTERVAL (default: 60 seconds).
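The local-first write path can be sketched as follows. Names are illustrative, and the real engine also records `server_id` mappings and full op payloads:

```python
import sqlite3

def store_memory(db, content, push_to_api):
    """Write locally first, then try an immediate push; queue on failure."""
    cur = db.execute("INSERT INTO memories(content) VALUES (?)", (content,))
    local_id = cur.lastrowid
    try:
        push_to_api(local_id, content)  # immediate sync attempt
    except OSError:
        # API unreachable: enqueue for the next background cycle
        db.execute(
            "INSERT INTO pending_ops(memory_id, op, retry_count) "
            "VALUES (?, 'store', 0)",
            (local_id,),
        )
    return local_id

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories(id INTEGER PRIMARY KEY, content TEXT)")
db.execute("CREATE TABLE pending_ops(memory_id INTEGER, op TEXT, retry_count INTEGER)")

def offline_push(mid, content):
    raise OSError("API down")

store_memory(db, "prefers dark mode", offline_push)   # queued
store_memory(db, "prefers svelte", lambda m, c: None) # pushed immediately
```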
API Reference
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check (unauthenticated) |
| `/api/auth-check` | GET | Validate API key without side effects |
| `/api/memories` | POST | Store a memory |
| `/api/memories` | GET | List memories (`?category=facts&limit=20`) |
| `/api/memories/recall` | POST | Search memories by context and expanded query |
| `/api/memories/{id}` | DELETE | Soft-delete a memory |
| `/api/memories/sync` | GET | Incremental sync (`?since=ISO-timestamp`) |
| `/api/memories/{id}/secret` | POST | Get decrypted sensitive memory content |
| `/api/memories/migrate-secrets` | POST | Re-encrypt existing secrets with current Vault config |
| `/api/memories/import` | POST | Bulk import memories (JSON array) |
All endpoints except /health require Authorization: Bearer <api-key>.
Environment Variables
MCP Server (client-side)
| Variable | Description | Default |
|---|---|---|
| `MEMORY_API_URL` | API server URL | `http://localhost:8080` |
| `MEMORY_API_KEY` | API key (enables hybrid mode) | None |
| `MEMORY_HOME` | Local storage directory | `~/.claude/claude-memory` |
| `MEMORY_DB` | SQLite database path override | `$MEMORY_HOME/memory/memory.db` |
| `MEMORY_SYNC_INTERVAL` | Background sync interval in seconds | 60 |
| `MEMORY_SYNC_DISABLE` | Set to `1` for HTTP-only mode | None |
Aliases CLAUDE_MEMORY_API_URL and CLAUDE_MEMORY_API_KEY are also supported.
API Server
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | Required |
| `API_KEY` | Single-user API key | None |
| `API_KEYS` | Multi-user JSON map `{"user": "key"}` | None |
| `VAULT_ADDR` | Vault server address | None |
| `VAULT_TOKEN` | Vault authentication token | None |
| `VAULT_ROLE` | Kubernetes auth role name | `claude-memory` |
| `MEMORY_ENCRYPTION_KEY` | AES-256 key for non-Vault encryption | None |
Database Migrations
PostgreSQL (API Server)
Migrations run automatically on API server startup. To run manually:
```shell
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
alembic upgrade head
```
Three migrations:
- `001` — Creates `memories` table with generated `tsvector` column and GIN index
- `002` — Adds `user_id` (multi-user), `is_sensitive`, `vault_path`, `encrypted_content`
- `003` — Adds `deleted_at` (soft delete) and `updated_at` index for incremental sync
All migrations are idempotent (check column/table existence before altering).
SQLite (MCP Client)
SQLite schema is versioned via PRAGMA user_version and migrated automatically on startup. Current version: 2.
| Version | Migration |
|---|---|
| 1 | Add server_id column to memories table |
| 2 | Add retry_count column to pending_ops table |
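The `PRAGMA user_version` pattern looks like this in stdlib `sqlite3`. A sketch: the migration SQL mirrors the table above but is not copied from the project's code, and the column types are assumptions:

```python
import sqlite3

MIGRATIONS = {
    1: "ALTER TABLE memories ADD COLUMN server_id INTEGER",
    2: "ALTER TABLE pending_ops ADD COLUMN retry_count INTEGER DEFAULT 0",
}

def migrate(db: sqlite3.Connection) -> int:
    """Apply migrations newer than PRAGMA user_version, then bump it."""
    (version,) = db.execute("PRAGMA user_version").fetchone()
    for target in sorted(v for v in MIGRATIONS if v > version):
        db.execute(MIGRATIONS[target])
        db.execute(f"PRAGMA user_version = {target}")  # pragmas can't be parameterized
        version = target
    return version

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories(id INTEGER PRIMARY KEY)")
db.execute("CREATE TABLE pending_ops(memory_id INTEGER)")
migrate(db)  # applies 1 and 2, leaves user_version at 2
```

Because already-applied versions are filtered out, running `migrate` again is a no-op, which is what makes startup migration safe.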
Development
Prerequisites
- Python 3.11+
- For API server tests: `httpx`, `pytest-asyncio`
Quick Start
```shell
git clone https://github.com/ViktorBarzin/claude-memory-mcp.git
cd claude-memory-mcp
python -m venv .venv
source .venv/bin/activate
pip install -e ".[api,dev]"
```
Running Tests
```shell
# All tests
pytest tests/ -v

# Individual test suites
pytest tests/test_sync.py -v        # SyncEngine (client-side sync resilience)
pytest tests/test_mcp_server.py -v  # MCP server (SQLite, tools, protocol)
pytest tests/test_api.py -v         # API server (FastAPI endpoints)

# Linting
ruff check src/ tests/
mypy src/claude_memory/ --strict
```
The MCP server itself (mcp_server.py and sync.py) uses stdlib only — no pip install needed on the client side. The [api] extra adds FastAPI, asyncpg, uvicorn, etc. for the server.
Test Structure
| File | Tests | What it covers |
|---|---|---|
| `test_sync.py` | SyncEngine unit tests | Push/pull, auth failure handling, retry caps, full resync, orphan reconciliation, decoupled push/pull, diagnostics |
| `test_mcp_server.py` | MCP server unit tests | SQLite CRUD, FTS search, tool dispatch, MCP protocol, memory_count, schema migration |
| `test_api.py` | API server integration tests | All REST endpoints, auth, user isolation, soft delete, sync, secrets, import |
| `test_auth.py` | Auth module tests | Single/multi-user auth, key mapping |
| `test_credential_detector.py` | Credential detection | Pattern matching for secrets |
| `test_crypto.py` | Encryption tests | AES-256 encrypt/decrypt |
| `test_vault_client.py` | Vault integration | Secret storage/retrieval |
Project Structure
```
claude-memory-mcp/
├── src/claude_memory/
│   ├── mcp_server.py            # MCP server entry point (stdio NDJSON)
│   ├── sync.py                  # Background sync engine (SQLite ↔ API)
│   ├── credential_detector.py   # Sensitive content detection
│   └── api/
│       ├── app.py               # FastAPI application
│       ├── auth.py              # API key authentication
│       ├── database.py          # asyncpg connection pool
│       ├── models.py            # Pydantic models
│       └── vault_service.py     # HashiCorp Vault integration
├── tests/                       # pytest test suite
├── hooks/                       # Claude Code hooks (auto-recall, auto-learn, etc.)
├── docker/                      # Docker Compose for API server + PostgreSQL
├── deploy/                      # Kubernetes manifests and Helm chart
└── pyproject.toml               # Package config (hatchling)
```
Key Design Decisions
- stdlib-only MCP server: The MCP server (`mcp_server.py`, `sync.py`) uses only Python stdlib — no pip install required for the client side. This ensures it works in any Claude Code environment without dependency management.
- NDJSON transport: Claude Code uses NDJSON (one JSON object per line) for stdio MCP, not Content-Length framing.
- Non-blocking startup: MCP server startup must complete in ~15s or Claude Code times out. All network calls are deferred to background threads.
- Suppress stderr: Any stderr output during MCP startup causes Claude Code to reject the server.
License
Apache License 2.0. See LICENSE for details.