Claude Memory MCP
Give Claude persistent memory that survives across sessions, context compactions, and machines.
claude plugins install github:ViktorBarzin/claude-memory-mcp
That's it. Claude now remembers things.
What It Does
You: "remember I prefer Svelte for all new frontend apps"
Claude: Stored. ✓
── 3 weeks later, new session, different machine ──
You: "build me a dashboard for this API"
Claude: I'll set up a SvelteKit project since that's your preference...
No configuration needed for single-machine use. Memories are stored in a local SQLite database with full-text search. For multi-machine sync, point it at an API server (details below).
Features
Slash Commands
- `/remember <fact>` — store a fact, preference, decision, or person detail
- `/recall <query>` — search memories by topic
Automatic Behaviors (via hooks)
- Auto-recall — before responding, Claude checks stored memories for relevant context (preferences, past corrections, decisions). Completely invisible to the user.
- Auto-learn — after each response, a background process analyzes the conversation for corrections, preferences, and decisions worth remembering. Uses haiku-as-judge for conservative extraction.
- Compaction survival — when Claude's context window compacts, key memories are saved to a marker file and re-injected on the next prompt. No knowledge is lost.
- Auto-approve — memory tool calls are approved automatically, no permission prompts.
MCP Tools
Claude has direct access to these tools during conversation:
| Tool | Description |
|---|---|
| `memory_store` | Store a fact with category, tags, importance, and semantic search keywords |
| `memory_recall` | Search memories using full-text search with expanded query terms |
| `memory_list` | List recent memories, optionally filtered by category |
| `memory_delete` | Delete a memory by ID |
| `secret_get` | Retrieve the decrypted content of a sensitive memory |
| `memory_count` | Get memory counts by category and sync status diagnostics |
Memory Categories
Memories are organized into: facts, preferences, projects, people, decisions
Sensitive Memory Support
Mark memories as sensitive with force_sensitive: true. When Vault is configured, sensitive content is encrypted at rest and only decryptable via secret_get.
Architecture
```
Claude Code Session
┌──────────────────────────────────────────────────────┐
│  Hooks                     MCP Server                │
│  ┌──────────────────┐      ┌────────────────────┐    │
│  │ auto-recall      │─────▶│ memory_store       │    │
│  │ auto-learn       │      │ memory_recall      │    │
│  │ compaction       │      │ memory_list        │    │
│  │ auto-approve     │      │ memory_delete      │    │
│  └──────────────────┘      │ secret_get         │    │
│                            │ memory_count       │    │
│                            └─────────┬──────────┘    │
│                                      │               │
│  ┌───────────────────────────────────┼──────────┐    │
│  │ Local SQLite                      │          │    │
│  │ (cache + FTS5)               SyncEngine      │    │
│  │   ◄──── always used ────(background, 60s)    │    │
│  │                          push queued writes  │    │
│  │                          pull server changes │    │
│  │   pending_ops ────────────────────┘          │    │
│  │   (offline write queue)                      │    │
│  └──────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────┘
                     │
        optional, for multi-machine sync
                     │
          ┌──────────▼───────────┐
          │ API Server (FastAPI) │
          │ Docker / Kubernetes  │
          └──────────┬───────────┘
                     │
          ┌──────────▼───────────┐
          │      PostgreSQL      │
          │   (authoritative)    │
          └──────────────────────┘
```
Sync Resilience
The SyncEngine is designed to handle failures gracefully:
- Decoupled push/pull — push failures never block pull. Remote changes from other agents always flow in.
- Auth failure detection — on 401/403, the engine sets an auth-failed flag, logs a clear warning, and degrades to SQLite-only mode. A periodic health check detects when auth is restored.
- Startup full resync — on MCP server start, a full cache replacement runs to catch drift from other agents, deleted records, and schema changes.
- Periodic full resync — every ~10 minutes, a full resync replaces incremental sync to catch any drift.
- Retry cap — individual pending ops are retried up to 5 times, then permanently skipped to prevent queue buildup.
- Orphan reconciliation — local records that never synced are deduplicated against server content before push.
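As an illustration, the retry-cap bookkeeping amounts to a small filter over the queue. This is a sketch with hypothetical data shapes; the real queue is the `pending_ops` SQLite table described later:

```python
MAX_RETRIES = 5  # the retry cap described above

def ops_to_push(pending_ops):
    """Return queued ops still eligible for a push attempt.

    Each op carries a 'retry_count'; ops that have failed MAX_RETRIES
    times are skipped permanently so one bad record cannot stall the
    queue. (Illustrative sketch only.)
    """
    return [op for op in pending_ops if op["retry_count"] < MAX_RETRIES]

def record_failure(op):
    """Increment the retry counter after a failed push."""
    op["retry_count"] += 1
    return op
```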
Search Algorithm
Memory recall uses two different full-text search backends depending on the operating mode. Both follow the same query building pattern: the context and expanded_query fields from the user's recall request are concatenated into a single search string, then processed into a backend-specific query.
Query Construction
When memory_recall is called, the tool receives:
- `context` — the topic or question being asked about
- `expanded_query` — REQUIRED, minimum 5 space-separated semantically related terms generated by Claude
These are concatenated: all_terms = f"{context} {expanded_query}", then split into individual words for the search backend.
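In code, the construction is a simple concatenate-and-split. A sketch (the actual tool also enforces the minimum-terms requirement on `expanded_query`):

```python
def build_search_terms(context: str, expanded_query: str) -> list[str]:
    """Concatenate context and expanded_query, then split into
    individual words for the search backend (FTS5 or tsquery)."""
    all_terms = f"{context} {expanded_query}"
    return all_terms.split()
```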
SQLite: FTS5 with BM25
The local SQLite database uses an FTS5 virtual table for full-text search with the BM25 ranking algorithm.
Schema:
```sql
CREATE VIRTUAL TABLE memories_fts USING fts5(
    content, category, tags, expanded_keywords,
    content='memories', content_rowid='id'
);
```
This is a content table backed by the memories table — FTS5 reads column data from memories on demand rather than duplicating it. Triggers on INSERT, UPDATE, and DELETE keep the FTS index synchronized:
```sql
-- Insert trigger
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;

-- Delete trigger (FTS5 special 'delete' command)
CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, category, tags, expanded_keywords)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
END;

-- Update trigger (delete old + insert new)
CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, ...)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;
```
Query building: Each word is quoted and joined with OR:
```python
words = all_terms.split()
fts_query = " OR ".join(f'"{w}"' for w in words if w)
# Example: "kubernetes" OR "deployment" OR "pod" OR "scaling" OR "replicas"
```
Ranking: Two sort modes:
- `sort_by="relevance"` — `ORDER BY bm25(memories_fts), importance DESC` — FTS5's built-in BM25 relevance score, with importance as tiebreaker
- `sort_by="importance"` (default) — `ORDER BY importance DESC, created_at DESC` — user-assigned importance score, with recency as tiebreaker
Fallback: If the FTS5 MATCH query fails (e.g. syntax error from special characters), the search falls back to a LIKE %query% scan against content and tags columns.
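Putting the pieces together, here is a minimal self-contained sketch of the FTS5 path, including the LIKE fallback, using Python's bundled `sqlite3` module. The table names (`notes`, `notes_fts`) are illustrative rather than the project's actual schema, and an FTS5-enabled SQLite build is assumed:

```python
import sqlite3

def search(db, terms):
    """OR-join quoted terms for an FTS5 MATCH; fall back to LIKE on error."""
    fts_query = " OR ".join(f'"{w}"' for w in terms if w)
    try:
        return db.execute(
            "SELECT content FROM notes_fts WHERE notes_fts MATCH ? "
            "ORDER BY bm25(notes_fts)",  # BM25: lower score = better match
            (fts_query,),
        ).fetchall()
    except sqlite3.OperationalError:
        # Fallback: plain LIKE scan when the MATCH syntax fails
        like = f"%{' '.join(terms)}%"
        return db.execute(
            "SELECT content FROM notes WHERE content LIKE ?", (like,)
        ).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE notes(id INTEGER PRIMARY KEY, content TEXT)")
db.execute(
    "CREATE VIRTUAL TABLE notes_fts USING fts5("
    "content, content='notes', content_rowid='id')"
)
db.execute("INSERT INTO notes(content) VALUES ('prefers svelte for frontend work')")
# External-content FTS tables need the index populated explicitly (or via triggers)
db.execute("INSERT INTO notes_fts(rowid, content) SELECT id, content FROM notes")
rows = search(db, ["svelte", "framework"])  # matches via the OR'd "svelte" term
```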
PostgreSQL: tsvector with Weighted Ranking
The API server uses PostgreSQL's native full-text search with a generated tsvector column and weighted ranking.
Schema (from migration 001):
```sql
CREATE TABLE memories (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category VARCHAR(50) DEFAULT 'facts',
    tags TEXT DEFAULT '',
    expanded_keywords TEXT DEFAULT '',
    importance REAL DEFAULT 0.5,
    -- ...
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(content, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(expanded_keywords, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(tags, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'D')
    ) STORED
);

CREATE INDEX idx_memories_search ON memories USING GIN(search_vector);
```
The search_vector is a generated column that PostgreSQL maintains automatically on every INSERT/UPDATE — no triggers needed. It combines four fields with different weights:
| Weight | Field | Purpose |
|---|---|---|
| A (highest) | `content` | The actual memory text — most important for matching |
| B | `expanded_keywords` | Semantic search terms generated by Claude at store time |
| C | `tags` | User-provided comma-separated tags |
| D (lowest) | `category` | Category name (facts, preferences, etc.) |
A GIN index on search_vector enables fast lookup without sequential scans.
Query execution:
```sql
SELECT id, content, category, tags, importance, is_sensitive,
       ts_rank(search_vector, query) AS rank,
       created_at, updated_at
FROM memories, plainto_tsquery('english', $2) query
WHERE user_id = $1
  AND deleted_at IS NULL
  AND (search_vector @@ query OR $2 = '')
ORDER BY importance DESC, ts_rank(search_vector, query) DESC
LIMIT $3
```
plainto_tsquery processes the concatenated query text through PostgreSQL's English text search dictionary — it handles stemming (e.g. "running" → "run"), stop word removal, and normalization. The @@ operator matches the resulting tsquery against the stored tsvector.
ts_rank scores each match considering the weight of the matched field — a match in content (weight A) scores higher than a match in tags (weight C).
Sort modes:
- `sort_by="importance"` (default) — `ORDER BY importance DESC, ts_rank(search_vector, query) DESC`
- `sort_by="relevance"` — `ORDER BY ts_rank(search_vector, query) DESC`
- `sort_by="recency"` — `ORDER BY created_at DESC`
Semantic Expansion at Store Time
When Claude stores a memory via memory_store, it generates an expanded_keywords field — at least 5 space-separated semantically related terms. For example, storing "User prefers Svelte for frontends" might produce:
expanded_keywords: "svelte frontend framework ui web sveltekit javascript component reactive"
These keywords are indexed alongside the memory content. When a future memory_recall searches for "what framework should I use for the dashboard?", the expanded keywords create additional matching surface that pure content matching would miss. This is a form of query expansion performed at write time by the LLM.
SQLite vs PostgreSQL Comparison
| Feature | SQLite (FTS5) | PostgreSQL (tsvector) |
|---|---|---|
| Tokenization | ICU/unicode61 tokenizer | English text search dictionary |
| Stemming | No (exact word matching) | Yes (Porter stemmer via english config) |
| Stop words | No | Yes (removes "the", "is", "at", etc.) |
| Ranking | BM25 (term frequency + inverse document frequency) | ts_rank (weighted field matching) |
| Field weighting | No (all FTS5 columns equal) | Yes (A/B/C/D weights per column) |
| Index type | B-tree on FTS content | GIN (Generalized Inverted Index) |
| Query syntax | Quoted OR terms (`"word1" OR "word2"`) | `plainto_tsquery` with implicit AND |
| Fallback | LIKE scan on failure | None needed (robust parser) |
| Maintenance | Triggers sync content ↔ FTS table | Generated column, auto-maintained |
Operating Modes
| Mode | When | What happens |
|---|---|---|
| SQLite-only | No env vars set | Everything local. Zero config. Works offline. |
| Hybrid | `MEMORY_API_KEY` set | Local SQLite for reads + background sync to API. Writes queue if API is down. |
| HTTP-only | `MEMORY_API_KEY` + `MEMORY_SYNC_DISABLE=1` | Direct API calls, no local cache. Legacy mode. |
| Full | Any mode + Vault configured | Above + Vault for encrypting sensitive memories at rest. |
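The mode decision reduces to a couple of environment checks. A sketch (the function name is hypothetical; the real detection lives in the MCP server startup):

```python
import os

def operating_mode(env=os.environ) -> str:
    """Pick the operating mode from the table above (illustrative sketch)."""
    if not env.get("MEMORY_API_KEY"):
        return "sqlite-only"   # zero config, fully local
    if env.get("MEMORY_SYNC_DISABLE") == "1":
        return "http-only"     # legacy: direct API calls, no cache
    return "hybrid"            # local reads + background sync
```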
Setup
Option 1: Plugin Install (recommended)
claude plugins install github:ViktorBarzin/claude-memory-mcp
Works immediately with SQLite-only mode. To enable multi-machine sync:
```shell
export MEMORY_API_URL="https://your-server.example.com"
export MEMORY_API_KEY="your-api-key"
```
Option 2: Manual MCP Config
Add to ~/.claude.json under mcpServers:
```json
{
  "mcpServers": {
    "claude_memory": {
      "type": "stdio",
      "command": "python3",
      "args": ["/path/to/claude-memory-mcp/src/claude_memory/mcp_server.py"],
      "env": {
        "PYTHONPATH": "/path/to/claude-memory-mcp/src",
        "MEMORY_API_URL": "https://your-server.example.com",
        "MEMORY_API_KEY": "your-api-key"
      }
    }
  }
}
```
Omit MEMORY_API_URL and MEMORY_API_KEY for SQLite-only mode.
Verify
You: /remember "I prefer dark mode in all applications"
You: /recall "UI preferences"
Running the API Server
Only needed for multi-machine sync. Skip this if you're using SQLite-only mode.
Docker Compose
```shell
cd docker
docker compose up -d
```
Starts the API server + PostgreSQL. Optionally add Vault:
docker compose --profile vault up -d
Manual
```shell
pip install claude-memory-mcp[api]
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
export API_KEY="your-secret-key"
alembic upgrade head
uvicorn claude_memory.api.app:app --host 0.0.0.0 --port 8000
```
Kubernetes
Designed for high availability: 2 replicas with pod anti-affinity, PodDisruptionBudget, and startup probes.
```shell
helm install claude-memory deploy/helm/claude-memory \
  --set env.DATABASE_URL="postgresql://user:pass@host:5432/db" \
  --set env.API_KEY="your-key" \
  --set ingress.host="claude-memory.yourdomain.com"
```
Or raw manifests:
```shell
kubectl apply -f deploy/kubernetes/namespace.yaml
kubectl create secret generic claude-memory-secrets \
  -n claude-memory \
  --from-literal=database-url="postgresql://user:pass@host:5432/db" \
  --from-literal=api-key="your-key"
kubectl apply -f deploy/kubernetes/
```
Multi-User Setup
For team deployments, use API_KEYS with a JSON mapping:
export API_KEYS='{"alice": "key-alice-xxx", "bob": "key-bob-yyy"}'
Each user gets isolated memory storage. Users cannot see each other's memories.
Adding a user:
- Generate a key: `openssl rand -base64 32`
- Add it to the `API_KEYS` JSON
- Restart the API server
- Share the key with the user
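Key-to-user resolution on the server is a reverse lookup over that JSON map. A sketch (the function name is hypothetical; the real logic lives in `api/auth.py`):

```python
import json

def resolve_user(api_keys_json: str, presented_key: str):
    """Map a presented Bearer key back to its user id, or None.

    api_keys_json is the API_KEYS value, e.g. '{"alice": "key-alice-xxx"}'.
    All storage is then isolated by the resolved user id.
    """
    keys = json.loads(api_keys_json)
    for user_id, key in keys.items():
        if key == presented_key:
            return user_id
    return None
```

A production implementation would compare keys with `hmac.compare_digest` rather than `==` to avoid timing side channels.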
Sensitive Memory & Secrets
Sensitive data is handled through a multi-layer system:
Automatic Detection
credential_detector.py scans all incoming content for secrets using regex patterns with confidence scores (0.5–0.95):
| Confidence | Pattern Types |
|---|---|
| 0.95 | Private keys (RSA/EC/DSA/OPENSSH), AWS access keys (AKIA/ASIA), GitHub tokens (ghp_/gho_/etc.) |
| 0.85–0.90 | Connection strings (postgres/mysql/mongodb/redis/amqp), Basic auth headers, Bearer tokens |
| 0.75–0.80 | api_key=, password=, token= assignments |
| 0.60–0.65 | Generic secret=/credential= patterns, hex keys |
Detected content is redacted in the main DB (e.g., [REDACTED:private_key]) and the original stored in Vault or encrypted storage.
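A toy version of the detector, with just three illustrative patterns (the real `credential_detector.py` covers far more, and its exact regexes, labels, and confidences may differ):

```python
import re

# (pattern, label, confidence) — confidences loosely mirror the table above
PATTERNS = [
    (re.compile(r"\b(AKIA|ASIA)[0-9A-Z]{16}\b"), "aws_access_key", 0.95),
    (re.compile(r"\bghp_[A-Za-z0-9]{36}\b"), "github_token", 0.95),
    (re.compile(r"\bpassword\s*=\s*\S+", re.I), "password_assignment", 0.80),
]

def detect_and_redact(text: str):
    """Return (redacted_text, findings); redaction mirrors [REDACTED:type]."""
    findings = []
    for pattern, label, confidence in PATTERNS:
        if pattern.search(text):
            findings.append((label, confidence))
            text = pattern.sub(f"[REDACTED:{label}]", text)
    return text, findings
```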
Encryption at Rest
Three tiers, checked in order:
- HashiCorp Vault (KV v2) — if `VAULT_ADDR` + `VAULT_TOKEN` set. Secrets stored at `claude-memory/{user_id}/mem-{id}`. Supports Kubernetes Service Account auto-authentication.
- AES-256-GCM — if `MEMORY_ENCRYPTION_KEY` set. Key can be hex-encoded 32 bytes or a passphrase (derived via SHA-256). Stored as `nonce (12 bytes) + ciphertext + GCM tag (16 bytes)` in the `encrypted_content` column.
- Plaintext fallback — content stored as-is (still redacted in the main column if detected as sensitive).
Retrieve the original via secret_get(id=<memory_id>).
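The key-normalization step for the AES tier can be sketched with the stdlib. One assumption here: passphrases are UTF-8 encoded before hashing; the project's actual derivation may differ in detail:

```python
import hashlib

def derive_key(value: str) -> bytes:
    """Normalize MEMORY_ENCRYPTION_KEY to 32 raw bytes.

    A 64-char hex string decodes directly to 32 bytes; anything else
    is treated as a passphrase and hashed with SHA-256.
    """
    try:
        raw = bytes.fromhex(value)
        if len(raw) == 32:
            return raw
    except ValueError:
        pass  # not valid hex: treat as a passphrase
    return hashlib.sha256(value.encode("utf-8")).digest()
```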
Plugin Hooks
| Hook | Event | What it does |
|---|---|---|
| `pre-compact-backup.sh` | PreCompact | Saves top 20 memories to a marker file before context compaction |
| `post-compact-recovery.sh` | UserPromptSubmit | Re-injects saved memories after compaction (one-time, then deletes marker) |
| `user-prompt-recall.py` | UserPromptSubmit | Tells Claude to check memory_recall before responding to each message |
| `auto-learn.py` | Stop (async) | Runs haiku-as-judge on the last exchange to extract durable facts worth storing |
| `auto-allow-memory-tools.py` | PermissionRequest | Auto-approves memory_store, memory_recall, memory_list, memory_delete, secret_get |
Auto-Learn Details
The auto-learn hook runs asynchronously after each Claude response. It operates in two modes:
- Single-turn (every turn) — analyzes just the last user/assistant exchange. Cheap and fast. Extracts corrections, preferences, decisions, and facts.
- Deep multi-turn (every 5 turns) — analyzes the last 5 exchanges as a window. Also extracts debugging insights, workarounds, and operational knowledge.
Judge fallback chain: Claude CLI (haiku model) → local Ollama (qwen2.5:3b/llama3.2:3b/gemma2:2b/phi3:mini) → heuristic pattern matching (keyword-based extraction from user messages).
Extracted events are deduplicated via SHA-256 content hashing. Each event is stored to the MCP memory database.
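Content-hash dedup takes only a few lines. A sketch with a hypothetical helper (the hook presumably persists the seen-set between runs, which this sketch does not):

```python
import hashlib

def dedup_events(events: list[str], seen: set[str]) -> list[str]:
    """Keep only events whose SHA-256 content hash has not been seen."""
    fresh = []
    for event in events:
        digest = hashlib.sha256(event.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(event)
    return fresh
```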
Debug
```shell
# See what hooks are doing
export DEBUG_CLAUDE_MEMORY_HOOKS=1

# Disable auto-approve to see permission prompts
export DISABLE_CLAUDE_MEMORY_AUTO_APPROVE=1
```
Background Sync Engine
In hybrid mode, a SyncEngine runs in a daemon thread with its own SQLite connection (thread-safe via threading.Lock).
Push cycle: Reads the pending_ops queue (SQLite table) and sends each operation to the API. On success, removes the operation from the queue and updates the local server_id mapping. On failure, the operation stays queued for the next cycle.
Pull cycle: Calls GET /api/memories/sync?since=<last_sync_ts> for incremental changes. Server wins on conflicts — remote updates overwrite local rows (matched by server_id). Soft-deleted rows on the server are hard-deleted locally.
Write path: Every memory_store first writes to local SQLite (instant), then attempts an immediate sync to the API. If the immediate sync fails, the write is enqueued in pending_ops for the next background cycle.
Sync interval: Configurable via MEMORY_SYNC_INTERVAL (default: 60 seconds).
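The local-first write path can be sketched as follows. Names are illustrative, and the real engine also records `server_id` mappings and full op payloads:

```python
import sqlite3

def store_memory(db, content, push_to_api):
    """Write locally first, then try an immediate push; queue on failure."""
    cur = db.execute("INSERT INTO memories(content) VALUES (?)", (content,))
    local_id = cur.lastrowid
    try:
        push_to_api(local_id, content)  # immediate sync attempt
    except OSError:
        # API unreachable: enqueue for the next background cycle
        db.execute(
            "INSERT INTO pending_ops(memory_id, op, retry_count) "
            "VALUES (?, 'store', 0)",
            (local_id,),
        )
    return local_id

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories(id INTEGER PRIMARY KEY, content TEXT)")
db.execute("CREATE TABLE pending_ops(memory_id INTEGER, op TEXT, retry_count INTEGER)")

def offline_push(mid, content):
    raise OSError("API down")

store_memory(db, "prefers dark mode", offline_push)   # queued
store_memory(db, "prefers svelte", lambda m, c: None) # pushed immediately
```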
API Reference
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check (unauthenticated) |
| `/api/auth-check` | GET | Validate API key without side effects |
| `/api/memories` | POST | Store a memory |
| `/api/memories` | GET | List memories (`?category=facts&limit=20`) |
| `/api/memories/recall` | POST | Search memories by context and expanded query |
| `/api/memories/{id}` | DELETE | Soft-delete a memory |
| `/api/memories/sync` | GET | Incremental sync (`?since=ISO-timestamp`) |
| `/api/memories/{id}/secret` | POST | Get decrypted sensitive memory content |
| `/api/memories/migrate-secrets` | POST | Re-encrypt existing secrets with current Vault config |
| `/api/memories/import` | POST | Bulk import memories (JSON array) |
All endpoints except /health require Authorization: Bearer <api-key>.
Environment Variables
MCP Server (client-side)
| Variable | Description | Default |
|---|---|---|
| `MEMORY_API_URL` | API server URL | `http://localhost:8080` |
| `MEMORY_API_KEY` | API key (enables hybrid mode) | None |
| `MEMORY_HOME` | Local storage directory | `~/.claude/claude-memory` |
| `MEMORY_DB` | SQLite database path override | `$MEMORY_HOME/memory/memory.db` |
| `MEMORY_SYNC_INTERVAL` | Background sync interval in seconds | 60 |
| `MEMORY_SYNC_DISABLE` | Set to `1` for HTTP-only mode | None |
Aliases CLAUDE_MEMORY_API_URL and CLAUDE_MEMORY_API_KEY are also supported.
API Server
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | PostgreSQL connection string | Required |
| `API_KEY` | Single-user API key | None |
| `API_KEYS` | Multi-user JSON map `{"user": "key"}` | None |
| `VAULT_ADDR` | Vault server address | None |
| `VAULT_TOKEN` | Vault authentication token | None |
| `VAULT_ROLE` | Kubernetes auth role name | `claude-memory` |
| `MEMORY_ENCRYPTION_KEY` | AES-256 key for non-Vault encryption | None |
Database Migrations
PostgreSQL (API Server)
Migrations run automatically on API server startup. To run manually:
```shell
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
alembic upgrade head
```
Three migrations:
- `001` — Creates `memories` table with generated `tsvector` column and GIN index
- `002` — Adds `user_id` (multi-user), `is_sensitive`, `vault_path`, `encrypted_content`
- `003` — Adds `deleted_at` (soft delete) and `updated_at` index for incremental sync
All migrations are idempotent (check column/table existence before altering).
SQLite (MCP Client)
SQLite schema is versioned via PRAGMA user_version and migrated automatically on startup. Current version: 2.
| Version | Migration |
|---|---|
| 1 | Add server_id column to memories table |
| 2 | Add retry_count column to pending_ops table |
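The `PRAGMA user_version` pattern looks like this in stdlib `sqlite3`. A sketch: the migration SQL mirrors the table above but is not copied from the project's code, and the column types are assumptions:

```python
import sqlite3

MIGRATIONS = {
    1: "ALTER TABLE memories ADD COLUMN server_id INTEGER",
    2: "ALTER TABLE pending_ops ADD COLUMN retry_count INTEGER DEFAULT 0",
}

def migrate(db: sqlite3.Connection) -> int:
    """Apply migrations newer than PRAGMA user_version, then bump it."""
    (version,) = db.execute("PRAGMA user_version").fetchone()
    for target in sorted(v for v in MIGRATIONS if v > version):
        db.execute(MIGRATIONS[target])
        db.execute(f"PRAGMA user_version = {target}")  # pragmas can't be parameterized
        version = target
    return version

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memories(id INTEGER PRIMARY KEY)")
db.execute("CREATE TABLE pending_ops(memory_id INTEGER)")
migrate(db)  # applies 1 and 2, leaves user_version at 2
```

Because already-applied versions are filtered out, running `migrate` again is a no-op, which is what makes startup migration safe.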
Development
Prerequisites
- Python 3.11+
- For API server tests: `httpx`, `pytest-asyncio`
Quick Start
```shell
git clone https://github.com/ViktorBarzin/claude-memory-mcp.git
cd claude-memory-mcp
python -m venv .venv
source .venv/bin/activate
pip install -e ".[api,dev]"
```
Running Tests
```shell
# All tests
pytest tests/ -v

# Individual test suites
pytest tests/test_sync.py -v        # SyncEngine (client-side sync resilience)
pytest tests/test_mcp_server.py -v  # MCP server (SQLite, tools, protocol)
pytest tests/test_api.py -v         # API server (FastAPI endpoints)

# Linting
ruff check src/ tests/
mypy src/claude_memory/ --strict
```
The MCP server itself (mcp_server.py and sync.py) uses stdlib only — no pip install needed on the client side. The [api] extra adds FastAPI, asyncpg, uvicorn, etc. for the server.
Test Structure
| File | Tests | What it covers |
|---|---|---|
| `test_sync.py` | SyncEngine unit tests | Push/pull, auth failure handling, retry caps, full resync, orphan reconciliation, decoupled push/pull, diagnostics |
| `test_mcp_server.py` | MCP server unit tests | SQLite CRUD, FTS search, tool dispatch, MCP protocol, memory_count, schema migration |
| `test_api.py` | API server integration tests | All REST endpoints, auth, user isolation, soft delete, sync, secrets, import |
| `test_auth.py` | Auth module tests | Single/multi-user auth, key mapping |
| `test_credential_detector.py` | Credential detection | Pattern matching for secrets |
| `test_crypto.py` | Encryption tests | AES-256 encrypt/decrypt |
| `test_vault_client.py` | Vault integration | Secret storage/retrieval |
Project Structure
```
claude-memory-mcp/
├── src/claude_memory/
│   ├── mcp_server.py            # MCP server entry point (stdio NDJSON)
│   ├── sync.py                  # Background sync engine (SQLite ↔ API)
│   ├── credential_detector.py   # Sensitive content detection
│   └── api/
│       ├── app.py               # FastAPI application
│       ├── auth.py              # API key authentication
│       ├── database.py          # asyncpg connection pool
│       ├── models.py            # Pydantic models
│       └── vault_service.py     # HashiCorp Vault integration
├── tests/                       # pytest test suite
├── hooks/                       # Claude Code hooks (auto-recall, auto-learn, etc.)
├── docker/                      # Docker Compose for API server + PostgreSQL
├── deploy/                      # Kubernetes manifests and Helm chart
└── pyproject.toml               # Package config (hatchling)
```
Key Design Decisions
- stdlib-only MCP server: The MCP server (`mcp_server.py`, `sync.py`) uses only Python stdlib — no pip install required for the client side. This ensures it works in any Claude Code environment without dependency management.
- NDJSON transport: Claude Code uses NDJSON (one JSON object per line) for stdio MCP, not Content-Length framing.
- Non-blocking startup: MCP server startup must complete in ~15s or Claude Code times out. All network calls are deferred to background threads.
- Suppress stderr: Any stderr output during MCP startup causes Claude Code to reject the server.
License
Apache License 2.0. See LICENSE for details.