# Claude Memory MCP

Give Claude persistent memory that survives across sessions, context compactions, and machines.

```bash
claude plugins install github:ViktorBarzin/claude-memory-mcp
```

That's it. Claude now remembers things.

## What It Does

```
You: "remember I prefer Svelte for all new frontend apps"
Claude: Stored. ✓

── 3 weeks later, new session, different machine ──

You: "build me a dashboard for this API"
Claude: I'll set up a SvelteKit project since that's your preference...
```

**No configuration needed** for single-machine use. Memories are stored in a local SQLite database with full-text search. For multi-machine sync, point it at an API server (details below).

## Features

### Slash Commands

- **`/remember`** — store a fact, preference, decision, or person detail
- **`/recall`** — search memories by topic

### Automatic Behaviors (via hooks)

- **Auto-recall** — before responding, Claude checks stored memories for relevant context (preferences, past corrections, decisions). Completely invisible to the user.
- **Auto-learn** — after each response, a background process analyzes the conversation for corrections, preferences, and decisions worth remembering. Uses haiku-as-judge for conservative extraction.
- **Compaction survival** — when Claude's context window compacts, key memories are saved to a marker file and re-injected on the next prompt. No knowledge is lost.
- **Auto-approve** — memory tool calls are approved automatically, with no permission prompts.
### MCP Tools

Claude has direct access to these tools during conversation:

| Tool | Description |
|------|-------------|
| `memory_store` | Store a fact with category, tags, importance, and semantic search keywords |
| `memory_recall` | Search memories using full-text search with expanded query terms |
| `memory_list` | List recent memories, optionally filtered by category |
| `memory_delete` | Delete a memory by ID |
| `secret_get` | Retrieve the decrypted content of a sensitive memory |
| `memory_count` | Get memory counts by category and sync status diagnostics |

### Memory Categories

Memories are organized into: `facts`, `preferences`, `projects`, `people`, `decisions`

### Sensitive Memory Support

Mark memories as sensitive with `force_sensitive: true`. When Vault is configured, sensitive content is encrypted at rest and only decryptable via `secret_get`.

## Architecture

```
Claude Code Session
┌──────────────────────────────────────────────────────┐
│  Hooks                     MCP Server                │
│  ┌──────────────────┐      ┌────────────────────┐    │
│  │ auto-recall      │─────▶│ memory_store       │    │
│  │ auto-learn       │      │ memory_recall      │    │
│  │ compaction       │      │ memory_list        │    │
│  │ auto-approve     │      │ memory_delete      │    │
│  └──────────────────┘      │ secret_get         │    │
│                            │ memory_count       │    │
│                            └─────────┬──────────┘    │
│                                      │               │
│  ┌───────────────────────────────────┼───────────┐   │
│  │ Local SQLite                      │           │   │
│  │ (cache + FTS5)               SyncEngine       │   │
│  │   ◄──── always used ──── (background, 60s)    │   │
│  │                          push queued writes   │   │
│  │                          pull server changes  │   │
│  │ pending_ops ──────────────────┘               │   │
│  │ (offline write queue)                         │   │
│  └───────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────┘
                    │ optional, for multi-machine sync
                    │
         ┌──────────▼───────────┐
         │ API Server (FastAPI) │
         │ Docker / Kubernetes  │
         └──────────┬───────────┘
                    │
         ┌──────────▼───────────┐
         │      PostgreSQL      │
         │   (authoritative)    │
         └──────────────────────┘
```

### Sync Resilience

The SyncEngine is designed to handle failures gracefully:

- **Decoupled push/pull** — push failures never block pull. Remote changes from other agents always flow in.
- **Auth failure detection** — on 401/403, the engine sets an auth-failed flag, logs a clear warning, and degrades to SQLite-only mode. A periodic health check detects when auth is restored.
- **Startup full resync** — on MCP server start, a full cache replacement runs to catch drift from other agents, deleted records, and schema changes.
- **Periodic full resync** — every ~10 minutes, a full resync replaces incremental sync to catch any drift.
- **Retry cap** — individual pending ops are retried up to 5 times, then permanently skipped to prevent queue buildup.
- **Orphan reconciliation** — local records that never synced are deduplicated against server content before push.

## Search Algorithm

Memory recall uses two different full-text search backends depending on the operating mode. Both follow the same query-building pattern: the `context` and `expanded_query` fields from the user's recall request are concatenated into a single search string, then processed into a backend-specific query.

### Query Construction

When `memory_recall` is called, the tool receives:

- **`context`** — the topic or question being asked about
- **`expanded_query`** — required; a minimum of 5 space-separated, semantically related terms generated by Claude

These are concatenated — `all_terms = f"{context} {expanded_query}"` — then split into individual words for the search backend.

### SQLite: FTS5 with BM25

The local SQLite database uses an [FTS5 virtual table](https://www.sqlite.org/fts5.html) for full-text search with the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking algorithm.
**Schema:**

```sql
CREATE VIRTUAL TABLE memories_fts USING fts5(
    content,
    category,
    tags,
    expanded_keywords,
    content='memories',
    content_rowid='id'
);
```

This is a [content table](https://www.sqlite.org/fts5.html#contentless_and_content_external_tables) backed by the `memories` table — FTS5 reads column data from `memories` on demand rather than duplicating it. Triggers on INSERT, UPDATE, and DELETE keep the FTS index synchronized:

```sql
-- Insert trigger
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;

-- Delete trigger (FTS5 special 'delete' command)
CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, category, tags, expanded_keywords)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
END;

-- Update trigger (delete old + insert new)
CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, category, tags, expanded_keywords)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;
```

**Query building:** Each word is quoted and joined with OR:

```python
words = all_terms.split()
fts_query = " OR ".join(f'"{w}"' for w in words if w)
# Example: "kubernetes" OR "deployment" OR "pod" OR "scaling" OR "replicas"
```

**Ranking:** Two sort modes:

- `sort_by="relevance"` — `ORDER BY bm25(memories_fts), importance DESC` — FTS5's built-in BM25 relevance score, with importance as tiebreaker
- `sort_by="importance"` (default) — `ORDER BY importance DESC, created_at DESC` — user-assigned importance score, with recency as tiebreaker

**Fallback:** If the FTS5 MATCH query fails (e.g. a syntax error from special characters), the search falls back to a `LIKE '%query%'` scan against the `content` and `tags` columns.

### PostgreSQL: tsvector with Weighted Ranking

The API server uses PostgreSQL's native [full-text search](https://www.postgresql.org/docs/current/textsearch.html) with a generated `tsvector` column and weighted ranking.

**Schema** (from migration `001`):

```sql
CREATE TABLE memories (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category VARCHAR(50) DEFAULT 'facts',
    tags TEXT DEFAULT '',
    expanded_keywords TEXT DEFAULT '',
    importance REAL DEFAULT 0.5,
    -- ...
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(content, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(expanded_keywords, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(tags, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'D')
    ) STORED
);

CREATE INDEX idx_memories_search ON memories USING GIN(search_vector);
```

The `search_vector` is a [generated column](https://www.postgresql.org/docs/current/ddl-generated-columns.html) that PostgreSQL maintains automatically on every INSERT/UPDATE — no triggers needed. It combines four fields with different weights:

| Weight | Field | Purpose |
|--------|-------|---------|
| **A** (highest) | `content` | The actual memory text — most important for matching |
| **B** | `expanded_keywords` | Semantic search terms generated by Claude at store time |
| **C** | `tags` | User-provided comma-separated tags |
| **D** (lowest) | `category` | Category name (facts, preferences, etc.) |

A [GIN index](https://www.postgresql.org/docs/current/gin.html) on `search_vector` enables fast lookup without sequential scans.
**Query execution:**

```sql
SELECT id, content, category, tags, importance, is_sensitive,
       ts_rank(search_vector, query) AS rank,
       created_at, updated_at
FROM memories, plainto_tsquery('english', $2) query
WHERE user_id = $1
  AND deleted_at IS NULL
  AND (search_vector @@ query OR $2 = '')
ORDER BY importance DESC, ts_rank(search_vector, query) DESC
LIMIT $3
```

[`plainto_tsquery`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES) processes the concatenated query text through PostgreSQL's English text search dictionary — it handles stemming (e.g. "running" → "run"), stop-word removal, and normalization. The `@@` operator matches the resulting `tsquery` against the stored `tsvector`. [`ts_rank`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING) scores each match considering the weight of the matched field — a match in `content` (weight A) scores higher than a match in `tags` (weight C).

**Sort modes:**

- `sort_by="importance"` (default) — `ORDER BY importance DESC, ts_rank(search_vector, query) DESC`
- `sort_by="relevance"` — `ORDER BY ts_rank(search_vector, query) DESC`
- `sort_by="recency"` — `ORDER BY created_at DESC`

### Semantic Expansion at Store Time

When Claude stores a memory via `memory_store`, it generates an `expanded_keywords` field — at least 5 space-separated, semantically related terms. For example, storing "User prefers Svelte for frontends" might produce:

```
expanded_keywords: "svelte frontend framework ui web sveltekit javascript component reactive"
```

These keywords are indexed alongside the memory content. When a future `memory_recall` searches for "what framework should I use for the dashboard?", the expanded keywords create additional matching surface that pure content matching would miss. This is a form of query expansion performed at write time by the LLM.
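To see the extra matching surface in action, here is a stdlib-only sketch using an in-memory FTS5 table. The single-table `mem_fts` schema is a hypothetical simplification, not the plugin's actual layout:

```python
import sqlite3

# Hypothetical one-table sketch: index the memory content alongside the
# Claude-generated expanded_keywords, as described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE mem_fts USING fts5(content, expanded_keywords)")
conn.execute(
    "INSERT INTO mem_fts VALUES (?, ?)",
    ("User prefers Svelte for frontends",
     "svelte frontend framework ui web sveltekit javascript component reactive"),
)

# Build the quoted-OR query the same way the SQLite backend does.
query = " OR ".join(f'"{w}"' for w in "dashboard framework ui".split())

# "framework" and "ui" match only via expanded_keywords -- the stored
# content itself contains neither word.
rows = conn.execute(
    "SELECT content FROM mem_fts WHERE mem_fts MATCH ? ORDER BY bm25(mem_fts)",
    (query,),
).fetchall()
print(rows[0][0])  # → User prefers Svelte for frontends
```

Without the expanded keywords, a recall for "framework" would miss this memory entirely, since exact-word FTS5 matching has no stemming or synonym support.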
### SQLite vs PostgreSQL Comparison

| Feature | SQLite (FTS5) | PostgreSQL (tsvector) |
|---------|---------------|----------------------|
| Tokenization | ICU/unicode61 tokenizer | English text search dictionary |
| Stemming | No (exact word matching) | Yes (Porter stemmer via `english` config) |
| Stop words | No | Yes (removes "the", "is", "at", etc.) |
| Ranking | BM25 (term frequency + inverse document frequency) | `ts_rank` (weighted field matching) |
| Field weighting | No (all FTS5 columns equal) | Yes (A/B/C/D weights per column) |
| Index type | B-tree on FTS content | GIN (Generalized Inverted Index) |
| Query syntax | Quoted OR terms (`"word1" OR "word2"`) | `plainto_tsquery` with implicit AND |
| Fallback | LIKE scan on failure | None needed (robust parser) |
| Maintenance | Triggers sync content ↔ FTS table | Generated column, auto-maintained |

## Operating Modes

| Mode | When | What happens |
|------|------|--------------|
| **SQLite-only** | No env vars set | Everything local. Zero config. Works offline. |
| **Hybrid** | `MEMORY_API_KEY` set | Local SQLite for reads + background sync to API. Writes queue if API is down. |
| **HTTP-only** | `MEMORY_API_KEY` + `MEMORY_SYNC_DISABLE=1` | Direct API calls, no local cache. Legacy mode. |
| **Full** | Any mode + Vault configured | Above + Vault for encrypting sensitive memories at rest. |

## Setup

### Option 1: Plugin Install (recommended)

```bash
claude plugins install github:ViktorBarzin/claude-memory-mcp
```

Works immediately with SQLite-only mode.
To enable multi-machine sync:

```bash
export MEMORY_API_URL="https://your-server.example.com"
export MEMORY_API_KEY="your-api-key"
```

### Option 2: Manual MCP Config

Add to `~/.claude.json` under `mcpServers`:

```json
{
  "mcpServers": {
    "claude_memory": {
      "type": "stdio",
      "command": "python3",
      "args": ["/path/to/claude-memory-mcp/src/claude_memory/mcp_server.py"],
      "env": {
        "PYTHONPATH": "/path/to/claude-memory-mcp/src",
        "MEMORY_API_URL": "https://your-server.example.com",
        "MEMORY_API_KEY": "your-api-key"
      }
    }
  }
}
```

Omit `MEMORY_API_URL` and `MEMORY_API_KEY` for SQLite-only mode.

### Verify

```
You: /remember "I prefer dark mode in all applications"
You: /recall "UI preferences"
```

## Running the API Server

Only needed for multi-machine sync. Skip this if you're using SQLite-only mode.

### Docker Compose

```bash
cd docker
docker compose up -d
```

Starts the API server + PostgreSQL. Optionally add Vault:

```bash
docker compose --profile vault up -d
```

### Manual

```bash
pip install claude-memory-mcp[api]
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
export API_KEY="your-secret-key"
alembic upgrade head
uvicorn claude_memory.api.app:app --host 0.0.0.0 --port 8000
```

### Kubernetes

Designed for high availability: 2 replicas with pod anti-affinity, a PodDisruptionBudget, and startup probes.
```bash
helm install claude-memory deploy/helm/claude-memory \
  --set env.DATABASE_URL="postgresql://user:pass@host:5432/db" \
  --set env.API_KEY="your-key" \
  --set ingress.host="claude-memory.yourdomain.com"
```

Or raw manifests:

```bash
kubectl apply -f deploy/kubernetes/namespace.yaml
kubectl create secret generic claude-memory-secrets \
  -n claude-memory \
  --from-literal=database-url="postgresql://user:pass@host:5432/db" \
  --from-literal=api-key="your-key"
kubectl apply -f deploy/kubernetes/
```

## Multi-User Setup

For team deployments, use `API_KEYS` with a JSON mapping:

```bash
export API_KEYS='{"alice": "key-alice-xxx", "bob": "key-bob-yyy"}'
```

Each user gets isolated memory storage. Users cannot see each other's memories.

**Adding a user:**

1. Generate a key: `openssl rand -base64 32`
2. Add it to the `API_KEYS` JSON
3. Restart the API server
4. Share the key with the user

## Sensitive Memory & Secrets

Sensitive data is handled through a multi-layer system:

### Automatic Detection

`credential_detector.py` scans all incoming content for secrets using regex patterns with confidence scores (0.5–0.95):

| Confidence | Pattern Types |
|------------|--------------|
| 0.95 | Private keys (RSA/EC/DSA/OPENSSH), AWS access keys (`AKIA`/`ASIA`), GitHub tokens (`ghp_`/`gho_`/etc.) |
| 0.85–0.90 | Connection strings (postgres/mysql/mongodb/redis/amqp), Basic auth headers, Bearer tokens |
| 0.75–0.80 | `api_key=`, `password=`, `token=` assignments |
| 0.60–0.65 | Generic `secret=`/`credential=` patterns, hex keys |

Detected content is **redacted** in the main DB (e.g., `[REDACTED:private_key]`) and the original is stored in Vault or encrypted storage.

### Encryption at Rest

Three tiers, checked in order:

1. **HashiCorp Vault** (KV v2) — if `VAULT_ADDR` + `VAULT_TOKEN` are set. Secrets are stored at `claude-memory/{user_id}/mem-{id}`. Supports Kubernetes Service Account auto-authentication.
2. **AES-256-GCM** — if `MEMORY_ENCRYPTION_KEY` is set. The key can be hex-encoded 32 bytes or a passphrase (derived via SHA-256). Stored as `nonce (12 bytes) + ciphertext + GCM tag (16 bytes)` in the `encrypted_content` column.
3. **Plaintext fallback** — content stored as-is (still redacted in the main column if detected as sensitive).

Retrieve the original via `secret_get` with the memory's ID.

## Plugin Hooks

| Hook | Event | What it does |
|------|-------|-------------|
| `pre-compact-backup.sh` | PreCompact | Saves top 20 memories to a marker file before context compaction |
| `post-compact-recovery.sh` | UserPromptSubmit | Re-injects saved memories after compaction (one-time, then deletes the marker) |
| `user-prompt-recall.py` | UserPromptSubmit | Tells Claude to check `memory_recall` before responding to each message |
| `auto-learn.py` | Stop (async) | Runs haiku-as-judge on the last exchange to extract durable facts worth storing |
| `auto-allow-memory-tools.py` | PermissionRequest | Auto-approves `memory_store`, `memory_recall`, `memory_list`, `memory_delete`, `secret_get` |

### Auto-Learn Details

The auto-learn hook runs asynchronously after each Claude response. It operates in two modes:

- **Single-turn** (every turn) — analyzes just the last user/assistant exchange. Cheap and fast. Extracts corrections, preferences, decisions, and facts.
- **Deep multi-turn** (every 5 turns) — analyzes the last 5 exchanges as a window. Also extracts debugging insights, workarounds, and operational knowledge.

**Judge fallback chain:** Claude CLI (haiku model) → local Ollama (qwen2.5:3b / llama3.2:3b / gemma2:2b / phi3:mini) → heuristic pattern matching (keyword-based extraction from user messages).

Extracted events are deduplicated via SHA-256 content hashing. Each event is stored to the MCP memory database.
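The dedup step can be sketched as follows. The `event_key` helper and its whitespace/case normalization are illustrative assumptions; only the SHA-256 content hashing itself comes from the description above:

```python
import hashlib

def event_key(text: str) -> str:
    # Normalize whitespace and case, then hash, so repeated extractions
    # of the same fact produce the same key. (Normalization details are
    # assumptions; the doc only specifies SHA-256 content hashing.)
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

events = [
    "User prefers Svelte for frontends",
    "user prefers  Svelte for frontends",  # duplicate after normalization
    "User deploys with Kubernetes",
]

seen: set[str] = set()
unique: list[str] = []
for event in events:
    key = event_key(event)
    if key not in seen:  # store each durable fact only once
        seen.add(key)
        unique.append(event)

print(len(unique))  # → 2
```

Hashing the normalized content rather than the raw string keeps trivially re-worded extractions from piling up as near-duplicate memories.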
### Debug

```bash
# See what hooks are doing
export DEBUG_CLAUDE_MEMORY_HOOKS=1

# Disable auto-approve to see permission prompts
export DISABLE_CLAUDE_MEMORY_AUTO_APPROVE=1
```

## Background Sync Engine

In hybrid mode, a `SyncEngine` runs in a daemon thread with its own SQLite connection (thread-safe via `threading.Lock`).

**Push cycle:** Reads the `pending_ops` queue (a SQLite table) and sends each operation to the API. On success, it removes the operation from the queue and updates the local `server_id` mapping. On failure, the operation stays queued for the next cycle.

**Pull cycle:** Calls `GET /api/memories/sync?since=` for incremental changes. The server wins on conflicts — remote updates overwrite local rows (matched by `server_id`). Soft-deleted rows on the server are hard-deleted locally.

**Write path:** Every `memory_store` first writes to local SQLite (instant), then attempts an immediate sync to the API. If the immediate sync fails, the write is enqueued in `pending_ops` for the next background cycle.

**Sync interval:** Configurable via `MEMORY_SYNC_INTERVAL` (default: 60 seconds).

## API Reference

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check (unauthenticated) |
| `/api/auth-check` | GET | Validate API key without side effects |
| `/api/memories` | POST | Store a memory |
| `/api/memories` | GET | List memories (`?category=facts&limit=20`) |
| `/api/memories/recall` | POST | Search memories by context and expanded query |
| `/api/memories/{id}` | DELETE | Soft-delete a memory |
| `/api/memories/sync` | GET | Incremental sync (`?since=ISO-timestamp`) |
| `/api/memories/{id}/secret` | POST | Get decrypted sensitive memory content |
| `/api/memories/migrate-secrets` | POST | Re-encrypt existing secrets with current Vault config |
| `/api/memories/import` | POST | Bulk import memories (JSON array) |

All endpoints except `/health` require an `Authorization: Bearer` header with the API key.
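The retry cap from the push cycle above can be sketched with a stdlib-only toy. The `pending_ops` table and `retry_count` column names come from the migrations; the exact column set and queue logic here are assumptions:

```python
import sqlite3

MAX_RETRIES = 5  # retry cap described in the Sync Resilience section

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pending_ops ("
    "id INTEGER PRIMARY KEY, payload TEXT, retry_count INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO pending_ops (payload) VALUES (?)", ('{"op": "store"}',))

def push_cycle(send) -> None:
    # One push cycle: ops at the retry cap are skipped permanently; the
    # rest are sent, deleted on success, and bumped on failure.
    rows = conn.execute(
        "SELECT id, payload FROM pending_ops WHERE retry_count < ?",
        (MAX_RETRIES,),
    ).fetchall()
    for op_id, payload in rows:
        try:
            send(payload)
            conn.execute("DELETE FROM pending_ops WHERE id = ?", (op_id,))
        except OSError:
            conn.execute(
                "UPDATE pending_ops SET retry_count = retry_count + 1 WHERE id = ?",
                (op_id,),
            )

def failing_send(_payload):  # simulate the API being down
    raise OSError("API unreachable")

for _ in range(10):  # many cycles, but only 5 real attempts are made
    push_cycle(failing_send)

print(conn.execute("SELECT retry_count FROM pending_ops").fetchone()[0])  # → 5
```

Capping retries in the queue itself, rather than in memory, means the "permanently skipped" state survives MCP server restarts.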
## Environment Variables

### MCP Server (client-side)

| Variable | Description | Default |
|----------|-------------|---------|
| `MEMORY_API_URL` | API server URL | `http://localhost:8080` |
| `MEMORY_API_KEY` | API key (enables hybrid mode) | None |
| `MEMORY_HOME` | Local storage directory | `~/.claude/claude-memory` |
| `MEMORY_DB` | SQLite database path override | `$MEMORY_HOME/memory/memory.db` |
| `MEMORY_SYNC_INTERVAL` | Background sync interval in seconds | `60` |
| `MEMORY_SYNC_DISABLE` | Set to `1` for HTTP-only mode | None |

Aliases `CLAUDE_MEMORY_API_URL` and `CLAUDE_MEMORY_API_KEY` are also supported.

### API Server

| Variable | Description | Default |
|----------|-------------|---------|
| `DATABASE_URL` | PostgreSQL connection string | Required |
| `API_KEY` | Single-user API key | None |
| `API_KEYS` | Multi-user JSON map `{"user": "key"}` | None |
| `VAULT_ADDR` | Vault server address | None |
| `VAULT_TOKEN` | Vault authentication token | None |
| `VAULT_ROLE` | Kubernetes auth role name | `claude-memory` |
| `MEMORY_ENCRYPTION_KEY` | AES-256 key for non-Vault encryption | None |

## Database Migrations

### PostgreSQL (API Server)

Migrations run automatically on API server startup. To run manually:

```bash
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
alembic upgrade head
```

Three migrations:

1. `001` — Creates `memories` table with generated `tsvector` column and GIN index
2. `002` — Adds `user_id` (multi-user), `is_sensitive`, `vault_path`, `encrypted_content`
3. `003` — Adds `deleted_at` (soft delete) and an `updated_at` index for incremental sync

All migrations are idempotent (they check column/table existence before altering).

### SQLite (MCP Client)

The SQLite schema is versioned via `PRAGMA user_version` and migrated automatically on startup. Current version: **2**.
| Version | Migration |
|---------|-----------|
| 1 | Add `server_id` column to `memories` table |
| 2 | Add `retry_count` column to `pending_ops` table |

## Development

### Prerequisites

- Python 3.11+
- For API server tests: `httpx`, `pytest-asyncio`

### Quick Start

```bash
git clone https://github.com/ViktorBarzin/claude-memory-mcp.git
cd claude-memory-mcp
python -m venv .venv
source .venv/bin/activate
pip install -e ".[api,dev]"
```

### Running Tests

```bash
# All tests
pytest tests/ -v

# Individual test suites
pytest tests/test_sync.py -v        # SyncEngine (client-side sync resilience)
pytest tests/test_mcp_server.py -v  # MCP server (SQLite, tools, protocol)
pytest tests/test_api.py -v         # API server (FastAPI endpoints)

# Linting
ruff check src/ tests/
mypy src/claude_memory/ --strict
```

The MCP server itself (`mcp_server.py` and `sync.py`) uses **stdlib only** — no pip install needed on the client side. The `[api]` extra adds FastAPI, asyncpg, uvicorn, etc. for the server.

### Test Structure

| File | Tests | What it covers |
|------|-------|---------------|
| `test_sync.py` | SyncEngine unit tests | Push/pull, auth failure handling, retry caps, full resync, orphan reconciliation, decoupled push/pull, diagnostics |
| `test_mcp_server.py` | MCP server unit tests | SQLite CRUD, FTS search, tool dispatch, MCP protocol, memory_count, schema migration |
| `test_api.py` | API server integration tests | All REST endpoints, auth, user isolation, soft delete, sync, secrets, import |
| `test_auth.py` | Auth module tests | Single/multi-user auth, key mapping |
| `test_credential_detector.py` | Credential detection | Pattern matching for secrets |
| `test_crypto.py` | Encryption tests | AES-256 encrypt/decrypt |
| `test_vault_client.py` | Vault integration | Secret storage/retrieval |

### Project Structure

```
claude-memory-mcp/
├── src/claude_memory/
│   ├── mcp_server.py            # MCP server entry point (stdio NDJSON)
│   ├── sync.py                  # Background sync engine (SQLite ↔ API)
│   ├── credential_detector.py   # Sensitive content detection
│   └── api/
│       ├── app.py               # FastAPI application
│       ├── auth.py              # API key authentication
│       ├── database.py          # asyncpg connection pool
│       ├── models.py            # Pydantic models
│       └── vault_service.py     # HashiCorp Vault integration
├── tests/                       # pytest test suite
├── hooks/                       # Claude Code hooks (auto-recall, auto-learn, etc.)
├── docker/                      # Docker Compose for API server + PostgreSQL
├── deploy/                      # Kubernetes manifests and Helm chart
└── pyproject.toml               # Package config (hatchling)
```

### Key Design Decisions

- **stdlib-only MCP server**: The MCP server (`mcp_server.py`, `sync.py`) uses only the Python stdlib — no pip install required on the client side. This ensures it works in any Claude Code environment without dependency management.
- **NDJSON transport**: Claude Code uses NDJSON (one JSON object per line) for stdio MCP, not Content-Length framing.
- **Non-blocking startup**: MCP server startup must complete in ~15s or Claude Code times out. All network calls are deferred to background threads.
- **Suppress stderr**: Any stderr output during MCP startup causes Claude Code to reject the server.

## License

Apache License 2.0. See [LICENSE](LICENSE) for details.