claude-memory-mcp/README.md
Viktor Barzin a18b94d310 docs: expand README with search algorithm internals and secrets architecture
Add detailed Search Algorithm section covering FTS5/BM25 (SQLite) and
tsvector/ts_rank (PostgreSQL) backends, query construction, semantic
expansion at store time, and a comparison table. Also add Sensitive
Memory & Secrets section, Auto-Learn details, and Background Sync Engine
documentation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-16 13:24:02 +00:00

# Claude Memory MCP
Give Claude persistent memory that survives across sessions, context compactions, and machines.
```bash
claude plugins install github:ViktorBarzin/claude-memory-mcp
```
That's it. Claude now remembers things.
## What It Does
```
You: "remember I prefer Svelte for all new frontend apps"
Claude: Stored. ✓
── 3 weeks later, new session, different machine ──
You: "build me a dashboard for this API"
Claude: I'll set up a SvelteKit project since that's your preference...
```
**No configuration needed** for single-machine use. Memories are stored in a local SQLite database with full-text search. For multi-machine sync, point it at an API server (details below).
## Features
### Slash Commands
- **`/remember <fact>`** — store a fact, preference, decision, or person detail
- **`/recall <query>`** — search memories by topic
### Automatic Behaviors (via hooks)
- **Auto-recall** — before responding, Claude checks stored memories for relevant context (preferences, past corrections, decisions). Completely invisible to the user.
- **Auto-learn** — after each response, a background process analyzes the conversation for corrections, preferences, and decisions worth remembering. Uses haiku-as-judge for conservative extraction.
- **Compaction survival** — when Claude's context window compacts, key memories are saved to a marker file and re-injected on the next prompt. No knowledge is lost.
- **Auto-approve** — memory tool calls are approved automatically, no permission prompts.
### MCP Tools
Claude has direct access to these tools during conversation:
| Tool | Description |
|------|-------------|
| `memory_store` | Store a fact with category, tags, importance, and semantic search keywords |
| `memory_recall` | Search memories using full-text search with expanded query terms |
| `memory_list` | List recent memories, optionally filtered by category |
| `memory_delete` | Delete a memory by ID |
| `secret_get` | Retrieve the decrypted content of a sensitive memory |
### Memory Categories
Memories are organized into: `facts`, `preferences`, `projects`, `people`, `decisions`
### Sensitive Memory Support
Mark memories as sensitive with `force_sensitive: true`. When Vault is configured, sensitive content is encrypted at rest and only decryptable via `secret_get`.
## Architecture
```
Claude Code Session
┌──────────────────────────────────────────────────────┐
│  Hooks                       MCP Server              │
│  ┌──────────────────┐        ┌────────────────────┐  │
│  │ auto-recall      │───────▶│ memory_store       │  │
│  │ auto-learn       │        │ memory_recall      │  │
│  │ compaction       │        │ memory_list        │  │
│  │ auto-approve     │        │ memory_delete      │  │
│  └──────────────────┘        │ secret_get         │  │
│                              └─────────┬──────────┘  │
│                                        │             │
│              ┌─────────────────────────┼──────────┐  │
│              │  Local SQLite           │          │  │
│              │  (cache + FTS5)    SyncEngine      │  │
│              │ ◄──── always ────(background, 60s) │  │
│              │        used     push queued writes │  │
│              │                 pull server changes│  │
│              │  pending_ops ───────────┘          │  │
│              │  (offline write queue)             │  │
│              └────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
            optional, for multi-machine sync
                ┌──────────▼───────────┐
                │ API Server (FastAPI) │
                │ Docker / Kubernetes  │
                └──────────┬───────────┘
                ┌──────────▼───────────┐
                │      PostgreSQL      │
                │   (authoritative)    │
                └──────────────────────┘
```
## Search Algorithm
Memory recall uses two different full-text search backends depending on the operating mode. Both follow the same query building pattern: the `context` and `expanded_query` fields from the user's recall request are concatenated into a single search string, then processed into a backend-specific query.
### Query Construction
When `memory_recall` is called, the tool receives:
- **`context`** — the topic or question being asked about
- **`expanded_query`** — REQUIRED, minimum 5 space-separated semantically related terms generated by Claude
These are concatenated: `all_terms = f"{context} {expanded_query}"`, then split into individual words for the search backend.
### SQLite: FTS5 with BM25
The local SQLite database uses an [FTS5 virtual table](https://www.sqlite.org/fts5.html) for full-text search with the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking algorithm.
**Schema:**
```sql
CREATE VIRTUAL TABLE memories_fts USING fts5(
    content, category, tags, expanded_keywords,
    content='memories', content_rowid='id'
);
```
This is an [external content table](https://www.sqlite.org/fts5.html#contentless_and_content_external_tables) backed by the `memories` table: FTS5 stores only the index and reads column values from `memories` on demand rather than duplicating them. Triggers on INSERT, UPDATE, and DELETE keep the FTS index synchronized:
```sql
-- Insert trigger
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;

-- Delete trigger (FTS5 special 'delete' command)
CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, category, tags, expanded_keywords)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
END;

-- Update trigger (delete old + insert new)
CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, ...)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;
```
**Query building:** Each word is quoted and joined with OR:
```python
words = all_terms.split()
fts_query = " OR ".join(f'"{w}"' for w in words if w)
# Example: "kubernetes" OR "deployment" OR "pod" OR "scaling" OR "replicas"
```
**Ranking:** Two sort modes:
- `sort_by="relevance"`: `ORDER BY bm25(memories_fts), importance DESC`. Uses FTS5's built-in BM25 relevance score (lower values rank first), with importance as tiebreaker.
- `sort_by="importance"` (default): `ORDER BY importance DESC, created_at DESC`. Uses the user-assigned importance score, with recency as tiebreaker.
**Fallback:** If the FTS5 MATCH query fails (e.g. syntax error from special characters), the search falls back to a `LIKE %query%` scan against `content` and `tags` columns.
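The query building, sort modes, and fallback described above can be sketched together as follows. This is a minimal illustration, not the project's actual code: the function names `build_fts_query` and `recall`, the join shape, and the choice to build the `LIKE` pattern from the raw `context` string are all assumptions.

```python
import sqlite3


def build_fts_query(context: str, expanded_query: str) -> str:
    """Quote each word and join with OR, mirroring the snippet above."""
    all_terms = f"{context} {expanded_query}"
    return " OR ".join(f'"{w}"' for w in all_terms.split() if w)


def recall(conn: sqlite3.Connection, context: str, expanded_query: str,
           sort_by: str = "importance", limit: int = 10) -> list:
    """Try an FTS5 MATCH first; fall back to a LIKE scan if SQLite rejects it."""
    if sort_by == "relevance":
        order = "bm25(memories_fts), m.importance DESC"
    else:
        order = "m.importance DESC, m.created_at DESC"
    try:
        return conn.execute(
            "SELECT m.id, m.content FROM memories_fts "
            "JOIN memories m ON m.id = memories_fts.rowid "
            f"WHERE memories_fts MATCH ? ORDER BY {order} LIMIT ?",
            (build_fts_query(context, expanded_query), limit),
        ).fetchall()
    except sqlite3.OperationalError:
        # Assumption: the fallback pattern is built from the raw context string.
        pattern = f"%{context}%"
        return conn.execute(
            "SELECT id, content FROM memories "
            "WHERE content LIKE ? OR tags LIKE ? "
            "ORDER BY importance DESC, created_at DESC LIMIT ?",
            (pattern, pattern, limit),
        ).fetchall()
```

The `order` string is chosen from two fixed literals rather than interpolated from user input, so the only values bound at runtime go through `?` placeholders.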
### PostgreSQL: tsvector with Weighted Ranking
The API server uses PostgreSQL's native [full-text search](https://www.postgresql.org/docs/current/textsearch.html) with a generated `tsvector` column and weighted ranking.
**Schema** (from migration `001`):
```sql
CREATE TABLE memories (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category VARCHAR(50) DEFAULT 'facts',
    tags TEXT DEFAULT '',
    expanded_keywords TEXT DEFAULT '',
    importance REAL DEFAULT 0.5,
    -- ...
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(content, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(expanded_keywords, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(tags, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'D')
    ) STORED
);
CREATE INDEX idx_memories_search ON memories USING GIN(search_vector);
```
The `search_vector` is a [generated column](https://www.postgresql.org/docs/current/ddl-generated-columns.html) that PostgreSQL maintains automatically on every INSERT/UPDATE — no triggers needed. It combines four fields with different weights:
| Weight | Field | Purpose |
|--------|-------|---------|
| **A** (highest) | `content` | The actual memory text — most important for matching |
| **B** | `expanded_keywords` | Semantic search terms generated by Claude at store time |
| **C** | `tags` | User-provided comma-separated tags |
| **D** (lowest) | `category` | Category name (facts, preferences, etc.) |
A [GIN index](https://www.postgresql.org/docs/current/gin.html) on `search_vector` enables fast lookup without sequential scans.
**Query execution:**
```sql
SELECT id, content, category, tags, importance, is_sensitive,
       ts_rank(search_vector, query) AS rank,
       created_at, updated_at
FROM memories, plainto_tsquery('english', $2) query
WHERE user_id = $1
  AND deleted_at IS NULL
  AND (search_vector @@ query OR $2 = '')
ORDER BY importance DESC, ts_rank(search_vector, query) DESC
LIMIT $3
```
[`plainto_tsquery`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES) processes the concatenated query text through PostgreSQL's English text search dictionary — it handles stemming (e.g. "running" → "run"), stop word removal, and normalization. The `@@` operator matches the resulting `tsquery` against the stored `tsvector`.
[`ts_rank`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING) scores each match considering the weight of the matched field — a match in `content` (weight A) scores higher than a match in `tags` (weight C).
**Sort modes:**
- `sort_by="importance"` (default): `ORDER BY importance DESC, ts_rank(search_vector, query) DESC`
- `sort_by="relevance"`: `ORDER BY ts_rank(search_vector, query) DESC`
- `sort_by="recency"`: `ORDER BY created_at DESC`
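One way to map these sort modes onto SQL is a fixed lookup table, so that the `sort_by` value never reaches the query text directly. The sketch below is an assumption about how this could be done; the names `ORDER_BY` and `order_by_clause` are hypothetical and not taken from the project.

```python
# Whitelist of sort modes; the clause text matches the list above.
ORDER_BY = {
    "importance": "importance DESC, ts_rank(search_vector, query) DESC",
    "relevance": "ts_rank(search_vector, query) DESC",
    "recency": "created_at DESC",
}


def order_by_clause(sort_by: str = "importance") -> str:
    """Return the ORDER BY body for a sort mode, rejecting unknown values."""
    try:
        return ORDER_BY[sort_by]
    except KeyError:
        raise ValueError(f"unsupported sort_by: {sort_by!r}") from None
```

Because `ORDER BY` expressions cannot be bound as query parameters, a whitelist like this is a common way to keep the clause safe from injection.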
### Semantic Expansion at Store Time
When Claude stores a memory via `memory_store`, it generates an `expanded_keywords` field — at least 5 space-separated semantically related terms. For example, storing "User prefers Svelte for frontends" might produce:
```
expanded_keywords: "svelte frontend framework ui web sveltekit javascript component reactive"
```
These keywords are indexed alongside the memory content. When a future `memory_recall` searches for "what framework should I use for the dashboard?", the expanded keywords create additional matching surface that pure content matching would miss. This is a form of query expansion performed at write time by the LLM.
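A toy illustration of that extra matching surface, using plain word overlap as a crude stand-in for full-text matching (the `tokens` helper is hypothetical and far simpler than either backend's tokenizer):

```python
import re


def tokens(text: str) -> set:
    """Lowercase and split on non-alphanumerics; a stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


memory = {
    "content": "User prefers Svelte for frontends",
    "expanded_keywords": "svelte frontend framework ui web sveltekit "
                         "javascript component reactive",
}
query = "what framework should I use for the dashboard?"

# Words the query shares with the memory text alone.
content_only = tokens(query) & tokens(memory["content"])
# Words it shares once the expanded keywords are indexed too.
with_keywords = tokens(query) & tokens(
    memory["content"] + " " + memory["expanded_keywords"]
)
# The topical match that only exists because of the keywords.
extra = with_keywords - content_only
```

Here the content alone overlaps with the query only on the filler word "for", while the expanded keywords contribute the topical match "framework".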
### SQLite vs PostgreSQL Comparison
| Feature | SQLite (FTS5) | PostgreSQL (tsvector) |
|---------|---------------|----------------------|
| Tokenization | `unicode61` tokenizer (FTS5 default) | English text search dictionary |
| Stemming | No (exact word matching) | Yes (Porter stemmer via `english` config) |
| Stop words | No | Yes (removes "the", "is", "at", etc.) |
| Ranking | BM25 (term frequency + inverse document frequency) | `ts_rank` (weighted field matching) |
| Field weighting | No (all FTS5 columns equal) | Yes (A/B/C/D weights per column) |
| Index type | Inverted index (FTS5 shadow tables) | GIN (Generalized Inverted Index) |
| Query syntax | Quoted OR terms (`"word1" OR "word2"`) | `plainto_tsquery` with implicit AND |
| Fallback | LIKE scan on failure | None needed (robust parser) |
| Maintenance | Triggers sync content ↔ FTS table | Generated column, auto-maintained |
## Operating Modes
| Mode | When | What happens |
|------|------|--------------|
| **SQLite-only** | No env vars set | Everything local. Zero config. Works offline. |
| **Hybrid** | `MEMORY_API_KEY` set | Local SQLite for reads + background sync to API. Writes queue if API is down. |
| **HTTP-only** | `MEMORY_API_KEY` + `MEMORY_SYNC_DISABLE=1` | Direct API calls, no local cache. Legacy mode. |
| **Full** | Any mode + Vault configured | Above + Vault for encrypting sensitive memories at rest. |
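The first three rows of the table can be read as a small decision function over environment variables. This is a sketch of that reading only; the function name `operating_mode` is hypothetical, the `CLAUDE_MEMORY_*` aliases are ignored, and Vault ("Full") is left out because it layers on top of any mode.

```python
import os
from typing import Mapping


def operating_mode(env: Mapping[str, str] = os.environ) -> str:
    """Derive the operating mode from the variables named in the table above."""
    if not env.get("MEMORY_API_KEY"):
        return "sqlite-only"   # no API key: everything stays local
    if env.get("MEMORY_SYNC_DISABLE") == "1":
        return "http-only"     # API key set, local cache and sync disabled
    return "hybrid"            # API key set: local reads plus background sync
```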
## Setup
### Option 1: Plugin Install (recommended)
```bash
claude plugins install github:ViktorBarzin/claude-memory-mcp
```
Works immediately with SQLite-only mode. To enable multi-machine sync:
```bash
export MEMORY_API_URL="https://your-server.example.com"
export MEMORY_API_KEY="your-api-key"
```
### Option 2: Manual MCP Config
Add to `~/.claude/settings.json`:
```json
{
  "mcpServers": {
    "claude_memory": {
      "type": "stdio",
      "command": "python3",
      "args": ["-m", "claude_memory.mcp_server"],
      "env": {
        "MEMORY_API_URL": "https://your-server.example.com",
        "MEMORY_API_KEY": "your-api-key"
      }
    }
  }
}
```
Omit the `env` block for SQLite-only mode. Requires `pip install claude-memory-mcp`.
### Verify
```
You: /remember "I prefer dark mode in all applications"
You: /recall "UI preferences"
```
## Running the API Server
Only needed for multi-machine sync. Skip this if you're using SQLite-only mode.
### Docker Compose
```bash
cd docker
docker compose up -d
```
Starts the API server + PostgreSQL. Optionally add Vault:
```bash
docker compose --profile vault up -d
```
### Manual
```bash
pip install claude-memory-mcp[api]
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
export API_KEY="your-secret-key"
alembic upgrade head
uvicorn claude_memory.api.app:app --host 0.0.0.0 --port 8000
```
### Kubernetes
Designed for high availability: 2 replicas with pod anti-affinity, PodDisruptionBudget, and startup probes.
```bash
helm install claude-memory deploy/helm/claude-memory \
  --set env.DATABASE_URL="postgresql://user:pass@host:5432/db" \
  --set env.API_KEY="your-key" \
  --set ingress.host="claude-memory.yourdomain.com"
```
Or raw manifests:
```bash
kubectl apply -f deploy/kubernetes/namespace.yaml
kubectl create secret generic claude-memory-secrets \
  -n claude-memory \
  --from-literal=database-url="postgresql://user:pass@host:5432/db" \
  --from-literal=api-key="your-key"
kubectl apply -f deploy/kubernetes/
```
## Multi-User Setup
For team deployments, use `API_KEYS` with a JSON mapping:
```bash
export API_KEYS='{"alice": "key-alice-xxx", "bob": "key-bob-yyy"}'
```
Each user gets isolated memory storage. Users cannot see each other's memories.
**Adding a user:**
1. Generate a key: `openssl rand -base64 32`
2. Add to `API_KEYS` JSON
3. Restart the API server
4. Share the key with the user
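For illustration, the `API_KEYS` mapping could be parsed and matched against a bearer token roughly as below. The helper names are hypothetical and the server's real lookup may differ; `secrets.token_urlsafe` is shown as a Python counterpart to the `openssl` command in step 1.

```python
import hmac
import json
import secrets
from typing import Optional


def load_api_keys(raw: str) -> dict:
    """Parse the API_KEYS JSON object into a {user: key} dict."""
    keys = json.loads(raw)
    if not isinstance(keys, dict):
        raise ValueError("API_KEYS must be a JSON object")
    return {str(user): str(key) for user, key in keys.items()}


def user_for_token(keys: dict, token: str) -> Optional[str]:
    """Return the user whose key equals the token, or None if there is no match."""
    for user, key in keys.items():
        # Constant-time comparison avoids leaking key prefixes through timing.
        if hmac.compare_digest(key.encode(), token.encode()):
            return user
    return None


def new_key() -> str:
    """Generate a random key from 32 bytes of entropy."""
    return secrets.token_urlsafe(32)
```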
## Sensitive Memory & Secrets
Sensitive data is handled through a multi-layer system:
### Automatic Detection
`credential_detector.py` scans all incoming content for secrets using regex patterns with confidence scores (0.5–0.95):
| Confidence | Pattern Types |
|------------|--------------|
| 0.95 | Private keys (RSA/EC/DSA/OPENSSH), AWS access keys (`AKIA`/`ASIA`), GitHub tokens (`ghp_`/`gho_`/etc.) |
| 0.85–0.90 | Connection strings (postgres/mysql/mongodb/redis/amqp), Basic auth headers, Bearer tokens |
| 0.75–0.80 | `api_key=`, `password=`, `token=` assignments |
| 0.60–0.65 | Generic `secret=`/`credential=` patterns, hex keys |
Detected content is **redacted** in the main DB (e.g., `[REDACTED:private_key]`) and the original stored in Vault or encrypted storage.
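A minimal sketch of pattern-based detection and redaction in this style follows. The three regexes and the exact confidence assigned to each are illustrative assumptions, not the contents of `credential_detector.py`.

```python
import re

# (label, confidence, pattern); a small hypothetical subset of the table above.
_KEY_KINDS = r"(?:RSA |EC |DSA |OPENSSH )?"
PATTERNS = [
    ("private_key", 0.95, re.compile(
        rf"-----BEGIN {_KEY_KINDS}PRIVATE KEY-----.*?-----END {_KEY_KINDS}PRIVATE KEY-----",
        re.DOTALL)),
    ("aws_access_key", 0.95, re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")),
    ("password_assignment", 0.75, re.compile(r"\bpassword\s*=\s*\S+", re.IGNORECASE)),
]


def detect(content: str) -> list:
    """Return (label, confidence) for every pattern that matches."""
    return [(name, conf) for name, conf, rx in PATTERNS if rx.search(content)]


def redact(content: str) -> str:
    """Replace each match with a [REDACTED:<label>] placeholder."""
    for name, _conf, rx in PATTERNS:
        content = rx.sub(f"[REDACTED:{name}]", content)
    return content
```

In this sketch the caller would store the original text in Vault or encrypted storage before writing the redacted version to the main column.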
### Encryption at Rest
Three tiers, checked in order:
1. **HashiCorp Vault** (KV v2) — if `VAULT_ADDR` + `VAULT_TOKEN` set. Secrets stored at `claude-memory/{user_id}/mem-{id}`. Supports Kubernetes Service Account auto-authentication.
2. **AES-256-GCM** — if `MEMORY_ENCRYPTION_KEY` set. Key can be hex-encoded 32 bytes or a passphrase (derived via SHA-256). Stored as `nonce (12 bytes) + ciphertext + GCM tag (16 bytes)` in the `encrypted_content` column.
3. **Plaintext fallback** — content stored as-is (still redacted in main column if detected as sensitive).
Retrieve the original via `secret_get(id=<memory_id>)`.
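The key handling and blob layout of the AES-256-GCM tier can be sketched with the standard library alone. How the project tells a hex key from a passphrase is an assumption here, as are the names `derive_key` and `split_blob`; the encryption step itself is omitted because it needs a third-party AES implementation.

```python
import hashlib

NONCE_LEN = 12  # bytes, per the layout above
TAG_LEN = 16    # bytes, GCM authentication tag


def derive_key(value: str) -> bytes:
    """Return a 32-byte key: decode 64 hex characters, else SHA-256 the passphrase."""
    try:
        key = bytes.fromhex(value)
        if len(key) == 32:
            return key
    except ValueError:
        pass  # not valid hex, so treat the value as a passphrase
    return hashlib.sha256(value.encode("utf-8")).digest()


def split_blob(blob: bytes) -> tuple:
    """Split nonce + ciphertext + tag into its three parts."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short to contain a nonce and tag")
    return blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
```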
## Plugin Hooks
| Hook | Event | What it does |
|------|-------|-------------|
| `pre-compact-backup.sh` | PreCompact | Saves top 20 memories to a marker file before context compaction |
| `post-compact-recovery.sh` | UserPromptSubmit | Re-injects saved memories after compaction (one-time, then deletes marker) |
| `user-prompt-recall.py` | UserPromptSubmit | Tells Claude to check `memory_recall` before responding to each message |
| `auto-learn.py` | Stop (async) | Runs haiku-as-judge on the last exchange to extract durable facts worth storing |
| `auto-allow-memory-tools.py` | PermissionRequest | Auto-approves `memory_store`, `memory_recall`, `memory_list`, `memory_delete`, `secret_get` |
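The two compaction hooks are shell scripts, but the marker-file idea behind them can be illustrated in Python. Everything below is a hypothetical sketch: the JSON format, the function names, and the assumption that "top 20" means highest importance.

```python
import json
from pathlib import Path
from typing import Optional


def save_marker(marker: Path, memories: list, top_n: int = 20) -> None:
    """Before compaction: write the top-N memories (by importance) to the marker."""
    ranked = sorted(memories, key=lambda m: m.get("importance", 0.0), reverse=True)
    marker.write_text(json.dumps(ranked[:top_n]), encoding="utf-8")


def consume_marker(marker: Path) -> Optional[list]:
    """On the next prompt: return the saved memories once, then delete the marker."""
    if not marker.exists():
        return None
    saved = json.loads(marker.read_text(encoding="utf-8"))
    marker.unlink()  # one-time re-injection
    return saved
```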
### Auto-Learn Details
The auto-learn hook runs asynchronously after each Claude response. It operates in two modes:
- **Single-turn** (every turn) — analyzes just the last user/assistant exchange. Cheap and fast. Extracts corrections, preferences, decisions, and facts.
- **Deep multi-turn** (every 5 turns) — analyzes the last 5 exchanges as a window. Also extracts debugging insights, workarounds, and operational knowledge.
**Judge fallback chain:** Claude CLI (haiku model) → local Ollama (qwen2.5:3b/llama3.2:3b/gemma2:2b/phi3:mini) → heuristic pattern matching (keyword-based extraction from user messages).
Extracted events are deduplicated via SHA-256 content hashing. Each event is stored to both the MCP memory database and a `~/.claude/projects/<project>/memory/auto-learned.md` markdown file.
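As a rough sketch of the deduplication and the two analysis modes: the whitespace and case normalization before hashing, the 1-based turn counter, and all function names below are assumptions made for illustration.

```python
import hashlib


def event_hash(content: str) -> str:
    """SHA-256 of the content after a simple normalization (an assumption)."""
    normalized = " ".join(content.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(events: list, seen: set) -> list:
    """Keep only events whose hash has not been seen; update `seen` in place."""
    fresh = []
    for event in events:
        digest = event_hash(event)
        if digest not in seen:
            seen.add(digest)
            fresh.append(event)
    return fresh


def modes_for_turn(turn: int) -> list:
    """Single-turn analysis every turn, plus a deep pass on every fifth turn."""
    return ["single", "deep"] if turn % 5 == 0 else ["single"]
```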
### Debug
```bash
# See what hooks are doing
export DEBUG_CLAUDE_MEMORY_HOOKS=1
# Disable auto-approve to see permission prompts
export DISABLE_CLAUDE_MEMORY_AUTO_APPROVE=1
```
## Background Sync Engine
In hybrid mode, a `SyncEngine` runs in a daemon thread with its own SQLite connection (thread-safe via `threading.Lock`).
**Push cycle:** Reads the `pending_ops` queue (SQLite table) and sends each operation to the API. On success, removes the operation from the queue and updates the local `server_id` mapping. On failure, the operation stays queued for the next cycle.
**Pull cycle:** Calls `GET /api/memories/sync?since=<last_sync_ts>` for incremental changes. Server wins on conflicts — remote updates overwrite local rows (matched by `server_id`). Soft-deleted rows on the server are hard-deleted locally.
**Write path:** Every `memory_store` first writes to local SQLite (instant), then attempts an immediate sync to the API. If the immediate sync fails, the write is enqueued in `pending_ops` for the next background cycle.
**Sync interval:** Configurable via `MEMORY_SYNC_INTERVAL` (default: 60 seconds).
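The push cycle can be sketched against an in-memory queue table. The `pending_ops` columns, the `send` callback, and the decision to skip a failed operation and continue with the rest are all assumptions; the real `SyncEngine` may order and retry differently.

```python
import json
import sqlite3
from typing import Callable


def init_queue(conn: sqlite3.Connection) -> None:
    """Create a hypothetical pending_ops table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_ops ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT NOT NULL, payload TEXT NOT NULL)"
    )


def enqueue(conn: sqlite3.Connection, op: str, payload: dict) -> None:
    """Queue a write that could not be synced immediately."""
    conn.execute("INSERT INTO pending_ops (op, payload) VALUES (?, ?)",
                 (op, json.dumps(payload)))
    conn.commit()


def push_cycle(conn: sqlite3.Connection, send: Callable[[str, dict], None]) -> int:
    """Send queued operations oldest first; return how many were pushed."""
    pushed = 0
    rows = conn.execute("SELECT id, op, payload FROM pending_ops ORDER BY id").fetchall()
    for op_id, op, payload in rows:
        try:
            send(op, json.loads(payload))
        except OSError:
            continue  # network failure: leave the row queued for the next cycle
        conn.execute("DELETE FROM pending_ops WHERE id = ?", (op_id,))
        conn.commit()
        pushed += 1
    return pushed
```

A background thread would call `push_cycle` once per `MEMORY_SYNC_INTERVAL`, followed by the pull step.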
## API Reference
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/api/memories` | POST | Store a memory |
| `/api/memories` | GET | List memories (`?category=facts&limit=20`) |
| `/api/memories/recall` | POST | Search memories by context and expanded query |
| `/api/memories/{id}` | DELETE | Soft-delete a memory |
| `/api/memories/sync` | GET | Incremental sync (`?since=ISO-timestamp`) |
| `/api/memories/{id}/secret` | POST | Get decrypted sensitive memory content |
| `/api/memories/migrate-secrets` | POST | Re-encrypt existing secrets with current Vault config |
| `/api/memories/import` | POST | Bulk import memories (JSON array) |
All endpoints except `/health` require `Authorization: Bearer <api-key>`.
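As an example of calling the API from a script, the sketch below builds a recall request with the standard library. The JSON body field names are assumed to mirror the `context` and `expanded_query` tool parameters; the request schema is not spelled out in this README, and `recall_request` is a hypothetical helper.

```python
import json
import urllib.request


def recall_request(base_url: str, api_key: str,
                   context: str, expanded_query: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/memories/recall."""
    body = json.dumps({"context": context, "expanded_query": expanded_query})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/memories/recall",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen(req, timeout=10)` sends it; the response format depends on the server.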
## Environment Variables
### MCP Server (client-side)
| Variable | Description | Default |
|----------|-------------|---------|
| `MEMORY_API_URL` | API server URL | `http://localhost:8080` |
| `MEMORY_API_KEY` | API key (enables hybrid mode) | None |
| `MEMORY_HOME` | Local storage directory | `~/.claude/claude-memory` |
| `MEMORY_DB` | SQLite database path override | `$MEMORY_HOME/memory/memory.db` |
| `MEMORY_SYNC_INTERVAL` | Background sync interval in seconds | `60` |
| `MEMORY_SYNC_DISABLE` | Set to `1` for HTTP-only mode | None |
Aliases `CLAUDE_MEMORY_API_URL` and `CLAUDE_MEMORY_API_KEY` are also supported.
### API Server
| Variable | Description | Default |
|----------|-------------|---------|
| `DATABASE_URL` | PostgreSQL connection string | Required |
| `API_KEY` | Single-user API key | None |
| `API_KEYS` | Multi-user JSON map `{"user": "key"}` | None |
| `VAULT_ADDR` | Vault server address | None |
| `VAULT_TOKEN` | Vault authentication token | None |
| `VAULT_ROLE` | Kubernetes auth role name | `claude-memory` |
| `MEMORY_ENCRYPTION_KEY` | AES-256 key for non-Vault encryption | None |
## Database Migrations
Migrations run automatically on API server startup. To run manually:
```bash
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
alembic upgrade head
```
Three migrations:
1. `001` — Creates `memories` table with generated `tsvector` column and GIN index
2. `002` — Adds `user_id` (multi-user), `is_sensitive`, `vault_path`, `encrypted_content`
3. `003` — Adds `deleted_at` (soft delete) and `updated_at` index for incremental sync
All migrations are idempotent (check column/table existence before altering).
## Development
```bash
git clone https://github.com/ViktorBarzin/claude-memory-mcp.git
cd claude-memory-mcp
python -m venv .venv
source .venv/bin/activate
pip install -e ".[api,dev]"
pytest tests/ -v
ruff check src/ tests/
mypy src/claude_memory/ --strict
```
The MCP server itself (`mcp_server.py` and `sync.py`) uses **stdlib only** — no pip install needed on the client side. The `[api]` extra adds FastAPI, asyncpg, uvicorn, etc. for the server.
## License
Apache License 2.0. See [LICENSE](LICENSE) for details.