
Claude Memory MCP

Give Claude persistent memory that survives across sessions, context compactions, and machines.

claude plugins install github:ViktorBarzin/claude-memory-mcp

That's it. Claude now remembers things.

What It Does

You:    "remember I prefer Svelte for all new frontend apps"
Claude: Stored. ✓

  ── 3 weeks later, new session, different machine ──

You:    "build me a dashboard for this API"
Claude: I'll set up a SvelteKit project since that's your preference...

No configuration needed for single-machine use. Memories are stored in a local SQLite database with full-text search. For multi-machine sync, point it at an API server (details below).

Features

Slash Commands

  • /remember <fact> — store a fact, preference, decision, or person detail
  • /recall <query> — search memories by topic

Automatic Behaviors (via hooks)

  • Auto-recall — before responding, Claude checks stored memories for relevant context (preferences, past corrections, decisions). Completely invisible to the user.
  • Auto-learn — after each response, a background process analyzes the conversation for corrections, preferences, and decisions worth remembering. Uses haiku-as-judge for conservative extraction.
  • Compaction survival — when Claude's context window compacts, key memories are saved to a marker file and re-injected on the next prompt. No knowledge is lost.
  • Auto-approve — memory tool calls are approved automatically, no permission prompts.

MCP Tools

Claude has direct access to these tools during conversation:

Tool Description
memory_store Store a fact with category, tags, importance, and semantic search keywords
memory_recall Search memories using full-text search with expanded query terms
memory_list List recent memories, optionally filtered by category
memory_delete Delete a memory by ID
secret_get Retrieve the decrypted content of a sensitive memory
memory_count Get memory counts by category and sync status diagnostics
memory_share Share a memory with another user (read or write access)
memory_unshare Revoke sharing of a memory from a user
memory_share_tag Share all memories with a tag (live rule — future memories included)
memory_unshare_tag Revoke tag-based sharing
memory_update Update an existing memory's content, tags, or importance
memory_shared_with_me List memories others have shared with you
memory_my_shares List all sharing rules you've created

Sharing tools require an API connection (not available in SQLite-only mode).

Memory Categories

Memories are organized into: facts, preferences, projects, people, decisions

Sensitive Memory Support

Mark memories as sensitive with force_sensitive: true. When Vault is configured, sensitive content is encrypted at rest and only decryptable via secret_get.
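A store call marking content sensitive might look like this (illustrative payload; argument names other than force_sensitive follow the tool description above):

```json
{
  "content": "Staging database password is hunter2",
  "category": "facts",
  "tags": "credentials,staging",
  "importance": 0.9,
  "force_sensitive": true
}
```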

Architecture

Claude Code Session
┌──────────────────────────────────────────────────────┐
│  Hooks                        MCP Server             │
│  ┌──────────────────┐        ┌────────────────────┐  │
│  │ auto-recall      │───────▶│ memory_store       │  │
│  │ auto-learn       │        │ memory_recall      │  │
│  │ compaction       │        │ memory_list        │  │
│  │ auto-approve     │        │ memory_delete      │  │
│  └──────────────────┘        │ secret_get         │  │
│                              │ memory_count       │  │
│                              └─────────┬──────────┘  │
│                                        │             │
│              ┌─────────────────────────┼──────────┐  │
│              │ Local SQLite            │          │  │
│              │ (cache + FTS5)    SyncEngine       │  │
│              │ ◄──── always ────(background, 60s) │  │
│              │       used       push queued writes │  │
│              │                  pull server changes│  │
│              │  pending_ops ──────────┘            │  │
│              │  (offline write queue)              │  │
│              └────────────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
                                │
              optional, for multi-machine sync
                                │
                     ┌──────────▼───────────┐
                     │  API Server (FastAPI) │
                     │  Docker / Kubernetes  │
                     └──────────┬───────────┘
                                │
                     ┌──────────▼───────────┐
                     │    PostgreSQL         │
                     │  (authoritative)      │
                     └──────────────────────┘

Sync Resilience

The SyncEngine is designed to handle failures gracefully:

  • Decoupled push/pull — push failures never block pull. Remote changes from other agents always flow in.
  • Auth failure detection — on 401/403, the engine sets an auth-failed flag, logs a clear warning, and degrades to SQLite-only mode. A periodic health check detects when auth is restored.
  • Startup full resync — on MCP server start, a full cache replacement runs to catch drift from other agents, deleted records, and schema changes.
  • Periodic full resync — every ~10 minutes, a full resync replaces incremental sync to catch any drift.
  • Retry cap — individual pending ops are retried up to 5 times, then permanently skipped to prevent queue buildup.
  • Orphan reconciliation — local records that never synced are deduplicated against server content before push.
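The retry-cap behavior can be sketched as follows (hypothetical names; the real SyncEngine persists its queue in the pending_ops SQLite table):

```python
# Minimal sketch of the retry cap: ops that fail 5 times are dropped
# so a permanently broken op can't clog the queue forever.
MAX_RETRIES = 5

def process_pending_ops(queue, push):
    """Try to push each queued op; keep failures below the retry cap."""
    remaining = []
    for op in queue:
        try:
            push(op)                      # send to the API server
        except ConnectionError:
            op["retry_count"] = op.get("retry_count", 0) + 1
            if op["retry_count"] < MAX_RETRIES:
                remaining.append(op)      # retry on the next cycle
            # else: permanently skipped to prevent queue buildup
    return remaining
```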

Search Algorithm

Memory recall uses two different full-text search backends depending on the operating mode. Both follow the same query building pattern: the context and expanded_query fields from the user's recall request are concatenated into a single search string, then processed into a backend-specific query.

Query Construction

When memory_recall is called, the tool receives:

  • context — the topic or question being asked about
  • expanded_query — REQUIRED, minimum 5 space-separated semantically related terms generated by Claude

These are concatenated: all_terms = f"{context} {expanded_query}", then split into individual words for the search backend.

SQLite: FTS5 with BM25

The local SQLite database uses an FTS5 virtual table for full-text search with the BM25 ranking algorithm.

Schema:

CREATE VIRTUAL TABLE memories_fts USING fts5(
    content, category, tags, expanded_keywords,
    content='memories', content_rowid='id'
);

This is a content table backed by the memories table — FTS5 reads column data from memories on demand rather than duplicating it. Triggers on INSERT, UPDATE, and DELETE keep the FTS index synchronized:

-- Insert trigger
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;

-- Delete trigger (FTS5 special 'delete' command)
CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, category, tags, expanded_keywords)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
END;

-- Update trigger (delete old + insert new)
CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, category, tags, expanded_keywords)
    VALUES ('delete', old.id, old.content, old.category, old.tags, old.expanded_keywords);
    INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
    VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;

Query building: Each word is quoted and joined with OR:

words = all_terms.split()
fts_query = " OR ".join(f'"{w}"' for w in words if w)
# Example: "kubernetes" OR "deployment" OR "pod" OR "scaling" OR "replicas"

Ranking: Two sort modes:

  • sort_by="relevance" — ORDER BY bm25(memories_fts), importance DESC — FTS5's built-in BM25 relevance score, with importance as tiebreaker
  • sort_by="importance" (default) — ORDER BY importance DESC, created_at DESC — user-assigned importance score, with recency as tiebreaker

Fallback: If the FTS5 MATCH query fails (e.g. syntax error from special characters), the search falls back to a LIKE %query% scan against content and tags columns.
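The query path and its fallback can be sketched end to end (standalone FTS5 table for brevity rather than the external-content table above; assumes Python's bundled SQLite was built with FTS5, as official builds are):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content, tags)")
conn.execute(
    "INSERT INTO memories_fts VALUES "
    "('User prefers Svelte for frontends', 'frontend')"
)

def recall(conn, all_terms):
    """Quoted-OR FTS5 match, falling back to a LIKE scan on bad syntax."""
    words = all_terms.split()
    fts_query = " OR ".join(f'"{w}"' for w in words if w)
    try:
        cur = conn.execute(
            "SELECT content FROM memories_fts WHERE memories_fts MATCH ? "
            "ORDER BY bm25(memories_fts)",
            (fts_query,),
        )
    except sqlite3.OperationalError:
        # FTS5 rejected the query (e.g. stray quote) -- plain LIKE scan
        like = f"%{all_terms}%"
        cur = conn.execute(
            "SELECT content FROM memories_fts "
            "WHERE content LIKE ? OR tags LIKE ?",
            (like, like),
        )
    return [row[0] for row in cur]
```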

PostgreSQL: tsvector with Weighted Ranking

The API server uses PostgreSQL's native full-text search with a generated tsvector column and weighted ranking.

Schema (from migration 001):

CREATE TABLE memories (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    category VARCHAR(50) DEFAULT 'facts',
    tags TEXT DEFAULT '',
    expanded_keywords TEXT DEFAULT '',
    importance REAL DEFAULT 0.5,
    -- ...
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(content, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(expanded_keywords, '')), 'B') ||
        setweight(to_tsvector('english', coalesce(tags, '')), 'C') ||
        setweight(to_tsvector('english', coalesce(category, '')), 'D')
    ) STORED
);

CREATE INDEX idx_memories_search ON memories USING GIN(search_vector);

The search_vector is a generated column that PostgreSQL maintains automatically on every INSERT/UPDATE — no triggers needed. It combines four fields with different weights:

Weight Field Purpose
A (highest) content The actual memory text — most important for matching
B expanded_keywords Semantic search terms generated by Claude at store time
C tags User-provided comma-separated tags
D (lowest) category Category name (facts, preferences, etc.)

A GIN index on search_vector enables fast lookup without sequential scans.

Query execution:

SELECT id, content, category, tags, importance, is_sensitive,
       ts_rank(search_vector, query) AS rank,
       created_at, updated_at
FROM memories, plainto_tsquery('english', $2) query
WHERE user_id = $1
  AND deleted_at IS NULL
  AND (search_vector @@ query OR $2 = '')
ORDER BY importance DESC, ts_rank(search_vector, query) DESC
LIMIT $3

plainto_tsquery processes the concatenated query text through PostgreSQL's English text search dictionary — it handles stemming (e.g. "running" → "run"), stop word removal, and normalization. The @@ operator matches the resulting tsquery against the stored tsvector.

ts_rank scores each match considering the weight of the matched field — a match in content (weight A) scores higher than a match in tags (weight C).

Sort modes:

  • sort_by="importance" (default) — ORDER BY importance DESC, ts_rank(search_vector, query) DESC
  • sort_by="relevance" — ORDER BY ts_rank(search_vector, query) DESC
  • sort_by="recency" — ORDER BY created_at DESC

Semantic Expansion at Store Time

When Claude stores a memory via memory_store, it generates an expanded_keywords field — at least 5 space-separated semantically related terms. For example, storing "User prefers Svelte for frontends" might produce:

expanded_keywords: "svelte frontend framework ui web sveltekit javascript component reactive"

These keywords are indexed alongside the memory content. When a future memory_recall searches for "what framework should I use for the dashboard?", the expanded keywords create additional matching surface that pure content matching would miss. This is a form of query expansion performed at write time by the LLM.
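A toy illustration of that extra matching surface (not the real search code, just simple token overlap):

```python
# The stored memory plus its write-time expansion.
memory = {
    "content": "User prefers Svelte for frontends",
    "expanded_keywords": "svelte frontend framework ui web sveltekit javascript",
}

def matches(memory, query):
    """True if any query word appears in the indexed text."""
    indexed = set(
        (memory["content"] + " " + memory["expanded_keywords"]).lower().split()
    )
    return bool(indexed & set(query.lower().split()))
```

The content alone shares no words with a question like "which framework fits this dashboard", but the expanded keywords do.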

SQLite vs PostgreSQL Comparison

Feature SQLite (FTS5) PostgreSQL (tsvector)
Tokenization ICU/unicode61 tokenizer English text search dictionary
Stemming No (exact word matching) Yes (Porter stemmer via english config)
Stop words No Yes (removes "the", "is", "at", etc.)
Ranking BM25 (term frequency + inverse document frequency) ts_rank (weighted field matching)
Field weighting No (all FTS5 columns equal) Yes (A/B/C/D weights per column)
Index type B-tree on FTS content GIN (Generalized Inverted Index)
Query syntax Quoted OR terms ("word1" OR "word2") plainto_tsquery with implicit AND
Fallback LIKE scan on failure None needed (robust parser)
Maintenance Triggers sync content ↔ FTS table Generated column, auto-maintained

Operating Modes

Mode When What happens
SQLite-only No env vars set Everything local. Zero config. Works offline.
Hybrid MEMORY_API_KEY set Local SQLite for reads + background sync to API. Writes queue if API is down.
HTTP-only MEMORY_API_KEY + MEMORY_SYNC_DISABLE=1 Direct API calls, no local cache. Legacy mode.
Full Any mode + Vault configured Above + Vault for encrypting sensitive memories at rest.
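The mode selection reduces to a simple env-var check, roughly (hypothetical helper; the Vault tier is orthogonal and layers on top of any mode):

```python
import os

def operating_mode(env=os.environ):
    """Map the env vars from the table above to an operating mode."""
    if not env.get("MEMORY_API_KEY"):
        return "sqlite-only"          # zero config, works offline
    if env.get("MEMORY_SYNC_DISABLE") == "1":
        return "http-only"            # direct API calls, no local cache
    return "hybrid"                   # local reads + background sync
```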

Setup

Option 1: Plugin Install

claude plugins install github:ViktorBarzin/claude-memory-mcp

Works immediately with SQLite-only mode. To enable multi-machine sync:

export MEMORY_API_URL="https://your-server.example.com"
export MEMORY_API_KEY="your-api-key"

Option 2: Manual MCP Config

Add to ~/.claude.json under mcpServers:

{
  "mcpServers": {
    "claude_memory": {
      "type": "stdio",
      "command": "python3",
      "args": ["/path/to/claude-memory-mcp/src/claude_memory/mcp_server.py"],
      "env": {
        "PYTHONPATH": "/path/to/claude-memory-mcp/src",
        "MEMORY_API_URL": "https://your-server.example.com",
        "MEMORY_API_KEY": "your-api-key"
      }
    }
  }
}

Omit MEMORY_API_URL and MEMORY_API_KEY for SQLite-only mode.

Verify

You: /remember "I prefer dark mode in all applications"
You: /recall "UI preferences"

Running the API Server

Only needed for multi-machine sync. Skip this if you're using SQLite-only mode.

Docker Compose

cd docker
docker compose up -d

Starts the API server + PostgreSQL. Optionally add Vault:

docker compose --profile vault up -d

Manual

pip install claude-memory-mcp[api]
export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
export API_KEY="your-secret-key"
alembic upgrade head
uvicorn claude_memory.api.app:app --host 0.0.0.0 --port 8000

Kubernetes

Designed for high availability: 2 replicas with pod anti-affinity, PodDisruptionBudget, and startup probes.

helm install claude-memory deploy/helm/claude-memory \
  --set env.DATABASE_URL="postgresql://user:pass@host:5432/db" \
  --set env.API_KEY="your-key" \
  --set ingress.host="claude-memory.yourdomain.com"

Or raw manifests:

kubectl apply -f deploy/kubernetes/namespace.yaml
kubectl create secret generic claude-memory-secrets \
  -n claude-memory \
  --from-literal=database-url="postgresql://user:pass@host:5432/db" \
  --from-literal=api-key="your-key"
kubectl apply -f deploy/kubernetes/

Multi-User Setup

For team deployments, use API_KEYS with a JSON mapping:

export API_KEYS='{"alice": "key-alice-xxx", "bob": "key-bob-yyy"}'

Each user gets isolated memory storage by default. Users can selectively share memories with each other.
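Key-to-user resolution amounts to a reverse lookup over that JSON map; a minimal sketch (hypothetical helper name, using a constant-time comparison):

```python
import hmac
import json

def resolve_user(api_keys_env, presented_key):
    """Return the user a bearer key belongs to, or None."""
    keys = json.loads(api_keys_env)            # {"user": "key"} mapping
    for user, key in keys.items():
        # compare_digest avoids leaking key prefixes via timing
        if hmac.compare_digest(key, presented_key):
            return user
    return None
```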

Memory Sharing

Users can share individual memories or entire tags with other users:

You: "share memory #42 with bob as read-only"
You: "share all memories tagged 'infra' with alice with write access"

Individual shares grant access to a specific memory. Tag shares are live rules — any future memory with that tag is automatically visible to the target user.

Permission levels:

  • read — can see the memory in recall results
  • write — can also update the memory's content, tags, and importance

Only the owner can delete a memory, even if others have write access. Shared memories appear in memory_recall results with shared_by and share_permission fields.
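The rules above can be summarized in a small predicate (hypothetical helper, not the project's permissions.py):

```python
def can(action, user, memory, shares):
    """shares maps user_id -> "read" | "write" for this memory."""
    if user == memory["owner"]:
        return True                     # owners can do everything
    perm = shares.get(user)
    if action == "read":
        return perm in ("read", "write")
    if action == "write":
        return perm == "write"
    return False                        # delete stays owner-only
```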

Adding a user:

  1. Generate a key: openssl rand -base64 32
  2. Add to API_KEYS JSON
  3. Restart the API server
  4. Share the key with the user

Sensitive Memory & Secrets

Sensitive data is handled through a multi-layer system:

Automatic Detection

credential_detector.py scans all incoming content for secrets using regex patterns with confidence scores (0.5–0.95):

Confidence Pattern Types
0.95 Private keys (RSA/EC/DSA/OPENSSH), AWS access keys (AKIA/ASIA), GitHub tokens (ghp_/gho_/etc.)
0.85–0.90 Connection strings (postgres/mysql/mongodb/redis/amqp), Basic auth headers, Bearer tokens
0.75–0.80 api_key=, password=, token= assignments
0.60–0.65 Generic secret=/credential= patterns, hex keys

Detected content is redacted in the main DB (e.g., [REDACTED:private_key]) and the original stored in Vault or encrypted storage.
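A reduced sketch of the detect-and-redact flow, using a small illustrative subset of the pattern table (the real credential_detector.py covers many more pattern types):

```python
import re

# (pattern, kind, confidence) -- illustrative subset only
PATTERNS = [
    (r"AKIA[0-9A-Z]{16}", "aws_access_key", 0.95),
    (r"ghp_[A-Za-z0-9]{36}", "github_token", 0.95),
    (r"password\s*=\s*\S+", "password_assignment", 0.75),
]

def redact(content):
    """Replace detected secrets with [REDACTED:<kind>] markers."""
    findings = []
    for pattern, kind, confidence in PATTERNS:
        if re.search(pattern, content):
            findings.append((kind, confidence))
            content = re.sub(pattern, f"[REDACTED:{kind}]", content)
    return content, findings
```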

Encryption at Rest

Three tiers, checked in order:

  1. HashiCorp Vault (KV v2) — if VAULT_ADDR + VAULT_TOKEN set. Secrets stored at claude-memory/{user_id}/mem-{id}. Supports Kubernetes Service Account auto-authentication.
  2. AES-256-GCM — if MEMORY_ENCRYPTION_KEY set. Key can be hex-encoded 32 bytes or a passphrase (derived via SHA-256). Stored as nonce (12 bytes) + ciphertext + GCM tag (16 bytes) in the encrypted_content column.
  3. Plaintext fallback — content stored as-is (still redacted in main column if detected as sensitive).

Retrieve the original via secret_get(id=<memory_id>).
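The non-Vault key handling and blob layout described in tier 2 can be sketched with the stdlib only (hypothetical helper names; the actual AES-GCM encryption step is omitted):

```python
import hashlib

NONCE_LEN, TAG_LEN = 12, 16   # layout: nonce || ciphertext || GCM tag

def load_key(value):
    """Normalize MEMORY_ENCRYPTION_KEY: hex-encoded 32 bytes,
    otherwise treat it as a passphrase derived via SHA-256."""
    try:
        key = bytes.fromhex(value)
        if len(key) == 32:
            return key
    except ValueError:
        pass
    return hashlib.sha256(value.encode()).digest()

def split_blob(blob):
    """Split an encrypted_content blob into its three parts."""
    return blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
```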

Plugin Hooks

Hook Event What it does
pre-compact-backup.sh PreCompact Saves top 20 memories to a marker file before context compaction
post-compact-recovery.sh UserPromptSubmit Re-injects saved memories after compaction (one-time, then deletes marker)
user-prompt-recall.py UserPromptSubmit Tells Claude to check memory_recall before responding to each message
auto-learn.py Stop (async) Runs haiku-as-judge on the last exchange to extract durable facts worth storing
auto-allow-memory-tools.py PermissionRequest Auto-approves memory_store, memory_recall, memory_list, memory_delete, secret_get

Auto-Learn Details

The auto-learn hook runs asynchronously after each Claude response. It operates in two modes:

  • Single-turn (every turn) — analyzes just the last user/assistant exchange. Cheap and fast. Extracts corrections, preferences, decisions, and facts.
  • Deep multi-turn (every 5 turns) — analyzes the last 5 exchanges as a window. Also extracts debugging insights, workarounds, and operational knowledge.

Judge fallback chain: Claude CLI (haiku model) → local Ollama (qwen2.5:3b/llama3.2:3b/gemma2:2b/phi3:mini) → heuristic pattern matching (keyword-based extraction from user messages).

Extracted events are deduplicated via SHA-256 content hashing. Each event is stored to the MCP memory database.
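The dedup step can be sketched as follows (the exact normalization applied before hashing is an assumption):

```python
import hashlib

def dedupe_events(events, seen_hashes):
    """Keep only events whose normalized content hasn't been stored yet."""
    fresh = []
    for event in events:
        digest = hashlib.sha256(event.strip().lower().encode()).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            fresh.append(event)
    return fresh
```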

Debug

# See what hooks are doing
export DEBUG_CLAUDE_MEMORY_HOOKS=1

# Disable auto-approve to see permission prompts
export DISABLE_CLAUDE_MEMORY_AUTO_APPROVE=1

Background Sync Engine

In hybrid mode, a SyncEngine runs in a daemon thread with its own SQLite connection (thread-safe via threading.Lock).

Push cycle: Reads the pending_ops queue (SQLite table) and sends each operation to the API. On success, removes the operation from the queue and updates the local server_id mapping. On failure, the operation stays queued for the next cycle.

Pull cycle: Calls GET /api/memories/sync?since=<last_sync_ts> for incremental changes. Server wins on conflicts — remote updates overwrite local rows (matched by server_id). Soft-deleted rows on the server are hard-deleted locally.

Write path: Every memory_store first writes to local SQLite (instant), then attempts an immediate sync to the API. If the immediate sync fails, the write is enqueued in pending_ops for the next background cycle.

Sync interval: Configurable via MEMORY_SYNC_INTERVAL (default: 60 seconds).

API Reference

Endpoint Method Description
/health GET Health check (unauthenticated)
/api/auth-check GET Validate API key without side effects
/api/memories POST Store a memory
/api/memories GET List memories (?category=facts&limit=20)
/api/memories/recall POST Search memories by context and expanded query
/api/memories/{id} PUT Update a memory (owner or write-shared user)
/api/memories/{id} DELETE Soft-delete a memory (owner only)
/api/memories/{id}/share POST Share a memory with a user
/api/memories/{id}/share/{user_id} DELETE Revoke sharing from a user
/api/memories/share-tag POST Share all memories with a tag
/api/memories/share-tag DELETE Revoke tag-based sharing
/api/memories/shared-with-me GET List memories shared with current user
/api/memories/my-shares GET List sharing rules created by current user
/api/memories/sync GET Incremental sync (?since=ISO-timestamp)
/api/memories/{id}/secret POST Get decrypted sensitive memory content
/api/memories/migrate-secrets POST Re-encrypt existing secrets with current Vault config
/api/memories/import POST Bulk import memories (JSON array)

All endpoints except /health require Authorization: Bearer <api-key>.

Environment Variables

MCP Server (client-side)

Variable Description Default
MEMORY_API_URL API server URL http://localhost:8080
MEMORY_API_KEY API key (enables hybrid mode) None
MEMORY_HOME Local storage directory ~/.claude/claude-memory
MEMORY_DB SQLite database path override $MEMORY_HOME/memory/memory.db
MEMORY_SYNC_INTERVAL Background sync interval in seconds 60
MEMORY_SYNC_DISABLE Set to 1 for HTTP-only mode None

Aliases CLAUDE_MEMORY_API_URL and CLAUDE_MEMORY_API_KEY are also supported.

API Server

Variable Description Default
DATABASE_URL PostgreSQL connection string Required
API_KEY Single-user API key None
API_KEYS Multi-user JSON map {"user": "key"} None
VAULT_ADDR Vault server address None
VAULT_TOKEN Vault authentication token None
VAULT_ROLE Kubernetes auth role name claude-memory
MEMORY_ENCRYPTION_KEY AES-256 key for non-Vault encryption None

Database Migrations

PostgreSQL (API Server)

Migrations run automatically on API server startup. To run manually:

export DATABASE_URL="postgresql://user:pass@localhost:5432/claude_memory"
alembic upgrade head

Four migrations:

  1. 001 — Creates memories table with generated tsvector column and GIN index
  2. 002 — Adds user_id (multi-user), is_sensitive, vault_path, encrypted_content
  3. 003 — Adds deleted_at (soft delete) and updated_at index for incremental sync
  4. 004 — Adds memory_shares and tag_shares tables for multi-user sharing

All migrations are idempotent (check column/table existence before altering).

SQLite (MCP Client)

SQLite schema is versioned via PRAGMA user_version and migrated automatically on startup. Current version: 2.

Version Migration
1 Add server_id column to memories table
2 Add retry_count column to pending_ops table
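A minimal sketch of PRAGMA user_version migrations in this style (illustrative DDL mirroring the two migrations above):

```python
import sqlite3

MIGRATIONS = {
    1: "ALTER TABLE memories ADD COLUMN server_id INTEGER",
    2: "ALTER TABLE pending_ops ADD COLUMN retry_count INTEGER DEFAULT 0",
}

def migrate(conn):
    """Apply any migrations newer than the stored schema version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    for target in sorted(MIGRATIONS):
        if target > version:
            conn.execute(MIGRATIONS[target])
            conn.execute(f"PRAGMA user_version = {target}")
    return conn.execute("PRAGMA user_version").fetchone()[0]
```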

Development

Prerequisites

  • Python 3.11+
  • For API server tests: httpx, pytest-asyncio

Quick Start

git clone https://github.com/ViktorBarzin/claude-memory-mcp.git
cd claude-memory-mcp
python -m venv .venv
source .venv/bin/activate
pip install -e ".[api,dev]"

Running Tests

# All tests
pytest tests/ -v

# Individual test suites
pytest tests/test_sync.py -v          # SyncEngine (client-side sync resilience)
pytest tests/test_mcp_server.py -v    # MCP server (SQLite, tools, protocol)
pytest tests/test_api.py -v           # API server (FastAPI endpoints)

# Linting
ruff check src/ tests/
mypy src/claude_memory/ --strict

The MCP server itself (mcp_server.py and sync.py) uses stdlib only — no pip install needed on the client side. The [api] extra adds FastAPI, asyncpg, uvicorn, etc. for the server.

Test Structure

File Tests What it covers
test_sync.py SyncEngine unit tests Push/pull, auth failure handling, retry caps, full resync, orphan reconciliation, decoupled push/pull, diagnostics
test_mcp_server.py MCP server unit tests SQLite CRUD, FTS search, tool dispatch, MCP protocol, memory_count, schema migration
test_api.py API server integration tests All REST endpoints, auth, user isolation, soft delete, sync, secrets, import
test_auth.py Auth module tests Single/multi-user auth, key mapping
test_credential_detector.py Credential detection Pattern matching for secrets
test_crypto.py Encryption tests AES-256 encrypt/decrypt
test_vault_client.py Vault integration Secret storage/retrieval

Project Structure

claude-memory-mcp/
├── src/claude_memory/
│   ├── mcp_server.py          # MCP server entry point (stdio NDJSON)
│   ├── sync.py                # Background sync engine (SQLite ↔ API)
│   ├── credential_detector.py # Sensitive content detection
│   └── api/
│       ├── app.py             # FastAPI application
│       ├── auth.py            # API key authentication
│       ├── database.py        # asyncpg connection pool
│       ├── models.py          # Pydantic models
│       ├── permissions.py     # Sharing permission checks
│       └── vault_service.py   # HashiCorp Vault integration
├── tests/                     # pytest test suite
├── hooks/                     # Claude Code hooks (auto-recall, auto-learn, etc.)
├── docker/                    # Docker Compose for API server + PostgreSQL
├── deploy/                    # Kubernetes manifests and Helm chart
└── pyproject.toml             # Package config (hatchling)

Key Design Decisions

  • stdlib-only MCP server: The MCP server (mcp_server.py, sync.py) uses only Python stdlib — no pip install required for the client side. This ensures it works in any Claude Code environment without dependency management.
  • NDJSON transport: Claude Code uses NDJSON (one JSON per line) for stdio MCP, not Content-Length framing.
  • Non-blocking startup: MCP server startup must complete in ~15s or Claude Code times out. All network calls are deferred to background threads.
  • Suppress stderr: Any stderr output during MCP startup causes Claude Code to reject the server.
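The NDJSON framing from the second bullet can be sketched as a read loop (hypothetical handler; real MCP messages are JSON-RPC 2.0):

```python
import io
import json

def serve(stream_in, stream_out, handle):
    """One JSON message per line in, one per line out -- no
    Content-Length headers anywhere."""
    for line in stream_in:
        line = line.strip()
        if not line:
            continue
        request = json.loads(line)
        response = handle(request)
        stream_out.write(json.dumps(response) + "\n")
```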

License

Apache License 2.0. See LICENSE for details.