**No configuration needed** for single-machine use. Memories are stored in a local SQLite database with full-text search. For multi-machine sync, point it at an API server (details below).
- **Auto-recall** — before responding, Claude checks stored memories for relevant context (preferences, past corrections, decisions). Completely invisible to the user.
- **Auto-learn** — after each response, a background process analyzes the conversation for corrections, preferences, and decisions worth remembering. Uses haiku-as-judge for conservative extraction.
- **Compaction survival** — when Claude's context window compacts, key memories are saved to a marker file and re-injected on the next prompt. No knowledge is lost.
- **Auto-approve** — memory tool calls are approved automatically, no permission prompts.
Mark memories as sensitive with `force_sensitive: true`. When Vault is configured, sensitive content is encrypted at rest and only decryptable via `secret_get`.
The SyncEngine is designed to handle failures gracefully:
- **Decoupled push/pull** — push failures never block pull. Remote changes from other agents always flow in.
- **Auth failure detection** — on 401/403, the engine sets an auth-failed flag, logs a clear warning, and degrades to SQLite-only mode. A periodic health check detects when auth is restored.
- **Startup full resync** — on MCP server start, a full cache replacement runs to catch drift from other agents, deleted records, and schema changes.
- **Periodic full resync** — every ~10 minutes, a full resync replaces incremental sync to catch any drift.
- **Retry cap** — individual pending ops are retried up to 5 times, then permanently skipped to prevent queue buildup.
- **Orphan reconciliation** — local records that never synced are deduplicated against server content before push.
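The retry-cap behavior can be sketched as follows (a minimal illustration with hypothetical names; the real `pending_ops` queue lives in a SQLite table, not a list):

```python
MAX_RETRIES = 5  # per the retry cap described above

def process_pending_ops(queue, push):
    """Try each queued op once; drop ops that have failed MAX_RETRIES times.

    `queue` is a list of dicts with an 'attempts' counter — a hypothetical
    in-memory stand-in for the pending_ops SQLite table."""
    still_pending = []
    for op in queue:
        try:
            push(op)                      # send to the API server
        except ConnectionError:
            op["attempts"] += 1
            if op["attempts"] < MAX_RETRIES:
                still_pending.append(op)  # retry on the next cycle
            # else: permanently skipped to prevent queue buildup
    return still_pending

def always_down(op):
    raise ConnectionError("API unreachable")

queue = [{"id": 1, "attempts": 0}]
for _ in range(MAX_RETRIES):
    queue = process_pending_ops(queue, always_down)
# After five failed cycles the op has been permanently skipped.
```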
Memory recall uses two different full-text search backends depending on the operating mode. Both follow the same query building pattern: the `context` and `expanded_query` fields from the user's recall request are concatenated into a single search string, then processed into a backend-specific query.
### Query Construction
When `memory_recall` is called, the tool receives:
- **`context`** — the topic or question being asked about
- **`expanded_query`** — required; at least 5 space-separated, semantically related terms generated by Claude
These are concatenated: `all_terms = f"{context} {expanded_query}"`, then split into individual words for the search backend.
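As a runnable sketch of this step (function name is illustrative):

```python
def build_search_terms(context: str, expanded_query: str) -> list[str]:
    """Concatenate context and expanded_query, then split into words
    for the search backend (mirrors the f-string above)."""
    all_terms = f"{context} {expanded_query}"
    return all_terms.split()

words = build_search_terms("kubernetes deployment",
                           "pod scaling replicas rollout kubectl")
```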
### SQLite: FTS5 with BM25
The local SQLite database uses an [FTS5 virtual table](https://www.sqlite.org/fts5.html) for full-text search with the [BM25](https://en.wikipedia.org/wiki/Okapi_BM25) ranking algorithm.
**Schema:**
```sql
CREATE VIRTUAL TABLE memories_fts USING fts5(
content, category, tags, expanded_keywords,
content='memories', content_rowid='id'
);
```
This is a [content table](https://www.sqlite.org/fts5.html#contentless_and_content_external_tables) backed by the `memories` table — FTS5 reads column data from `memories` on demand rather than duplicating it. Triggers on INSERT, UPDATE, and DELETE keep the FTS index synchronized:
```sql
-- Insert trigger (the UPDATE and DELETE triggers follow the same pattern)
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
  INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
  VALUES (new.id, new.content, new.category, new.tags, new.expanded_keywords);
END;
```
**Query building:** Each word is quoted and joined with OR:
```python
words = all_terms.split()
fts_query = " OR ".join(f'"{w}"' for w in words if w)
# Example: "kubernetes" OR "deployment" OR "pod" OR "scaling" OR "replicas"
```
**Ranking:** Two sort modes:
- `sort_by="relevance"` — `ORDER BY bm25(memories_fts), importance DESC` — FTS5's built-in BM25 relevance score, with importance as tiebreaker
- `sort_by="importance"` (default) — `ORDER BY importance DESC, created_at DESC` — user-assigned importance score, with recency as tiebreaker
**Fallback:** If the FTS5 MATCH query fails (e.g. syntax error from special characters), the search falls back to a `LIKE %query%` scan against `content` and `tags` columns.
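The whole SQLite search path — schema, BM25-ranked MATCH, and the LIKE fallback — can be sketched end to end (the schema mirrors the one above; the demo row and its importance value are invented for illustration):

```python
import sqlite3

def build_fts_query(all_terms: str) -> str:
    """Quote each word and join with OR, as in the query-building snippet."""
    return " OR ".join(f'"{w}"' for w in all_terms.split() if w)

def recall(conn, all_terms):
    """FTS5 MATCH ranked by BM25; LIKE fallback if FTS5 rejects the query."""
    try:
        return conn.execute(
            "SELECT m.id, m.content FROM memories_fts "
            "JOIN memories m ON m.id = memories_fts.rowid "
            "WHERE memories_fts MATCH ? "
            "ORDER BY bm25(memories_fts), m.importance DESC",
            (build_fts_query(all_terms),),
        ).fetchall()
    except sqlite3.OperationalError:
        # Fallback: substring scan over content and tags
        like = f"%{all_terms}%"
        return conn.execute(
            "SELECT id, content FROM memories "
            "WHERE content LIKE ? OR tags LIKE ?",
            (like, like),
        ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE memories(id INTEGER PRIMARY KEY, content TEXT, category TEXT,
                      tags TEXT, expanded_keywords TEXT, importance INTEGER);
CREATE VIRTUAL TABLE memories_fts USING fts5(
  content, category, tags, expanded_keywords,
  content='memories', content_rowid='id');
INSERT INTO memories VALUES
  (1, 'User prefers Svelte for frontends', 'preference', 'frontend',
   'svelte framework ui', 5);
INSERT INTO memories_fts(rowid, content, category, tags, expanded_keywords)
  SELECT id, content, category, tags, expanded_keywords FROM memories;
""")
rows = recall(conn, "svelte dashboard framework")
```

Note that FTS5's `bm25()` returns lower (more negative) scores for better matches, so the ascending `ORDER BY bm25(...)` is correct as written.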
### PostgreSQL: tsvector with Weighted Ranking
The API server uses PostgreSQL's native [full-text search](https://www.postgresql.org/docs/current/textsearch.html) with a generated `tsvector` column and weighted ranking.

```sql
ALTER TABLE memories ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(content, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(expanded_keywords, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(tags, '')), 'C') ||
    setweight(to_tsvector('english', coalesce(category, '')), 'D')
  ) STORED;

CREATE INDEX idx_memories_search ON memories USING GIN(search_vector);
```
The `search_vector` is a [generated column](https://www.postgresql.org/docs/current/ddl-generated-columns.html) that PostgreSQL maintains automatically on every INSERT/UPDATE — no triggers needed. It combines four fields with different weights:
| Weight | Field | Purpose |
|--------|-------|---------|
| **A** (highest) | `content` | The actual memory text — most important for matching |
| **B** | `expanded_keywords` | Semantic search terms generated by Claude at store time |
| **C** | `tags` | User-assigned tags |
| **D** (lowest) | `category` | The memory's category label |
**Recall query:**

```sql
SELECT memories.*
FROM memories, plainto_tsquery('english', $2) query
WHERE user_id = $1
AND deleted_at IS NULL
AND (search_vector @@ query OR $2 = '')
ORDER BY importance DESC, ts_rank(search_vector, query) DESC
LIMIT $3
```
[`plainto_tsquery`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES) processes the concatenated query text through PostgreSQL's English text search dictionary — it handles stemming (e.g. "running" → "run"), stop word removal, and normalization. The `@@` operator matches the resulting `tsquery` against the stored `tsvector`.
[`ts_rank`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING) scores each match considering the weight of the matched field — a match in `content` (weight A) scores higher than a match in `tags` (weight C).
**Sort modes:**
- `sort_by="importance"` (default) — `ORDER BY importance DESC, ts_rank(search_vector, query) DESC`
- `sort_by="relevance"` — `ORDER BY ts_rank(search_vector, query) DESC`
- `sort_by="recency"` — `ORDER BY created_at DESC`
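The three modes reduce to a simple `ORDER BY` lookup (a sketch; the clause strings are taken from the modes above, the function name is hypothetical):

```python
ORDER_BY = {
    # user-assigned importance first, relevance as tiebreaker (default)
    "importance": "importance DESC, ts_rank(search_vector, query) DESC",
    # pure full-text relevance
    "relevance": "ts_rank(search_vector, query) DESC",
    # newest first
    "recency": "created_at DESC",
}

def build_order_clause(sort_by: str = "importance") -> str:
    """Map a sort mode to the ORDER BY clause used in the recall query."""
    return "ORDER BY " + ORDER_BY[sort_by]

clause = build_order_clause("relevance")
```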
### Semantic Expansion at Store Time
When Claude stores a memory via `memory_store`, it generates an `expanded_keywords` field — at least 5 space-separated semantically related terms. For example, storing "User prefers Svelte for frontends" might produce:
```
expanded_keywords: "svelte frontend framework ui web sveltekit javascript component reactive"
```
These keywords are indexed alongside the memory content. When a future `memory_recall` searches for "what framework should I use for the dashboard?", the expanded keywords create additional matching surface that pure content matching would miss. This is a form of query expansion performed at write time by the LLM.
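To see why this helps, compare word overlap with and without the expanded keywords (an illustrative toy, not the real ranking):

```python
def overlap(query: str, text: str) -> int:
    """Count distinct query words that appear in the candidate text."""
    return len(set(query.lower().split()) & set(text.lower().split()))

content = "User prefers Svelte for frontends"
expanded = "svelte frontend framework ui web sveltekit javascript component reactive"
query = "what framework should I use for the dashboard"

without = overlap(query, content)                  # content alone
with_kw = overlap(query, f"{content} {expanded}")  # content + expanded keywords
```

The expanded keywords add a hit on "framework" that the raw content alone would miss.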
Each user gets isolated memory storage by default. Users can selectively share memories with each other.
### Memory Sharing
Users can share individual memories or entire tags with other users:
```
You: "share memory #42 with bob as read-only"
You: "share all memories tagged 'infra' with alice with write access"
```
**Individual shares** grant access to a specific memory. **Tag shares** are live rules — any future memory with that tag is automatically visible to the target user.
**Permission levels:**
- `read` — can see the memory in recall results
- `write` — can also update the memory's content, tags, and importance
Only the owner can delete a memory, even if others have write access. Shared memories appear in `memory_recall` results with `shared_by` and `share_permission` fields.
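The permission rules above reduce to two small checks (a sketch; the field names are hypothetical, not the actual schema):

```python
def can_update(user_id: str, memory: dict) -> bool:
    """Owner or a write-share may update content, tags, and importance."""
    return (memory["owner"] == user_id
            or memory.get("shares", {}).get(user_id) == "write")

def can_delete(user_id: str, memory: dict) -> bool:
    """Only the owner can delete, regardless of share permissions."""
    return memory["owner"] == user_id

mem = {"owner": "alice", "shares": {"bob": "write", "carol": "read"}}
```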
When sensitive content is detected, it is **redacted** in the main DB (e.g., `[REDACTED:private_key]`) and the original is stored in Vault or encrypted storage.
### Encryption at Rest
Three tiers, checked in order:
1. **HashiCorp Vault** (KV v2) — if `VAULT_ADDR` + `VAULT_TOKEN` set. Secrets stored at `claude-memory/{user_id}/mem-{id}`. Supports Kubernetes Service Account auto-authentication.
2. **AES-256-GCM** — if `MEMORY_ENCRYPTION_KEY` set. Key can be hex-encoded 32 bytes or a passphrase (derived via SHA-256). Stored as `nonce (12 bytes) + ciphertext + GCM tag (16 bytes)` in the `encrypted_content` column.
3. **Plaintext fallback** — content stored as-is (still redacted in main column if detected as sensitive).
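The key derivation and blob layout from tier 2 can be shown with stdlib code alone (a sketch; actual AES-GCM encryption and decryption need a library such as `cryptography`, which is outside the stdlib):

```python
import hashlib

NONCE_LEN, TAG_LEN = 12, 16  # layout from the description above

def derive_key(passphrase: str) -> bytes:
    """Passphrase -> 32-byte AES-256 key via SHA-256, as described above."""
    return hashlib.sha256(passphrase.encode()).digest()

def split_blob(blob: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a stored `nonce + ciphertext + tag` blob into its three parts."""
    return blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]

key = derive_key("correct horse battery staple")
# Dummy blob with the documented layout (not real ciphertext)
nonce, ct, tag = split_blob(b"N" * 12 + b"ciphertext" + b"T" * 16)
```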
Retrieve the original via `secret_get(id=<memory_id>)`.
The auto-learn hook runs asynchronously after each Claude response. It operates in two modes:
- **Single-turn** (every turn) — analyzes just the last user/assistant exchange. Cheap and fast. Extracts corrections, preferences, decisions, and facts.
- **Deep multi-turn** (every 5 turns) — analyzes the last 5 exchanges as a window. Also extracts debugging insights, workarounds, and operational knowledge.
**Judge fallback chain:** Claude CLI (haiku model) → local Ollama (qwen2.5:3b/llama3.2:3b/gemma2:2b/phi3:mini) → heuristic pattern matching (keyword-based extraction from user messages).
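The final heuristic tier can be approximated with simple cue-word matching (an illustrative sketch; the cue patterns here are invented and the real ones are more extensive):

```python
import re

# Hypothetical keyword cues for each memory category
PATTERNS = {
    "correction": re.compile(r"\b(actually|that's wrong|should be)\b", re.I),
    "preference": re.compile(r"\b(i prefer|i like|always use|never use)\b", re.I),
    "decision":   re.compile(r"\b(let's go with|we decided|we'll use)\b", re.I),
}

def heuristic_extract(user_message: str) -> list[str]:
    """Return the categories whose cue words appear in the message."""
    return [cat for cat, pat in PATTERNS.items() if pat.search(user_message)]

cats = heuristic_extract("Actually, I prefer tabs over spaces")
```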
In hybrid mode, a `SyncEngine` runs in a daemon thread with its own SQLite connection (thread-safe via `threading.Lock`).
**Push cycle:** Reads the `pending_ops` queue (SQLite table) and sends each operation to the API. On success, removes the operation from the queue and updates the local `server_id` mapping. On failure, the operation stays queued for the next cycle.
**Pull cycle:** Calls `GET /api/memories/sync?since=<last_sync_ts>` for incremental changes. Server wins on conflicts — remote updates overwrite local rows (matched by `server_id`). Soft-deleted rows on the server are hard-deleted locally.
**Write path:** Every `memory_store` first writes to local SQLite (instant), then attempts an immediate sync to the API. If the immediate sync fails, the write is enqueued in `pending_ops` for the next background cycle.
**Sync interval:** Configurable via `MEMORY_SYNC_INTERVAL` (default: 60 seconds).
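The local-first write path can be sketched like this (hypothetical names and in-memory stand-ins; the real code writes to SQLite and lives in `sync.py`):

```python
def memory_store(local_db, api_push, pending_ops, op):
    """Write locally first, then try the API; enqueue on failure."""
    local_db.append(op)          # 1. instant local write (SQLite in reality)
    try:
        api_push(op)             # 2. immediate sync attempt
    except ConnectionError:
        pending_ops.append(op)   # 3. deferred to the next background cycle

local_db, pending_ops = [], []

def flaky_push(op):
    raise ConnectionError("server unreachable")

memory_store(local_db, flaky_push, pending_ops, {"id": 1, "content": "hi"})
# The write succeeded locally even though the sync attempt failed.
```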
The MCP server itself (`mcp_server.py` and `sync.py`) uses **stdlib only** — no pip install needed on the client side. The `[api]` extra adds FastAPI, asyncpg, uvicorn, etc. for the server.
```
├── hooks/ # Claude Code hooks (auto-recall, auto-learn, etc.)
├── docker/ # Docker Compose for API server + PostgreSQL
├── deploy/ # Kubernetes manifests and Helm chart
└── pyproject.toml # Package config (hatchling)
```
### Key Design Decisions
- **stdlib-only MCP server**: The MCP server (`mcp_server.py`, `sync.py`) uses only Python stdlib — no pip install required for the client side. This ensures it works in any Claude Code environment without dependency management.
- **NDJSON transport**: Claude Code uses NDJSON (one JSON per line) for stdio MCP, not Content-Length framing.
- **Non-blocking startup**: MCP server startup must complete in ~15s or Claude Code times out. All network calls are deferred to background threads.
- **Suppress stderr**: Any stderr output during MCP startup causes Claude Code to reject the server.
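NDJSON framing is just one JSON object per line, which the stdlib handles directly (a sketch consistent with the stdlib-only constraint; `io.StringIO` stands in for the stdio streams):

```python
import io
import json

def write_message(stream, msg: dict) -> None:
    """Serialize one message as a single line (NDJSON framing)."""
    stream.write(json.dumps(msg) + "\n")
    stream.flush()

def read_messages(stream):
    """Yield one parsed message per non-empty line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)

buf = io.StringIO()
write_message(buf, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
buf.seek(0)
msgs = list(read_messages(buf))
```

Contrast this with LSP-style `Content-Length:` framing, which prefixes each message with a byte-count header; with NDJSON the newline alone delimits messages.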