From 7439540f8f2552824884ef695f8b3b9878007e07 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Thu, 25 Jun 2026 15:36:30 +0000
Subject: [PATCH] docs: glossary + ADRs for semantic/concept-graph memory

Captures the design language (CONTEXT.md) and the framing decisions from
the requirements interview: pursue hybrid embeddings+concept-graph
retrieval gated on a benchmark (0001), target the API/Postgres deployment
while SQLite stays lexical (0002), and permit hosted embedding APIs for
non-sensitive memories only (0003). Groundwork for the
research/prototype/benchmark effort.

Co-Authored-By: Claude Opus 4.8
---
 CONTEXT.md                                    | 45 +++++++++++++++++++
 ...-retrieval-embeddings-and-concept-graph.md | 21 +++++++++
 ...api-postgres-first-sqlite-stays-lexical.md | 20 +++++++++
 ...apis-allowed-for-non-sensitive-memories.md | 19 ++++++++
 4 files changed, 105 insertions(+)
 create mode 100644 CONTEXT.md
 create mode 100644 docs/adr/0001-pursue-hybrid-retrieval-embeddings-and-concept-graph.md
 create mode 100644 docs/adr/0002-api-postgres-first-sqlite-stays-lexical.md
 create mode 100644 docs/adr/0003-external-embedding-apis-allowed-for-non-sensitive-memories.md

diff --git a/CONTEXT.md b/CONTEXT.md
new file mode 100644
index 0000000..6782e70
--- /dev/null
+++ b/CONTEXT.md
@@ -0,0 +1,45 @@
+# Claude Memory MCP
+
+Persistent cross-session memory for Claude. Today it stores **Memories** as rows and
+retrieves them by **lexical recall** (full-text keyword matching). This context is being
+extended with **semantic recall** (embeddings) and a **concept graph** so retrieval works
+by meaning and related memories become traversable.
+
+## Language
+
+**Memory**:
+A single stored unit of knowledge — a fact, preference, decision, project note, or person
+detail — with content plus metadata (category, tags, importance). The atomic thing a user
+stores and recalls.
+
+**Recall**:
+Retrieving the Memories most relevant to a query. The read path.
+
+**Lexical recall**:
+The existing retrieval method — matches Memories whose words (content, tags, LLM-generated
+keywords) overlap the query, ranked by BM25 / `ts_rank`. Matches *tokens*, not meaning.
+_Avoid_: calling this "semantic search" — it is not semantic.
+
+**Semantic recall**:
+Retrieval by meaning via dense-vector **Embedding** similarity, so a query surfaces a Memory
+even with zero shared words (e.g. "what UI library?" → "prefers Svelte").
+
+**Embedding**:
+A dense vector representation of a Memory's (or Concept's) meaning, used for Semantic recall.
+
+**Concept**:
+A distinct entity or idea that recurs across Memories (e.g. "Svelte", "Viktor", "TripIt",
+"frontend framework"). A node in the Concept graph. Distinct from a Memory: one Memory can
+mention several Concepts, and one Concept spans many Memories.
+
+**Concept graph**:
+The network of Concepts joined by typed **Relationships**, making the memory store
+traversable — from one Memory or Concept to related ones.
+
+**Relationship**:
+A typed, directed edge in the Concept graph, between two Concepts or between a Memory and a
+Concept (e.g. `prefers`, `is-a`, `used-in`, `mentions`).
+
+**Hybrid retrieval**:
+The target read path — combining Lexical recall, Semantic recall, and Concept-graph
+traversal into one ranked result set.
diff --git a/docs/adr/0001-pursue-hybrid-retrieval-embeddings-and-concept-graph.md b/docs/adr/0001-pursue-hybrid-retrieval-embeddings-and-concept-graph.md
new file mode 100644
index 0000000..e9f0cf9
--- /dev/null
+++ b/docs/adr/0001-pursue-hybrid-retrieval-embeddings-and-concept-graph.md
@@ -0,0 +1,21 @@
+# Pursue hybrid retrieval: embeddings + concept graph over pure lexical
+
+Today recall is **lexical only** (BM25 in SQLite, `tsvector`/`ts_rank` in Postgres over
+content + LLM-generated `expanded_keywords`). It matches *tokens*, so it misses
+paraphrase/synonym queries and cannot traverse between related Memories. We will pursue a
+**hybrid** read path that adds dense-vector **Semantic recall** and a traversable **Concept
+graph** (typed Relationships) alongside the existing Lexical recall.
+
+This decision is **gated on a benchmark**: we adopt hybrid only if it shows a material
+recall-quality uplift over the current lexical system on a stratified eval set (exact /
+paraphrase / multi-hop). If the benchmark shows no improvement, a later ADR supersedes this
+and we stay lexical.
+
+## Considered options
+
+- **Pure semantic (embeddings only)** — fixes paraphrase gaps but gives no real concept
+  traversal; rejected as the *sole* mechanism.
+- **Pure concept graph** — enables traversal but node-matching stays lexical, so paraphrase
+  gaps remain; rejected as the *sole* mechanism.
+- **Hybrid (chosen)** — embeddings for meaning + graph for traversal + existing FTS, fused
+  into one ranked result. Highest ceiling; the GraphRAG / Zep-Graphiti / HippoRAG family.
diff --git a/docs/adr/0002-api-postgres-first-sqlite-stays-lexical.md b/docs/adr/0002-api-postgres-first-sqlite-stays-lexical.md
new file mode 100644
index 0000000..9310b72
--- /dev/null
+++ b/docs/adr/0002-api-postgres-first-sqlite-stays-lexical.md
@@ -0,0 +1,20 @@
+# API/Postgres deployment gets semantics; SQLite-only stays lexical
+
+The semantic + concept-graph layer targets the **API/Postgres** deployment only: embeddings
+in pgvector on the (CNPG) Postgres, the Concept graph as node/edge tables in Postgres, and
+embedding/extraction via reused cluster infra (llama-cpp on GPU, or a hosted API). The
+**SQLite-only** mode keeps working but stays **lexical (FTS) only** — it gains no embeddings
+or graph, degrading gracefully.
+
+This is surprising because the README markets zero-config offline SQLite as the headline
+feature. We accept that trade-off: the operator actually runs the remote API/Postgres store,
+reuse-before-building favours cluster infra, and bundling a local embedding model into the
+zero-config path would add heavy dependencies and double the build/test matrix for little
+real-world benefit.
+
+## Consequences
+
+- All benchmark numbers are produced in API/Postgres mode.
+- Offline zero-config users see no behaviour change.
+- A future ADR may revisit offline semantics (e.g. via `sqlite-vec` + a small local model)
+  if there is demand.
diff --git a/docs/adr/0003-external-embedding-apis-allowed-for-non-sensitive-memories.md b/docs/adr/0003-external-embedding-apis-allowed-for-non-sensitive-memories.md
new file mode 100644
index 0000000..e0ec9f4
--- /dev/null
+++ b/docs/adr/0003-external-embedding-apis-allowed-for-non-sensitive-memories.md
@@ -0,0 +1,19 @@
+# External embedding/extraction APIs allowed for non-sensitive memories
+
+Embedding and concept extraction may use **hosted APIs** (e.g. OpenAI `text-embedding-3`,
+Voyage, Cohere) for **non-sensitive** memories, to access a higher quality ceiling than
+self-hosted models alone. **Sensitive / Vault-encrypted (secret) memories are never sent
+externally** and are excluded from the corpus that gets embedded or extracted.
+
+This is a deliberate relaxation of the homelab's usual local-only posture, made because the
+quality gain is worth it for non-secret personal memory content. The research/benchmark may
+still compare hosted vs self-hostable models (nomic-embed, bge-m3, gte-Qwen2, e5) so the
+production choice is data-driven; this ADR only records that egress is *permitted* within the
+sensitive-data boundary.
+
+## Consequences
+
+- The corpus-export step MUST filter out `is_sensitive` / secret memories before any external
+  call.
+- Production deployment needs an embedding API key (or falls back to the in-cluster
+  llama-cpp model when absent).
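
Review note on the ADR 0003 consequences: the two rules (filter sensitive memories before any external call, and fall back to the in-cluster model when no API key is configured) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from this patch: the `Memory` shape, the function names `exportable_corpus` and `choose_embedding_backend`, and the backend labels are all hypothetical; only the `is_sensitive` flag is named in the ADR.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Memory:
    """Hypothetical record shape; only `is_sensitive` is named in ADR 0003."""
    id: int
    content: str
    is_sensitive: bool = False


def exportable_corpus(memories: Iterable[Memory]) -> List[Memory]:
    """Drop sensitive memories so they are never embedded or extracted."""
    return [m for m in memories if not m.is_sensitive]


def choose_embedding_backend(api_key: Optional[str]) -> str:
    """Pick the hosted API only when a key is configured; otherwise stay in-cluster."""
    # The returned labels are placeholders, not real configuration values.
    return "hosted-api" if api_key else "in-cluster-llama-cpp"


memories = [
    Memory(1, "prefers Svelte"),
    Memory(2, "vault token for the homelab", is_sensitive=True),
]
corpus = exportable_corpus(memories)              # filter first, always
backend = choose_embedding_backend(api_key=None)  # no key, so stay in-cluster
```

Filtering before the backend is chosen keeps the sensitive-data boundary independent of which embedding model ends up in production.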