docs: glossary + ADRs for semantic/concept-graph memory
Captures the design language (CONTEXT.md) and the framing decisions from the requirements interview: pursue hybrid embeddings+concept-graph retrieval gated on a benchmark (0001), target the API/Postgres deployment while SQLite stays lexical (0002), and permit hosted embedding APIs for non-sensitive memories only (0003). Groundwork for the research/prototype/benchmark effort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 5151bbe0d5
commit 7439540f8f
4 changed files with 105 additions and 0 deletions
45
CONTEXT.md
Normal file
@@ -0,0 +1,45 @@
# Claude Memory MCP

Persistent cross-session memory for Claude. Today it stores **Memories** as rows and
retrieves them by **lexical recall** (full-text keyword matching). This context is being
extended with **semantic recall** (embeddings) and a **concept graph** so retrieval works
by meaning and related memories become traversable.

## Language

**Memory**:
A single stored unit of knowledge — a fact, preference, decision, project note, or person
detail — with content plus metadata (category, tags, importance). The atomic thing a user
stores and recalls.

**Recall**:
Retrieving the Memories most relevant to a query. The read path.

**Lexical recall**:
The existing retrieval method — matches Memories whose words (content, tags, LLM-generated
keywords) overlap the query, ranked by BM25 / `ts_rank`. Matches *tokens*, not meaning.
_Avoid_: calling this "semantic search" — it is not semantic.

**Semantic recall**:
Retrieval by meaning via dense-vector **Embedding** similarity, so a query surfaces a Memory
even with zero shared words (e.g. "what UI library?" → "prefers Svelte").

**Embedding**:
A dense vector representation of a Memory's (or Concept's) meaning, used for Semantic recall.

**Concept**:
A distinct entity or idea that recurs across Memories (e.g. "Svelte", "Viktor", "TripIt",
"frontend framework"). A node in the Concept graph. Distinct from a Memory: one Memory can
mention several Concepts, and one Concept spans many Memories.

**Concept graph**:
The network of Concepts joined by typed **Relationships**, making the memory store
traversable — from one Memory or Concept to related ones.

**Relationship**:
A typed, directed edge in the Concept graph, between two Concepts or between a Memory and a
Concept (e.g. `prefers`, `is-a`, `used-in`, `mentions`).

**Hybrid retrieval**:
The target read path — combining Lexical recall, Semantic recall, and Concept-graph
traversal into one ranked result set.
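To make the glossary terms concrete, here is a minimal in-memory sketch of a Concept graph with typed, directed Relationships and a bounded traversal. The class names, the `memory:` / `concept:` id prefixes, and the choice to follow edges in both directions during traversal are illustrative assumptions, not the project's actual schema.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """A typed, directed edge: source -> target (names are hypothetical)."""
    source: str  # id of a Memory or a Concept
    target: str  # id of a Concept
    kind: str    # e.g. "prefers", "is-a", "used-in", "mentions"


@dataclass
class ConceptGraph:
    """Adjacency list keyed by node id, indexed from both ends of each edge."""
    edges: dict[str, list[Relationship]] = field(default_factory=dict)

    def add(self, rel: Relationship) -> None:
        self.edges.setdefault(rel.source, []).append(rel)
        self.edges.setdefault(rel.target, []).append(rel)

    def related(self, start: str, max_hops: int = 2) -> dict[str, int]:
        """Breadth-first walk returning {node_id: hop distance} within max_hops."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue  # do not expand beyond the hop budget
            for rel in self.edges.get(node, []):
                # Assumption: traversal ignores edge direction.
                other = rel.target if rel.source == node else rel.source
                if other not in seen:
                    seen[other] = seen[node] + 1
                    queue.append(other)
        del seen[start]
        return seen


graph = ConceptGraph()
graph.add(Relationship("memory:1", "concept:svelte", "mentions"))
graph.add(Relationship("concept:svelte", "concept:frontend-framework", "is-a"))
graph.add(Relationship("memory:2", "concept:frontend-framework", "mentions"))
print(graph.related("memory:1", max_hops=3))
```

With these three edges, `memory:2` is reachable from `memory:1` in three hops through two Concepts, which is the kind of multi-hop link that token matching alone cannot follow.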
@@ -0,0 +1,21 @@
# Pursue hybrid retrieval: embeddings + concept graph over pure lexical

Today recall is **lexical only** (BM25 in SQLite, `tsvector`/`ts_rank` in Postgres over
content + LLM-generated `expanded_keywords`). It matches *tokens*, so it misses
paraphrase/synonym queries and cannot traverse between related Memories. We will pursue a
**hybrid** read path that adds dense-vector **Semantic recall** and a traversable **Concept
graph** (typed Relationships) alongside the existing Lexical recall.

This decision is **gated on a benchmark**: we adopt hybrid only if it shows a material
recall-quality uplift over the current lexical system on a stratified eval set (exact /
paraphrase / multi-hop). If the benchmark shows no improvement, a later ADR supersedes this
and we stay lexical.

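One way the benchmark gate could be expressed is sketched below: mean recall@k per stratum, then a comparison of the candidate against the lexical baseline. The ADR does not define "material uplift", so the `min_uplift` threshold, the no-regression rule, and all function names here are assumptions for illustration only.

```python
from collections import defaultdict


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant ids that appear among the top-k retrieved ids."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def stratified_recall(cases, run_query, k: int = 5) -> dict[str, float]:
    """Mean recall@k per stratum. Each case is (stratum, query, relevant_ids)."""
    per_stratum = defaultdict(list)
    for stratum, query, relevant in cases:
        per_stratum[stratum].append(recall_at_k(run_query(query), relevant, k))
    return {s: sum(v) / len(v) for s, v in per_stratum.items()}


def passes_gate(baseline: dict[str, float], candidate: dict[str, float],
                min_uplift: float = 0.05) -> bool:
    """Hypothetical gate: no stratum regresses and at least one gains min_uplift."""
    no_regression = all(candidate[s] >= baseline[s] for s in baseline)
    has_uplift = any(candidate[s] - baseline[s] >= min_uplift for s in baseline)
    return no_regression and has_uplift


# Toy eval set and stub retrievers standing in for the real systems.
cases = [
    ("exact", "svelte", {"m1"}),
    ("paraphrase", "what UI library?", {"m1"}),
]
lexical = lambda q: ["m1"] if q == "svelte" else ["m9"]
hybrid = lambda q: ["m1"]

baseline = stratified_recall(cases, lexical)
candidate = stratified_recall(cases, hybrid)
print(baseline, candidate, passes_gate(baseline, candidate))
```

Reporting per stratum rather than as one pooled number keeps a paraphrase or multi-hop gain from being hidden by the exact-match cases that lexical recall already handles well.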
## Considered options

- **Pure semantic (embeddings only)** — fixes paraphrase gaps but gives no real concept
  traversal; rejected as the *sole* mechanism.
- **Pure concept graph** — enables traversal but node-matching stays lexical, so paraphrase
  gaps remain; rejected as the *sole* mechanism.
- **Hybrid (chosen)** — embeddings for meaning + graph for traversal + existing FTS, fused
  into one ranked result. Highest ceiling; the GraphRAG / Zep-Graphiti / HippoRAG family.
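The ADR does not say how the three result lists are fused. As one common possibility, the sketch below uses reciprocal rank fusion (RRF), which needs only each list's rank order and so avoids having to reconcile BM25 scores, cosine similarities, and graph distances. The function name and memory ids are hypothetical; `k = 60` is the conventional RRF constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda m: (-scores[m], m))


lexical = ["m1", "m2"]    # from FTS
semantic = ["m3", "m1"]   # from embedding similarity
graph = ["m3", "m4"]      # from Concept-graph traversal
print(reciprocal_rank_fusion([lexical, semantic, graph]))
# → ['m3', 'm1', 'm2', 'm4']
```

Here `m3` wins because two retrievers rank it first, while `m1` is found by two retrievers but at mixed ranks.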
20
docs/adr/0002-api-postgres-first-sqlite-stays-lexical.md
Normal file
@@ -0,0 +1,20 @@
# API/Postgres deployment gets semantics; SQLite-only stays lexical

The semantic + concept-graph layer targets the **API/Postgres** deployment only: embeddings
in pgvector on the (CNPG) Postgres, the Concept graph as node/edge tables in Postgres, and
embedding/extraction via reused cluster infra (llama-cpp on GPU, or a hosted API). The
**SQLite-only** mode keeps working but stays **lexical (FTS) only** — it gains no embeddings
or graph, degrading gracefully.

This is surprising because the README markets zero-config offline SQLite as the headline
feature. We accept that trade-off: the operator actually runs the remote API/Postgres store,
reuse-before-building favours cluster infra, and bundling a local embedding model into the
zero-config path would add heavy dependencies and double the build/test matrix for little
real-world benefit.

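The graceful degradation described above could be sketched as a per-backend choice of recall strategies. The enum, the function, and the strategy labels are hypothetical names chosen for illustration; the real code may be organised quite differently.

```python
from enum import Enum


class Backend(Enum):
    SQLITE = "sqlite"
    POSTGRES = "postgres"


def recall_strategies(backend: Backend) -> list[str]:
    """Return the retrieval strategies available on a given backend."""
    strategies = ["lexical"]  # FTS works on both backends
    if backend is Backend.POSTGRES:
        # Only the API/Postgres deployment has pgvector and the node/edge tables.
        strategies += ["semantic", "graph"]
    return strategies


print(recall_strategies(Backend.SQLITE))    # → ['lexical']
print(recall_strategies(Backend.POSTGRES))  # → ['lexical', 'semantic', 'graph']
```

Because lexical recall is always present, SQLite-only users keep the behaviour they have today and the hybrid path is purely additive.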
## Consequences

- All benchmark numbers are produced in API/Postgres mode.
- Offline zero-config users see no behaviour change.
- A future ADR may revisit offline semantics (e.g. via `sqlite-vec` + a small local model)
  if there is demand.
@@ -0,0 +1,19 @@
# External embedding/extraction APIs allowed for non-sensitive memories

Embedding and concept extraction may use **hosted APIs** (e.g. OpenAI `text-embedding-3`,
Voyage, Cohere) for **non-sensitive** memories, to access a higher quality ceiling than
self-hosted models alone. **Sensitive / Vault-encrypted (secret) memories are never sent
externally** and are excluded from the corpus that gets embedded or extracted.

This is a deliberate relaxation of the homelab's usual local-only posture, made because the
quality gain is worth it for non-secret personal memory content. The research/benchmark may
still compare hosted vs self-hostable models (nomic-embed, bge-m3, gte-Qwen2, e5) so the
production choice is data-driven; this ADR only records that egress is *permitted* within the
sensitive-data boundary.

## Consequences

- The corpus-export step MUST filter out `is_sensitive` / secret memories before any external
  call.
- Production deployment needs an embedding API key (or falls back to the in-cluster
  llama-cpp model when absent).
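The two consequences above can be sketched as a filter applied before export plus a provider choice driven by whether a key is configured. Only `is_sensitive` comes from this ADR; the `is_secret` flag, the `EMBEDDING_API_KEY` variable name, and the function names are assumptions made for the example.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRow:
    id: str
    content: str
    is_sensitive: bool = False
    is_secret: bool = False  # hypothetical flag for Vault-encrypted memories


def exportable(memories: list[MemoryRow]) -> list[MemoryRow]:
    """Drop sensitive and secret memories before anything leaves the cluster."""
    return [m for m in memories if not (m.is_sensitive or m.is_secret)]


def choose_embedder(env=None) -> str:
    """Use the hosted API when a key is set, else the in-cluster llama-cpp model."""
    env = os.environ if env is None else env
    return "hosted" if env.get("EMBEDDING_API_KEY") else "llama-cpp"


corpus = [
    MemoryRow("m1", "prefers Svelte"),
    MemoryRow("m2", "bank PIN hint", is_sensitive=True),
    MemoryRow("m3", "vault token", is_secret=True),
]
print([m.id for m in exportable(corpus)])  # → ['m1']
print(choose_embedder({}))                 # → llama-cpp
```

Applying the filter at corpus export, rather than inside each provider client, gives a single place to audit that nothing sensitive reaches an external call.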