Viktor Barzin 7439540f8f docs: glossary + ADRs for semantic/concept-graph memory

Captures the design language (CONTEXT.md) and the framing decisions from the
requirements interview: pursue hybrid embeddings+concept-graph retrieval gated
on a benchmark (0001), target the API/Postgres deployment while SQLite stays
lexical (0002), and permit hosted embedding APIs for non-sensitive memories
only (0003). Groundwork for the research/prototype/benchmark effort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-25 15:36:30 +00:00


Pursue hybrid retrieval: embeddings + concept graph over pure lexical

Today recall is lexical only (BM25 in SQLite, tsvector/ts_rank in Postgres over content + LLM-generated expanded_keywords). It matches tokens, so it misses paraphrase/synonym queries and cannot traverse between related Memories. We will pursue a hybrid read path that adds dense-vector Semantic recall and a traversable Concept graph (typed Relationships) alongside the existing Lexical recall.

This decision is gated on a benchmark: we adopt hybrid only if it shows a material recall-quality uplift over the current lexical system on a stratified eval set (exact / paraphrase / multi-hop). If the benchmark shows no improvement, a later ADR supersedes this and we stay lexical.
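As a rough illustration of how such a gate could be evaluated, the sketch below computes mean recall@k per stratum and compares the two systems. The choice of recall@k as the metric, the macro-average comparison, the `min_uplift` threshold, and every function name are assumptions made for illustration; the ADR does not define what counts as a "material" uplift.

```python
from collections import defaultdict


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant ids that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top = set(ranked_ids[:k])
    return len(top & set(relevant_ids)) / len(relevant_ids)


def recall_by_stratum(cases, retrieve, k=10):
    """Mean recall@k per stratum.

    cases: iterable of (stratum, query, relevant_ids) tuples.
    retrieve: callable mapping a query to a ranked list of memory ids
              (hypothetical interface, stands in for either read path).
    """
    scores = defaultdict(list)
    for stratum, query, relevant in cases:
        scores[stratum].append(recall_at_k(retrieve(query), relevant, k))
    return {s: sum(v) / len(v) for s, v in scores.items()}


def passes_gate(lexical, hybrid, min_uplift=0.05):
    """Hypothetical gate: macro-averaged recall must improve by min_uplift.

    The 0.05 default is a placeholder, not a value taken from the ADR.
    """
    strata = list(lexical)

    def macro(report):
        return sum(report[s] for s in strata) / len(strata)

    return macro(hybrid) - macro(lexical) >= min_uplift
```

Reporting per stratum before averaging keeps a gain on paraphrase queries from hiding a regression on exact-match queries, which is presumably why the eval set is stratified.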

Considered options

Pure semantic (embeddings only) — fixes paraphrase gaps but gives no real concept traversal; rejected as the sole mechanism.
Pure concept graph — enables traversal but node-matching stays lexical, so paraphrase gaps remain; rejected as the sole mechanism.
Hybrid (chosen) — embeddings for meaning + graph for traversal + existing FTS, fused into one ranked result. Highest ceiling; the GraphRAG / Zep-Graphiti / HippoRAG family.
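The ADR does not say how the three result lists are fused. One common option is reciprocal rank fusion (RRF), sketched below purely as an assumption about what "fused into one ranked result" could look like; the function name, the memory ids, and the conventional constant `k=60` are illustrative and not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists into one list using RRF.

    rankings: list of ranked lists of memory ids, best first, e.g. one list
              each from lexical, semantic, and graph retrieval (assumed inputs).
    k: damping constant; 60 is the commonly used default for RRF.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, mem_id in enumerate(ranked_ids, start=1):
            # Each list contributes 1 / (k + rank) for every id it contains.
            scores[mem_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


lexical = ["m1", "m2", "m3"]   # hypothetical FTS results
semantic = ["m2", "m4", "m1"]  # hypothetical embedding results
graph = ["m4", "m2"]           # hypothetical graph-traversal results
fused = reciprocal_rank_fusion([lexical, semantic, graph])
```

RRF uses only rank positions, so it needs no score normalization across BM25, ts_rank, and vector similarity. Whether it suits this system is a question for the benchmark.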
