docs: glossary + ADRs for semantic/concept-graph memory

Captures the design language (CONTEXT.md) and the framing decisions from the requirements interview: pursue hybrid embeddings+concept-graph retrieval gated on a benchmark (0001), target the API/Postgres deployment while SQLite stays lexical (0002), and permit hosted embedding APIs for non-sensitive memories only (0003). Groundwork for the research/prototype/benchmark effort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 15:36:30 +00:00 · 2026-06-25 15:36:30 +00:00 · 7439540f8f
commit 7439540f8f
parent 5151bbe0d5
4 changed files with 105 additions and 0 deletions
--- a/docs/adr/0001-pursue-hybrid-retrieval-embeddings-and-concept-graph.md
+++ b/docs/adr/0001-pursue-hybrid-retrieval-embeddings-and-concept-graph.md
@@ -0,0 +1,21 @@
+# Pursue hybrid retrieval: embeddings + concept graph over pure lexical
+
+Today recall is **lexical only** (BM25 in SQLite, `tsvector`/`ts_rank` in Postgres over
+content + LLM-generated `expanded_keywords`). It matches *tokens*, so it misses
+paraphrase/synonym queries and cannot traverse between related Memories. We will pursue a
+**hybrid** read path that adds dense-vector **Semantic recall** and a traversable **Concept
+graph** (typed Relationships) alongside the existing Lexical recall.
+
+This decision is **gated on a benchmark**: we adopt hybrid only if it shows a material
+recall-quality uplift over the current lexical system on an eval set stratified by query type
+(exact / paraphrase / multi-hop). If the benchmark shows no improvement, a later ADR will
+supersede this one and we will stay lexical.
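+
+A minimal sketch of how the gate could be scored, assuming recall@k as the metric and a
+hypothetical `MIN_UPLIFT` threshold (this ADR does not define "material"; 0.05 is a
+placeholder). The adoption rule itself (mean uplift across strata, with no stratum
+regressing) is also an assumption, not a recorded decision:
+
+```python
+from statistics import mean
+
+# Hypothetical threshold: "material" is not defined by this ADR.
+MIN_UPLIFT = 0.05
+
+
+def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
+    """Fraction of the relevant memory IDs that appear in the top-k results."""
+    if not relevant:
+        return 0.0
+    return len(set(retrieved[:k]) & relevant) / len(relevant)
+
+
+def stratum_scores(
+    runs: dict[str, list[tuple[list[str], set[str]]]], k: int = 10
+) -> dict[str, float]:
+    """Mean recall@k per stratum, e.g. "exact", "paraphrase", "multi-hop"."""
+    return {
+        stratum: mean(recall_at_k(retrieved, relevant, k) for retrieved, relevant in cases)
+        for stratum, cases in runs.items()
+    }
+
+
+def adopt_hybrid(lexical: dict[str, float], hybrid: dict[str, float]) -> bool:
+    """Adopt hybrid only if mean uplift clears MIN_UPLIFT and no stratum regresses."""
+    uplifts = [hybrid[stratum] - lexical[stratum] for stratum in lexical]
+    return mean(uplifts) >= MIN_UPLIFT and min(uplifts) >= 0
+```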
+
+## Considered options
+
+- **Pure semantic (embeddings only)** — fixes paraphrase gaps but gives no real concept
+  traversal; rejected as the *sole* mechanism.
+- **Pure concept graph** — enables traversal but node-matching stays lexical, so paraphrase
+  gaps remain; rejected as the *sole* mechanism.
+- **Hybrid (chosen)** — embeddings for meaning + graph for traversal + existing FTS, fused
+  into one ranked result. Highest quality ceiling; this is the approach taken by GraphRAG,
+  Zep's Graphiti and HippoRAG.
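+
+The ADR leaves the fusion method open. One common choice is Reciprocal Rank Fusion (RRF),
+sketched here under assumptions: each channel returns memory IDs best-first, `k=60` is the
+usual RRF smoothing constant, and the channel names are illustrative:
+
+```python
+from collections import defaultdict
+
+
+def rrf_fuse(rankings: dict[str, list[str]], k: int = 60, top_n: int = 10) -> list[str]:
+    """Fuse per-channel rankings (best first) with Reciprocal Rank Fusion.
+
+    A memory scores sum(1 / (k + rank)) over every channel that returned it (rank
+    starts at 1), so items ranked well by several channels rise to the top.
+    """
+    scores: dict[str, float] = defaultdict(float)
+    for ranked_ids in rankings.values():
+        for rank, memory_id in enumerate(ranked_ids, start=1):
+            scores[memory_id] += 1.0 / (k + rank)
+    # Highest fused score first; ties broken by ID so the output is deterministic.
+    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))[:top_n]
+
+
+fused = rrf_fuse({
+    "lexical": ["m1", "m2"],
+    "semantic": ["m2", "m3"],
+    "graph": ["m3", "m2"],
+})
+print(fused)  # ['m2', 'm3', 'm1']
+```
+
+RRF uses only ranks, so it sidesteps the fact that BM25/`ts_rank` scores and cosine
+similarities live on incomparable scales.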
--- a/docs/adr/0002-api-postgres-first-sqlite-stays-lexical.md
+++ b/docs/adr/0002-api-postgres-first-sqlite-stays-lexical.md
@@ -0,0 +1,20 @@
+# API/Postgres deployment gets semantics; SQLite-only stays lexical
+
+The semantic + concept-graph layer targets the **API/Postgres** deployment only: embeddings
+live in pgvector on the CloudNativePG (CNPG) Postgres cluster, the Concept graph lives in
+node/edge tables in Postgres, and embedding/extraction runs on reused cluster infra
+(llama-cpp on GPU) or a hosted API. The **SQLite-only** mode keeps working but stays
+**lexical (FTS) only**: it gains no embeddings or graph, so it degrades gracefully to the
+current behaviour.
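+
+A minimal sketch of that graceful degradation, assuming a hypothetical `Backend` flag and
+channel names (neither comes from the codebase): SQLite only ever queries the lexical
+channel, while Postgres adds semantic and graph recall when their tables are present.
+
+```python
+from enum import Enum
+
+
+class Backend(Enum):
+    SQLITE = "sqlite"
+    POSTGRES = "postgres"
+
+
+def enabled_channels(
+    backend: Backend, has_vectors: bool = True, has_graph: bool = True
+) -> list[str]:
+    """Recall channels to query for a storage backend (hypothetical names)."""
+    channels = ["lexical"]  # FTS is available on every backend
+    if backend is Backend.POSTGRES:
+        if has_vectors:
+            channels.append("semantic")  # pgvector similarity search
+        if has_graph:
+            channels.append("graph")  # traversal over node/edge tables
+    return channels
+```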
+
+This is surprising because the README markets zero-config offline SQLite as the headline
+feature. We accept that trade-off: in practice the operator runs the remote API/Postgres
+store, the reuse-before-building principle favours existing cluster infra, and bundling a
+local embedding model into the zero-config path would add heavy dependencies and double the
+build/test matrix for little real-world benefit.
+
+## Consequences
+
+- All benchmark numbers are produced in API/Postgres mode.
+- Offline zero-config users see no behaviour change.
+- A future ADR may revisit offline semantics (e.g. via `sqlite-vec` + a small local model)
+  if there is demand.
--- a/docs/adr/0003-external-embedding-apis-allowed-for-non-sensitive-memories.md
+++ b/docs/adr/0003-external-embedding-apis-allowed-for-non-sensitive-memories.md
@@ -0,0 +1,19 @@
+# External embedding/extraction APIs allowed for non-sensitive memories
+
+Embedding and concept extraction may use **hosted APIs** (e.g. OpenAI `text-embedding-3`,
+Voyage, Cohere) for **non-sensitive** memories, to reach a higher quality ceiling than
+self-hosted models alone. **Sensitive / Vault-encrypted (secret) memories are never sent
+externally** and are excluded from the corpus that gets embedded or extracted.
+
+This is a deliberate relaxation of the homelab's usual local-only posture, made because the
+quality gain is worth it for non-secret personal memory content. The research/benchmark may
+still compare hosted vs self-hostable models (nomic-embed, bge-m3, gte-Qwen2, e5) so the
+production choice is data-driven; this ADR only records that egress is *permitted* within the
+sensitive-data boundary.
+
+## Consequences
+
+- The corpus-export step MUST filter out `is_sensitive` / secret memories before any external
+  call.
+- Production deployment needs an embedding API key; when none is configured, it falls back
+  to the in-cluster llama-cpp model.
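+
+A minimal sketch of both consequences, assuming a hypothetical `Memory` record (with an
+`is_secret` flag standing in for Vault-encrypted memories) and a hypothetical
+`EMBEDDING_API_KEY` environment variable; neither name comes from the codebase:
+
+```python
+import os
+from collections.abc import Mapping
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class Memory:
+    id: str
+    content: str
+    is_sensitive: bool = False
+    is_secret: bool = False  # hypothetical flag for Vault-encrypted memories
+
+
+def exportable(memories: list[Memory]) -> list[Memory]:
+    """Drop sensitive and secret memories before any external embedding/extraction call."""
+    return [m for m in memories if not (m.is_sensitive or m.is_secret)]
+
+
+def choose_embedder(env: Optional[Mapping[str, str]] = None) -> str:
+    """Use the hosted API when a key is configured, else the in-cluster llama-cpp model."""
+    env = os.environ if env is None else env
+    return "hosted" if env.get("EMBEDDING_API_KEY") else "llama-cpp"
+```
+
+Filtering happens on the export side, so a misconfigured provider can never see a secret
+memory, whichever embedder is selected.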