claude-memory-mcp/docs/adr/0006-pgvector-hnsw-halfvec-1024d-embeddings.md
Viktor Barzin 1cc8a2b378
research: benchmark hybrid (lexical+dense+graph) recall vs current FTS
Viktor asked to enhance the memory system with 'semantics' — remember concepts
(not just tokens) linked in a graph — and to prove, by benchmarking against the
current system, that it actually improves recall. A multi-phase research workflow
(18 agents) did landscape research, an adversarially-reviewed integration design,
a stratified eval set over the real 5,452-memory corpus, and a head-to-head
prototype-vs-current benchmark.

Result: hybrid (lexical FTS + dense embeddings, RRF-fused) beats FTS on every
overall metric, driven by a robust paraphrase win (recall@10 +0.350). Recommend
adopting lexical+dense; the concept graph is DEFERRED.

Post-run adversarial review correction (applied to all docs before commit): the
prototype's fusion config structurally barred the graph leg from the ranked top-k,
so the 'graph contributes nothing' ablation was a math artifact, NOT an empirical
result — the graph is UNEVALUATED, not disproven (deferred on cost+uncertainty).
Multi-hop deltas are not statistically significant. Glossary in CONTEXT.md; framing
in ADR-0001-0003; findings in ADR-0004-0006 + docs/research/.

Privacy: the corpus/queries/qrels/results are the user's real memories and stay
gitignored (data/, cache/, results/, build_eval_set.py); only harness code,
aggregate numbers, and synthetic examples are committed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-25 17:51:53 +00:00


# Production vector storage: pgvector HNSW + halfvec(1024); 1024-d embeddings (Voyage-3.5 / bge-large)
Phase 1 of the hybrid ([ADR-0004](0004-phase-the-hybrid-lexical-dense-first-graph-gated.md)) needs a
production home for the dense embeddings. Per [ADR-0002](0002-api-postgres-first-sqlite-stays-lexical.md)
that is **pgvector on the shared CNPG Postgres**, where claude-memory is already a database tenant — no new
datastore.
> ⚠️ **Correction (verified against live infra by a design challenger):** an earlier draft justified this
> with "Immich already runs pgvector on the same cluster." That is **false** — Immich runs its **own**
> Postgres, not the shared CNPG — so it is NOT evidence the shared cluster has the extension. **pgvector
> must be explicitly enabled on CNPG** (extension install, and possibly a CNPG operand-image change) via
> Terraform **before this can land**; do not assume it is already available.

Decisions:
- **Index: HNSW** (`USING hnsw (embedding halfvec_cosine_ops) WITH (m=16, ef_construction=64)`,
query knob `hnsw.ef_search` set via `SET LOCAL` inside the recall txn under PgBouncer). Best
speed-recall tradeoff, buildable on an empty table. **IVFFlat rejected** — it must be built *after*
data exists (empty-table footgun) and has a lower recall ceiling.
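- **Type: `halfvec(1024)`** (fp16): halves vector and index size versus fp32 `vector(1024)` at ~no recall loss.
A 1024-d halfvec is ~2 KB/row (2 bytes × 1024 dims plus a few bytes of header), so even all ~5.5k memories come to only ~11 MB of raw vectors (less, since sensitive rows stay NULL) before HNSW graph overhead.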
- **Dimension fixed at 1024**, chosen **once** (changing it later forces a full re-embed + HNSW
rebuild). 1024 matches both the production model (Voyage-3.5) and the prototype model
(bge-large-en-v1.5), so the column and all fusion code are identical regardless of model.
- **Model: Voyage-3.5** (1024-d, hosted) for **non-sensitive** memories (highest measured retrieval
quality of the hosted options, free tier covers the corpus); **bge-large-en-v1.5** (1024-d, local,
MIT) for **sensitive memories and the no-API-key fallback** ([ADR-0003](0003-external-embedding-apis-allowed-for-non-sensitive-memories.md)).
`is_sensitive=1` rows are never embedded externally — `embedding=NULL`, lexical-only.
- **pgvectorscale / StreamingDiskANN deferred** — Rust/pgrx must be compiled into the CNPG operand
image, and it only earns its keep above ~15M vectors; our corpus is orders of magnitude below that.
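
The decisions above imply a particular shape for the recall-time dense query. The sketch below shows one possible version; it is an assumption, not the production code. It assumes a DB-API connection with psycopg-style `%s` placeholders, whose transaction the caller opens and commits, and the `memories.embedding` column from this ADR. `dense_top_k` and `to_halfvec_literal` are hypothetical helpers.

The sketch uses `set_config(name, value, true)`, the function form of `SET LOCAL`. Unlike `SET`, it can take a bind parameter. Both forms scope `hnsw.ef_search` to the current transaction, so a pooled server connection handed to another client never inherits it. `<=>` is pgvector's cosine-distance operator, which is the distance `halfvec_cosine_ops` indexes.

```python
# Hypothetical helpers illustrating the dense leg; not the production code.


def to_halfvec_literal(vec):
    """Render a float sequence in pgvector's text form, e.g. '[0.5,-0.25]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def dense_top_k(conn, query_vec, k=10, ef_search=40):
    """Return up to k (id, cosine_distance) rows, nearest first."""
    lit = to_halfvec_literal(query_vec)
    cur = conn.cursor()
    # Function form of SET LOCAL (is_local = true): the setting ends with the
    # current transaction, so it cannot leak through a pooled connection.
    cur.execute("SELECT set_config('hnsw.ef_search', %s, true)", (str(ef_search),))
    cur.execute(
        "SELECT id, embedding <=> %s::halfvec AS cosine_distance "
        "FROM memories "
        "WHERE embedding IS NOT NULL "  # sensitive rows are NULL: lexical-only
        "ORDER BY embedding <=> %s::halfvec "
        "LIMIT %s",
        (lit, lit, k),
    )
    return cur.fetchall()
```

Both statements must run in the same transaction, or the `ef_search` setting has no effect on the search. Raising `ef_search` above pgvector's default of 40 trades latency for recall.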
## Migration shape
A single **additive** Alembic migration: `ALTER TABLE memories ADD COLUMN embedding halfvec(1024)`
(NULL for sensitive) + `CREATE INDEX CONCURRENTLY … USING hnsw …`. The existing generated
`search_vector tsvector` + GIN index (`migrations/001`) are **untouched**, so lexical behaviour and
the SQLite-only degrade path are unchanged. pgvector enablement on CNPG and any extension/operand
change land as **Terraform/Terragrunt** in `infra/stacks/…` (GitOps, never kubectl) and trigger a
rolling restart of the shared cluster — coordinate accordingly.
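
The migration has to respect one constraint. Postgres rejects `CREATE INDEX CONCURRENTLY` inside a transaction block, but Alembic runs migrations inside a transaction by default. In the real revision, the index step would therefore go inside `op.get_context().autocommit_block()`. The sketch below shows the same two-phase ordering over a plain DB-API connection with a psycopg-style `autocommit` attribute, so it runs without Alembic. `apply_embedding_migration` and the index name `memories_embedding_hnsw` are hypothetical. The `m`/`ef_construction` values are the ones chosen above.

```python
# Hypothetical sketch of the additive migration's ordering.
ADD_COLUMN = "ALTER TABLE memories ADD COLUMN IF NOT EXISTS embedding halfvec(1024)"

CREATE_HNSW = (
    "CREATE INDEX CONCURRENTLY IF NOT EXISTS memories_embedding_hnsw "
    "ON memories USING hnsw (embedding halfvec_cosine_ops) "
    "WITH (m = 16, ef_construction = 64)"
)


def apply_embedding_migration(conn):
    # Phase 1: ordinary transactional DDL. A nullable column with no default
    # is a metadata-only change, so existing rows are not rewritten.
    conn.autocommit = False
    cur = conn.cursor()
    cur.execute(ADD_COLUMN)
    conn.commit()
    # Phase 2: CONCURRENTLY is refused inside a transaction block, so run this
    # one statement in autocommit mode, then restore the previous mode.
    conn.autocommit = True
    try:
        conn.cursor().execute(CREATE_HNSW)
    finally:
        conn.autocommit = False
```

If a concurrent build fails partway through, Postgres leaves an INVALID index behind, and `IF NOT EXISTS` would then silently skip it. Dropping the invalid index before retrying is the usual remedy.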
## Consequences
- The prototype's in-process numpy matrix maps directly to this column; only the substrate changes,
not the retrieval math.
- The prototype measured **bge-large** quality; a cheap follow-up should re-run the dense leg with
**Voyage-3.5** on the non-sensitive corpus to confirm the hosted ceiling holds on our content
before locking the production default.
- Production latency/ANN-approximation/filtered-top-k behaviour are unmeasured in the prototype and
must be validated post-migration (a stated benchmark limitation).
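
The ANN-approximation check called for above can reuse an exact ranking as ground truth. For each eval query, compare the ids the HNSW scan returns against the exact cosine top-k. This is a minimal sketch, assuming the exact list comes either from the prototype's brute-force ranking or from the same SQL run with `SET LOCAL enable_indexscan = off`. All function names are hypothetical.

The sketch divides by `k` rather than by the number of rows returned because of filtered queries. pgvector applies `WHERE` filters after the index scan, so a selective filter can return fewer than `k` rows, and those missing rows should count as misses.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def exact_top_k(query, corpus, k):
    """Brute-force ground truth over a {memory_id: vector} mapping."""
    ranked = sorted(corpus, key=lambda mid: cosine_distance(query, corpus[mid]))
    return ranked[:k]


def ann_recall_at_k(ann_ids, exact_ids, k):
    """Fraction of the exact top-k that the index scan returned.

    Divides by k, so a filtered query that comes back short is penalised.
    """
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k
```

Averaging `ann_recall_at_k` over the eval queries at a few `hnsw.ef_search` settings would give the recall side of the latency/recall curve that the prototype did not measure.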