Under heavy concurrent memory_store load (many subagents/sessions writing
close together), connections in the local SQLite layer contended for SQLite's
single write lock and surfaced sqlite3.OperationalError: database is locked,
which made memory tools slow and eventually dropped whole sessions. Two root
causes:
- The MCP server (mcp_server.py) and the background SyncEngine (sync.py) each
opened a SEPARATE connection to the same SQLite file. WAL allows concurrent
readers but only one writer at a time; when the sync writer held the write
lock across a resync, a concurrent store waited out busy_timeout and raised
"database is locked".
- mcp_server's connection was opened WITHOUT check_same_thread=False, so as
soon as a request was handled on a thread other than the one that created the
connection, every local store/recall raised sqlite3.ProgrammingError
"SQLite objects created in a thread...".
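Both failure modes above can be reproduced outside the server. The following
is a minimal sketch, not code from this repo: the table name, the 0.1 s
timeout, and the connection variable names are illustrative assumptions, and
BEGIN IMMEDIATE stands in for a long resync holding the write lock.

```python
import os
import sqlite3
import tempfile
import threading

path = os.path.join(tempfile.mkdtemp(), "memory.db")

# Root cause 1: two connections to one file, one of them holding the write lock.
sync_conn = sqlite3.connect(path, isolation_level=None)  # autocommit; BEGIN is explicit
sync_conn.execute("PRAGMA journal_mode=WAL")
sync_conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")
sync_conn.execute("BEGIN IMMEDIATE")  # take the single write lock and keep it
sync_conn.execute("INSERT INTO memories (body) VALUES ('resync')")

# Second connection with a short busy timeout (illustrative value).
server_conn = sqlite3.connect(path, timeout=0.1, isolation_level=None)
locked_error = None
try:
    server_conn.execute("INSERT INTO memories (body) VALUES ('store')")
except sqlite3.OperationalError as exc:
    locked_error = exc  # "database is locked"
sync_conn.execute("COMMIT")

# Root cause 2: server_conn was opened with the default check_same_thread=True,
# so using it from any other thread is rejected by the sqlite3 module.
thread_error = None

def recall_from_worker():
    global thread_error
    try:
        server_conn.execute("SELECT COUNT(*) FROM memories")
    except sqlite3.ProgrammingError as exc:
        thread_error = exc

worker = threading.Thread(target=recall_from_worker)
worker.start()
worker.join()

server_conn.close()
sync_conn.close()
```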
Fix: a single process-wide serialized writer.
- New LocalStore (local_store.py) owns ONE connection (check_same_thread=False)
guarded by ONE re-entrant lock, keeps WAL, and wraps writes in
transaction()/write() with bounded exponential-backoff retry on the rare
residual lock (e.g. another OS process) instead of failing the call.
- MemoryServer builds the LocalStore and SHARES it with the SyncEngine, so the
sync writer no longer opens a second connection, which eliminates the
two-connection race. All server reads/writes go through the shared lock;
stores stay fast because they only enqueue locally and sync asynchronously.
- Bound the one genuinely slow path (remote semantic memory_recall) with an
explicit RECALL_TIMEOUT and, on timeout or an unreachable backend, return a
clear "working / retry" signal instead of hanging silently or crashing. When
a client supplies _meta.progressToken, emit one notifications/progress
message so the client can see the call is still alive.
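The single-writer design in the first two bullets might look roughly like the
sketch below. It is an illustration under stated assumptions, not the
contents of local_store.py: the method bodies, the retry parameters
(max_retries, base_delay, max_delay), the read() helper, and the use of
BEGIN IMMEDIATE are all hypothetical; only the class name, transaction(),
write(), the shared connection, the re-entrant lock, and WAL come from the
description above.

```python
import os
import random
import sqlite3
import tempfile
import threading
import time
from contextlib import contextmanager


class LocalStore:
    """Sketch: one shared connection, one re-entrant lock, retry on lock."""

    def __init__(self, path, max_retries=5, base_delay=0.05, max_delay=1.0):
        self._lock = threading.RLock()
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._max_delay = max_delay
        # check_same_thread=False is only safe because every use of the
        # connection happens while self._lock is held.
        self._conn = sqlite3.connect(
            path, check_same_thread=False, isolation_level=None
        )
        self._conn.execute("PRAGMA journal_mode=WAL")

    @contextmanager
    def transaction(self):
        """Hold the lock for a whole BEGIN IMMEDIATE ... COMMIT/ROLLBACK."""
        with self._lock:
            if self._conn.in_transaction:
                # Re-entrant call on the same thread: join the outer transaction.
                yield self._conn
                return
            self._conn.execute("BEGIN IMMEDIATE")
            try:
                yield self._conn
            except BaseException:
                self._conn.execute("ROLLBACK")
                raise
            else:
                self._conn.execute("COMMIT")

    def write(self, sql, params=()):
        """Run one write; back off and retry if the file is locked."""
        for attempt in range(self._max_retries + 1):
            try:
                with self.transaction() as conn:
                    conn.execute(sql, params)
                return
            except sqlite3.OperationalError as exc:
                if "locked" not in str(exc) or attempt == self._max_retries:
                    raise
                delay = min(self._max_delay, self._base_delay * 2 ** attempt)
                time.sleep(delay + random.uniform(0, delay / 2))  # add jitter

    def read(self, sql, params=()):
        with self._lock:
            return self._conn.execute(sql, params).fetchall()


# Usage: many threads share ONE store, and therefore one connection.
path = os.path.join(tempfile.mkdtemp(), "memory.db")
store = LocalStore(path)
store.write("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")

def worker(n):
    for i in range(25):
        store.write("INSERT INTO memories (body) VALUES (?)", (f"w{n}-{i}",))

threads = [threading.Thread(target=worker, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with store.transaction():  # a nested write() joins this transaction
    store.write("INSERT INTO memories (body) VALUES (?)", ("nested",))

row_count = store.read("SELECT COUNT(*) FROM memories")[0][0]
```

In this sketch the backoff sleep happens after the lock is released, so a
retrying writer does not stall other threads in the same process.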
Ships to users via the plugin (mcp/memory-mcp.json runs src/.../mcp_server.py);
no server-side/API change needed. TDD: added concurrency tests (many threads +
sync writer on one file) and recall progress/bounding tests; full gate green
(ruff + mypy strict + 185 pytest).
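The recall bounding that the new tests cover could be sketched as follows.
This is an assumption-laden illustration rather than the shipped code:
bounded_recall, the send callback, the result dictionary shape, and the
0.2 s value are hypothetical; RECALL_TIMEOUT, the "working / retry" signal,
and the single notifications/progress message come from the description
above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

RECALL_TIMEOUT = 0.2  # seconds; illustrative value, not the real setting
_executor = ThreadPoolExecutor(max_workers=4)


def bounded_recall(remote_recall, query, progress_token=None, send=None):
    """Call remote_recall(query) but never wait longer than RECALL_TIMEOUT."""
    if progress_token is not None and send is not None:
        # One progress notification so the client can see the call is alive.
        send({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {"progressToken": progress_token, "progress": 0},
        })
    future = _executor.submit(remote_recall, query)
    try:
        results = future.result(timeout=RECALL_TIMEOUT)
        return {"status": "ok", "results": results}
    except FutureTimeout:
        # Backend is slow: tell the caller to retry instead of hanging.
        return {"status": "working", "retry": True, "results": []}
    except ConnectionError:
        # Backend is unreachable: same clear signal, no crash.
        return {"status": "unreachable", "retry": True, "results": []}


def slow_backend(query):
    time.sleep(0.6)  # longer than RECALL_TIMEOUT
    return ["late"]

def fast_backend(query):
    return ["hit for " + query]

def down_backend(query):
    raise ConnectionError("backend unreachable")

sent = []
slow = bounded_recall(slow_backend, "q", progress_token="tok-1", send=sent.append)
fast = bounded_recall(fast_backend, "q")
down = bounded_recall(down_backend, "q")
```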
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>