fix(mcp): serialize local SQLite writes to end "database is locked" under concurrent stores

Under heavy concurrent memory_store traffic (many subagents/sessions writing
close together), the local SQLite layer raced for SQLite's single writer lock
and surfaced sqlite3.OperationalError: database is locked, which made memory
tools slow and eventually dropped whole sessions. Two root causes:

  - The MCP server (mcp_server.py) and the background SyncEngine (sync.py) each
    opened a SEPARATE connection to the same SQLite file. WAL allows only one
    writer at a time; when the sync writer held the lock across a resync, a
    concurrent store waited past busy_timeout and raised "database is locked".
  - mcp_server's connection was opened WITHOUT check_same_thread=False, so the
    moment two requests were handled on different threads every local store/recall
    raised ProgrammingError "SQLite objects created in a thread...".
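
Both failure modes can be reproduced with the standard library alone. This is an illustrative sketch, not one of the commit's tests: the function names, the `memories` table and the deliberately tiny 0.1 s busy timeout are assumptions chosen so the lock error shows up quickly.

```python
import os
import sqlite3
import tempfile
import threading


def reproduce_database_is_locked(path: str) -> str:
    """Root cause 1: two connections to one WAL file race for its single writer lock."""
    # isolation_level=None = autocommit, so BEGIN/ROLLBACK below are explicit.
    # timeout=0.1 is a deliberately tiny busy timeout (hypothetical value).
    sync_conn = sqlite3.connect(path, timeout=0.1, isolation_level=None)
    sync_conn.execute("PRAGMA journal_mode=WAL")
    sync_conn.execute("CREATE TABLE IF NOT EXISTS memories (body TEXT)")
    store_conn = sqlite3.connect(path, timeout=0.1, isolation_level=None)
    # The "sync" side takes the write lock and holds it, as during a resync.
    sync_conn.execute("BEGIN IMMEDIATE")
    sync_conn.execute("INSERT INTO memories VALUES ('resync')")
    try:
        # The "store" side waits out its busy timeout, then gives up.
        store_conn.execute("INSERT INTO memories VALUES ('store')")
        return "no error"
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        sync_conn.execute("ROLLBACK")
        store_conn.close()
        sync_conn.close()


def reproduce_wrong_thread(path: str) -> str:
    """Root cause 2: a connection opened with the default check_same_thread=True."""
    conn = sqlite3.connect(path)
    outcome: list[str] = []

    def handle_request_on_other_thread() -> None:
        try:
            conn.execute("SELECT 1")
            outcome.append("no error")
        except sqlite3.ProgrammingError as exc:
            outcome.append(type(exc).__name__)

    worker = threading.Thread(target=handle_request_on_other_thread)
    worker.start()
    worker.join()
    conn.close()
    return outcome[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "memory.db")
        print(reproduce_database_is_locked(db))  # database is locked
        print(reproduce_wrong_thread(db))  # ProgrammingError
```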

Fix: a single process-wide serialized writer.

  - New LocalStore (local_store.py) owns ONE connection (check_same_thread=False)
    guarded by ONE re-entrant lock, keeps WAL, and wraps writes in
    transaction()/write() with bounded exponential-backoff retry on the rare
    residual lock (e.g. another OS process) instead of failing the call.
  - MemoryServer builds the LocalStore and SHARES it with the SyncEngine, so the
    sync writer no longer opens a second connection and the two-connection race
    is eliminated. All server reads/writes go through the shared lock; stores
    stay fast (enqueue locally, then sync asynchronously).
  - Bound the one genuinely slow path (remote semantic memory_recall) with an
    explicit RECALL_TIMEOUT and, on timeout/unreachable backend, return a clear
    "working / retry" signal instead of hanging silently or crashing. When a
    client supplies _meta.progressToken, emit one notifications/progress so the
    call shows life.
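
A minimal sketch of the serialized-writer pattern from the first bullet, assuming a LocalStore shaped like the one the sync.py diff uses (`LocalStore.open(path)`, `.conn`, `.lock`, `.close()`). The `transaction()`/`write()` signatures, the retry count and the backoff delays are hypothetical; the commit only says writes are wrapped with bounded exponential-backoff retry.

```python
import os
import random
import sqlite3
import tempfile
import threading
import time
from collections.abc import Callable, Iterator
from contextlib import contextmanager
from pathlib import Path
from typing import TypeVar

T = TypeVar("T")


class LocalStore:
    """Sketch: ONE SQLite connection per process, guarded by ONE re-entrant lock."""

    def __init__(self, conn: sqlite3.Connection, retries: int = 5, base_delay: float = 0.05) -> None:
        self.conn = conn
        self.lock = threading.RLock()  # re-entrant so a holder can call helpers that lock again
        self._retries = retries  # hypothetical retry budget
        self._base_delay = base_delay  # hypothetical first backoff, in seconds

    @classmethod
    def open(cls, db_path: str) -> "LocalStore":
        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
        # check_same_thread=False is safe only because every use goes through self.lock.
        conn = sqlite3.connect(db_path, timeout=30.0, check_same_thread=False, isolation_level=None)
        conn.row_factory = sqlite3.Row
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA busy_timeout=30000")
        return cls(conn)

    @contextmanager
    def transaction(self) -> Iterator[sqlite3.Connection]:
        """Hold the lock for one explicit write transaction; roll back on any error."""
        with self.lock:
            self.conn.execute("BEGIN IMMEDIATE")
            try:
                yield self.conn
                self.conn.execute("COMMIT")
            except BaseException:
                if self.conn.in_transaction:
                    self.conn.execute("ROLLBACK")
                raise

    def write(self, fn: Callable[[sqlite3.Connection], T]) -> T:
        """Run fn in a transaction, retrying with exponential backoff on a residual lock."""
        for attempt in range(self._retries + 1):
            try:
                with self.transaction() as conn:
                    return fn(conn)
            except sqlite3.OperationalError as exc:
                if "locked" not in str(exc) or attempt == self._retries:
                    raise
            # Jittered 2**attempt backoff, slept OUTSIDE the lock so other threads progress.
            time.sleep(self._base_delay * (2**attempt) * (1 + random.random()))
        raise AssertionError("unreachable")

    def close(self) -> None:
        with self.lock:
            self.conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = LocalStore.open(os.path.join(tmp, "memory.db"))
        store.write(lambda c: c.execute("CREATE TABLE memories (body TEXT)"))

        def subagent(n: int) -> None:
            for i in range(50):
                store.write(lambda c: c.execute("INSERT INTO memories VALUES (?)", (f"{n}-{i}",)))

        threads = [threading.Thread(target=subagent, args=(n,)) for n in range(8)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with store.lock:
            print(store.conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0])  # 400
        store.close()
```

Sleeping outside the lock matters: a thread that backed off while still holding the RLock would stall every other writer in the process.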
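
The bounded recall from the last bullet can be sketched with a thread pool and `Future.result(timeout=...)`. Beyond the RECALL_TIMEOUT name and the `notifications/progress` method, everything here is an assumption: the timeout value, the function and parameter names, and the `status`/`message` result shape are hypothetical, and the progress payload carries only the `progressToken` and `progress` fields.

```python
import concurrent.futures
import time
import urllib.error
from collections.abc import Callable
from typing import Any, Optional

RECALL_TIMEOUT = 10.0  # hypothetical value; the commit names the constant, not its setting

_recall_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4, thread_name_prefix="recall")


def bounded_recall(
    query: str,
    remote_recall: Callable[[str], list[dict[str, Any]]],
    send_notification: Callable[[dict[str, Any]], None],
    meta: Optional[dict[str, Any]] = None,
    timeout: float = RECALL_TIMEOUT,
) -> dict[str, Any]:
    """Run a remote semantic recall with a hard deadline instead of hanging."""
    token = (meta or {}).get("progressToken")
    if token is not None:
        # One MCP progress notification so the client sees the call is alive.
        send_notification({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {"progressToken": token, "progress": 0},
        })
    future = _recall_pool.submit(remote_recall, query)
    try:
        results = future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The worker keeps running in the background; we only stop waiting for it.
        return {"status": "working", "message": "Recall is still running; retry shortly.", "results": []}
    except urllib.error.URLError as exc:
        return {"status": "unavailable", "message": f"Memory backend unreachable ({exc.reason}); retry later.", "results": []}
    return {"status": "ok", "results": results}


if __name__ == "__main__":
    sent: list[dict[str, Any]] = []

    def slow_backend(q: str) -> list[dict[str, Any]]:
        time.sleep(0.5)
        return [{"text": q}]

    reply = bounded_recall("deploy notes", slow_backend, sent.append, {"progressToken": "t1"}, timeout=0.05)
    print(reply["status"])  # working
    print(len(sent))  # 1
```

A running future cannot be cancelled, so a timed-out remote call finishes in the background; the pool size caps how many such stragglers can pile up.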

Ships to users via the plugin (mcp/memory-mcp.json runs src/.../mcp_server.py);
no server-side/API change needed. TDD: added concurrency tests (many threads +
sync writer on one file) and recall progress/bounding tests; full gate green
(ruff + mypy strict + 185 pytest tests).
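
A concurrency test in the spirit of the ones described above can pin down the key property. This is not the commit's test; the test name, table and thread counts are hypothetical, and a bare connection plus RLock stands in for LocalStore. With one connection and one lock, a store issued while the sync writer holds a long transaction waits on the Python lock rather than on SQLite's busy handler. Setting `timeout=0` makes any SQLite-level contention fail immediately, so a pass shows the serialization really happens in the lock.

```python
import sqlite3
import threading
import time
from pathlib import Path


def test_stores_wait_for_sync_writer_instead_of_failing(tmp_path: Path) -> None:
    # Stand-in for LocalStore: one shared connection, one re-entrant lock.
    conn = sqlite3.connect(tmp_path / "memory.db", timeout=0, check_same_thread=False, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE memories (body TEXT)")
    lock = threading.RLock()
    sync_started = threading.Event()
    errors: list[BaseException] = []

    def sync_writer() -> None:
        # Hold the lock for the WHOLE transaction: on a shared connection, another
        # thread's BEGIN mid-transaction would fail, so the lock must span it.
        with lock:
            conn.execute("BEGIN IMMEDIATE")
            sync_started.set()
            for i in range(20):
                conn.execute("INSERT INTO memories VALUES (?)", (f"sync-{i}",))
                time.sleep(0.005)  # simulate a slow resync
            conn.execute("COMMIT")

    def store(n: int) -> None:
        sync_started.wait()  # only start once the sync writer is mid-transaction
        try:
            with lock:
                conn.execute("BEGIN IMMEDIATE")
                conn.execute("INSERT INTO memories VALUES (?)", (f"store-{n}",))
                conn.execute("COMMIT")
        except sqlite3.Error as exc:
            errors.append(exc)

    threads = [threading.Thread(target=sync_writer)]
    threads += [threading.Thread(target=store, args=(n,)) for n in range(16)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    with lock:
        count = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
    conn.close()
    assert errors == []
    assert count == 20 + 16
```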

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-19 06:06:09 +00:00
parent 68088e684e
commit 5151bbe0d5
5 changed files with 590 additions and 116 deletions

@@ -5,14 +5,14 @@ Uses only stdlib — no pip install required.
import json
import logging
import sqlite3
import threading
import urllib.error
import urllib.parse
import urllib.request
from typing import Any
from datetime import datetime, timezone
from pathlib import Path
from claude_memory.local_store import LocalStore
logger = logging.getLogger(__name__)
@@ -26,7 +26,14 @@ FULL_RESYNC_EVERY = 10
 class SyncEngine:
     """Background sync between local SQLite cache and remote API."""
-    def __init__(self, db_path: str, api_base_url: str, api_key: str, sync_interval: int = 60):
+    def __init__(
+        self,
+        db_path: str,
+        api_base_url: str,
+        api_key: str,
+        sync_interval: int = 60,
+        store: "LocalStore | None" = None,
+    ):
         self.db_path = db_path
         self.api_base_url = api_base_url.rstrip("/")
         self.api_key = api_key
@@ -37,13 +44,20 @@ class SyncEngine:
         self._last_sync_success = False
         self._auth_failed = False
-        # Own connection for thread safety
-        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
-        self._conn = sqlite3.connect(db_path, timeout=30.0, check_same_thread=False)
-        self._conn.row_factory = sqlite3.Row
-        self._conn.execute("PRAGMA journal_mode=WAL")
-        self._conn.execute("PRAGMA busy_timeout=30000")
-        self._lock = threading.Lock()
+        # Share the caller's LocalStore (one connection, one lock) when given, so the
+        # background sync writer never opens a SECOND connection to the same file and
+        # never races the store path on the single SQLite writer. When run standalone
+        # (e.g. tests), open our own store. Either way, all SQLite access below goes
+        # through self._store / self._conn / self._lock, which now point at one shared
+        # connection guarded by one re-entrant lock.
+        if store is None:
+            self._store = LocalStore.open(db_path)
+            self._owns_store = True
+        else:
+            self._store = store
+            self._owns_store = False
+        self._conn = self._store.conn
+        self._lock = self._store.lock
         self._init_sync_tables()
@@ -121,7 +135,10 @@ class SyncEngine:
         self._stop_event.set()
         if self._thread and self._thread.is_alive():
             self._thread.join(timeout=5)
-        self._conn.close()
+        # Only close the connection if we opened it; when sharing the server's
+        # LocalStore, the server owns the connection lifecycle.
+        if self._owns_store:
+            self._store.close()
     def _sync_loop(self) -> None:
         """Periodic sync loop running in background thread."""