fix(mcp): serialize local SQLite writes to end "database is locked" under concurrent stores

Under heavy concurrent memory_store load (many subagents/sessions writing close
together), the local SQLite layer contended for the single SQLite writer and
surfaced sqlite3.OperationalError: database is locked, which made memory tools
slow and eventually dropped whole sessions. Two root causes:
- The MCP server (mcp_server.py) and the background SyncEngine (sync.py) each
  opened a SEPARATE connection to the same SQLite file. WAL lets readers run
  alongside a writer but still allows only one writer at a time; when the sync
  writer held the write lock across a resync, a concurrent store waited past
  busy_timeout and raised "database is locked".
- mcp_server's connection was opened WITHOUT check_same_thread=False, so any
  local store/recall handled on a thread other than the one that opened the
  connection raised ProgrammingError "SQLite objects created in a thread...".
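
The second root cause can be reproduced in isolation. The following is a minimal sketch, not code from mcp_server.py: the table name and the helper function are hypothetical, and only the default check_same_thread behaviour of the stdlib sqlite3 module is being shown.

```python
import sqlite3
import threading

# Default connection: check_same_thread is True, so only the thread that
# created the connection may use it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, body TEXT)")

errors: list[BaseException] = []


def store_from_worker() -> None:
    """Hypothetical request handler running on a different thread."""
    try:
        conn.execute("INSERT INTO memories (body) VALUES ('hello')")
    except sqlite3.ProgrammingError as exc:
        # "SQLite objects created in a thread can only be used in that same thread..."
        errors.append(exc)


worker = threading.Thread(target=store_from_worker)
worker.start()
worker.join()
```

Passing check_same_thread=False removes this check, but the caller then has to serialize access itself, which is what the shared lock in the fix is for.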

Fix: a single process-wide serialized writer.
- New LocalStore (local_store.py) owns ONE connection (check_same_thread=False)
  guarded by ONE re-entrant lock, keeps WAL, and wraps writes in
  transaction()/write() with bounded exponential-backoff retry on the rare
  residual lock (e.g. another OS process) instead of failing the call.
- MemoryServer builds the LocalStore and SHARES it with the SyncEngine, so the
  sync writer no longer opens a second connection, which eliminates the
  two-connections race. All server reads/writes go through the shared lock;
  stores stay fast (local enqueue, then async sync).
- Bound the one genuinely slow path (remote semantic memory_recall) with an
  explicit RECALL_TIMEOUT and, on timeout or an unreachable backend, return a
  clear "working / retry" signal instead of hanging silently or crashing. When
  a client supplies _meta.progressToken, emit one notifications/progress so the
  call shows life.
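
The serialized-writer idea in the first bullet can be sketched roughly as below. This is an illustration under stated assumptions, not the contents of local_store.py: the class name SketchStore, the write() signature, and the retry parameters (max_retries, base_delay, max_delay) are all hypothetical.

```python
import sqlite3
import threading
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


class SketchStore:
    """Hypothetical stand-in for LocalStore: one connection, one re-entrant lock."""

    def __init__(
        self,
        db_path: str,
        max_retries: int = 5,
        base_delay: float = 0.05,
        max_delay: float = 1.0,
    ) -> None:
        # check_same_thread=False is safe only because every access takes self.lock.
        self.conn = sqlite3.connect(db_path, timeout=30.0, check_same_thread=False)
        self.conn.execute("PRAGMA journal_mode=WAL")
        self.lock = threading.RLock()
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.max_delay = max_delay

    def write(self, fn: Callable[[sqlite3.Connection], T]) -> T:
        """Run fn in one transaction, retrying only on 'database is locked'."""
        for attempt in range(self.max_retries + 1):
            with self.lock:
                try:
                    with self.conn:  # commits on success, rolls back on error
                        return fn(self.conn)
                except sqlite3.OperationalError as exc:
                    if "locked" not in str(exc) or attempt == self.max_retries:
                        raise
            # Back off after leaving the lock block: 0.05s, 0.1s, 0.2s, ... capped.
            time.sleep(min(self.base_delay * 2**attempt, self.max_delay))
        raise AssertionError("unreachable")

    def close(self) -> None:
        with self.lock:
            self.conn.close()
```

In-process contention is handled by the lock, so the retry loop only matters when something outside the process (for example another OS process) holds the SQLite write lock.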

Ships to users via the plugin (mcp/memory-mcp.json runs src/.../mcp_server.py);
no server-side/API change needed. TDD: added concurrency tests (many threads +
sync writer on one file) and recall progress/bounding tests; full gate green
(ruff + mypy strict + 185 pytest).
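
The recall bounding that those tests exercise could look roughly like the sketch below. It is a sketch under assumptions, not the server's code: the function name bounded_recall, the shape of the returned dict, the thread pool, and the timeout value are hypothetical, and progress notifications are left out.

```python
import concurrent.futures
import time
from collections.abc import Callable
from typing import Any

RECALL_TIMEOUT = 0.2  # seconds; illustrative value, not the project's setting

# Hypothetical worker pool for the slow remote call.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def bounded_recall(
    fetch: Callable[[], list[str]], timeout: float = RECALL_TIMEOUT
) -> dict[str, Any]:
    """Run the remote recall, but never wait longer than `timeout` seconds."""
    future = _pool.submit(fetch)
    try:
        return {"status": "ok", "results": future.result(timeout=timeout)}
    except concurrent.futures.TimeoutError:
        # Backend is slow: tell the caller to retry instead of hanging.
        return {"status": "working", "retry": True, "results": []}
    except OSError:
        # Covers urllib.error.URLError and ConnectionError (both OSError subclasses).
        return {"status": "unreachable", "retry": True, "results": []}
```

Note that the timed-out fetch keeps running in its worker thread; the sketch only bounds how long the caller waits, which is the property the commit message describes.
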
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 68088e684e
commit 5151bbe0d5
5 changed files with 590 additions and 116 deletions
@ -5,14 +5,14 @@ Uses only stdlib — no pip install required.
import json
import logging
import sqlite3
import threading
import urllib.error
import urllib.parse
import urllib.request
from typing import Any
from datetime import datetime, timezone
from pathlib import Path

from claude_memory.local_store import LocalStore

logger = logging.getLogger(__name__)

@ -26,7 +26,14 @@ FULL_RESYNC_EVERY = 10
 class SyncEngine:
     """Background sync between local SQLite cache and remote API."""
 
-    def __init__(self, db_path: str, api_base_url: str, api_key: str, sync_interval: int = 60):
+    def __init__(
+        self,
+        db_path: str,
+        api_base_url: str,
+        api_key: str,
+        sync_interval: int = 60,
+        store: "LocalStore | None" = None,
+    ):
         self.db_path = db_path
         self.api_base_url = api_base_url.rstrip("/")
         self.api_key = api_key
@ -37,13 +44,20 @@ class SyncEngine:
         self._last_sync_success = False
         self._auth_failed = False
 
-        # Own connection for thread safety
-        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
-        self._conn = sqlite3.connect(db_path, timeout=30.0, check_same_thread=False)
-        self._conn.row_factory = sqlite3.Row
-        self._conn.execute("PRAGMA journal_mode=WAL")
-        self._conn.execute("PRAGMA busy_timeout=30000")
-        self._lock = threading.Lock()
+        # Share the caller's LocalStore (one connection, one lock) when given, so the
+        # background sync writer never opens a SECOND connection to the same file and
+        # never races the store path on the single SQLite writer. When run standalone
+        # (e.g. tests), open our own store. Either way, all SQLite access below goes
+        # through self._store / self._conn / self._lock, which now point at one shared
+        # connection guarded by one re-entrant lock.
+        if store is None:
+            self._store = LocalStore.open(db_path)
+            self._owns_store = True
+        else:
+            self._store = store
+            self._owns_store = False
+        self._conn = self._store.conn
+        self._lock = self._store.lock
 
         self._init_sync_tables()
 
@ -121,7 +135,10 @@ class SyncEngine:
         self._stop_event.set()
         if self._thread and self._thread.is_alive():
             self._thread.join(timeout=5)
-        self._conn.close()
+        # Only close the connection if we opened it; when sharing the server's
+        # LocalStore, the server owns the connection lifecycle.
+        if self._owns_store:
+            self._store.close()
 
     def _sync_loop(self) -> None:
         """Periodic sync loop running in background thread."""