Multiple agent calls now run concurrently, each in its own isolated git checkout (local clone of the warm base, hardlinked objects, git-crypt re-unlocked), so concurrent jobs never share a working tree.

- execution_lock (asyncio.Lock) -> execution_semaphore (default MAX_CONCURRENCY=10); excess calls queue FIFO instead of 409/503. MAX_QUEUE_DEPTH safety valve.
- /execute never returns 409; jobs go queued -> running. Timeout covers execution only, not queue wait.
- /v1/chat/completions queues for a slot instead of 503-busy.
- /health: busy = at-capacity, plus active/queued/capacity fields.
- per-job workspace prepare/cleanup under a short git lock; the agent run holds none.
- in-memory job registry evicted past JOB_TTL_SECONDS.

Design: docs/2026-06-02-parallel-execution-design.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Parallel, independent execution: design

- Date: 2026-06-02
- Status: approved, in implementation
- Scope: `claude-agent-service`. Remove the single-flight execution lock so multiple agent calls run concurrently, each in its own isolated workspace.
## Problem

Today a single global `asyncio.Lock` (`execution_lock`) serializes every agent invocation:

- `POST /execute` returns `409 Agent is busy` when a job is in flight.
- `POST /v1/chat/completions` returns `503 agent is busy` likewise.
- All calls run `claude -p` with `cwd=/workspace/infra`: one shared working tree, `git pull --rebase`'d before each call.
The lock exists because two `claude -p` processes in the same working tree would clobber each other's file edits and git state (`.git/index.lock` contention, racing `git pull --rebase`).
## Goal

Run calls in parallel, each fully independent of the others, without the git/file collisions that the lock currently prevents. This stays on a single pod (`replicas=1`), keeping the in-memory job registry coherent for `/jobs/{id}` polling.
## Design

### Workspace isolation: per-job local clone
Each job gets its own git checkout, so file edits and git operations never touch another job's state:

- A warm base clone lives at `/workspace/base` (created by the existing init container; renamed from `/workspace/infra`), git-crypt-unlocked.
- Per job, under a short-held `git_lock`:
  - Debounced `git fetch origin && git reset --hard origin/master` on the base (skipped if fetched within `FETCH_DEBOUNCE_SECONDS`), so bursts share one network fetch.
  - `git clone --local /workspace/base /workspace/jobs/<id>`: objects are hardlinked (near-free disk), and no `.terraform` is carried over, since clone takes tracked content only.
  - Re-point `origin` to the GitHub URL and run `git-crypt unlock <key>` in the job dir.
- The job runs `claude -p` with `cwd=/workspace/jobs/<id>`, holding no lock.
- `finally` → `rm -rf /workspace/jobs/<id>`.
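The per-job setup steps above can be sketched as a small planner that only builds the command lines, which keeps the debounce logic testable. This is a sketch under assumptions: `WorkspacePlanner`, the 30 s debounce value, the key path and the GitHub URL are all hypothetical, and actually running the commands (for example with `asyncio.create_subprocess_exec` while holding `git_lock`) is left to the caller.

```python
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical constants for illustration. The design names
# FETCH_DEBOUNCE_SECONDS but does not fix its value.
BASE_DIR = Path("/workspace/base")
JOBS_DIR = Path("/workspace/jobs")
FETCH_DEBOUNCE_SECONDS = 30.0


class WorkspacePlanner:
    """Builds the argv lists for one job's isolated checkout (does not run them)."""

    def __init__(self, origin_url: str, key_path: str) -> None:
        self.origin_url = origin_url  # assumed GitHub remote URL
        self.key_path = key_path      # assumed mount path of the git-crypt key
        self.last_fetch: Optional[float] = None

    def prepare(self, job_id: str, now: Optional[float] = None) -> List[List[str]]:
        now = time.monotonic() if now is None else now
        job_dir = str(JOBS_DIR / job_id)
        cmds: List[List[str]] = []
        # Debounced refresh of the warm base, so a burst of jobs shares one fetch.
        if self.last_fetch is None or now - self.last_fetch >= FETCH_DEBOUNCE_SECONDS:
            cmds.append(["git", "-C", str(BASE_DIR), "fetch", "origin"])
            cmds.append(["git", "-C", str(BASE_DIR), "reset", "--hard", "origin/master"])
            self.last_fetch = now
        # Local clone: objects are hardlinked; only tracked content is checked out.
        cmds.append(["git", "clone", "--local", str(BASE_DIR), job_dir])
        # The clone's origin points at the base dir; re-point it at GitHub.
        cmds.append(["git", "-C", job_dir, "remote", "set-url", "origin", self.origin_url])
        # Re-unlock git-crypt in the new working tree (git dispatches to git-crypt).
        cmds.append(["git", "-C", job_dir, "crypt", "unlock", self.key_path])
        return cmds

    def cleanup(self, job_id: str) -> List[List[str]]:
        return [["rm", "-rf", str(JOBS_DIR / job_id)]]
```

This sketch records the fetch time when the fetch is planned; a real implementation would more likely record it only after the fetch succeeds, so a failed fetch is retried by the next job.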
`git_lock` is held only for the fast setup/teardown (roughly 2 s or less); execution is fully parallel. Rejected alternatives: `git worktree` (shares one `.git`, so agents that `git commit`/`pull` still contend and are not truly independent) and `cp -a` (copies accumulated `.terraform` provider caches → disk blowup).
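A minimal sketch of that locking discipline, assuming Python 3.10+ and hypothetical `prepare`/`run_agent`/`cleanup` coroutines standing in for the real workspace and `claude -p` calls: `git_lock` wraps only setup and teardown, the agent run holds no lock, and cleanup runs in `finally` even if setup fails partway.

```python
import asyncio
from typing import Awaitable, Callable

# Guards only the short git operations on the shared base clone.
git_lock = asyncio.Lock()


async def run_isolated_job(
    job_id: str,
    prepare: Callable[[str], Awaitable[None]],
    run_agent: Callable[[str], Awaitable[str]],
    cleanup: Callable[[str], Awaitable[None]],
) -> str:
    """Hypothetical helper: serialized setup/teardown, unserialized execution."""
    try:
        async with git_lock:
            await prepare(job_id)       # fetch/clone/unlock: fast and serialized
        return await run_agent(job_id)  # the agent run happens with no lock held
    finally:
        async with git_lock:
            await cleanup(job_id)       # rm -rf the job dir, even after a failed setup
```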
A distinct `cwd` per job also isolates Claude CLI per-project state (`~/.claude/projects/<cwd-hash>/`). The long-lived `CLAUDE_CODE_OAUTH_TOKEN` avoids credential-file write races in the shared `~/.claude`.
### Concurrency model

- `execution_semaphore = asyncio.Semaphore(MAX_CONCURRENCY)` replaces `execution_lock`. Default `MAX_CONCURRENCY=10` ("soft-unbounded").
- Requests beyond the limit queue FIFO (asyncio fairness); they are not rejected.
- `MAX_QUEUE_DEPTH` safety valve (default 100): if `active + queued` exceeds it, reject (`429` on `/execute`, `503` on chat) to bound memory.
- A `concurrency_slot()` async context manager wraps acquire/release and keeps `inflight_active`/`inflight_queued` counters for `/health`.
### Endpoint behavior

| Endpoint | Before | After |
|---|---|---|
| `POST /execute` | `202` or `409` busy | `202` always (unless the queue is full → `429`); job `status="queued"` until a slot frees, then `running`. The timeout clock starts on execution, not on queue wait. |
| `POST /v1/chat/completions` | `200` or `503` busy | Queues for a slot (caller waits, bounded by the 900 s timeout); still `503` on execution failure/timeout or if the queue is full. |
| `GET /jobs/{id}` | unchanged | unchanged (can now report `queued`) |
| `GET /health` | `{status, busy=lock.locked()}` | `{status, busy=(active>=capacity), active, queued, capacity}`; keeps BeadBoard `/api/agent-status` and beads-dispatcher working. |
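The `/execute` lifecycle and the new `/health` shape can be sketched as follows. Assumptions: `run_job`, `health_payload`, the `jobs` field names and the `"ok"` status value are hypothetical, and the 900 s default is borrowed from the chat timeout rather than taken from the `/execute` code. The point it shows is that `asyncio.wait_for` wraps only the work done after the slot is acquired, so time spent queued never counts toward the timeout.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict

# Hypothetical in-memory registry; field names are illustrative.
jobs: Dict[str, Dict[str, Any]] = {}


async def run_job(job_id: str, slots: asyncio.Semaphore,
                  work: Callable[[], Awaitable[str]], timeout: float = 900.0) -> None:
    job = jobs[job_id]              # created with status="queued" by POST /execute
    async with slots:               # time spent waiting here is not timed
        job["status"] = "running"
        job["started_at"] = time.time()
        try:
            # The timeout clock starts only once a slot is held.
            job["result"] = await asyncio.wait_for(work(), timeout)
            job["status"] = "completed"
        except asyncio.TimeoutError:
            job["status"] = "timeout"
        except Exception as exc:    # other failures are recorded, not raised
            job["status"] = "error"
            job["result"] = str(exc)
        finally:
            job["finished_at"] = time.time()


def health_payload(active: int, queued: int, capacity: int) -> Dict[str, Any]:
    # "busy" now means "every slot is taken", not "one job is running".
    return {"status": "ok", "busy": active >= capacity,
            "active": active, "queued": queued, "capacity": capacity}
```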
### Housekeeping

- Job eviction: `completed`/`failed`/`timeout`/`error` jobs older than `JOB_TTL_SECONDS` (default 3600) are evicted; the in-memory `jobs` dict currently grows unbounded, and parallelism increases churn.
- Pod restart still loses in-flight jobs (pre-existing and out of scope: no shared store, matching the in-pod decision).
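Eviction could be a small pure function called on each request or from a periodic task. In this sketch the `finished_at` field and the helper name are assumptions; only terminal jobs are eligible, so queued and running jobs are never dropped however old they are.

```python
import time
from typing import Any, Dict, List, Optional

JOB_TTL_SECONDS = 3600
TERMINAL_STATUSES = {"completed", "failed", "timeout", "error"}


def evict_expired_jobs(jobs: Dict[str, Dict[str, Any]],
                       ttl: float = JOB_TTL_SECONDS,
                       now: Optional[float] = None) -> List[str]:
    """Hypothetical helper: drop terminal jobs finished more than ttl ago."""
    now = time.time() if now is None else now
    expired = [
        job_id for job_id, job in jobs.items()
        if job.get("status") in TERMINAL_STATUSES
        and now - job.get("finished_at", now) > ttl
    ]
    for job_id in expired:  # delete after iterating, never during
        del jobs[job_id]
    return expired
```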
### Infra (`infra/stacks/claude-agent-service/main.tf`)

- Mount the existing `git-crypt-key` configmap into the main container (today only the init container has it); it is needed for the per-job unlock.
- Pod memory: request `2Gi`, limit `12Gi` (Burstable, tier-aux); CPU request `1`, no CPU limit. Fits node2/3/5 headroom (~22–26 GB free).
- Wire the `MAX_CONCURRENCY` env var. Rename the init-container clone target to `/workspace/base`; `WORKSPACE_DIR` → base path.
- `replicas=1`, `Recreate` unchanged.
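On the service side, the wired env vars might be read once at startup. This is a sketch: the `Settings` class and the `>= 1` check are assumptions, while the variable names and defaults come from this document.

```python
import os
from dataclasses import dataclass


def _env_int(name: str, default: int) -> int:
    raw = os.environ.get(name)
    return default if raw in (None, "") else int(raw)


@dataclass(frozen=True)
class Settings:
    """Hypothetical startup config for the knobs named in this design."""
    max_concurrency: int
    max_queue_depth: int
    job_ttl_seconds: int
    workspace_dir: str

    @classmethod
    def from_env(cls) -> "Settings":
        settings = cls(
            max_concurrency=_env_int("MAX_CONCURRENCY", 10),
            max_queue_depth=_env_int("MAX_QUEUE_DEPTH", 100),
            job_ttl_seconds=_env_int("JOB_TTL_SECONDS", 3600),
            workspace_dir=os.environ.get("WORKSPACE_DIR", "/workspace/base"),
        )
        # A zero-slot semaphore would queue every job forever.
        if settings.max_concurrency < 1:
            raise ValueError("MAX_CONCURRENCY must be >= 1")
        return settings
```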
## Blast radius (verified)

All callers handle the busy responses gracefully or fail safely, so removing them is safe:

- n8n DIUN (`/execute`): rate-limited 5/6h, no retry; 409 was rare.
- payslip-ingest (`/execute` + poll): 90× retry; big win from parallelism.
- recruiter-responder (`/execute` + poll): returns `busy`, OpenClaw retries.
- fire-planner (`/v1/chat/completions`): client-side semaphore; can be relaxed after this.
- BeadBoard (`/execute`): UI shows busy via `/api/agent-status` (`/health`).
- beads-dispatcher CronJob: gates on `/health` `busy`; 2-min tick.
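For the polling callers, the one behavioral change is that a job can now sit in `queued` before `running`. A caller-side sketch, assuming a hypothetical `get_job` function that wraps `GET /jobs/{id}` and an injectable `sleep` for testing: both non-terminal states mean "keep waiting", and any client deadline has to budget for queue time. The dispatcher-style gate reads the new `busy` meaning (all slots taken).

```python
import time
from typing import Any, Callable, Dict

TERMINAL = {"completed", "failed", "timeout", "error"}


def wait_for_job(get_job: Callable[[str], Dict[str, Any]], job_id: str,
                 poll_seconds: float = 5.0, deadline_seconds: float = 1800.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Poll until the job is terminal; 'queued' and 'running' both mean wait."""
    waited = 0.0
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL:
            return job
        if waited >= deadline_seconds:
            raise TimeoutError(f"job {job_id} still {job['status']} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


def should_dispatch(health: Dict[str, Any]) -> bool:
    # busy is now true only when every slot is occupied.
    return not health.get("busy", False)
```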
## Testing (TDD)

Rewrite `test_execute_respects_sequential_lock` and `test_chat_completions_returns_503_when_agent_busy` (they encode the removed behavior). New tests: two concurrent `/execute` calls both run; safety-queue at `MAX_CONCURRENCY=2`; concurrent chat-completions both run; `/health` capacity fields; distinct per-job workspace `cwd`; timeout excludes queue wait; job eviction; queue-depth rejection. An autouse fixture resets the semaphore, counters and `jobs` between tests.
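The shape of the `MAX_CONCURRENCY=2` safety-queue test can be sketched against a hypothetical `FakeService` stand-in rather than the real app. Building a fresh instance inside each test plays the role the autouse fixture plays for the real module state; under pytest the function is collected as-is, and it can also be called directly.

```python
import asyncio
from typing import Dict


class FakeService:
    """Hypothetical stand-in for the semaphore, counters and jobs the fixture resets."""

    def __init__(self, max_concurrency: int) -> None:
        self.semaphore = asyncio.Semaphore(max_concurrency)
        self.active = 0
        self.peak = 0
        self.jobs: Dict[str, str] = {}

    async def execute(self, job_id: str, release: asyncio.Event) -> None:
        self.jobs[job_id] = "queued"
        async with self.semaphore:
            self.jobs[job_id] = "running"
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                await release.wait()  # hold the slot until the test releases it
            finally:
                self.active -= 1
        self.jobs[job_id] = "completed"


def test_safety_queue_at_max_concurrency_2() -> None:
    async def scenario() -> None:
        svc = FakeService(max_concurrency=2)  # fresh state per test
        release = asyncio.Event()
        tasks = [asyncio.create_task(svc.execute(f"j{i}", release)) for i in range(3)]
        await asyncio.sleep(0)  # let every task reach its first wait
        # Two jobs overlap, the third is queued rather than rejected.
        assert sorted(svc.jobs.values()) == ["queued", "running", "running"]
        release.set()
        await asyncio.gather(*tasks)
        assert svc.peak == 2
        assert set(svc.jobs.values()) == {"completed"}

    asyncio.run(scenario())
```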
## Docs to update (same change)

`infra/docs/architecture/automated-upgrades.md`, `infra/docs/runbooks/beads-auto-dispatch.md`, `infra/AGENTS.md`, root `CLAUDE.md`: all currently describe "sequential / single-slot".