Commit graph

6 commits

Author SHA1 Message Date
Viktor Barzin
66104a32ab parallel execution: replace single-flight lock with bounded semaphore + per-job workspace
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Multiple agent calls now run concurrently, each in its own isolated git
checkout (local clone of the warm base, hardlinked objects, git-crypt
re-unlocked), so concurrent jobs never share a working tree.

- execution_lock (asyncio.Lock) -> execution_semaphore (default MAX_CONCURRENCY=10);
  excess calls queue FIFO instead of 409/503. MAX_QUEUE_DEPTH safety valve.
- /execute never returns 409; jobs go queued -> running. Timeout covers
  execution only, not queue wait.
- /v1/chat/completions queues for a slot instead of 503-busy.
- /health: busy = at-capacity, plus active/queued/capacity fields.
- per-job workspace prepare/cleanup under a short git lock; the agent run holds none.
- in-memory job registry evicted past JOB_TTL_SECONDS.

Design: docs/2026-06-02-parallel-execution-design.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:57:41 +00:00
Viktor Barzin
add15325bb openai-compat: tolerate legacy date-suffixed model names during transition
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
2026-06-01 21:59:50 +00:00
Viktor Barzin
1132777705 openai-compat: use bare model aliases (haiku/sonnet/opus) to auto-roll forward
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
2026-06-01 19:55:19 +00:00
Viktor Barzin
7baa66d994 openai-compat: pass --model from request through to claude -p
Replaces the MODEL_TO_AGENT dict (which only mapped model -> agent and
ignored the model itself) with a SUPPORTED_MODELS allowlist + per-request
--model CLI flag. Callers can now pick Haiku/Sonnet/Opus per request to
control cost; unknown model IDs 400 with the supported list; missing
model defaults to claude-sonnet-4-6 (mid-tier).

The --model CLI flag overrides whatever model: is in the agent's
frontmatter, so recruiter-triage's `model: sonnet` no longer pins
every request to Sonnet.

Verified with claude CLI 2.1.153 that the bare-form IDs
(claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7) are accepted
without date suffixes — confirmed via modelUsage keys in the JSON
output.

Six new tests cover: default routing, haiku/sonnet/opus pass-through,
unsupported-model 400 shape, and the response.model echo.
2026-06-01 19:33:54 +00:00
Viktor Barzin
07dcfca333 openai-compat: add /v1/chat/completions endpoint
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
OpenAI-compatible chat completions endpoint so existing OpenAI-API
clients (fire-planner's examples/llm_extract.py and others) can target
this service without rewriting their client.

Behaviour:
- POST /v1/chat/completions accepts the OpenAI chat-completions request
  shape (model, messages, max_tokens?, temperature?, stream?).
- Reuses the existing Bearer auth from /execute.
- Synthesises a single prompt body from system+user messages
  ("System instructions:\n... --- Request:\n...") so the agent treats
  them as the user's request rather than seeing raw JSON.
- Internally shares the execution path with /execute by extracting
  _invoke_claude_subprocess(). Holds execution_lock for the duration;
  returns 503 (not 409) when busy, since OpenAI callers have no
  job-id model to retry against.
- Returns the OpenAI chat-completions envelope with the final
  assistant text extracted from `claude -p --output-format json`
  (falls back to raw stdout if parsing fails).
- stream=true -> 400 {"error": "streaming not supported"}.
- Underlying failure (non-zero exit, timeout, exception) -> 503
  {"error": "execution failed", "detail": "<one line>"}.

Model -> agent mapping is hardcoded to `recruiter-triage` for all
models for v1 (broadest tool surface among current agents). Budget
is hardcoded to $2.00/call; timeout 900s. Revisit when a true
general-purpose agent lands.

Tests: 9 new tests covering happy path, streaming rejection, missing
auth, wrong token, job failure, empty messages, JSON-parse fallback,
prompt synthesis, and busy-503. All 20 tests (11 existing + 9 new)
pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 06:24:20 +00:00
Viktor Barzin
6fa60fdd1a Initial extraction from monorepo 2026-05-07 17:07:12 +00:00