claude-agent-service

Author	SHA1	Message	Date
Viktor Barzin	eccf0dd407	conversational: trim per-turn context to cut brain TTFT ~1.3s The no-tools conversational agent was dragging the full project context (this repo's CLAUDE.md, the MCP server configs, local settings) plus the dynamic system-prompt sections into every voice turn — ~45k input tokens -> ~3.4s time-to-first-token (measured against the live pod, 2026-06-21). Add --setting-sources user + --exclude-dynamic-system-prompt-sections to both the gateway (json) and realtime (stream-json) conversational argvs: context drops to ~23k and TTFT to ~2.1s (~1.3s/turn faster) with no change to the reply. Helps the portal-assistant v1 gateway AND the v2 realtime agent (both run the same turn). The /execute agent path is untouched. Investigation ruled out the assumed culprits: CLI startup is only ~0.5s, and a warm prompt cache does NOT lower TTFT (turn 2 read all 45k from cache yet TTFT was unchanged) — the cost was the context size, not the spawn. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 18:00:21 +00:00
Viktor Barzin	a29bffdda3	chat-completions: stream conversational turns (SSE token relay) for realtime voice Adds stream=true support to POST /v1/chat/completions (it previously 400'd). When streaming, it runs the no-tools `conversational` agent via `claude -p --output-format stream-json --include-partial-messages --verbose` and relays each content_block_delta as an OpenAI chat.completion.chunk SSE event, ending with finish_reason=stop + [DONE]. Free CLI/subscription auth, no tools, no API key. Stateless by design: the full message history is flattened into the prompt (prior assistant turns kept), so an OpenAI-style client that re-sends history each turn — e.g. Pipecat's OpenAILLMService — can stream from us directly. The non-streaming path (recruiter-triage workspace agent) is unchanged. This is phase 1 of the Pipecat realtime full-duplex voice-agent rebuild for portal-assistant (continuous audio, VAD endpointing, barge-in, ~seconds to first words). New pure helpers (stream_argv/delta_text/openai_chunk/synthesise_chat_prompt) are unit-tested; the SSE endpoint has a mocked-subprocess integration test. 429 passing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 22:22:38 +00:00
Viktor Barzin	33ff0868c3	conversational: add no-tools multi-turn Brain endpoint for portal-assistant The portal-assistant voice gateway needs a Claude that is conversational, free (on the cluster subscription, no metered API), and safe to sit behind a public edge. Add POST /v1/conversational: it drives a new no-tools `conversational` agent with per-conversation --resume so a voice turn keeps context, and is lean on purpose — no workspace clone, no tools, and crucially NO --dangerously-skip-permissions (so even a leaked agent can't execute anything). This is deliberately NOT /v1/chat/completions, which clones the git-crypt infra repo and runs a Bash-enabled agent per turn (portal-assistant ADR-0002). The conversational agent replies in the speaker's language (Bulgarian/English), short and TTS-friendly. Tests cover the argv builder (new vs resume), the happy path, multi-turn resume across calls, auth, and failure → 503. Full suite green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 18:38:44 +00:00
Viktor Barzin	e34640cc47	afk: wire the T3 adapter to the REAL orchestration contract + fix priority The T3 dispatch adapter was written against a guessed wire shape that the test fake accepted but the live t3-afk server 400s — so the previously-green suite did NOT mean the loop was actually wired to T3. Reverse-engineered the real contract from the v0.0.27 binary, verified it live against t3-afk (including multi-turn), and rewrote the adapter to match: - dispatch sends BARE commands keyed by `type` (not a `command` string), with client-minted threadId/commandId/messageId + createdAt; the server replies {sequence}, so dispatch returns the id it generated (never one parsed back). - a thread lives in a project (workspaceRoot = the repo checkout the agent runs in), so dispatch ensures the repo's project (snapshot -> project.create iff absent) before thread.create + thread.turn.start. - add send_turn() for follow-up turns on an existing thread — multi-turn context retention is verified live (turn 2 recalled turn 1). - watcher reads thread liveness from latestTurn.state (completed->idle, running/in_progress/pending->running, errored->error), not a non-existent top-level `status` field. Guard against recurrence: the test fake now REJECTS any command lacking a `type` discriminator (the original bug fails loudly), plus an opt-in live smoke test (tests/test_afk_t3_live.py) so "green" can mean "wired to T3". Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching tracker conventions and Issue.priority's own docstring — it had deliberately diverged to higher-first. Loop still ships DISABLED (kill switch on, empty allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 22:27:00 +00:00
Viktor Barzin	2ef0db9a96	afk: add the autonomous issue-implementer loop (SHIPS DISABLED) Adds app/afk/ — the "away-from-keyboard" control plane that watches the issue tracker for ready-for-agent issues, dispatches each to a fresh full-access T3 thread (with the issue-implementer preamble prepended, because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed, escalating or fix-forwarding via a small pure state machine. The loop is split into pure cores (no I/O, exhaustively unit-tested) and thin injected adapters (the only edges that ever touch T3, the tracker, CI, or Slack — faked in every test, so nothing here talks to a real server, GitHub/Forgejo, or the cluster): pure: types, dispatch_policy, run_state_machine, phase_checklist, config, issue_implementer_prompt adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher, notifier loops: poller — CronJob tick #1: list_ready -> select_dispatchable -> dispatch + stamp the in-progress lock (label only AFTER a successful dispatch, so a failed dispatch never leaves a phantom lock). Per-repo lock derived from the ready set, since the CronJob is stateless between ticks. watcher — CronJob tick #2: assemble RunState from snapshot + CI -> next_action -> act (close on success; relabel ready-for-human + ring the doorbell on the two escalations; dispatch a corrective turn on fix-forward; refresh the progress checklist). SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an empty allowlist, so a freshly-loaded config dispatches nothing and does zero I/O. The package is not imported by the running service and has no auto-enable path. Arming it is a deliberate, later, manual step requiring BOTH gates (clear the kill switch AND enrol the exact repos) so one fat-fingered env var can't arm every repo. 
Test-first throughout: 412 tests pass (poller + watcher add integration tests wiring the real pure cores to in-memory fakes). mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 21:15:11 +00:00
Viktor Barzin	5b5daa4bea	breakglass UI v2: attachable sessions (tmux model) + mobile-first redesign Full audit-driven rework. Keeps the proven SSE-translation + verb logic; everything else upgraded for phone-primary use. Backend — server owns the session, clients attach (Viktor's tmux idea): - session.py: SessionManager + Session with an event log, subscriber pub/sub, and turns that run DETACHED (keep going if the client disconnects). - GET /api/session/{id}/stream = attach (SSE): replays the transcript then tails live; per-event id: lines so an EventSource auto-reconnect resumes from Last-Event-ID (free re-attach). POST /{id}/prompt starts a detached turn; POST /{id}/cancel = Stop. Replaces the old one-shot /api/chat. - agent_session trimmed to the argv + translate_event helpers; 21 new/updated tests (replay, Last-Event-ID resume, broadcast, detached turn, resume, cancel, routes) — 53 green. Frontend — mobile-first via the frontend-design skill (emergency-console aesthetic): - EventSource attach (native auto-reconnect, zero client reconnect logic); transcript.js folds events->messages with id-dedupe so replays never double-render (30 unit assertions). - Installable PWA: manifest + icons (wrench/break-glass mark) + apple-mobile-web-app meta + theme-color; viewport-fit=cover + safe-area; 100dvh; 16px composer (no iOS zoom). - One-tap diagnosis presets (Triage / Memory-OOM / Disk / Services / QEMU-wedged) mapped to the devvm's real failure modes; Stop button while a turn runs. - Foldable VM-control sheet, cycle the dominant recovery action w/ confirm, output capped 46vh. - a11y: fixed --ink-faint contrast 3.6:1 -> 6.1:1 (WCAG AA); >=44px tap targets. Deleted the obsolete fetch-reader sse.js (EventSource replaces it). Verified: 53 backend tests + 30 transcript assertions; Playwright @390x844 (input on-screen y=721-821, presets/sheet/fold/cap); local integration smoke vs the real backend (attach->caught-up, 404, verbs, PWA served). 
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-14 19:19:03 +00:00
Viktor Barzin	4f361d91eb	breakglass: in-cluster emergency-recovery UI for the devvm Viktor wanted a web UI on the claude service to act as his breakglass when the devvm is down: open it, have Claude SSH in to diagnose/repair, and power-cycle the VM via the Proxmox host if needed. This is the app half (the infra stack + host bootstrap live in the infra repo). New, ISOLATED ASGI app under app/breakglass/ (never imports app.main, so the untrusted-input agents — recruiter-triage, nextcloud-todos — can't share a process with the root-on-devvm / PVE-reset SSH key): - pve.py: the LLM-independent power-verb path (status|forensics|reset|stop|start|cycle on VM 102), whitelist-validated client-side, executed over the forced-command SSH key (list argv, no shell). - agent_session.py: multi-turn streamed chat — claude -p --session-id / --resume with --output-format stream-json, translated to a small SSE vocabulary (session/text/tool/result/error/done). - auth.py: edge Authentik header OR bearer; fail-closed. - server.py: FastAPI (session/chat-SSE/pve-verb routes) + serves the Svelte UI. - Svelte SPA (frontend/, built into app/breakglass/static/ and committed — no in-cluster build, per ADR-0002): streamed chat + danger-styled manual VM controls with confirm-on-mutate. - agents/breakglass.md: narrow tools (Bash/Read/Grep/Glob, no web), taught the ssh devvm / ssh pve aliases and cycle-vs-reset. - docker-entrypoint-breakglass.sh: ssh-agent bootstrap from the mounted key + ssh aliases, then uvicorn app.breakglass.server. The breakglass Deployment overrides the image CMD with this; the existing service is untouched. 26 new tests (verb whitelist incl. injection attempts, stream-json→SSE translation, auth gating, route behaviour); full suite 58 green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:36:05 +00:00
Viktor Barzin	66104a32ab	parallel execution: replace single-flight lock with bounded semaphore + per-job workspace Multiple agent calls now run concurrently, each in its own isolated git checkout (local clone of the warm base, hardlinked objects, git-crypt re-unlocked), so concurrent jobs never share a working tree. - execution_lock (asyncio.Lock) -> execution_semaphore (default MAX_CONCURRENCY=10); excess calls queue FIFO instead of 409/503. MAX_QUEUE_DEPTH safety valve. - /execute never returns 409; jobs go queued -> running. Timeout covers execution only, not queue wait. - /v1/chat/completions queues for a slot instead of 503-busy. - /health: busy = at-capacity, plus active/queued/capacity fields. - per-job workspace prepare/cleanup under a short git lock; the agent run holds none. - in-memory job registry evicted past JOB_TTL_SECONDS. Design: docs/2026-06-02-parallel-execution-design.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 20:57:41 +00:00
Viktor Barzin	1132777705	openai-compat: use bare model aliases (haiku/sonnet/opus) to auto-roll forward	2026-06-01 19:55:19 +00:00
Viktor Barzin	7baa66d994	openai-compat: pass --model from request through to claude -p Replaces the MODEL_TO_AGENT dict (which only mapped model -> agent and ignored the model itself) with a SUPPORTED_MODELS allowlist + per-request --model CLI flag. Callers can now pick Haiku/Sonnet/Opus per request to control cost; unknown model IDs 400 with the supported list; missing model defaults to claude-sonnet-4-6 (mid-tier). The --model CLI flag overrides whatever model: is in the agent's frontmatter, so recruiter-triage's `model: sonnet` no longer pins every request to Sonnet. Verified with claude CLI 2.1.153 that the bare-form IDs (claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7) are accepted without date suffixes — confirmed via modelUsage keys in the JSON output. Six new tests cover: default routing, haiku/sonnet/opus pass-through, unsupported-model 400 shape, and the response.model echo.	2026-06-01 19:33:54 +00:00
Viktor Barzin	07dcfca333	openai-compat: add /v1/chat/completions endpoint OpenAI-compatible chat completions endpoint so existing OpenAI-API clients (fire-planner's examples/llm_extract.py and others) can target this service without rewriting their client. Behaviour: - POST /v1/chat/completions accepts the OpenAI chat-completions request shape (model, messages, max_tokens?, temperature?, stream?). - Reuses the existing Bearer auth from /execute. - Synthesises a single prompt body from system+user messages ("System instructions:\n... --- Request:\n...") so the agent treats them as the user's request rather than seeing raw JSON. - Internally shares the execution path with /execute by extracting _invoke_claude_subprocess(). Holds execution_lock for the duration; returns 503 (not 409) when busy, since OpenAI callers have no job-id model to retry against. - Returns the OpenAI chat-completions envelope with the final assistant text extracted from `claude -p --output-format json` (falls back to raw stdout if parsing fails). - stream=true -> 400 {"error": "streaming not supported"}. - Underlying failure (non-zero exit, timeout, exception) -> 503 {"error": "execution failed", "detail": "<one line>"}. Model -> agent mapping is hardcoded to `recruiter-triage` for all models for v1 (broadest tool surface among current agents). Budget is hardcoded to $2.00/call; timeout 900s. Revisit when a true general-purpose agent lands. Tests: 9 new tests covering happy path, streaming rejection, missing auth, wrong token, job failure, empty messages, JSON-parse fallback, prompt synthesis, and busy-503. All 20 tests (11 existing + 9 new) pass; ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 06:24:20 +00:00
Viktor Barzin	6fa60fdd1a	Initial extraction from monorepo	2026-05-07 17:07:12 +00:00

12 commits
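
Sketch for a29bffdda3 (SSE token relay). The commit says each content_block_delta from the stream-json output is relayed as an OpenAI chat.completion.chunk SSE event, ending with finish_reason=stop and [DONE]. The helper names delta_text and openai_chunk come from the commit message; their signatures, the relay() generator, and the assumed event shape (a content_block_delta, possibly wrapped in an "event" envelope, carrying a text_delta) are illustrative assumptions and not the repository's code.

```python
import json
from typing import Iterable, Iterator, Optional


def delta_text(line: str) -> Optional[str]:
    """Return the text fragment in one stream-json line, or None if there is none."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    # Assumption: partial messages may arrive wrapped as {"event": {...}}.
    inner = event.get("event", event)
    if not isinstance(inner, dict) or inner.get("type") != "content_block_delta":
        return None
    delta = inner.get("delta")
    if not isinstance(delta, dict) or delta.get("type") != "text_delta":
        return None
    return delta.get("text") or None


def openai_chunk(chunk_id: str, model: str, text: Optional[str] = None,
                 finish_reason: Optional[str] = None) -> str:
    """Format one chat.completion.chunk as an SSE "data:" frame."""
    payload = {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {"content": text} if text is not None else {},
            "finish_reason": finish_reason,
        }],
    }
    return "data: " + json.dumps(payload) + "\n\n"


def relay(lines: Iterable[str], chunk_id: str, model: str) -> Iterator[str]:
    """Turn stream-json lines into SSE frames, then close the stream."""
    for line in lines:
        text = delta_text(line)
        if text:
            yield openai_chunk(chunk_id, model, text=text)
    yield openai_chunk(chunk_id, model, finish_reason="stop")
    yield "data: [DONE]\n\n"
```

Lines that are not JSON, or that carry no text delta, are skipped rather than treated as errors, so other stream-json event types pass through silently in this sketch.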
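
Sketch for 66104a32ab (bounded semaphore). The commit replaces a single-flight lock with a semaphore, queues excess calls, caps the queue with MAX_QUEUE_DEPTH, applies the timeout to execution only, and reports active/queued/capacity on /health. The class below is a minimal illustration of that behaviour; ExecutionGate, QueueFull, run() and health() are hypothetical names, and the queue-depth default of 100 is a placeholder, not a value from the repository.

```python
import asyncio
from typing import Any, Awaitable, Callable


class QueueFull(Exception):
    """Raised when the wait queue is already at its configured depth."""


class ExecutionGate:
    def __init__(self, max_concurrency: int = 10, max_queue_depth: int = 100) -> None:
        self._sem = asyncio.Semaphore(max_concurrency)
        self.capacity = max_concurrency
        self.max_queue_depth = max_queue_depth
        self.active = 0
        self.queued = 0

    async def run(self, job: Callable[[], Awaitable[Any]], timeout: float) -> Any:
        # Safety valve: refuse new work only when every slot is taken
        # AND the wait queue is already full.
        if self._sem.locked() and self.queued >= self.max_queue_depth:
            raise QueueFull()
        self.queued += 1
        try:
            await self._sem.acquire()  # queue wait is NOT covered by the timeout
        finally:
            self.queued -= 1
        self.active += 1
        try:
            # The timeout covers execution only.
            return await asyncio.wait_for(job(), timeout)
        finally:
            self.active -= 1
            self._sem.release()

    def health(self) -> dict:
        return {
            "busy": self.active >= self.capacity,  # busy means at capacity
            "active": self.active,
            "queued": self.queued,
            "capacity": self.capacity,
        }
```

A caller would wrap each agent invocation in gate.run(...); an HTTP handler could translate QueueFull into whatever status the service prefers.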
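
Sketch for 2ef0db9a96 and e34640cc47 (dispatch gating and ordering). The commits describe a Config that defaults to kill_switch=True with an empty allowlist, arming that requires both gates, a per-repo lock, and lower-priority-value-first ordering (P0 before P1). The names Config, kill_switch, allowlist, select_dispatchable and Issue.priority appear in the commit messages; the field types, the locked_repos argument, the tie-break on issue number, and the one-issue-per-repo reading of the per-repo lock are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Issue:
    repo: str
    number: int
    priority: int  # lower value = more urgent (P0 before P1)


@dataclass(frozen=True)
class Config:
    kill_switch: bool = True                 # ships disabled
    allowlist: FrozenSet[str] = frozenset()  # no repos enrolled by default


def select_dispatchable(ready: Iterable[Issue],
                        locked_repos: AbstractSet[str],
                        config: Config) -> List[Issue]:
    """Pick at most one ready issue per enrolled, unlocked repo."""
    # Both gates must be open: kill switch cleared AND at least one repo enrolled.
    if config.kill_switch or not config.allowlist:
        return []
    chosen = {}
    for issue in sorted(ready, key=lambda i: (i.priority, i.number)):
        if issue.repo not in config.allowlist or issue.repo in locked_repos:
            continue
        # First hit per repo wins, i.e. the lowest priority value.
        chosen.setdefault(issue.repo, issue)
    return list(chosen.values())
```

With a default Config() the function returns an empty list and touches nothing else, which matches the commit's point that a freshly loaded config dispatches nothing.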