claude-agent-service

Author	SHA1	Message	Date
Viktor Barzin	33ff0868c3	conversational: add no-tools multi-turn Brain endpoint for portal-assistant The portal-assistant voice gateway needs a Claude that is conversational, free (on the cluster subscription, no metered API), and safe to sit behind a public edge. Add POST /v1/conversational: it drives a new no-tools `conversational` agent with per-conversation --resume so a voice turn keeps context, and is lean on purpose — no workspace clone, no tools, and crucially NO --dangerously-skip-permissions (so even a leaked agent can't execute anything). This is deliberately NOT /v1/chat/completions, which clones the git-crypt infra repo and runs a Bash-enabled agent per turn (portal-assistant ADR-0002). The conversational agent replies in the speaker's language (Bulgarian/English), short and TTS-friendly. Tests cover the argv builder (new vs resume), the happy path, multi-turn resume across calls, auth, and failure → 503. Full suite green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 18:38:44 +00:00
Viktor Barzin	e34640cc47	afk: wire the T3 adapter to the REAL orchestration contract + fix priority The T3 dispatch adapter was written against a guessed wire shape that the test fake accepted but the live t3-afk server 400s — so the previously-green suite did NOT mean the loop was actually wired to T3. Reverse-engineered the real contract from the v0.0.27 binary, verified it live against t3-afk (including multi-turn), and rewrote the adapter to match: - dispatch sends BARE commands keyed by `type` (not a `command` string), with client-minted threadId/commandId/messageId + createdAt; the server replies {sequence}, so dispatch returns the id it generated (never one parsed back). - a thread lives in a project (workspaceRoot = the repo checkout the agent runs in), so dispatch ensures the repo's project (snapshot -> project.create iff absent) before thread.create + thread.turn.start. - add send_turn() for follow-up turns on an existing thread — multi-turn context retention is verified live (turn 2 recalled turn 1). - watcher reads thread liveness from latestTurn.state (completed->idle, running/in_progress/pending->running, errored->error), not a non-existent top-level `status` field. Guard against recurrence: the test fake now REJECTS any command lacking a `type` discriminator (the original bug fails loudly), plus an opt-in live smoke test (tests/test_afk_t3_live.py) so "green" can mean "wired to T3". Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching tracker conventions and Issue.priority's own docstring — it had deliberately diverged to higher-first. Loop still ships DISABLED (kill switch on, empty allowlist). 416 tests pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 22:27:00 +00:00
Viktor Barzin	2ef0db9a96	afk: add the autonomous issue-implementer loop (SHIPS DISABLED) Adds app/afk/ — the "away-from-keyboard" control plane that watches the issue tracker for ready-for-agent issues, dispatches each to a fresh full-access T3 thread (with the issue-implementer preamble prepended, because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed, escalating or fix-forwarding via a small pure state machine. The loop is split into pure cores (no I/O, exhaustively unit-tested) and thin injected adapters (the only edges that ever touch T3, the tracker, CI, or Slack — faked in every test, so nothing here talks to a real server, GitHub/Forgejo, or the cluster): pure: types, dispatch_policy, run_state_machine, phase_checklist, config, issue_implementer_prompt adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher, notifier loops: poller — CronJob tick #1: list_ready -> select_dispatchable -> dispatch + stamp the in-progress lock (label only AFTER a successful dispatch, so a failed dispatch never leaves a phantom lock). Per-repo lock derived from the ready set, since the CronJob is stateless between ticks. watcher — CronJob tick #2: assemble RunState from snapshot + CI -> next_action -> act (close on success; relabel ready-for-human + ring the doorbell on the two escalations; dispatch a corrective turn on fix-forward; refresh the progress checklist). SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an empty allowlist, so a freshly-loaded config dispatches nothing and does zero I/O. The package is not imported by the running service and has no auto-enable path. Arming it is a deliberate, later, manual step requiring BOTH gates (clear the kill switch AND enrol the exact repos) so one fat-fingered env var can't arm every repo. Test-first throughout: 412 tests pass (poller + watcher add integration tests wiring the real pure cores to in-memory fakes). mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 21:15:11 +00:00
Viktor Barzin	5b5daa4bea	breakglass UI v2: attachable sessions (tmux model) + mobile-first redesign Full audit-driven rework. Keeps the proven SSE-translation + verb logic; everything else upgraded for phone-primary use. Backend — server owns the session, clients attach (Viktor's tmux idea): - session.py: SessionManager + Session with an event log, subscriber pub/sub, and turns that run DETACHED (keep going if the client disconnects). - GET /api/session/{id}/stream = attach (SSE): replays the transcript then tails live; per-event id: lines so an EventSource auto-reconnect resumes from Last-Event-ID (free re-attach). POST /{id}/prompt starts a detached turn; POST /{id}/cancel = Stop. Replaces the old one-shot /api/chat. - agent_session trimmed to the argv + translate_event helpers; 21 new/updated tests (replay, Last-Event-ID resume, broadcast, detached turn, resume, cancel, routes) — 53 green. Frontend — mobile-first via the frontend-design skill (emergency-console aesthetic): - EventSource attach (native auto-reconnect, zero client reconnect logic); transcript.js folds events->messages with id-dedupe so replays never double-render (30 unit assertions). - Installable PWA: manifest + icons (wrench/break-glass mark) + apple-mobile-web-app meta + theme-color; viewport-fit=cover + safe-area; 100dvh; 16px composer (no iOS zoom). - One-tap diagnosis presets (Triage / Memory-OOM / Disk / Services / QEMU-wedged) mapped to the devvm's real failure modes; Stop button while a turn runs. - Foldable VM-control sheet, cycle the dominant recovery action w/ confirm, output capped 46vh. - a11y: fixed --ink-faint contrast 3.6:1 -> 6.1:1 (WCAG AA); >=44px tap targets. Deleted the obsolete fetch-reader sse.js (EventSource replaces it). Verified: 53 backend tests + 30 transcript assertions; Playwright @390x844 (input on-screen y=721-821, presets/sheet/fold/cap); local integration smoke vs the real backend (attach->caught-up, 404, verbs, PWA served). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-14 19:19:03 +00:00
Viktor Barzin	0e45445341	breakglass UI: foldable control sections for small screens Viktor: "make screens foldable so they can be viewed on small screens." The VM-control sheet packed Inspect + 4 power buttons + a long output dump into one scroll on a phone. Made the dense sections collapsible with native <details>/<summary> (zero-JS, accessible): - Inspect and Power are foldable groups, open by default (nothing important hidden), tap the caret header to collapse the one you are not using. - Command output (e.g. a long forensics dump) is a foldable block; its <pre> is capped at 46vh with internal scroll so it never runs off the page. Verified via Playwright at 390x844: tapping Power collapses it to its header; the forensics output folds and scrolls within a bounded box. Works on desktop too (side panel stays expanded). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:21:01 +00:00
Viktor Barzin	aa054cac3f	breakglass UI: mobile-first rework (chat input was hidden on phones) Mobile is the primary client for fixing the devvm, but the first cut was desktop-first and the chat input was unreachable on a phone: - Root cause: the shell used height:100%/100vh, so on mobile browsers the composer at the bottom sat behind the address/tool bar — you saw the VM buttons but no place to type. Switched #app to 100dvh (dynamic viewport height) with a 100vh fallback; body no longer scrolls (chat scrolls internally), killing iOS rubber-banding. - Layout is now mobile-first single-column: the chat fills the screen with the composer pinned at the bottom and always visible. The VM power controls moved into a slide-up bottom sheet behind a compact "⚡ VM" header button (backdrop + close + grab handle). At ≥900px the sheet becomes a static side column again and the toggle is hidden — desktop unchanged. - Touch targets ≥40px; composer textarea bumped to 16px so iOS Safari doesn't auto-zoom on focus (which itself shoved the composer out of view). Verified at 390×844 (iPhone) and 1280×800 via Playwright: input box renders at y=723–821 (inside the 844 viewport), sheet slides in on tap, desktop keeps the 2-column side panel. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:11:46 +00:00
Viktor Barzin	4f361d91eb	breakglass: in-cluster emergency-recovery UI for the devvm Viktor wanted a web UI on the claude service to act as his breakglass when the devvm is down: open it, have Claude SSH in to diagnose/repair, and power-cycle the VM via the Proxmox host if needed. This is the app half (the infra stack + host bootstrap live in the infra repo). New, ISOLATED ASGI app under app/breakglass/ (never imports app.main, so the untrusted-input agents — recruiter-triage, nextcloud-todos — can't share a process with the root-on-devvm / PVE-reset SSH key): - pve.py: the LLM-independent power-verb path (status|forensics|reset|stop|start|cycle on VM 102), whitelist-validated client-side, executed over the forced-command SSH key (list argv, no shell). - agent_session.py: multi-turn streamed chat — claude -p --session-id / --resume with --output-format stream-json, translated to a small SSE vocabulary (session/text/tool/result/error/done). - auth.py: edge Authentik header OR bearer; fail-closed. - server.py: FastAPI (session/chat-SSE/pve-verb routes) + serves the Svelte UI. - Svelte SPA (frontend/, built into app/breakglass/static/ and committed — no in-cluster build, per ADR-0002): streamed chat + danger-styled manual VM controls with confirm-on-mutate. - agents/breakglass.md: narrow tools (Bash/Read/Grep/Glob, no web), taught the ssh devvm / ssh pve aliases and cycle-vs-reset. - docker-entrypoint-breakglass.sh: ssh-agent bootstrap from the mounted key + ssh aliases, then uvicorn app.breakglass.server. The breakglass Deployment overrides the image CMD with this; the existing service is untouched. 26 new tests (verb whitelist incl. injection attempts, stream-json→SSE translation, auth gating, route behaviour); full suite 58 green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:36:05 +00:00
Viktor Barzin	66104a32ab	parallel execution: replace single-flight lock with bounded semaphore + per-job workspace Multiple agent calls now run concurrently, each in its own isolated git checkout (local clone of the warm base, hardlinked objects, git-crypt re-unlocked), so concurrent jobs never share a working tree. - execution_lock (asyncio.Lock) -> execution_semaphore (default MAX_CONCURRENCY=10); excess calls queue FIFO instead of 409/503. MAX_QUEUE_DEPTH safety valve. - /execute never returns 409; jobs go queued -> running. Timeout covers execution only, not queue wait. - /v1/chat/completions queues for a slot instead of 503-busy. - /health: busy = at-capacity, plus active/queued/capacity fields. - per-job workspace prepare/cleanup under a short git lock; the agent run holds none. - in-memory job registry evicted past JOB_TTL_SECONDS. Design: docs/2026-06-02-parallel-execution-design.md Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 20:57:41 +00:00
Viktor Barzin	add15325bb	openai-compat: tolerate legacy date-suffixed model names during transition	2026-06-01 21:59:50 +00:00
Viktor Barzin	1132777705	openai-compat: use bare model aliases (haiku/sonnet/opus) to auto-roll forward	2026-06-01 19:55:19 +00:00
Viktor Barzin	7baa66d994	openai-compat: pass --model from request through to claude -p Replaces the MODEL_TO_AGENT dict (which only mapped model -> agent and ignored the model itself) with a SUPPORTED_MODELS allowlist + per-request --model CLI flag. Callers can now pick Haiku/Sonnet/Opus per request to control cost; unknown model IDs 400 with the supported list; missing model defaults to claude-sonnet-4-6 (mid-tier). The --model CLI flag overrides whatever model: is in the agent's frontmatter, so recruiter-triage's `model: sonnet` no longer pins every request to Sonnet. Verified with claude CLI 2.1.153 that the bare-form IDs (claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7) are accepted without date suffixes — confirmed via modelUsage keys in the JSON output. Six new tests cover: default routing, haiku/sonnet/opus pass-through, unsupported-model 400 shape, and the response.model echo.	2026-06-01 19:33:54 +00:00
Viktor Barzin	07dcfca333	openai-compat: add /v1/chat/completions endpoint OpenAI-compatible chat completions endpoint so existing OpenAI-API clients (fire-planner's examples/llm_extract.py and others) can target this service without rewriting their client. Behaviour: - POST /v1/chat/completions accepts the OpenAI chat-completions request shape (model, messages, max_tokens?, temperature?, stream?). - Reuses the existing Bearer auth from /execute. - Synthesises a single prompt body from system+user messages ("System instructions:\n... --- Request:\n...") so the agent treats them as the user's request rather than seeing raw JSON. - Internally shares the execution path with /execute by extracting _invoke_claude_subprocess(). Holds execution_lock for the duration; returns 503 (not 409) when busy, since OpenAI callers have no job-id model to retry against. - Returns the OpenAI chat-completions envelope with the final assistant text extracted from `claude -p --output-format json` (falls back to raw stdout if parsing fails). - stream=true -> 400 {"error": "streaming not supported"}. - Underlying failure (non-zero exit, timeout, exception) -> 503 {"error": "execution failed", "detail": "<one line>"}. Model -> agent mapping is hardcoded to `recruiter-triage` for all models for v1 (broadest tool surface among current agents). Budget is hardcoded to $2.00/call; timeout 900s. Revisit when a true general-purpose agent lands. Tests: 9 new tests covering happy path, streaming rejection, missing auth, wrong token, job failure, empty messages, JSON-parse fallback, prompt synthesis, and busy-503. All 20 tests (11 existing + 9 new) pass; ruff clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-29 06:24:20 +00:00
Viktor Barzin	6fa60fdd1a	Initial extraction from monorepo	2026-05-07 17:07:12 +00:00
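
Commit 66104a32ab replaces the single-flight `execution_lock` with a bounded `execution_semaphore`. The pattern might look roughly like the sketch below, assuming an asyncio service. `BoundedExecutor`, `QueueFull`, and the queue-depth value of 100 are hypothetical, chosen only for illustration; the commit message itself supplies only `MAX_CONCURRENCY=10`, the existence of a `MAX_QUEUE_DEPTH` valve, the execution-only timeout, and the busy/active/queued/capacity health fields.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

MAX_CONCURRENCY = 10   # default named in the commit message
MAX_QUEUE_DEPTH = 100  # assumed value; the commit names the knob, not its default


class QueueFull(Exception):
    """Hypothetical error raised when too many callers are already waiting."""


class BoundedExecutor:
    """Illustrative stand-in for the service's semaphore-guarded execution path."""

    def __init__(self, max_concurrency: int = MAX_CONCURRENCY,
                 max_queue_depth: int = MAX_QUEUE_DEPTH) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrency)
        self._max_queue_depth = max_queue_depth
        self.capacity = max_concurrency
        self.active = 0
        self.queued = 0

    async def run(self, job: Callable[[], Awaitable[T]], timeout: float) -> T:
        # Safety valve: refuse new work once the wait queue is too deep.
        if self.queued >= self._max_queue_depth:
            raise QueueFull("queue depth exceeded")
        self.queued += 1
        try:
            # Excess callers wait here for a slot instead of being rejected.
            await self._semaphore.acquire()
        finally:
            self.queued -= 1
        self.active += 1
        try:
            # The timeout covers execution only, not the time spent queued.
            return await asyncio.wait_for(job(), timeout)
        finally:
            self.active -= 1
            self._semaphore.release()

    def health(self) -> dict:
        # "busy" means at capacity, mirroring the /health description.
        return {"busy": self.active >= self.capacity, "active": self.active,
                "queued": self.queued, "capacity": self.capacity}
```

Starting the timeout only after a slot is acquired means a job that waited a long time in the queue still gets its full execution budget, which matches the behaviour the commit describes.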

13 commits
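
Commits 2ef0db9a96 and e34640cc47 describe a dispatch policy that ships disabled behind two gates (kill switch and allowlist) and orders issues lower-priority-value-first. A minimal Python sketch of that selection step follows. The names `Config`, `kill_switch`, `Issue.priority`, and `select_dispatchable` appear in the commit messages, but the field layout, the `allowlist` type, and the tie-break on issue number are assumptions made for illustration, and the per-repo lock the poller also applies is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    repo: str
    number: int
    priority: int  # lower value is more urgent (P0 before P1)


@dataclass(frozen=True)
class Config:
    # Both defaults are "off": a freshly loaded config dispatches nothing.
    kill_switch: bool = True
    allowlist: frozenset[str] = frozenset()


def select_dispatchable(ready: list[Issue], config: Config) -> list[Issue]:
    """Pure selection: no I/O, so it can be unit-tested exhaustively."""
    # Gate 1: the kill switch stops everything.
    if config.kill_switch:
        return []
    # Gate 2: only repos that were explicitly enrolled are eligible.
    enrolled = [issue for issue in ready if issue.repo in config.allowlist]
    # Lower priority value first; issue number as an assumed tie-break.
    return sorted(enrolled, key=lambda issue: (issue.priority, issue.number))
```

Because arming requires both clearing the kill switch and enrolling a repo, flipping a single setting still selects nothing, which is the property the commit message calls out.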