claude-agent-service

viktor/claude-agent-service


Commit graph

Author	SHA1	Message	Date


Viktor Barzin

eccf0dd407

conversational: trim per-turn context to cut brain TTFT ~1.3s

Build and Push / lint-and-test (push) Has been cancelled
Build and Push / build (push) Has been cancelled
Build and Push / deploy (push) Has been cancelled
Build and Push / notify-failure (push) Has been cancelled

The no-tools conversational agent was dragging the full project context (this
repo's CLAUDE.md, the MCP server configs, local settings) plus the dynamic
system-prompt sections into every voice turn: ~45k input tokens and ~3.4s
time-to-first-token (TTFT), measured against the live pod on 2026-06-21.

Add --setting-sources user + --exclude-dynamic-system-prompt-sections to both
the gateway (json) and realtime (stream-json) conversational argvs: input
context drops to ~23k tokens and TTFT to ~2.1s (~1.3s faster per turn) with no
change to the reply. Helps both the portal-assistant v1 gateway AND the v2
realtime agent (both run the same conversational turn). The /execute agent
path is untouched.
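A minimal sketch of what the two conversational argvs might look like with the context-trimming flags applied. Only the flag names come from this commit and its sibling commits; the function name `conversational_argv`, the `realtime` switch and the position of the prompt are hypothetical choices for illustration, not the repo's actual builder.

```python
# Hypothetical sketch: flag names come from the commit messages; the function
# name, the `realtime` switch and the argv ordering are assumptions.
from typing import List, Optional

# Flags added to both conversational argvs to shrink per-turn context.
CONTEXT_TRIM_FLAGS = [
    "--setting-sources", "user",                 # load user-level settings only, not project/local
    "--exclude-dynamic-system-prompt-sections",  # drop the dynamic system-prompt sections
]


def conversational_argv(prompt: str, realtime: bool = False,
                        resume: Optional[str] = None) -> List[str]:
    """Build a `claude -p` argv for one conversational turn (sketch)."""
    argv = ["claude", "-p"]
    if realtime:
        # Realtime (v2) path: token-level stream events.
        argv += ["--output-format", "stream-json",
                 "--include-partial-messages", "--verbose"]
    else:
        # Gateway (v1) path: one JSON result per turn.
        argv += ["--output-format", "json"]
    if resume:
        argv += ["--resume", resume]
    argv += CONTEXT_TRIM_FLAGS
    # Deliberately no --dangerously-skip-permissions for the no-tools agent.
    return argv + [prompt]
```

Keeping the trim flags in one shared list means the json and stream-json paths cannot drift apart, which matters here because both the v1 gateway and the v2 realtime agent run the same turn.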

Investigation ruled out the assumed culprits: CLI startup is only ~0.5s, and a
warm prompt cache does NOT lower TTFT (turn 2 read all 45k tokens from cache,
yet TTFT was unchanged). The cost was the context size, not the process spawn.
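One way to reproduce a TTFT number like the ones above is to time the gap between spawning the CLI and the first text delta on its stream-json output. This is a sketch under assumptions: the event shape (`{"type": "stream_event", "event": {"type": "content_block_delta", "delta": {"text": ...}}}`) is assumed for `--include-partial-messages` output, and `first_token_latency` is a hypothetical helper, not part of the service.

```python
# Hypothetical sketch: measure time-to-first-token from a stream-json line
# stream. The event shape checked below is an assumption.
import json
import time
from typing import Callable, Iterable, Optional


def first_token_latency(lines: Iterable[str], started: float,
                        clock: Callable[[], float] = time.monotonic) -> Optional[float]:
    """Seconds from `started` until the first text delta, or None if none arrives."""
    for raw in lines:
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip blank or non-JSON lines
        if not isinstance(msg, dict) or msg.get("type") != "stream_event":
            continue
        event = msg.get("event")
        if not isinstance(event, dict) or event.get("type") != "content_block_delta":
            continue
        delta = event.get("delta")
        if isinstance(delta, dict) and delta.get("text"):
            return clock() - started
    return None
```

In use, record `t0 = time.monotonic()` just before `subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)` and pass `proc.stdout` as `lines`. Because `t0` includes process spawn, comparing it against a separately measured startup time separates spawn cost from context-size cost.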

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-21 18:00:21 +00:00

Viktor Barzin

a29bffdda3

chat-completions: stream conversational turns (SSE token relay) for realtime voice

Build and Push / lint-and-test (push) Has been cancelled
Build and Push / build (push) Has been cancelled
Build and Push / deploy (push) Has been cancelled
Build and Push / notify-failure (push) Has been cancelled

Adds stream=true support to POST /v1/chat/completions (it previously returned 400).
When streaming, it runs the no-tools `conversational` agent via
`claude -p --output-format stream-json --include-partial-messages --verbose`
and relays each content_block_delta as an OpenAI chat.completion.chunk SSE
event, ending with finish_reason=stop + [DONE]. Free CLI/subscription auth, no
tools, no API key.
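A minimal sketch of the relay: pull the text out of each `content_block_delta` line, wrap it in an OpenAI `chat.completion.chunk`, emit it as an SSE `data:` frame, then finish with a `finish_reason="stop"` chunk and `data: [DONE]`. The function names (`extract_delta_text`, `to_openai_chunk`, `relay_sse`), the chunk id and the model string are hypothetical; they only mirror the roles the commit assigns to `delta_text` and `openai_chunk`, and the stream-json event shape is an assumption.

```python
# Hypothetical sketch of relaying Claude stream-json deltas as OpenAI
# chat.completion.chunk SSE events. Names, chunk id and model are assumptions.
import json
import time
from typing import Iterable, Iterator, Optional


def extract_delta_text(line: str) -> Optional[str]:
    """Return the text of a content_block_delta stream event, else None."""
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict) or msg.get("type") != "stream_event":
        return None
    event = msg.get("event")
    if not isinstance(event, dict) or event.get("type") != "content_block_delta":
        return None
    delta = event.get("delta")
    return (delta.get("text") or None) if isinstance(delta, dict) else None


def to_openai_chunk(chunk_id: str, model: str, content: Optional[str],
                    finish_reason: Optional[str] = None) -> dict:
    """Shape one OpenAI-style streaming chunk; the final chunk has an empty delta."""
    delta = {"content": content} if content is not None else {}
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }


def relay_sse(lines: Iterable[str], chunk_id: str = "chatcmpl-sketch",
              model: str = "conversational") -> Iterator[str]:
    """Yield SSE frames: one per text delta, then a stop chunk and [DONE]."""
    for line in lines:
        text = extract_delta_text(line)
        if text:
            yield f"data: {json.dumps(to_openai_chunk(chunk_id, model, text))}\n\n"
    yield f"data: {json.dumps(to_openai_chunk(chunk_id, model, None, 'stop'))}\n\n"
    yield "data: [DONE]\n\n"
```

Keeping extraction and chunk shaping as pure functions leaves only `relay_sse` coupled to the subprocess stream, which is what makes a mocked-subprocess test of the endpoint straightforward.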

Stateless by design: the full message history is flattened into the prompt
(prior assistant turns kept), so an OpenAI-style client that re-sends history
each turn — e.g. Pipecat's OpenAILLMService — can stream from us directly. The
non-streaming path (recruiter-triage workspace agent) is unchanged.
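The flattening step can be sketched as turning an OpenAI-style `messages` list into one labelled transcript, keeping prior assistant turns so the stateless endpoint still sees the whole dialogue. The `System:/User:/Assistant:` layout and the name `flatten_chat_history` are assumptions for illustration; the commit only says history is flattened into the prompt via `synthesise_chat_prompt`.

```python
# Hypothetical sketch of flattening OpenAI-style chat history into one prompt.
# The transcript layout and function name are assumptions.
from typing import Dict, List

ROLE_LABELS = {"system": "System", "user": "User", "assistant": "Assistant"}


def flatten_chat_history(messages: List[Dict[str, object]]) -> str:
    parts = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            # OpenAI content-part arrays: keep only the text parts.
            content = "".join(p.get("text", "") for p in content
                              if isinstance(p, dict) and p.get("type") == "text")
        if not content:
            continue  # skip empty turns
        role = str(msg.get("role"))
        parts.append(f"{ROLE_LABELS.get(role, role.title())}: {content}")
    return "\n\n".join(parts)
```

Since a client like Pipecat's OpenAILLMService re-sends the full history each turn, the server needs no session store on this path; the trade-off is that the prompt grows with the conversation.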

This is phase 1 of the Pipecat realtime full-duplex voice-agent rebuild for
portal-assistant (continuous audio, VAD endpointing, barge-in, first words
within seconds). New pure helpers (stream_argv/delta_text/openai_chunk/
synthesise_chat_prompt) are unit-tested; the SSE endpoint has a mocked-subprocess
integration test. 429 tests passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-17 22:22:38 +00:00

Viktor Barzin

33ff0868c3

conversational: add no-tools multi-turn Brain endpoint for portal-assistant

The portal-assistant voice gateway needs a Claude that is conversational, free
(on the cluster subscription, no metered API), and safe to sit behind a public
edge. Add POST /v1/conversational: it drives a new no-tools `conversational`
agent with per-conversation --resume so each voice turn keeps the earlier
turns' context, and is lean
on purpose — no workspace clone, no tools, and crucially NO
--dangerously-skip-permissions (so even a leaked agent can't execute anything).
This is deliberately NOT /v1/chat/completions, which clones the git-crypt infra
repo and runs a Bash-enabled agent per turn (portal-assistant ADR-0002).
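A minimal sketch of per-conversation `--resume`: remember the CLI session id from each turn's JSON result and pass it back on the next turn for the same conversation. The in-memory dict, the function names and reading `session_id`/`result` from the `--output-format json` output are assumptions for illustration; the runner is injected so the argv builder (new vs resume) can be tested without spawning the CLI.

```python
# Hypothetical sketch of per-conversation --resume. The in-memory store,
# function names and the "session_id"/"result" JSON fields are assumptions.
import json
import subprocess
from typing import Callable, Dict, List, Optional

_sessions: Dict[str, str] = {}  # conversation_id -> Claude CLI session id


def turn_argv(message: str, session_id: Optional[str]) -> List[str]:
    argv = ["claude", "-p", "--output-format", "json"]
    if session_id:
        argv += ["--resume", session_id]  # continue the same conversation
    # No --dangerously-skip-permissions: the no-tools agent cannot execute anything.
    return argv + [message]


def run_turn(conversation_id: str, message: str,
             run: Callable[[List[str]], str]) -> str:
    argv = turn_argv(message, _sessions.get(conversation_id))
    result = json.loads(run(argv))
    if result.get("session_id"):
        # Store the latest id in case resuming hands back a new one.
        _sessions[conversation_id] = result["session_id"]
    return result.get("result", "")


def run_cli(argv: List[str]) -> str:
    """Real runner: spawn the CLI and return its stdout."""
    return subprocess.run(argv, capture_output=True, text=True,
                          check=True, timeout=120).stdout
```

An in-memory map is lost on pod restart; a real deployment would likely need a persistent store if conversations must survive restarts.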

The conversational agent replies in the speaker's language (Bulgarian/English),
short and TTS-friendly. Tests cover the argv builder (new vs resume), the happy
path, multi-turn resume across calls, auth, and failure → 503. Full suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-17 18:38:44 +00:00

3 commits