conversational: trim per-turn context to cut brain TTFT ~1.3s

The no-tools conversational agent was dragging the full project context (this
repo's CLAUDE.md, the MCP server configs, local settings) plus the dynamic
system-prompt sections into every voice turn — ~45k input tokens -> ~3.4s
time-to-first-token (measured against the live pod, 2026-06-21).

Add --setting-sources user + --exclude-dynamic-system-prompt-sections to both
the gateway (json) and realtime (stream-json) conversational argvs: context
drops to ~23k and TTFT to ~2.1s (~1.3s/turn faster) with no change to the
reply. Helps the portal-assistant v1 gateway AND the v2 realtime agent (both
run the same turn). The /execute agent path is untouched.
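
A minimal sketch of how the two conversational argvs might be assembled with
the lean-context flags. The flag names and values come from this commit and
its tests; the `claude` binary name, the `-p` flag, and the helper and
constant names are assumptions for illustration, not the repo's actual code.

```python
from typing import List

# Flags named in this commit; they trim project settings and the dynamic
# system-prompt sections off the no-tools voice turn.
LEAN_CONTEXT_FLAGS = [
    "--setting-sources", "user",
    "--exclude-dynamic-system-prompt-sections",
]


def build_conversational_argv(
    prompt: str,
    output_format: str = "json",  # "json" for the gateway, "stream-json" for realtime
    model: str = "sonnet",
) -> List[str]:
    """Hypothetical helper: build the CLI argv for one conversational turn."""
    # "claude" and "-p" are assumed here; the tests only check the flags below.
    argv = ["claude", "-p", "--model", model, "--output-format", output_format]
    if output_format == "stream-json":
        # The realtime test expects these two flags alongside stream-json.
        argv += ["--include-partial-messages", "--verbose"]
    argv += LEAN_CONTEXT_FLAGS
    argv.append(prompt)  # the prompt stays last, as both tests assert
    return argv
```

Sharing one flag list between the two paths keeps the gateway and realtime
argvs from drifting apart.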

Investigation ruled out the assumed culprits: CLI startup is only ~0.5s, and a
warm prompt cache does NOT lower TTFT (turn 2 read all 45k from cache yet TTFT
was unchanged). The cost was the context size, not the spawn or a cold cache.
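
One rough way to reproduce this kind of measurement is to time how long a
spawned process takes to emit its first stdout line. This is only a sketch
under that assumption, not the method used for the numbers above. The helper
name is hypothetical, and the figure it returns includes process startup, so
startup has to be measured separately to isolate TTFT.

```python
import subprocess
import sys
import time
from typing import List


def time_to_first_output(argv: List[str]) -> float:
    """Return seconds from spawn until the child writes its first stdout line."""
    start = time.monotonic()
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    try:
        assert proc.stdout is not None
        proc.stdout.readline()  # blocks until the first line arrives (or EOF)
        return time.monotonic() - start
    finally:
        # Only the first line matters here, so stop the child and reap it.
        proc.kill()
        proc.stdout.close()
        proc.wait()


if __name__ == "__main__":
    # Stand-in command; swap in the real conversational argv to compare runs.
    elapsed = time_to_first_output([sys.executable, "-c", "print('ready')"])
    print(f"first output after {elapsed:.2f}s")
```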

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-21 18:00:21 +00:00
parent a29bffdda3
commit eccf0dd407
2 changed files with 22 additions and 0 deletions


@@ -30,6 +30,10 @@ def test_conversational_argv_new_session():
     assert "--dangerously-skip-permissions" not in argv
     assert argv[argv.index("--model") + 1] == "sonnet"
     assert argv[argv.index("--output-format") + 1] == "json"
+    # latency: trims project CLAUDE.md/MCP + dynamic system-prompt sections off
+    # the no-tools voice turn (~45k -> ~23k input tokens, ~1.3s faster TTFT)
+    assert argv[argv.index("--setting-sources") + 1] == "user"
+    assert "--exclude-dynamic-system-prompt-sections" in argv
     assert argv[-1] == "Hi there"
@@ -189,6 +193,9 @@ def test_stream_argv_uses_stream_json_and_is_stateless():
     assert "--include-partial-messages" in argv
     assert "--verbose" in argv
     assert "--model" in argv and "sonnet" in argv
+    # latency: same lean-context trim as the gateway path
+    assert argv[argv.index("--setting-sources") + 1] == "user"
+    assert "--exclude-dynamic-system-prompt-sections" in argv
     assert argv[-1] == "hello"
     # stateless + no tools
     assert "--resume" not in argv and "--session-id" not in argv