The no-tools conversational agent was dragging the full project context (this repo's CLAUDE.md, the MCP server configs, local settings) plus the dynamic system-prompt sections into every voice turn: ~45k input tokens -> ~3.4s time-to-first-token (TTFT), measured against the live pod, 2026-06-21.

Add --setting-sources user + --exclude-dynamic-system-prompt-sections to both the gateway (json) and realtime (stream-json) conversational argvs: context drops to ~23k tokens and TTFT to ~2.1s (~1.3s/turn faster) with no change to the reply. Helps the portal-assistant v1 gateway AND the v2 realtime agent (both run the same turn). The /execute agent path is untouched.

Investigation ruled out the assumed culprits: CLI startup is only ~0.5s, and a warm prompt cache does NOT lower TTFT (turn 2 read all 45k from cache yet TTFT was unchanged). The cost was the context size, not the spawn.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| Name |
|---|
| __init__.py |
| conftest.py |
| test_afk_ci_watcher.py |
| test_afk_dispatch_policy.py |
| test_afk_notifier.py |
| test_afk_phase_checklist.py |
| test_afk_poller.py |
| test_afk_run_state_machine.py |
| test_afk_t3_client.py |
| test_afk_t3_live.py |
| test_afk_tracker.py |
| test_afk_watcher.py |
| test_breakglass.py |
| test_concurrency.py |
| test_conversational.py |
| test_main.py |
| test_openai_compat.py |
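
The change described in the commit message amounts to appending two flags to both conversational argvs. A minimal sketch of what that could look like follows; it is not the repository's actual code. The helper name `build_conversational_argv`, the constants, the executable name `claude`, and the `--print` / `--output-format` arguments are all assumptions made for illustration. Only `--setting-sources user` and `--exclude-dynamic-system-prompt-sections` come from the commit message itself.

```python
from typing import List

# Assumption: the executable name is illustrative; the commit message does not name it.
CLI_BIN = "claude"

# The two flags named in the commit message. They keep project-level context
# (CLAUDE.md, MCP server configs, local settings) and the dynamic system-prompt
# sections out of the conversational turn.
LEAN_CONTEXT_FLAGS: List[str] = [
    "--setting-sources",
    "user",
    "--exclude-dynamic-system-prompt-sections",
]


def build_conversational_argv(output_format: str) -> List[str]:
    """Build the argv for a no-tools conversational turn (hypothetical helper).

    output_format is "json" for the gateway path or "stream-json" for the
    realtime path, as described in the commit message.
    """
    if output_format not in ("json", "stream-json"):
        raise ValueError(f"unsupported output format: {output_format!r}")
    # Assumption: --print and --output-format are how the turn is invoked here.
    return [CLI_BIN, "--print", "--output-format", output_format, *LEAN_CONTEXT_FLAGS]


gateway_argv = build_conversational_argv("json")
realtime_argv = build_conversational_argv("stream-json")
```

Keeping the flags in one shared list means the gateway and realtime paths cannot drift apart, which matches the commit's note that both run the same turn. A test in the style of `test_conversational.py` could then assert that both argvs contain the two flags.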