The portal-assistant voice gateway needs a Claude that is conversational, free
(on the cluster subscription, no metered API), and safe to sit behind a public
edge. Add POST /v1/conversational: it drives a new no-tools `conversational`
agent with per-conversation --resume so a voice turn keeps context, and is lean
on purpose — no workspace clone, no tools, and crucially NO
--dangerously-skip-permissions (so even a leaked agent can't execute anything).
This is deliberately NOT /v1/chat/completions, which clones the git-crypt infra
repo and runs a Bash-enabled agent per turn (portal-assistant ADR-0002).
The conversational agent replies in the speaker's language (Bulgarian/English),
short and TTS-friendly. Tests cover the argv builder (new vs resume), the happy
path, multi-turn resume across calls, auth, and failure → 503. Full suite green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:
- dispatch sends BARE commands keyed by `type` (not a `command` string), with
client-minted threadId/commandId/messageId + createdAt; the server replies
{sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
in), so dispatch ensures the repo's project (snapshot -> project.create iff
absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
running/in_progress/pending->running, errored->error), not a non-existent
top-level `status` field.
Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.
The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):
pure: types, dispatch_policy, run_state_machine, phase_checklist,
config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
notifier
loops: poller — CronJob tick #1: list_ready -> select_dispatchable
-> dispatch + stamp the in-progress lock (label only
AFTER a successful dispatch, so a failed dispatch
never leaves a phantom lock). Per-repo lock derived
from the ready set, since the CronJob is stateless
between ticks.
watcher — CronJob tick #2: assemble RunState from snapshot +
CI -> next_action -> act (close on success; relabel
ready-for-human + ring the doorbell on the two
escalations; dispatch a corrective turn on
fix-forward; refresh the progress checklist).
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor: "make screens foldable so they can be viewed on small screens." The VM-control sheet packed Inspect + 4 power buttons + a long output dump into one scroll on a phone. Made the dense sections collapsible with native <details>/<summary> (zero-JS, accessible):
- Inspect and Power are foldable groups, open by default (nothing important hidden), tap the caret header to collapse the one you are not using.
- Command output (e.g. a long forensics dump) is a foldable block; its <pre> is capped at 46vh with internal scroll so it never runs off the page.
Verified via Playwright at 390x844: tapping Power collapses it to its header; the forensics output folds and scrolls within a bounded box. Works on desktop too (side panel stays expanded).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mobile is the primary client for fixing the devvm, but the first cut was
desktop-first and the chat input was unreachable on a phone:
- Root cause: the shell used height:100%/100vh, so on mobile browsers the
composer at the bottom sat behind the address/tool bar — you saw the VM
buttons but no place to type. Switched #app to 100dvh (dynamic viewport
height) with a 100vh fallback; body no longer scrolls (chat scrolls
internally), killing iOS rubber-banding.
- Layout is now mobile-first single-column: the chat fills the screen with
the composer pinned at the bottom and always visible. The VM power controls
moved into a slide-up bottom sheet behind a compact "⚡ VM" header button
(backdrop + close + grab handle). At ≥900px the sheet becomes a static side
column again and the toggle is hidden — desktop unchanged.
- Touch targets ≥40px; composer textarea bumped to 16px so iOS Safari doesn't
auto-zoom on focus (which itself shoved the composer out of view).
Verified at 390×844 (iPhone) and 1280×800 via Playwright: input box renders at
y=723–821 (inside the 844 viewport), sheet slides in on tap, desktop keeps the
2-column side panel.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor wanted a web UI on the claude service to act as his breakglass when
the devvm is down: open it, have Claude SSH in to diagnose/repair, and
power-cycle the VM via the Proxmox host if needed. This is the app half
(the infra stack + host bootstrap live in the infra repo).
New, ISOLATED ASGI app under app/breakglass/ (never imports app.main, so the
untrusted-input agents — recruiter-triage, nextcloud-todos — can't share a
process with the root-on-devvm / PVE-reset SSH key):
- pve.py: the LLM-independent power-verb path (status|forensics|reset|stop|
start|cycle on VM 102), whitelist-validated client-side, executed over the
forced-command SSH key (list argv, no shell).
- agent_session.py: multi-turn streamed chat — claude -p --session-id /
--resume with --output-format stream-json, translated to a small SSE
vocabulary (session/text/tool/result/error/done).
- auth.py: edge Authentik header OR bearer; fail-closed.
- server.py: FastAPI (session/chat-SSE/pve-verb routes) + serves the Svelte UI.
- Svelte SPA (frontend/, built into app/breakglass/static/ and committed — no
in-cluster build, per ADR-0002): streamed chat + danger-styled manual VM
controls with confirm-on-mutate.
- agents/breakglass.md: narrow tools (Bash/Read/Grep/Glob, no web), taught the
ssh devvm / ssh pve aliases and cycle-vs-reset.
- docker-entrypoint-breakglass.sh: ssh-agent bootstrap from the mounted key +
ssh aliases, then uvicorn app.breakglass.server. The breakglass Deployment
overrides the image CMD with this; the existing service is untouched.
26 new tests (verb whitelist incl. injection attempts, stream-json→SSE
translation, auth gating, route behaviour); full suite 58 green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Multiple agent calls now run concurrently, each in its own isolated git
checkout (local clone of the warm base, hardlinked objects, git-crypt
re-unlocked), so concurrent jobs never share a working tree.
- execution_lock (asyncio.Lock) -> execution_semaphore (default MAX_CONCURRENCY=10);
excess calls queue FIFO instead of 409/503. MAX_QUEUE_DEPTH safety valve.
- /execute never returns 409; jobs go queued -> running. Timeout covers
execution only, not queue wait.
- /v1/chat/completions queues for a slot instead of 503-busy.
- /health: busy = at-capacity, plus active/queued/capacity fields.
- per-job workspace prepare/cleanup under a short git lock; the agent run holds none.
- in-memory job registry evicted past JOB_TTL_SECONDS.
Design: docs/2026-06-02-parallel-execution-design.md
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the MODEL_TO_AGENT dict (which only mapped model -> agent and
ignored the model itself) with a SUPPORTED_MODELS allowlist + per-request
--model CLI flag. Callers can now pick Haiku/Sonnet/Opus per request to
control cost; unknown model IDs 400 with the supported list; missing
model defaults to claude-sonnet-4-6 (mid-tier).
The --model CLI flag overrides whatever model: is in the agent's
frontmatter, so recruiter-triage's `model: sonnet` no longer pins
every request to Sonnet.
Verified with claude CLI 2.1.153 that the bare-form IDs
(claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7) are accepted
without date suffixes — confirmed via modelUsage keys in the JSON
output.
Six new tests cover: default routing, haiku/sonnet/opus pass-through,
unsupported-model 400 shape, and the response.model echo.
OpenAI-compatible chat completions endpoint so existing OpenAI-API
clients (fire-planner's examples/llm_extract.py and others) can target
this service without rewriting their client.
Behaviour:
- POST /v1/chat/completions accepts the OpenAI chat-completions request
shape (model, messages, max_tokens?, temperature?, stream?).
- Reuses the existing Bearer auth from /execute.
- Synthesises a single prompt body from system+user messages
("System instructions:\n... --- Request:\n...") so the agent treats
them as the user's request rather than seeing raw JSON.
- Internally shares the execution path with /execute by extracting
_invoke_claude_subprocess(). Holds execution_lock for the duration;
returns 503 (not 409) when busy, since OpenAI callers have no
job-id model to retry against.
- Returns the OpenAI chat-completions envelope with the final
assistant text extracted from `claude -p --output-format json`
(falls back to raw stdout if parsing fails).
- stream=true -> 400 {"error": "streaming not supported"}.
- Underlying failure (non-zero exit, timeout, exception) -> 503
{"error": "execution failed", "detail": "<one line>"}.
Model -> agent mapping is hardcoded to `recruiter-triage` for all
models for v1 (broadest tool surface among current agents). Budget
is hardcoded to $2.00/call; timeout 900s. Revisit when a true
general-purpose agent lands.
Tests: 9 new tests covering happy path, streaming rejection, missing
auth, wrong token, job failure, empty messages, JSON-parse fallback,
prompt synthesis, and busy-503. All 20 tests (11 existing + 9 new)
pass; ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>