Commit graph

7 commits

Author SHA1 Message Date
Viktor Barzin
5b5daa4bea breakglass UI v2: attachable sessions (tmux model) + mobile-first redesign
Full audit-driven rework. Keeps the proven SSE-translation + verb logic; everything else upgraded for phone-primary use.

Backend — server owns the session, clients attach (Viktor's tmux idea):
- session.py: SessionManager + Session with an event log, subscriber pub/sub, and turns that run DETACHED (keep going if the client disconnects).
- GET /api/session/{id}/stream = attach (SSE): replays the transcript then tails live; per-event id: lines so an EventSource auto-reconnect resumes from Last-Event-ID (free re-attach). POST /{id}/prompt starts a detached turn; POST /{id}/cancel = Stop. Replaces the old one-shot /api/chat.
- agent_session trimmed to the argv + translate_event helpers; 21 new/updated tests (replay, Last-Event-ID resume, broadcast, detached turn, resume, cancel, routes) — 53 green.

Frontend — mobile-first via the frontend-design skill (emergency-console aesthetic):
- EventSource attach (native auto-reconnect, zero client reconnect logic); transcript.js folds events->messages with id-dedupe so replays never double-render (30 unit assertions).
- Installable PWA: manifest + icons (wrench/break-glass mark) + apple-mobile-web-app meta + theme-color; viewport-fit=cover + safe-area; 100dvh; 16px composer (no iOS zoom).
- One-tap diagnosis presets (Triage / Memory-OOM / Disk / Services / QEMU-wedged) mapped to the devvm's real failure modes; Stop button while a turn runs.
- Foldable VM-control sheet, cycle the dominant recovery action w/ confirm, output capped 46vh.
- a11y: fixed --ink-faint contrast 3.6:1 -> 6.1:1 (WCAG AA); >=44px tap targets. Deleted the obsolete fetch-reader sse.js (EventSource replaces it).

Verified: 53 backend tests + 30 transcript assertions; Playwright @390x844 (input on-screen y=721-821, presets/sheet/fold/cap); local integration smoke vs the real backend (attach->caught-up, 404, verbs, PWA served).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 19:19:03 +00:00
Viktor Barzin
4f361d91eb breakglass: in-cluster emergency-recovery UI for the devvm
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Viktor wanted a web UI on the claude service to act as his breakglass when
the devvm is down: open it, have Claude SSH in to diagnose/repair, and
power-cycle the VM via the Proxmox host if needed. This is the app half
(the infra stack + host bootstrap live in the infra repo).

New, ISOLATED ASGI app under app/breakglass/ (never imports app.main, so the
untrusted-input agents — recruiter-triage, nextcloud-todos — can't share a
process with the root-on-devvm / PVE-reset SSH key):
- pve.py: the LLM-independent power-verb path (status|forensics|reset|stop|
  start|cycle on VM 102), whitelist-validated client-side, executed over the
  forced-command SSH key (list argv, no shell).
- agent_session.py: multi-turn streamed chat — claude -p --session-id /
  --resume with --output-format stream-json, translated to a small SSE
  vocabulary (session/text/tool/result/error/done).
- auth.py: edge Authentik header OR bearer; fail-closed.
- server.py: FastAPI (session/chat-SSE/pve-verb routes) + serves the Svelte UI.
- Svelte SPA (frontend/, built into app/breakglass/static/ and committed — no
  in-cluster build, per ADR-0002): streamed chat + danger-styled manual VM
  controls with confirm-on-mutate.
- agents/breakglass.md: narrow tools (Bash/Read/Grep/Glob, no web), taught the
  ssh devvm / ssh pve aliases and cycle-vs-reset.
- docker-entrypoint-breakglass.sh: ssh-agent bootstrap from the mounted key +
  ssh aliases, then uvicorn app.breakglass.server. The breakglass Deployment
  overrides the image CMD with this; the existing service is untouched.

26 new tests (verb whitelist incl. injection attempts, stream-json→SSE
translation, auth gating, route behaviour); full suite 58 green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 21:36:05 +00:00
Viktor Barzin
66104a32ab parallel execution: replace single-flight lock with bounded semaphore + per-job workspace
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
Multiple agent calls now run concurrently, each in its own isolated git
checkout (local clone of the warm base, hardlinked objects, git-crypt
re-unlocked), so concurrent jobs never share a working tree.

- execution_lock (asyncio.Lock) -> execution_semaphore (default MAX_CONCURRENCY=10);
  excess calls queue FIFO instead of 409/503. MAX_QUEUE_DEPTH safety valve.
- /execute never returns 409; jobs go queued -> running. Timeout covers
  execution only, not queue wait.
- /v1/chat/completions queues for a slot instead of 503-busy.
- /health: busy = at-capacity, plus active/queued/capacity fields.
- per-job workspace prepare/cleanup under a short git lock; the agent run holds none.
- in-memory job registry evicted past JOB_TTL_SECONDS.

Design: docs/2026-06-02-parallel-execution-design.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:57:41 +00:00
Viktor Barzin
1132777705 openai-compat: use bare model aliases (haiku/sonnet/opus) to auto-roll forward
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
2026-06-01 19:55:19 +00:00
Viktor Barzin
7baa66d994 openai-compat: pass --model from request through to claude -p
Replaces the MODEL_TO_AGENT dict (which only mapped model -> agent and
ignored the model itself) with a SUPPORTED_MODELS allowlist + per-request
--model CLI flag. Callers can now pick Haiku/Sonnet/Opus per request to
control cost; unknown model IDs 400 with the supported list; missing
model defaults to claude-sonnet-4-6 (mid-tier).

The --model CLI flag overrides whatever model: is in the agent's
frontmatter, so recruiter-triage's `model: sonnet` no longer pins
every request to Sonnet.

Verified with claude CLI 2.1.153 that the bare-form IDs
(claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7) are accepted
without date suffixes — confirmed via modelUsage keys in the JSON
output.

Six new tests cover: default routing, haiku/sonnet/opus pass-through,
unsupported-model 400 shape, and the response.model echo.
2026-06-01 19:33:54 +00:00
Viktor Barzin
07dcfca333 openai-compat: add /v1/chat/completions endpoint
All checks were successful
ci/woodpecker/push/woodpecker Pipeline was successful
OpenAI-compatible chat completions endpoint so existing OpenAI-API
clients (fire-planner's examples/llm_extract.py and others) can target
this service without rewriting their client.

Behaviour:
- POST /v1/chat/completions accepts the OpenAI chat-completions request
  shape (model, messages, max_tokens?, temperature?, stream?).
- Reuses the existing Bearer auth from /execute.
- Synthesises a single prompt body from system+user messages
  ("System instructions:\n... --- Request:\n...") so the agent treats
  them as the user's request rather than seeing raw JSON.
- Internally shares the execution path with /execute by extracting
  _invoke_claude_subprocess(). Holds execution_lock for the duration;
  returns 503 (not 409) when busy, since OpenAI callers have no
  job-id model to retry against.
- Returns the OpenAI chat-completions envelope with the final
  assistant text extracted from `claude -p --output-format json`
  (falls back to raw stdout if parsing fails).
- stream=true -> 400 {"error": "streaming not supported"}.
- Underlying failure (non-zero exit, timeout, exception) -> 503
  {"error": "execution failed", "detail": "<one line>"}.

Model -> agent mapping is hardcoded to `recruiter-triage` for all
models for v1 (broadest tool surface among current agents). Budget
is hardcoded to $2.00/call; timeout 900s. Revisit when a true
general-purpose agent lands.

Tests: 9 new tests covering happy path, streaming rejection, missing
auth, wrong token, job failure, empty messages, JSON-parse fallback,
prompt synthesis, and busy-503. All 20 tests (11 existing + 9 new)
pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 06:24:20 +00:00
Viktor Barzin
6fa60fdd1a Initial extraction from monorepo 2026-05-07 17:07:12 +00:00