breakglass UI v2: attachable sessions (tmux model) + mobile-first redesign

Full audit-driven rework. Keeps the proven SSE-translation + verb logic; everything else upgraded for phone-primary use.

Backend — server owns the session, clients attach (Viktor's tmux idea):
- session.py: SessionManager + Session with an event log, subscriber pub/sub, and turns that run DETACHED (keep going if the client disconnects).
- GET /api/session/{id}/stream = attach (SSE): replays the transcript then tails live; per-event id: lines so an EventSource auto-reconnect resumes from Last-Event-ID (free re-attach). POST /{id}/prompt starts a detached turn; POST /{id}/cancel = Stop. Replaces the old one-shot /api/chat.
- agent_session trimmed to the argv + translate_event helpers; 21 new/updated tests (replay, Last-Event-ID resume, broadcast, detached turn, resume, cancel, routes) — 53 green.

Frontend — mobile-first via the frontend-design skill (emergency-console aesthetic):
- EventSource attach (native auto-reconnect, zero client reconnect logic); transcript.js folds events->messages with id-dedupe so replays never double-render (30 unit assertions).
- Installable PWA: manifest + icons (wrench/break-glass mark) + apple-mobile-web-app meta + theme-color; viewport-fit=cover + safe-area; 100dvh; 16px composer (no iOS zoom).
- One-tap diagnosis presets (Triage / Memory-OOM / Disk / Services / QEMU-wedged) mapped to the devvm's real failure modes; Stop button while a turn runs.
- Foldable VM-control sheet, cycle the dominant recovery action w/ confirm, output capped 46vh.
- a11y: fixed --ink-faint contrast 3.6:1 -> 6.1:1 (WCAG AA); >=44px tap targets. Deleted the obsolete fetch-reader sse.js (EventSource replaces it).

Verified: 53 backend tests + 30 transcript assertions; Playwright @390x844 (input on-screen y=721-821, presets/sheet/fold/cap); local integration smoke vs the real backend (attach->caught-up, 404, verbs, PWA served).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-14 19:19:03 +00:00
parent 9d8afdd884
commit 5b5daa4bea
30 changed files with 1961 additions and 968 deletions

View file

@ -1,26 +1,13 @@
"""Drive the breakglass Claude agent and stream its work to the browser.
"""Claude CLI argv + stream-json → UI-event translation for the breakglass agent.
Each chat turn runs ``claude -p --output-format stream-json`` in the session's
persistent workspace; the first turn opens the session with ``--session-id`` and
later turns ``--resume`` it, so the conversation has memory across turns. The
CLI's JSON events are translated to a small, stable SSE vocabulary the UI
renders (``session`` / ``text`` / ``tool`` / ``result`` / ``error``) we do not
leak the raw event firehose to the client.
Subprocesses use ``asyncio.create_subprocess_exec`` (list argv, no shell): the
prompt and ids are argv elements, never interpreted by a shell.
The session lifecycle (running turns, attaching clients) lives in ``session.py``;
this module is just the two helpers it builds on:
* ``_turn_argv`` the no-shell list argv for one ``claude -p`` turn.
* ``translate_event`` map a raw stream-json event to the small UI vocabulary
(session / text / tool / result), dropping the hook/thinking-token noise.
"""
import asyncio
import json
import os
from subprocess import PIPE
from typing import AsyncIterator
from . import config
# Sessions we've already opened (so the next turn resumes instead of re-creating).
_started: set[str] = set()
def _turn_argv(session_id: str, prompt: str, resume: bool, model: str) -> list[str]:
argv = [
@ -66,7 +53,7 @@ def translate_event(obj: dict) -> dict | None:
})
if not events:
return None
# The server flattens a "batch" into individual SSE frames.
# The session log flattens a "batch" into individual events.
return events[0] if len(events) == 1 else {"kind": "batch", "events": events}
if etype == "result":
@ -78,68 +65,3 @@ def translate_event(obj: dict) -> dict | None:
}
return None
async def run_turn(
session_id: str, prompt: str, model: str | None = None
) -> AsyncIterator[dict]:
"""Run one chat turn, yielding translated UI events as they arrive."""
resume = session_id in _started
model = model or config.DEFAULT_MODEL
workspace = os.path.join(config.SESSIONS_DIR, session_id)
os.makedirs(workspace, exist_ok=True)
argv = _turn_argv(session_id, prompt, resume, model)
proc = await asyncio.create_subprocess_exec(
*argv, cwd=workspace, stdout=PIPE, stderr=PIPE,
)
_started.add(session_id)
assert proc.stdout is not None and proc.stderr is not None
try:
async def _pump() -> AsyncIterator[dict]:
async for raw in proc.stdout:
line = raw.decode(errors="replace").strip()
if not line:
continue
try:
obj = json.loads(line)
except json.JSONDecodeError:
continue
ev = translate_event(obj)
if ev is None:
continue
if ev.get("kind") == "batch":
for sub in ev["events"]:
yield sub
else:
yield ev
async for ev in _with_timeout(_pump(), config.TURN_TIMEOUT_SECONDS):
yield ev
except asyncio.TimeoutError:
proc.kill()
await proc.wait()
yield {"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"}
return
await proc.wait()
if proc.returncode not in (0, None):
err = (await proc.stderr.read()).decode(errors="replace")
yield {"kind": "error", "error": err.strip()[:500] or f"exit {proc.returncode}"}
async def _with_timeout(agen: AsyncIterator[dict], timeout: float) -> AsyncIterator[dict]:
"""Yield from an async generator but raise TimeoutError if the WHOLE turn
exceeds ``timeout`` seconds (a wedged agent shouldn't stream forever)."""
loop = asyncio.get_event_loop()
deadline = loop.time() + timeout
it = agen.__aiter__()
while True:
remaining = deadline - loop.time()
if remaining <= 0:
raise asyncio.TimeoutError
try:
yield await asyncio.wait_for(it.__anext__(), timeout=remaining)
except StopAsyncIteration:
return

View file

@ -25,6 +25,9 @@ MAX_CONCURRENT_TURNS = int(os.environ.get("BREAKGLASS_MAX_CONCURRENT_TURNS", "2"
TURN_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_TURN_TIMEOUT_SECONDS", "1800"))
# A single PVE power verb must return fast; a wedged host shouldn't hang the UI.
PVE_VERB_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_PVE_VERB_TIMEOUT_SECONDS", "120"))
# How long an idle attach stream waits before emitting an SSE keepalive comment
# (keeps proxies/CDN from closing the long-lived connection).
SSE_KEEPALIVE_SECONDS = int(os.environ.get("BREAKGLASS_SSE_KEEPALIVE_SECONDS", "20"))
# Auth. The app sits behind the ingress `auth = "required"` resilience proxy
# (Authentik SSO, basic-auth fallback when Authentik is down). We additionally

View file

@ -1,38 +1,44 @@
"""Breakglass FastAPI app — the in-cluster emergency recovery UI.
The chat uses the tmux/attach model (see session.py): the server owns the
conversation; clients attach over SSE and the turn keeps running if they
disconnect.
Routes:
GET /health liveness (no auth)
GET / the single-page UI (static)
POST /api/session open a chat session, returns {session_id}
POST /api/chat run one turn, streams SSE events (text/tool/result)
POST /api/pve/{verb} LLM-independent PVE power verb (manual buttons)
GET /api/pve/verbs list allowed verbs + which mutate
GET /health liveness (no auth)
GET / the single-page UI (static)
POST /api/session create a session, returns {session_id}
GET /api/session/{id}/stream ATTACH (SSE): replay + live tail
POST /api/session/{id}/prompt run a turn (detached; survives disconnect)
POST /api/session/{id}/cancel stop the in-flight turn
GET /api/pve/verbs list allowed verbs + which mutate
POST /api/pve/{verb} LLM-independent PVE power verb (buttons)
Everything under /api requires auth (edge Authentik header or bearer token).
"""
import json
import os
import uuid
from fastapi import Depends, FastAPI, HTTPException
from fastapi import Depends, FastAPI, Header, HTTPException
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel, Field
from . import agent_session, config, pve
from . import config, pve
from .auth import require_auth
from .session import SessionManager, attach_stream
app = FastAPI(title="Claude Breakglass")
_STATIC_DIR = os.path.join(os.path.dirname(__file__), "static")
manager = SessionManager()
class SessionResponse(BaseModel):
session_id: str
class ChatRequest(BaseModel):
session_id: str
class PromptRequest(BaseModel):
prompt: str = Field(..., min_length=1)
model: str | None = None
@ -44,30 +50,53 @@ async def health():
@app.post("/api/session", response_model=SessionResponse)
async def open_session(_identity: str = Depends(require_auth)):
# Claude wants a UUID for --session-id.
return SessionResponse(session_id=str(uuid.uuid4()))
return SessionResponse(session_id=manager.create().id)
@app.post("/api/chat")
async def chat(req: ChatRequest, _identity: str = Depends(require_auth)):
"""Stream one chat turn as Server-Sent Events. The browser reads the
response body incrementally (fetch + ReadableStream)."""
async def _sse():
try:
async for ev in agent_session.run_turn(req.session_id, req.prompt, req.model):
yield f"data: {json.dumps(ev)}\n\n"
except Exception as exc: # noqa: BLE001 — surface any failure to the UI
yield f"data: {json.dumps({'kind': 'error', 'error': str(exc)[:500]})}\n\n"
yield f"data: {json.dumps({'kind': 'done'})}\n\n"
@app.get("/api/session/{session_id}/stream")
async def attach(
session_id: str,
_identity: str = Depends(require_auth),
last_event_id: str | None = Header(default=None, alias="Last-Event-ID"),
):
"""Attach to a session (SSE). Replays the conversation so far, then tails
live. On an EventSource auto-reconnect the browser sends Last-Event-ID, so we
replay only what was missed."""
session = manager.get(session_id)
if session is None:
raise HTTPException(status_code=404, detail="session not found")
try:
leid = int(last_event_id) if last_event_id is not None else None
except ValueError:
leid = None
return StreamingResponse(
_sse(),
attach_stream(session, leid),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no", "Connection": "keep-alive"},
)
@app.post("/api/session/{session_id}/prompt")
async def prompt(session_id: str, req: PromptRequest, _identity: str = Depends(require_auth)):
"""Start a turn. It runs DETACHED (keeps going if the client disconnects);
output is delivered via the attach stream, not this response."""
session = manager.get(session_id)
if session is None:
raise HTTPException(status_code=404, detail="session not found")
if not session.start_turn(req.prompt, req.model):
raise HTTPException(status_code=409, detail="a turn is already running")
return {"status": "started"}
@app.post("/api/session/{session_id}/cancel")
async def cancel(session_id: str, _identity: str = Depends(require_auth)):
session = manager.get(session_id)
if session is None:
raise HTTPException(status_code=404, detail="session not found")
cancelled = await session.cancel()
return {"cancelled": cancelled}
@app.get("/api/pve/verbs")
async def pve_verbs(_identity: str = Depends(require_auth)):
return {

201
app/breakglass/session.py Normal file
View file

@ -0,0 +1,201 @@
"""Attachable server-side sessions — the tmux model for the breakglass chat.
Instead of the client owning conversation state, the SERVER owns it and clients
*attach*. A turn runs as a detached task that keeps going if the client
disconnects (you can background the phone / hit a tunnel blip and the agent
keeps working); its output is appended to a per-session event log and broadcast
to every attached subscriber. A client attaches over SSE, gets the log replayed
(or only the part it missed, via Last-Event-ID), then tails live exactly like
re-attaching to a tmux session. ``EventSource`` reconnects natively, so the
"re-attach" needs zero client logic.
This module owns the lifecycle; ``agent_session`` still provides the claude
argv + the stream-jsonUI-event translation (all subprocesses use the no-shell
list-argv form), and ``config`` the knobs.
"""
import asyncio
import json
import os
import uuid
from subprocess import PIPE
from typing import AsyncIterator
from . import agent_session, config
class Session:
"""One conversation. Owns the replay log + live subscribers + the in-flight
turn. The claude ``session_id`` is reused with ``--resume`` so the agent
keeps its own context across turns."""
def __init__(self, session_id: str):
self.id = session_id
# The replay log: every UI event, in order. Index in the list IS the
# SSE event id, so a reconnecting client replays only what it missed.
self.events: list[dict] = []
self._subscribers: set[asyncio.Queue] = set()
self._turn: asyncio.Task | None = None
self._proc: asyncio.subprocess.Process | None = None
self._started = False # has claude opened this session id yet?
# ── event log + fan-out ────────────────────────────────────────────────
def add_event(self, event: dict) -> dict:
"""Append an event to the log and broadcast it to attached clients."""
stored = {**event, "id": len(self.events)}
self.events.append(stored)
for q in list(self._subscribers):
q.put_nowait(stored)
return stored
def subscribe(self) -> asyncio.Queue:
q: asyncio.Queue = asyncio.Queue()
self._subscribers.add(q)
return q
def unsubscribe(self, q: asyncio.Queue) -> None:
self._subscribers.discard(q)
@property
def turn_active(self) -> bool:
return self._turn is not None and not self._turn.done()
# ── running a turn (detached from any client) ──────────────────────────
def start_turn(self, prompt: str, model: str | None = None) -> bool:
"""Kick off a turn as a background task. Returns False if one is already
running (one turn at a time per session)."""
if self.turn_active:
return False
self.add_event({"kind": "user", "text": prompt})
self._turn = asyncio.create_task(self._run_turn(prompt, model))
return True
async def _run_turn(self, prompt: str, model: str | None) -> None:
model = model or config.DEFAULT_MODEL
resume = self._started
argv = agent_session._turn_argv(self.id, prompt, resume, model)
try:
self._proc = await asyncio.create_subprocess_exec(
*argv, cwd=_workspace_for(self.id), stdout=PIPE, stderr=PIPE,
)
except Exception as exc: # noqa: BLE001
self.add_event({"kind": "error", "error": f"could not start agent: {exc}"})
self.add_event({"kind": "turn_end"})
return
self._started = True
assert self._proc.stdout is not None and self._proc.stderr is not None
try:
async def _pump():
async for raw in self._proc.stdout:
line = raw.decode(errors="replace").strip()
if not line:
continue
try:
obj = json.loads(line)
except json.JSONDecodeError:
continue
ev = agent_session.translate_event(obj)
if ev is None:
continue
if ev.get("kind") == "batch":
for sub in ev["events"]:
self.add_event(sub)
else:
self.add_event(ev)
await asyncio.wait_for(_pump(), timeout=config.TURN_TIMEOUT_SECONDS)
await self._proc.wait()
if self._proc.returncode not in (0, None):
err = (await self._proc.stderr.read()).decode(errors="replace")
self.add_event({"kind": "error", "error": err.strip()[:500] or f"exit {self._proc.returncode}"})
except asyncio.TimeoutError:
await self._kill_proc()
self.add_event({"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"})
except asyncio.CancelledError:
await self._kill_proc()
self.add_event({"kind": "cancelled"})
raise
finally:
self._proc = None
self.add_event({"kind": "turn_end"})
async def _kill_proc(self) -> None:
if self._proc and self._proc.returncode is None:
try:
self._proc.kill()
await self._proc.wait()
except ProcessLookupError:
pass
async def cancel(self) -> bool:
"""Stop the in-flight turn. Returns True if a turn was cancelled."""
if not self.turn_active:
return False
await self._kill_proc()
if self._turn:
self._turn.cancel()
try:
await self._turn
except (asyncio.CancelledError, Exception): # noqa: BLE001
pass
return True
def _workspace_for(session_id: str) -> str:
path = os.path.join(config.SESSIONS_DIR, session_id)
os.makedirs(path, exist_ok=True)
return path
class SessionManager:
"""Holds all live sessions. The breakglass is single-operator, so callers
typically reuse one persistent session; multiple are still supported."""
def __init__(self):
self.sessions: dict[str, Session] = {}
def create(self) -> Session:
sid = str(uuid.uuid4())
s = Session(sid)
self.sessions[sid] = s
return s
def get(self, session_id: str) -> Session | None:
return self.sessions.get(session_id)
def get_or_create(self, session_id: str | None) -> Session:
if session_id and session_id in self.sessions:
return self.sessions[session_id]
return self.create()
async def attach_stream(session: Session, last_event_id: int | None) -> AsyncIterator[str]:
"""Yield SSE frames for an attached client: first the replay (everything, or
only events after ``last_event_id`` on a reconnect), then live events as they
arrive. Each frame carries an ``id:`` so EventSource resumes precisely."""
q = session.subscribe()
try:
start = 0 if last_event_id is None else last_event_id + 1
backlog = session.events[start:]
for ev in backlog:
yield _sse_frame(ev)
# Tell the client the replay is done and it's now live.
yield "event: caught-up\ndata: {}\n\n"
seen = backlog[-1]["id"] if backlog else (last_event_id if last_event_id is not None else -1)
while True:
try:
ev = await asyncio.wait_for(q.get(), timeout=config.SSE_KEEPALIVE_SECONDS)
except asyncio.TimeoutError:
yield ": keepalive\n\n" # comment frame keeps the connection warm
continue
if ev["id"] <= seen:
continue
seen = ev["id"]
yield _sse_frame(ev)
finally:
session.unsubscribe(q)
def _sse_frame(event: dict) -> str:
return f"id: {event['id']}\ndata: {json.dumps(event)}\n\n"

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 48 KiB

View file

@ -0,0 +1,64 @@
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="devvm breakglass">
<defs>
<!-- layered near-black surface, matching the app theme -->
<radialGradient id="bg" cx="68%" cy="22%" r="92%">
<stop offset="0%" stop-color="#12303a"/>
<stop offset="42%" stop-color="#0b0f14"/>
<stop offset="100%" stop-color="#06080b"/>
</radialGradient>
<linearGradient id="steel" x1="0" y1="0" x2="1" y2="1">
<stop offset="0%" stop-color="#7df0f3"/>
<stop offset="55%" stop-color="#3dd1d6"/>
<stop offset="100%" stop-color="#1f6f72"/>
</linearGradient>
<filter id="glow" x="-40%" y="-40%" width="180%" height="180%">
<feGaussianBlur stdDeviation="7" result="b"/>
<feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge>
</filter>
</defs>
<!-- rounded-square field (safe for maskable: art kept within central ~80%) -->
<rect width="512" height="512" rx="112" fill="url(#bg)"/>
<rect x="6" y="6" width="500" height="500" rx="108" fill="none" stroke="#1c2530" stroke-width="3"/>
<!-- faint scanline texture -->
<g opacity="0.05" stroke="#ffffff" stroke-width="2">
<line x1="0" y1="148" x2="512" y2="148"/>
<line x1="0" y1="220" x2="512" y2="220"/>
<line x1="0" y1="292" x2="512" y2="292"/>
<line x1="0" y1="364" x2="512" y2="364"/>
</g>
<!-- fracture burst (amber): the "break the glass" radiating cracks -->
<g stroke="#f5b657" stroke-width="9" stroke-linecap="round" stroke-linejoin="round"
fill="none" opacity="0.92" filter="url(#glow)">
<path d="M256 256 L142 132"/>
<path d="M256 256 L120 250"/>
<path d="M256 256 L150 372"/>
<path d="M256 256 L372 380"/>
<path d="M256 256 L392 246"/>
<path d="M256 256 L360 138"/>
<!-- cross-cracks -->
<path d="M186 196 L150 250"/>
<path d="M210 320 L172 318" opacity="0.7"/>
<path d="M326 318 L356 350" opacity="0.7"/>
</g>
<!-- wrench, struck across the burst (cyan steel) -->
<g filter="url(#glow)">
<path fill="url(#steel)" stroke="#0e3133" stroke-width="6" stroke-linejoin="round"
d="M344 150
a62 62 0 0 0 -82 76
L150 338
a26 26 0 0 0 0 37
l11 11
a26 26 0 0 0 37 0
l112 -112
a62 62 0 0 0 76 -82
l-41 41
l-40 -11
l-11 -40
z"/>
<!-- handle highlight -->
<path d="M171 350 l128 -128" stroke="#bdf6f8" stroke-width="7" stroke-linecap="round" opacity="0.6"/>
</g>
</svg>

After

Width:  |  Height:  |  Size: 2.5 KiB

View file

@ -2,12 +2,31 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!-- viewport-fit=cover so the app paints edge-to-edge and we can honour the
notch/home-indicator via env(safe-area-inset-*). maximum-scale + no
user-scaling keeps the cockpit layout stable under stress on mobile. -->
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, viewport-fit=cover, maximum-scale=1.0"
/>
<meta name="color-scheme" content="dark" />
<meta name="robots" content="noindex, nofollow" />
<!-- PWA / installable. theme-color tints the mobile status bar to the dark
theme; black-translucent lets the app draw under the iOS status bar. -->
<meta name="theme-color" content="#06080b" />
<link rel="manifest" href="./manifest.webmanifest" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="apple-mobile-web-app-title" content="breakglass" />
<link rel="apple-touch-icon" href="./apple-touch-icon.png" />
<link rel="icon" type="image/svg+xml" href="./icon.svg" />
<link rel="icon" type="image/png" sizes="192x192" href="./icon-192.png" />
<title>devvm breakglass</title>
<script type="module" crossorigin src="./assets/index-DjaW81Sq.js"></script>
<link rel="stylesheet" crossorigin href="./assets/index-DWHIP1Zw.css">
<script type="module" crossorigin src="./assets/index-CLbKo1Yx.js"></script>
<link rel="stylesheet" crossorigin href="./assets/index-BoWC1Onq.css">
</head>
<body>
<div id="app"></div>

View file

@ -0,0 +1,31 @@
{
"name": "devvm breakglass",
"short_name": "breakglass",
"description": "Emergency recovery console for the devvm — chat with a repair agent or power-cycle the VM directly.",
"start_url": "./",
"scope": "./",
"display": "standalone",
"orientation": "portrait",
"background_color": "#06080b",
"theme_color": "#06080b",
"icons": [
{
"src": "./icon.svg",
"type": "image/svg+xml",
"sizes": "any",
"purpose": "any maskable"
},
{
"src": "./icon-192.png",
"type": "image/png",
"sizes": "192x192",
"purpose": "any maskable"
},
{
"src": "./icon-512.png",
"type": "image/png",
"sizes": "512x512",
"purpose": "any maskable"
}
]
}