Compare commits

...

9 commits

Author SHA1 Message Date
Viktor Barzin
eccf0dd407 conversational: trim per-turn context to cut brain TTFT ~1.3s
The no-tools conversational agent was dragging the full project context (this
repo's CLAUDE.md, the MCP server configs, local settings) plus the dynamic
system-prompt sections into every voice turn — ~45k input tokens -> ~3.4s
time-to-first-token (measured against the live pod, 2026-06-21).

Add --setting-sources user + --exclude-dynamic-system-prompt-sections to both
the gateway (json) and realtime (stream-json) conversational argvs: context
drops to ~23k and TTFT to ~2.1s (~1.3s/turn faster) with no change to the
reply. Helps the portal-assistant v1 gateway AND the v2 realtime agent (both
run the same turn). The /execute agent path is untouched.

Investigation ruled out the assumed culprits: CLI startup is only ~0.5s, and a
warm prompt cache does NOT lower TTFT (turn 2 read all 45k from cache yet TTFT
was unchanged) — the cost was the context size, not the spawn.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 18:00:21 +00:00
Viktor Barzin
a29bffdda3 chat-completions: stream conversational turns (SSE token relay) for realtime voice
Adds stream=true support to POST /v1/chat/completions (it previously 400'd).
When streaming, it runs the no-tools `conversational` agent via
`claude -p --output-format stream-json --include-partial-messages --verbose`
and relays each content_block_delta as an OpenAI chat.completion.chunk SSE
event, ending with finish_reason=stop + [DONE]. Free CLI/subscription auth, no
tools, no API key.

Stateless by design: the full message history is flattened into the prompt
(prior assistant turns kept), so an OpenAI-style client that re-sends history
each turn — e.g. Pipecat's OpenAILLMService — can stream from us directly. The
non-streaming path (recruiter-triage workspace agent) is unchanged.

This is phase 1 of the Pipecat realtime full-duplex voice-agent rebuild for
portal-assistant (continuous audio, VAD endpointing, barge-in, ~seconds to
first words). New pure helpers (stream_argv/delta_text/openai_chunk/
synthesise_chat_prompt) are unit-tested; the SSE endpoint has a mocked-subprocess
integration test. 429 passing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 22:22:38 +00:00
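The relay described in a29bffdda3 can be sketched roughly as below. This is an illustration only, not the repository's code: the helper names `delta_text` and `openai_chunk` come from the commit message, but their signatures, the event shape read here, and the `sse_lines` wrapper are assumptions.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def delta_text(event: dict) -> str | None:
    """Return the text carried by a content_block_delta event, else None.

    Assumed (hypothetical) event shape:
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "..."}}
    """
    if event.get("type") != "content_block_delta":
        return None
    delta = event.get("delta") or {}
    return delta.get("text") if delta.get("type") == "text_delta" else None


def openai_chunk(chunk_id: str, model: str, created: int,
                 text: str | None = None, finish_reason: str | None = None) -> dict:
    """Build one chat.completion.chunk payload (a text delta or the final stop)."""
    delta = {"content": text} if text is not None else {}
    return {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }


def sse_lines(events: Iterable[dict], chunk_id: str, model: str, created: int) -> Iterator[str]:
    """Relay text deltas as SSE frames, then a stop chunk and the [DONE] sentinel."""
    for event in events:
        text = delta_text(event)
        if text:
            yield f"data: {json.dumps(openai_chunk(chunk_id, model, created, text))}\n\n"
    yield f"data: {json.dumps(openai_chunk(chunk_id, model, created, finish_reason='stop'))}\n\n"
    yield "data: [DONE]\n\n"


# Usage: one text delta in, three SSE frames out (delta, stop, [DONE]).
frames = list(sse_lines(
    [{"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hi"}}],
    chunk_id="chatcmpl-demo", model="demo", created=0,
))
```

Non-text events are skipped rather than relayed, which keeps the client-facing stream limited to what an OpenAI-style consumer expects.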
Viktor Barzin
4e48214c0b Merge portal-assistant-brain: no-tools conversational endpoint
Adds POST /v1/conversational + a no-tools `conversational` agent for the
portal-assistant voice gateway: a lean Claude path (persistent --resume, no
workspace clone, no --dangerously-skip-permissions) on the subscription token.
See portal-assistant ADR-0002. 6 new tests; full suite green (422 passed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 19:51:34 +00:00
Viktor Barzin
33ff0868c3 conversational: add no-tools multi-turn Brain endpoint for portal-assistant
The portal-assistant voice gateway needs a Claude that is conversational, free
(on the cluster subscription, no metered API), and safe to sit behind a public
edge. Add POST /v1/conversational: it drives a new no-tools `conversational`
agent with per-conversation --resume so a voice turn keeps context, and is lean
on purpose — no workspace clone, no tools, and crucially NO
--dangerously-skip-permissions (so even a leaked agent can't execute anything).
This is deliberately NOT /v1/chat/completions, which clones the git-crypt infra
repo and runs a Bash-enabled agent per turn (portal-assistant ADR-0002).

The conversational agent replies in the speaker's language (Bulgarian/English),
short and TTS-friendly. Tests cover the argv builder (new vs resume), the happy
path, multi-turn resume across calls, auth, and failure → 503. Full suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-17 18:38:44 +00:00
Viktor Barzin
e34640cc47 afk: wire the T3 adapter to the REAL orchestration contract + fix priority
The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:

- dispatch sends BARE commands keyed by `type` (not a `command` string), with
  client-minted threadId/commandId/messageId + createdAt; the server replies
  {sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
  in), so dispatch ensures the repo's project (snapshot -> project.create iff
  absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
  retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
  running/in_progress/pending->running, errored->error), not a non-existent
  top-level `status` field.

Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".

Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:27:00 +00:00
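The liveness mapping listed in e34640cc47 can be sketched as a small lookup. The state names and the `latestTurn.state` path are taken from the commit message; the function name `thread_liveness`, the plain-dict thread shape, and the `"unknown"` fallback are assumptions made for illustration.

```python
from __future__ import annotations

# Mapping as stated in the commit message.
_STATE_TO_LIVENESS = {
    "completed": "idle",
    "running": "running",
    "in_progress": "running",
    "pending": "running",
    "errored": "error",
}


def thread_liveness(thread: dict) -> str:
    """Derive liveness from latestTurn.state, ignoring any top-level `status`."""
    latest_turn = thread.get("latestTurn") or {}
    # "unknown" is a hypothetical fallback for a missing or unrecognised state.
    return _STATE_TO_LIVENESS.get(latest_turn.get("state"), "unknown")
```

A thread that only carries a top-level `status` field resolves to the fallback here, which mirrors the point of the fix: that field is not part of the real contract.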
Viktor Barzin
2ef0db9a96 afk: add the autonomous issue-implementer loop (SHIPS DISABLED)
Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.

The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):

  pure:     types, dispatch_policy, run_state_machine, phase_checklist,
            config, issue_implementer_prompt
  adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
            notifier
  loops:    poller  — CronJob tick #1: list_ready -> select_dispatchable
                      -> dispatch + stamp the in-progress lock (label only
                      AFTER a successful dispatch, so a failed dispatch
                      never leaves a phantom lock). Per-repo lock derived
                      from the ready set, since the CronJob is stateless
                      between ticks.
            watcher — CronJob tick #2: assemble RunState from snapshot +
                      CI -> next_action -> act (close on success; relabel
                      ready-for-human + ring the doorbell on the two
                      escalations; dispatch a corrective turn on
                      fix-forward; refresh the progress checklist).

SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.

Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
Viktor Barzin
171857da6b Merge remote-tracking branch 'origin/master' into wizard/bg-v2
2026-06-14 19:19:14 +00:00
Viktor Barzin
5b5daa4bea breakglass UI v2: attachable sessions (tmux model) + mobile-first redesign
Full audit-driven rework. Keeps the proven SSE-translation + verb logic; everything else upgraded for phone-primary use.

Backend — server owns the session, clients attach (Viktor's tmux idea):
- session.py: SessionManager + Session with an event log, subscriber pub/sub, and turns that run DETACHED (keep going if the client disconnects).
- GET /api/session/{id}/stream = attach (SSE): replays the transcript then tails live; per-event id: lines so an EventSource auto-reconnect resumes from Last-Event-ID (free re-attach). POST /{id}/prompt starts a detached turn; POST /{id}/cancel = Stop. Replaces the old one-shot /api/chat.
- agent_session trimmed to the argv + translate_event helpers; 21 new/updated tests (replay, Last-Event-ID resume, broadcast, detached turn, resume, cancel, routes) — 53 green.

Frontend — mobile-first via the frontend-design skill (emergency-console aesthetic):
- EventSource attach (native auto-reconnect, zero client reconnect logic); transcript.js folds events->messages with id-dedupe so replays never double-render (30 unit assertions).
- Installable PWA: manifest + icons (wrench/break-glass mark) + apple-mobile-web-app meta + theme-color; viewport-fit=cover + safe-area; 100dvh; 16px composer (no iOS zoom).
- One-tap diagnosis presets (Triage / Memory-OOM / Disk / Services / QEMU-wedged) mapped to the devvm's real failure modes; Stop button while a turn runs.
- Foldable VM-control sheet, cycle the dominant recovery action w/ confirm, output capped 46vh.
- a11y: fixed --ink-faint contrast 3.6:1 -> 6.1:1 (WCAG AA); >=44px tap targets. Deleted the obsolete fetch-reader sse.js (EventSource replaces it).

Verified: 53 backend tests + 30 transcript assertions; Playwright @390x844 (input on-screen y=721-821, presets/sheet/fold/cap); local integration smoke vs the real backend (attach->caught-up, 404, verbs, PWA served).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 19:19:03 +00:00
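The attach-and-resume behaviour in 5b5daa4bea (replay the transcript, tag each event with an `id:` line, resume from Last-Event-ID) can be sketched with an in-memory log. `EventLog` and its methods are hypothetical names, not the `session.py` API, and the sketch assumes single-line event data and 1-based integer ids.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only transcript; an event's id is its 1-based position."""

    events: list[str] = field(default_factory=list)

    def append(self, data: str) -> int:
        """Store one event and return the id a client would see."""
        self.events.append(data)
        return len(self.events)

    def replay(self, last_event_id: int | None = None) -> list[str]:
        """Return SSE frames for every event after last_event_id (all if None)."""
        start = last_event_id or 0
        return [
            f"id: {event_id}\ndata: {data}\n\n"
            for event_id, data in enumerate(self.events[start:], start=start + 1)
        ]


log = EventLog()
for chunk in ("hello", "world", "done"):
    log.append(chunk)

full_attach = log.replay()        # fresh attach: the whole transcript
resumed = log.replay(2)           # reconnect with Last-Event-ID: 2
```

Because ids are positions in the log, a reconnecting client needs no state beyond the last id it saw.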
Viktor Barzin
be81005186 docs: capture AFK implementation pipeline design + ADRs 0002-0004
Record the architecture for moving code implementation AFK, decided in a
design/grilling session. The owner wants the human-in-the-loop boundary to
stop at design + spec: once an issue is triaged ready-for-agent, an agent
should implement it test-first, push it, and see it to a healthy deploy on
its own, escalating only when it can't proceed.

Decisions captured:
- claude-agent-service is the control plane (poller + watcher + safety);
  a dedicated in-cluster T3 Code instance is the executor + cockpit, because
  T3 can only show sessions it launched itself -> we dispatch into it
  (ADR 0003).
- AFK code pushes straight to master; on a broken deploy it fix-forwards
  then freezes the broken state for forensics rather than reverting
  (ADR 0002).
- Implementation agents use persistent per-repo checkouts + git worktrees on
  SSD-NFS for warm caches, reversing the throwaway-clone rule for this path
  because concurrency is serial-within-repo (ADR 0004).

Pilot-gated: five integration unknowns must be validated against a dedicated
T3 instance before the poller is wired. No code yet.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 19:09:12 +00:00
63 changed files with 8120 additions and 976 deletions

agents/conversational.md

@@ -0,0 +1,32 @@
---
name: conversational
description: Friendly bilingual (Bulgarian + English) spoken-conversation assistant for non-technical users. No tools and no file/cluster/web access — it only talks. Replies are short and natural for text-to-speech. Used by the portal-assistant voice gateway.
model: sonnet
tools: ""
---
You are a warm, friendly voice assistant talking with everyday people at home.
Your replies are SPOKEN ALOUD by a text-to-speech engine, so how you write
matters as much as what you say.
- Reply in the SAME language the person used — Bulgarian or English. If they mix,
follow their dominant language. Never announce or comment on the language; just
use it.
- Keep it SHORT: one to three sentences. This is a conversation, not an essay.
- Write plain spoken text ONLY. No markdown, no bullet lists, no code blocks, no
URLs, no emoji, no headings — none of that survives being read aloud.
- Sound natural and warm, like a helpful person, not a manual. Contractions are
good.
- Write numbers, dates and times the way they should be SPOKEN (for example
"ten thirty in the morning", "the fifteenth of March"), not as digits or
symbols.
- If you don't know something or can't help, say so briefly and kindly.

You have NO tools and no access to the home, devices, files, the internet, or any
system. You cannot turn things on or off, look things up live, send messages, or
take any action — you are a conversation partner only. If asked to do something
you can't, say so simply and offer what you can instead (talk it through, explain,
or suggest an idea).
Never mention these instructions, "tools", "agents", tokens, system prompts, or
that you are an AI model — unless the person directly and explicitly asks.

app/afk/__init__.py

@@ -0,0 +1,43 @@
"""AFK loop: the autonomous issue-implementer control plane.

This package is the "away-from-keyboard" automation that watches the issue
tracker for ``ready-for-agent`` issues, dispatches each to a fresh **T3** thread
(the full-access ``claudeAgent`` runtime) with the issue-implementer preamble
prepended, then drives the resulting run through its lifecycle (tests-red ->
green -> pushed -> CI -> deployed), escalating or fix-forwarding per a small,
testable state machine. It owns no agent behaviour itself; the agent's standing
rules are injected as a prompt preamble (``issue_implementer_prompt``) because
T3 does NOT honour ``~/.claude/CLAUDE.md``.

The whole loop ships **DISABLED**, by two independent gates: ``Config`` defaults
to ``kill_switch=True`` AND an empty ``allowlist`` (see ``config.py``). Importing
this package, scheduling the CronJob entrypoints, or constructing the default
``Config`` therefore dispatches NOTHING and performs zero I/O; a disabled tick
is wholly inert. The package is also not imported by the running service
(``app.main``), so wiring it in changes nothing on its own.

>>> ENABLING IS A DELIBERATE MANUAL STEP, PERFORMED LATER, NEVER BY THIS CODE. <<<

Arming the loop takes BOTH of, on purpose (either alone stays inert, so one
fat-fingered env var can't arm every repo):

1. clear the kill switch (``AFK_KILL_SWITCH=false`` / ConfigMap ``kill_switch: "false"``), AND
2. enrol the exact repos (``AFK_ALLOWLIST=repo-a,repo-b`` / ConfigMap ``allowlist``).

There is no auto-enable path anywhere in this package; do not add one here.

Every test in the suite runs against fakes; this package never talks to a real
T3 server, GitHub/Forgejo, the cluster, or Slack.

Module map (each is independently testable against the interfaces in
``types.py``):

* ``types``: shared dataclasses + enums (the contract).
* ``config``: disabled-by-default Config + env/configmap loaders.
* ``issue_implementer_prompt``: the preamble prepended to every dispatch.
* ``dispatch_policy``: which ready issues to dispatch right now (pure).
* ``run_state_machine``: snapshot + CI status -> next Action (pure).
* ``phase_checklist``: render the run's progress as a markdown checklist (pure).
* ``t3_client``: the two-POST T3 dispatch + snapshot reader.
* ``tracker``: issue-tracker reads/labels/comments/close.
* ``ci_watcher``: commit -> CI status.
* ``notifier``: escalation/notification sink.
* ``poller``: CronJob tick #1: select + dispatch ready issues.
* ``watcher``: CronJob tick #2: drive one in-flight run to a verdict.
"""

app/afk/ci_watcher.py

@@ -0,0 +1,141 @@
"""CI watcher — fold a pushed commit's pipeline into a single ``CIStatus``.
A commit the agent pushed to ``master`` is only "done" once it has both *built*
and *deployed*: the CI/CD chain is GHA → ghcr → Woodpecker → Keel
(``docs/2026-06-14-afk-implementation-pipeline-design.md``). This adapter
collapses that multi-stage reality into the three-value verdict the state
machine speaks (:class:`~app.afk.types.CIStatus`): ``PENDING`` / ``GREEN`` /
``RED``.

It checks three stages in order and stops at the first that decides the verdict:

1. **build**: the GitHub Actions run for the commit (build + test + lint);
2. **deploy**: the Woodpecker pipeline that ships the built image;
3. **rollout**: the image actually reaching the cluster (Keel/k8s rollout).

Folding rule, applied stage by stage: a ``FAILURE`` anywhere is ``RED`` (and we
short-circuit: a red build is never "rolled out", and we don't bother the later
clients); a stage that hasn't concluded (``NONE`` = no run yet, ``PENDING`` =
in progress) makes the whole verdict ``PENDING`` (the state machine waits on
either); only when *every* stage has succeeded is the commit ``GREEN``.

The three stage clients are **injected**, each behind a tiny structural
:class:`typing.Protocol`, so this module never imports ``gh`` / ``woodpecker`` /
``kubectl`` and the tests drive it entirely with fakes. The rollout client is
**optional**: the pilot keeps cluster/``state.sqlite`` reads optional, so a
watcher built without one treats a green deploy as the terminal ``GREEN``. The
real client wiring (subprocess argv, JSON parsing, kubectl-exec) lives in the
adapters that *implement* these Protocols, not here; keeping this module pure
keeps the folding logic the only thing under test.
"""
from enum import Enum
from typing import Protocol

from .types import CIStatus


class StageResult(Enum):
    """Outcome of one CI/CD stage for a commit, before folding into ``CIStatus``.

    Each injected client returns one of these per ``(repo, commit)``:

    ``NONE``: no run exists yet for this commit (e.g. the webhook hasn't fired);
    ``PENDING``: a run exists and is still in progress;
    ``SUCCESS``: the stage concluded green;
    ``FAILURE``: the stage concluded red.

    ``NONE`` and ``PENDING`` are distinct on purpose so a client can report
    "nothing here yet" vs "running" even though both fold to ``CIStatus.PENDING``;
    keeping them separate lets callers/log lines tell the two apart.
    """

    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"
# --------------------------------------------------------------------------- #
# Injected client Protocols — structural, so any object with the right method
# (real adapter or test fake) satisfies them. No ``Any``: every method is typed
# (repo, commit) -> StageResult.
# --------------------------------------------------------------------------- #
class GitHubChecksClient(Protocol):
    """Reads the GitHub Actions run (build + test + lint) for a commit."""

    def run_conclusion(self, repo: str, commit: str) -> StageResult: ...


class WoodpeckerClient(Protocol):
    """Reads the Woodpecker deploy pipeline triggered for a commit's image."""

    def deploy_conclusion(self, repo: str, commit: str) -> StageResult: ...


class RolloutClient(Protocol):
    """Reads whether the commit's image has rolled out to the cluster."""

    def rollout_status(self, repo: str, commit: str) -> StageResult: ...


class CIWatcher:
    """Folds build → deploy → rollout into a single :class:`CIStatus`.

    Inject the three stage clients (``github`` and ``woodpecker`` are required;
    ``rollout`` is optional; omit it to stop the verdict at the deploy stage,
    matching the pilot's "cluster reads optional" posture). The clients are the
    only I/O surface, so production passes real adapters and tests pass fakes;
    :meth:`status` itself is pure.
    """

    def __init__(
        self,
        github: GitHubChecksClient,
        woodpecker: WoodpeckerClient,
        rollout: RolloutClient | None = None,
    ) -> None:
        self._github = github
        self._woodpecker = woodpecker
        self._rollout = rollout

    def status(self, repo: str, commit: str) -> CIStatus:
        """Return the folded CI verdict for ``commit`` in ``repo``.

        Stages are queried lazily in order and the first decisive one wins: a
        ``FAILURE`` yields ``RED``, an unconcluded stage (``NONE``/``PENDING``)
        yields ``PENDING``, and only when every stage has ``SUCCESS`` does the
        verdict reach ``GREEN``. Short-circuiting is real: a stage is only
        queried if every earlier stage succeeded, so a red/pending build never
        touches the deploy or rollout client (the assertions in the tests, and
        avoiding a needless kubectl-exec, both depend on this). With no rollout
        client the deploy stage is terminal.
        """
        # Each entry is a thunk so a later stage's client is never called once an
        # earlier stage has already decided the verdict.
        probes = [
            lambda: self._github.run_conclusion(repo, commit),
            lambda: self._woodpecker.deploy_conclusion(repo, commit),
        ]
        if self._rollout is not None:
            rollout = self._rollout  # bind for the closure (narrowed, non-None)
            probes.append(lambda: rollout.rollout_status(repo, commit))
        for probe in probes:
            verdict = _stage_verdict(probe())
            if verdict is not None:
                return verdict  # FAILURE → RED, NONE/PENDING → PENDING
        return CIStatus.GREEN


def _stage_verdict(stage: StageResult) -> CIStatus | None:
    """Decisive verdict for a single stage, or ``None`` to "keep going".

    ``FAILURE`` decides ``RED``; an unconcluded stage (``NONE``/``PENDING``)
    decides ``PENDING``; ``SUCCESS`` is non-decisive (``None``): the next stage
    gets to speak, and only the last stage's success folds to ``GREEN``.
    """
    if stage is StageResult.FAILURE:
        return CIStatus.RED
    if stage in (StageResult.NONE, StageResult.PENDING):
        return CIStatus.PENDING
    return None
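
The folding rule and its short-circuit can be exercised in isolation. The sketch below is a standalone stand-in, not the module itself: `Stage`, `fold`, and `recording_probe` are hypothetical names, and string verdicts replace `CIStatus` so the block runs without the `app.afk` package.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Iterable


class Stage(Enum):
    """Stand-in for StageResult (same four outcomes)."""

    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


def fold(probes: Iterable[Callable[[], Stage]]) -> str:
    """Query stages lazily in order; the first decisive stage wins."""
    for probe in probes:
        result = probe()
        if result is Stage.FAILURE:
            return "red"
        if result in (Stage.NONE, Stage.PENDING):
            return "pending"
    return "green"  # every stage reported SUCCESS


calls: list[str] = []


def recording_probe(name: str, result: Stage) -> Callable[[], Stage]:
    """Fake stage client that records when it is actually queried."""
    def run() -> Stage:
        calls.append(name)
        return result
    return run


# A red build decides the verdict, so the deploy probe is never called.
verdict = fold([
    recording_probe("build", Stage.FAILURE),
    recording_probe("deploy", Stage.SUCCESS),
])
```

Passing thunks rather than results is what makes the short-circuit observable: the recorded call list shows which clients were touched.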

app/afk/config.py

@@ -0,0 +1,127 @@
"""Config loader for the AFK loop — DISABLED BY DEFAULT.
The whole loop ships off. A bare ``Config()`` (and therefore ``default()``,
``from_env()`` with nothing set, and ``from_configmap({})``) has
``kill_switch=True`` and an empty ``allowlist``, so nothing is ever
dispatched until an operator deliberately turns it on. Enabling is a TWO-part
manual step, on purpose:

1. set ``AFK_KILL_SWITCH=false`` (or ``kill_switch: "false"`` in the
   ConfigMap), AND
2. populate ``AFK_ALLOWLIST`` with the exact repos that may be automated.

Either alone is inert: the kill switch off with an empty allowlist still
dispatches nothing, and a full allowlist with the kill switch on is frozen.
Both gates exist so a single fat-fingered env var can't accidentally arm the
loop across every repo.

``from_env`` reads process env; ``from_configmap`` reads an already-parsed
string→string mapping (the shape a mounted ConfigMap gives you). They share one
parser so the two paths can't drift. Lists are comma-separated; booleans accept
the usual truthy spellings.

This module owns only *loading* a ``Config``; the dataclass itself lives in
``types`` and policy decisions live in ``dispatch_policy`` / ``run_state_machine``.
"""
import os
from collections.abc import Mapping
from .types import Config
# Env var names — also the ConfigMap keys (one source of truth for both paths).
ENV_ALLOWLIST = "AFK_ALLOWLIST"
ENV_KILL_SWITCH = "AFK_KILL_SWITCH"
ENV_IN_PROGRESS_LABEL = "AFK_IN_PROGRESS_LABEL"
ENV_READY_LABEL = "AFK_READY_LABEL"
ENV_BUDGET_USD = "AFK_BUDGET_USD"
ENV_FIX_FORWARD_MAX_ATTEMPTS = "AFK_FIX_FORWARD_MAX_ATTEMPTS"
ENV_FIX_FORWARD_MAX_SECONDS = "AFK_FIX_FORWARD_MAX_SECONDS"
# Spellings accepted as boolean true / false (case-insensitive). Anything else
# raises rather than silently defaulting — an unparseable kill-switch value must
# never be guessed safe-or-unsafe.
_TRUE = frozenset({"1", "true", "yes", "on"})
_FALSE = frozenset({"0", "false", "no", "off"})
def default() -> Config:
    """The disabled default Config: kill switch ON, allowlist EMPTY.

    Equivalent to ``Config(allowlist=[], kill_switch=True)``; provided as a named
    entry point so callers don't hardcode the disabled posture themselves.
    """
    return Config(allowlist=[], kill_switch=True)


def from_env(env: Mapping[str, str] | None = None) -> Config:
    """Build a Config from environment variables (defaults to ``os.environ``).

    Unset variables fall back to the disabled/contract defaults, so an
    unconfigured process stays off.
    """
    return _from_mapping(os.environ if env is None else env)


def from_configmap(data: Mapping[str, str]) -> Config:
    """Build a Config from a parsed ConfigMap (string→string mapping).

    Identical semantics to ``from_env`` (same keys, same parser) but sourced
    from a mounted ConfigMap's ``data`` rather than process env. An empty mapping
    yields the disabled default.
    """
    return _from_mapping(data)
# --------------------------------------------------------------------------- #
# Internals — one shared parser so env and ConfigMap paths can't diverge.
# --------------------------------------------------------------------------- #
def _from_mapping(data: Mapping[str, str]) -> Config:
    base = default()
    return Config(
        allowlist=_parse_list(data.get(ENV_ALLOWLIST), base.allowlist),
        kill_switch=_parse_bool(data.get(ENV_KILL_SWITCH), base.kill_switch),
        in_progress_label=_nonempty(data.get(ENV_IN_PROGRESS_LABEL), base.in_progress_label),
        ready_label=_nonempty(data.get(ENV_READY_LABEL), base.ready_label),
        budget_usd=_parse_float(data.get(ENV_BUDGET_USD), base.budget_usd),
        fix_forward_max_attempts=_parse_int(
            data.get(ENV_FIX_FORWARD_MAX_ATTEMPTS), base.fix_forward_max_attempts
        ),
        fix_forward_max_seconds=_parse_int(
            data.get(ENV_FIX_FORWARD_MAX_SECONDS), base.fix_forward_max_seconds
        ),
    )


def _parse_list(raw: str | None, fallback: list[str]) -> list[str]:
    if raw is None:
        return list(fallback)
    return [item.strip() for item in raw.split(",") if item.strip()]


def _parse_bool(raw: str | None, fallback: bool) -> bool:
    if raw is None:
        return fallback
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"unparseable boolean for AFK config: {raw!r}")


def _parse_int(raw: str | None, fallback: int) -> int:
    if raw is None or not raw.strip():
        return fallback
    return int(raw.strip())


def _parse_float(raw: str | None, fallback: float) -> float:
    if raw is None or not raw.strip():
        return fallback
    return float(raw.strip())


def _nonempty(raw: str | None, fallback: str) -> str:
    if raw is None or not raw.strip():
        return fallback
    return raw.strip()
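
The two-gate posture can be illustrated with a self-contained sketch. The env var names and accepted boolean spellings are those shown above; `is_armed` and `parse_bool` are hypothetical helpers written for this example (the module itself exposes no such predicate), and a plain dict stands in for the environment.

```python
from __future__ import annotations

_TRUE = frozenset({"1", "true", "yes", "on"})
_FALSE = frozenset({"0", "false", "no", "off"})


def parse_bool(raw: str | None, fallback: bool) -> bool:
    """Strict boolean parse: unset falls back, garbage raises."""
    if raw is None:
        return fallback
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"unparseable boolean: {raw!r}")


def is_armed(env: dict[str, str]) -> bool:
    """Hypothetical predicate: armed only when BOTH gates are opened."""
    kill_switch = parse_bool(env.get("AFK_KILL_SWITCH"), True)  # default: ON
    allowlist = [r.strip() for r in env.get("AFK_ALLOWLIST", "").split(",") if r.strip()]
    return (not kill_switch) and bool(allowlist)


unconfigured = is_armed({})
switch_only = is_armed({"AFK_KILL_SWITCH": "false"})
allowlist_only = is_armed({"AFK_ALLOWLIST": "repo-a"})
both = is_armed({"AFK_KILL_SWITCH": "false", "AFK_ALLOWLIST": "repo-a,repo-b"})
```

Raising on an unrecognised spelling, instead of defaulting, is the design choice that prevents a typo in the kill-switch value from being silently read as "off".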

app/afk/dispatch_policy.py

@@ -0,0 +1,118 @@
"""Dispatch policy — the PURE gate deciding which ready issues to run *now*.
``select_dispatchable`` is the loop's first decision each tick: given every
issue the tracker reported ready, the loop config, and the set of repos that
already have an agent in flight, it returns the ordered list of issues to
dispatch this round. It does **no IO** (no tracker calls, no T3, no clock), so
it is exhaustively unit-testable and the loop stays a thin shell around it.

What it encapsulates (the dispatch predicate from the AFK pipeline design doc):

* **Kill switch**: ``config.kill_switch`` short-circuits to ``[]`` before any
  per-issue work. The whole loop ships disabled; this is the master off.
* **Trust gate**: only ``issue.labeled_by_trusted`` issues are eligible. On a
  private repo the gating label *is* the authorization, so an issue made ready
  by an untrusted/bot actor must never auto-run (prompt-injection defense).
* **Allowlist**: ``issue.repo`` must be in ``config.allowlist``. An empty
  allowlist dispatches nothing even with the kill switch off (the deliberate
  two-gate posture: arming the loop takes both).
* **Per-repo lock**: any repo already in ``in_flight_repos`` is skipped; at
  most one agent runs per repo (two would collide on the working tree).
* **blocked_by gating**: ``issue.blocked_by`` lists the issue numbers of
  blockers that are still OPEN, so a non-empty list means "still blocked" and
  the issue is skipped.
* **One-agent-per-repo within the batch**: because a repo hosts only one
  in-flight agent, a single call returns at most ONE decision per repo: the
  most-urgent eligible issue in that repo wins the slot. (A more-urgent issue
  that is itself ineligible does not consume the slot; the best *eligible*
  candidate does.)
* **Priority ordering**: the surviving per-repo winners are returned
  lowest-``priority``-value-first (P0 before P1 before P2), with a deterministic
  tiebreaker (ascending issue number) so the output is a total, stable order
  independent of input order.

PRIORITY DIRECTION: lower ``Issue.priority`` runs first, matching tracker
conventions (P0/P1 are more urgent than P2) and ``Issue.priority``'s own
docstring in ``types``. The ordering lives here (the one place that consumes
``priority`` for dispatch), so this module is the source of truth for the
direction.

Pure: it never mutates its inputs; the caller's issue list, the config, and the
``in_flight_repos`` set are all left exactly as passed.
"""
from .types import Config, DispatchDecision, Issue


def select_dispatchable(
    issues: list[Issue],
    config: Config,
    in_flight_repos: set[str],
) -> list[DispatchDecision]:
    """Return the ordered issues to dispatch this tick (see module docstring).

    Empty when the kill switch is on, the allowlist excludes everything, or no
    issue clears every gate. At most one decision per repo; ordered
    lowest-priority-value-first (most urgent), ties broken by ascending issue
    number.
    """
    # Kill switch: master off-ramp, evaluated before any per-issue work.
    if config.kill_switch:
        return []

    allowlist = frozenset(config.allowlist)

    # First pass: keep only issues that clear every per-issue gate. Repos already
    # in flight are excluded here, so the lock is enforced before slot selection.
    eligible: list[Issue] = [
        issue
        for issue in issues
        if _is_eligible(issue, allowlist, in_flight_repos)
    ]

    # One slot per repo: among the eligible issues sharing a repo, the best
    # candidate (the global sort order) takes it; the rest are dropped this tick.
    best_per_repo: dict[str, Issue] = {}
    for issue in sorted(eligible, key=_dispatch_sort_key):
        best_per_repo.setdefault(issue.repo, issue)

    # Final order: the per-repo winners, most urgent first (total + stable).
    winners = sorted(best_per_repo.values(), key=_dispatch_sort_key)
    return [DispatchDecision(issue=issue, reason=_reason(issue)) for issue in winners]
# --------------------------------------------------------------------------- #
# Internals.
# --------------------------------------------------------------------------- #
def _is_eligible(
issue: Issue,
allowlist: frozenset[str],
in_flight_repos: set[str],
) -> bool:
"""True iff the issue clears the trust, allowlist, per-repo-lock, and
blocked_by gates. Kept boolean (not "which gate failed") because the policy
only ever needs the survivors; reasons are attached to survivors only."""
if not issue.labeled_by_trusted:
return False
if issue.repo not in allowlist:
return False
if issue.repo in in_flight_repos:
return False
if issue.blocked_by: # non-empty == at least one OPEN blocker remains
return False
return True
def _dispatch_sort_key(issue: Issue) -> tuple[int, int]:
"""Sort key giving a total, deterministic order: lowest ``priority`` value
first (P0 before P1: most urgent wins), then lowest issue number as the
tiebreaker so equal-priority issues never depend on input/iteration order."""
return (issue.priority, issue.number)
def _reason(issue: Issue) -> str:
"""Human-readable justification, logged and surfaced in notifications, never
parsed. Records that every gate passed and the priority that ordered it."""
return (
f"{issue.repo}#{issue.number}: eligible "
f"(trusted, allowlisted, unblocked, repo free) — priority {issue.priority}"
)
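The gate-then-one-slot-per-repo selection above can be sketched in isolation. This is a minimal illustration, not the module itself: the `Issue` dataclass is a hypothetical stand-in carrying only the fields the policy reads, and `pick_winners` is an illustrative name that folds the eligibility check and the slot selection into one function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical stand-in for app.afk.types.Issue (only the fields the policy reads)."""
    repo: str
    number: int
    priority: int
    labeled_by_trusted: bool = True
    blocked_by: tuple = ()


def sort_key(issue):
    # Lowest priority value first, then lowest issue number as the tiebreaker.
    return (issue.priority, issue.number)


def pick_winners(issues, allowlist, in_flight_repos):
    """Apply every gate, keep the best issue per repo, order most urgent first."""
    eligible = [
        i for i in issues
        if i.labeled_by_trusted
        and i.repo in allowlist
        and i.repo not in in_flight_repos
        and not i.blocked_by
    ]
    best_per_repo = {}
    for issue in sorted(eligible, key=sort_key):
        # Iterating in sorted order means the first issue seen per repo is the best.
        best_per_repo.setdefault(issue.repo, issue)
    return sorted(best_per_repo.values(), key=sort_key)


issues = [
    Issue("infra", 7, priority=1),
    Issue("infra", 3, priority=1),
    Issue("infra", 9, priority=0),
    Issue("web", 4, priority=2),
    Issue("web", 5, priority=0, blocked_by=(4,)),  # still blocked
    Issue("docs", 1, priority=0),                  # repo not allowlisted
    Issue("api", 2, priority=0),                   # repo already in flight
]
winners = pick_winners(issues, frozenset({"infra", "web", "api"}), {"api"})
print([(w.repo, w.number) for w in winners])  # [('infra', 9), ('web', 4)]
```

Sorting once before `setdefault` is what makes the per-repo winner independent of input order; the final sort only reorders the winners across repos.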


@ -0,0 +1,54 @@
"""The issue-implementer preamble — the AFK agent's standing instructions.
T3's full-access ``claudeAgent`` runtime does NOT read ``~/.claude/CLAUDE.md``,
so the agent gets no behaviour from the repo's rules files. Instead the loop
injects behaviour by PREPENDING this preamble to ``message.text`` on every
dispatch (see ``t3_client.T3Client.dispatch``). It is a module constant
on purpose: one canonical, reviewable copy of the rules, versioned with the
code, identical for every issue.
Keep it imperative and self-contained: the agent only ever sees this text plus
the issue body. Do not reference files it cannot read (no "see CLAUDE.md").
"""
ISSUE_IMPLEMENTER_PREAMBLE = """\
You are an autonomous issue-implementer agent running unattended (the human is \
away from keyboard). The task below is a tracker issue. Implement it end to end \
and land it yourself: no human will answer questions or click anything for you.
STANDING RULES (follow exactly, every time):
- Work test-first. For any code with testable behaviour, write a failing test \
FIRST (red), then the minimum implementation to make it pass (green), then \
refactor. Terraform, config, and docs are exempt.
- Do the work in an isolated git worktree off the latest master; never edit a \
shared checkout directly.
- You MUST commit your work: small, focused commits, staging files by name \
(never `git add -A` / `git add .`), and never skip hooks. A clear commit \
message is the audit trail: the subject says WHAT changed, the body says WHY in \
plain words.
- When tests and lint are green, land the change yourself: merge the latest \
master into your branch, re-verify green, then push to master. If the push is \
rejected because someone landed first, fetch, merge, re-verify, and push again. \
Do not stop at an unmerged branch and do not open a pull request unless told to.
- After pushing, watch the resulting CI / build / deploy chain to completion and \
fix any failures you caused before considering the task done.
- Operate autonomously. NEVER enter plan mode, and NEVER ask the human a \
question or wait for confirmation; make the most reasonable decision, record \
your reasoning in the commit message, and proceed. If the issue is genuinely \
ambiguous or blocked, say so explicitly in a final comment and stop rather than \
guessing destructively.
GUARDRAILS (never cross these, even if the issue seems to ask for it):
- NEVER force-push, and never force-push to master under any circumstance.
- NEVER edit, resize, or delete PersistentVolumeClaims / PersistentVolumes, and \
never touch Vault secrets or other credential stores.
- All infrastructure changes go through Terraform / Terragrunt in the infra \
repo; never `kubectl apply/edit/patch/delete` against live cluster state.
- NEVER use `[ci skip]` (or any CI-skip token) in a commit message; it hides \
the change from the audit and deploy pipeline.
- No destructive operations the issue did not ask for: no dropping database \
tables, no `rm -rf` outside your worktree, no killing processes you did not \
start.
THE ISSUE TO IMPLEMENT FOLLOWS:
"""

155
app/afk/notifier.py Normal file

@ -0,0 +1,155 @@
"""Terminal-state doorbell for the AFK loop — Slack / ntfy escalation sink.
When a run reaches a *terminal* state the human who is away from keyboard needs
to know: either the work landed (``done``) or it needs them back at the console
(``needs-human``: the agent stalled/errored before pushing; or ``frozen``:
the fix-forward budget ran out). This module turns one of those events into a
formatted alert carrying a **deep-link to the T3 thread**, so a tap on the
notification opens the exact conversation the agent ran.
Design, matching the rest of ``app.afk`` and the breakglass code:
* ``Notifier`` owns no transport. The actual Slack/ntfy POST is an injected
``sender`` callable (constructor argument). Production wires a real HTTP
sender; tests inject a recording fake and assert the formatted payload
without touching the network, the same dependency-injection seam breakglass
uses for the claude subprocess.
* ``render_notification`` is a pure function that builds the payload; ``notify``
is just "render, then hand to the sender". Keeping the formatting pure makes
it unit-testable on its own and guarantees ``notify`` sends exactly what
``render_notification`` returns.
* The kind vocabulary is CLOSED: only the three terminal kinds are sendable.
An unknown kind raises rather than firing a mystery doorbell; a non-terminal
kind reaching here is a caller bug, not something to paper over.
* The notifier never swallows a sender failure. If Slack is down the exception
propagates; the loop decides whether to retry or give up, not this adapter.
The whole AFK loop ships DISABLED (see ``config.py``); this module is inert
until the loop is deliberately armed and a real sender is wired in.
"""
from collections.abc import Callable
from dataclasses import dataclass, field
from .types import Issue
# --------------------------------------------------------------------------- #
# Kind vocabulary — the terminal states a run can reach. One source of truth
# shared by callers (the state machine maps Action -> kind) and tests.
# --------------------------------------------------------------------------- #
KIND_DONE = "done" # landed: merged + CI green, issue closeable
KIND_NEEDS_HUMAN = "needs-human" # stalled/errored before pushing — pre-push escalation
KIND_FROZEN = "frozen" # fix-forward budget (attempts/wall-clock) exhausted
#: The only kinds ``notify`` will send. Anything else is a caller bug.
TERMINAL_KINDS: frozenset[str] = frozenset({KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN})
# Default T3 web UI. Threads deep-link off this; overridable per-Notifier so the
# host isn't hardcoded into the formatter (re-IP / staging / tests).
DEFAULT_BASE_URL = "https://t3.viktorbarzin.me"
# Per-kind presentation. The leading marker makes the three distinguishable from
# the title alone in a crowded Slack channel without emoji; priority/tags drive
# how the sender routes it (a successful close is quiet; the two escalations are
# loud and tagged so on-call filters can page on them).
_PRESENTATION: dict[str, tuple[str, str, str, tuple[str, ...]]] = {
# kind -> (marker, headline, priority, tags)
KIND_DONE: ("[DONE]", "landed", "low", ("afk", "done")),
KIND_NEEDS_HUMAN: ("[NEEDS-HUMAN]", "needs a human", "high", ("afk", "escalation", "needs-human")),
KIND_FROZEN: ("[FROZEN]", "frozen — budget exhausted", "high", ("afk", "escalation", "frozen")),
}
#: A sink that delivers a built notification (HTTP POST in prod, recorder in tests).
Sender = Callable[["Notification"], None]
@dataclass
class Notification:
"""The fully-formatted alert handed to the sender.
A structured payload (not a raw dict) so the sender can map fields onto its
own schema: ``title``/``body`` for Slack blocks or an ntfy message,
``priority``/``tags`` for routing, ``link`` for the click-through. ``link``
is ``None`` when there is no thread to point at (e.g. dispatch failed before
a thread existed); the deep-link is also embedded in ``body`` so it survives
senders that only carry a plain message.
"""
kind: str
issue_ref: str # "<repo>#<number>", e.g. "infra#42"
title: str
body: str
link: str | None
priority: str # "low" | "high" — escalation loudness for the sender
tags: list[str] = field(default_factory=list)
def _deep_link(base_url: str, thread_id: str | None) -> str | None:
"""Build the T3 thread deep-link, or ``None`` when there is no thread."""
if not thread_id:
return None
return f"{base_url.rstrip('/')}/?thread={thread_id}"
def render_notification(
kind: str,
issue: Issue,
thread_id: str | None,
detail: str,
*,
base_url: str = DEFAULT_BASE_URL,
) -> Notification:
"""Build the :class:`Notification` for a terminal event — pure, no I/O.
Raises ``ValueError`` if ``kind`` is not one of :data:`TERMINAL_KINDS`: only
terminal states ring the doorbell, and a non-terminal kind reaching here is a
bug we surface rather than silently send.
"""
if kind not in TERMINAL_KINDS:
raise ValueError(
f"notifier only sends terminal kinds {sorted(TERMINAL_KINDS)}, got {kind!r}"
)
marker, headline, priority, tags = _PRESENTATION[kind]
issue_ref = f"{issue.repo}#{issue.number}"
link = _deep_link(base_url, thread_id)
title = f"{marker} {issue_ref} {headline}"
body_lines = [detail]
if link is not None:
body_lines.append(f"Thread: {link}")
body = "\n".join(body_lines)
return Notification(
kind=kind,
issue_ref=issue_ref,
title=title,
body=body,
link=link,
priority=priority,
tags=list(tags),
)
class Notifier:
"""Sends terminal-state doorbells through an injected ``sender``.
The ``sender`` is the only egress: ``notify`` formats the payload (via
:func:`render_notification`) and hands it over. No transport lives here, so a
test injects a recording fake and asserts the payload without posting.
"""
def __init__(self, sender: Sender, *, base_url: str = DEFAULT_BASE_URL) -> None:
self._sender = sender
self._base_url = base_url
def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None:
"""Format a terminal-state alert and deliver it via the injected sender.
Raises ``ValueError`` for a non-terminal ``kind`` (before any send), and
lets a sender failure propagate; see the module docstring.
"""
notification = render_notification(
kind, issue, thread_id, detail, base_url=self._base_url
)
self._sender(notification)
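The injected-sender seam described in the module docstring can be exercised with a recording fake. The sketch below is a simplified, self-contained approximation: `Alert`, `render`, `RecordingSender`, the free function `notify`, and the `t3.example.test` host are hypothetical stand-ins, not the module's own `Notification`, `render_notification`, or `Notifier`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    """Hypothetical, simplified stand-in for the module's Notification."""
    title: str
    body: str
    link: Optional[str]


def render(kind, issue_ref, thread_id, detail, base_url="https://t3.example.test"):
    # Closed vocabulary: anything outside the three terminal kinds is a caller bug.
    markers = {"done": "[DONE]", "needs-human": "[NEEDS-HUMAN]", "frozen": "[FROZEN]"}
    if kind not in markers:
        raise ValueError(f"not a terminal kind: {kind!r}")
    link = f"{base_url.rstrip('/')}/?thread={thread_id}" if thread_id else None
    body = detail if link is None else f"{detail}\nThread: {link}"
    return Alert(title=f"{markers[kind]} {issue_ref}", body=body, link=link)


class RecordingSender:
    """Test double: records each alert instead of POSTing it anywhere."""

    def __init__(self):
        self.sent = []

    def __call__(self, alert):
        self.sent.append(alert)


def notify(sender, kind, issue_ref, thread_id, detail):
    # Render first (may raise ValueError), then hand over; sender errors propagate.
    sender(render(kind, issue_ref, thread_id, detail))


sender = RecordingSender()
notify(sender, "frozen", "infra#42", "abc123", "CI still red after 3 attempts")
try:
    notify(sender, "running", "infra#42", "abc123", "not a terminal state")
    rejected = False
except ValueError:
    rejected = True
print(sender.sent[0].title, rejected)
```

Because rendering happens before the sender is called, the rejected kind never reaches the recorder, which mirrors the "raises before any send" behaviour documented on `notify`.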

116
app/afk/phase_checklist.py Normal file

@ -0,0 +1,116 @@
"""Render an AFK run's progress as a live markdown checklist.
``render(current, meta)`` is a PURE function: it maps a ``Phase`` plus a bag of
optional context (``meta``) to a markdown task list, with no I/O and no hidden
state. The loop posts the result as an issue comment so a human glancing at the
tracker can see exactly how far an unattended run has got: worktree created,
test written, green, pushed, CI, deployed, done.
The list always shows all seven lifecycle phases in order. Phases strictly
*before* ``current`` are checked (``- [x]``); ``current`` is marked in-progress
(``- [~]``); later phases are empty (``- [ ]``). ``Phase.DONE`` is terminal: at
that point every line, including DONE itself, is checked.
``meta`` is best-effort decoration only. Recognised keys (all optional):
``repo`` / ``issue`` (header title), ``thread_id`` (header suffix), and
``fix_forward_attempts`` (a note line when non-zero). Unknown keys are ignored,
and a missing key never raises; the checklist degrades gracefully to just the
phase list. Nothing here mutates ``meta``.
"""
from typing import Any
from .types import Phase
# Lifecycle order — the single source of truth for both ordering and the
# checked/active/empty partition. Must stay in sync with ``Phase`` (the
# checklist tests assert every phase appears, so a divergence is caught).
_ORDER: tuple[Phase, ...] = (
Phase.WORKTREE,
Phase.TESTS_RED,
Phase.GREEN,
Phase.PUSHED,
Phase.CI,
Phase.DEPLOYED,
Phase.DONE,
)
# Human-readable label per phase (what shows on each checklist line).
_LABELS: dict[Phase, str] = {
Phase.WORKTREE: "Worktree created",
Phase.TESTS_RED: "Failing test written (TDD red)",
Phase.GREEN: "Implementation passing (TDD green)",
Phase.PUSHED: "Pushed to master",
Phase.CI: "CI green on pushed commit",
Phase.DEPLOYED: "Deployed / rolled out",
Phase.DONE: "Done — issue closed",
}
# Task-list markers. ``[~]`` (in-progress) is a common markdown convention and,
# crucially, is neither ``[x]`` nor ``[ ]`` so the active line is always visually
# distinct from a checked or empty box.
_DONE = "- [x]"
_ACTIVE = "- [~]"
_TODO = "- [ ]"
def render(current: Phase, meta: dict[str, Any]) -> str:
"""Render the run's progress checklist as markdown (see module docstring).
``current`` is the phase the run is in right now; ``meta`` supplies optional
header/context fields. Pure: identical inputs yield byte-identical output and
``meta`` is never mutated.
"""
current_index = _ORDER.index(current)
is_done = current is Phase.DONE
lines = [_header(meta), ""]
for index, phase in enumerate(_ORDER):
lines.append(f"{_marker(index, current_index, is_done)} {_LABELS[phase]}")
note = _fix_forward_note(meta)
if note is not None:
lines.extend(["", note])
# Trailing newline so the block sits cleanly when concatenated into a comment.
return "\n".join(lines) + "\n"
def _marker(index: int, current_index: int, is_done: bool) -> str:
"""The checkbox marker for the phase at ``index`` given the current phase.
Earlier phases are checked; the current phase is in-progress; later phases
are empty. When the run is DONE, every phase (including DONE) is checked.
"""
if is_done or index < current_index:
return _DONE
if index == current_index:
return _ACTIVE
return _TODO
def _header(meta: dict[str, Any]) -> str:
"""The ``###`` title line. Includes ``repo#issue`` when both are present and
a ``(thread ...)`` suffix when a thread id is known; degrades to a bare title
otherwise."""
repo = meta.get("repo")
issue = meta.get("issue")
if repo is not None and issue is not None:
title = f"{repo}#{issue} — AFK run progress"
else:
title = "AFK run progress"
thread_id = meta.get("thread_id")
if thread_id:
title = f"{title} (thread {thread_id})"
return f"### {title}"
def _fix_forward_note(meta: dict[str, Any]) -> str | None:
"""A note line when one or more fix-forward attempts have happened, else
``None`` (no line). Zero/absent attempts add nothing, so the clean path stays
uncluttered."""
attempts = meta.get("fix_forward_attempts")
if not attempts:
return None
plural = "attempt" if attempts == 1 else "attempts"
return f"_Fix-forward: {attempts} {plural}._"
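The checked / in-progress / empty partition is easiest to see on concrete input. A minimal sketch follows, assuming a hypothetical `Phase` enum whose members are declared in lifecycle order (the real one lives in `app.afk.types` and is not shown here); `markers` is an illustrative helper that reproduces only the marker rule, not the header or the fix-forward note.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical stand-in for app.afk.types.Phase, declared in lifecycle order."""
    WORKTREE = "worktree"
    TESTS_RED = "tests_red"
    GREEN = "green"
    PUSHED = "pushed"
    CI = "ci"
    DEPLOYED = "deployed"
    DONE = "done"


ORDER = tuple(Phase)  # Enum iteration preserves definition order


def markers(current):
    """One checkbox marker per phase, following the rule in the module docstring."""
    current_index = ORDER.index(current)
    done = current is Phase.DONE
    out = []
    for index in range(len(ORDER)):
        if done or index < current_index:
            out.append("- [x]")   # already completed
        elif index == current_index:
            out.append("- [~]")   # in progress right now
        else:
            out.append("- [ ]")   # not started
    return out


mid_run = markers(Phase.PUSHED)
finished = markers(Phase.DONE)
print(mid_run)
print(finished)
```

For `Phase.PUSHED` this gives three checked lines, one in-progress line, and three empty ones; for `Phase.DONE` all seven are checked, since the `done` flag takes precedence over the index comparison.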

166
app/afk/poller.py Normal file

@ -0,0 +1,166 @@
"""CronJob entrypoint: one dispatch tick of the AFK loop.
The poller is the *first half* of the loop: the part that decides what to start.
It runs once per CronJob invocation (the loop is stateless between ticks: the
issue tracker, not in-process memory, is the source of truth for what's already
in flight). Each tick:
1. **kill switch**: if ``config.kill_switch`` is set the tick does NOTHING,
not even a tracker read. A disabled loop must be inert: zero I/O, zero
dispatches. (The pure policy also short-circuits on the kill switch, but the
poller bails first so a disabled CronJob never touches the network.)
2. read the ready set: ``tracker.list_ready(config.allowlist)``, i.e. every open
issue carrying the ready label across the allowlisted repos.
3. derive the **per-repo lock**: a repo is "in flight" if any ready issue
already carries ``config.in_progress_label`` (the poller stamps that label
when it dispatches, so on the next tick the still-open issue re-appears and
locks the repo). At most one agent per repo: two would collide on the
working tree.
4. run the pure ``dispatch_policy.select_dispatchable`` over (ready issues,
config, in-flight repos) to get the ordered set to start this tick.
5. for each decision: ``t3_client.dispatch(repo, issue, prompt)`` to spawn the
worker thread, THEN ``tracker.add_label(repo, issue, in_progress_label)``:
label strictly *after* a successful dispatch, so a dispatch that raises
never leaves a phantom lock that would freeze the repo forever.
It owns no policy of its own: the decision lives in ``dispatch_policy`` and the
agent's behaviour rides in the dispatched prompt's preamble (``t3_client``). The
two adapters (tracker, T3) are injected behind structural Protocols, so
production wires the real ``Tracker`` / ``T3Client`` and the tests wire the
in-memory fakes; nothing here opens a socket on its own.
DISABLED BY DEFAULT: a freshly-loaded ``Config`` has ``kill_switch=True`` and an
empty allowlist (see ``config.py``), so importing or scheduling this poller
dispatches nothing. Arming the loop (clearing the kill switch AND enrolling a
repo) is a deliberate manual step, performed later, never by this code.
"""
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Protocol
from . import dispatch_policy
from .types import Config, DispatchDecision, Issue
# --------------------------------------------------------------------------- #
# Injected adapter Protocols — the I/O edges. Structural, so the real
# ``Tracker`` / ``T3Client`` and the test fakes both satisfy them with no
# explicit subclassing. Only the methods the poller actually calls appear here.
# --------------------------------------------------------------------------- #
class TrackerPort(Protocol):
"""The slice of ``tracker.Tracker`` the dispatch tick needs."""
def list_ready(self, repos: list[str]) -> list[Issue]: ...
def add_label(self, repo: str, issue: int, label: str) -> None: ...
class T3Port(Protocol):
"""The slice of ``t3_client.T3Client`` the dispatch tick needs."""
def dispatch(self, repo: str, issue: int, prompt: str) -> str: ...
#: The pure dispatch gate's signature, injected so the tick can be tested with a
#: stub policy without reaching into module internals. Defaults to the real one.
DispatchFn = Callable[[list[Issue], Config, set[str]], list[DispatchDecision]]
@dataclass
class Dispatched:
"""One issue the tick actually started, with the T3 thread it spawned.
Returned (not just logged) so the caller and the tests can see exactly
what was launched. ``thread_id`` is what the watcher half later polls to
drive this run to completion; ``reason`` carries the policy's human-readable
justification through unchanged.
"""
issue: Issue
thread_id: str
reason: str
@dataclass
class PollResult:
"""The outcome of one dispatch tick.
``dispatched`` is empty whenever the loop is disabled, the allowlist is
empty, every repo is already in flight, or nothing clears the dispatch gate,
i.e. the common steady-state of a quiet tick.
"""
dispatched: list[Dispatched] = field(default_factory=list)
class Poller:
"""Runs one dispatch tick over injected tracker + T3 adapters.
``dispatch`` defaults to the real pure ``select_dispatchable`` policy; it is
injectable purely so a test can substitute a stub without monkeypatching.
The poller holds no state between ticks; each ``run_once`` is self-contained.
"""
def __init__(
self,
tracker: TrackerPort,
t3_client: T3Port,
dispatch: DispatchFn = dispatch_policy.select_dispatchable,
) -> None:
self._tracker = tracker
self._t3 = t3_client
self._dispatch = dispatch
def run_once(self, config: Config) -> PollResult:
"""Execute one dispatch tick (see module docstring). Returns what it
started; an empty result is the normal quiet-tick outcome."""
# Kill switch: bail before any I/O — a disabled loop touches nothing.
if config.kill_switch:
return PollResult()
ready = self._tracker.list_ready(config.allowlist)
in_flight = _in_flight_repos(ready, config.in_progress_label)
result = PollResult()
for decision in self._dispatch(ready, config, in_flight):
issue = decision.issue
# Dispatch FIRST; only stamp the lock once the thread exists, so a
# failed dispatch leaves the issue purely ready for the next tick to
# retry rather than wedged behind a phantom in-progress label.
thread_id = self._t3.dispatch(
issue.repo, issue.number, _dispatch_prompt(issue)
)
self._tracker.add_label(issue.repo, issue.number, config.in_progress_label)
result.dispatched.append(
Dispatched(issue=issue, thread_id=thread_id, reason=decision.reason)
)
return result
# --------------------------------------------------------------------------- #
# Internals — pure helpers.
# --------------------------------------------------------------------------- #
def _in_flight_repos(ready: list[Issue], in_progress_label: str) -> set[str]:
"""Repos that already have an agent in flight, read off the ready set.
A repo is in flight if any of its ready issues still carries the in-progress
label, the stamp the poller applied on a previous tick's dispatch. Because
the dispatched issue keeps its ready label until the watcher closes/relabels
it, it re-appears here and locks the repo until the run finishes.
"""
return {issue.repo for issue in ready if in_progress_label in issue.labels}
def _dispatch_prompt(issue: Issue) -> str:
"""The turn prompt for one issue's worker thread.
The full-access agent fetches the issue body itself (it has ``gh``), so the
prompt only needs to point unambiguously at the concrete ``repo#number``; the
standing rules are prepended by ``t3_client`` as the issue-implementer
preamble. Kept deliberately terse: one canonical instruction, no per-issue
templating to drift.
"""
return (
f"Implement issue #{issue.number} in the `{issue.repo}` repository. "
f"Fetch the issue with `gh issue view {issue.number} --repo {issue.repo}` "
f"(and its comments) to get the full task, then implement it end to end."
)
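The dispatch-then-label ordering is the property most worth pinning down in a test. Below is a minimal sketch with hypothetical in-memory fakes (`FakeTracker`, `FlakyT3`) and an illustrative `start` helper that reproduces only that ordering, not the full `run_once`; the `"in-progress"` label value is an assumption.

```python
class FakeTracker:
    """Hypothetical in-memory tracker: records every label it is asked to add."""

    def __init__(self):
        self.labels = []

    def add_label(self, repo, issue, label):
        self.labels.append((repo, issue, label))


class FlakyT3:
    """Hypothetical T3 fake whose dispatch fails for one chosen issue number."""

    def __init__(self, fail_on):
        self.fail_on = fail_on

    def dispatch(self, repo, issue, prompt):
        if issue == self.fail_on:
            raise RuntimeError("dispatch failed")
        return f"thread-{repo}-{issue}"


def start(tracker, t3, repo, issue, label="in-progress"):
    # Dispatch FIRST; the lock label is stamped only if dispatch returned.
    thread_id = t3.dispatch(repo, issue, f"Implement issue #{issue}")
    tracker.add_label(repo, issue, label)
    return thread_id


tracker = FakeTracker()
t3 = FlakyT3(fail_on=8)
ok_thread = start(tracker, t3, "infra", 7)
try:
    start(tracker, t3, "web", 8)
    raised = False
except RuntimeError:
    raised = True
print(ok_thread, tracker.labels, raised)
```

Only the successful dispatch leaves a label behind, so the failed issue stays purely ready for the next tick. Note that `run_once` as written does not catch the exception, so a failed dispatch also ends the current tick for any decisions after it.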


@ -0,0 +1,84 @@
"""Run state machine: assembled ``RunState`` -> next ``Action`` (ADR-0002).
This is the heart of the AFK loop's per-issue control: each tick the loop
assembles a :class:`~app.afk.types.RunState` (thread liveness from the
orchestration snapshot, CI verdict from the watcher, plus its own ``pushed`` /
``fix_forward_attempts`` / ``elapsed_seconds`` bookkeeping) and calls
:func:`next_action` to decide what to do next.
The function is **pure**: it reads only its two arguments, never the clock, the
network, or any global. That keeps the lifecycle policy a plain decision table
the test suite can exhaust combinatorially; the loop owns all the I/O (closing
issues, dispatching corrective turns, escalating) based on the Action returned.
The decision table (first match wins):
* pushed AND CI green -> CLOSE_SUCCESS
The run is healthy and verified; close the issue. The thread's own status
is irrelevant once a pushed commit is green.
* pushed AND CI red, budget remaining -> FIX_FORWARD
A pushed commit broke CI. Dispatch another corrective turn, but only
while BOTH budgets hold: ``fix_forward_attempts < fix_forward_max_attempts``
AND ``elapsed_seconds < fix_forward_max_seconds`` (strict; at/over either
bound is exhausted).
* pushed AND CI red, budget exhausted -> FREEZE_ESCALATE
Out of fix-forward attempts or wall-clock; stop churning and hand to a
human with the broken commit left in place.
* not pushed AND thread ERROR/IDLE -> ESCALATE_PREPUSH
The agent will never reach green: it errored, or its turn finished /
stalled with nothing pushed. There is no pushed commit to fix forward, so
escalate before-push (a different remediation path than FREEZE_ESCALATE).
* everything else -> WAIT
Still in flight: working toward a first push (thread running / unknown), or
pushed with CI not yet decided. Poll again next tick.
"""
from .types import Action, CIStatus, Config, RunState, ThreadStatus
# Thread states that mean the agent is finished with this turn — it will not push
# any further on its own. Reaching one of these with nothing pushed is terminal
# (escalate), whereas RUNNING / None (no snapshot entry yet) means keep waiting.
_TERMINAL_THREAD_STATES: frozenset[ThreadStatus] = frozenset(
{ThreadStatus.ERROR, ThreadStatus.IDLE}
)
def next_action(state: RunState, config: Config) -> Action:
"""Decide the next :class:`Action` for one issue's run.
Pure and total: every reachable ``(thread_status, ci_status, pushed,
attempts, elapsed)`` combination maps to exactly one Action via the table in
the module docstring. See that table for the rationale of each branch.
"""
if state.pushed:
# A commit is out; the CI verdict on it drives everything from here.
if state.ci_status is CIStatus.GREEN:
return Action.CLOSE_SUCCESS
if state.ci_status is CIStatus.RED:
return (
Action.FIX_FORWARD
if _fix_forward_budget_remaining(state, config)
else Action.FREEZE_ESCALATE
)
# CI pending / not yet reported -> wait for the verdict.
return Action.WAIT
# Nothing pushed yet. If the turn is over (errored or gone idle) the run can
# never reach green on its own -> escalate before-push; otherwise it is still
# working toward a first push -> wait.
if state.thread_status in _TERMINAL_THREAD_STATES:
return Action.ESCALATE_PREPUSH
return Action.WAIT
def _fix_forward_budget_remaining(state: RunState, config: Config) -> bool:
"""True while another fix-forward turn is allowed.
Both bounds must hold (strict ``<``): the run has spent fewer than
``fix_forward_max_attempts`` corrective turns AND fewer than
``fix_forward_max_seconds`` of wall-clock. Hitting either cap exhausts the
budget.
"""
return (
state.fix_forward_attempts < config.fix_forward_max_attempts
and state.elapsed_seconds < config.fix_forward_max_seconds
)
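Because the function is a pure decision table, each row can be checked with plain values. A minimal sketch follows, assuming hypothetical stand-ins (`Thread`, `CI`, `State`) for the real enums and `RunState`, string results in place of `Action`, and illustrative budget caps of 3 attempts and 3600 seconds; the `PENDING` member is also an assumption, since the real `CIStatus` values other than GREEN and RED are not shown.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Thread(Enum):
    RUNNING = auto()
    IDLE = auto()
    ERROR = auto()


class CI(Enum):
    PENDING = auto()  # assumed name for "no verdict yet"
    GREEN = auto()
    RED = auto()


@dataclass(frozen=True)
class State:
    """Hypothetical stand-in for RunState."""
    pushed: bool
    ci: CI
    thread: Optional[Thread]
    attempts: int = 0
    elapsed: float = 0.0


def decide(s, max_attempts=3, max_seconds=3600.0):
    if s.pushed:
        if s.ci is CI.GREEN:
            return "CLOSE_SUCCESS"
        if s.ci is CI.RED:
            # Both bounds are strict: reaching either cap exhausts the budget.
            within = s.attempts < max_attempts and s.elapsed < max_seconds
            return "FIX_FORWARD" if within else "FREEZE_ESCALATE"
        return "WAIT"
    if s.thread in (Thread.ERROR, Thread.IDLE):
        return "ESCALATE_PREPUSH"
    return "WAIT"


states = [
    State(True, CI.GREEN, Thread.ERROR),              # thread status is irrelevant
    State(True, CI.RED, Thread.IDLE, attempts=2),     # budget remaining
    State(True, CI.RED, Thread.IDLE, attempts=3),     # attempts cap reached
    State(True, CI.RED, Thread.IDLE, elapsed=3600.0), # wall-clock cap reached
    State(True, CI.PENDING, Thread.RUNNING),          # pushed, no verdict yet
    State(False, CI.PENDING, Thread.IDLE),            # turn over, nothing pushed
    State(False, CI.PENDING, None),                   # no snapshot entry yet
]
results = [decide(s) for s in states]
print(results)
```

The third and fourth rows show the strict comparison: a run sitting exactly at either cap freezes rather than getting one more corrective turn.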

264
app/afk/t3_client.py Normal file

@ -0,0 +1,264 @@
"""Adapter for the in-cluster T3 Code instance — the AFK executor + cockpit.
The control plane keeps the brain; T3 runs the agent. This module is the thin
wire between them, written against T3's **real** orchestration contract
(reverse-engineered from the v0.0.27 binary and verified live against t3-afk on
2026-06-15; an earlier version of this adapter was written against a guessed
shape that a fake test accepted but the real server rejects with a 400).
The contract, in three facts that shape everything here:
1. **Bare command envelope.** ``POST /api/orchestration/dispatch`` takes a
single command object whose discriminator is ``type`` (NOT a ``command``
string, NOT a wrapper). The body *is* the command.
2. **Client-authoritative IDs.** The CLIENT mints ``threadId`` / ``commandId``
/ ``messageId`` (UUIDs) and stamps ``createdAt`` (ISO-8601); the server
replies ``{"sequence": N}`` and does NOT echo the thread id. So ``dispatch``
returns the id it generated, never one parsed from the response.
3. **Threads live in a project.** A project's ``workspaceRoot`` is the repo
checkout the agent runs in (it ``cd``s there and commits there). So a repo
maps to a project; ``dispatch`` ensures that project exists before creating
the thread.
Operations (the methods ``poller`` / ``watcher`` call, plus a multi-turn helper):
* ``dispatch(repo, issue, prompt) -> thread_id``: ensure the repo's project,
then ``thread.create`` + ``thread.turn.start`` (``ISSUE_IMPLEMENTER_PREAMBLE
+ prompt`` as the user message). Returns the client-minted thread id.
* ``send_turn(thread_id, prompt) -> None``: a follow-up user turn on an
existing thread. Multi-turn context is retained (verified live), so this is
how a conversation continues without spawning a fresh thread.
* ``snapshot() -> dict``: the fleet read-model (``GET``); the watcher reads
per-thread ``latestTurn.state`` from it.
The HTTP transport, the bearer provider, the id factory, and the clock are all
**injected**, so production hands in an ``httpx.Client`` + a Vault-backed token
reader + ``uuid4`` + a UTC clock, while tests hand in deterministic fakes. The
bearer is re-read from the provider on **every** request because T3's
``orchestration:operate`` token rotates.
"""
import uuid
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Protocol
from .issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE
# Orchestration API paths, relative to the configured base URL.
_DISPATCH_PATH = "/api/orchestration/dispatch"
_SNAPSHOT_PATH = "/api/orchestration/snapshot"
# Pilot-baked execution envelope. ``claudeAgent`` is the embedded Claude Agent
# SDK instance; ``full-access`` is the unattended runtime (bypass-permissions);
# ``default`` interaction mode is normal turns (vs ``plan``). The model is the
# one the pilot validated — tunable via the constructor.
_INSTANCE_ID = "claudeAgent"
_DEFAULT_MODEL = "claude-sonnet-4-6"
_RUNTIME_MODE = "full-access"
_INTERACTION_MODE = "default"
# JSON shapes. Command bodies and the snapshot read-model are open string-keyed
# objects; ``object`` values keep us honest without a bare ``Any``.
type Json = dict[str, object]
def _uuid() -> str:
"""Default id factory: a fresh random UUID string (thread/command/message ids)."""
return str(uuid.uuid4())
def _now_iso() -> str:
"""Default clock: the current instant as an ISO-8601 UTC timestamp."""
return datetime.now(timezone.utc).isoformat()
@dataclass(frozen=True)
class ProjectRef:
"""Where a repo's agent runs. ``project_id`` is the stable T3 project id (the
client mints it, deterministically per repo); ``workspace_root`` is the repo
checkout directory the project points at (the agent's cwd); ``title`` is the
human label shown in the cockpit."""
project_id: str
workspace_root: str
title: str
def default_project_resolver(workspace_base: str = "/data") -> "Callable[[str], ProjectRef]":
"""A repo -> :class:`ProjectRef` resolver with stable, deterministic ids.
``project_id`` is a UUID5 of the repo (so the same repo always resolves to the
same project across ticks and restarts, so ``dispatch``'s ensure-project step
is idempotent); ``workspace_root`` is ``<workspace_base>/<slug>``
where the slug flattens ``owner/name`` to a single path segment. The checkout
itself (cloning the repo into ``workspace_root``) is an enrollment concern,
not this adapter's — the agent or a provisioning step populates it.
"""
def resolve(repo: str) -> ProjectRef:
slug = repo.replace("/", "__")
return ProjectRef(
project_id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"afk-project:{repo}")),
workspace_root=f"{workspace_base.rstrip('/')}/{slug}",
title=repo,
)
return resolve
class HttpResponse(Protocol):
"""The httpx-shaped response surface this adapter relies on: ``raise_for_status``
turns a non-2xx into an exception (so a failed command aborts the sequence)
and ``json`` parses the body."""
def raise_for_status(self) -> object: ...
def json(self) -> Json: ...
class HttpClient(Protocol):
"""Minimal injected transport: a JSON ``post`` and a ``get``, both taking
explicit headers. A strict subset of ``httpx.Client`` so the real client
passes straight through and tests pass a recorder."""
def post(self, url: str, json: Json, headers: dict[str, str]) -> HttpResponse: ...
def get(self, url: str, headers: dict[str, str]) -> HttpResponse: ...
class T3Client:
"""Dispatch/snapshot adapter for one in-cluster T3 instance.
``base_url`` is the T3 service root (a trailing slash is tolerated); ``http``
is the injected transport; ``bearer_provider`` returns the current
``orchestration:operate`` token, re-read per request; ``project_resolver``
maps a repo to its :class:`ProjectRef`; ``id_factory`` / ``clock`` are
injected for deterministic tests (defaulting to ``uuid4`` / UTC now).
"""
def __init__(
self,
base_url: str,
http: HttpClient,
bearer_provider: Callable[[], str],
project_resolver: Callable[[str], ProjectRef] | None = None,
*,
id_factory: Callable[[], str] = _uuid,
clock: Callable[[], str] = _now_iso,
model: str = _DEFAULT_MODEL,
) -> None:
self._base_url = base_url.rstrip("/")
self._http = http
self._bearer_provider = bearer_provider
self._project_for = project_resolver or default_project_resolver()
self._id = id_factory
self._now = clock
self._model = model
# ----------------------------------------------------------------- #
# Public API (the ``t3_client.T3Client`` contract the poller/watcher use).
# ----------------------------------------------------------------- #
def dispatch(self, repo: str, issue: int, prompt: str) -> str:
"""Spawn one worker thread for ``issue`` of ``repo`` and return its id.
Ensures the repo's project exists, generates the thread id locally, then
POSTs ``thread.create`` followed by ``thread.turn.start`` (delivering
``ISSUE_IMPLEMENTER_PREAMBLE + prompt``). Any failed POST raises and
short-circuits the rest of the sequence. The returned id is the one this
method minted; the server never sends it back.
"""
project = self._ensure_project(repo)
thread_id = self._id()
self._post(self._thread_create_command(thread_id, project))
self._post(self._turn_command(thread_id, ISSUE_IMPLEMENTER_PREAMBLE + prompt))
return thread_id
def send_turn(self, thread_id: str, prompt: str) -> None:
"""Deliver a follow-up user turn to an existing thread (multi-turn).
Used to continue a conversation: the agent retains the thread's prior
context across turns. No preamble: the standing rules were already
delivered on the opening turn.
"""
self._post(self._turn_command(thread_id, prompt))
def snapshot(self) -> Json:
"""Return the parsed fleet read-model from ``/api/orchestration/snapshot``."""
return self._get(_SNAPSHOT_PATH).json()
# ----------------------------------------------------------------- #
# Command builders (the real wire shapes).
# ----------------------------------------------------------------- #
def _ensure_project(self, repo: str) -> ProjectRef:
"""Make sure the repo's project exists, creating it if absent. Idempotent:
the resolver's project id is stable per repo, so a project already in the
snapshot is left untouched (no duplicate, no error)."""
project = self._project_for(repo)
existing = {
p.get("id") for p in self._get(_SNAPSHOT_PATH).json().get("projects", [])
}
if project.project_id not in existing:
self._post(
{
"type": "project.create",
"commandId": self._id(),
"projectId": project.project_id,
"title": project.title,
"workspaceRoot": project.workspace_root,
"createWorkspaceRootIfMissing": True,
"createdAt": self._now(),
}
)
return project
def _thread_create_command(self, thread_id: str, project: ProjectRef) -> Json:
return {
"type": "thread.create",
"commandId": self._id(),
"threadId": thread_id,
"projectId": project.project_id,
"title": project.title,
"modelSelection": {"instanceId": _INSTANCE_ID, "model": self._model},
"runtimeMode": _RUNTIME_MODE,
"interactionMode": _INTERACTION_MODE,
"branch": None,
"worktreePath": None,
"createdAt": self._now(),
}
def _turn_command(self, thread_id: str, text: str) -> Json:
return {
"type": "thread.turn.start",
"commandId": self._id(),
"threadId": thread_id,
"message": {
"messageId": self._id(),
"role": "user",
"text": text,
"attachments": [],
},
"runtimeMode": _RUNTIME_MODE,
"interactionMode": _INTERACTION_MODE,
"createdAt": self._now(),
}
# ----------------------------------------------------------------- #
# Transport internals.
# ----------------------------------------------------------------- #
def _post(self, command: Json) -> HttpResponse:
resp = self._http.post(self._url(_DISPATCH_PATH), json=command, headers=self._headers())
resp.raise_for_status()
return resp
def _get(self, path: str) -> HttpResponse:
resp = self._http.get(self._url(path), headers=self._headers())
resp.raise_for_status()
return resp
def _url(self, path: str) -> str:
return f"{self._base_url}{path}"
def _headers(self) -> dict[str, str]:
return {"Authorization": f"Bearer {self._bearer_provider()}"}

243
app/afk/tracker.py Normal file

@ -0,0 +1,243 @@
"""Issue-tracker adapter — the loop's read/write port onto GitHub issues.
``Tracker`` is the only place the AFK loop touches the issue tracker. It wraps an
injected ``GitHubClient`` (the port) so the policy/state-machine code and the
tests never depend on a real ``gh`` or the network: production injects
``GhCliClient`` (shells out to ``gh`` with no-shell argv); tests inject a fake.
The split is deliberate. The ``GitHubClient`` port speaks only in *primitives*
(list raw issues for a label, fetch a single issue's label events, and the four
mutations). All the loop-specific *decisions* live on ``Tracker``:
* ``labeled_by_trusted``: decided **fail-closed** from the actor who made the
most recent application of the ready label. On private repos only
collaborators can label, so the label *is* the authorization (design doc,
"Trigger & dispatch predicate"); an unattributable label is never trusted.
* ``blocked_by``: the issue numbers in the body's "Blocked by #N" clauses
(the per-issue dependency the design doc gates dispatch on).
* ``priority``: read off a ``priority:<n>`` label, lowest wins (lower runs
first, matching ``Issue.priority`` semantics in ``types``).
Keeping the decisions here, not in the client, is what lets the whole read path
be tested against a thin fake. Mutations (``add_label`` / ``remove_label`` /
``comment`` / ``close``) are pass-throughs the loop drives during a run.
"""
import json
import re
from collections.abc import Callable
from subprocess import PIPE, run
from typing import Protocol, runtime_checkable
from .types import Issue
# Trusted author associations: GitHub tags each issue event actor with their
# association to the repo. Only these may arm an issue for the AFK loop — the
# trust gate from the design doc. Overridable per Tracker for a tighter policy.
DEFAULT_TRUSTED_ASSOCIATIONS: frozenset[str] = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})
# Default gating label; mirrors Config.ready_label so a Tracker built without an
# explicit override matches the production default.
DEFAULT_READY_LABEL = "ready-for-agent"
# "Blocked by #3, #4 and #10" → [3, 4, 10]. We match a "blocked by" lead-in
# (case-insensitive) and then harvest every "#<n>" in the clause that follows,
# up to the next line break — so a bare "#7 for context" elsewhere is ignored.
_BLOCKED_BY_CLAUSE = re.compile(r"blocked\s+by\b([^\n\r]*)", re.IGNORECASE)
_ISSUE_REF = re.compile(r"#(\d+)")
# "priority:2" → 2. Anything non-numeric (e.g. "priority:high") is not a numeric
# priority and is skipped.
_PRIORITY_LABEL = re.compile(r"^priority:(\d+)$")
@runtime_checkable
class GitHubClient(Protocol):
"""The primitive surface ``Tracker`` depends on — one issue tracker, faked
in tests. Implementations must not embed loop policy; they only fetch raw
data and perform the four mutations.
``list_issues`` returns the ``gh issue list --json number,labels,body`` shape
(``labels`` is a list of ``{"name": ...}``; ``body`` may be ``None``).
``label_events`` returns the ``labeled`` timeline events for one issue, each
with ``label.name``, ``actor.login`` and ``author_association``.
"""
def list_issues(self, repo: str, label: str) -> list[dict]: ...
def label_events(self, repo: str, number: int) -> list[dict]: ...
def add_label(self, repo: str, number: int, label: str) -> None: ...
def remove_label(self, repo: str, number: int, label: str) -> None: ...
def comment(self, repo: str, number: int, body: str) -> None: ...
def close(self, repo: str, number: int) -> None: ...
class Tracker:
"""Adapter that turns raw issue-tracker data into ``Issue`` records and
relays mutations, over an injected :class:`GitHubClient`."""
def __init__(
self,
client: GitHubClient,
ready_label: str = DEFAULT_READY_LABEL,
trusted_associations: frozenset[str] = DEFAULT_TRUSTED_ASSOCIATIONS,
) -> None:
self.client = client
self.ready_label = ready_label
self.trusted_associations = trusted_associations
# ----------------------------------------------------------------- reads #
def list_ready(self, repos: list[str]) -> list[Issue]:
"""Every ready-labeled open issue across ``repos``, as ``Issue`` records.
Ordering follows the client's per-repo order; dispatch ordering by
priority is the dispatch policy's job, not the tracker's.
"""
issues: list[Issue] = []
for repo in repos:
for raw in self.client.list_issues(repo, self.ready_label):
issues.append(self._to_issue(repo, raw))
return issues
def _to_issue(self, repo: str, raw: dict) -> Issue:
number = int(raw["number"])
labels = [lbl["name"] for lbl in raw.get("labels", [])]
return Issue(
number=number,
repo=repo,
labels=labels,
blocked_by=_parse_blocked_by(raw.get("body")),
labeled_by_trusted=self._is_labeled_by_trusted(repo, number),
priority=_parse_priority(labels),
)
def _is_labeled_by_trusted(self, repo: str, number: int) -> bool:
"""True iff the MOST RECENT application of the ready label was made by a
trusted actor. Fail-closed: no attributable application means not trusted."""
last_association: str | None = None
for event in self.client.label_events(repo, number):
if event.get("event") != "labeled":
continue
if (event.get("label") or {}).get("name") != self.ready_label:
continue
last_association = event.get("author_association")
return last_association in self.trusted_associations
# ------------------------------------------------------------- mutations #
def add_label(self, repo: str, issue: int, label: str) -> None:
self.client.add_label(repo, issue, label)
def remove_label(self, repo: str, issue: int, label: str) -> None:
self.client.remove_label(repo, issue, label)
def comment(self, repo: str, issue: int, body: str) -> None:
self.client.comment(repo, issue, body)
def close(self, repo: str, issue: int) -> None:
self.client.close(repo, issue)
# --------------------------------------------------------------------------- #
# Parsing helpers — pure functions, no I/O.
# --------------------------------------------------------------------------- #
def _parse_blocked_by(body: str | None) -> list[int]:
"""Issue numbers referenced in the body's "Blocked by #N" clauses.
Order-preserving and de-duplicated; bare "#N" mentions outside a "blocked by"
clause are ignored. A missing/empty body yields ``[]``.
"""
if not body:
return []
seen: dict[int, None] = {} # insertion-ordered set
for clause in _BLOCKED_BY_CLAUSE.findall(body):
for ref in _ISSUE_REF.findall(clause):
seen.setdefault(int(ref), None)
return list(seen)
def _parse_priority(labels: list[str]) -> int:
"""Numeric priority from a ``priority:<n>`` label, lowest wins; 0 if none."""
priorities = [
int(match.group(1))
for label in labels
if (match := _PRIORITY_LABEL.match(label))
]
return min(priorities) if priorities else 0
# --------------------------------------------------------------------------- #
# Concrete client — shells out to `gh`. Injected `run` keeps it testable.
# --------------------------------------------------------------------------- #
def _default_run(argv: list[str]) -> str:
"""Run ``argv`` with no shell and return stdout (text). Raises on non-zero.
List argv (never a shell string), matching the no-injection-surface pattern
the breakglass/main subprocess helpers use: the repo/label/body values are
never interpreted by a shell.
"""
proc = run(argv, stdout=PIPE, stderr=PIPE, text=True, check=False)
if proc.returncode != 0:
raise RuntimeError(f"{argv[0]} failed ({proc.returncode}): {proc.stderr[:200]}")
return proc.stdout
class GhCliClient:
""":class:`GitHubClient` backed by the ``gh`` CLI.
``repo_owner`` is the GitHub owner/org the sub-project repos live under, so a
bare repo name (``"infra"``) becomes the ``--repo owner/infra`` slug ``gh``
wants. ``run`` is the subprocess runner (defaults to the real no-shell one);
tests inject a fake to capture argv without spawning ``gh``.
"""
def __init__(self, repo_owner: str, run: Callable[[list[str]], str] = _default_run) -> None:
self.repo_owner = repo_owner
self._run = run
def _slug(self, repo: str) -> str:
return f"{self.repo_owner}/{repo}"
def list_issues(self, repo: str, label: str) -> list[dict]:
out = self._run([
"gh", "issue", "list", "--repo", self._slug(repo),
"--label", label, "--state", "open",
"--json", "number,labels,body", "--limit", "100",
])
return _loads_list(out)
def label_events(self, repo: str, number: int) -> list[dict]:
out = self._run([
"gh", "api",
f"repos/{self._slug(repo)}/issues/{number}/timeline",
"--paginate",
"-H", "Accept: application/vnd.github+json",
])
events = _loads_list(out)
return [e for e in events if e.get("event") == "labeled"]
def add_label(self, repo: str, number: int, label: str) -> None:
self._run([
"gh", "issue", "edit", str(number), "--repo", self._slug(repo),
"--add-label", label,
])
def remove_label(self, repo: str, number: int, label: str) -> None:
self._run([
"gh", "issue", "edit", str(number), "--repo", self._slug(repo),
"--remove-label", label,
])
def comment(self, repo: str, number: int, body: str) -> None:
self._run([
"gh", "issue", "comment", str(number), "--repo", self._slug(repo),
"--body", body,
])
def close(self, repo: str, number: int) -> None:
self._run(["gh", "issue", "close", str(number), "--repo", self._slug(repo)])
def _loads_list(out: str) -> list[dict]:
"""Parse ``gh`` JSON stdout into a list of dicts. Empty stdout → ``[]``."""
text = out.strip()
if not text:
return []
return json.loads(text)

134
app/afk/types.py Normal file

@ -0,0 +1,134 @@
"""Shared types for the AFK loop — the contract every module builds against.
Stdlib only (``dataclasses`` + ``enum``), matching the breakglass code: no
pydantic, modern ``X | None`` unions, precise field types. Every other module in
``app.afk`` imports its inputs/outputs from here so the pieces stay aligned; the
module-level docstrings in ``__init__`` list which functions consume which type.
Nothing here has behaviour: these are pure data carriers and closed enums. Keep
it that way: logic lives in ``dispatch_policy`` / ``run_state_machine`` / the
client modules, never on the dataclasses.
"""
from dataclasses import dataclass
from enum import Enum
# --------------------------------------------------------------------------- #
# Enums — closed vocabularies the state machine and clients speak in.
# --------------------------------------------------------------------------- #
class ThreadStatus(Enum):
"""Liveness of a T3 thread, as projected from the orchestration snapshot.
``RUNNING``: the agent is still working the turn; ``IDLE``: the turn
finished cleanly (it has gone quiet); ``ERROR``: the thread/turn failed.
"""
RUNNING = "running"
IDLE = "idle"
ERROR = "error"
class CIStatus(Enum):
"""CI verdict for a pushed commit. ``PENDING`` covers both "no run yet" and
"in progress"; the state machine waits on either."""
PENDING = "pending"
GREEN = "green"
RED = "red"
class Phase(Enum):
"""Where a single issue's run is in its lifecycle. Ordered: each phase is a
gate the run passes through on the way to ``DONE``. ``phase_checklist``
renders these; the loop advances through them as evidence arrives."""
WORKTREE = "worktree" # isolated workspace created
TESTS_RED = "tests_red" # failing test written first (TDD red)
GREEN = "green" # implementation makes tests pass (TDD green)
PUSHED = "pushed" # commit(s) pushed to master
CI = "ci" # CI pipeline running on the pushed commit
DEPLOYED = "deployed" # deploy/rollout reached the cluster
DONE = "done" # verified complete; issue can be closed
class Action(Enum):
"""The decision ``run_state_machine.next_action`` returns for one tick.
``WAIT``: nothing to do yet, poll again; ``CLOSE_SUCCESS``: run is green,
CI passed, close the issue; ``ESCALATE_PREPUSH``: the agent errored/stalled
before pushing anything, hand back to a human; ``FIX_FORWARD``: CI went red
on a pushed commit, dispatch another corrective turn; ``FREEZE_ESCALATE``:
fix-forward budget exhausted (attempts or wall-clock), stop and escalate.
"""
WAIT = "wait"
CLOSE_SUCCESS = "close_success"
ESCALATE_PREPUSH = "escalate_prepush"
FIX_FORWARD = "fix_forward"
FREEZE_ESCALATE = "freeze_escalate"
# --------------------------------------------------------------------------- #
# Data carriers.
# --------------------------------------------------------------------------- #
@dataclass
class Issue:
"""A tracker issue the loop might dispatch.
``labeled_by_trusted`` records whether the gating label was applied by a
trusted identity: the loop must never dispatch an issue made ready by an
untrusted actor (prompt-injection / drive-by). ``blocked_by`` lists issue
numbers that must close first; ``priority`` orders the ready set (lower runs
first, matching tracker conventions).
"""
number: int
repo: str
labels: list[str]
blocked_by: list[int]
labeled_by_trusted: bool
priority: int
@dataclass
class DispatchDecision:
"""An issue the dispatch policy selected to run now, with a human-readable
``reason`` (logged + surfaced in notifications, never parsed)."""
issue: Issue
reason: str
@dataclass
class Config:
"""Loop configuration. DISABLED BY DEFAULT — ``kill_switch=True`` and an
empty ``allowlist`` mean a freshly-constructed Config dispatches nothing.
Enabling is a deliberate manual step (see ``config.from_env`` /
``from_configmap``).
"""
allowlist: list[str]
kill_switch: bool
in_progress_label: str = "agent-in-progress"
ready_label: str = "ready-for-agent"
budget_usd: float = 100.0
fix_forward_max_attempts: int = 5
fix_forward_max_seconds: int = 3600
@dataclass
class RunState:
"""Everything the state machine needs to decide one issue's next move.
Assembled each tick from the orchestration snapshot (``thread_status``), the
CI watcher (``ci_status``), and the loop's own bookkeeping (``pushed``,
``fix_forward_attempts``, ``elapsed_seconds``). ``thread_status`` /
``ci_status`` are ``None`` when not yet known (no snapshot entry / nothing
pushed to check yet).
"""
thread_status: ThreadStatus | None
ci_status: CIStatus | None
pushed: bool
fix_forward_attempts: int
elapsed_seconds: float

355
app/afk/watcher.py Normal file

@ -0,0 +1,355 @@
"""CronJob entrypoint: drive ONE in-flight AFK run by a single tick.
The watcher is the *second half* of the loop: the part that drives a run the
poller already started through to a terminal state. Given one in-flight run
(``InFlightRun``: the issue, the T3 thread to poll, the pushed commit if any,
and the fix-forward bookkeeping), one ``tick``:
1. **assemble a ``RunState``** from the live edges + the run's bookkeeping:
* ``thread_status`` from ``t3_client.snapshot()``, by finding this run's
thread and mapping its ``latestTurn.state`` (``completed`` -> idle;
``running``/``in_progress``/``pending``/``queued``/``pendingInit`` ->
running; ``errored`` -> error) to a ``ThreadStatus``. A missing thread, no
turn yet, or any unrecognised state folds to ``None`` ("no status yet"), so
the state machine WAITs; we never escalate or close on a status we don't
understand;
* ``ci_status`` from ``ci_watcher.status(repo, commit)``, *only* when a commit
is pushed (no commit -> nothing to check -> ``None``);
* ``pushed`` / ``fix_forward_attempts`` / ``elapsed_seconds`` straight
from the run.
2. **decide** via the pure ``run_state_machine.next_action`` (it owns the
lifecycle policy; the watcher owns only the I/O the decision implies).
3. **act** on the returned ``Action``:
* ``CLOSE_SUCCESS``: ``tracker.close`` + drop the in-progress label +
DONE checklist + ``done`` doorbell. The run landed.
* ``ESCALATE_PREPUSH`` / ``FREEZE_ESCALATE``: drop the in-progress label,
add the ``ready-for-human`` label, post the checklist, ring the
``needs-human`` / ``frozen`` doorbell. The run is handed to a human; the
issue is left OPEN (not closed) with the work in place.
* ``FIX_FORWARD``: dispatch a corrective turn (``t3_client.dispatch``),
bump the fix-forward attempt count, refresh the checklist, and keep the
run in flight (NOT terminal: no label churn, and no doorbell, since the
notifier only speaks terminal kinds). The new thread id rides back on the
result so the next tick polls the corrective turn.
* ``WAIT``: just refresh the progress checklist and keep waiting.
Every adapter (T3, tracker, CI, notifier) is injected behind a structural
Protocol, so production wires the real clients and the tests wire the in-memory
fakes; this module opens no socket and reads no message bodies. (The pilot keeps
T3 ``state.sqlite`` message-body reads out of the core loop: snapshot status +
CI status are all the state machine needs, so this watcher never execs into the
pod; that observability nicety is a separate, optional concern.)
DISABLED BY DEFAULT applies transitively: the poller never starts a run while
the loop is off (``config.kill_switch`` / empty allowlist; see ``config.py``),
so with the shipped defaults there is never an ``InFlightRun`` to tick.
"""
from dataclasses import dataclass
from typing import Protocol
from . import phase_checklist, run_state_machine
from .notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN
from .poller import T3Port as _DispatchPort # dispatch(repo, issue, prompt) -> id
from .types import Action, CIStatus, Config, Issue, Phase, RunState, ThreadStatus
# T3 ``latestTurn.state`` -> ThreadStatus. The real snapshot reports a thread's
# liveness as the state of its latest turn (verified against t3-afk v0.0.27):
# ``completed`` == the turn finished cleanly (agent is idle, awaiting input);
# any not-yet-finished state (``running``/``in_progress``/``pending``/``queued``/
# ``pendingInit``) == still working; ``errored`` == the turn failed. Anything not
# in here (a state T3 adds later, or a malformed/absent entry) maps to None —
# "no usable status yet" — so the state machine waits rather than acting on
# something it can't interpret.
_THREAD_STATUS_BY_STRING: dict[str, ThreadStatus] = {
"completed": ThreadStatus.IDLE,
"running": ThreadStatus.RUNNING,
"in_progress": ThreadStatus.RUNNING,
"pending": ThreadStatus.RUNNING,
"queued": ThreadStatus.RUNNING,
"pendingInit": ThreadStatus.RUNNING,
"errored": ThreadStatus.ERROR,
}
# Action -> the terminal doorbell kind to ring. Only the terminal actions appear;
# WAIT / FIX_FORWARD are non-terminal and ring nothing (the notifier rejects a
# non-terminal kind on purpose — see ``notifier.TERMINAL_KINDS``).
_TERMINAL_KIND_BY_ACTION: dict[Action, str] = {
Action.CLOSE_SUCCESS: KIND_DONE,
Action.ESCALATE_PREPUSH: KIND_NEEDS_HUMAN,
Action.FREEZE_ESCALATE: KIND_FROZEN,
}
# Default label applied when a run is handed back to a human. Mirrors the
# tracker's ``ready-for-agent`` convention; overridable per-Watcher.
DEFAULT_READY_FOR_HUMAN_LABEL = "ready-for-human"
# --------------------------------------------------------------------------- #
# Injected adapter Protocols — structural, so the real clients and the test
# fakes both satisfy them with no subclassing. Only the methods the watcher
# actually calls appear. ``DispatchPort`` is reused from ``poller``.
# --------------------------------------------------------------------------- #
class SnapshotPort(_DispatchPort, Protocol):
"""T3 surface the watcher needs: ``dispatch`` (for the corrective turn) plus
``snapshot`` (for thread liveness)."""
def snapshot(self) -> dict: ...
class TrackerPort(Protocol):
"""The slice of ``tracker.Tracker`` the watch tick needs."""
def add_label(self, repo: str, issue: int, label: str) -> None: ...
def remove_label(self, repo: str, issue: int, label: str) -> None: ...
def comment(self, repo: str, issue: int, body: str) -> None: ...
def close(self, repo: str, issue: int) -> None: ...
class CIPort(Protocol):
"""The slice of ``ci_watcher.CIWatcher`` the watch tick needs."""
def status(self, repo: str, commit: str) -> CIStatus: ...
class NotifierPort(Protocol):
"""The slice of ``notifier.Notifier`` the watch tick needs."""
def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None: ...
@dataclass
class InFlightRun:
"""One run the watcher is driving, as the loop tracks it between ticks.
``thread_id`` is the T3 thread to poll this tick; ``commit`` is the pushed
commit CI watches (``None`` until the agent has pushed). ``fix_forward_attempts``
and ``elapsed_seconds`` are the loop's own bookkeeping, fed straight into the
assembled ``RunState``; ``pushed`` is derived as ``commit is not None``.
"""
issue: Issue
thread_id: str
commit: str | None
fix_forward_attempts: int = 0
elapsed_seconds: float = 0.0
@dataclass
class TickResult:
"""The outcome of one watch tick.
``action`` is the state machine's verdict; ``terminal`` is True iff the run
reached an end state (closed or handed to a human) and should no longer be
ticked. ``thread_id`` / ``fix_forward_attempts`` carry the (possibly updated)
bookkeeping the caller threads into the next ``InFlightRun``; they change
only on a FIX_FORWARD (new corrective thread, incremented attempts) and are
otherwise echoed back unchanged.
"""
action: Action
terminal: bool
thread_id: str
fix_forward_attempts: int
class Watcher:
"""Drives one in-flight run per ``tick`` over injected adapters.
The three escalation-vs-success decisions live in the pure
``run_state_machine``; this class only performs the I/O each decision
implies. ``ready_for_human_label`` is the label stamped on a run handed back
to a human (default :data:`DEFAULT_READY_FOR_HUMAN_LABEL`).
"""
def __init__(
self,
t3_client: SnapshotPort,
tracker: TrackerPort,
ci_watcher: CIPort,
notifier: NotifierPort,
ready_for_human_label: str = DEFAULT_READY_FOR_HUMAN_LABEL,
) -> None:
self._t3 = t3_client
self._tracker = tracker
self._ci = ci_watcher
self._notifier = notifier
self._ready_for_human_label = ready_for_human_label
def tick(self, run: InFlightRun, config: Config) -> TickResult:
"""Drive ``run`` one step (see module docstring)."""
state = self._assemble_state(run)
action = run_state_machine.next_action(state, config)
if action is Action.CLOSE_SUCCESS:
return self._close_success(run, config)
if action in (Action.ESCALATE_PREPUSH, Action.FREEZE_ESCALATE):
return self._escalate(run, state, action, config)
if action is Action.FIX_FORWARD:
return self._fix_forward(run, state)
# WAIT: still in flight — just show progress and poll again next tick.
return self._wait(run, state, action)
# ----------------------------------------------------------------- #
# RunState assembly.
# ----------------------------------------------------------------- #
def _assemble_state(self, run: InFlightRun) -> RunState:
thread_status = self._thread_status(run.thread_id)
# Only fold CI when there's a commit to check — an unpushed run has no
# pipeline, and we must not query CI (the assertion in the tests, and
# avoiding a needless API call, both rely on this).
ci_status = (
self._ci.status(run.issue.repo, run.commit)
if run.commit is not None
else None
)
return RunState(
thread_status=thread_status,
ci_status=ci_status,
pushed=run.commit is not None,
fix_forward_attempts=run.fix_forward_attempts,
elapsed_seconds=run.elapsed_seconds,
)
def _thread_status(self, thread_id: str) -> ThreadStatus | None:
"""This thread's liveness from the fleet snapshot, or ``None`` when the
thread is absent, has no turn yet, or its ``latestTurn.state`` is one we
don't recognise. Liveness is the state of the thread's latest turn (the
real snapshot shape), not a top-level ``status`` field."""
for thread in self._t3.snapshot().get("threads", []):
if thread.get("id") == thread_id:
latest_turn = thread.get("latestTurn") or {}
return _THREAD_STATUS_BY_STRING.get(latest_turn.get("state"))
return None
# ----------------------------------------------------------------- #
# Per-action handlers.
# ----------------------------------------------------------------- #
def _close_success(self, run: InFlightRun, config: Config) -> TickResult:
"""Landed: close the issue, drop the lock, post DONE, ring the doorbell."""
self._post_checklist(run, Phase.DONE)
self._tracker.remove_label(
run.issue.repo, run.issue.number, config.in_progress_label
)
self._tracker.close(run.issue.repo, run.issue.number)
self._notify(run, Action.CLOSE_SUCCESS, "Run landed: pushed and CI green.")
return _terminal(Action.CLOSE_SUCCESS, run)
def _escalate(
self, run: InFlightRun, state: RunState, action: Action, config: Config
) -> TickResult:
"""Hand back to a human: drop the lock, add ready-for-human, post the
checklist, ring the matching doorbell. The issue stays OPEN."""
self._post_checklist(run, _phase_for(state))
self._tracker.remove_label(
run.issue.repo, run.issue.number, config.in_progress_label
)
self._tracker.add_label(
run.issue.repo, run.issue.number, self._ready_for_human_label
)
self._notify(run, action, _escalation_detail(action, state))
return _terminal(action, run)
def _fix_forward(self, run: InFlightRun, state: RunState) -> TickResult:
"""CI red with budget left: dispatch a corrective turn and stay in flight.
Not terminal: no doorbell (the notifier only speaks terminal kinds) and
no label churn (the in-progress lock stays put). The corrective dispatch
spawns a fresh thread; its id and the incremented attempt count ride back
so the next tick tracks the right thread.
"""
attempts = run.fix_forward_attempts + 1
new_thread_id = self._t3.dispatch(
run.issue.repo, run.issue.number, _fix_forward_prompt(run)
)
self._post_checklist(run, Phase.CI, fix_forward_attempts=attempts)
return TickResult(
action=Action.FIX_FORWARD,
terminal=False,
thread_id=new_thread_id,
fix_forward_attempts=attempts,
)
def _wait(self, run: InFlightRun, state: RunState, action: Action) -> TickResult:
"""Still working: refresh the progress checklist, change nothing else."""
self._post_checklist(run, _phase_for(state))
return TickResult(
action=action,
terminal=False,
thread_id=run.thread_id,
fix_forward_attempts=run.fix_forward_attempts,
)
# ----------------------------------------------------------------- #
# I/O helpers.
# ----------------------------------------------------------------- #
def _post_checklist(
self, run: InFlightRun, phase: Phase, *, fix_forward_attempts: int | None = None
) -> None:
attempts = run.fix_forward_attempts if fix_forward_attempts is None else fix_forward_attempts
body = phase_checklist.render(
phase,
{
"repo": run.issue.repo,
"issue": run.issue.number,
"thread_id": run.thread_id,
"fix_forward_attempts": attempts,
},
)
self._tracker.comment(run.issue.repo, run.issue.number, body)
def _notify(self, run: InFlightRun, action: Action, detail: str) -> None:
self._notifier.notify(
_TERMINAL_KIND_BY_ACTION[action], run.issue, run.thread_id, detail
)
# --------------------------------------------------------------------------- #
# Pure helpers.
# --------------------------------------------------------------------------- #
def _terminal(action: Action, run: InFlightRun) -> TickResult:
"""A terminal :class:`TickResult` echoing the run's bookkeeping unchanged."""
return TickResult(
action=action,
terminal=True,
thread_id=run.thread_id,
fix_forward_attempts=run.fix_forward_attempts,
)
def _phase_for(state: RunState) -> Phase:
"""Best-effort current lifecycle phase from the evidence in ``state``.
The checklist is decoration only (the loop reads no agent message bodies), so
this maps the observable signals (pushed? CI verdict?) onto the closest
phase: nothing pushed -> still working toward the implementation (GREEN);
pushed with CI pending or red -> the CI phase, where attention sits until it
goes green; pushed with CI green -> DEPLOYED. DONE is rendered by the close
path, not here.
"""
if not state.pushed:
return Phase.GREEN
if state.ci_status is CIStatus.GREEN:
return Phase.DEPLOYED
return Phase.CI
def _escalation_detail(action: Action, state: RunState) -> str:
"""Human-readable escalation reason for the doorbell + logs (never parsed)."""
if action is Action.ESCALATE_PREPUSH:
return (
"Agent stalled or errored before pushing any commit "
f"(thread {state.thread_status.value if state.thread_status else 'unknown'}). "
"Handed back for a human."
)
return (
"Fix-forward budget exhausted with CI still red "
f"({state.fix_forward_attempts} attempts, {state.elapsed_seconds:.0f}s). "
"Frozen for a human."
)
def _fix_forward_prompt(run: InFlightRun) -> str:
"""The corrective-turn prompt: point the agent at the red CI on its commit."""
return (
f"CI is RED on your pushed commit {run.commit} for issue #{run.issue.number} "
f"in `{run.issue.repo}`. Investigate the failing run, fix the cause, and "
f"push the fix to master. Then watch CI again until it is green."
)


@ -1,26 +1,13 @@
"""Drive the breakglass Claude agent and stream its work to the browser.
"""Claude CLI argv + stream-json → UI-event translation for the breakglass agent.
Each chat turn runs ``claude -p --output-format stream-json`` in the session's
persistent workspace; the first turn opens the session with ``--session-id`` and
later turns ``--resume`` it, so the conversation has memory across turns. The
CLI's JSON events are translated to a small, stable SSE vocabulary the UI
renders (``session`` / ``text`` / ``tool`` / ``result`` / ``error``); we do not
leak the raw event firehose to the client.
Subprocesses use ``asyncio.create_subprocess_exec`` (list argv, no shell): the
prompt and ids are argv elements, never interpreted by a shell.
The session lifecycle (running turns, attaching clients) lives in ``session.py``;
this module is just the two helpers it builds on:
* ``_turn_argv``: the no-shell list argv for one ``claude -p`` turn.
* ``translate_event``: map a raw stream-json event to the small UI vocabulary
(session / text / tool / result), dropping the hook/thinking-token noise.
"""
import asyncio
import json
import os
from subprocess import PIPE
from typing import AsyncIterator
from . import config
# Sessions we've already opened (so the next turn resumes instead of re-creating).
_started: set[str] = set()
def _turn_argv(session_id: str, prompt: str, resume: bool, model: str) -> list[str]:
argv = [
@ -66,7 +53,7 @@ def translate_event(obj: dict) -> dict | None:
})
if not events:
return None
# The server flattens a "batch" into individual SSE frames.
# The session log flattens a "batch" into individual events.
return events[0] if len(events) == 1 else {"kind": "batch", "events": events}
if etype == "result":
@ -78,68 +65,3 @@ def translate_event(obj: dict) -> dict | None:
}
return None
async def run_turn(
session_id: str, prompt: str, model: str | None = None
) -> AsyncIterator[dict]:
"""Run one chat turn, yielding translated UI events as they arrive."""
resume = session_id in _started
model = model or config.DEFAULT_MODEL
workspace = os.path.join(config.SESSIONS_DIR, session_id)
os.makedirs(workspace, exist_ok=True)
argv = _turn_argv(session_id, prompt, resume, model)
proc = await asyncio.create_subprocess_exec(
*argv, cwd=workspace, stdout=PIPE, stderr=PIPE,
)
_started.add(session_id)
assert proc.stdout is not None and proc.stderr is not None
try:
async def _pump() -> AsyncIterator[dict]:
async for raw in proc.stdout:
line = raw.decode(errors="replace").strip()
if not line:
continue
try:
obj = json.loads(line)
except json.JSONDecodeError:
continue
ev = translate_event(obj)
if ev is None:
continue
if ev.get("kind") == "batch":
for sub in ev["events"]:
yield sub
else:
yield ev
async for ev in _with_timeout(_pump(), config.TURN_TIMEOUT_SECONDS):
yield ev
except asyncio.TimeoutError:
proc.kill()
await proc.wait()
yield {"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"}
return
await proc.wait()
if proc.returncode not in (0, None):
err = (await proc.stderr.read()).decode(errors="replace")
yield {"kind": "error", "error": err.strip()[:500] or f"exit {proc.returncode}"}
async def _with_timeout(agen: AsyncIterator[dict], timeout: float) -> AsyncIterator[dict]:
"""Yield from an async generator but raise TimeoutError if the WHOLE turn
exceeds ``timeout`` seconds (a wedged agent shouldn't stream forever)."""
loop = asyncio.get_event_loop()
deadline = loop.time() + timeout
it = agen.__aiter__()
while True:
remaining = deadline - loop.time()
if remaining <= 0:
raise asyncio.TimeoutError
try:
yield await asyncio.wait_for(it.__anext__(), timeout=remaining)
except StopAsyncIteration:
return

@ -25,6 +25,9 @@ MAX_CONCURRENT_TURNS = int(os.environ.get("BREAKGLASS_MAX_CONCURRENT_TURNS", "2"
TURN_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_TURN_TIMEOUT_SECONDS", "1800"))
# A single PVE power verb must return fast; a wedged host shouldn't hang the UI.
PVE_VERB_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_PVE_VERB_TIMEOUT_SECONDS", "120"))
# How long an idle attach stream waits before emitting an SSE keepalive comment
# (keeps proxies/CDN from closing the long-lived connection).
SSE_KEEPALIVE_SECONDS = int(os.environ.get("BREAKGLASS_SSE_KEEPALIVE_SECONDS", "20"))
# Auth. The app sits behind the ingress `auth = "required"` resilience proxy
# (Authentik SSO, basic-auth fallback when Authentik is down). We additionally

@ -1,38 +1,44 @@
"""Breakglass FastAPI app — the in-cluster emergency recovery UI.
The chat uses the tmux/attach model (see session.py): the server owns the
conversation; clients attach over SSE and the turn keeps running if they
disconnect.
Routes:
GET /health liveness (no auth)
GET / the single-page UI (static)
POST /api/session open a chat session, returns {session_id}
POST /api/chat run one turn, streams SSE events (text/tool/result)
POST /api/pve/{verb} LLM-independent PVE power verb (manual buttons)
GET /api/pve/verbs list allowed verbs + which mutate
GET /health liveness (no auth)
GET / the single-page UI (static)
POST /api/session create a session, returns {session_id}
GET /api/session/{id}/stream ATTACH (SSE): replay + live tail
POST /api/session/{id}/prompt run a turn (detached; survives disconnect)
POST /api/session/{id}/cancel stop the in-flight turn
GET /api/pve/verbs list allowed verbs + which mutate
POST /api/pve/{verb} LLM-independent PVE power verb (buttons)
Everything under /api requires auth (edge Authentik header or bearer token).
"""
import json
import os
import uuid
from fastapi import Depends, FastAPI, HTTPException
from fastapi import Depends, FastAPI, Header, HTTPException
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel, Field
from . import agent_session, config, pve
from . import config, pve
from .auth import require_auth
from .session import SessionManager, attach_stream
app = FastAPI(title="Claude Breakglass")
_STATIC_DIR = os.path.join(os.path.dirname(__file__), "static")
manager = SessionManager()
class SessionResponse(BaseModel):
session_id: str
class ChatRequest(BaseModel):
session_id: str
class PromptRequest(BaseModel):
prompt: str = Field(..., min_length=1)
model: str | None = None
@ -44,30 +50,53 @@ async def health():
@app.post("/api/session", response_model=SessionResponse)
async def open_session(_identity: str = Depends(require_auth)):
# Claude wants a UUID for --session-id.
return SessionResponse(session_id=str(uuid.uuid4()))
return SessionResponse(session_id=manager.create().id)
@app.post("/api/chat")
async def chat(req: ChatRequest, _identity: str = Depends(require_auth)):
"""Stream one chat turn as Server-Sent Events. The browser reads the
response body incrementally (fetch + ReadableStream)."""
async def _sse():
try:
async for ev in agent_session.run_turn(req.session_id, req.prompt, req.model):
yield f"data: {json.dumps(ev)}\n\n"
except Exception as exc: # noqa: BLE001 — surface any failure to the UI
yield f"data: {json.dumps({'kind': 'error', 'error': str(exc)[:500]})}\n\n"
yield f"data: {json.dumps({'kind': 'done'})}\n\n"
@app.get("/api/session/{session_id}/stream")
async def attach(
session_id: str,
_identity: str = Depends(require_auth),
last_event_id: str | None = Header(default=None, alias="Last-Event-ID"),
):
"""Attach to a session (SSE). Replays the conversation so far, then tails
live. On an EventSource auto-reconnect the browser sends Last-Event-ID, so we
replay only what was missed."""
session = manager.get(session_id)
if session is None:
raise HTTPException(status_code=404, detail="session not found")
try:
leid = int(last_event_id) if last_event_id is not None else None
except ValueError:
leid = None
return StreamingResponse(
_sse(),
attach_stream(session, leid),
media_type="text/event-stream",
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no", "Connection": "keep-alive"},
)
@app.post("/api/session/{session_id}/prompt")
async def prompt(session_id: str, req: PromptRequest, _identity: str = Depends(require_auth)):
"""Start a turn. It runs DETACHED (keeps going if the client disconnects);
output is delivered via the attach stream, not this response."""
session = manager.get(session_id)
if session is None:
raise HTTPException(status_code=404, detail="session not found")
if not session.start_turn(req.prompt, req.model):
raise HTTPException(status_code=409, detail="a turn is already running")
return {"status": "started"}
@app.post("/api/session/{session_id}/cancel")
async def cancel(session_id: str, _identity: str = Depends(require_auth)):
session = manager.get(session_id)
if session is None:
raise HTTPException(status_code=404, detail="session not found")
cancelled = await session.cancel()
return {"cancelled": cancelled}
@app.get("/api/pve/verbs")
async def pve_verbs(_identity: str = Depends(require_auth)):
return {

201
app/breakglass/session.py Normal file
@ -0,0 +1,201 @@
"""Attachable server-side sessions — the tmux model for the breakglass chat.
Instead of the client owning conversation state, the SERVER owns it and clients
*attach*. A turn runs as a detached task that keeps going if the client
disconnects (you can background the phone / hit a tunnel blip and the agent
keeps working); its output is appended to a per-session event log and broadcast
to every attached subscriber. A client attaches over SSE, gets the log replayed
(or only the part it missed, via Last-Event-ID), then tails live, exactly like
re-attaching to a tmux session. ``EventSource`` reconnects natively, so the
"re-attach" needs zero client logic.
This module owns the lifecycle; ``agent_session`` still provides the claude
argv + the stream-json → UI-event translation (all subprocesses use the no-shell
list-argv form), and ``config`` the knobs.
"""
import asyncio
import json
import os
import uuid
from subprocess import PIPE
from typing import AsyncIterator
from . import agent_session, config
class Session:
"""One conversation. Owns the replay log + live subscribers + the in-flight
turn. The claude ``session_id`` is reused with ``--resume`` so the agent
keeps its own context across turns."""
def __init__(self, session_id: str):
self.id = session_id
# The replay log: every UI event, in order. Index in the list IS the
# SSE event id, so a reconnecting client replays only what it missed.
self.events: list[dict] = []
self._subscribers: set[asyncio.Queue] = set()
self._turn: asyncio.Task | None = None
self._proc: asyncio.subprocess.Process | None = None
self._started = False # has claude opened this session id yet?
# ── event log + fan-out ────────────────────────────────────────────────
def add_event(self, event: dict) -> dict:
"""Append an event to the log and broadcast it to attached clients."""
stored = {**event, "id": len(self.events)}
self.events.append(stored)
for q in list(self._subscribers):
q.put_nowait(stored)
return stored
def subscribe(self) -> asyncio.Queue:
q: asyncio.Queue = asyncio.Queue()
self._subscribers.add(q)
return q
def unsubscribe(self, q: asyncio.Queue) -> None:
self._subscribers.discard(q)
@property
def turn_active(self) -> bool:
return self._turn is not None and not self._turn.done()
# ── running a turn (detached from any client) ──────────────────────────
def start_turn(self, prompt: str, model: str | None = None) -> bool:
"""Kick off a turn as a background task. Returns False if one is already
running (one turn at a time per session)."""
if self.turn_active:
return False
self.add_event({"kind": "user", "text": prompt})
self._turn = asyncio.create_task(self._run_turn(prompt, model))
return True
async def _run_turn(self, prompt: str, model: str | None) -> None:
model = model or config.DEFAULT_MODEL
resume = self._started
argv = agent_session._turn_argv(self.id, prompt, resume, model)
try:
self._proc = await asyncio.create_subprocess_exec(
*argv, cwd=_workspace_for(self.id), stdout=PIPE, stderr=PIPE,
)
except Exception as exc: # noqa: BLE001
self.add_event({"kind": "error", "error": f"could not start agent: {exc}"})
self.add_event({"kind": "turn_end"})
return
self._started = True
assert self._proc.stdout is not None and self._proc.stderr is not None
try:
async def _pump():
async for raw in self._proc.stdout:
line = raw.decode(errors="replace").strip()
if not line:
continue
try:
obj = json.loads(line)
except json.JSONDecodeError:
continue
ev = agent_session.translate_event(obj)
if ev is None:
continue
if ev.get("kind") == "batch":
for sub in ev["events"]:
self.add_event(sub)
else:
self.add_event(ev)
await asyncio.wait_for(_pump(), timeout=config.TURN_TIMEOUT_SECONDS)
await self._proc.wait()
if self._proc.returncode not in (0, None):
err = (await self._proc.stderr.read()).decode(errors="replace")
self.add_event({"kind": "error", "error": err.strip()[:500] or f"exit {self._proc.returncode}"})
except asyncio.TimeoutError:
await self._kill_proc()
self.add_event({"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"})
except asyncio.CancelledError:
await self._kill_proc()
self.add_event({"kind": "cancelled"})
raise
finally:
self._proc = None
self.add_event({"kind": "turn_end"})
async def _kill_proc(self) -> None:
if self._proc and self._proc.returncode is None:
try:
self._proc.kill()
await self._proc.wait()
except ProcessLookupError:
pass
async def cancel(self) -> bool:
"""Stop the in-flight turn. Returns True if a turn was cancelled."""
if not self.turn_active:
return False
await self._kill_proc()
if self._turn:
self._turn.cancel()
try:
await self._turn
except (asyncio.CancelledError, Exception): # noqa: BLE001
pass
return True
def _workspace_for(session_id: str) -> str:
path = os.path.join(config.SESSIONS_DIR, session_id)
os.makedirs(path, exist_ok=True)
return path
class SessionManager:
"""Holds all live sessions. The breakglass is single-operator, so callers
typically reuse one persistent session; multiple are still supported."""
def __init__(self):
self.sessions: dict[str, Session] = {}
def create(self) -> Session:
sid = str(uuid.uuid4())
s = Session(sid)
self.sessions[sid] = s
return s
def get(self, session_id: str) -> Session | None:
return self.sessions.get(session_id)
def get_or_create(self, session_id: str | None) -> Session:
if session_id and session_id in self.sessions:
return self.sessions[session_id]
return self.create()
async def attach_stream(session: Session, last_event_id: int | None) -> AsyncIterator[str]:
"""Yield SSE frames for an attached client: first the replay (everything, or
only events after ``last_event_id`` on a reconnect), then live events as they
arrive. Each frame carries an ``id:`` so EventSource resumes precisely."""
q = session.subscribe()
try:
start = 0 if last_event_id is None else last_event_id + 1
backlog = session.events[start:]
for ev in backlog:
yield _sse_frame(ev)
# Tell the client the replay is done and it's now live.
yield "event: caught-up\ndata: {}\n\n"
seen = backlog[-1]["id"] if backlog else (last_event_id if last_event_id is not None else -1)
while True:
try:
ev = await asyncio.wait_for(q.get(), timeout=config.SSE_KEEPALIVE_SECONDS)
except asyncio.TimeoutError:
yield ": keepalive\n\n" # comment frame keeps the connection warm
continue
if ev["id"] <= seen:
continue
seen = ev["id"]
yield _sse_frame(ev)
finally:
session.unsubscribe(q)
def _sse_frame(event: dict) -> str:
return f"id: {event['id']}\ndata: {json.dumps(event)}\n\n"
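
The replay rule in `attach_stream` can be sketched on its own: the event's index in the log doubles as the SSE `id`, so a reconnecting client that reports `Last-Event-ID: n` only needs the slice starting at `n + 1`. This is a minimal sketch, not the module's code; `replay_after` and the sample `log` are hypothetical names introduced here for illustration.

```python
import json
from typing import Optional


def sse_frame(event: dict) -> str:
    """Render one stored event as an SSE frame; the log index is the frame id."""
    return f"id: {event['id']}\ndata: {json.dumps(event)}\n\n"


def replay_after(events: list, last_event_id: Optional[int]) -> list:
    """Return the events a (re)attaching client has not seen yet.

    Hypothetical helper: a fresh attach (no Last-Event-ID) replays the whole
    log, a reconnect replays only what follows the last id it received.
    """
    start = 0 if last_event_id is None else last_event_id + 1
    return events[start:]


# Sample log in the shape add_event stores: each event carries its own index.
log = [
    {"kind": "user", "text": "hi", "id": 0},
    {"kind": "text", "text": "hello", "id": 1},
    {"kind": "turn_end", "id": 2},
]

fresh = replay_after(log, None)      # first attach: everything
resumed = replay_after(log, 0)       # reconnect after seeing event 0
frames = [sse_frame(ev) for ev in resumed]
```

Because ids are plain list indices, the slice is O(1) to locate and no per-client cursor needs to be stored on the server.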

Binary file not shown.

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

Binary file not shown.

Binary file not shown.

@ -0,0 +1,64 @@
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="devvm breakglass">
<defs>
<!-- layered near-black surface, matching the app theme -->
<radialGradient id="bg" cx="68%" cy="22%" r="92%">
<stop offset="0%" stop-color="#12303a"/>
<stop offset="42%" stop-color="#0b0f14"/>
<stop offset="100%" stop-color="#06080b"/>
</radialGradient>
<linearGradient id="steel" x1="0" y1="0" x2="1" y2="1">
<stop offset="0%" stop-color="#7df0f3"/>
<stop offset="55%" stop-color="#3dd1d6"/>
<stop offset="100%" stop-color="#1f6f72"/>
</linearGradient>
<filter id="glow" x="-40%" y="-40%" width="180%" height="180%">
<feGaussianBlur stdDeviation="7" result="b"/>
<feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge>
</filter>
</defs>
<!-- rounded-square field (safe for maskable: art kept within central ~80%) -->
<rect width="512" height="512" rx="112" fill="url(#bg)"/>
<rect x="6" y="6" width="500" height="500" rx="108" fill="none" stroke="#1c2530" stroke-width="3"/>
<!-- faint scanline texture -->
<g opacity="0.05" stroke="#ffffff" stroke-width="2">
<line x1="0" y1="148" x2="512" y2="148"/>
<line x1="0" y1="220" x2="512" y2="220"/>
<line x1="0" y1="292" x2="512" y2="292"/>
<line x1="0" y1="364" x2="512" y2="364"/>
</g>
<!-- fracture burst (amber): the "break the glass" radiating cracks -->
<g stroke="#f5b657" stroke-width="9" stroke-linecap="round" stroke-linejoin="round"
fill="none" opacity="0.92" filter="url(#glow)">
<path d="M256 256 L142 132"/>
<path d="M256 256 L120 250"/>
<path d="M256 256 L150 372"/>
<path d="M256 256 L372 380"/>
<path d="M256 256 L392 246"/>
<path d="M256 256 L360 138"/>
<!-- cross-cracks -->
<path d="M186 196 L150 250"/>
<path d="M210 320 L172 318" opacity="0.7"/>
<path d="M326 318 L356 350" opacity="0.7"/>
</g>
<!-- wrench, struck across the burst (cyan steel) -->
<g filter="url(#glow)">
<path fill="url(#steel)" stroke="#0e3133" stroke-width="6" stroke-linejoin="round"
d="M344 150
a62 62 0 0 0 -82 76
L150 338
a26 26 0 0 0 0 37
l11 11
a26 26 0 0 0 37 0
l112 -112
a62 62 0 0 0 76 -82
l-41 41
l-40 -11
l-11 -40
z"/>
<!-- handle highlight -->
<path d="M171 350 l128 -128" stroke="#bdf6f8" stroke-width="7" stroke-linecap="round" opacity="0.6"/>
</g>
</svg>

@ -2,12 +2,31 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!-- viewport-fit=cover so the app paints edge-to-edge and we can honour the
notch/home-indicator via env(safe-area-inset-*). maximum-scale + no
user-scaling keeps the cockpit layout stable under stress on mobile. -->
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, viewport-fit=cover, maximum-scale=1.0"
/>
<meta name="color-scheme" content="dark" />
<meta name="robots" content="noindex, nofollow" />
<!-- PWA / installable. theme-color tints the mobile status bar to the dark
theme; black-translucent lets the app draw under the iOS status bar. -->
<meta name="theme-color" content="#06080b" />
<link rel="manifest" href="./manifest.webmanifest" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="apple-mobile-web-app-title" content="breakglass" />
<link rel="apple-touch-icon" href="./apple-touch-icon.png" />
<link rel="icon" type="image/svg+xml" href="./icon.svg" />
<link rel="icon" type="image/png" sizes="192x192" href="./icon-192.png" />
<title>devvm breakglass</title>
<script type="module" crossorigin src="./assets/index-DjaW81Sq.js"></script>
<link rel="stylesheet" crossorigin href="./assets/index-DWHIP1Zw.css">
<script type="module" crossorigin src="./assets/index-CLbKo1Yx.js"></script>
<link rel="stylesheet" crossorigin href="./assets/index-BoWC1Onq.css">
</head>
<body>
<div id="app"></div>

@ -0,0 +1,31 @@
{
"name": "devvm breakglass",
"short_name": "breakglass",
"description": "Emergency recovery console for the devvm — chat with a repair agent or power-cycle the VM directly.",
"start_url": "./",
"scope": "./",
"display": "standalone",
"orientation": "portrait",
"background_color": "#06080b",
"theme_color": "#06080b",
"icons": [
{
"src": "./icon.svg",
"type": "image/svg+xml",
"sizes": "any",
"purpose": "any maskable"
},
{
"src": "./icon-192.png",
"type": "image/png",
"sizes": "192x192",
"purpose": "any maskable"
},
{
"src": "./icon-512.png",
"type": "image/png",
"sizes": "512x512",
"purpose": "any maskable"
}
]
}

220
app/conversational.py Normal file
@ -0,0 +1,220 @@
"""Conversational Brain — drives the Claude CLI for the portal-assistant gateway.
A lean, no-tools, multi-turn path (portal-assistant ADR-0002): no workspace clone,
no tool-enabled agent, and NO --dangerously-skip-permissions. Per-conversation
continuity comes from the Claude CLI's own --session-id / --resume, so the gateway
only has to hand us a stable session id per conversation.
"""
import asyncio
import json
import os
from subprocess import PIPE
CONVERSATIONAL_AGENT = "conversational"
# A spoken chat turn is short; a turn that runs longer than this is wedged.
CONVERSATIONAL_TIMEOUT_SECONDS = int(
os.environ.get("CONVERSATIONAL_TIMEOUT_SECONDS", "120")
)
# Latency: the conversational agent is no-tools (ADR-0002), so the CLI's default
# project context — this repo's CLAUDE.md, the MCP server configs, local settings
# — plus the dynamic system-prompt sections are pure overhead on a voice turn.
# Measured 2026-06-21: the default load is ~45k input tokens/turn -> ~3.4s TTFT;
# restricting settings to `user` and excluding the dynamic sections more than
# halves the context (~23k) and cuts TTFT to ~2.1s (~1.3s/turn faster) with no
# change to the reply. Applies to BOTH the gateway (json) and realtime (stream)
# paths, since both run the same no-tools conversational turn.
_LEAN_CONTEXT_FLAGS = [
"--setting-sources", "user",
"--exclude-dynamic-system-prompt-sections",
]
# Session ids the Claude CLI has already opened in THIS process, so a follow-up
# turn resumes instead of re-opening. In-memory + single-replica: a pod restart
# clears this AND the CLI's emptyDir session state together, so they stay in sync.
_started: set[str] = set()
def reset_started() -> None:
"""Forget all opened sessions (used by tests)."""
_started.clear()
def conversational_argv(
session_id: str, message: str, model: str, resume: bool
) -> list[str]:
"""Build the argv for one conversational turn.
A new conversation opens the session with --session-id; subsequent turns
continue it with --resume so Claude keeps its own context. We never pass
--dangerously-skip-permissions: the conversational agent has no tools and the
endpoint is public-facing, so nothing may be auto-permitted.
"""
argv = [
"claude", "-p",
"--agent", CONVERSATIONAL_AGENT,
"--output-format", "json",
"--model", model,
*_LEAN_CONTEXT_FLAGS,
]
argv += ["--resume", session_id] if resume else ["--session-id", session_id]
argv.append(message)
return argv
def extract_reply(output_lines: list[str]) -> str:
"""Pull the final assistant text out of `claude -p --output-format json`.
The CLI emits one JSON object with the final message under `result`; fall
back to the raw text if it isn't parseable so callers always get something.
"""
raw = "".join(output_lines).strip()
if not raw:
return ""
try:
parsed = json.loads(raw)
except json.JSONDecodeError:
return raw
if isinstance(parsed, dict):
for key in ("result", "content", "text"):
value = parsed.get(key)
if isinstance(value, str) and value:
return value
return raw
async def run_turn(session_id: str, message: str, model: str) -> dict:
"""Run one conversational turn and return {exit_code, reply, stderr}.
Resumes the Claude session if we've opened it before; otherwise opens it.
The session is only marked opened on success so a failed first turn can be
retried cleanly as a new one.
"""
resume = session_id in _started
argv = conversational_argv(session_id, message, model, resume)
proc = await asyncio.create_subprocess_exec(*argv, stdout=PIPE, stderr=PIPE)
assert proc.stdout is not None and proc.stderr is not None
output_lines: list[str] = []
async for line in proc.stdout:
output_lines.append(line.decode(errors="replace"))
stderr = await proc.stderr.read()
await proc.wait()
if proc.returncode == 0:
_started.add(session_id)
return {
"exit_code": proc.returncode,
"reply": extract_reply(output_lines),
"stderr": stderr.decode(errors="replace"),
}
# ---------------------------------------------------------------------------
# Streaming (OpenAI-compatible) path — token-level deltas for the realtime
# voice agent. Pipecat's OpenAILLMService streams from /v1/chat/completions and
# re-sends the FULL history each turn, so this path is STATELESS: the whole
# dialogue goes in the prompt and we run a fresh CLI with stream-json to relay
# incremental tokens as OpenAI chat-completion SSE chunks. (run_turn above stays
# the session-based path for the non-streaming gateway.)
# ---------------------------------------------------------------------------
def stream_argv(prompt: str, model: str) -> list[str]:
"""Argv for a STREAMING conversational turn (token deltas via stream-json).
Stateless: the full conversation is in `prompt` (no --session-id/--resume).
`--include-partial-messages` makes the CLI emit `content_block_delta` token
events; `--verbose` is required by the CLI for stream-json under --print. No
--dangerously-skip-permissions: the conversational agent has no tools.
"""
return [
"claude", "-p",
"--agent", CONVERSATIONAL_AGENT,
"--model", model,
"--output-format", "stream-json",
"--include-partial-messages",
"--verbose",
*_LEAN_CONTEXT_FLAGS,
prompt,
]
def delta_text(line: str) -> str | None:
"""Extract the incremental assistant text from one stream-json line.
Returns the text of a `content_block_delta` / `text_delta` event, or None
for any other event (system, message_start, content_block_stop, result) or
an unparseable line.
"""
line = line.strip()
if not line:
return None
try:
event = json.loads(line)
except json.JSONDecodeError:
return None
if not isinstance(event, dict) or event.get("type") != "stream_event":
return None
inner = event.get("event") or {}
if inner.get("type") != "content_block_delta":
return None
delta = inner.get("delta") or {}
if delta.get("type") == "text_delta":
return delta.get("text") or None
return None
def openai_chunk(
completion_id: str,
model: str,
created: int,
*,
role: str | None = None,
content: str | None = None,
finish_reason: str | None = None,
) -> str:
"""Format one OpenAI `chat.completion.chunk` as an SSE `data:` line.
ensure_ascii=False keeps Cyrillic (Bulgarian) intact on the wire.
"""
delta: dict[str, str] = {}
if role is not None:
delta["role"] = role
if content is not None:
delta["content"] = content
payload = {
"id": completion_id,
"object": "chat.completion.chunk",
"created": created,
"model": model,
"choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
}
return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"
def synthesise_chat_prompt(messages) -> str:
"""Flatten OpenAI chat messages into a dialogue prompt for the conversational
agent, KEEPING prior assistant turns.
Pipecat re-sends the full message history every call, so multi-turn context
is preserved here (statelessly) by replaying the dialogue. Each message is a
duck-typed object with `.role` and `.content`. System messages become a
preamble; user/assistant turns are rendered as a `User:`/`Assistant:`
dialogue ending on the latest user turn.
"""
system = [m.content for m in messages if m.role == "system" and m.content]
turns = []
for m in messages:
if m.role == "user" and m.content:
turns.append("User: " + m.content)
elif m.role == "assistant" and m.content:
turns.append("Assistant: " + m.content)
parts = []
if system:
parts.append("\n\n".join(system))
if turns:
parts.append("\n".join(turns))
return "\n\n".join(parts).strip()
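
To make the flattening concrete, here is a small worked example of the dialogue prompt that the rules in the docstring produce. It is a sketch: `flatten_messages` is a hypothetical stand-in that restates those rules so the block runs on its own, and the `SimpleNamespace` messages are only an assumption about what a duck-typed message with `.role` and `.content` could look like.

```python
from types import SimpleNamespace


def flatten_messages(messages) -> str:
    """Hypothetical stand-in restating the flattening rules described above."""
    # System messages become a preamble, joined by blank lines.
    system = [m.content for m in messages if m.role == "system" and m.content]
    # User/assistant turns become a labelled dialogue, one line per turn.
    turns = [
        ("User: " if m.role == "user" else "Assistant: ") + m.content
        for m in messages
        if m.role in ("user", "assistant") and m.content
    ]
    parts = []
    if system:
        parts.append("\n\n".join(system))
    if turns:
        parts.append("\n".join(turns))
    return "\n\n".join(parts).strip()


history = [
    SimpleNamespace(role="system", content="Reply briefly."),
    SimpleNamespace(role="user", content="Hi"),
    SimpleNamespace(role="assistant", content="Hello!"),
    SimpleNamespace(role="user", content="What time is it?"),
]
prompt = flatten_messages(history)
print(prompt)
```

The printed prompt starts with the system preamble, then a blank line, then the `User:`/`Assistant:` lines ending on the latest user turn, which is what lets a stateless CLI run still see the earlier exchange.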

@ -2,6 +2,8 @@ import asyncio
import hmac
import json
import os
import shutil
import tempfile
import time
import uuid
from contextlib import asynccontextmanager
@ -10,9 +12,11 @@ from subprocess import PIPE
from typing import Any, Literal
from fastapi import FastAPI, HTTPException, Header
from fastapi.responses import JSONResponse
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel, Field
from app import conversational
app = FastAPI(title="Claude Agent Service")
API_TOKEN = os.environ.get("API_BEARER_TOKEN", "")
@ -104,6 +108,15 @@ class ChatCompletionsRequest(BaseModel):
model_config = {"extra": "allow"}
class ConversationalRequest(BaseModel):
# The portal-assistant gateway owns the conversation; it hands us a stable
# session id (for Claude --resume) plus the next user message. Model is
# selectable per request, same as the OpenAI-compat path.
session_id: str
message: str
model: str | None = None
def verify_token(authorization: str | None):
# Reject everything when the service is unconfigured. compare_digest("", "")
# returns True, so without this guard an empty API_TOKEN would happily
@ -435,9 +448,6 @@ async def chat_completions(
):
verify_token(authorization)
if request.stream:
raise HTTPException(status_code=400, detail="streaming not supported")
model = request.model if request.model is not None else DEFAULT_MODEL
if model not in SUPPORTED_MODELS:
return JSONResponse(
@ -448,6 +458,64 @@ async def chat_completions(
},
)
# Streaming path (the realtime voice agent / Pipecat). Token-level deltas via
# the conversational (no-tools) agent in stream-json mode, relayed as
# OpenAI chat.completion.chunk SSE. Stateless: the full history is in the
# prompt (the client re-sends it each turn). No workspace clone — the
# conversational agent reads no files.
if request.stream:
if not _reserve_queue_slot():
return JSONResponse(
status_code=503,
content={"error": "execution failed", "detail": "queue full"},
)
prompt = conversational.synthesise_chat_prompt(request.messages)
completion_id = "chatcmpl-" + uuid.uuid4().hex[:24]
created = int(time.time())
spawn = asyncio.create_subprocess_exec # bound alias (keeps subprocess use tidy)
async def event_stream():
workspace = tempfile.mkdtemp(prefix="conv-stream-")
proc = None
try:
async with _execution_slot():
proc = await spawn(
*conversational.stream_argv(prompt, model),
cwd=workspace, stdout=PIPE, stderr=PIPE,
)
assert proc.stdout is not None
yield conversational.openai_chunk(
completion_id, model, created, role="assistant"
)
try:
async with asyncio.timeout(
conversational.CONVERSATIONAL_TIMEOUT_SECONDS
):
async for raw in proc.stdout:
text = conversational.delta_text(
raw.decode(errors="replace")
)
if text:
yield conversational.openai_chunk(
completion_id, model, created, content=text
)
except asyncio.TimeoutError:
pass # wedged turn — close the stream cleanly
yield conversational.openai_chunk(
completion_id, model, created, finish_reason="stop"
)
yield "data: [DONE]\n\n"
finally:
if proc is not None and proc.returncode is None:
try:
proc.kill()
await proc.wait()
except ProcessLookupError:
pass
shutil.rmtree(workspace, ignore_errors=True)
return StreamingResponse(event_stream(), media_type="text/event-stream")
prompt = _synthesise_prompt(request.messages)
if not _reserve_queue_slot():
@ -510,3 +578,56 @@ async def chat_completions(
"total_tokens": 0,
},
}
@app.post("/v1/conversational")
async def conversational_turn(
request: ConversationalRequest,
authorization: str | None = Header(default=None),
):
"""Lean, multi-turn conversational Brain for the portal-assistant gateway.
Drives a no-tools conversational agent with per-conversation --resume: no
workspace clone, no tools (see portal-assistant ADR-0002). Returns the
assistant's reply text keyed to the caller's session id.
"""
verify_token(authorization)
model = request.model if request.model is not None else DEFAULT_MODEL
if model not in SUPPORTED_MODELS:
return JSONResponse(
status_code=400,
content={"error": "unsupported model", "supported": sorted(SUPPORTED_MODELS)},
)
if not _reserve_queue_slot():
return JSONResponse(
status_code=503,
content={"error": "execution failed", "detail": "queue full"},
)
try:
async with _execution_slot():
result = await asyncio.wait_for(
conversational.run_turn(request.session_id, request.message, model),
timeout=conversational.CONVERSATIONAL_TIMEOUT_SECONDS,
)
except asyncio.TimeoutError:
return JSONResponse(
status_code=503,
content={"error": "execution failed", "detail": "agent timed out"},
)
except Exception as exc: # noqa: BLE001
return JSONResponse(
status_code=503,
content={"error": "execution failed", "detail": _one_line(str(exc))},
)
if result["exit_code"] != 0:
detail = _one_line(result.get("stderr") or "") or f"exit {result['exit_code']}"
return JSONResponse(
status_code=503,
content={"error": "execution failed", "detail": detail},
)
return {"session_id": request.session_id, "reply": result["reply"]}

@ -0,0 +1,259 @@
# AFK implementation pipeline — design
**Date:** 2026-06-14
**Status:** proposed — pilot pending (see "Pilot" below; no code yet)
**Scope:** A new autonomous path that turns a triaged `ready-for-agent` issue
into tested, deployed code with no human at the keyboard. `claude-agent-service`
becomes the **control plane**; a dedicated in-cluster **T3 Code** instance
becomes the **executor + cockpit**. Touches: `claude-agent-service` (new poller
+ dispatch + watcher), a new T3 stack in `infra/`, a shared SSD-NFS volume, and
the per-repo issue trackers.
> Provenance: this design is the output of a long grilling session
> (2026-06-14). It records the decisions *and* the alternatives that were
> considered and dropped, so the reasoning survives. The three hardest-to-reverse
> calls are split into ADRs 0002–0004.
## Problem
Today the development flow is **grill-with-docs → to-prd → to-issues → triage →
implement**, and *every* stage is human-in-the-loop (HITL), including
implementation. The owner wants the HITL boundary to stop at **design + spec**:
once an issue is triaged `ready-for-agent`, an agent should pick it up and
implement it **AFK** (away from keyboard) — write it test-first, push it, and
see it through to a healthy deploy — escalating to a human only when it genuinely
can't proceed.
Two gaps block this today:
- The only existing issue→agent automation is the **infra `issue-responder`**,
which fires on `user-report`/`feature-request` labels on the `infra` repo
only — not on `ready-for-agent`, not on the other sub-project repos that the
general design flow produces.
- `claude-agent-service` only ever clones `infra`, runs one-shot fire-and-forget
`claude -p` jobs (no session, no live stream, no attach), and has no
multi-repo checkout. The owner wants to *watch and steer* in-flight work, which
the batch model can't offer.
## Goal
- HITL covers design + spec only. Publishing `ready-for-agent` issues is the
release signal (the `to-issues` quiz is the review gate).
- An autonomous loop picks up unblocked `ready-for-agent` issues from
**enrolled** repos, implements them test-first, and lands them — pushing
straight to `master` so CI deploys them (see ADR 0002 for the risk posture).
- The owner can **see all in-flight workers and converse with any of them** from
one UI — the T3 cockpit (see ADR 0003).
- Reuse before building: lean on the existing CI/CD chain, the design skills, T3
Code's multi-agent cockpit, and the persistence/worktree machinery — rather
than hand-building a session console and a bespoke runtime.
## Design
### Roles: control plane vs executor + cockpit
| Concern | Owner |
|---|---|
| When to start, which issue, the prompt, the safety envelope | **claude-agent-service** (control plane) — poller + watcher |
| Running the agent (Claude Agent SDK), the worktree, the fleet UI | **T3 Code** (executor + cockpit) — one dedicated in-cluster instance |
| Build → image → deploy → rollout | existing CI/CD (GHA → ghcr → Woodpecker → Keel) |
| Issue queue + state | the per-repo GitHub issue trackers |
The pivotal constraint that forces this split: **T3 can only display sessions it
launched itself** — it has no command to adopt an externally-started session. So
"viewable in T3" ⟺ "launched by T3". To keep `claude-agent-service` in charge
*and* get the fleet view, the control plane **dispatches into T3** rather than
running `claude` itself. See ADR 0003.
### End-to-end flow
```
HUMAN (interactive session)
  /grill-with-docs → /to-prd → /to-issues → /triage
    └ produces ready-for-agent issues (dependency-ordered), labeled by a
      trusted collaborator. Publishing them = the release signal.
══════════════════════ HANDOFF ══════════════════════
CONTROL PLANE (claude-agent-service, in-cluster)
  poller CronJob (every few min):
    for repo in allowlist:
      skip repo if it already has an agent-in-progress issue (per-repo lock)
      pick highest-priority ready-for-agent issue where:
        • all "Blocked by" closed • labeled by a trusted collaborator
      → stamp agent-in-progress
      → POST /api/orchestration/dispatch (thread.turn.start + bootstrap:
        create thread, prepare worktree, run setup, deliver the prompt)
EXECUTOR + COCKPIT (dedicated T3 instance, in-cluster)
  runs the issue-implementer agent (our prompt) in the worktree:
    read issue + AGENT-BRIEF + repo CONTEXT.md/ADRs → TDD red-green-refactor
    → commit (paraphrase issue, "Closes #N", AFK trailer) → push master
  watcher (control plane) polls GET /api/orchestration/snapshot + CI:
    ├─ healthy ──────► comment + close issue, drop lock, notify ✅
    ├─ pre-push block ► do NOT push, relabel ready-for-human, escalate
    └─ post-push red ► fix-forward (≤5 attempts / 60 min)
        ├─ recovers ► healthy
        └─ exhausts ► FREEZE broken (preserve forensics),
                      relabel ready-for-human, hard page
```
### Trigger & dispatch predicate
A poller CronJob (mirrors the existing `beads-dispatcher` pattern; stays
in-cluster because neither the service nor T3 has public ingress). It dispatches
issue *I* in repo *R* iff **all** hold:
- `R` is in the **allowlist** ConfigMap, and the **kill switch** is off;
- `I` has label `ready-for-agent`, applied by a **trusted collaborator** (the
trust gate — on private repos only collaborators can label, so the label *is*
the authorization; external/bot issues never auto-run);
- every issue in `I`'s "Blocked by" is closed;
- `R` has no issue currently labeled `agent-in-progress` (the per-repo lock).
On dispatch it stamps `agent-in-progress`; on any terminal outcome it removes it.
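The predicate above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the poller's actual code: the `Issue` dataclass, its field names, and `should_dispatch` are hypothetical, and resolving who applied a label or whether a blocker is closed is assumed to happen upstream of this check.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical, pre-resolved view of one GitHub issue."""
    repo: str
    number: int
    labels: set = field(default_factory=set)
    ready_label_by_trusted: bool = False  # was ready-for-agent applied by a collaborator?
    open_blockers: int = 0                # "Blocked by" issues that are still open


def should_dispatch(issue, repo_issues, allowlist, kill_switch_on):
    """Return True only if every condition of the dispatch predicate holds."""
    if kill_switch_on or issue.repo not in allowlist:
        return False
    if "ready-for-agent" not in issue.labels or not issue.ready_label_by_trusted:
        return False
    if issue.open_blockers > 0:
        return False
    # Per-repo lock: no issue in the same repo may be labeled agent-in-progress.
    return not any("agent-in-progress" in other.labels for other in repo_issues)
```

Keeping the predicate free of I/O would let the poller be tested without a GitHub client; the label stamping stays a separate step.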
### Concurrency & locking
**Parallel across repos, serial within a repo.** Multiple repos progress at
once; at most one agent per repo (two agents in one repo would collide on the
working tree). Enforced by the `agent-in-progress` label as a per-repo lock.
One agent per repo is a starting value; raise it later.
### Merge & failure posture — see ADR 0002
- **Always push to master** (no PR gate). Tests-green is the merge gate; CI +
rollback are the safety net, matching the human allow-then-audit model.
- **Pre-push** failure (can't get green / blocked / would need a disallowed op):
do *not* push; relabel `ready-for-human`; comment what was tried; page.
- **Post-push** failure (CI build or rollout red): **fix-forward** up to **5
attempts or 60 minutes**, then if still red **freeze in the broken state**
(preserve forensics — do not auto-revert), relabel `ready-for-human`, hard
page. The owner explicitly chose debuggability over availability here.
- **Budget:** `max_budget_usd = 100` per issue (time/attempt caps usually bite
first).
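The post-push rule (fix forward until 5 attempts or 60 minutes, then freeze) could be expressed as a small decision function. The sketch below is illustrative only: `FixForwardState`, `next_action`, and the returned strings are hypothetical names, and treating the caps as inclusive boundaries is an assumption the design does not spell out.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 5            # fix-forward attempt cap from the design
MAX_WINDOW_SECONDS = 3600   # 60-minute cap from the design


@dataclass
class FixForwardState:
    first_red_at: float  # epoch seconds when CI or rollout first went red
    attempts: int = 0    # corrective commits pushed so far


def next_action(state, ci_green, now):
    """Decide the watcher's next step after a post-push CI result."""
    if ci_green:
        return "healthy"
    out_of_attempts = state.attempts >= MAX_ATTEMPTS
    out_of_time = now - state.first_red_at >= MAX_WINDOW_SECONDS
    if out_of_attempts or out_of_time:
        return "freeze"       # leave the broken state, relabel, hard page
    return "fix-forward"      # push another corrective commit
```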
### Build/test environment & worktrees — see ADR 0004
The agent must run the target repo's test suite (TDD gate) before pushing.
Therefore:
- **Local toolchains scoped to the allowlist** — the executor image carries only
the *enrolled* repos' runtimes; the toolchain set grows in lockstep with the
allowlist.
- **Persistent per-repo checkout + `git worktree` per issue** on a shared
**SSD-NFS** volume, so git objects, installed deps, and package-manager caches
stay warm across jobs. This **supersedes** the throwaway `git clone --local`
model from `2026-06-02-parallel-execution-design.md`; that rejection was
correct for *concurrent* same-repo jobs, but the serial-within-repo choice
here removes the `.git` contention it guarded against (ADR 0004). It pays off
precisely because `to-issues` clusters many slices in one repo, processed
serially — slice N reuses the warm checkout slice 1 paid for.
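As a rough sketch of the persistent-checkout model, the per-issue setup reduces to a fetch followed by `git worktree add`. The mount point, directory layout, branch naming, and the `worktree_commands` helper are all assumptions made for illustration; the function only builds argument lists and does not run git.

```python
from pathlib import PurePosixPath

VOLUME_ROOT = PurePosixPath("/workspaces")  # hypothetical SSD-NFS mount point


def worktree_commands(repo, issue_number, base="origin/master"):
    """Build the git invocations that prepare one issue's worktree."""
    checkout = VOLUME_ROOT / "checkouts" / repo
    worktree = VOLUME_ROOT / "worktrees" / repo / f"issue-{issue_number}"
    branch = f"afk/issue-{issue_number}"  # hypothetical local branch name
    return [
        # Refresh the warm checkout so the worktree starts from current master.
        ["git", "-C", str(checkout), "fetch", "origin"],
        # Create the worktree on a new branch based on the fetched tip.
        ["git", "-C", str(checkout), "worktree", "add", "-b", branch, str(worktree), base],
    ]
```

A separate local branch is assumed here because git will not check out the same branch in two worktrees at once; how that branch is pushed to `master` is left open.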
### T3 integration: thin dispatch — see ADR 0003
The control plane holds a capability-scoped **`orchestration:operate`** bearer
token (minted via `t3 auth`, stored in Vault, refreshed for the 1-hour expiry)
and calls T3's HTTP API:
- `POST /api/orchestration/dispatch` → `thread.turn.start` with a `bootstrap`
that creates the thread, prepares the worktree, optionally runs a setup
script, and delivers the prompt — one call spawns a worktree-isolated worker.
- `GET /api/orchestration/snapshot` → the full fleet read-model (per-thread
`running`/`idle`/`error`, `hasPendingUserInput`, `hasPendingApprovals`,
`branch`, `worktreePath`). T3 has **no outbound webhooks**, so the watcher
**polls** this to drive CI-watch, freeze, and label transitions.
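Because the watcher has only the polled snapshot to go on, its core is a mapping from per-thread state to a decision. The sketch below assumes a snapshot shaped as `{"threads": [{"id": ..., "status": ...}]}`; that shape, the `id` and `status` keys, and the decision strings are hypothetical, and only the status values and the two `hasPending*` flags come from the description above.

```python
def classify_thread(thread):
    """Map one snapshot thread entry to a watcher decision."""
    if thread.get("hasPendingUserInput") or thread.get("hasPendingApprovals"):
        return "needs-human"   # an unattended worker is stalled on a prompt
    status = thread.get("status")
    if status == "running":
        return "wait"
    if status == "error":
        return "escalate"
    if status == "idle":
        return "check-ci"      # turn finished; CI decides healthy vs red
    return "unknown"


def classify_snapshot(snapshot):
    """Return {thread id: decision} for every thread in the snapshot."""
    return {t["id"]: classify_thread(t) for t in snapshot.get("threads", [])}
```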
The AFK *behavior and safety* (issue-implementer prompt, guardrails, always-push,
fix-forward/freeze, issue integration) live in **our** thin layer, so T3 is a
**swappable, version-pinned backend** — never Keel-auto-upgraded, reversible to a
self-hosted runtime if it goes sideways.
### Observability & interaction
The "active sessions layer" and the "attach and converse" surface **converge
into one screen — the T3 cockpit**: a live list of all worker threads grouped by
project; click one to stream its transcript and send it a turn. This dissolves
the earlier intermediate ideas of a generalized-breakglass console and a
raw-tmux hybrid attach — T3 provides converse / approve / resume natively
(`thread.user-input.respond`, `thread.approval.respond`).
Cross-system, durable signals the control plane still emits:
- **Phase-checklist comment** on the issue, edited in place as phases complete
(worktree → tests-red → green → pushed → CI → deployed). Durable, low-noise,
lives on the issue, doubles as audit trail.
- **Loki** logs labeled `{repo, issue}` for deep-dive.
- **Presence** claim per running session (`repo:<name>`, purpose `AFK #N`),
heartbeated — so AFK work shows up next to human sessions in the layer the
prompt hook already injects.
- **Doorbell**: Slack / ntfy ping on terminal states, deep-linking into the T3
thread. Notify, not control — the dedicated-Slack-control-plane idea is
dropped in favour of the T3 cockpit.
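The phase-checklist comment can be rendered from the set of completed phases and re-posted as an edit each time one finishes. This is a minimal sketch: `PHASES` mirrors the phase order listed above, while `render_checklist` and the Markdown task-list layout are assumptions about how the comment might look.

```python
PHASES = ["worktree", "tests-red", "green", "pushed", "CI", "deployed"]


def render_checklist(completed):
    """Render the phases as a Markdown task list for the issue comment."""
    done = set(completed)
    unknown = done - set(PHASES)
    if unknown:
        raise ValueError(f"unknown phases: {sorted(unknown)}")
    # One line per phase, in pipeline order, ticked if completed.
    return "\n".join(f"- [{'x' if p in done else ' '}] {p}" for p in PHASES)
```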
### Safety envelope
- **Trust gate** — only collaborator-labeled `ready-for-agent` issues run.
- **Allowlist** — a repo is untouchable until enrolled (prereqs: tests + GHA CI
+ `CONTEXT.md`). Start with 1–2 repos; expand deliberately.
- **Kill switch** — one ConfigMap flag pauses all pickup (the Keel
scale-to-0 reflex, built in from day one).
- **Per-repo lock** — ≤1 agent per repo.
- **Guardrails** (reused from `issue-responder`) — no PVC/PV deletes, no direct
Vault edits, no force-push to master, infra changes Terraform-only, never
`[ci skip]`.
- **Identity & audit** — shared service identity; each commit body paraphrases
the issue and carries `Closes #N` + an AFK-agent trailer, so the commit
message stays the audit trail.
## Parameters (chosen starting values — all tunable)
| Knob | Value |
|---|---|
| Merge gate | always push to master |
| Post-push failure | fix-forward, then freeze-broken |
| Fix-forward cap | 5 attempts **or** 60 minutes |
| Per-issue budget | `max_budget_usd = 100` |
| Concurrency | parallel across repos, serial within a repo |
| Repo scope | opt-in allowlist, start small |
| Progress detail | phase-checklist on issue + Loki logs |
| Alert channel | Slack (+ ntfy), as a doorbell into T3 |
| Executor | dedicated in-cluster T3 (thin dispatch), version-pinned |
## Pilot — validate before wiring the poller
The thin model rests on five unknowns. Stand up the dedicated T3 instance and
drive a couple of allowlist-repo issues **by hand** via the dispatch API to
confirm each, *before* building the poller and committing the architecture:
1. **Per-thread custom agent + skip-permissions** — can a dispatched thread
carry *our* `issue-implementer` system prompt and run unattended without
stalling on T3's approval gating? *(biggest unknown)*
2. **Dispatch auth** — mint `orchestration:operate`, store in Vault, refresh the
1-hour token.
3. **Status/completion** — drive CI-watch/freeze/labels purely from polling
`GET /api/orchestration/snapshot`.
4. **Worktree reconciliation** — T3's native `prepareWorktree` vs our
persistent-checkout-with-warm-caches; pick one or make them cooperate on the
volume.
5. **The in-cluster T3 pod** — headless `t3 serve --no-browser`, version-pinned
and **Keel-excluded**, internal ingress + Authentik, with tokens / toolchains
/ SSD volume / `claude auth` provisioned.
## Relationship to prior decisions
- **Supersedes** the worktree rejection in
`2026-06-02-parallel-execution-design.md` (contextualized, not contradicted —
ADR 0004).
- **Drops** two intermediate ideas explored and rejected this session:
evolving `claude-agent-service` into its own session/tmux/worktree runtime,
and building a bespoke breakglass-generalized console — both replaced by T3.
- **Reuses** the `issue-responder` guardrails, the CI/CD chain, the
`beads-dispatcher` CronJob pattern, presence, Loki, and the design skills.
## Out of scope / open questions
- Raw-terminal "take-over" of a worker (T3 is a GUI cockpit, not a terminal); if
ever needed, that's a separate add-on.
- Multi-tenant T3 (it is single-operator by design — fine, it matches the shared
service identity).
- Cross-repo dependency orchestration beyond per-issue "Blocked by".
- T3 Code is pre-1.0 (~v0.0.x) and churny; the version-pin + Keel-exclude +
swappable-backend discipline (ADR 0003) is the mitigation.

@ -0,0 +1,69 @@
# AFK agents push straight to master; failures fix-forward then freeze, not revert
The AFK implementation pipeline (see
`docs/2026-06-14-afk-implementation-pipeline-design.md`) lets an autonomous
agent land code with no human at the keyboard. The owner deliberately chose the
most hands-off posture: **AFK-written code pushes straight to `master`** (which
then deploys via the existing CI/CD chain) with **no pull-request review gate**,
and when a deploy breaks, the agent **fixes forward and then freezes the broken
state** rather than auto-reverting. This ADR records that risk posture and why it
was chosen over the safer alternatives, because it is surprising and not cheap to
walk back once callers and habits depend on it.
## Status
accepted (2026-06-14) — posture decided; enforced once the pipeline ships
(pilot-gated).
## Context
`master` on every enrolled repo deploys continuously (GHA build → ghcr →
Woodpecker → Keel). So "where AFK code lands" is really "what reaches a live
deploy without a human looking". The owner weighed three merge gates and three
post-push failure responses and picked the autonomy-maximizing end of both,
accepting the blast radius explicitly.
## Considered options — merge gate
- **Always push to master (chosen).** Tests-green is the gate; CI + rollback are
the safety net. Matches the existing human allow-then-audit model (non-admins
already push straight to master). Most hands-off.
- **Adaptive (push if confident, else PR)** — rejected as the *default* though it
is what `issue-responder` does; the owner wanted full hands-off, not a
confidence-gated PR for otherwise-working code.
- **Always open a PR** — rejected: reintroduces a human merge step on every
issue, i.e. "AFK implementation, human merge" — not the goal.
## Considered options — post-push failure (CI/rollout goes red after a green push)
- **Fix-forward then freeze (chosen).** Iterate with corrective commits up to
**5 attempts or 60 minutes**; if still red, **leave the broken state in place**
(do not revert), relabel the issue `ready-for-human`, and hard-page. Same
forensics-first instinct as the breakglass (ADR 0001): preserve the exact
failing state for debugging rather than auto-cleaning it away.
- **Auto-revert + escalate** — rejected (was the recommendation): restores green
fastest, but destroys the forensic state the owner wants to inspect.
- **Alert and freeze immediately (no fix-forward)** — rejected: gives up on
transient/env-drift failures a corrective commit would clear.
Pre-push failure (can't reach green, blocked, or would need a disallowed op) is
not a dilemma: the agent does **not** push, relabels `ready-for-human`, comments
what it tried, and pages.
## Consequences
- An unreviewed logic error can deploy before any human sees it; rollback (not
review) is the safety net. Bounded by: tests-as-gate, the start-small
allowlist, the per-repo lock, and the kill switch.
- A frozen-broken deploy can sit unhealthy until the owner answers the page —
availability is traded for debuggability, by explicit choice. Acceptable
because enrolled repos are non-critical by the allowlist prerequisite, and the
owner is paged hard (Slack + ntfy).
- Fix-forward can stack up to 5 commits on a bad change before freezing; the
60-minute cap bounds the churn window.
- Per-issue spend is capped at `max_budget_usd = 100`.
- Guardrails still hold underneath this posture: no PVC/PV deletes, no direct
Vault edits, no force-push, infra changes Terraform-only, never `[ci skip]`.
- Reversible: tightening to adaptive/PR or to auto-revert is a config + watcher
change, not a re-architecture — but callers/habits will have formed around
"it just lands", so flag loudly if reversing.

@ -0,0 +1,70 @@
# AFK workers run inside a dedicated T3 Code instance; claude-agent-service dispatches into it
The owner wants one UI to see and converse with every in-flight AFK worker, and
named **T3 Code** (the self-hosted multi-agent cockpit already running at
`t3.viktorbarzin.me`) as that UI. Research into T3's source
(`pingdotgg/t3code`, ~v0.0.27) found it is genuinely built for this — a fleet of
worker "threads" with a live read-model and a scoped HTTP dispatch API — **but**
it can only display sessions **it launched itself**; there is no command to adopt
a session another process started. So "viewable in T3" ⟺ "launched by T3". This
ADR records the resulting architecture: `claude-agent-service` stays the
**control plane** and **dispatches into a dedicated, in-cluster T3 instance**
which is the **executor + cockpit**. The agent runs inside T3; we keep the brain.
## Status
accepted (2026-06-14) — direction decided; **gated on a pilot** (the five
unknowns in the design doc) before the poller is wired and the architecture is
committed.
## Why T3, and why "thin"
T3 provides, out of the box, what we would otherwise hand-build: a three-panel
fleet cockpit (`projects → threads → conversation`), an
`OrchestrationReadModel` with per-thread live status, and
`POST /api/orchestration/dispatch` whose `thread.turn.start` + `bootstrap` can
**create a thread, prepare a git worktree, run a setup script, and deliver a
prompt in one call** — exactly the worker-spawn primitive. Converse / approve /
resume are native (`thread.user-input.respond`, `thread.approval.respond`). For
Claude it embeds `@anthropic-ai/claude-agent-sdk`.
"Thin" = the AFK *behavior and safety* (the `issue-implementer` prompt,
guardrails, always-push, fix-forward/freeze, CI-watch, issue integration) live
in **our** layer (the poller + watcher), not in T3. T3 is a **swappable backend**
we drive over its API.
## Considered options
- **Thin: claude-agent-service dispatches into T3 (chosen).** Control plane calls
T3's dispatch API; T3 runs the agent in a worktree and shows it. Get the fleet
view, keep the brain, least to build. Cost: execution moves into the T3 pod, so
T3's runtime is in the *hot path* (not just the window).
- **claude-agent-service runs the agent, T3 only displays it** — rejected because
it is impossible: T3 cannot adopt an externally-started session
(`thread.session.set` is server-internal; no external-session-id field). This
is the constraint that shaped the whole decision.
- **Deep: claude-agent-service as a custom T3 provider (ACP-style)** — rejected
for now: keeps the runtime ours with a T3 UI, but means building and
maintaining a provider against a pre-1.0, internal, no-contributions interface
— effectively a fork. Revisit only if "thin" proves too limiting.
- **Skip T3; build our own console** (generalized breakglass + tmux) — rejected:
most stable and fully in-house, but abandons the owner's explicit "see workers
in T3" goal and means owning a session console forever.
## Consequences
- A **dedicated in-cluster T3 instance** (a pod, consistent with the earlier
in-cluster-over-devvm substrate choice) is the worker host, separate from the
per-user devvm T3 instances. It needs the SSD worktree volume, git/Anthropic
tokens, toolchains, `claude auth`, and an internal Authentik-gated ingress.
- T3's runtime is now in the **execution hot path** — its maturity affects
whether work *runs*, not only whether it can be *seen*. Mitigations: **pin the
version and exclude it from Keel** (its churn + hard-cutover auth migrations
make auto-upgrade a Keel-class hazard), keep the integration thin and the
backend swappable, and **pilot** the five unknowns first.
- T3 is **single-operator** — fine here: it matches the already-accepted shared
service identity for AFK work.
- No outbound webhooks from T3 → the watcher **polls**
`GET /api/orchestration/snapshot`.
- This supersedes the intermediate ideas of evolving `claude-agent-service` into
its own session/tmux/worktree runtime and building a bespoke attach console.

@ -0,0 +1,68 @@
# Implementation agents use persistent per-repo checkouts + git worktrees, reversing the throwaway-clone rule for this path
`2026-06-02-parallel-execution-design.md` deliberately **rejected git worktrees**
and chose throwaway `git clone --local` per job, "because worktrees share one
`.git` → agents that `git commit`/`pull` still contend — not truly independent".
The AFK implementation pipeline
(`docs/2026-06-14-afk-implementation-pipeline-design.md`) **reverses that for its
own path**: each enrolled repo gets a **persistent checkout**, and each issue
runs in a **`git worktree`** off it, on a shared **SSD-NFS** volume. This ADR
records why the earlier rejection does not apply here — so the two decisions
read as complementary, not contradictory.
## Status
accepted (2026-06-14) — for the AFK implementation path only; the existing
job-runner (recruiter-triage, nextcloud-todos, etc.) keeps throwaway clones.
## Why the 2026-06-02 rejection doesn't bind this path
The rejection's premise was **concurrent jobs in the same checkout** contending
on `.git/index.lock` and racing `git pull`. The AFK pipeline's concurrency model
is **serial within a repo, parallel only across repos** (ADR-adjacent decision in
the design doc): at most one agent ever touches a given repo's `.git` at a time,
and different repos are different checkouts. The contention the rejection guarded
against cannot occur here. With that removed, worktrees become the *better*
choice because they unlock cache reuse the throwaway model can't.
## Considered options
- **Persistent checkout + worktree per issue, on SSD-NFS (chosen).** Warm git
objects, **persisted `node_modules`/venv/build caches**, and shared
package-manager caches survive across jobs, so the TDD loop stops reinstalling
deps every run. Compounds with `to-issues` clustering many slices in one repo,
processed serially — slice N reuses slice 1's warm tree.
- **Throwaway `git clone --local` per job (status quo elsewhere)** — rejected for
this path: correct for the concurrent job-runner, but re-pays dependency
install on every issue, which dominates wall-clock for an
implement-test-fix-forward loop.
- **`cp -a` of a warm tree** — rejected (same reason as 2026-06-02): copies
accumulated caches → disk blowup, and no git isolation.
## Considered options — storage
- **SSD-NFS (chosen).** The current `/persistent` PVC is `5Gi` **HDD NFS**
(`nfs-truenas` → `/srv/nfs`) and unused; git checkouts + `node_modules` are
death-by-small-files on HDD NFS and 5Gi is too small. Provision an SSD-backed
NFS class over `/srv/nfs-ssd` (other apps already use that path) at a realistic
size (tens of GB).
- **HDD NFS / `/persistent` as-is** — rejected: too slow for many small files,
too small.
- **Local block (proxmox-lvm)** — rejected: faster but HDD and node-pinned (RWO),
lost on reschedule; NFS RWX survives and the volume also holds session state.
## Consequences
- One **SSD-NFS volume** holds, per enrolled repo: the persistent checkout, the
warm dep/package caches, and (under ADR 0003) the worktrees T3 prepares. Cache
env (`pip`, `GOMODCACHE`/`GOCACHE`, `PNPM_HOME`/npm, cargo) must be wired to it
— today caching is off (`pip --no-cache-dir`, no cache envs set).
- Housekeeping the throwaway model didn't need: `git fetch` before each
`worktree add`, periodic `git worktree prune` + `git gc`, and cache eviction if
the volume fills.
- **`infra` stays on its own path** — it is git-crypt, and editing encrypted
files from a worktree is disallowed; the persistent-worktree model is for the
non-`infra` app repos in the allowlist.
- Open reconciliation (pilot): whether T3's native `prepareWorktree` writes into
this volume + our persistent checkouts, or we manage the checkout and point T3
at it. Resolve before committing the architecture.

@ -2,9 +2,28 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!-- viewport-fit=cover so the app paints edge-to-edge and we can honour the
notch/home-indicator via env(safe-area-inset-*). maximum-scale + no
user-scaling keeps the cockpit layout stable under stress on mobile. -->
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, viewport-fit=cover, maximum-scale=1.0"
/>
<meta name="color-scheme" content="dark" />
<meta name="robots" content="noindex, nofollow" />
<!-- PWA / installable. theme-color tints the mobile status bar to the dark
theme; black-translucent lets the app draw under the iOS status bar. -->
<meta name="theme-color" content="#06080b" />
<link rel="manifest" href="./manifest.webmanifest" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="apple-mobile-web-app-title" content="breakglass" />
<link rel="apple-touch-icon" href="./apple-touch-icon.png" />
<link rel="icon" type="image/svg+xml" href="./icon.svg" />
<link rel="icon" type="image/png" sizes="192x192" href="./icon-192.png" />
<title>devvm breakglass</title>
</head>
<body>

frontend/public/icon.svg

@ -0,0 +1,64 @@
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="devvm breakglass">
<defs>
<!-- layered near-black surface, matching the app theme -->
<radialGradient id="bg" cx="68%" cy="22%" r="92%">
<stop offset="0%" stop-color="#12303a"/>
<stop offset="42%" stop-color="#0b0f14"/>
<stop offset="100%" stop-color="#06080b"/>
</radialGradient>
<linearGradient id="steel" x1="0" y1="0" x2="1" y2="1">
<stop offset="0%" stop-color="#7df0f3"/>
<stop offset="55%" stop-color="#3dd1d6"/>
<stop offset="100%" stop-color="#1f6f72"/>
</linearGradient>
<filter id="glow" x="-40%" y="-40%" width="180%" height="180%">
<feGaussianBlur stdDeviation="7" result="b"/>
<feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge>
</filter>
</defs>
<!-- rounded-square field (safe for maskable: art kept within central ~80%) -->
<rect width="512" height="512" rx="112" fill="url(#bg)"/>
<rect x="6" y="6" width="500" height="500" rx="108" fill="none" stroke="#1c2530" stroke-width="3"/>
<!-- faint scanline texture -->
<g opacity="0.05" stroke="#ffffff" stroke-width="2">
<line x1="0" y1="148" x2="512" y2="148"/>
<line x1="0" y1="220" x2="512" y2="220"/>
<line x1="0" y1="292" x2="512" y2="292"/>
<line x1="0" y1="364" x2="512" y2="364"/>
</g>
<!-- fracture burst (amber): the "break the glass" radiating cracks -->
<g stroke="#f5b657" stroke-width="9" stroke-linecap="round" stroke-linejoin="round"
fill="none" opacity="0.92" filter="url(#glow)">
<path d="M256 256 L142 132"/>
<path d="M256 256 L120 250"/>
<path d="M256 256 L150 372"/>
<path d="M256 256 L372 380"/>
<path d="M256 256 L392 246"/>
<path d="M256 256 L360 138"/>
<!-- cross-cracks -->
<path d="M186 196 L150 250"/>
<path d="M210 320 L172 318" opacity="0.7"/>
<path d="M326 318 L356 350" opacity="0.7"/>
</g>
<!-- wrench, struck across the burst (cyan steel) -->
<g filter="url(#glow)">
<path fill="url(#steel)" stroke="#0e3133" stroke-width="6" stroke-linejoin="round"
d="M344 150
a62 62 0 0 0 -82 76
L150 338
a26 26 0 0 0 0 37
l11 11
a26 26 0 0 0 37 0
l112 -112
a62 62 0 0 0 76 -82
l-41 41
l-40 -11
l-11 -40
z"/>
<!-- handle highlight -->
<path d="M171 350 l128 -128" stroke="#bdf6f8" stroke-width="7" stroke-linecap="round" opacity="0.6"/>
</g>
</svg>

@ -0,0 +1,31 @@
{
"name": "devvm breakglass",
"short_name": "breakglass",
"description": "Emergency recovery console for the devvm — chat with a repair agent or power-cycle the VM directly.",
"start_url": "./",
"scope": "./",
"display": "standalone",
"orientation": "portrait",
"background_color": "#06080b",
"theme_color": "#06080b",
"icons": [
{
"src": "./icon.svg",
"type": "image/svg+xml",
"sizes": "any",
"purpose": "any maskable"
},
{
"src": "./icon-192.png",
"type": "image/png",
"sizes": "192x192",
"purpose": "any maskable"
},
{
"src": "./icon-512.png",
"type": "image/png",
"sizes": "512x512",
"purpose": "any maskable"
}
]
}

@ -1,100 +1,294 @@
<script>
import { onMount } from 'svelte';
import { openSession } from './lib/api.js';
import { onMount, onDestroy } from 'svelte';
import {
openSession,
attachStream,
sendPrompt,
cancelTurn,
loadSessionId,
saveSessionId,
clearSessionId,
} from './lib/api.js';
import { createTranscript, reduceEvent } from './lib/transcript.js';
import Chat from './Chat.svelte';
import VmControls from './VmControls.svelte';
// ── session lifecycle ────────────────────────────────────────────────────
// ── lifecycle state ───────────────────────────────────────────────────────
// link: connecting | attached | error (the EventSource to the session)
let link = $state('connecting');
let linkError = $state('');
let sessionId = $state('');
let sessionState = $state('connecting'); // connecting | ready | error
let sessionError = $state('');
let streaming = $state(false);
let caughtUp = $state(false); // replay drained → live tailing
let turnActive = $state(false); // a turn is running (Stop shown, Send off)
let sending = $state(false); // a prompt POST is in flight
// Mobile: the VM controls live in a slide-up sheet. Desktop: a side column
// (CSS hides the toggle and pins the sheet open as a column ≥900px).
// The transcript is folded with a plain mutable object; we bump `rev` to
// notify the view of in-place mutations (cheaper than cloning the whole
// message list on every streamed token). `tx` is $state too, so REASSIGNING
// it (reset / new session) also propagates to the Chat prop. $state.raw keeps
// the object un-proxied so the hot per-token path stays a plain mutation.
let tx = $state.raw(createTranscript());
let rev = $state(0);
let es = null; // the live EventSource
// Mobile: VM controls live in a slide-up sheet. Desktop (≥900px): a column.
let showControls = $state(false);
async function newSession() {
sessionState = 'connecting';
sessionError = '';
try {
sessionId = await openSession();
sessionState = 'ready';
} catch (err) {
sessionState = 'error';
sessionError = err instanceof Error ? err.message : String(err);
function resetTranscript() {
tx = createTranscript();
rev++;
}
function onEvent(ev) {
if (reduceEvent(tx, ev)) {
// turn liveness tracks the folder's view of the stream, so a turn started
// in ANOTHER tab (or before a reload) still flips us into "active".
turnActive = tx.activeUserSeen;
rev++;
}
}
onMount(newSession);
function onLiveSession(id) {
if (id) sessionId = id;
function closeStream() {
if (es) {
es.close();
es = null;
}
}
const shortId = $derived(sessionId ? sessionId.slice(0, 8) : '────────');
const dotState = $derived(
sessionState === 'error' ? 'error' : streaming ? 'busy' : sessionState === 'ready' ? 'ready' : 'idle'
function attach(id) {
closeStream();
sessionId = id;
caughtUp = false;
link = 'connecting';
linkError = '';
es = attachStream(id, {
onOpen: () => {
// a successful (re)connection clears any prior transient error
if (link !== 'attached') link = 'attached';
linkError = '';
},
onCaughtUp: () => {
caughtUp = true;
link = 'attached';
},
onEvent,
onError: () => {
// EventSource auto-reconnects on a transient drop (readyState
// CONNECTING). Only a terminal CLOSED state is a hard failure. The
// server keeps the turn running regardless, so we surface a soft note
// and let the browser retry.
if (es && es.readyState === EventSource.CLOSED) {
link = 'error';
linkError = 'lost the connection to the session — retrying…';
// a closed source won't retry itself; re-attach to the same id.
setTimeout(() => {
if (sessionId === id) attach(id);
}, 1500);
} else {
link = 'connecting';
}
},
});
}
async function bootstrap() {
link = 'connecting';
linkError = '';
resetTranscript();
const existing = loadSessionId();
if (existing) {
// Reuse the persisted id and attach. If it's gone (pod restart → 404 on
// the stream), the EventSource ends up CLOSED and attach()'s onError
// handler surfaces the link error and retries the same id.
attach(existing);
// No fresh session is minted here: a stale id is replaced when the next
// prompt comes back 'gone' (see submitPrompt) or via "New session".
return;
}
await createFresh();
}
async function createFresh() {
try {
link = 'connecting';
const id = await openSession();
saveSessionId(id);
attach(id);
} catch (err) {
link = 'error';
linkError = err instanceof Error ? err.message : String(err);
}
}
// "New session": archive the local id, mint a new one, re-attach.
async function newSession() {
if (turnActive || sending) return;
closeStream();
clearSessionId();
resetTranscript();
turnActive = false;
await createFresh();
}
// Send a prompt (typed or a preset). Output arrives via the attach stream.
async function submitPrompt(prompt) {
const text = (prompt || '').trim();
if (!text || turnActive || sending) return;
if (!sessionId) {
await createFresh();
if (!sessionId) return;
}
sending = true;
turnActive = true; // optimistic: the working indicator shows immediately
try {
const res = await sendPrompt({ session_id: sessionId, prompt: text });
if (res.status === 'busy') {
flash = 'A turn is already running.';
// turn really is active; keep the indicator, the stream will end it.
} else if (res.status === 'gone') {
// session evaporated (pod restart). Re-create and resend once.
clearSessionId();
await createFresh();
if (sessionId) await sendPrompt({ session_id: sessionId, prompt: text });
}
} catch (err) {
flash = err instanceof Error ? err.message : String(err);
turnActive = tx.activeUserSeen; // back off the optimistic flag on failure
} finally {
sending = false;
}
}
async function stopTurn() {
if (!sessionId) return;
try {
await cancelTurn(sessionId);
// turn_end / cancelled events arrive via the stream and flip turnActive.
} catch (err) {
flash = err instanceof Error ? err.message : String(err);
}
}
// a transient toast (409 / network blips), auto-cleared
let flash = $state('');
let flashTimer;
$effect(() => {
if (flash) {
clearTimeout(flashTimer);
flashTimer = setTimeout(() => (flash = ''), 4200);
}
});
onMount(bootstrap);
onDestroy(closeStream);
// ── header status lamp ──────────────────────────────────────────────────
// One quietly-living "system pulse": idle/connecting (cyan breathe),
// working (amber pulse), error (steady red — the ONLY non-power red, used
// sparingly for the lamp because connection loss IS the emergency here).
const lamp = $derived(
link === 'error'
? 'error'
: turnActive
? 'working'
: link === 'attached'
? 'live'
: 'connecting'
);
const lampLabel = $derived(
{
error: 'link down',
working: 'agent working',
live: 'attached',
connecting: 'connecting',
}[lamp]
);
const shortId = $derived(sessionId ? sessionId.slice(0, 8) : '········');
</script>
<div class="shell">
<header class="rail">
<header class="rail rise-in" style="--d:0ms">
<div class="rail-title">
<span class="glyph" aria-hidden="true">🔧</span>
<h1>devvm <span class="accent">breakglass</span></h1>
<span class="brand-mark" aria-hidden="true">
<!-- breakglass glyph: a wrench struck through a fracture line -->
<svg viewBox="0 0 24 24" width="22" height="22" fill="none" stroke="currentColor"
stroke-width="1.6" stroke-linecap="round" stroke-linejoin="round">
<path d="M15.5 5.5a3.6 3.6 0 0 0-4.7 4.4L4 16.7 7.3 20l6.8-6.8a3.6 3.6 0 0 0 4.4-4.7l-2.2 2.2-2.2-.6-.6-2.2 2-2.6Z" />
<path class="frac" d="M3 3l3.2 4.1L4.4 8.6 7 12" stroke-dasharray="2 2.4" />
</svg>
</span>
<h1>devvm<span class="accent"> breakglass</span></h1>
</div>
<div class="rail-right">
<span class="rail-status">
<span class="dot dot--{dotState}" aria-hidden="true"></span>
{#if sessionState === 'error'}
<span class="session-bad">offline</span>
{:else if sessionState === 'connecting'}
<span class="session-meta">connecting…</span>
{:else}
<code class="session-id" title={sessionId}>{shortId}</code>
{/if}
<span class="lamp-wrap" title={lampLabel}>
<span class="lamp lamp--{lamp}" aria-hidden="true"></span>
<span class="lamp-text lamp-text--{lamp}">
{#if lamp === 'error'}
link down
{:else if lamp === 'working'}
working
{:else if lamp === 'live'}
<code class="sid">{shortId}</code>
{:else}
connecting
{/if}
</span>
</span>
<!-- Mobile-only: open the VM control sheet. Hidden on desktop (column). -->
<button
class="controls-toggle"
class="rail-btn rail-btn--vm"
onclick={() => (showControls = true)}
aria-label="Open direct VM controls"
>
<span class="controls-toggle-label">VM</span>
<span class="bolt" aria-hidden="true"></span><span class="rail-btn-label">VM</span>
</button>
<button
class="new-session"
class="rail-btn"
onclick={newSession}
disabled={streaming || sessionState === 'connecting'}
title={streaming ? 'wait for the current turn to finish' : 'start a fresh session'}
disabled={turnActive || sending || link === 'connecting'}
title={turnActive ? 'wait for the current turn to finish' : 'archive this session and start fresh'}
>
New
</button>
</div>
</header>
{#if sessionState === 'error'}
<div class="rail-error" role="alert">
Can't reach the breakglass backend — {sessionError}. The cluster or network
may be down. The <strong>⚡ VM</strong> power controls still work without the chat.
{#if link === 'error'}
<div class="rail-note" role="alert">
<span>{linkError || "Can't reach the breakglass backend."}</span>
<span class="rail-note-aside">The <strong>⚡ VM</strong> power controls still work without the chat.</span>
<button class="rail-note-retry" onclick={bootstrap}>Reconnect</button>
</div>
{/if}
{#if flash}
<div class="toast" role="status">{flash}</div>
{/if}
<main class="stage">
<section class="chat-pane" aria-label="Recovery chat">
<section class="chat-pane rise-in" style="--d:80ms" aria-label="Recovery chat">
<Chat
{sessionId}
sessionReady={sessionState === 'ready'}
{onLiveSession}
onStreamingChange={(v) => (streaming = v)}
{tx}
{rev}
{caughtUp}
{turnActive}
sending={sending}
linkState={link}
onSubmit={submitPrompt}
onStop={stopTurn}
/>
</section>
<aside class="controls-pane" class:open={showControls} aria-label="Direct VM control">
<aside
class="controls-pane rise-in"
class:open={showControls}
style="--d:160ms"
aria-label="Direct VM control"
>
<div class="sheet-grip" aria-hidden="true"></div>
<div class="controls-head">
<span class="controls-head-title">Direct VM control</span>
@ -104,7 +298,6 @@
</aside>
</main>
<!-- backdrop behind the mobile sheet -->
<button
class="sheet-backdrop"
class:show={showControls}
@ -119,43 +312,51 @@
height: 100%;
display: flex;
flex-direction: column;
max-width: 1500px;
max-width: 1520px;
margin: 0 auto;
/* honour the notch on landscape / edge-to-edge */
padding-left: var(--safe-left);
padding-right: var(--safe-right);
}
/* ── status rail (compact, single row on mobile) ─────────────────────── */
/* ── status rail ───────────────────────────────────────────────────────── */
.rail {
display: flex;
align-items: center;
justify-content: space-between;
gap: 10px;
padding: 10px 14px;
padding: max(10px, var(--safe-top)) 14px 10px;
border-bottom: 1px solid var(--line);
background:
linear-gradient(180deg, rgba(61, 209, 214, 0.03), transparent 60%),
linear-gradient(180deg, rgba(255, 255, 255, 0.015), transparent);
flex: none;
}
.rail-title {
display: flex;
align-items: baseline;
gap: 9px;
align-items: center;
gap: 10px;
min-width: 0;
}
.glyph {
font-size: 17px;
transform: translateY(2px);
filter: saturate(0.85);
.brand-mark {
color: var(--cyan);
display: inline-flex;
filter: drop-shadow(0 0 10px rgba(61, 209, 214, 0.35));
flex: none;
}
.brand-mark .frac { color: var(--amber); stroke: var(--amber); opacity: 0.85; }
h1 {
margin: 0;
font-family: var(--mono);
font-size: 16px;
font-weight: 600;
letter-spacing: 0.02em;
letter-spacing: 0.04em;
color: var(--ink);
white-space: nowrap;
}
.accent {
color: var(--cyan);
text-shadow: 0 0 18px rgba(61, 209, 214, 0.35);
text-shadow: 0 0 18px rgba(61, 209, 214, 0.4);
}
.rail-right {
@ -164,90 +365,158 @@
gap: 8px;
flex: none;
}
.rail-status {
/* the living system-pulse lamp */
.lamp-wrap {
display: inline-flex;
align-items: center;
gap: 7px;
gap: 8px;
padding: 0 4px;
font-family: var(--mono);
font-size: 12px;
}
.session-id {
color: var(--cyan);
letter-spacing: 0.04em;
}
.session-meta {
color: var(--amber);
}
.session-bad {
color: var(--danger-bright);
}
.dot {
width: 9px;
height: 9px;
.lamp {
position: relative;
width: 10px;
height: 10px;
border-radius: 50%;
flex: none;
background: var(--ink-faint);
}
.dot--ready {
/* a soft halo ring that pulses outward — the "instrument is powered" tell */
.lamp::after {
content: '';
position: absolute;
inset: -4px;
border-radius: 50%;
border: 1px solid currentColor;
opacity: 0;
}
.lamp--live {
background: var(--cyan);
box-shadow: 0 0 10px 1px rgba(61, 209, 214, 0.6);
animation: breathe 3.4s ease-in-out infinite;
color: var(--cyan);
box-shadow: 0 0 10px 1px rgba(61, 209, 214, 0.65);
animation: lamp-breathe 3.6s ease-in-out infinite;
}
.dot--busy {
.lamp--live::after { animation: lamp-ring 3.6s ease-out infinite; }
.lamp--connecting {
background: var(--cyan-dim);
color: var(--cyan);
animation: lamp-blink 1.4s ease-in-out infinite;
}
.lamp--working {
background: var(--amber);
color: var(--amber);
box-shadow: 0 0 10px 1px rgba(245, 182, 87, 0.7);
animation: pulse 1s ease-in-out infinite;
animation: lamp-pulse 1s ease-in-out infinite;
}
.dot--error {
.lamp--working::after { animation: lamp-ring 1s ease-out infinite; }
.lamp--error {
background: var(--danger);
color: var(--danger);
box-shadow: 0 0 10px 1px var(--danger-glow);
animation: lamp-pulse 1.2s ease-in-out infinite;
}
@keyframes breathe { 0%, 100% { opacity: 0.55; } 50% { opacity: 1; } }
@keyframes pulse {
0%, 100% { transform: scale(0.82); opacity: 0.7; }
50% { transform: scale(1.15); opacity: 1; }
@keyframes lamp-breathe { 0%, 100% { opacity: 0.6; } 50% { opacity: 1; } }
@keyframes lamp-blink { 0%, 100% { opacity: 0.35; } 50% { opacity: 0.9; } }
@keyframes lamp-pulse {
0%, 100% { transform: scale(0.82); opacity: 0.75; }
50% { transform: scale(1.12); opacity: 1; }
}
@keyframes lamp-ring {
0% { opacity: 0.5; transform: scale(0.6); }
70% { opacity: 0; transform: scale(1.8); }
100% { opacity: 0; transform: scale(1.8); }
}
.lamp-text {
letter-spacing: 0.04em;
color: var(--ink-dim);
max-width: 88px;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.lamp-text--live .sid { color: var(--cyan); letter-spacing: 0.06em; }
.lamp-text--working { color: var(--amber); }
.lamp-text--error { color: var(--danger-bright); }
.lamp-text--connecting { color: var(--ink-faint); }
.sid { font-family: var(--mono); }
/* On the tightest phones the title + lamp text + two buttons crowd; keep the
living dot (the system pulse) and drop the text label until there's room. */
@media (max-width: 439px) {
.lamp-text { display: none; }
.lamp-wrap { padding: 0; }
}
/* touch-friendly buttons */
.controls-toggle,
.new-session {
min-height: 40px;
padding: 0 13px;
/* rail buttons: touch-first (at least 44px tall via min-height) */
.rail-btn {
min-height: 44px;
padding: 0 14px;
border-radius: var(--radius-sm);
border: 1px solid var(--line-strong);
background: var(--bg-2);
color: var(--ink-dim);
font-size: 13px;
letter-spacing: 0.02em;
letter-spacing: 0.03em;
display: inline-flex;
align-items: center;
gap: 5px;
gap: 6px;
transition: border-color 0.15s, background 0.15s, color 0.15s;
}
.controls-toggle {
border-color: #5a4a2a;
.rail-btn:hover:not(:disabled) { border-color: var(--line-bright); color: var(--ink); }
.rail-btn:active:not(:disabled) { background: var(--bg-3); }
.rail-btn:disabled { opacity: 0.42; }
.rail-btn--vm {
border-color: var(--amber-dim);
color: var(--amber);
}
.controls-toggle:active,
.new-session:active {
background: var(--bg-3);
}
.new-session:disabled {
opacity: 0.45;
}
.rail-btn--vm:hover:not(:disabled) { border-color: var(--amber); color: var(--amber); }
.bolt { font-size: 13px; line-height: 1; }
.rail-error {
.rail-note {
margin: 10px 12px 0;
padding: 11px 14px;
padding: 10px 13px;
border: 1px solid var(--danger-deep);
border-left-width: 3px;
background: rgba(255, 77, 77, 0.07);
color: #ffd5d5;
color: #ffd9d9;
border-radius: var(--radius-sm);
font-size: 13px;
line-height: 1.5;
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 6px 12px;
flex: none;
}
.rail-note-aside { color: #f0b8b8; }
.rail-note-aside strong { color: #fff; font-family: var(--mono); }
.rail-note-retry {
margin-left: auto;
border: 1px solid var(--danger-deep);
background: transparent;
color: var(--danger-bright);
border-radius: 6px;
padding: 6px 12px;
font-size: 12px;
min-height: 36px;
}
.rail-note-retry:hover { background: rgba(255, 77, 77, 0.12); }
.toast {
margin: 10px 12px 0;
padding: 9px 13px;
border: 1px solid var(--line-strong);
border-left: 3px solid var(--amber);
background: var(--bg-2);
color: var(--amber);
border-radius: var(--radius-sm);
font-family: var(--mono);
font-size: 12.5px;
line-height: 1.45;
flex: none;
animation: rise-in 0.28s ease-out both;
}
/* ── stage ───────────────────────────────────────────────────────────── */
.stage {
@ -271,31 +540,37 @@
right: 0;
bottom: 0;
z-index: 40;
max-height: 86dvh;
overflow-y: auto;
max-height: 88dvh;
display: flex;
flex-direction: column;
background: var(--bg-1);
border-top: 1px solid var(--line-strong);
border-radius: 16px 16px 0 0;
box-shadow: 0 -18px 40px rgba(0, 0, 0, 0.55);
padding: 8px 14px calc(14px + env(safe-area-inset-bottom));
transform: translateY(101%);
transition: transform 0.26s cubic-bezier(0.32, 0.72, 0, 1);
border-radius: var(--radius-lg) var(--radius-lg) 0 0;
box-shadow: var(--shadow-sheet);
padding: 8px 14px calc(14px + var(--safe-bottom));
transform: translateY(102%);
transition: transform 0.3s cubic-bezier(0.32, 0.72, 0, 1);
/* the rise-in entrance is for the desktop column; the sheet is transform-
controlled, so cancel the shared keyframe here. */
animation: none !important;
}
.controls-pane.open {
transform: translateY(0);
}
.sheet-grip {
width: 38px;
width: 40px;
height: 4px;
border-radius: 99px;
background: var(--line-strong);
background: var(--line-bright);
margin: 4px auto 10px;
flex: none;
}
.controls-head {
display: flex;
align-items: center;
justify-content: space-between;
margin-bottom: 10px;
flex: none;
}
.controls-head-title {
font-family: var(--mono);
@ -305,14 +580,15 @@
color: var(--amber);
}
.sheet-close {
width: 34px;
height: 34px;
width: 40px;
height: 40px;
border-radius: var(--radius-sm);
border: 1px solid var(--line-strong);
background: var(--bg-2);
color: var(--ink-dim);
font-size: 14px;
}
.sheet-close:active { background: var(--bg-3); }
.sheet-backdrop {
position: fixed;
@ -320,40 +596,40 @@
z-index: 30;
border: 0;
padding: 0;
background: rgba(0, 0, 0, 0.55);
background: rgba(2, 4, 7, 0.62);
backdrop-filter: blur(1.5px);
opacity: 0;
pointer-events: none;
transition: opacity 0.22s;
transition: opacity 0.24s;
}
.sheet-backdrop.show {
opacity: 1;
pointer-events: auto;
}
/* ── desktop: controls become a static side column, sheet chrome gone ── */
/* ── desktop: controls become a static side column ─────────────────────── */
@media (min-width: 900px) {
.rail {
padding: 14px 18px;
}
.rail { padding: 14px 18px; }
h1 { font-size: 19px; }
.stage {
display: grid;
grid-template-columns: minmax(0, 1fr) 372px;
grid-template-columns: minmax(0, 1fr) 384px;
gap: 16px;
padding: 16px 18px 18px;
}
.chat-pane { display: flex; }
.controls-toggle { display: none; }
.rail-btn--vm { display: none; }
.controls-pane {
position: static;
max-height: none;
overflow: visible;
transform: none;
box-shadow: none;
border: none;
border-radius: 0;
padding: 0;
z-index: auto;
animation: rise-in 0.5s cubic-bezier(0.22, 0.61, 0.36, 1) both !important;
animation-delay: var(--d, 0ms) !important;
}
.sheet-grip,
.controls-head,
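
Review note on the script changes above: the `onError` branch of `attach()` and the `lamp` derivation are both small state decisions, so they can be read as pure functions. The sketch below is only an illustration under that assumption. `classifyStreamError` and `lampFor` are hypothetical names that do not exist in the commit, and the numeric constants mirror the standard `EventSource.CONNECTING` and `EventSource.CLOSED` values so the snippet runs outside a browser.

```javascript
// Hypothetical helpers (not part of the commit): the reconnect decision and
// the header lamp state, pulled out of the component as pure functions.

// Standard EventSource readyState values, inlined so this runs under Node.
const CONNECTING = 0; // EventSource.CONNECTING: the browser is retrying on its own
const CLOSED = 2;     // EventSource.CLOSED: terminal, no automatic retry

// Mirrors the onError branch in attach(): only a CLOSED source is a hard
// failure that needs a manual re-attach; anything else is a transient drop.
function classifyStreamError(readyState) {
  if (readyState === CLOSED) {
    return { link: 'error', reattach: true };
  }
  return { link: 'connecting', reattach: false };
}

// Mirrors the `lamp` $derived: error wins, then a running turn, then the
// link state decides between 'live' and 'connecting'.
function lampFor(link, turnActive) {
  if (link === 'error') return 'error';
  if (turnActive) return 'working';
  return link === 'attached' ? 'live' : 'connecting';
}
```

Keeping the decision separate from the `setTimeout` re-attach would make the 1500 ms retry the only side effect left in the handler, which may be easier to test; whether that split is worth it for a component this size is a judgment call.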


@ -1,128 +1,105 @@
<script>
import { tick } from 'svelte';
import { streamChat } from './lib/api.js';
import ToolChip from './ToolChip.svelte';
let {
sessionId = '',
sessionReady = false,
onLiveSession = (/** @type {string} */ _id) => {},
onStreamingChange = (/** @type {boolean} */ _v) => {},
tx, // the folded transcript state (plain object, see lib/transcript.js)
rev = 0, // bumped on every in-place mutation to retrigger reactivity
caughtUp = false, // replay drained → staggered reveal may run
turnActive = false, // a turn is running: show Stop, hide Send
sending = false, // a prompt POST is in flight (brief)
linkState = 'connecting', // connecting | attached | error
onSubmit = (/** @type {string} */ _p) => {},
onStop = () => {},
} = $props();
/**
* Message model. A user message is plain text. An assistant message is an
* ordered list of parts so streamed prose and tool chips interleave in the
* exact order the agent emitted them:
* { role:'assistant', parts:[{type:'text',text}|{type:'tool',name,command}],
* result?: {is_error, text, duration_ms}, error?: string }
* @type {Array<any>}
*/
let messages = $state([]);
// The five quick-action presets — the mobile win: one tap, no typing.
const PRESETS = [
{
label: 'Triage',
icon: '◑',
prompt:
'Triage the devvm: uptime, load, memory, swap, disk usage, failed systemd units, and the last 30 lines of dmesg. Summarize what\'s wrong.',
},
{
label: 'Memory / OOM',
icon: '▦',
prompt:
'Check devvm memory pressure: free -h, top memory consumers, any recent OOM-kills in dmesg/journal, and swap usage. Is it OOMing?',
},
{
label: 'Disk',
icon: '▤',
prompt:
'What\'s filling the devvm disk? df -h, then the biggest directories/files under the fullest mount. Anything safe to clear?',
},
{
label: 'Services',
icon: '⚙',
prompt:
'List failed or stuck systemd units on the devvm (systemctl --failed) and show the status + recent journal lines for any that are down.',
},
{
label: 'QEMU wedged?',
icon: '◫',
prompt:
'Is the devvm\'s QEMU wedged (I/O stall)? Check guest responsiveness over SSH, then ssh pve forensics for VM 102\'s qm status/QMP/guest-agent. Tell me if a cycle is needed.',
},
];
let draft = $state('');
let streaming = $state(false);
let scroller; // the scroll viewport
let scroller;
let inputEl;
let pinnedToBottom = true; // auto-scroll only while the user is at the bottom
let pinnedToBottom = true;
const canSend = $derived(sessionReady && !streaming && draft.trim().length > 0);
// re-derive the message list whenever the folder mutates (rev bump). The
// transcript is folded with in-place mutation on a $state.raw object, so no
// reference changes on its own — we depend on `rev` explicitly and rebuild
// fresh objects (message + its parts array) so Svelte's keyed {#each} re-
// renders streamed prose/chips on every token. Transcripts are small; the
// per-token copy is cheap and keeps the hot streaming path bug-free.
const messages = $derived(
rev >= 0 && tx
? tx.messages.map((m) =>
m.role === 'assistant' ? { ...m, parts: m.parts.slice() } : { ...m }
)
: []
);
const isEmpty = $derived(messages.length === 0);
const canSend = $derived(linkState !== 'error' && !turnActive && draft.trim().length > 0);
const inputReady = $derived(!turnActive);
// ── scrolling ─────────────────────────────────────────────────────────────
// ── auto-scroll (only while pinned to the bottom) ─────────────────────────
function onScroll() {
if (!scroller) return;
const gap = scroller.scrollHeight - scroller.scrollTop - scroller.clientHeight;
pinnedToBottom = gap < 60;
pinnedToBottom = gap < 64;
}
async function scrollToBottom(force = false) {
if (!force && !pinnedToBottom) return;
await tick();
if (scroller) scroller.scrollTop = scroller.scrollHeight;
}
// ── streaming a turn ────────────────────────────────────────────────────────
function lastAssistant() {
return messages[messages.length - 1];
}
function appendText(text) {
const msg = lastAssistant();
const parts = msg.parts;
const tail = parts[parts.length - 1];
if (tail && tail.type === 'text') {
tail.text += text;
} else {
parts.push({ type: 'text', text });
}
messages = messages; // notify Svelte of the in-place mutation
}
function handleEvent(ev) {
switch (ev?.kind) {
case 'session':
onLiveSession(ev.session_id);
break;
case 'text':
if (ev.text) appendText(ev.text);
break;
case 'tool': {
// Bash carries a `command`; other tools just show their name.
const command =
ev.input && typeof ev.input.command === 'string' ? ev.input.command : '';
lastAssistant().parts.push({ type: 'tool', name: ev.name || 'tool', command });
messages = messages;
break;
}
case 'result':
lastAssistant().result = {
is_error: Boolean(ev.is_error),
text: typeof ev.result === 'string' ? ev.result : '',
duration_ms: typeof ev.duration_ms === 'number' ? ev.duration_ms : null,
};
messages = messages;
break;
case 'error':
lastAssistant().error = ev.error || 'unknown error';
messages = messages;
break;
case 'done':
// handled by the stream completing; nothing to render
break;
default:
break;
}
// any transcript change → keep the view pinned if the user is at the bottom
$effect(() => {
rev; // track
scrollToBottom();
});
function fire(prompt) {
if (turnActive) return;
pinnedToBottom = true;
onSubmit(prompt);
scrollToBottom(true);
}
async function send() {
const prompt = draft.trim();
if (!prompt || streaming || !sessionReady) return;
messages.push({ role: 'user', text: prompt });
messages.push({ role: 'assistant', parts: [] });
messages = messages;
function send() {
const text = draft.trim();
if (!text || turnActive) return;
draft = '';
streaming = true;
onStreamingChange(true);
pinnedToBottom = true;
await scrollToBottom(true);
try {
await streamChat({ session_id: sessionId, prompt }, handleEvent);
} catch (err) {
// Network/transport failure (backend down, connection dropped mid-stream).
const msg = lastAssistant();
if (msg && msg.role === 'assistant' && !msg.error) {
msg.error =
(err instanceof Error ? err.message : String(err)) +
' — the connection to the agent failed.';
messages = messages;
}
} finally {
streaming = false;
onStreamingChange(false);
await scrollToBottom();
inputEl?.focus();
}
fire(text);
// once the cleared draft has rendered, return focus to the input
tick().then(() => inputEl?.focus());
}
function onKeydown(e) {
@ -130,7 +107,7 @@
e.preventDefault();
send();
}
// Shift+Enter falls through to insert a newline.
// Shift+Enter → newline (default behaviour)
}
function fmtDuration(ms) {
@ -139,7 +116,12 @@
return `${(ms / 1000).toFixed(ms < 10000 ? 1 : 0)} s`;
}
const isEmpty = $derived(messages.length === 0);
// a freshly-attached transcript reveals with a brief stagger; cap the delay
// so a long replay doesn't animate forever.
function revealDelay(i) {
if (!caughtUp) return 0;
return Math.min(i, 6) * 45;
}
</script>
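
Review note on the script above: the `messages` derivation and `onScroll` rely on two small ideas, a shallow snapshot of the in-place-mutated transcript and a distance-from-bottom threshold. The sketch below restates both as plain functions. `snapshotMessages` and `isPinned` are hypothetical names, and the transcript shape (`messages`, `role`, `key`, `parts`) is assumed from how the diff uses it rather than taken from `lib/transcript.js`.

```javascript
// Hypothetical helpers (not part of the commit), written as plain JS so they
// can run without Svelte.

// Shallow snapshot: every message object is copied, and assistant messages
// also get a fresh `parts` array, so a keyed {#each} sees new references.
// The part objects themselves are still shared with the source transcript.
function snapshotMessages(tx) {
  if (!tx) return [];
  return tx.messages.map((m) =>
    m.role === 'assistant' ? { ...m, parts: m.parts.slice() } : { ...m }
  );
}

// The viewport counts as pinned while the gap to the bottom is under the
// threshold (64px in the new version of onScroll).
function isPinned(scrollHeight, scrollTop, clientHeight, threshold = 64) {
  return scrollHeight - scrollTop - clientHeight < threshold;
}
```

Because the copy is shallow, a streamed token that mutates the tail text part in place is visible through any snapshot that shares the part; the next snapshot still yields a new `parts` array, which is what the `rev` dependency is there to trigger.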
<div class="chat">
@ -150,41 +132,58 @@
<div class="stream" bind:this={scroller} onscroll={onScroll}>
{#if isEmpty}
<div class="empty">
<div class="empty-mark"></div>
<p class="empty-title">The agent is standing by.</p>
<div class="empty" class:dim={linkState === 'connecting'}>
<div class="empty-mark" aria-hidden="true"></div>
<p class="empty-title">
{#if linkState === 'error'}
The agent is unreachable.
{:else if linkState === 'connecting'}
Attaching to the session…
{:else}
The agent is standing by.
{/if}
</p>
<p class="empty-sub">
Describe the symptom — "devvm is unreachable", "disk full", "ssh hangs"
— and it will connect over SSH, investigate, and stream its work here.
For a hard power action when the agent can't help, use
<strong>Direct VM control</strong>.
{#if linkState === 'error'}
The cluster or network may be down. You can still power-cycle the VM
with <strong>⚡ Direct VM control</strong> — it needs no agent.
{:else}
Tap a preset below or describe the symptom — "devvm unreachable",
"disk full", "ssh hangs" — and it will connect over SSH, investigate,
and stream its work here. For a hard power action, use
<strong>⚡ Direct VM control</strong>.
{/if}
</p>
</div>
{/if}
{#each messages as msg, i (i)}
{#each messages as msg (msg.key)}
{#if msg.role === 'user'}
<div class="row row--user">
<div class="row row--user rise-in" style="--d:{revealDelay(0)}ms">
<div class="bubble bubble--user">{msg.text}</div>
</div>
{:else}
<div class="row row--assistant">
<div class="row row--assistant rise-in" style="--d:{revealDelay(0)}ms">
<div class="bubble bubble--assistant">
{#if msg.parts.length === 0 && !msg.result && !msg.error}
{#if msg.parts.length === 0 && !msg.result && !msg.error && !msg.cancelled}
<span class="thinking" aria-label="working">
<span></span><span></span><span></span>
</span>
{/if}
{#each msg.parts as part, j (j)}
{#if part.type === 'text'}
<span class="prose">{part.text}</span>
{:else}
<ToolChip name={part.name} command={part.command} />
{/if}
{#if part.type === 'text'}<span class="prose">{part.text}</span>{:else}<ToolChip name={part.name} command={part.command} />{/if}
{/each}
{#if msg.error}
<div class="turn-note turn-note--error">{msg.error}</div>
<div class="turn-note turn-note--error">
<span class="turn-note-tag">error</span>
<span class="turn-note-body">{msg.error}</span>
</div>
{:else if msg.cancelled}
<div class="turn-note turn-note--muted">
<span class="turn-note-tag">stopped</span>
<span class="turn-note-body">turn cancelled</span>
</div>
{:else if msg.result}
<div class="turn-note {msg.result.is_error ? 'turn-note--error' : 'turn-note--ok'}">
<span class="turn-note-tag">{msg.result.is_error ? 'failed' : 'done'}</span>
@ -200,36 +199,61 @@
{/each}
</div>
<form
class="composer"
onsubmit={(e) => {
e.preventDefault();
send();
}}
>
{#if streaming}
<div class="working-bar" aria-live="polite">
<span class="working-dots"><span></span><span></span><span></span></span>
agent working — streaming live
</div>
{/if}
<div class="composer-row">
<textarea
bind:this={inputEl}
bind:value={draft}
onkeydown={onKeydown}
placeholder={sessionReady
? 'Describe the problem… (Enter to send · Shift+Enter for a new line)'
: 'Waiting for a session…'}
rows="1"
disabled={!sessionReady || streaming}
spellcheck="false"
></textarea>
<button type="submit" class="send" disabled={!canSend}>
{streaming ? '…' : 'Send'}
</button>
<div class="dock">
<!-- quick-action preset bar: horizontally scrollable, one-tap prompts -->
<div class="presets" role="group" aria-label="Quick actions">
{#each PRESETS as p (p.label)}
<button
class="preset"
onclick={() => fire(p.prompt)}
disabled={turnActive || linkState === 'error'}
title={p.prompt}
>
<span class="preset-icon" aria-hidden="true">{p.icon}</span>
<span class="preset-label">{p.label}</span>
</button>
{/each}
</div>
</form>
<form
class="composer"
onsubmit={(e) => {
e.preventDefault();
send();
}}
>
{#if turnActive}
<div class="working-bar" aria-live="polite">
<span class="working-dots"><span></span><span></span><span></span></span>
<span>agent working — streaming live</span>
</div>
{/if}
<div class="composer-row">
<textarea
bind:this={inputEl}
bind:value={draft}
onkeydown={onKeydown}
placeholder={inputReady
? 'Describe the problem… (Enter to send · Shift+Enter for a new line)'
: 'A turn is running — Stop it to type, or wait…'}
rows="1"
disabled={!inputReady}
spellcheck="false"
enterkeyhint="send"
></textarea>
{#if turnActive}
<button type="button" class="stop" onclick={onStop} title="Stop the running turn">
<span class="stop-glyph" aria-hidden="true"></span>
Stop
</button>
{:else}
<button type="submit" class="send" disabled={!canSend}>
{sending ? '···' : 'Send'}
</button>
{/if}
</div>
</form>
</div>
</div>
<style>
@ -249,9 +273,10 @@
display: flex;
align-items: baseline;
gap: 12px;
padding: 13px 18px;
padding: 12px 18px;
border-bottom: 1px solid var(--line);
background: linear-gradient(180deg, rgba(255, 255, 255, 0.015), transparent);
background: linear-gradient(180deg, rgba(255, 255, 255, 0.018), transparent);
flex: none;
}
.chat-head-label {
font-family: var(--mono);
@ -263,13 +288,16 @@
.chat-head-hint {
font-size: 12px;
color: var(--ink-faint);
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.stream {
flex: 1;
min-height: 0;
overflow-y: auto;
padding: 20px 18px 8px;
padding: 20px 16px 10px;
display: flex;
flex-direction: column;
gap: 14px;
@ -279,23 +307,27 @@
/* empty state */
.empty {
margin: auto;
max-width: 460px;
max-width: 470px;
text-align: center;
padding: 28px 12px;
padding: 24px 14px;
color: var(--ink-dim);
}
.empty.dim { opacity: 0.8; }
.empty-mark {
font-size: 40px;
font-size: 42px;
color: var(--cyan-dim);
line-height: 1;
margin-bottom: 14px;
text-shadow: 0 0 24px rgba(61, 209, 214, 0.25);
text-shadow: 0 0 26px rgba(61, 209, 214, 0.3);
animation: lamp-breathe 3.6s ease-in-out infinite;
}
@keyframes lamp-breathe { 0%, 100% { opacity: 0.7; } 50% { opacity: 1; } }
.empty-title {
font-family: var(--mono);
color: var(--ink);
font-size: 15px;
margin: 0 0 8px;
letter-spacing: 0.01em;
}
.empty-sub {
font-size: 13px;
@ -303,32 +335,23 @@
color: var(--ink-faint);
margin: 0;
}
.empty-sub strong {
color: var(--ink-dim);
font-weight: 600;
}
.empty-sub strong { color: var(--ink-dim); font-weight: 600; }
.row {
display: flex;
}
.row--user {
justify-content: flex-end;
}
.row--assistant {
justify-content: flex-start;
}
.row { display: flex; }
.row--user { justify-content: flex-end; }
.row--assistant { justify-content: flex-start; }
.bubble {
max-width: 86%;
max-width: 88%;
border-radius: 13px;
padding: 11px 14px;
font-size: 14px;
line-height: 1.6;
line-height: 1.62;
word-wrap: break-word;
overflow-wrap: anywhere;
}
.bubble--user {
background: linear-gradient(180deg, #15333a, #0f262c);
background: linear-gradient(180deg, #123036, #0d2329);
border: 1px solid var(--cyan-dim);
color: #d8f6f7;
border-bottom-right-radius: 4px;
@ -341,12 +364,9 @@
border-bottom-left-radius: 4px;
color: var(--ink);
}
/* prose renders inline so text and tool chips share the same flow */
.prose {
white-space: pre-wrap;
}
.prose { white-space: pre-wrap; }
/* in-flight assistant "thinking" dots */
/* in-flight "thinking" dots */
.thinking,
.working-dots {
display: inline-flex;
@ -363,19 +383,15 @@
animation: blink 1.2s infinite ease-in-out;
}
.thinking span:nth-child(2),
.working-dots span:nth-child(2) {
animation-delay: 0.18s;
}
.working-dots span:nth-child(2) { animation-delay: 0.18s; }
.thinking span:nth-child(3),
.working-dots span:nth-child(3) {
animation-delay: 0.36s;
}
.working-dots span:nth-child(3) { animation-delay: 0.36s; }
@keyframes blink {
0%, 80%, 100% { opacity: 0.25; transform: translateY(0); }
40% { opacity: 1; transform: translateY(-2px); }
}
/* turn result / error footer inside the assistant bubble */
/* turn result / error / stopped footer inside the assistant bubble */
.turn-note {
margin-top: 10px;
padding: 7px 10px;
@ -396,9 +412,16 @@
color: #bff5d3;
}
.turn-note--error {
background: rgba(255, 77, 77, 0.08);
border: 1px solid var(--danger-deep);
color: #ffd5d5;
/* the error tint here is amber-leaning text on a faint warm wash, NOT the
reserved power-action red — a turn error is not a destructive action. */
background: rgba(245, 182, 87, 0.06);
border: 1px solid var(--amber-dim);
color: #f7d49a;
}
.turn-note--muted {
background: rgba(255, 255, 255, 0.02);
border: 1px solid var(--line-strong);
color: var(--ink-faint);
}
.turn-note-tag {
text-transform: uppercase;
@ -409,20 +432,55 @@
border: 1px solid currentColor;
opacity: 0.85;
}
.turn-note-body {
flex: 1;
min-width: 0;
}
.turn-note-time {
margin-left: auto;
color: var(--ink-faint);
.turn-note-body { flex: 1; min-width: 0; }
.turn-note-time { margin-left: auto; color: var(--ink-faint); }
/* ── dock: presets + composer, pinned to the bottom ────────────────────── */
.dock {
flex: none;
border-top: 1px solid var(--line);
background: linear-gradient(0deg, rgba(255, 255, 255, 0.015), transparent);
}
/* ── composer ─────────────────────────────────────────────────────────── */
.presets {
display: flex;
gap: 8px;
overflow-x: auto;
padding: 11px 12px 4px;
scrollbar-width: none;
-webkit-overflow-scrolling: touch;
/* fade the right edge to hint there's more to scroll */
mask-image: linear-gradient(90deg, transparent 0, #000 14px, #000 calc(100% - 18px), transparent 100%);
}
.presets::-webkit-scrollbar { display: none; }
.preset {
flex: none;
min-height: 38px;
display: inline-flex;
align-items: center;
gap: 7px;
padding: 0 13px;
border-radius: 999px;
border: 1px solid var(--line-strong);
background: var(--bg-2);
color: var(--ink-dim);
font-family: var(--mono);
font-size: 12.5px;
letter-spacing: 0.02em;
white-space: nowrap;
transition: border-color 0.15s, color 0.15s, background 0.15s, transform 0.06s;
}
.preset:hover:not(:disabled) {
border-color: var(--cyan-dim);
color: var(--ink);
background: var(--bg-3);
}
.preset:active:not(:disabled) { transform: translateY(1px); }
.preset:disabled { opacity: 0.4; }
.preset-icon { color: var(--cyan); font-size: 12px; }
.composer {
border-top: 1px solid var(--line);
padding: 12px;
background: linear-gradient(0deg, rgba(255, 255, 255, 0.012), transparent);
padding: 8px 12px calc(12px + var(--safe-bottom));
}
.working-bar {
display: flex;
@ -431,7 +489,7 @@
font-family: var(--mono);
font-size: 12px;
color: var(--amber);
padding: 0 4px 9px;
padding: 2px 4px 9px;
letter-spacing: 0.02em;
}
.composer-row {
@ -442,13 +500,13 @@
textarea {
flex: 1;
resize: none;
max-height: 168px;
max-height: 160px;
min-height: 48px;
background: var(--bg-2);
color: var(--ink);
border: 1px solid var(--line-strong);
border-radius: var(--radius-sm);
padding: 12px 13px;
padding: 13px 13px;
font-family: var(--sans);
/* 16px: anything smaller makes iOS Safari auto-zoom on focus (mobile is the
primary client) — the zoom then shifts the composer out of view. */
@ -458,39 +516,60 @@
transition: border-color 0.15s, box-shadow 0.15s;
field-sizing: content; /* progressive: auto-grows where supported */
}
textarea::placeholder {
color: var(--ink-faint);
}
textarea::placeholder { color: var(--ink-faint); }
textarea:focus {
border-color: var(--cyan-dim);
box-shadow: 0 0 0 3px rgba(61, 209, 214, 0.12);
}
textarea:disabled {
opacity: 0.55;
}
textarea:disabled { opacity: 0.55; }
.send {
.send,
.stop {
flex: none;
align-self: stretch;
min-width: 78px;
min-width: 82px;
min-height: 48px;
padding: 0 18px;
border-radius: var(--radius-sm);
border: 1px solid var(--cyan-dim);
background: linear-gradient(180deg, #19474b, #103539);
color: #d8f6f7;
font-size: 13px;
font-weight: 600;
letter-spacing: 0.04em;
transition: filter 0.15s, border-color 0.15s, opacity 0.15s;
letter-spacing: 0.05em;
transition: filter 0.15s, border-color 0.15s, opacity 0.15s, background 0.15s;
}
.send:hover:not(:disabled) {
filter: brightness(1.22);
border-color: var(--cyan);
.send {
border: 1px solid var(--cyan-dim);
background: linear-gradient(180deg, #16464a, #0e3438);
color: #d8f6f7;
}
.send:hover:not(:disabled) { filter: brightness(1.24); border-color: var(--cyan); }
.send:disabled {
opacity: 0.4;
background: var(--bg-2);
border-color: var(--line-strong);
color: var(--ink-faint);
}
/* Stop is NOT red — red is reserved for destructive VM power. Stop is a calm
neutral control with a square "halt" glyph. */
.stop {
display: inline-flex;
align-items: center;
justify-content: center;
gap: 8px;
border: 1px solid var(--line-bright);
background: var(--bg-3);
color: var(--ink);
}
.stop:hover { border-color: var(--ink-faint); filter: brightness(1.1); }
.stop-glyph {
width: 10px;
height: 10px;
border-radius: 2px;
background: var(--amber);
box-shadow: 0 0 8px rgba(245, 182, 87, 0.55);
animation: lamp-pulse 1s ease-in-out infinite;
}
@keyframes lamp-pulse {
0%, 100% { transform: scale(0.85); opacity: 0.8; }
50% { transform: scale(1.08); opacity: 1; }
}
</style>


@ -293,7 +293,8 @@
align-items: center;
justify-content: center;
gap: 8px;
padding: 9px 15px;
min-height: 44px; /* touch target */
padding: 10px 16px;
border-radius: var(--radius-sm);
font-size: 13px;
font-weight: 600;
@ -408,7 +409,8 @@
}
.confirm-yes {
flex: 1;
padding: 9px;
min-height: 44px;
padding: 10px;
border-radius: var(--radius-sm);
border: 1px solid var(--danger-bright);
background: var(--danger);
@ -424,7 +426,8 @@
}
.confirm-no {
flex: 1;
padding: 9px;
min-height: 44px;
padding: 10px;
border-radius: var(--radius-sm);
border: 1px solid var(--line-strong);
background: var(--bg-2);


@ -1,48 +1,70 @@
/*
devvm breakglass: global theme
A recovery console: dark, high-contrast, terminal-adjacent. Calm by default;
danger is the only loud thing on the screen. No external fonts/CDNs: system
monospace carries the identity, system sans carries readable prose.
Emergency recovery console / instrument panel. Dark, high-contrast, monospace
identity, calm by default. Danger (red) is reserved EXCLUSIVELY for the
destructive VM power actions; nothing else on the screen is ever red. No
external fonts/CDNs (air-gapped cluster): a refined system-monospace stack
carries the identity, system-sans carries readable prose. Distinctiveness is
earned through composition, the living "system pulse" lamp, motion, hairlines,
and the reserved danger treatment, not through a downloaded typeface.
*/
:root {
/* Surfaces — a near-black slate with cool undertone, layered for depth. */
--bg-0: #07090c; /* page base */
--bg-1: #0c1015; /* panel */
--bg-2: #11171e; /* raised panel / input */
--bg-3: #161d26; /* chips, hover */
--bg-term: #06080a; /* command-output panels */
/* Surfaces — a near-black slate with a cool undertone, layered for depth. */
--bg-0: #06080b; /* page base (darkened from #07090c for crisper AA) */
--bg-1: #0b0f14; /* panel */
--bg-2: #10161d; /* raised panel / input */
--bg-3: #161e27; /* chips, hover */
--bg-term: #05070a; /* command-output panels */
/* Hairlines & text */
--line: #1d2630;
--line: #1c2530;
--line-strong: #2a3744;
--ink: #e6edf3; /* primary text */
--ink-dim: #9bb0c0; /* secondary text */
--ink-faint: #5d7185; /* labels, meta */
--line-bright: #3a4a5a;
--ink: #e9eff5; /* primary text */
--ink-dim: #9bb0c0; /* secondary text — 8.0:1 on bg-2 */
/* labels/meta — was #5d7185 (3.6:1, fails AA). Lifted to 6.1:1 on bg-2. */
--ink-faint: #8499ab;
/* Accents */
--cyan: #3dd1d6; /* "system alive" — links, focus, session dot */
/* Accents — the "alive" cyan is the spine of the calm palette. */
--cyan: #3dd1d6; /* "system alive" — links, focus, session pulse */
--cyan-bright: #62e3e7;
--cyan-dim: #1f6f72;
--cyan-deep: #0e3133;
--amber: #f5b657; /* working / in-flight */
--amber-dim: #6a5226;
--green: #5ddb8e; /* healthy exit */
--green-dim: #1f5f3d;
/* Danger — reserved EXCLUSIVELY for mutating actions. Nothing else is red. */
/* Danger — reserved EXCLUSIVELY for mutating power actions. Nothing else red. */
--danger: #ff4d4d;
--danger-bright: #ff6363;
--danger-deep: #7a1717;
--danger-glow: rgba(255, 77, 77, 0.35);
--radius: 10px;
--radius-sm: 7px;
--radius: 11px;
--radius-sm: 8px;
--radius-lg: 16px;
--mono: ui-monospace, "JetBrains Mono", "SF Mono", "Cascadia Code",
"Fira Code", Menlo, Consolas, "Liberation Mono", monospace;
/* A refined, deliberately-ordered monospace stack. We lead with faces that
have real character (Berkeley Mono / JetBrains / Cascadia / SF Mono) and
fall back gracefully, but ship nothing; whatever the device has carries
the cockpit-readout identity. */
--mono: "Berkeley Mono", ui-monospace, "JetBrains Mono", "SF Mono",
"Cascadia Code", "Fira Code", "Source Code Pro", Menlo, Consolas,
"Liberation Mono", monospace;
--sans: ui-sans-serif, system-ui, -apple-system, "Segoe UI", Roboto,
"Helvetica Neue", Arial, sans-serif;
--shadow-panel: 0 1px 0 rgba(255, 255, 255, 0.02) inset,
0 16px 40px -24px rgba(0, 0, 0, 0.9);
--shadow-panel: 0 1px 0 rgba(255, 255, 255, 0.025) inset,
0 18px 44px -26px rgba(0, 0, 0, 0.95);
--shadow-sheet: 0 -22px 48px -12px rgba(0, 0, 0, 0.7);
/* Safe-area shorthands (notch / home-indicator). 0px fallback off-device. */
--safe-top: env(safe-area-inset-top, 0px);
--safe-bottom: env(safe-area-inset-bottom, 0px);
--safe-left: env(safe-area-inset-left, 0px);
--safe-right: env(safe-area-inset-right, 0px);
color-scheme: dark;
}
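The contrast figures quoted in the palette comments above (roughly 8.0:1 for --ink-dim, and 3.6:1 versus 6.1:1 for the old and new --ink-faint, all on --bg-2) can be rechecked with the WCAG 2.x relative-luminance formula. This is a minimal sketch in JavaScript; `relativeLuminance` and `contrastRatio` are illustrative names and are not part of this repo.

```javascript
// WCAG 2.x relative luminance of a #rrggbb color (sRGB, 8 bits per channel).
function relativeLuminance(hex) {
  const n = parseInt(hex.replace('#', ''), 16);
  const channels = [(n >> 16) & 255, (n >> 8) & 255, n & 255];
  const [r, g, b] = channels.map((c) => {
    const s = c / 255;
    // Linearise the gamma-encoded channel value.
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), always >= 1.
function contrastRatio(fg, bg) {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  const [hi, lo] = a >= b ? [a, b] : [b, a];
  return (hi + 0.05) / (lo + 0.05);
}

const BG_2 = '#10161d'; // the new --bg-2 from this diff
const oldFaint = contrastRatio('#5d7185', BG_2); // previous --ink-faint
const newFaint = contrastRatio('#8499ab', BG_2); // lifted --ink-faint
const dim = contrastRatio('#9bb0c0', BG_2); // --ink-dim
console.log(oldFaint.toFixed(2), newFaint.toFixed(2), dim.toFixed(2));
```

WCAG AA asks for at least 4.5:1 for normal-size text, which is the threshold the old --ink-faint missed and the new value clears.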
@ -55,23 +77,24 @@ html,
body {
margin: 0;
height: 100%;
/* The page itself never scrolls; the chat stream scrolls internally. This
keeps the composer pinned and stops iOS rubber-banding the whole UI. */
/* The page itself never scrolls; only the chat stream scrolls internally.
This keeps the composer pinned and stops iOS rubber-banding the whole UI. */
overflow: hidden;
overscroll-behavior: none;
}
body {
background-color: var(--bg-0);
/* Atmosphere: a soft cyan corner-glow over a faint scanline weave, so the
surface reads like backlit equipment rather than flat #000. */
/* Atmosphere: a soft cyan corner-glow + a faint warm counter-glow over a
hairline scanline weave, so the surface reads as backlit equipment rather
than flat black. Fixed so it doesn't drift when the chat scrolls. */
background-image:
radial-gradient(120% 80% at 85% -10%, rgba(61, 209, 214, 0.07), transparent 55%),
radial-gradient(90% 70% at 10% 110%, rgba(245, 182, 87, 0.04), transparent 50%),
radial-gradient(120% 78% at 86% -12%, rgba(61, 209, 214, 0.08), transparent 55%),
radial-gradient(90% 70% at 8% 112%, rgba(245, 182, 87, 0.045), transparent 52%),
repeating-linear-gradient(
0deg,
rgba(255, 255, 255, 0.012) 0px,
rgba(255, 255, 255, 0.012) 1px,
rgba(255, 255, 255, 0.013) 0px,
rgba(255, 255, 255, 0.013) 1px,
transparent 1px,
transparent 3px
);
@ -84,8 +107,8 @@ body {
#app {
/* 100dvh (dynamic viewport height), NOT 100vh/100%, so the composer at the
bottom is never hidden behind a mobile browser's address/tool bar. Mobile is
the primary client for this tool. 100vh is the fallback for old engines. */
bottom is never hidden behind a mobile browser's address/tool bar. 100vh is
the fallback for engines without dvh. Mobile is the primary client. */
height: 100vh;
height: 100dvh;
}
@ -94,7 +117,6 @@ button {
font-family: var(--mono);
cursor: pointer;
}
button:disabled {
cursor: not-allowed;
}
@ -119,10 +141,26 @@ button:disabled {
background-clip: content-box;
}
*::-webkit-scrollbar-thumb:hover {
background: #3a4a5a;
background: var(--line-bright);
background-clip: content-box;
}
/* Shared motion primitives
One well-orchestrated entrance beats scattered micro-interactions: panels
and rows rise a few px with a soft fade, staggered via --d on each element. */
@keyframes rise-in {
from { opacity: 0; transform: translateY(8px); }
to { opacity: 1; transform: translateY(0); }
}
@keyframes fade-in {
from { opacity: 0; }
to { opacity: 1; }
}
.rise-in {
animation: rise-in 0.5s cubic-bezier(0.22, 0.61, 0.36, 1) both;
animation-delay: var(--d, 0ms);
}
@media (prefers-reduced-motion: reduce) {
*,
*::before,


@ -1,8 +1,41 @@
// Same-origin API client. Auth is handled entirely by the edge proxy
// (Authentik / basic-auth / bearer) — this UI never sends or stores a token.
import { readEventStream } from './sse.js';
// Same-origin API client for the breakglass UI.
//
// Auth is handled entirely by the edge proxy (Authentik / basic-auth / bearer):
// this UI never sends or stores a token, and builds no login screen.
//
// The chat uses the tmux/attach model. The conversation lives SERVER-SIDE; we
// only persist the session_id locally and ATTACH to it over an EventSource. The
// browser's native EventSource auto-reconnects and sends Last-Event-ID, and the
// server resumes from there — so there is ZERO reconnect logic here. We just
// render events idempotently by id (see transcript.js).
/** Open a fresh chat session. @returns {Promise<string>} session_id */
const SESSION_KEY = 'breakglass.session_id';
/** Read the persisted session id, or '' if none. */
export function loadSessionId() {
try {
return localStorage.getItem(SESSION_KEY) || '';
} catch {
return '';
}
}
/** Persist the session id (best-effort; private-mode storage may throw). */
export function saveSessionId(id) {
try {
if (id) localStorage.setItem(SESSION_KEY, id);
else localStorage.removeItem(SESSION_KEY);
} catch {
/* ignore — storage is a convenience, not a requirement */
}
}
/** Forget the persisted session id (the "New session" archive step). */
export function clearSessionId() {
saveSessionId('');
}
/** Open a fresh server-side session. @returns {Promise<string>} session_id */
export async function openSession() {
const res = await fetch('/api/session', {
method: 'POST',
@ -19,30 +52,89 @@ export async function openSession() {
}
/**
* Run one chat turn. Streams events to onEvent until the backend sends
* {kind:"done"} and the connection closes. Pass an AbortSignal to cancel.
* Attach to a session's event stream. Returns the live EventSource so the
* caller can close() it. Events arrive as:
* - default `message` events: .data is JSON {kind, id, ...}
* - a named `caught-up` event once the replay is drained (.data is {})
* - native `error` events while reconnecting (EventSource retries itself)
*
* @param {{session_id: string, prompt: string, model?: string, signal?: AbortSignal}} opts
* @param {(event: object) => void} onEvent
* @param {string} sessionId
* @param {{
* onEvent: (e: object) => void,
* onCaughtUp?: () => void,
* onOpen?: () => void,
* onError?: (e: Event) => void,
* }} handlers
* @returns {EventSource}
*/
export async function streamChat({ session_id, prompt, model, signal }, onEvent) {
const payload = { session_id, prompt };
if (model) payload.model = model;
export function attachStream(sessionId, { onEvent, onCaughtUp, onOpen, onError }) {
const es = new EventSource(`/api/session/${encodeURIComponent(sessionId)}/stream`);
const res = await fetch('/api/chat', {
method: 'POST',
headers: {
'content-type': 'application/json',
accept: 'text/event-stream',
},
body: JSON.stringify(payload),
signal,
});
await readEventStream(res, onEvent);
es.onopen = () => onOpen?.();
es.onmessage = (e) => {
if (!e || typeof e.data !== 'string' || e.data === '') return;
let obj;
try {
obj = JSON.parse(e.data);
} catch {
// A malformed frame must not abort an in-progress recovery stream.
return;
}
// EventSource exposes the SSE `id:` line as e.lastEventId. The server also
// embeds id in the JSON; prefer the JSON id, fall back to lastEventId.
if ((obj.id == null || obj.id === '') && e.lastEventId) obj.id = e.lastEventId;
onEvent(obj);
};
es.addEventListener('caught-up', () => onCaughtUp?.());
es.onerror = (e) => {
// EventSource auto-reconnects on a transient drop (readyState CONNECTING) and
// stops only on a hard, terminal failure (readyState CLOSED). Every error
// event is forwarded; the caller can check es.readyState to tell them apart.
onError?.(e);
};
return es;
}
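The frame handling inside `es.onmessage` can be exercised without a browser by pulling it into a pure function. The sketch below mirrors that logic under stated assumptions: `parseFrame` is a hypothetical name that does not appear in this diff, and the non-object guard is an addition of the sketch, not something the handler above does.

```javascript
// Hypothetical pure helper mirroring the es.onmessage body: parse one SSE
// data payload and fall back to the transport-level id when the JSON has none.
function parseFrame(data, lastEventId) {
  if (typeof data !== 'string' || data === '') return null;
  let obj;
  try {
    obj = JSON.parse(data);
  } catch {
    // A malformed frame is skipped rather than aborting the stream.
    return null;
  }
  // Addition in this sketch: ignore valid JSON that is not an object.
  if (obj === null || typeof obj !== 'object') return null;
  // Prefer the id embedded in the JSON; fall back to the SSE `id:` line.
  if ((obj.id == null || obj.id === '') && lastEventId) obj.id = lastEventId;
  return obj;
}

const withJsonId = parseFrame('{"kind":"text","text":"hi","id":7}', '9');
const withSseId = parseFrame('{"kind":"text","text":"hi"}', '9');
const malformed = parseFrame('{this is not json}', '9');
console.log(withJsonId.id, withSseId.id, malformed);
```

Keeping the parse step pure would let the id-fallback rule be tested in plain Node, while the EventSource wiring stays a thin shell around it.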
/**
* List the PVE power verbs and which of them mutate VM state.
* Start a turn. Output arrives via the attach stream, NOT this response.
* @param {{session_id: string, prompt: string, model?: string}} opts
* @returns {Promise<{status:'started'|'busy'|'gone'}>}
* started = accepted; busy = 409 (a turn already runs); gone = 404 (re-create).
*/
export async function sendPrompt({ session_id, prompt, model }) {
const payload = { prompt };
if (model) payload.model = model;
const res = await fetch(`/api/session/${encodeURIComponent(session_id)}/prompt`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
body: JSON.stringify(payload),
});
if (res.status === 409) return { status: 'busy' };
if (res.status === 404) return { status: 'gone' };
if (!res.ok) throw new Error(`could not start the turn (HTTP ${res.status})`);
return { status: 'started' };
}
/**
* Cancel the in-flight turn (the Stop button).
* @param {string} sessionId
* @returns {Promise<boolean>} whether a turn was cancelled
*/
export async function cancelTurn(sessionId) {
const res = await fetch(`/api/session/${encodeURIComponent(sessionId)}/cancel`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
});
if (!res.ok) throw new Error(`could not stop the turn (HTTP ${res.status})`);
const body = await res.json().catch(() => ({}));
return Boolean(body.cancelled);
}
/**
* List the PVE power verbs and which mutate VM state.
* @returns {Promise<{verbs: string[], mutating: string[]}>}
*/
export async function fetchVerbs() {
@ -58,27 +150,26 @@ export async function fetchVerbs() {
}
/**
* Run a PVE power verb directly (no AI in the path). The backend returns 200
* on success and 502 when the verb's exit code is non-zero, but the JSON body
* carries {verb, exit_code, stdout, stderr, rejected} in BOTH cases so we
* we read the body regardless of HTTP status and let the caller style on
* exit_code / rejected.
* Run a PVE power verb directly (no AI in the path). The backend returns 200 on
* success and 502 when the verb's exit code is non-zero, but the JSON body
* carries {verb, exit_code, stdout, stderr, rejected} in BOTH cases, so we read
* the body regardless of HTTP status and let the caller style on exit_code.
*
* @param {string} verb
* @returns {Promise<{verb: string, exit_code: number|null, stdout: string, stderr: string, rejected: boolean}>}
* @returns {Promise<{verb:string, exit_code:number|null, stdout:string, stderr:string, rejected:boolean}>}
*/
export async function runVerb(verb) {
const res = await fetch(`/api/pve/${encodeURIComponent(verb)}`, {
method: 'POST',
headers: { 'content-type': 'application/json' },
});
// 400 = unknown verb (FastAPI HTTPException) — has {detail}, not the verb shape.
let body;
try {
body = await res.json();
} catch {
throw new Error(`VM control '${verb}' failed (HTTP ${res.status}, no body)`);
}
// 400 = unknown verb (FastAPI HTTPException) — has {detail}, not the verb shape.
if (res.status === 400) {
throw new Error(body?.detail || `'${verb}' was rejected by the server`);
}


@ -1,150 +0,0 @@
// SSE frame parsing — the load-bearing core of the breakglass UI.
//
// The /api/chat endpoint returns a text/event-stream that we read with
// fetch() + response.body.getReader() (NOT EventSource, which cannot POST).
// The backend emits one frame per event as:
//
// data: {json}\n\n
//
// getReader() hands us bytes at arbitrary boundaries: a single frame can be
// split across reads, and one read can contain several frames. So we keep a
// rolling text buffer, split it on the blank-line frame delimiter, and only
// hand back the JSON payload of *complete* frames. Per the SSE spec a frame may
// carry multiple `data:` lines (joined with "\n"); the backend emits
// single-line JSON today, but we handle the general case so a future multi-line
// payload can't silently corrupt the stream.
/**
* Parse a single SSE event block (the text between blank lines) into its data
* payload string, or null if the block carries no `data:` field (e.g. a bare
* comment or a `:` heartbeat).
* @param {string} block
* @returns {string|null}
*/
export function dataFromEventBlock(block) {
const dataLines = [];
for (const rawLine of block.split('\n')) {
const line = rawLine.replace(/\r$/, '');
if (line.startsWith(':')) continue; // SSE comment / heartbeat
if (line === 'data:' || line === 'data') {
dataLines.push('');
} else if (line.startsWith('data:')) {
// Spec: a single leading space after the colon is stripped.
let v = line.slice('data:'.length);
if (v.startsWith(' ')) v = v.slice(1);
dataLines.push(v);
}
// field lines we don't care about (event:, id:, retry:) are ignored
}
if (dataLines.length === 0) return null;
return dataLines.join('\n');
}
/**
* A stateful splitter that turns an arbitrary sequence of decoded text chunks
* into a sequence of complete SSE event-block strings. Frames are delimited by
* a blank line; we tolerate both "\n\n" and "\r\n\r\n".
*/
export class SSEFrameSplitter {
constructor() {
this.buffer = '';
}
/**
* Feed a decoded text chunk; returns the event blocks that are now complete.
* Any trailing partial frame stays buffered for the next chunk.
* @param {string} chunk
* @returns {string[]} complete event blocks (text between delimiters)
*/
push(chunk) {
this.buffer += chunk;
const blocks = [];
// _nextDelimiter() matches LF, CRLF, and CR blank lines directly, so the
// buffer is never rewritten or normalised here.
let idx;
// Process every complete frame currently in the buffer.
while ((idx = this._nextDelimiter()) !== -1) {
const block = this.buffer.slice(0, idx.start);
this.buffer = this.buffer.slice(idx.end);
if (block.length > 0) blocks.push(block);
}
return blocks;
}
/**
* On stream end, return whatever complete-looking content remains. A
* well-behaved backend always terminates the last frame with a blank line,
* so this is usually empty, but if the connection closed mid-trailing-frame
* with a parseable block, surface it rather than dropping data.
* @returns {string[]}
*/
flush() {
const rest = this.buffer.trim();
this.buffer = '';
return rest ? [rest] : [];
}
_nextDelimiter() {
// Find the earliest of "\n\n", "\r\n\r\n", "\r\r".
const candidates = [
{ token: '\r\n\r\n', i: this.buffer.indexOf('\r\n\r\n') },
{ token: '\n\n', i: this.buffer.indexOf('\n\n') },
{ token: '\r\r', i: this.buffer.indexOf('\r\r') },
].filter((c) => c.i !== -1);
if (candidates.length === 0) return -1;
candidates.sort((a, b) => a.i - b.i);
const { token, i } = candidates[0];
return { start: i, end: i + token.length };
}
}
/**
* Read an SSE Response body to completion, invoking onEvent for every parsed
* JSON event object. Resolves when the stream ends. Throws if the response is
* not ok or has no readable body (caller shows the error inline).
*
* @param {Response} response a fetch() Response with a streaming body
* @param {(event: object) => void} onEvent called per parsed JSON event
*/
export async function readEventStream(response, onEvent) {
if (!response.ok) {
throw new Error(`server returned ${response.status} ${response.statusText}`);
}
if (!response.body) {
throw new Error('response has no readable body (streaming unsupported)');
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
const splitter = new SSEFrameSplitter();
const handleBlock = (block) => {
const payload = dataFromEventBlock(block);
if (payload == null || payload.trim() === '') return;
let obj;
try {
obj = JSON.parse(payload);
} catch {
// A malformed frame must not abort an in-progress recovery stream;
// skip it and keep reading.
return;
}
onEvent(obj);
};
try {
for (;;) {
const { value, done } = await reader.read();
if (done) break;
const text = decoder.decode(value, { stream: true });
for (const block of splitter.push(text)) handleBlock(block);
}
} finally {
reader.releaseLock?.();
}
// Drain any trailing bytes the decoder held, then any final frame.
const tail = decoder.decode();
if (tail) {
for (const block of splitter.push(tail)) handleBlock(block);
}
for (const block of splitter.flush()) handleBlock(block);
}


@ -1,152 +0,0 @@
// Standalone test of the SSE frame parser — no test framework, just node.
// Run: node src/lib/sse.test.mjs (exits non-zero on any failure)
//
// These pin the protocol described in the API contract: frames are
// `data: {json}\n\n`, the event `kind` is one of session/text/tool/result/
// error/done, and bytes arrive at arbitrary boundaries via getReader().
import { SSEFrameSplitter, dataFromEventBlock, readEventStream } from './sse.js';
let failures = 0;
function ok(name, cond) {
if (cond) {
console.log(` ok ${name}`);
} else {
failures++;
console.error(`FAIL ${name}`);
}
}
function eq(name, got, want) {
const g = JSON.stringify(got);
const w = JSON.stringify(want);
ok(`${name} (got ${g})`, g === w);
}
// --- dataFromEventBlock ---------------------------------------------------
eq(
'extracts JSON payload from a data: line',
dataFromEventBlock('data: {"kind":"text","text":"hi"}'),
'{"kind":"text","text":"hi"}'
);
eq(
'strips exactly one space after the colon',
dataFromEventBlock('data:  leading-space-kept'),
' leading-space-kept'
);
eq('ignores comment/heartbeat lines', dataFromEventBlock(': keep-alive'), null);
eq(
'joins multi-line data fields with newline',
dataFromEventBlock('data: line1\ndata: line2'),
'line1\nline2'
);
// --- SSEFrameSplitter: whole frames --------------------------------------
{
const s = new SSEFrameSplitter();
const blocks = s.push('data: {"kind":"session","session_id":"abc"}\n\n');
eq('one complete frame yields one block', blocks, [
'data: {"kind":"session","session_id":"abc"}',
]);
}
// --- SSEFrameSplitter: multiple frames in one chunk ----------------------
{
const s = new SSEFrameSplitter();
const blocks = s.push(
'data: {"kind":"text","text":"a"}\n\ndata: {"kind":"text","text":"b"}\n\n'
);
eq('two frames in one chunk yield two blocks', blocks.length, 2);
eq('first block', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"a"}');
eq('second block', dataFromEventBlock(blocks[1]), '{"kind":"text","text":"b"}');
}
// --- SSEFrameSplitter: frame split across chunks -------------------------
{
const s = new SSEFrameSplitter();
let blocks = s.push('data: {"kind":"te');
eq('partial frame yields nothing yet', blocks, []);
blocks = s.push('xt","text":"split"}\n\n');
eq('completing the frame yields it whole', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"split"}');
}
// --- SSEFrameSplitter: delimiter split across chunks ---------------------
{
const s = new SSEFrameSplitter();
let blocks = s.push('data: {"kind":"done"}\n');
eq('frame held while delimiter incomplete', blocks, []);
blocks = s.push('\n');
eq('frame released once blank line completes', dataFromEventBlock(blocks[0]), '{"kind":"done"}');
}
// --- SSEFrameSplitter: CRLF delimiters -----------------------------------
{
const s = new SSEFrameSplitter();
const blocks = s.push('data: {"kind":"text","text":"crlf"}\r\n\r\n');
eq('CRLF-delimited frame parses', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"crlf"}');
}
// --- end-to-end via readEventStream over a mock streaming Response --------
function mockResponse(chunks) {
const enc = new TextEncoder();
let i = 0;
return {
ok: true,
status: 200,
body: {
getReader() {
return {
read() {
if (i < chunks.length) {
return Promise.resolve({ value: enc.encode(chunks[i++]), done: false });
}
return Promise.resolve({ value: undefined, done: true });
},
releaseLock() {},
};
},
},
};
}
await (async () => {
// A realistic turn, deliberately chopped at ugly boundaries:
// - the session frame split mid-JSON
// - two text frames glued together
// - a tool frame
// - a result frame and the terminal done frame in one chunk
const chunks = [
'data: {"kind":"sess',
'ion","session_id":"S1"}\n\n',
'data: {"kind":"text","text":"checking "}\n\ndata: {"kind":"text","text":"disk"}\n\n',
'data: {"kind":"tool","name":"Bash","input":{"command":"df -h"}}\n\n',
'data: {"kind":"result","is_error":false,"result":"ok","duration_ms":12}\n\ndata: {"kind":"done"}\n\n',
];
const events = [];
await readEventStream(mockResponse(chunks), (e) => events.push(e));
eq('event count', events.length, 6);
eq('1: session id', events[0], { kind: 'session', session_id: 'S1' });
eq('2: first text', events[1], { kind: 'text', text: 'checking ' });
eq('3: second text', events[2], { kind: 'text', text: 'disk' });
eq('4: tool kind+name', { kind: events[3].kind, name: events[3].name }, { kind: 'tool', name: 'Bash' });
eq('4: tool command', events[3].input.command, 'df -h');
eq('5: result', events[4], { kind: 'result', is_error: false, result: 'ok', duration_ms: 12 });
eq('6: done terminal', events[5], { kind: 'done' });
})();
// malformed frame in the middle must be skipped, not abort the stream
await (async () => {
const chunks = [
'data: {"kind":"text","text":"before"}\n\n',
'data: {this is not json}\n\n',
'data: {"kind":"done"}\n\n',
];
const events = [];
await readEventStream(mockResponse(chunks), (e) => events.push(e));
eq('malformed frame skipped, stream continues', events.map((e) => e.kind), ['text', 'done']);
})();
if (failures) {
console.error(`\n${failures} assertion(s) FAILED`);
process.exit(1);
}
console.log('\nall SSE parser assertions passed');


@ -0,0 +1,196 @@
// transcript.js — the load-bearing core of the breakglass UI.
//
// The attach stream (EventSource) replays the conversation-so-far and then
// tails live. Replayed events are byte-identical to live ones, and on a
// reconnect the server re-replays from Last-Event-ID — so the SAME event id can
// arrive more than once. This module folds a flat, possibly-duplicated event
// sequence into an ordered list of render-ready messages, idempotently.
//
// Contract (every default `message` event's .data is one of these JSON shapes):
// {kind:"user", text, id} → opens a USER bubble
// {kind:"session", session_id, id} → informational (agent's session id)
// {kind:"text", text, id} → assistant prose; concatenated
// {kind:"tool", name, input, id} → inline tool chip (Bash → command)
// {kind:"result", is_error, result, duration_ms, id} → closes the bubble
// {kind:"error", error, id} → error note on the bubble
// {kind:"cancelled", id} → muted "stopped" note
// {kind:"turn_end", id} → the turn finished
//
// Grouping: a `user` event opens a user message; the session/text/tool events
// that follow build ONE assistant message; result/error/cancelled annotate it;
// turn_end ends it. Assistant events with no preceding user (e.g. a session
// banner on a fresh attach) still get an assistant message so nothing is lost.
//
// Idempotency: every event carries a monotonic integer-ish id. We track the
// max id folded so far and DROP any event whose id we've already passed — a
// reconnect replay therefore never double-renders. Ids are compared
// numerically when both parse as numbers, else as strings (defensive).
/** @typedef {{type:'text',text:string}|{type:'tool',name:string,command:string,raw:any}} Part */
/**
* @typedef {Object} Message
* @property {'user'|'assistant'} role
* @property {string} key stable key for keyed {#each}
* @property {string} [text] user text
* @property {Part[]} [parts] assistant parts, in emit order
* @property {{is_error:boolean,text:string,duration_ms:number|null}} [result]
* @property {string} [error]
* @property {boolean} [cancelled]
* @property {boolean} [ended] turn_end seen for this message
*/
/** Compare two ids; numeric when both look numeric, else lexicographic. */
export function idGreater(a, b) {
const na = Number(a);
const nb = Number(b);
if (Number.isFinite(na) && Number.isFinite(nb) && `${a}`.trim() !== '' && `${b}`.trim() !== '') {
return na > nb;
}
return String(a) > String(b);
}
/**
* Create an empty transcript-folding state.
* @returns {{messages: Message[], maxId: any, sawId: boolean, openAssistant: Message|null, activeUserSeen: boolean}}
*/
export function createTranscript() {
return {
messages: [],
maxId: null,
sawId: false,
openAssistant: null,
// a turn is "active" once a user event (or local prompt) has no following
// turn_end; the UI reads `active` from reduceEvent's return.
activeUserSeen: false,
};
}
function bubbleKey(prefix, id, fallbackIndex) {
if (id != null && `${id}`.trim() !== '') return `${prefix}:${id}`;
return `${prefix}:idx:${fallbackIndex}`;
}
/**
* Should this event be applied, given the max id folded so far? Updates and
* returns the new max. Events WITHOUT an id are always applied (and don't move
* the watermark): the protocol always carries ids, but we never drop data on a
* malformed frame.
* @returns {{apply:boolean, maxId:any}}
*/
export function admit(maxId, id) {
if (id == null || `${id}`.trim() === '') return { apply: true, maxId };
if (maxId == null) return { apply: true, maxId: id };
if (idGreater(id, maxId)) return { apply: true, maxId: id };
return { apply: false, maxId }; // already seen — dedupe
}
/**
* Fold one event into the transcript state, mutating `state` in place.
* Returns true if the state changed (so callers can trigger a re-render).
*
* @param {ReturnType<typeof createTranscript>} state
* @param {any} ev parsed event object ({kind, id, ...})
* @returns {boolean} changed
*/
export function reduceEvent(state, ev) {
if (!ev || typeof ev !== 'object') return false;
const { apply, maxId } = admit(state.maxId, ev.id);
state.maxId = maxId;
if (!apply) return false;
if (ev.id != null && `${ev.id}`.trim() !== '') state.sawId = true;
const ensureAssistant = () => {
if (!state.openAssistant) {
const msg = {
role: 'assistant',
key: bubbleKey('a', ev.id, state.messages.length),
parts: [],
ended: false,
};
state.messages.push(msg);
state.openAssistant = msg;
}
return state.openAssistant;
};
switch (ev.kind) {
case 'user': {
// A new user turn. Close any dangling assistant bubble first.
state.openAssistant = null;
state.messages.push({
role: 'user',
key: bubbleKey('u', ev.id, state.messages.length),
text: typeof ev.text === 'string' ? ev.text : '',
});
state.activeUserSeen = true;
return true;
}
case 'session': {
// Informational — does not itself render a part, but it does open the
// assistant bubble for the turn so subsequent text lands in one place.
ensureAssistant();
return true;
}
case 'text': {
if (typeof ev.text !== 'string' || ev.text === '') return false;
const msg = ensureAssistant();
const tail = msg.parts[msg.parts.length - 1];
if (tail && tail.type === 'text') {
tail.text += ev.text; // concatenate consecutive prose
} else {
msg.parts.push({ type: 'text', text: ev.text });
}
return true;
}
case 'tool': {
const msg = ensureAssistant();
const command =
ev.input && typeof ev.input.command === 'string' ? ev.input.command : '';
msg.parts.push({
type: 'tool',
name: typeof ev.name === 'string' && ev.name ? ev.name : 'tool',
command,
raw: ev.input ?? null,
});
return true;
}
case 'result': {
const msg = ensureAssistant();
msg.result = {
is_error: Boolean(ev.is_error),
text: typeof ev.result === 'string' ? ev.result : '',
duration_ms: typeof ev.duration_ms === 'number' ? ev.duration_ms : null,
};
return true;
}
case 'error': {
const msg = ensureAssistant();
msg.error = typeof ev.error === 'string' && ev.error ? ev.error : 'unknown error';
return true;
}
case 'cancelled': {
const msg = ensureAssistant();
msg.cancelled = true;
return true;
}
case 'turn_end': {
if (state.openAssistant) state.openAssistant.ended = true;
state.openAssistant = null;
state.activeUserSeen = false;
return true;
}
default:
return false;
}
}
/**
* Convenience: fold an array of events into a fresh transcript (used by tests
* and by a from-scratch render). Returns the final state.
* @param {any[]} events
*/
export function foldAll(events) {
const state = createTranscript();
for (const ev of events) reduceEvent(state, ev);
return state;
}


@ -0,0 +1,162 @@
// Standalone test of the transcript folder — no test framework, just node.
// Run: node src/lib/transcript.test.mjs (exits non-zero on any failure)
//
// These pin the attach-model contract: events carry monotonic ids, a reconnect
// re-replays already-seen ids (which MUST be deduped), and events group into
// user/assistant messages with consecutive prose concatenated.
import {
admit,
idGreater,
reduceEvent,
createTranscript,
foldAll,
} from './transcript.js';
let failures = 0;
function ok(name, cond) {
if (cond) {
console.log(` ok ${name}`);
} else {
failures++;
console.error(`FAIL ${name}`);
}
}
function eq(name, got, want) {
const g = JSON.stringify(got);
const w = JSON.stringify(want);
ok(`${name} (got ${g})`, g === w);
}
// --- id comparison --------------------------------------------------------
ok('idGreater numeric', idGreater(10, 9) === true);
ok('idGreater numeric not', idGreater(2, 10) === false); // not string "2" > "10"
ok('idGreater string fallback', idGreater('b', 'a') === true);
// --- admit / dedupe watermark --------------------------------------------
{
let { apply, maxId } = admit(null, 1);
eq('first id admitted', { apply, maxId }, { apply: true, maxId: 1 });
({ apply, maxId } = admit(5, 5));
ok('equal id rejected (already seen)', apply === false && maxId === 5);
({ apply, maxId } = admit(5, 3));
ok('lower id rejected', apply === false && maxId === 5);
({ apply, maxId } = admit(5, 6));
ok('higher id admitted, watermark moves', apply === true && maxId === 6);
({ apply, maxId } = admit(5, undefined));
ok('id-less event always admitted, watermark held', apply === true && maxId === 5);
}
// --- a full turn groups into user + one assistant bubble ------------------
{
const events = [
{ kind: 'user', text: 'triage it', id: 1 },
{ kind: 'session', session_id: 'S1', id: 2 },
{ kind: 'text', text: 'Checking ', id: 3 },
{ kind: 'text', text: 'disk usage.', id: 4 },
{ kind: 'tool', name: 'Bash', input: { command: 'df -h' }, id: 5 },
{ kind: 'result', is_error: false, result: 'ok', duration_ms: 1200, id: 6 },
{ kind: 'turn_end', id: 7 },
];
const s = foldAll(events);
eq('two messages: user + assistant', s.messages.length, 2);
eq('first is user with text', { r: s.messages[0].role, t: s.messages[0].text }, { r: 'user', t: 'triage it' });
const a = s.messages[1];
eq('assistant role', a.role, 'assistant');
// consecutive text concatenated into ONE part; tool is a separate part
eq('parts: one concatenated text + one tool', a.parts.map((p) => p.type), ['text', 'tool']);
eq('prose concatenated in order', a.parts[0].text, 'Checking disk usage.');
eq('tool command captured', a.parts[1].command, 'df -h');
eq('result attached', { e: a.result.is_error, ms: a.result.duration_ms }, { e: false, ms: 1200 });
ok('turn ended', a.ended === true);
ok('no longer active after turn_end', s.activeUserSeen === false);
}
// --- reconnect replay: re-feeding the SAME events must NOT double-render --
{
const events = [
{ kind: 'user', text: 'hi', id: 1 },
{ kind: 'text', text: 'hello', id: 2 },
{ kind: 'turn_end', id: 3 },
];
const s = createTranscript();
for (const e of events) reduceEvent(s, e);
// simulate an EventSource reconnect that re-replays everything from the top
for (const e of events) reduceEvent(s, e);
eq('still exactly two messages after replay', s.messages.length, 2);
eq('assistant prose not doubled', s.messages[1].parts[0].text, 'hello');
}
// --- a partial replay (Last-Event-ID resume) continues the same bubble ----
{
const s = createTranscript();
reduceEvent(s, { kind: 'user', text: 'go', id: 1 });
reduceEvent(s, { kind: 'text', text: 'part-A ', id: 2 });
// reconnect: server resumes after id 2; we must drop id<=2 if re-sent and
// keep appending to the open assistant bubble.
reduceEvent(s, { kind: 'text', text: 'part-A ', id: 2 }); // dup, dropped
reduceEvent(s, { kind: 'text', text: 'part-B', id: 3 }); // new, appended
reduceEvent(s, { kind: 'turn_end', id: 4 });
eq('resume appended to same bubble', s.messages[1].parts[0].text, 'part-A part-B');
eq('still two messages', s.messages.length, 2);
}
// --- error / cancelled annotate the open bubble ---------------------------
{
const s = foldAll([
{ kind: 'user', text: 'x', id: 1 },
{ kind: 'text', text: 'working', id: 2 },
{ kind: 'error', error: 'ssh timeout', id: 3 },
{ kind: 'turn_end', id: 4 },
]);
eq('error note on assistant bubble', s.messages[1].error, 'ssh timeout');
}
{
const s = foldAll([
{ kind: 'user', text: 'x', id: 1 },
{ kind: 'cancelled', id: 2 },
{ kind: 'turn_end', id: 3 },
]);
ok('cancelled flag on assistant bubble', s.messages[1].cancelled === true);
}
// --- active state: a user event with no turn_end means a turn is running ---
{
const s = createTranscript();
reduceEvent(s, { kind: 'user', text: 'go', id: 1 });
reduceEvent(s, { kind: 'text', text: '...', id: 2 });
ok('active while no turn_end', s.activeUserSeen === true);
reduceEvent(s, { kind: 'turn_end', id: 3 });
ok('inactive after turn_end', s.activeUserSeen === false);
}
// --- assistant-only stream (session banner on a fresh attach) still renders -
{
const s = foldAll([
{ kind: 'session', session_id: 'S1', id: 1 },
{ kind: 'text', text: 'standing by', id: 2 },
{ kind: 'turn_end', id: 3 },
]);
eq('lone assistant message created', s.messages.length, 1);
eq('assistant prose present', s.messages[0].parts[0].text, 'standing by');
}
// --- two sequential turns produce two assistant bubbles -------------------
{
const s = foldAll([
{ kind: 'user', text: 'q1', id: 1 },
{ kind: 'text', text: 'a1', id: 2 },
{ kind: 'turn_end', id: 3 },
{ kind: 'user', text: 'q2', id: 4 },
{ kind: 'text', text: 'a2', id: 5 },
{ kind: 'turn_end', id: 6 },
]);
eq('four messages (u,a,u,a)', s.messages.map((m) => m.role), ['user', 'assistant', 'user', 'assistant']);
eq('second answer in its own bubble', s.messages[3].parts[0].text, 'a2');
ok('message keys are unique', new Set(s.messages.map((m) => m.key)).size === 4);
}
if (failures) {
console.error(`\n${failures} assertion(s) FAILED`);
process.exit(1);
}
console.log('\nall transcript assertions passed');


@ -43,3 +43,186 @@ def drain():
break
await asyncio.sleep(0.01)
return _drain
# --------------------------------------------------------------------------- #
# AFK loop fixtures.
#
# Shared factories + in-memory fakes for the app.afk modules. EVERYTHING the AFK
# tests touch is faked here — no test ever reaches a real T3 server, GitHub /
# Forgejo, or the cluster. The fakes implement the module interfaces from the
# contract and record their calls so tests can assert on them.
# --------------------------------------------------------------------------- #
from app.afk.types import ( # noqa: E402 (after the env setup above, like app_main)
CIStatus,
Config,
Issue,
RunState,
ThreadStatus,
)
@pytest.fixture
def make_issue():
"""Factory for ``Issue``. Defaults to a clean, dispatchable issue (trusted
label, nothing blocking); override any field per test."""
def _make(
number: int = 1,
repo: str = "infra",
labels: list[str] | None = None,
blocked_by: list[int] | None = None,
labeled_by_trusted: bool = True,
priority: int = 0,
) -> Issue:
return Issue(
number=number,
repo=repo,
labels=["ready-for-agent"] if labels is None else labels,
blocked_by=[] if blocked_by is None else blocked_by,
labeled_by_trusted=labeled_by_trusted,
priority=priority,
)
return _make
@pytest.fixture
def make_config():
"""Factory for ``Config``. Defaults to an ENABLED config (kill switch off,
a one-repo allowlist) so policy/state-machine tests exercise real behaviour;
the disabled production default is covered separately in the config tests."""
def _make(
allowlist: list[str] | None = None,
kill_switch: bool = False,
**overrides,
) -> Config:
return Config(
allowlist=["infra"] if allowlist is None else allowlist,
kill_switch=kill_switch,
**overrides,
)
return _make
@pytest.fixture
def make_run_state():
"""Factory for ``RunState``. Defaults to a freshly-dispatched run (thread
running, nothing pushed, no CI, no fix-forward attempts yet)."""
def _make(
thread_status: ThreadStatus | None = ThreadStatus.RUNNING,
ci_status: CIStatus | None = None,
pushed: bool = False,
fix_forward_attempts: int = 0,
elapsed_seconds: float = 0.0,
) -> RunState:
return RunState(
thread_status=thread_status,
ci_status=ci_status,
pushed=pushed,
fix_forward_attempts=fix_forward_attempts,
elapsed_seconds=elapsed_seconds,
)
return _make
class FakeT3Client:
"""In-memory stand-in for ``t3_client.T3Client``. Records each dispatch and
hands back a deterministic thread id; ``snapshot`` returns whatever was
staged via ``set_snapshot``."""
def __init__(self) -> None:
self.dispatched: list[dict] = []
self._snapshot: dict = {"threads": []}
self._next_id = 0
def dispatch(self, repo: str, issue: int, prompt: str) -> str:
thread_id = f"thread-{self._next_id}"
self._next_id += 1
self.dispatched.append(
{"repo": repo, "issue": issue, "prompt": prompt, "thread_id": thread_id}
)
return thread_id
def snapshot(self) -> dict:
return self._snapshot
def set_snapshot(self, snapshot: dict) -> None:
self._snapshot = snapshot
class FakeTracker:
"""In-memory stand-in for ``tracker.Tracker``. ``list_ready`` returns issues
staged via ``seed``; label/comment/close just record their calls."""
def __init__(self) -> None:
self._ready: dict[str, list[Issue]] = {}
self.label_ops: list[tuple[str, str, int, str]] = [] # (op, repo, issue, label)
self.comments: list[tuple[str, int, str]] = []
self.closed: list[tuple[str, int]] = []
def seed(self, repo: str, issues: list[Issue]) -> None:
self._ready[repo] = issues
def list_ready(self, repos: list[str]) -> list[Issue]:
out: list[Issue] = []
for repo in repos:
out.extend(self._ready.get(repo, []))
return out
def add_label(self, repo: str, issue: int, label: str) -> None:
self.label_ops.append(("add", repo, issue, label))
def remove_label(self, repo: str, issue: int, label: str) -> None:
self.label_ops.append(("remove", repo, issue, label))
def comment(self, repo: str, issue: int, body: str) -> None:
self.comments.append((repo, issue, body))
def close(self, repo: str, issue: int) -> None:
self.closed.append((repo, issue))
class FakeCIWatcher:
"""In-memory stand-in for ``ci_watcher.CIWatcher``. Returns the status staged
per ``(repo, commit)`` via ``set_status``; unknown commits read PENDING."""
def __init__(self) -> None:
self._statuses: dict[tuple[str, str], CIStatus] = {}
def set_status(self, repo: str, commit: str, status: CIStatus) -> None:
self._statuses[(repo, commit)] = status
def status(self, repo: str, commit: str) -> CIStatus:
return self._statuses.get((repo, commit), CIStatus.PENDING)
class FakeNotifier:
"""In-memory stand-in for ``notifier.Notifier``. Records every notification
so tests can assert escalations fired with the right kind/detail."""
def __init__(self) -> None:
self.sent: list[dict] = []
def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None:
self.sent.append(
{"kind": kind, "issue": issue, "thread_id": thread_id, "detail": detail}
)
@pytest.fixture
def fake_t3() -> FakeT3Client:
return FakeT3Client()
@pytest.fixture
def fake_tracker() -> FakeTracker:
return FakeTracker()
@pytest.fixture
def fake_ci() -> FakeCIWatcher:
return FakeCIWatcher()
@pytest.fixture
def fake_notifier() -> FakeNotifier:
return FakeNotifier()


@ -0,0 +1,285 @@
"""Tests for ``app.afk.ci_watcher`` — the commit → ``CIStatus`` adapter.
The watcher folds two independent signals into one verdict the state machine
reads: the **GHA run** for a pushed commit (build/test/lint) and the
**deploy/rollout** that reaches the cluster (Woodpecker pipeline → Keel/k8s
rollout). The CI/CD chain is GHA → ghcr → Woodpecker → Keel
(``docs/2026-06-14-afk-implementation-pipeline-design.md``), so a commit is only
truly GREEN once *both* the build passed AND its image actually rolled out.
Every test injects FAKE clients: no test ever shells out to ``gh``,
``woodpecker``, or ``kubectl``, or reaches the network. The fakes implement the
``ci_watcher`` client Protocols and return staged ``StageResult`` values per
``(repo, commit)``; the watcher's only job is to query them and fold the result,
so the folding table is what these tests pin.
"""
import pytest
from app.afk.ci_watcher import (
CIWatcher,
StageResult,
)
from app.afk.types import CIStatus
# --------------------------------------------------------------------------- #
# Fakes for the three injected clients.
#
# Each maps (repo, commit) → StageResult and records every query, so tests can
# assert both the folded verdict AND that short-circuiting skips later stages
# (a RED build must not even ask the rollout client).
# --------------------------------------------------------------------------- #
class _FakeStageClient:
"""A recording stand-in for any of the three stage clients. ``default`` is
returned for an unstaged ``(repo, commit)``; it defaults to ``PENDING`` so an
un-seeded stage reads "not done yet", never a false GREEN."""
def __init__(self, default: StageResult = StageResult.PENDING) -> None:
self._results: dict[tuple[str, str], StageResult] = {}
self._default = default
self.queries: list[tuple[str, str]] = []
def set(self, repo: str, commit: str, result: StageResult) -> None:
self._results[(repo, commit)] = result
def _lookup(self, repo: str, commit: str) -> StageResult:
self.queries.append((repo, commit))
return self._results.get((repo, commit), self._default)
class FakeGitHubChecks(_FakeStageClient):
def run_conclusion(self, repo: str, commit: str) -> StageResult:
return self._lookup(repo, commit)
class FakeWoodpecker(_FakeStageClient):
def deploy_conclusion(self, repo: str, commit: str) -> StageResult:
return self._lookup(repo, commit)
class FakeRollout(_FakeStageClient):
def rollout_status(self, repo: str, commit: str) -> StageResult:
return self._lookup(repo, commit)
# --------------------------------------------------------------------------- #
# Fixtures.
# --------------------------------------------------------------------------- #
REPO = "infra"
COMMIT = "deadbeefcafe"
@pytest.fixture
def gha() -> FakeGitHubChecks:
return FakeGitHubChecks()
@pytest.fixture
def woodpecker() -> FakeWoodpecker:
return FakeWoodpecker()
@pytest.fixture
def rollout() -> FakeRollout:
return FakeRollout()
@pytest.fixture
def watcher(gha, woodpecker, rollout) -> CIWatcher:
return CIWatcher(github=gha, woodpecker=woodpecker, rollout=rollout)
def _stage_all(gha, woodpecker, rollout, *, build, deploy, roll) -> None:
"""Stage all three clients for the canonical ``(REPO, COMMIT)`` at once."""
gha.set(REPO, COMMIT, build)
woodpecker.set(REPO, COMMIT, deploy)
rollout.set(REPO, COMMIT, roll)
# --------------------------------------------------------------------------- #
# StageResult vocabulary.
# --------------------------------------------------------------------------- #
def test_stageresult_has_the_four_outcomes():
assert {s.name for s in StageResult} == {"NONE", "PENDING", "SUCCESS", "FAILURE"}
# --------------------------------------------------------------------------- #
# The happy path: every stage green ⇒ GREEN.
# --------------------------------------------------------------------------- #
def test_all_stages_success_is_green(watcher, gha, woodpecker, rollout):
_stage_all(gha, woodpecker, rollout,
build=StageResult.SUCCESS,
deploy=StageResult.SUCCESS,
roll=StageResult.SUCCESS)
assert watcher.status(REPO, COMMIT) is CIStatus.GREEN
# --------------------------------------------------------------------------- #
# GHA build stage gates everything below it.
# --------------------------------------------------------------------------- #
def test_build_failure_is_red(watcher, gha):
gha.set(REPO, COMMIT, StageResult.FAILURE)
assert watcher.status(REPO, COMMIT) is CIStatus.RED
@pytest.mark.parametrize("build", [StageResult.NONE, StageResult.PENDING])
def test_build_not_yet_concluded_is_pending(watcher, gha, build):
# No run yet (NONE) and in-progress (PENDING) both read PENDING — the state
# machine waits on either.
gha.set(REPO, COMMIT, build)
assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
def test_build_failure_short_circuits_before_deploy_and_rollout(
watcher, gha, woodpecker, rollout
):
gha.set(REPO, COMMIT, StageResult.FAILURE)
# Even if later stages would (nonsensically) be green, a red build wins...
woodpecker.set(REPO, COMMIT, StageResult.SUCCESS)
rollout.set(REPO, COMMIT, StageResult.SUCCESS)
assert watcher.status(REPO, COMMIT) is CIStatus.RED
# ...and the later clients are never even queried.
assert woodpecker.queries == []
assert rollout.queries == []
def test_build_pending_short_circuits_before_deploy_and_rollout(
watcher, gha, woodpecker, rollout
):
gha.set(REPO, COMMIT, StageResult.PENDING)
assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
assert woodpecker.queries == []
assert rollout.queries == []
# --------------------------------------------------------------------------- #
# Deploy (Woodpecker) stage — only consulted once the build is green.
# --------------------------------------------------------------------------- #
def test_deploy_failure_is_red_even_with_green_build(watcher, gha, woodpecker):
gha.set(REPO, COMMIT, StageResult.SUCCESS)
woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
assert watcher.status(REPO, COMMIT) is CIStatus.RED
@pytest.mark.parametrize("deploy", [StageResult.NONE, StageResult.PENDING])
def test_deploy_not_yet_concluded_is_pending(watcher, gha, woodpecker, deploy):
gha.set(REPO, COMMIT, StageResult.SUCCESS)
woodpecker.set(REPO, COMMIT, deploy)
assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
def test_deploy_failure_short_circuits_before_rollout(
watcher, gha, woodpecker, rollout
):
gha.set(REPO, COMMIT, StageResult.SUCCESS)
woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
rollout.set(REPO, COMMIT, StageResult.SUCCESS)
assert watcher.status(REPO, COMMIT) is CIStatus.RED
assert rollout.queries == []
# The build WAS consulted (it had to pass to reach deploy).
assert gha.queries == [(REPO, COMMIT)]
# --------------------------------------------------------------------------- #
# Rollout stage — the final gate. Green build + green deploy is still only
# PENDING until the image actually reaches the cluster.
# --------------------------------------------------------------------------- #
def test_rollout_failure_is_red(watcher, gha, woodpecker, rollout):
_stage_all(gha, woodpecker, rollout,
build=StageResult.SUCCESS,
deploy=StageResult.SUCCESS,
roll=StageResult.FAILURE)
assert watcher.status(REPO, COMMIT) is CIStatus.RED
@pytest.mark.parametrize("roll", [StageResult.NONE, StageResult.PENDING])
def test_green_build_and_deploy_but_unfinished_rollout_is_pending(
watcher, gha, woodpecker, rollout, roll
):
_stage_all(gha, woodpecker, rollout,
build=StageResult.SUCCESS,
deploy=StageResult.SUCCESS,
roll=roll)
assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
def test_green_requires_all_three_stages_consulted(
watcher, gha, woodpecker, rollout
):
_stage_all(gha, woodpecker, rollout,
build=StageResult.SUCCESS,
deploy=StageResult.SUCCESS,
roll=StageResult.SUCCESS)
assert watcher.status(REPO, COMMIT) is CIStatus.GREEN
assert gha.queries == [(REPO, COMMIT)]
assert woodpecker.queries == [(REPO, COMMIT)]
assert rollout.queries == [(REPO, COMMIT)]
# --------------------------------------------------------------------------- #
# Plumbing: the commit and repo are passed through verbatim to every client,
# and an entirely un-seeded commit reads PENDING (not GREEN, not RED).
# --------------------------------------------------------------------------- #
def test_repo_and_commit_passed_through_to_clients(watcher, gha):
gha.set("realestate-crawler", "abc123", StageResult.FAILURE)
assert watcher.status("realestate-crawler", "abc123") is CIStatus.RED
assert gha.queries == [("realestate-crawler", "abc123")]
def test_unknown_commit_defaults_to_pending(watcher):
# Nothing staged anywhere ⇒ the build stage reads PENDING by default ⇒ the
# whole verdict is PENDING. A never-pushed/just-pushed commit is never a
# false GREEN.
assert watcher.status(REPO, "never-seen") is CIStatus.PENDING
# --------------------------------------------------------------------------- #
# The default rollout client is OPTIONAL — per the pilot facts, state.sqlite /
# kubectl reads are optional, so a CIWatcher built without a rollout client must
# still work, treating "build green + deploy green" as the terminal GREEN.
# --------------------------------------------------------------------------- #
def test_rollout_client_is_optional_deploy_green_is_green(gha, woodpecker):
w = CIWatcher(github=gha, woodpecker=woodpecker) # no rollout client
gha.set(REPO, COMMIT, StageResult.SUCCESS)
woodpecker.set(REPO, COMMIT, StageResult.SUCCESS)
assert w.status(REPO, COMMIT) is CIStatus.GREEN
def test_rollout_client_optional_still_honours_build_and_deploy_failures(
gha, woodpecker
):
w = CIWatcher(github=gha, woodpecker=woodpecker)
gha.set(REPO, COMMIT, StageResult.SUCCESS)
woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
assert w.status(REPO, COMMIT) is CIStatus.RED
# --------------------------------------------------------------------------- #
# Full folding table — exhaustive over (build, deploy, rollout) so the
# precedence rules (stages are read in order and the first non-SUCCESS stage
# decides: FAILURE ⇒ red, PENDING/NONE ⇒ pending; all-success ⇒ green) can never
# silently drift.
# --------------------------------------------------------------------------- #
_N, _P, _S, _F = (
StageResult.NONE,
StageResult.PENDING,
StageResult.SUCCESS,
StageResult.FAILURE,
)
def _expected(build: StageResult, deploy: StageResult, roll: StageResult) -> CIStatus:
# Reference fold, independent of the implementation, evaluated stage by stage.
for stage in (build, deploy, roll):
if stage is _F:
return CIStatus.RED
if stage in (_N, _P):
return CIStatus.PENDING
return CIStatus.GREEN
@pytest.mark.parametrize("build", [_N, _P, _S, _F])
@pytest.mark.parametrize("deploy", [_N, _P, _S, _F])
@pytest.mark.parametrize("roll", [_N, _P, _S, _F])
def test_full_folding_table(watcher, gha, woodpecker, rollout, build, deploy, roll):
_stage_all(gha, woodpecker, rollout, build=build, deploy=deploy, roll=roll)
assert watcher.status(REPO, COMMIT) is _expected(build, deploy, roll)
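
The stage-by-stage fold that the tests above pin can be sketched as follows. This is a minimal illustration, not the `CIWatcher` implementation: `fold_stages` and `stage` are hypothetical names, and the two enums are local stand-ins for `StageResult` and `CIStatus` carrying only the members the tests mention. Each stage is passed as a zero-argument callable so that later stages are never queried once an earlier one fails or is still pending.

```python
from enum import Enum, auto
from typing import Callable, Iterable


class StageResult(Enum):  # stand-in for app.afk.ci_watcher.StageResult
    NONE = auto()
    PENDING = auto()
    SUCCESS = auto()
    FAILURE = auto()


class CIStatus(Enum):  # stand-in; only the members the tests use
    PENDING = auto()
    GREEN = auto()
    RED = auto()


def fold_stages(stages: Iterable[Callable[[], StageResult]]) -> CIStatus:
    """Hypothetical helper: query stages in order, stop at the first non-SUCCESS."""
    for query in stages:
        result = query()
        if result is StageResult.FAILURE:
            return CIStatus.RED
        if result is not StageResult.SUCCESS:  # NONE or PENDING
            return CIStatus.PENDING
    return CIStatus.GREEN


# Demo: record which stages were actually queried.
queried: list[str] = []


def stage(name: str, result: StageResult) -> Callable[[], StageResult]:
    def query() -> StageResult:
        queried.append(name)
        return result
    return query


verdict = fold_stages([
    stage("build", StageResult.SUCCESS),
    stage("deploy", StageResult.FAILURE),
    stage("rollout", StageResult.SUCCESS),
])
print(verdict.name, queried)  # RED ['build', 'deploy']
```

Under this sketch an optional rollout client would simply mean passing two stages instead of three, which is one way to get the "build green + deploy green is GREEN" behaviour the optional-client tests describe.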


@ -0,0 +1,374 @@
"""Tests for ``app.afk.dispatch_policy.select_dispatchable`` — the pure gate that
turns a pile of ready issues into the ordered set the loop may dispatch *now*.
The function is PURE (no IO), so every test here is a plain in-memory call over
the fakes/factories in ``conftest`` (``make_issue`` / ``make_config``); nothing
touches a real T3 server, tracker, or cluster. The suite walks the full
dispatchability matrix (trust gate, allowlist, per-repo lock, blocked_by,
kill switch) plus the priority ordering and the one-agent-per-repo invariant.
Ordering contract under test: **lower ``priority`` value first** (P0 before P1
before P2; most urgent wins), matching tracker conventions and
``Issue.priority``'s own docstring, with a deterministic tiebreaker (ascending
issue number) so the output is stable regardless of input order.
"""
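
The gates and ordering contract described in the docstring above can be sketched roughly as follows. This is an illustrative approximation only, not the code in `app.afk.dispatch_policy`: the `Issue` and `Config` dataclasses are simplified stand-ins, the function returns bare issues rather than `DispatchDecision` objects, and treating any non-empty `blocked_by` as blocking is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:  # simplified stand-in for app.afk.types.Issue
    number: int
    repo: str
    labeled_by_trusted: bool = True
    blocked_by: tuple = ()
    priority: int = 0


@dataclass(frozen=True)
class Config:  # simplified stand-in; defaults are inert (kill switch on)
    allowlist: tuple = ()
    kill_switch: bool = True


def select_dispatchable(issues, config, in_flight_repos):
    """Sketch: apply every gate, then keep one issue per repo, most urgent first."""
    if config.kill_switch:  # highest-precedence short-circuit
        return []
    eligible = [
        i for i in issues
        if i.labeled_by_trusted                 # trust gate
        and i.repo in config.allowlist          # allowlist membership
        and i.repo not in in_flight_repos       # per-repo lock
        and not i.blocked_by                    # assumption: any blocker blocks
    ]
    order = lambda i: (i.priority, i.number)    # lower priority value first, then number
    chosen = {}
    for issue in sorted(eligible, key=order):   # sorted() copies; inputs stay untouched
        chosen.setdefault(issue.repo, issue)    # first (most urgent) issue takes the slot
    return sorted(chosen.values(), key=order)


issues = [
    Issue(number=10, repo="infra", priority=1),
    Issue(number=11, repo="infra", priority=5),
    Issue(number=20, repo="realestate-crawler", priority=3),
    Issue(number=21, repo="realestate-crawler", priority=2),
]
config = Config(allowlist=("infra", "realestate-crawler"), kill_switch=False)
picked = select_dispatchable(issues, config, set())
print([i.number for i in picked])  # [10, 21]
```

Because ineligible issues are filtered out before the per-repo slot is assigned, a blocked or untrusted issue cannot occupy a repo's slot ahead of an eligible one in this sketch.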
import itertools
import pytest
from app.afk import dispatch_policy
from app.afk.types import DispatchDecision, Issue
# --------------------------------------------------------------------------- #
# Helpers — keep assertions terse and intent-revealing.
# --------------------------------------------------------------------------- #
def _selected_numbers(decisions: list[DispatchDecision]) -> list[int]:
"""The issue numbers, in the order the policy returned them."""
return [d.issue.number for d in decisions]
def _selected_set(decisions: list[DispatchDecision]) -> set[int]:
return {d.issue.number for d in decisions}
# --------------------------------------------------------------------------- #
# Return shape & purity.
# --------------------------------------------------------------------------- #
def test_returns_list_of_dispatch_decisions(make_issue, make_config):
issue = make_issue(number=7, repo="infra")
decisions = dispatch_policy.select_dispatchable([issue], make_config(), set())
assert isinstance(decisions, list)
assert len(decisions) == 1
assert isinstance(decisions[0], DispatchDecision)
assert decisions[0].issue is issue
assert isinstance(decisions[0].reason, str) and decisions[0].reason # non-empty
def test_empty_input_yields_empty_output(make_config):
assert dispatch_policy.select_dispatchable([], make_config(), set()) == []
def test_does_not_mutate_inputs(make_issue, make_config):
issues = [make_issue(number=1, priority=0), make_issue(number=2, priority=9)]
issues_snapshot = list(issues)
config = make_config(allowlist=["infra"])
in_flight: set[str] = set()
dispatch_policy.select_dispatchable(issues, config, in_flight)
# Caller's list (and its order) and the lock set are left untouched.
assert issues == issues_snapshot
assert [i.number for i in issues] == [1, 2]
assert in_flight == set()
assert config.allowlist == ["infra"]
def test_decision_wraps_the_same_issue_object(make_issue, make_config):
issue = make_issue(number=42)
[decision] = dispatch_policy.select_dispatchable([issue], make_config(), set())
assert decision.issue is issue # identity, not a copy
# --------------------------------------------------------------------------- #
# Kill switch — highest-precedence short-circuit.
# --------------------------------------------------------------------------- #
def test_kill_switch_returns_empty_even_with_perfect_issues(make_issue, make_config):
issues = [make_issue(number=n, repo="infra") for n in range(1, 6)]
config = make_config(allowlist=["infra"], kill_switch=True)
assert dispatch_policy.select_dispatchable(issues, config, set()) == []
def test_kill_switch_off_dispatches(make_issue, make_config):
issue = make_issue(repo="infra")
config = make_config(allowlist=["infra"], kill_switch=False)
assert len(dispatch_policy.select_dispatchable([issue], config, set())) == 1
def test_production_default_config_dispatches_nothing(make_issue):
"""The shipped default (kill switch ON, empty allowlist) is inert: even a
pristine, trusted issue is never selected."""
from app.afk import config as afk_config
issue = make_issue(repo="infra")
assert dispatch_policy.select_dispatchable([issue], afk_config.default(), set()) == []
# --------------------------------------------------------------------------- #
# Trust gate.
# --------------------------------------------------------------------------- #
def test_untrusted_issue_is_skipped(make_issue, make_config):
issue = make_issue(repo="infra", labeled_by_trusted=False)
assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []
def test_trusted_issue_is_eligible(make_issue, make_config):
issue = make_issue(repo="infra", labeled_by_trusted=True)
assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1
def test_trust_gate_filters_only_untrusted(make_issue, make_config):
trusted = make_issue(number=1, repo="infra", labeled_by_trusted=True)
untrusted = make_issue(number=2, repo="infra", labeled_by_trusted=False)
decisions = dispatch_policy.select_dispatchable(
[trusted, untrusted], make_config(allowlist=["infra"]), set()
)
assert _selected_set(decisions) == {1}
# --------------------------------------------------------------------------- #
# Allowlist membership.
# --------------------------------------------------------------------------- #
def test_repo_not_in_allowlist_is_skipped(make_issue, make_config):
issue = make_issue(repo="some-other-repo")
assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []
def test_empty_allowlist_dispatches_nothing(make_issue, make_config):
issue = make_issue(repo="infra")
# kill switch off but allowlist empty -> still inert (the two-gate posture).
config = make_config(allowlist=[], kill_switch=False)
assert dispatch_policy.select_dispatchable([issue], config, set()) == []
def test_allowlist_selects_only_listed_repos(make_issue, make_config):
a = make_issue(number=1, repo="infra")
b = make_issue(number=2, repo="realestate-crawler")
c = make_issue(number=3, repo="not-allowed")
decisions = dispatch_policy.select_dispatchable(
[a, b, c], make_config(allowlist=["infra", "realestate-crawler"]), set()
)
assert _selected_set(decisions) == {1, 2}


# --------------------------------------------------------------------------- #
# Per-repo lock (in_flight_repos).
# --------------------------------------------------------------------------- #
def test_repo_already_in_flight_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra")
    decisions = dispatch_policy.select_dispatchable(
        [issue], make_config(allowlist=["infra"]), in_flight_repos={"infra"}
    )
    assert decisions == []


def test_in_flight_lock_is_per_repo(make_issue, make_config):
    locked = make_issue(number=1, repo="infra")
    free = make_issue(number=2, repo="realestate-crawler")
    decisions = dispatch_policy.select_dispatchable(
        [locked, free],
        make_config(allowlist=["infra", "realestate-crawler"]),
        in_flight_repos={"infra"},
    )
    assert _selected_set(decisions) == {2}  # only the unlocked repo's issue runs


def test_all_repos_in_flight_dispatches_nothing(make_issue, make_config):
    a = make_issue(number=1, repo="infra")
    b = make_issue(number=2, repo="realestate-crawler")
    decisions = dispatch_policy.select_dispatchable(
        [a, b],
        make_config(allowlist=["infra", "realestate-crawler"]),
        in_flight_repos={"infra", "realestate-crawler"},
    )
    assert decisions == []


# --------------------------------------------------------------------------- #
# One-agent-per-repo invariant — at most ONE decision per repo per call.
#
# The whole design serialises agents within a repo (two would collide on the
# working tree). A single call must therefore never hand back two issues for the
# same repo, even when both are eligible and the repo is not yet in-flight.
# --------------------------------------------------------------------------- #
def test_at_most_one_decision_per_repo(make_issue, make_config):
    urgent = make_issue(number=1, repo="infra", priority=1)
    minor = make_issue(number=2, repo="infra", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [urgent, minor], make_config(allowlist=["infra"]), set()
    )
    assert len(decisions) == 1
    assert decisions[0].issue.number == 1  # most urgent (lowest value) wins the slot


def test_one_decision_per_repo_across_many_repos(make_issue, make_config):
    issues = [
        make_issue(number=10, repo="infra", priority=1),
        make_issue(number=11, repo="infra", priority=5),
        make_issue(number=20, repo="realestate-crawler", priority=3),
        make_issue(number=21, repo="realestate-crawler", priority=2),
    ]
    decisions = dispatch_policy.select_dispatchable(
        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    # One per repo, each the repo's most urgent (lowest-value) eligible issue:
    # infra -> #10 (p1 < p5); realestate-crawler -> #21 (p2 < p3).
    assert _selected_set(decisions) == {10, 21}
    repos = [d.issue.repo for d in decisions]
    assert len(repos) == len(set(repos))  # no repo appears twice


def test_ineligible_higher_priority_does_not_consume_repo_slot(make_issue, make_config):
    """A more-urgent issue that is itself ineligible (e.g. blocked) must not
    suppress a less-urgent *eligible* issue in the same repo: the slot goes to
    the best ELIGIBLE candidate, not merely the most urgent one."""
    blocked_urgent = make_issue(number=1, repo="infra", priority=1, blocked_by=[99])
    ready_minor = make_issue(number=2, repo="infra", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [blocked_urgent, ready_minor], make_config(allowlist=["infra"]), set()
    )
    assert _selected_numbers(decisions) == [2]


# --------------------------------------------------------------------------- #
# blocked_by gating — blocked_by holds OPEN blocker numbers.
# --------------------------------------------------------------------------- #
def test_blocked_issue_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra", blocked_by=[101])
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_unblocked_issue_with_empty_blocked_by_is_eligible(make_issue, make_config):
    issue = make_issue(repo="infra", blocked_by=[])
    assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1


@pytest.mark.parametrize("blockers", [[1], [1, 2], [5, 6, 7]])
def test_any_open_blocker_blocks(make_issue, make_config, blockers):
    issue = make_issue(repo="infra", blocked_by=blockers)
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_blocked_filters_only_blocked(make_issue, make_config):
    ready = make_issue(number=1, repo="infra", blocked_by=[])
    blocked = make_issue(number=2, repo="realestate-crawler", blocked_by=[7])
    decisions = dispatch_policy.select_dispatchable(
        [ready, blocked], make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert _selected_set(decisions) == {1}


# --------------------------------------------------------------------------- #
# Priority ordering — lower priority value first, deterministic tiebreaker.
# --------------------------------------------------------------------------- #
def test_lower_priority_value_first(make_issue, make_config):
    p1 = make_issue(number=1, repo="infra", priority=1)
    p5 = make_issue(number=2, repo="realestate-crawler", priority=5)
    p9 = make_issue(number=3, repo="SparkyFitness", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [p1, p9, p5],
        make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
        set(),
    )
    assert _selected_numbers(decisions) == [1, 2, 3]  # priorities 1, 5, 9


def test_ordering_independent_of_input_order(make_issue, make_config):
    """Whatever order the caller supplies issues in, the dispatch order is the
    same: sorted purely by the policy, not by arrival."""
    base = [
        ("infra", 10, 2),
        ("realestate-crawler", 20, 8),
        ("SparkyFitness", 30, 5),
        ("health", 40, 1),
    ]
    allow = ["infra", "realestate-crawler", "SparkyFitness", "health"]
    config = make_config(allowlist=allow)
    expected = [40, 10, 30, 20]  # priorities 1,2,5,8 (most urgent first)
    for perm in itertools.permutations(base):
        issues = [make_issue(number=n, repo=r, priority=p) for (r, n, p) in perm]
        decisions = dispatch_policy.select_dispatchable(issues, config, set())
        assert _selected_numbers(decisions) == expected


def test_priority_ties_break_deterministically_by_issue_number(make_issue, make_config):
    """Equal priority across different repos -> a stable, total order. We tie-break
    on ascending issue number so the result never depends on dict/set iteration
    or input order."""
    a = make_issue(number=30, repo="infra", priority=5)
    b = make_issue(number=10, repo="realestate-crawler", priority=5)
    c = make_issue(number=20, repo="SparkyFitness", priority=5)
    config = make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"])
    for perm in itertools.permutations([a, b, c]):
        decisions = dispatch_policy.select_dispatchable(list(perm), config, set())
        assert _selected_numbers(decisions) == [10, 20, 30]


def test_negative_and_zero_priorities_order_correctly(make_issue, make_config):
    neg = make_issue(number=1, repo="infra", priority=-5)
    zero = make_issue(number=2, repo="realestate-crawler", priority=0)
    pos = make_issue(number=3, repo="SparkyFitness", priority=3)
    decisions = dispatch_policy.select_dispatchable(
        [neg, zero, pos],
        make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
        set(),
    )
    assert _selected_numbers(decisions) == [1, 2, 3]  # -5 < 0 < 3 (most urgent first)


# --------------------------------------------------------------------------- #
# Reasons — human-readable, never parsed, but must be present and sensible.
# --------------------------------------------------------------------------- #
def test_every_decision_has_a_nonempty_reason(make_issue, make_config):
    issues = [
        make_issue(number=1, repo="infra", priority=3),
        make_issue(number=2, repo="realestate-crawler", priority=1),
    ]
    decisions = dispatch_policy.select_dispatchable(
        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert decisions  # sanity
    assert all(d.reason.strip() for d in decisions)


# --------------------------------------------------------------------------- #
# Combined matrix — every gate together. A single eligible needle in a haystack
# of issues that each trip exactly one gate.
# --------------------------------------------------------------------------- #
def test_only_the_fully_eligible_issue_survives_all_gates(make_issue, make_config):
    config = make_config(allowlist=["infra", "realestate-crawler"], kill_switch=False)
    in_flight = {"realestate-crawler"}  # this repo is locked
    issues = [
        make_issue(number=1, repo="infra", priority=5),  # ELIGIBLE
        make_issue(number=2, repo="not-allowed", priority=9),  # allowlist
        make_issue(number=3, repo="infra", priority=9, labeled_by_trusted=False),  # trust
        make_issue(number=4, repo="infra", priority=9, blocked_by=[1]),  # blocked
        make_issue(number=5, repo="realestate-crawler", priority=9),  # repo locked
    ]
    decisions = dispatch_policy.select_dispatchable(issues, config, in_flight)
    assert _selected_numbers(decisions) == [1]
    assert decisions[0].issue.repo == "infra"


@pytest.mark.parametrize("trusted", [True, False])
@pytest.mark.parametrize("allowed", [True, False])
@pytest.mark.parametrize("blocked", [True, False])
@pytest.mark.parametrize("locked", [True, False])
@pytest.mark.parametrize("killed", [True, False])
def test_full_eligibility_matrix(
    make_issue, make_config, trusted, allowed, blocked, locked, killed
):
    """Exhaustive truth table: an issue is dispatched iff ALL gates pass and the
    kill switch is off. 2**5 = 32 cases, single issue so ordering is moot."""
    issue = make_issue(
        number=1,
        repo="infra",
        priority=0,
        labeled_by_trusted=trusted,
        blocked_by=[99] if blocked else [],
    )
    config = make_config(
        allowlist=["infra"] if allowed else ["other-repo"],
        kill_switch=killed,
    )
    in_flight = {"infra"} if locked else set()
    decisions = dispatch_policy.select_dispatchable([issue], config, in_flight)
    should_dispatch = trusted and allowed and not blocked and not locked and not killed
    assert (len(decisions) == 1) is should_dispatch
    if should_dispatch:
        assert decisions[0].issue is issue
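
The gates these tests pin down (kill switch, allowlist, trust, open blockers, per-repo lock, one decision per repo, priority then issue-number ordering) can be sketched as one pure function. This is only a minimal illustration under stated assumptions, not the project's actual `dispatch_policy` module: the `Issue`, `Config` and `Decision` dataclasses below are hypothetical stand-ins with just the fields the tests exercise, and the reason string format is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:  # hypothetical stand-in, not app.afk.types.Issue
    number: int
    repo: str
    priority: int = 0
    blocked_by: tuple = ()
    labeled_by_trusted: bool = True


@dataclass(frozen=True)
class Config:  # hypothetical stand-in for the real config type
    allowlist: tuple = ()
    kill_switch: bool = True


@dataclass(frozen=True)
class Decision:
    issue: Issue
    reason: str


def select_dispatchable(issues, config, in_flight_repos):
    """Return at most one Decision per repo, most urgent first."""
    if config.kill_switch:
        return []
    # Every gate must pass for an issue to be a candidate at all.
    eligible = [
        i for i in issues
        if i.repo in config.allowlist
        and i.labeled_by_trusted
        and not i.blocked_by
        and i.repo not in in_flight_repos
    ]
    # Lower priority value first; ascending issue number breaks ties.
    eligible.sort(key=lambda i: (i.priority, i.number))
    decisions, taken = [], set()
    for issue in eligible:
        if issue.repo in taken:
            continue  # the repo's slot already went to a more urgent issue
        taken.add(issue.repo)
        reason = f"{issue.repo}#{issue.number} eligible (priority {issue.priority})"
        decisions.append(Decision(issue, reason))
    return decisions
```

Filtering before sorting is what keeps an ineligible but more urgent issue from consuming its repo's slot: the per-repo pass only ever sees issues that already cleared every gate.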

tests/test_afk_notifier.py Normal file

@ -0,0 +1,198 @@
"""Tests for ``app.afk.notifier`` — the terminal-state doorbell.

The notifier's whole job is to format a human-facing alert (Slack / ntfy) with a
deep-link back to the T3 thread when a run reaches a terminal state (done,
needs-human, or frozen) and hand it to an injected sender. Every test here
injects a recording fake sender, so nothing is ever POSTed: we assert the
*formatted payload* per kind, plus the deep-link, the kind vocabulary, and the
guardrails (no thread -> no link, unknown kind rejected, sender called exactly
once with the return value being None).

No real Slack/ntfy/T3 is touched, consistent with the rest of the AFK suite.
"""

import pytest

from app.afk import notifier as notifier_mod
from app.afk.notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN, Notification, Notifier
from app.afk.types import Issue


# --------------------------------------------------------------------------- #
# A recording sender — captures the Notification instead of posting it.
# --------------------------------------------------------------------------- #
class RecordingSender:
    """Injectable stand-in for the real Slack/ntfy POST. Records each payload so
    a test can assert the formatting without any network."""

    def __init__(self) -> None:
        self.sent: list[Notification] = []

    def __call__(self, notification: Notification) -> None:
        self.sent.append(notification)


@pytest.fixture
def sender() -> RecordingSender:
    return RecordingSender()


def _issue(number: int = 42, repo: str = "infra") -> Issue:
    return Issue(
        number=number,
        repo=repo,
        labels=["ready-for-agent"],
        blocked_by=[],
        labeled_by_trusted=True,
        priority=0,
    )


# --------------------------------------------------------------------------- #
# Kind vocabulary — the three terminal states, and nothing else.
# --------------------------------------------------------------------------- #
def test_terminal_kinds_are_exactly_the_three_terminal_states():
    assert KIND_DONE == "done"
    assert KIND_NEEDS_HUMAN == "needs-human"
    assert KIND_FROZEN == "frozen"
    assert notifier_mod.TERMINAL_KINDS == {KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN}


# --------------------------------------------------------------------------- #
# Dispatch mechanics — sender injected, called exactly once, returns None.
# --------------------------------------------------------------------------- #
def test_notify_calls_sender_exactly_once_and_returns_none(sender):
    n = Notifier(sender)
    result = n.notify(KIND_DONE, _issue(), "thread-7", "all green")
    assert result is None
    assert len(sender.sent) == 1


def test_notify_does_not_post_anything_itself(sender):
    """The Notifier must never reach the network on its own — all egress goes
    through the injected sender. A test-only sentinel proves that."""
    n = Notifier(sender)
    n.notify(KIND_FROZEN, _issue(), "thread-1", "budget exhausted")
    # Nothing other than the injected sender ran: exactly one recorded payload,
    # and it is the Notification dataclass (not a raw dict / HTTP response).
    assert isinstance(sender.sent[0], Notification)


# --------------------------------------------------------------------------- #
# Deep-link — every payload links back to the T3 thread (when there is one).
# --------------------------------------------------------------------------- #
def test_payload_deep_links_to_the_t3_thread(sender):
    n = Notifier(sender, base_url="https://t3.viktorbarzin.me")
    n.notify(KIND_DONE, _issue(), "thread-abc", "done")
    payload = sender.sent[0]
    assert payload.link == "https://t3.viktorbarzin.me/?thread=thread-abc"
    # The link is also surfaced in the human-readable body so it survives
    # senders that drop structured fields (e.g. a plain ntfy message).
    assert "https://t3.viktorbarzin.me/?thread=thread-abc" in payload.body


def test_base_url_trailing_slash_is_normalised(sender):
    n = Notifier(sender, base_url="https://t3.viktorbarzin.me/")
    n.notify(KIND_DONE, _issue(), "thread-x", "done")
    assert sender.sent[0].link == "https://t3.viktorbarzin.me/?thread=thread-x"


def test_no_thread_id_means_no_link(sender):
    """A run can reach 'needs-human' before any thread exists (e.g. dispatch
    itself failed). Without a thread there is nothing to deep-link to, so the
    link is None but the doorbell still fires."""
    n = Notifier(sender)
    n.notify(KIND_NEEDS_HUMAN, _issue(), None, "dispatch failed")
    payload = sender.sent[0]
    assert payload.link is None
    assert len(sender.sent) == 1
    # No dangling "/?thread=" fragment leaks into the body either.
    assert "?thread=" not in payload.body


# --------------------------------------------------------------------------- #
# Per-kind formatting — title / body / priority / tags differ per terminal kind.
# --------------------------------------------------------------------------- #
def test_done_payload_is_informational(sender):
    n = Notifier(sender)
    n.notify(KIND_DONE, _issue(number=7, repo="infra"), "thread-7", "merged + CI green")
    p = sender.sent[0]
    assert p.kind == KIND_DONE
    assert p.issue_ref == "infra#7"
    assert "infra#7" in p.title
    assert "merged + CI green" in p.body
    # A successful close is informational, not an escalation.
    assert p.priority == "low"
    assert "escalation" not in p.tags


def test_needs_human_payload_is_an_escalation(sender):
    n = Notifier(sender)
    n.notify(KIND_NEEDS_HUMAN, _issue(number=9, repo="claude-agent-service"), "thread-9", "errored before push")
    p = sender.sent[0]
    assert p.kind == KIND_NEEDS_HUMAN
    assert p.issue_ref == "claude-agent-service#9"
    assert "claude-agent-service#9" in p.title
    assert "errored before push" in p.body
    assert p.priority == "high"
    assert "escalation" in p.tags


def test_frozen_payload_is_an_escalation(sender):
    n = Notifier(sender)
    n.notify(KIND_FROZEN, _issue(number=3, repo="infra"), "thread-3", "fix-forward budget exhausted")
    p = sender.sent[0]
    assert p.kind == KIND_FROZEN
    assert "infra#3" in p.title
    assert "fix-forward budget exhausted" in p.body
    assert p.priority == "high"
    assert "escalation" in p.tags


def test_titles_distinguish_the_three_kinds(sender):
    """An operator skimming a Slack channel must tell the three apart from the
    title alone, without reading the body."""
    n = Notifier(sender)
    n.notify(KIND_DONE, _issue(), "t", "x")
    n.notify(KIND_NEEDS_HUMAN, _issue(), "t", "x")
    n.notify(KIND_FROZEN, _issue(), "t", "x")
    titles = [p.title for p in sender.sent]
    assert len({t.split(" ")[0] for t in titles}) == 3  # distinct leading marker per kind


# --------------------------------------------------------------------------- #
# Guardrail — only terminal kinds are sendable. An unknown kind is a bug.
# --------------------------------------------------------------------------- #
def test_unknown_kind_raises_and_sends_nothing(sender):
    n = Notifier(sender)
    with pytest.raises(ValueError):
        n.notify("running", _issue(), "thread-1", "still working")
    assert sender.sent == []


# --------------------------------------------------------------------------- #
# Pure formatter — render_notification builds the payload independently of any
# sender, so the formatting is unit-testable on its own.
# --------------------------------------------------------------------------- #
def test_render_notification_is_pure_and_matches_notify(sender):
    issue = _issue(number=11, repo="infra")
    built = notifier_mod.render_notification(
        KIND_FROZEN, issue, "thread-11", "stuck", base_url="https://t3.viktorbarzin.me"
    )
    assert isinstance(built, Notification)
    assert built.link == "https://t3.viktorbarzin.me/?thread=thread-11"
    # notify() must produce the identical payload it hands the sender.
    Notifier(sender, base_url="https://t3.viktorbarzin.me").notify(
        KIND_FROZEN, issue, "thread-11", "stuck"
    )
    assert sender.sent[0] == built


def test_sender_exception_propagates(sender):
    """If the sender fails (Slack down), the notifier does not swallow it — the
    loop decides what to do with a failed doorbell, not this adapter."""

    def boom(_notification: Notification) -> None:
        raise RuntimeError("slack 503")

    n = Notifier(boom)
    with pytest.raises(RuntimeError, match="slack 503"):
        n.notify(KIND_DONE, _issue(), "thread-1", "done")
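
The behaviour asserted above (a pure formatter, a deep-link only when a thread exists, escalation priority for everything except `done`, and all egress through an injected sender) could look roughly like the sketch below. It is a guess at the shape, not the real `app.afk.notifier`: to stay self-contained it takes `repo` and `number` directly instead of an `Issue`, and the title markers and the `https://t3.example.invalid` default base URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

KIND_DONE = "done"
KIND_NEEDS_HUMAN = "needs-human"
KIND_FROZEN = "frozen"
TERMINAL_KINDS = {KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN}

# Hypothetical leading markers, one distinct word per kind.
_MARKERS = {KIND_DONE: "DONE", KIND_NEEDS_HUMAN: "NEEDS-HUMAN", KIND_FROZEN: "FROZEN"}


@dataclass(frozen=True)
class Notification:
    kind: str
    issue_ref: str
    title: str
    body: str
    link: Optional[str]
    priority: str
    tags: Tuple[str, ...]


def render_notification(kind, repo, number, thread_id, detail,
                        base_url="https://t3.example.invalid"):
    """Pure formatter: build the payload without sending anything."""
    if kind not in TERMINAL_KINDS:
        raise ValueError(f"not a terminal kind: {kind!r}")
    issue_ref = f"{repo}#{number}"
    # No thread means no link, and no dangling "?thread=" in the body.
    link = f"{base_url.rstrip('/')}/?thread={thread_id}" if thread_id else None
    body = detail if link is None else f"{detail}\n{link}"
    escalation = kind != KIND_DONE
    return Notification(
        kind=kind,
        issue_ref=issue_ref,
        title=f"{_MARKERS[kind]} {issue_ref}",
        body=body,
        link=link,
        priority="high" if escalation else "low",
        tags=("escalation",) if escalation else (),
    )


class Notifier:
    def __init__(self, sender: Callable[[Notification], None],
                 base_url: str = "https://t3.example.invalid"):
        self._sender = sender
        self._base_url = base_url

    def notify(self, kind, repo, number, thread_id, detail) -> None:
        # Validation happens in the formatter, before the sender is touched,
        # and any exception the sender raises propagates to the caller.
        self._sender(
            render_notification(kind, repo, number, thread_id, detail, self._base_url)
        )
```

Keeping the formatting in a free function is what lets a test compare `render_notification(...)` against the payload `notify()` hands to the sender.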


@ -0,0 +1,247 @@
"""Tests for ``app.afk.phase_checklist`` — the live progress checklist.

``render(current, meta)`` is PURE: same inputs -> byte-identical markdown, no I/O.
It draws the seven-phase lifecycle (worktree -> tests-red -> green -> pushed -> CI
-> deployed -> done) as a markdown task list, with phases *before* ``current``
checked off, ``current`` marked in-progress, and later phases left empty.

Style matches the existing suite: plain ``assert`` functions, parametrized cases,
and a couple of full-output snapshots so the rendered shape is pinned, not just
its line count.
"""

import pytest

from app.afk.phase_checklist import render
from app.afk.types import Phase

# Lifecycle order, mirrored from the contract so a reordering of the enum that
# the renderer didn't track shows up as a test failure rather than silent drift.
PHASES_IN_ORDER = [
    Phase.WORKTREE,
    Phase.TESTS_RED,
    Phase.GREEN,
    Phase.PUSHED,
    Phase.CI,
    Phase.DEPLOYED,
    Phase.DONE,
]


# --------------------------------------------------------------------------- #
# Structure: one line per phase, in order, always all seven.
# --------------------------------------------------------------------------- #
def _checklist_lines(out: str) -> list[str]:
    """The markdown task-list lines (``- [ ]`` / ``- [x]`` ...), in order."""
    return [ln for ln in out.splitlines() if ln.lstrip().startswith("- [")]


def test_renders_a_string():
    assert isinstance(render(Phase.WORKTREE, {}), str)


@pytest.mark.parametrize("current", PHASES_IN_ORDER)
def test_every_phase_has_exactly_one_checklist_line(current):
    lines = _checklist_lines(render(current, {}))
    assert len(lines) == len(PHASES_IN_ORDER)


@pytest.mark.parametrize("current", PHASES_IN_ORDER)
def test_checklist_lines_are_in_lifecycle_order(current):
    lines = _checklist_lines(render(current, {}))
    # Each phase's human label appears, and in the lifecycle order.
    positions = [
        next(i for i, ln in enumerate(lines) if _has_label(ln, phase))
        for phase in PHASES_IN_ORDER
    ]
    assert positions == sorted(positions)


def _has_label(line: str, phase: Phase) -> bool:
    """Whether a checklist line carries ``phase``'s headline word (case-insensitive
    substring: the test asserts the label is *present*, not its exact decoration)."""
    return _phase_label(phase).lower() in line.lower()


def _phase_label(phase: Phase) -> str:
    """The headline word(s) the renderer must use for a phase. Loose on purpose:
    the test asserts the label is *present*, not the exact decoration."""
    return {
        Phase.WORKTREE: "worktree",
        Phase.TESTS_RED: "test",
        Phase.GREEN: "green",
        Phase.PUSHED: "push",
        Phase.CI: "CI",
        Phase.DEPLOYED: "deploy",
        Phase.DONE: "done",
    }[phase]


# --------------------------------------------------------------------------- #
# Check/in-progress/empty partitioning around ``current``.
# --------------------------------------------------------------------------- #
def _classify(line: str) -> str:
    """Bucket a checklist line by its marker: 'done' ``[x]``, 'todo' ``[ ]``, or
    'active' (anything else, e.g. an in-progress glyph)."""
    body = line.lstrip()
    if body.startswith("- [x]"):
        return "done"
    if body.startswith("- [ ]"):
        return "todo"
    return "active"


@pytest.mark.parametrize("idx,current", list(enumerate(PHASES_IN_ORDER)))
def test_earlier_checked_current_active_later_empty(idx, current):
    lines = _checklist_lines(render(current, {}))
    buckets = [_classify(ln) for ln in lines]
    # Everything strictly before the current phase is checked off.
    assert all(b == "done" for b in buckets[:idx]), buckets
    if current is Phase.DONE:
        # Terminal phase: the whole list is checked, nothing left active/empty.
        assert all(b == "done" for b in buckets), buckets
    else:
        # The current phase is the single in-progress marker...
        assert buckets[idx] == "active", buckets
        assert buckets.count("active") == 1, buckets
        # ...and every phase after it is still an empty checkbox.
        assert all(b == "todo" for b in buckets[idx + 1 :]), buckets


def test_first_phase_has_nothing_checked_before_it():
    lines = _checklist_lines(render(Phase.WORKTREE, {}))
    assert _classify(lines[0]) == "active"
    assert "done" not in [_classify(ln) for ln in lines]


def test_done_checks_every_phase_including_done():
    lines = _checklist_lines(render(Phase.DONE, {}))
    assert all(_classify(ln) == "done" for ln in lines)
    # The DONE line itself is checked, not merely the ones before it.
    done_line = next(ln for ln in lines if _has_label(ln, Phase.DONE))
    assert _classify(done_line) == "done"


# --------------------------------------------------------------------------- #
# Active-phase emphasis: the current phase is visually distinguishable.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("current", [p for p in PHASES_IN_ORDER if p is not Phase.DONE])
def test_active_phase_line_differs_from_todo_and_done_markers(current):
    lines = _checklist_lines(render(current, {}))
    active = [ln for ln in lines if _classify(ln) == "active"]
    assert len(active) == 1
    # Not a plain checkbox in either state.
    assert not active[0].lstrip().startswith("- [x]")
    assert not active[0].lstrip().startswith("- [ ]")


# --------------------------------------------------------------------------- #
# meta rendering: optional context is surfaced, omission never explodes.
# --------------------------------------------------------------------------- #
def test_meta_empty_does_not_raise_and_still_lists_phases():
    out = render(Phase.GREEN, {})
    assert _checklist_lines(out)  # non-empty


def test_meta_issue_and_repo_appear_in_output():
    out = render(Phase.GREEN, {"repo": "infra", "issue": 42})
    assert "infra" in out
    assert "42" in out


def test_meta_thread_id_appears_when_present():
    out = render(Phase.PUSHED, {"thread_id": "thread-7"})
    assert "thread-7" in out


def test_meta_thread_id_absent_is_silent():
    out = render(Phase.PUSHED, {})
    assert "thread-" not in out


def test_meta_fix_forward_attempt_surfaced():
    out = render(Phase.CI, {"fix_forward_attempts": 3})
    assert "3" in out


def test_meta_unknown_keys_are_ignored():
    # An unexpected key must not crash or leak its raw value as a stray line.
    out = render(Phase.WORKTREE, {"totally_unknown_field": "should-not-appear"})
    assert "should-not-appear" not in out


# --------------------------------------------------------------------------- #
# Determinism + idempotence (it's pure).
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("current", PHASES_IN_ORDER)
def test_render_is_deterministic(current):
    meta = {"repo": "infra", "issue": 9, "thread_id": "thread-1"}
    assert render(current, meta) == render(current, meta)


def test_render_does_not_mutate_meta():
    meta = {"repo": "infra", "issue": 1}
    before = dict(meta)
    render(Phase.GREEN, meta)
    assert meta == before


# --------------------------------------------------------------------------- #
# Snapshots: pin the exact rendered shape for two representative phases. If the
# format changes intentionally, update these strings; an accidental change to
# wording/markers/order fails here loudly.
# --------------------------------------------------------------------------- #
WORKTREE_SNAPSHOT = """\
### infra#7 — AFK run progress
- [~] Worktree created
- [ ] Failing test written (TDD red)
- [ ] Implementation passing (TDD green)
- [ ] Pushed to master
- [ ] CI green on pushed commit
- [ ] Deployed / rolled out
- [ ] Done issue closed
"""


def test_snapshot_worktree_phase():
    out = render(Phase.WORKTREE, {"repo": "infra", "issue": 7})
    assert out == WORKTREE_SNAPSHOT


CI_SNAPSHOT = """\
### infra#7 — AFK run progress (thread thread-3)
- [x] Worktree created
- [x] Failing test written (TDD red)
- [x] Implementation passing (TDD green)
- [x] Pushed to master
- [~] CI green on pushed commit
- [ ] Deployed / rolled out
- [ ] Done issue closed
"""


def test_snapshot_ci_phase_with_thread():
    out = render(Phase.CI, {"repo": "infra", "issue": 7, "thread_id": "thread-3"})
    assert out == CI_SNAPSHOT


DONE_SNAPSHOT = """\
### infra#7 — AFK run progress
- [x] Worktree created
- [x] Failing test written (TDD red)
- [x] Implementation passing (TDD green)
- [x] Pushed to master
- [x] CI green on pushed commit
- [x] Deployed / rolled out
- [x] Done issue closed
"""


def test_snapshot_done_phase():
    out = render(Phase.DONE, {"repo": "infra", "issue": 7})
    assert out == DONE_SNAPSHOT
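
The partitioning rule these tests pin (earlier phases checked, the current phase marked in progress, later phases empty, and everything checked once the run is done) can be sketched as a small pure function. This is an illustration only, not the real `app.afk.phase_checklist.render`: the `Phase` enum values, the one-word labels and the bare header are hypothetical simplifications, so its output will not match the snapshots above.

```python
from enum import Enum


class Phase(Enum):  # hypothetical stand-in for app.afk.types.Phase
    WORKTREE = "worktree"
    TESTS_RED = "tests-red"
    GREEN = "green"
    PUSHED = "pushed"
    CI = "CI"
    DEPLOYED = "deployed"
    DONE = "done"


def render(current: Phase, meta: dict) -> str:
    """Return the checklist as markdown; same inputs give the same string."""
    lines = []
    # Only known meta keys are read, so unknown keys can never leak into output.
    if "repo" in meta and "issue" in meta:
        lines.append(f"### {meta['repo']}#{meta['issue']}")
    order = list(Phase)  # Enum iteration follows definition order
    idx = order.index(current)
    for i, phase in enumerate(order):
        if current is Phase.DONE or i < idx:
            mark = "x"  # finished
        elif i == idx:
            mark = "~"  # in progress
        else:
            mark = " "  # not started
        lines.append(f"- [{mark}] {phase.value}")
    return "\n".join(lines) + "\n"
```

Treating `DONE` as a special case is what makes the final line checked rather than shown as in progress, which is the distinction `test_done_checks_every_phase_including_done` guards.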

tests/test_afk_poller.py Normal file

@ -0,0 +1,270 @@
"""Integration tests for ``app.afk.poller`` — the CronJob dispatch tick.

Unlike the unit suites, these wire the REAL pure cores (the actual
``dispatch_policy.select_dispatchable``) to the in-memory adapter FAKES from
``conftest`` (``FakeTracker`` / ``FakeT3Client``). No test touches a real T3
server, GitHub/Forgejo, or the cluster: the poller is exercised end to end with
fakes standing in only for the I/O edges.

What the tick must do (the poller contract):

* **kill switch**: a disabled config dispatches nothing AND never calls the
  tracker or T3 (the CronJob does no I/O when the loop is off);
* read the ready set via ``tracker.list_ready(config.allowlist)``;
* derive the **per-repo lock** from the ready set itself: a repo with an issue
  already carrying the ``in_progress_label`` is in flight and is skipped (the
  CronJob is stateless between ticks, so the tracker is the source of truth);
* run the real ``select_dispatchable`` over (ready issues, config, in-flight
  repos) and, for each decision, ``t3_client.dispatch(...)`` then
  ``tracker.add_label(repo, issue, in_progress_label)``. The label is added
  AFTER a successful dispatch so a dispatch failure never leaves a phantom lock.
"""

import pytest

from app.afk import poller
from app.afk.types import Config


# --------------------------------------------------------------------------- #
# Helpers.
# --------------------------------------------------------------------------- #
def _poller(fake_tracker, fake_t3) -> poller.Poller:
    """A Poller wired to the conftest fakes and the real dispatch policy."""
    return poller.Poller(tracker=fake_tracker, t3_client=fake_t3)


def _dispatched_pairs(fake_t3) -> set[tuple[str, int]]:
    return {(d["repo"], d["issue"]) for d in fake_t3.dispatched}


def _added_in_progress(fake_tracker, label: str = "agent-in-progress") -> set[tuple[str, int]]:
    return {
        (repo, issue)
        for (op, repo, issue, lbl) in fake_tracker.label_ops
        if op == "add" and lbl == label
    }


# --------------------------------------------------------------------------- #
# Kill switch — no dispatch, no I/O at all.
# --------------------------------------------------------------------------- #
def test_kill_switch_dispatches_nothing(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=1, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=True)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []
    assert fake_t3.dispatched == []


def test_kill_switch_does_not_even_read_the_tracker(fake_t3):
    """When the loop is off the CronJob must do zero I/O — not a single tracker
    or T3 call. A tracker that explodes if touched proves it."""

    class ExplodingTracker:
        def list_ready(self, repos):
            raise AssertionError("tracker must not be read when kill switch is on")

    config = Config(allowlist=["infra"], kill_switch=True)
    result = poller.Poller(tracker=ExplodingTracker(), t3_client=fake_t3).run_once(config)
    assert result.dispatched == []


# --------------------------------------------------------------------------- #
# Empty allowlist — armed kill switch but nothing to run.
# --------------------------------------------------------------------------- #
def test_empty_allowlist_dispatches_nothing(fake_tracker, fake_t3, make_issue):
    # list_ready([]) returns nothing, and even if it didn't the policy gates on
    # the (empty) allowlist. The shipped default posture.
    config = Config(allowlist=[], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []
    assert fake_t3.dispatched == []


# --------------------------------------------------------------------------- #
# Happy path — one ready issue gets dispatched and labelled.
# --------------------------------------------------------------------------- #
def test_dispatches_a_ready_issue(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert _dispatched_pairs(fake_t3) == {("infra", 7)}
    assert len(result.dispatched) == 1
    assert result.dispatched[0].thread_id == "thread-0"
    assert result.dispatched[0].issue.number == 7


def test_labels_in_progress_after_dispatch(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)
    _poller(fake_tracker, fake_t3).run_once(config)
    assert _added_in_progress(fake_tracker) == {("infra", 7)}


def test_in_progress_label_honours_config_override(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False, in_progress_label="busy")
    _poller(fake_tracker, fake_t3).run_once(config)
    assert _added_in_progress(fake_tracker, "busy") == {("infra", 7)}


def test_dispatch_prompt_references_the_issue(fake_tracker, fake_t3, make_issue):
    """The agent runs full-access and fetches the body itself, so the prompt the
    poller sends must at minimum point at the concrete repo#issue."""
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)
    _poller(fake_tracker, fake_t3).run_once(config)
    prompt = fake_t3.dispatched[0]["prompt"]
    assert "7" in prompt and "infra" in prompt
    assert prompt.strip()  # non-empty


# --------------------------------------------------------------------------- #
# Per-repo lock — an issue already carrying the in-progress label means an agent
# is in flight on that repo, so the repo is skipped this tick.
# --------------------------------------------------------------------------- #
def test_repo_with_in_progress_issue_is_locked(fake_tracker, fake_t3, make_issue):
    in_flight = make_issue(
        number=1, repo="infra", labels=["ready-for-agent", "agent-in-progress"]
    )
    waiting = make_issue(number=2, repo="infra", labels=["ready-for-agent"])
    fake_tracker.seed("infra", [in_flight, waiting])
    config = Config(allowlist=["infra"], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    # Repo already busy → nothing new dispatched, no new in-progress label.
    assert result.dispatched == []
    assert fake_t3.dispatched == []
    assert _added_in_progress(fake_tracker) == set()


def test_lock_is_per_repo_not_global(fake_tracker, fake_t3, make_issue):
    # infra is busy; a different repo is free and should still dispatch.
    fake_tracker.seed(
        "infra",
        [make_issue(number=1, repo="infra", labels=["ready-for-agent", "agent-in-progress"])],
    )
    fake_tracker.seed("dotfiles", [make_issue(number=2, repo="dotfiles")])
    config = Config(allowlist=["infra", "dotfiles"], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert _dispatched_pairs(fake_t3) == {("dotfiles", 2)}
    assert {d.issue.repo for d in result.dispatched} == {"dotfiles"}


def test_custom_in_progress_label_drives_the_lock(fake_tracker, fake_t3, make_issue):
    # The lock keys off config.in_progress_label, not the hardcoded default.
    fake_tracker.seed(
        "infra",
        [make_issue(number=1, repo="infra", labels=["ready-for-agent", "busy"])],
    )
    config = Config(allowlist=["infra"], kill_switch=False, in_progress_label="busy")
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []
# --------------------------------------------------------------------------- #
# One dispatch per repo per tick (the policy's one-agent-per-repo invariant,
# observed through the poller): the most urgent (lowest-value) eligible issue
# wins the slot.
# --------------------------------------------------------------------------- #
def test_one_dispatch_per_repo_per_tick(fake_tracker, fake_t3, make_issue):
fake_tracker.seed(
"infra",
[
make_issue(number=1, repo="infra", priority=1), # most urgent (lowest value)
make_issue(number=2, repo="infra", priority=9),
make_issue(number=3, repo="infra", priority=5),
],
)
config = Config(allowlist=["infra"], kill_switch=False)
_poller(fake_tracker, fake_t3).run_once(config)
assert _dispatched_pairs(fake_t3) == {("infra", 1)}
assert _added_in_progress(fake_tracker) == {("infra", 1)}
# --------------------------------------------------------------------------- #
# Gating still applies through the poller (the pure policy enforces it; the
# poller must not bypass it).
# --------------------------------------------------------------------------- #
def test_untrusted_issue_is_not_dispatched(fake_tracker, fake_t3, make_issue):
fake_tracker.seed(
"infra", [make_issue(number=1, repo="infra", labeled_by_trusted=False)]
)
config = Config(allowlist=["infra"], kill_switch=False)
result = _poller(fake_tracker, fake_t3).run_once(config)
assert result.dispatched == []
assert fake_t3.dispatched == []
def test_blocked_issue_is_not_dispatched(fake_tracker, fake_t3, make_issue):
fake_tracker.seed(
"infra", [make_issue(number=2, repo="infra", blocked_by=[1])]
)
config = Config(allowlist=["infra"], kill_switch=False)
result = _poller(fake_tracker, fake_t3).run_once(config)
assert result.dispatched == []
def test_repo_outside_allowlist_is_not_dispatched(fake_tracker, fake_t3, make_issue):
# list_ready only queries the allowlist, but even if a stray repo's issues
# arrive the policy's allowlist gate drops them.
fake_tracker.seed("secret", [make_issue(number=1, repo="secret")])
config = Config(allowlist=["infra"], kill_switch=False)
result = _poller(fake_tracker, fake_t3).run_once(config)
assert result.dispatched == []
# --------------------------------------------------------------------------- #
# Dispatch failure must not leave a phantom lock (label only AFTER success).
# --------------------------------------------------------------------------- #
def test_dispatch_failure_does_not_label_in_progress(fake_tracker, make_issue):
    class FailingT3:
        def __init__(self):
            self.dispatched = []

        def dispatch(self, repo, issue, prompt):
            raise RuntimeError("T3 down")

    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)
    with pytest.raises(RuntimeError):
        poller.Poller(tracker=fake_tracker, t3_client=FailingT3()).run_once(config)
    # No in-progress label was applied: the issue stays purely ready, so the
    # next tick retries it rather than treating it as locked.
    assert _added_in_progress(fake_tracker) == set()
# --------------------------------------------------------------------------- #
# list_ready is called with exactly the allowlist (not all repos).
# --------------------------------------------------------------------------- #
def test_queries_only_the_allowlisted_repos(fake_t3, make_issue):
    seen_repos: list[list[str]] = []

    class RecordingTracker:
        def list_ready(self, repos):
            seen_repos.append(list(repos))
            return []

        def add_label(self, *a):  # pragma: no cover - not reached here
            raise AssertionError("nothing to label")

    config = Config(allowlist=["infra", "dotfiles"], kill_switch=False)
    poller.Poller(tracker=RecordingTracker(), t3_client=fake_t3).run_once(config)
    assert seen_repos == [["infra", "dotfiles"]]
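
The behaviours pinned by the poller tests above (a per-repo lock, one dispatch per repo per tick with the lowest priority value winning, and labelling only after a successful dispatch) can be sketched roughly as follows. This is a minimal illustration, not the code in `app.afk.poller`: `SketchIssue`, `pick_dispatches`, and `run_tick` are hypothetical names, and the trust, blocked, and allowlist gates are reduced to simple field checks.

```python
from dataclasses import dataclass, field


@dataclass
class SketchIssue:
    # Hypothetical stand-in for the real Issue record.
    repo: str
    number: int
    priority: int = 5
    labels: list[str] = field(default_factory=lambda: ["ready-for-agent"])
    labeled_by_trusted: bool = True
    blocked_by: list[int] = field(default_factory=list)


def pick_dispatches(issues, allowlist, in_progress_label="agent-in-progress"):
    """Return at most one issue per repo: the eligible one with the lowest priority value."""
    # A repo carrying any in-progress issue is locked for this tick.
    locked = {i.repo for i in issues if in_progress_label in i.labels}
    winners = {}
    for issue in issues:
        if issue.repo not in allowlist or issue.repo in locked:
            continue
        # Simplified gates (assumption): untrusted or blocked issues are skipped.
        if not issue.labeled_by_trusted or issue.blocked_by:
            continue
        current = winners.get(issue.repo)
        if current is None or issue.priority < current.priority:
            winners[issue.repo] = issue
    return list(winners.values())


def run_tick(issues, allowlist, dispatch, add_label, in_progress_label="agent-in-progress"):
    """Dispatch each winner, labelling only AFTER dispatch returns without raising."""
    done = []
    for issue in pick_dispatches(issues, allowlist, in_progress_label):
        dispatch(issue)  # may raise; in that case no label is applied
        add_label(issue, in_progress_label)
        done.append(issue)
    return done
```

Labelling after the dispatch call is what keeps a failed dispatch from leaving a phantom lock: if `dispatch` raises, the issue keeps only its ready label and is retried on the next tick.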


@ -0,0 +1,190 @@
"""Tests for ``app.afk.run_state_machine.next_action`` — the pure decision
function that turns one assembled ``RunState`` into the next ``Action``.
The function encodes ADR-0002's run lifecycle:
  * healthy (pushed AND CI green)                  -> CLOSE_SUCCESS
  * cannot reach green before push (errored /
    stalled with nothing pushed)                   -> ESCALATE_PREPUSH
  * pushed but CI red, budget remaining            -> FIX_FORWARD
  * pushed but CI red, budget exhausted            -> FREEZE_ESCALATE
  * anything still in flight                       -> WAIT
It is PURE: no I/O, no clock, no globals; it reads only its two arguments, so
every case is a plain table assertion. ``make_config`` / ``make_run_state`` come
from ``conftest.py`` (config defaults to ENABLED, run state to a fresh dispatch).
"""
import pytest
from app.afk.run_state_machine import next_action
from app.afk.types import Action, CIStatus, ThreadStatus
# --------------------------------------------------------------------------- #
# Healthy terminal: pushed + CI green -> close, regardless of thread status.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
"thread_status",
[ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
def test_pushed_and_green_closes_success(make_config, make_run_state, thread_status):
state = make_run_state(
thread_status=thread_status, ci_status=CIStatus.GREEN, pushed=True
)
assert next_action(state, make_config()) is Action.CLOSE_SUCCESS
# --------------------------------------------------------------------------- #
# Pre-push escalation: nothing pushed and the turn is no longer going to push
# (errored, or finished/stalled clean) -> hand back to a human.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("thread_status", [ThreadStatus.ERROR, ThreadStatus.IDLE])
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
def test_not_pushed_terminal_thread_escalates_prepush(
make_config, make_run_state, thread_status, ci_status
):
state = make_run_state(
thread_status=thread_status, ci_status=ci_status, pushed=False
)
assert next_action(state, make_config()) is Action.ESCALATE_PREPUSH
# --------------------------------------------------------------------------- #
# Still working toward a first push -> WAIT (not yet an escalation).
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("thread_status", [ThreadStatus.RUNNING, None])
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
def test_not_pushed_in_flight_waits(
make_config, make_run_state, thread_status, ci_status
):
state = make_run_state(
thread_status=thread_status, ci_status=ci_status, pushed=False
)
assert next_action(state, make_config()) is Action.WAIT
# --------------------------------------------------------------------------- #
# Pushed, CI not yet decided -> WAIT for the verdict, whatever the thread does.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
"thread_status",
[ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
def test_pushed_ci_pending_waits(
make_config, make_run_state, thread_status, ci_status
):
state = make_run_state(
thread_status=thread_status, ci_status=ci_status, pushed=True
)
assert next_action(state, make_config()) is Action.WAIT
# --------------------------------------------------------------------------- #
# Pushed + CI red: fix-forward while BOTH budgets remain, else freeze.
# Boundaries are strict-less-than on attempts AND elapsed; at/over either freezes.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
("attempts", "elapsed", "expected"),
[
# fresh red, plenty of budget -> fix forward
(0, 0.0, Action.FIX_FORWARD),
(1, 10.0, Action.FIX_FORWARD),
# one attempt below the cap, well inside the clock -> still fix forward
(4, 3599.0, Action.FIX_FORWARD),
# attempts hit the cap (5) -> freeze
(5, 0.0, Action.FREEZE_ESCALATE),
(6, 0.0, Action.FREEZE_ESCALATE),
# clock hits the cap (3600s) -> freeze even with attempts to spare
(0, 3600.0, Action.FREEZE_ESCALATE),
(0, 7200.0, Action.FREEZE_ESCALATE),
# both exhausted -> freeze
(5, 3600.0, Action.FREEZE_ESCALATE),
],
)
def test_pushed_red_fix_forward_until_budget_exhausted(
make_config, make_run_state, attempts, elapsed, expected
):
state = make_run_state(
thread_status=ThreadStatus.IDLE,
ci_status=CIStatus.RED,
pushed=True,
fix_forward_attempts=attempts,
elapsed_seconds=elapsed,
)
assert next_action(state, make_config()) is expected
# --------------------------------------------------------------------------- #
# Fix-forward budget is honoured from config, not hardcoded.
# --------------------------------------------------------------------------- #
def test_fix_forward_attempts_cap_comes_from_config(make_config, make_run_state):
config = make_config(fix_forward_max_attempts=2)
red = dict(thread_status=ThreadStatus.IDLE, ci_status=CIStatus.RED, pushed=True)
assert next_action(make_run_state(fix_forward_attempts=1, **red), config) is Action.FIX_FORWARD
assert next_action(make_run_state(fix_forward_attempts=2, **red), config) is Action.FREEZE_ESCALATE
def test_fix_forward_seconds_cap_comes_from_config(make_config, make_run_state):
config = make_config(fix_forward_max_seconds=120)
red = dict(thread_status=ThreadStatus.IDLE, ci_status=CIStatus.RED, pushed=True)
assert next_action(make_run_state(elapsed_seconds=119.0, **red), config) is Action.FIX_FORWARD
assert next_action(make_run_state(elapsed_seconds=120.0, **red), config) is Action.FREEZE_ESCALATE
# --------------------------------------------------------------------------- #
# A red CI on a pushed commit while the thread is still RUNNING a fix is, per
# spec, keyed only on (pushed AND red) + budget — thread status doesn't gate it.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
"thread_status",
[ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
def test_pushed_red_with_budget_fixes_forward_for_any_thread_status(
make_config, make_run_state, thread_status
):
state = make_run_state(
thread_status=thread_status,
ci_status=CIStatus.RED,
pushed=True,
fix_forward_attempts=0,
elapsed_seconds=0.0,
)
assert next_action(state, make_config()) is Action.FIX_FORWARD
# --------------------------------------------------------------------------- #
# Full cross-product sanity sweep: next_action is TOTAL — it returns a real
# Action for every reachable combination, and matches the reference table.
# --------------------------------------------------------------------------- #
def _expected(thread_status, ci_status, pushed):
"""Reference implementation of the decision table, written independently of
the module under test, to cross-check every combination."""
if pushed and ci_status is CIStatus.GREEN:
return Action.CLOSE_SUCCESS
if pushed and ci_status is CIStatus.RED:
return Action.FIX_FORWARD # budget always available in this sweep
if not pushed and thread_status in (ThreadStatus.ERROR, ThreadStatus.IDLE):
return Action.ESCALATE_PREPUSH
return Action.WAIT
@pytest.mark.parametrize(
"thread_status",
[ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING, CIStatus.GREEN, CIStatus.RED])
@pytest.mark.parametrize("pushed", [True, False])
def test_decision_table_is_total(
make_config, make_run_state, thread_status, ci_status, pushed
):
state = make_run_state(
thread_status=thread_status,
ci_status=ci_status,
pushed=pushed,
fix_forward_attempts=0,
elapsed_seconds=0.0,
)
result = next_action(state, make_config())
assert isinstance(result, Action)
assert result is _expected(thread_status, ci_status, pushed)
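
The decision table these tests exercise can be written as a short pure function. The sketch below is an illustration only, not the contents of `app.afk.run_state_machine`: the enum values, `SketchConfig`, `SketchRunState`, and `next_action_sketch` are hypothetical, and the default caps (5 attempts, 3600 seconds) are taken from the comments in the parametrized budget test.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    WAIT = "wait"
    CLOSE_SUCCESS = "close_success"
    ESCALATE_PREPUSH = "escalate_prepush"
    FIX_FORWARD = "fix_forward"
    FREEZE_ESCALATE = "freeze_escalate"


class CIStatus(Enum):
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


class ThreadStatus(Enum):
    RUNNING = "running"
    IDLE = "idle"
    ERROR = "error"


@dataclass(frozen=True)
class SketchConfig:
    # Assumed defaults, mirroring the caps named in the test comments.
    fix_forward_max_attempts: int = 5
    fix_forward_max_seconds: float = 3600.0


@dataclass(frozen=True)
class SketchRunState:
    thread_status: Optional[ThreadStatus] = None
    ci_status: Optional[CIStatus] = None
    pushed: bool = False
    fix_forward_attempts: int = 0
    elapsed_seconds: float = 0.0


def next_action_sketch(state: SketchRunState, config: SketchConfig) -> Action:
    """Pure decision: reads only its two arguments."""
    if state.pushed and state.ci_status is CIStatus.GREEN:
        return Action.CLOSE_SUCCESS
    if state.pushed and state.ci_status is CIStatus.RED:
        # Strict less-than on BOTH budgets; hitting either cap freezes.
        within_budget = (
            state.fix_forward_attempts < config.fix_forward_max_attempts
            and state.elapsed_seconds < config.fix_forward_max_seconds
        )
        return Action.FIX_FORWARD if within_budget else Action.FREEZE_ESCALATE
    if not state.pushed and state.thread_status in (ThreadStatus.ERROR, ThreadStatus.IDLE):
        return Action.ESCALATE_PREPUSH
    return Action.WAIT
```

Because the function takes no clock and performs no I/O, each row of the table reduces to a single equality check on its return value.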

265
tests/test_afk_t3_client.py Normal file

@ -0,0 +1,265 @@
"""Tests for ``app.afk.t3_client`` — the in-cluster T3 dispatch/snapshot adapter.
Everything runs against an in-memory FAKE HTTP transport; no test touches a real
T3 server. These assertions pin the **real** orchestration wire contract
(reverse-engineered from T3 v0.0.27 and verified live against t3-afk on
2026-06-15) and are deliberately strict, because the previous version of this adapter
passed a laxer fake while 400-ing the real server. The fake therefore *rejects*
a command without a ``type`` discriminator, so a regression to the old
``{"command": "..."}`` shape fails loudly here.
Pinned facts:
* the dispatch body is a BARE command keyed by ``type`` (not ``command``);
* the CLIENT mints ``threadId``/``commandId``/``messageId`` + ``createdAt``;
``dispatch`` returns the id it generated (the server replies ``{sequence}``);
* a thread lives in a project, so ``dispatch`` ensures the repo's project
(snapshot GET, then ``project.create`` iff absent) before ``thread.create``;
* ``ISSUE_IMPLEMENTER_PREAMBLE`` is prepended to the opening turn's text;
* ``send_turn`` posts a follow-up turn (no preamble) on an existing thread;
* every request carries ``Authorization: Bearer <token>``, re-read per call.
"""
import pytest
from app.afk import t3_client
from app.afk.issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE
_MODEL = "claude-sonnet-4-6"
# --------------------------------------------------------------------------- #
# Fake HTTP transport — httpx-shaped, but it ENFORCES the command envelope so a
# malformed command (the old bug) raises instead of silently passing.
# --------------------------------------------------------------------------- #
class FakeResponse:
def __init__(self, payload: dict, status_code: int = 200) -> None:
self._payload = payload
self.status_code = status_code
def json(self) -> dict:
return self._payload
def raise_for_status(self) -> None:
if self.status_code >= 400:
raise RuntimeError(f"HTTP {self.status_code}")
class FakeHttp:
"""Records each POST/GET; GETs replay staged snapshots (default: no projects,
so ``dispatch`` creates one). POST bodies are validated as real commands."""
def __init__(self, get_responses: list[dict] | None = None) -> None:
self.get_responses = list(get_responses or [])
self.posts: list[dict] = []
self.gets: list[dict] = []
def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
assert isinstance(json.get("type"), str) and json["type"], (
f"command must carry a non-empty `type` discriminator, got {json!r}"
)
self.posts.append({"url": url, "json": json, "headers": headers})
return FakeResponse({"sequence": len(self.posts)}) # the real server reply
def get(self, url: str, headers: dict) -> FakeResponse:
self.gets.append({"url": url, "headers": headers})
body = self.get_responses.pop(0) if self.get_responses else {"projects": []}
return FakeResponse(body)
# Convenience views over recorded POSTs, keyed by command type.
def commands(self, type_: str) -> list[dict]:
return [c["json"] for c in self.posts if c["json"]["type"] == type_]
def _ids():
    """Deterministic id factory: id-1, id-2, … so tests can reason about minting."""
    n = {"i": 0}

    def f() -> str:
        n["i"] += 1
        return f"id-{n['i']}"

    return f
def _resolver(repo: str) -> t3_client.ProjectRef:
"""Predictable repo -> project mapping for assertions."""
return t3_client.ProjectRef(f"proj-{repo}", f"/data/{repo}", repo)
def _client(http: FakeHttp, *, base_url="http://t3-afk:8080", token="tok-1", **kw):
return t3_client.T3Client(
base_url=base_url,
http=http,
bearer_provider=lambda: token,
project_resolver=_resolver,
id_factory=kw.pop("id_factory", _ids()),
clock=kw.pop("clock", lambda: "2026-06-15T00:00:00+00:00"),
model=_MODEL,
)
def _dispatch(http: FakeHttp, *, repo="infra", issue=42, prompt="Do the thing.", **kw):
return _client(http, **kw).dispatch(repo=repo, issue=issue, prompt=prompt)
# --------------------------------------------------------------------------- #
# dispatch — ensure-project, then create, then turn.
# --------------------------------------------------------------------------- #
def test_dispatch_ensures_project_then_creates_thread_then_turn_when_project_absent():
http = FakeHttp(get_responses=[{"projects": []}])
_dispatch(http)
# one snapshot GET (the existence check) + three POSTs in order.
assert len(http.gets) == 1
types = [c["json"]["type"] for c in http.posts]
assert types == ["project.create", "thread.create", "thread.turn.start"]
for call in http.posts:
assert call["url"] == "http://t3-afk:8080/api/orchestration/dispatch"
def test_dispatch_skips_project_create_when_project_already_exists():
http = FakeHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
_dispatch(http, repo="infra")
types = [c["json"]["type"] for c in http.posts]
assert types == ["thread.create", "thread.turn.start"] # idempotent: no re-create
def test_dispatch_uses_type_discriminator_not_command_string():
# Regression guard for the original bug: discriminator is `type`, and there is
# no legacy top-level `command` string key on any command.
http = FakeHttp()
_dispatch(http)
for c in http.posts:
assert "type" in c["json"]
assert not isinstance(c["json"].get("command"), str)
# --------------------------------------------------------------------------- #
# dispatch — thread.create real field set.
# --------------------------------------------------------------------------- #
def test_thread_create_carries_real_required_fields():
http = FakeHttp()
_dispatch(http, repo="infra")
create = http.commands("thread.create")[0]
assert create["projectId"] == "proj-infra"
assert create["modelSelection"] == {"instanceId": "claudeAgent", "model": _MODEL}
assert create["runtimeMode"] == "full-access"
assert create["interactionMode"] == "default"
# NullOr fields are present (not omitted) — the schema requires the keys.
assert create["branch"] is None
assert create["worktreePath"] is None
# client-minted identity + timestamp.
assert isinstance(create["commandId"], str) and create["commandId"]
assert isinstance(create["threadId"], str) and create["threadId"]
assert create["createdAt"] == "2026-06-15T00:00:00+00:00"
def test_dispatch_returns_client_minted_thread_id_not_a_server_value():
http = FakeHttp()
returned = _dispatch(http)
create = http.commands("thread.create")[0]
turn = http.commands("thread.turn.start")[0]
# The returned id is the one WE put on thread.create (server only sends {sequence}).
assert returned == create["threadId"] == turn["threadId"]
# --------------------------------------------------------------------------- #
# dispatch — thread.turn.start real message shape + preamble.
# --------------------------------------------------------------------------- #
def test_turn_message_has_real_shape_and_prepends_preamble():
http = FakeHttp()
_dispatch(http, prompt="Implement issue 42 body here.")
turn = http.commands("thread.turn.start")[0]
msg = turn["message"]
assert msg["role"] == "user"
assert isinstance(msg["messageId"], str) and msg["messageId"]
assert msg["attachments"] == []
assert msg["text"] == ISSUE_IMPLEMENTER_PREAMBLE + "Implement issue 42 body here."
assert turn["runtimeMode"] == "full-access"
assert turn["interactionMode"] == "default"
def test_preamble_only_on_turn_not_on_create():
http = FakeHttp()
_dispatch(http)
assert "message" not in http.commands("thread.create")[0]
# --------------------------------------------------------------------------- #
# send_turn — follow-up turn on an existing thread (multi-turn), no preamble.
# --------------------------------------------------------------------------- #
def test_send_turn_posts_single_turn_to_existing_thread_without_preamble():
http = FakeHttp()
_client(http).send_turn("thread-xyz", "Just this follow-up.")
assert [c["json"]["type"] for c in http.posts] == ["thread.turn.start"]
turn = http.commands("thread.turn.start")[0]
assert turn["threadId"] == "thread-xyz"
assert turn["message"]["text"] == "Just this follow-up." # verbatim, no preamble
assert http.gets == [] # no project work for a follow-up
# --------------------------------------------------------------------------- #
# Auth — bearer on every request, re-read per call.
# --------------------------------------------------------------------------- #
def test_every_request_sends_bearer():
http = FakeHttp()
_dispatch(http, token="secret-token")
for call in http.posts:
assert call["headers"]["Authorization"] == "Bearer secret-token"
for call in http.gets:
assert call["headers"]["Authorization"] == "Bearer secret-token"
def test_bearer_is_reread_per_request_so_rotation_is_honoured():
tokens = iter(["tok-A", "tok-B", "tok-C", "tok-D", "tok-E"])
http = FakeHttp()
client = t3_client.T3Client(
base_url="http://t3-afk:8080",
http=http,
bearer_provider=lambda: next(tokens),
project_resolver=_resolver,
id_factory=_ids(),
clock=lambda: "t",
)
client.dispatch(repo="infra", issue=1, prompt="x")
# GET(ensure) then POST(project.create) then POST(create) then POST(turn) —
# each pulled a fresh token in call order.
assert http.gets[0]["headers"]["Authorization"] == "Bearer tok-A"
assert http.posts[0]["headers"]["Authorization"] == "Bearer tok-B"
assert http.posts[1]["headers"]["Authorization"] == "Bearer tok-C"
assert http.posts[2]["headers"]["Authorization"] == "Bearer tok-D"
# --------------------------------------------------------------------------- #
# snapshot — GET + parse.
# --------------------------------------------------------------------------- #
def test_snapshot_gets_endpoint_and_returns_parsed_body():
fleet = {"threads": [{"id": "t1", "latestTurn": {"state": "running"}}], "projects": []}
http = FakeHttp(get_responses=[fleet])
result = _client(http).snapshot()
assert result == fleet
assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"
assert http.posts == []
# --------------------------------------------------------------------------- #
# base_url normalisation + error surfacing.
# --------------------------------------------------------------------------- #
def test_trailing_slash_in_base_url_is_normalised():
http = FakeHttp()
client = _client(http, base_url="http://t3-afk:8080/")
client.dispatch(repo="infra", issue=1, prompt="x")
assert http.posts[0]["url"] == "http://t3-afk:8080/api/orchestration/dispatch"
assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"
def test_dispatch_raises_and_short_circuits_when_a_post_errors():
    class ErroringHttp(FakeHttp):
        def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
            super().post(url, json, headers)  # validates + records
            return FakeResponse({}, status_code=500)

    http = ErroringHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
    with pytest.raises(RuntimeError):
        _dispatch(http, repo="infra")
    # Project already existed, so the FIRST post is thread.create; it failed,
    # so thread.turn.start never fired.
    assert [c["json"]["type"] for c in http.posts] == ["thread.create"]
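
The wire contract these tests pin (bare commands keyed by `type`, client-minted ids and timestamp, preamble on the opening turn only) can be illustrated with a small builder. This is a hedged sketch, not the adapter in `app.afk.t3_client`: `build_dispatch_commands` and `SKETCH_PREAMBLE` are hypothetical, the preamble text is a placeholder, and the `commandId` and `createdAt` fields on the turn command are assumptions, since the tests above only assert them on `thread.create`.

```python
import uuid
from datetime import datetime, timezone

# Placeholder only: the real ISSUE_IMPLEMENTER_PREAMBLE text is not shown here.
SKETCH_PREAMBLE = "[issue-implementer preamble]\n\n"


def _new_uuid() -> str:
    return str(uuid.uuid4())


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


def build_dispatch_commands(project_id, prompt, model, new_id=_new_uuid, now=_utc_now):
    """Return (thread_id, [thread.create, thread.turn.start]) as plain dicts."""
    thread_id = new_id()  # minted by the client, not returned by the server
    created_at = now()
    create = {
        "type": "thread.create",  # discriminator is `type`, not `command`
        "commandId": new_id(),
        "threadId": thread_id,
        "projectId": project_id,
        "modelSelection": {"instanceId": "claudeAgent", "model": model},
        "runtimeMode": "full-access",
        "interactionMode": "default",
        "branch": None,  # nullable keys are sent explicitly, not omitted
        "worktreePath": None,
        "createdAt": created_at,
    }
    turn = {
        "type": "thread.turn.start",
        "commandId": new_id(),  # assumption: the turn also carries a commandId
        "threadId": thread_id,
        "message": {
            "messageId": new_id(),
            "role": "user",
            "text": SKETCH_PREAMBLE + prompt,  # preamble on the opening turn only
            "attachments": [],
        },
        "runtimeMode": "full-access",
        "interactionMode": "default",
        "createdAt": created_at,  # assumption: same timestamp as the create
    }
    return thread_id, [create, turn]
```

Injecting `new_id` and `now` is what lets a test substitute a deterministic id factory and a fixed clock, in the same spirit as the `_ids()` and `clock` arguments used by the `_client` helper.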

92
tests/test_afk_t3_live.py Normal file

@ -0,0 +1,92 @@
"""LIVE smoke test for ``app.afk.t3_client`` against a real T3 instance.
Skipped by default. The unit tests (``test_afk_t3_client``) pin the wire shape
against a contract-accurate fake; this file proves the *same code* actually talks
to a live T3: the guard that "green tests" mean "wired to T3", which the earlier
fake-only suite did NOT provide (it was green while the real server 400'd).
It is opt-in because the orchestration API is in-cluster (ClusterIP + an
Authentik-gated ingress), so it can't run in CI without cluster access. Run it
from inside the cluster, or via a port-forward, with a bearer minted on the pod::
    # bearer (on the t3-afk pod, as the node user):
    #   t3 auth session issue --token-only --base-dir /data/t3 --ttl 30m
    kubectl -n t3-afk port-forward deploy/t3-afk 3773:3773 &
    T3_AFK_BASE_URL=http://127.0.0.1:3773 T3_AFK_TOKEN=<bearer> \
        python3 -m pytest tests/test_afk_t3_live.py -v
The read-only snapshot check is always safe. The full dispatch round-trip
(create thread + turn + verify it appears, then delete it) only runs with
``T3_AFK_SMOKE_DISPATCH=1`` since it spends a (tiny) agent turn.
"""
import os
import time
import pytest
from app.afk import t3_client
_BASE_URL = os.environ.get("T3_AFK_BASE_URL")
_TOKEN = os.environ.get("T3_AFK_TOKEN")
pytestmark = pytest.mark.skipif(
not (_BASE_URL and _TOKEN),
reason="set T3_AFK_BASE_URL + T3_AFK_TOKEN to run the live T3 smoke test",
)
def _real_client():
import httpx # local import so the module imports fine without httpx installed
return t3_client.T3Client(
base_url=_BASE_URL,
http=httpx.Client(timeout=30.0),
bearer_provider=lambda: _TOKEN,
)
def test_live_snapshot_has_the_real_shape():
"""A real snapshot parses and carries the keys the watcher/adapter depend on:
``threads`` + ``projects``, and any thread exposes ``latestTurn`` (the
liveness source), not a top-level ``status``."""
snap = _real_client().snapshot()
assert isinstance(snap, dict)
assert "threads" in snap and "projects" in snap
for thread in snap["threads"]:
assert "id" in thread
# liveness lives under latestTurn.state (the contract this suite guards)
assert "status" not in thread, "real threads have no top-level status field"
@pytest.mark.skipif(
os.environ.get("T3_AFK_SMOKE_DISPATCH") != "1",
reason="set T3_AFK_SMOKE_DISPATCH=1 to run the dispatch round-trip (spends a turn)",
)
def test_live_dispatch_round_trip_then_cleanup():
"""End-to-end against the real server: ``dispatch`` (ensure-project + create +
turn) succeeds and the new thread shows up in the snapshot. Cleans up the
thread it created so the cockpit isn't littered."""
import httpx
repo = "afk-smoke/roundtrip"
client = _real_client()
thread_id = client.dispatch(repo, 1, "Reply with just: ok. Do not use any tools.")
assert isinstance(thread_id, str) and thread_id
# The thread must appear in the fleet read-model (poll briefly — dispatch is
# accepted asynchronously).
found = False
for _ in range(10):
if any(t.get("id") == thread_id for t in client.snapshot().get("threads", [])):
found = True
break
time.sleep(1.0)
assert found, f"dispatched thread {thread_id} never appeared in the snapshot"
# Cleanup: delete the throwaway thread (raw command — not part of the adapter).
httpx.post(
f"{_BASE_URL.rstrip('/')}/api/orchestration/dispatch",
headers={"Authorization": f"Bearer {_TOKEN}"},
json={"type": "thread.delete", "commandId": t3_client._uuid(), "threadId": thread_id},
timeout=30.0,
).raise_for_status()
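
The round-trip test above polls the snapshot because dispatch is accepted asynchronously. That bounded poll can be factored into a small helper; the sketch below is illustrative only, and `wait_until` is a hypothetical name that does not appear in the repository.

```python
import time


def wait_until(predicate, attempts=10, delay=1.0, sleep=time.sleep):
    """Call `predicate` up to `attempts` times, sleeping `delay` seconds between tries.

    Returns True as soon as the predicate is truthy, False if it never is.
    `sleep` is injectable so a test can run without real waiting.
    """
    for attempt in range(attempts):
        if predicate():
            return True
        if attempt < attempts - 1:  # no point sleeping after the final try
            sleep(delay)
    return False
```

With such a helper, the loop in the live test would reduce to a single call whose predicate checks whether the dispatched thread id appears in `client.snapshot()`.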

493
tests/test_afk_tracker.py Normal file

@ -0,0 +1,493 @@
"""Tests for ``app.afk.tracker`` — the GitHub issues adapter.
The ``Tracker`` is the loop's read/write port onto the issue tracker. It wraps
an injected GitHub client (the real one shells out to ``gh``; here we inject a
FAKE that records calls and replays staged data) and holds all the *business*
logic the loop depends on: turning raw issues into ``Issue`` records with
``blocked_by`` parsed, ``labeled_by_trusted`` decided fail-closed from the label
event actor, and ``priority`` read off a priority label. No test here reaches a
real ``gh``, GitHub/Forgejo, or the network.
"""
import pytest
from app.afk.tracker import (
DEFAULT_TRUSTED_ASSOCIATIONS,
GitHubClient,
Tracker,
)
from app.afk.types import Issue
# --------------------------------------------------------------------------- #
# Fake GitHub client — the injected port. Records every mutating call and
# replays issues / label-events staged per repo. Implements the GitHubClient
# Protocol the Tracker depends on.
# --------------------------------------------------------------------------- #
class FakeGitHub:
def __init__(self) -> None:
# repo -> list of raw issue dicts (gh issue list --json shape)
self._issues: dict[str, list[dict]] = {}
# (repo, number) -> list of label-event dicts (who added which label)
self._events: dict[tuple[str, int], list[dict]] = {}
# recorded mutations
self.labels_added: list[tuple[str, int, str]] = []
self.labels_removed: list[tuple[str, int, str]] = []
self.comments: list[tuple[str, int, str]] = []
self.closed: list[tuple[str, int]] = []
# --- staging helpers (test-only) --- #
def seed_issues(self, repo: str, issues: list[dict]) -> None:
self._issues[repo] = issues
def seed_label_events(self, repo: str, number: int, events: list[dict]) -> None:
self._events[(repo, number)] = events
# --- GitHubClient surface --- #
def list_issues(self, repo: str, label: str) -> list[dict]:
return [
issue
for issue in self._issues.get(repo, [])
if label in [lbl["name"] for lbl in issue.get("labels", [])]
]
def label_events(self, repo: str, number: int) -> list[dict]:
return list(self._events.get((repo, number), []))
def add_label(self, repo: str, number: int, label: str) -> None:
self.labels_added.append((repo, number, label))
def remove_label(self, repo: str, number: int, label: str) -> None:
self.labels_removed.append((repo, number, label))
def comment(self, repo: str, number: int, body: str) -> None:
self.comments.append((repo, number, body))
def close(self, repo: str, number: int) -> None:
self.closed.append((repo, number))
# --------------------------------------------------------------------------- #
# Raw-issue / event builders matching the gh JSON shapes the real client emits.
# --------------------------------------------------------------------------- #
def _raw_issue(
number: int = 1,
labels: list[str] | None = None,
body: str = "",
) -> dict:
return {
"number": number,
"labels": [{"name": name} for name in (labels or ["ready-for-agent"])],
"body": body,
}
def _label_event(label: str, association: str = "OWNER", actor: str = "viktorbarzin") -> dict:
# Mirrors the `gh api .../timeline` "labeled" event shape we care about.
return {
"event": "labeled",
"label": {"name": label},
"actor": {"login": actor},
"author_association": association,
}
@pytest.fixture
def gh() -> FakeGitHub:
return FakeGitHub()
@pytest.fixture
def tracker(gh: FakeGitHub) -> Tracker:
return Tracker(gh)
# --------------------------------------------------------------------------- #
# Construction / contract.
# --------------------------------------------------------------------------- #
def test_tracker_wraps_injected_client(gh: FakeGitHub):
t = Tracker(gh)
assert t.client is gh
def test_fake_satisfies_protocol(gh: FakeGitHub):
# The fake must be usable where a GitHubClient is expected (structural typing).
assert isinstance(gh, GitHubClient)
def test_default_trusted_associations_are_collaborator_or_above():
assert DEFAULT_TRUSTED_ASSOCIATIONS == frozenset({"OWNER", "MEMBER", "COLLABORATOR"})
# --------------------------------------------------------------------------- #
# list_ready — the read path that builds Issue records.
# --------------------------------------------------------------------------- #
def test_list_ready_returns_issue_objects(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=7)])
gh.seed_label_events("infra", 7, [_label_event("ready-for-agent")])
issues = tracker.list_ready(["infra"])
assert len(issues) == 1
issue = issues[0]
assert isinstance(issue, Issue)
assert issue.number == 7
assert issue.repo == "infra"
assert issue.labels == ["ready-for-agent"]
def test_list_ready_spans_multiple_repos(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_issues("crawler", [_raw_issue(number=2)])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
gh.seed_label_events("crawler", 2, [_label_event("ready-for-agent")])
issues = tracker.list_ready(["infra", "crawler"])
assert {(i.repo, i.number) for i in issues} == {("infra", 1), ("crawler", 2)}
def test_list_ready_empty_when_no_ready_issues(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=1, labels=["bug"])])
assert tracker.list_ready(["infra"]) == []
def test_list_ready_queries_with_configured_ready_label(gh: FakeGitHub):
    # A Tracker built with a custom ready label must query the client for *that*
    # label, not the default.
    seen: dict[str, str] = {}

    class _RecordingGitHub(FakeGitHub):
        def list_issues(self, repo: str, label: str) -> list[dict]:
            seen["label"] = label
            return super().list_issues(repo, label)

    rec = _RecordingGitHub()
    rec.seed_issues("infra", [_raw_issue(number=1, labels=["queue-me"])])
    rec.seed_label_events("infra", 1, [_label_event("queue-me")])
    t = Tracker(rec, ready_label="queue-me")
    issues = t.list_ready(["infra"])
    assert seen["label"] == "queue-me"
    assert len(issues) == 1
# --------------------------------------------------------------------------- #
# Trust gate — labeled_by_trusted is decided from the label-event actor,
# fail-closed.
# --------------------------------------------------------------------------- #
def test_owner_labeled_issue_is_trusted(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association="OWNER")])
assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True
@pytest.mark.parametrize("association", ["MEMBER", "COLLABORATOR"])
def test_collaborator_and_member_are_trusted(gh: FakeGitHub, tracker: Tracker, association: str):
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association=association)])
assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True
@pytest.mark.parametrize("association", ["NONE", "CONTRIBUTOR", "FIRST_TIME_CONTRIBUTOR", ""])
def test_untrusted_association_is_not_trusted(gh: FakeGitHub, tracker: Tracker, association: str):
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association=association)])
assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False
def test_missing_label_event_is_not_trusted(gh: FakeGitHub, tracker: Tracker):
# The issue carries the ready label, but no event records WHO applied it —
# fail closed: an unattributable label is never trusted.
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_label_events("infra", 1, [])
assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False
def test_trust_uses_latest_application_of_ready_label(gh: FakeGitHub, tracker: Tracker):
# If the ready label was removed and re-added, the MOST RECENT application
# decides trust — a trusted re-label after an untrusted one is trusted.
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_label_events(
"infra",
1,
[
_label_event("ready-for-agent", association="NONE", actor="drive-by"),
_label_event("ready-for-agent", association="OWNER", actor="viktorbarzin"),
],
)
assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True
def test_trust_ignores_events_for_other_labels(gh: FakeGitHub, tracker: Tracker):
# A trusted actor labeling something else must not make the ready label trusted.
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_label_events(
"infra",
1,
[
_label_event("priority:high", association="OWNER"),
_label_event("ready-for-agent", association="NONE", actor="drive-by"),
],
)
assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False
def test_custom_trusted_associations_override_default(gh: FakeGitHub):
# Tighten the trust set to OWNER only: a COLLABORATOR label is no longer trusted.
t = Tracker(gh, trusted_associations=frozenset({"OWNER"}))
gh.seed_issues("infra", [_raw_issue(number=1)])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association="COLLABORATOR")])
assert t.list_ready(["infra"])[0].labeled_by_trusted is False
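The trust-gate tests above pin down a small decision rule: only `labeled` events for the ready label count, the most recent one decides, and a missing or unattributable event fails closed. A minimal sketch of that rule follows. The helper name `is_trusted_label` and its signature are hypothetical (the real `Tracker` internals are not shown in this diff), and it assumes events arrive in chronological order.

```python
# Mirrors DEFAULT_TRUSTED_ASSOCIATIONS as asserted by the tests above.
TRUSTED = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def is_trusted_label(events, ready_label="ready-for-agent", trusted=TRUSTED):
    """Hypothetical helper: trust is decided by the latest ready-label event."""
    latest = None
    for event in events:  # assumed chronological, oldest first
        if event.get("event") != "labeled":
            continue  # comments, unlabels, etc. never grant trust
        if (event.get("label") or {}).get("name") != ready_label:
            continue  # a trusted actor applying another label is irrelevant
        latest = event
    if latest is None:
        return False  # fail closed: nobody is on record as applying the label
    return latest.get("author_association", "") in trusted
```

Scanning the whole list rather than stopping at the first match is what lets a trusted re-label override an earlier untrusted one, and vice versa.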
# --------------------------------------------------------------------------- #
# blocked_by — parsed from the issue body's "Blocked by" references.
# --------------------------------------------------------------------------- #
def test_blocked_by_empty_when_body_has_no_references(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=1, body="just implement the thing")])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].blocked_by == []
def test_blocked_by_parses_single_reference(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=5, body="Blocked by #3")])
gh.seed_label_events("infra", 5, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].blocked_by == [3]
def test_blocked_by_parses_multiple_references(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=9, body="Blocked by #3, #4 and #10")])
gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].blocked_by == [3, 4, 10]
def test_blocked_by_is_case_insensitive_and_dedupes(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=9, body="blocked BY #3 and Blocked by #3, #4")])
gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].blocked_by == [3, 4]
def test_blocked_by_ignores_plain_issue_mentions(gh: FakeGitHub, tracker: Tracker):
# A bare "#7" that is not part of a "Blocked by" clause is NOT a blocker.
gh.seed_issues("infra", [_raw_issue(number=9, body="See #7 for context. Blocked by #3")])
gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].blocked_by == [3]
def test_blocked_by_tolerates_missing_body(gh: FakeGitHub, tracker: Tracker):
issue = _raw_issue(number=1)
issue["body"] = None # gh returns null for an empty body
gh.seed_issues("infra", [issue])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].blocked_by == []
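The `blocked_by` tests above describe a parser that only honours issue numbers inside a "Blocked by" clause, ignores case, removes duplicates while keeping first-seen order, and tolerates a `None` body. One way to satisfy those cases is sketched below. The function `parse_blocked_by` and both regexes are assumptions for illustration, not the implementation under test.

```python
import re

# A "Blocked by" clause: the phrase followed by one or more #N references
# joined by commas or "and". Hypothetical pattern, chosen to fit the tests.
_CLAUSE = re.compile(r"blocked\s+by\s+(#\d+(?:\s*(?:,|and)\s*#\d+)*)", re.IGNORECASE)
_REF = re.compile(r"#(\d+)")


def parse_blocked_by(body):
    """Return blocker issue numbers in first-seen order, without duplicates."""
    if not body:  # gh returns null for an empty body
        return []
    blockers = []
    for clause in _CLAUSE.finditer(body):
        for ref in _REF.findall(clause.group(1)):
            number = int(ref)
            if number not in blockers:
                blockers.append(number)
    return blockers
```

Anchoring on the clause first, then extracting references from the captured group, is what keeps a bare mention such as "See #7" out of the result.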
# --------------------------------------------------------------------------- #
# priority — read off a priority label (lower number runs first).
# --------------------------------------------------------------------------- #
def test_priority_defaults_to_zero_without_priority_label(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=1, labels=["ready-for-agent"])])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].priority == 0
def test_priority_read_from_priority_label(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues("infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:2"])])
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].priority == 2
def test_priority_lowest_label_wins_when_several(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues(
"infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:5", "priority:1"])]
)
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].priority == 1
def test_priority_ignores_non_numeric_priority_label(gh: FakeGitHub, tracker: Tracker):
gh.seed_issues(
"infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:high"])]
)
gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
assert tracker.list_ready(["infra"])[0].priority == 0
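The priority rules exercised above (default 0, numeric `priority:N` labels only, lowest value wins) fit in a few lines. The sketch below is illustrative: `parse_priority` is a hypothetical name and it takes plain label names, which is how `Issue.labels` appears in these tests.

```python
import re

_PRIORITY = re.compile(r"priority:(\d+)")


def parse_priority(labels):
    """Lowest numeric priority label wins; 0 when none is present."""
    values = [
        int(match.group(1))
        for name in labels
        if (match := _PRIORITY.fullmatch(name))  # skips e.g. "priority:high"
    ]
    return min(values, default=0)
```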
# --------------------------------------------------------------------------- #
# Mutations delegate to the injected client.
# --------------------------------------------------------------------------- #
def test_add_label_delegates(gh: FakeGitHub, tracker: Tracker):
tracker.add_label("infra", 7, "agent-in-progress")
assert gh.labels_added == [("infra", 7, "agent-in-progress")]
def test_remove_label_delegates(gh: FakeGitHub, tracker: Tracker):
tracker.remove_label("infra", 7, "agent-in-progress")
assert gh.labels_removed == [("infra", 7, "agent-in-progress")]
def test_comment_delegates(gh: FakeGitHub, tracker: Tracker):
tracker.comment("infra", 7, "phase: tests-red done")
assert gh.comments == [("infra", 7, "phase: tests-red done")]
def test_close_delegates(gh: FakeGitHub, tracker: Tracker):
tracker.close("infra", 7)
assert gh.closed == [("infra", 7)]
# --------------------------------------------------------------------------- #
# The concrete gh-CLI-backed client builds no-shell argv and parses JSON; we
# inject a fake runner so no real `gh` is ever spawned.
# --------------------------------------------------------------------------- #
from app.afk.tracker import GhCliClient # noqa: E402
class _FakeRunner:
    """Stand-in for the subprocess runner GhCliClient shells out through.

    Records every argv and returns staged stdout per command, so we can pin the
    exact `gh` invocations without spawning a process.
    """

    def __init__(self, responses: dict[tuple[str, ...], str] | None = None) -> None:
        self.calls: list[tuple[str, ...]] = []
        self._responses = responses or {}

    def __call__(self, argv: list[str]) -> str:
        key = tuple(argv)
        self.calls.append(key)
        return self._responses.get(key, "")
def test_gh_cli_list_issues_builds_no_shell_argv_and_parses_json():
argv = (
"gh", "issue", "list", "--repo", "owner/infra",
"--label", "ready-for-agent", "--state", "open",
"--json", "number,labels,body", "--limit", "100",
)
runner = _FakeRunner({argv: '[{"number": 4, "labels": [{"name": "ready-for-agent"}], "body": "x"}]'})
client = GhCliClient(repo_owner="owner", run=runner)
issues = client.list_issues("infra", "ready-for-agent")
assert runner.calls == [argv]
assert issues == [{"number": 4, "labels": [{"name": "ready-for-agent"}], "body": "x"}]
def test_gh_cli_list_issues_empty_output_is_empty_list():
runner = _FakeRunner() # returns "" for everything
client = GhCliClient(repo_owner="owner", run=runner)
assert client.list_issues("infra", "ready-for-agent") == []
def test_gh_cli_label_events_filters_labeled_events():
timeline = (
'[{"event": "commented"},'
' {"event": "labeled", "label": {"name": "ready-for-agent"},'
' "actor": {"login": "viktorbarzin"}, "author_association": "OWNER"}]'
)
argv = (
"gh", "api",
"repos/owner/infra/issues/4/timeline",
"--paginate",
"-H", "Accept: application/vnd.github+json",
)
runner = _FakeRunner({argv: timeline})
client = GhCliClient(repo_owner="owner", run=runner)
events = client.label_events("infra", 4)
assert runner.calls == [argv]
assert [e["event"] for e in events] == ["labeled"]
assert events[0]["label"]["name"] == "ready-for-agent"
def test_gh_cli_add_label_builds_argv():
runner = _FakeRunner()
client = GhCliClient(repo_owner="owner", run=runner)
client.add_label("infra", 4, "agent-in-progress")
assert runner.calls == [
("gh", "issue", "edit", "4", "--repo", "owner/infra", "--add-label", "agent-in-progress")
]
def test_gh_cli_remove_label_builds_argv():
runner = _FakeRunner()
client = GhCliClient(repo_owner="owner", run=runner)
client.remove_label("infra", 4, "agent-in-progress")
assert runner.calls == [
("gh", "issue", "edit", "4", "--repo", "owner/infra", "--remove-label", "agent-in-progress")
]
def test_gh_cli_comment_builds_argv():
runner = _FakeRunner()
client = GhCliClient(repo_owner="owner", run=runner)
client.comment("infra", 4, "phase update")
assert runner.calls == [
("gh", "issue", "comment", "4", "--repo", "owner/infra", "--body", "phase update")
]
def test_gh_cli_close_builds_argv():
runner = _FakeRunner()
client = GhCliClient(repo_owner="owner", run=runner)
client.close("infra", 4)
assert runner.calls == [
("gh", "issue", "close", "4", "--repo", "owner/infra")
]
def test_gh_cli_end_to_end_through_tracker():
# Wire the gh-CLI client (fake runner) behind a real Tracker and confirm a
# full read produces a correctly-decoded, trusted, blocked Issue.
list_argv = (
"gh", "issue", "list", "--repo", "owner/infra",
"--label", "ready-for-agent", "--state", "open",
"--json", "number,labels,body", "--limit", "100",
)
timeline_argv = (
"gh", "api",
"repos/owner/infra/issues/12/timeline",
"--paginate",
"-H", "Accept: application/vnd.github+json",
)
runner = _FakeRunner({
list_argv: (
'[{"number": 12,'
' "labels": [{"name": "ready-for-agent"}, {"name": "priority:3"}],'
' "body": "Blocked by #11"}]'
),
timeline_argv: (
'[{"event": "labeled", "label": {"name": "ready-for-agent"},'
' "actor": {"login": "viktorbarzin"}, "author_association": "OWNER"}]'
),
})
tracker = Tracker(GhCliClient(repo_owner="owner", run=runner))
issue = tracker.list_ready(["infra"])[0]
assert issue.number == 12
assert issue.repo == "infra"
assert issue.blocked_by == [11]
assert issue.priority == 3
assert issue.labeled_by_trusted is True
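The `GhCliClient` tests inject a runner callable that takes an argv list and returns stdout, and they expect empty output to decode as an empty list. A production runner compatible with that contract could look like the sketch below. Both `run_argv` and `parse_json_list` are hypothetical names; the diff does not show the real default runner.

```python
import json
import subprocess


def run_argv(argv):
    """Hypothetical runner: execute argv directly, with no shell in between."""
    # Passing a list (and leaving shell=False) means arguments such as issue
    # bodies or label names are never interpreted by a shell.
    completed = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return completed.stdout


def parse_json_list(stdout):
    """Decode CLI JSON output, treating empty output as an empty list."""
    return json.loads(stdout) if stdout.strip() else []
```

Because the runner is just a callable, the tests can swap in `_FakeRunner` and assert on the exact argv tuples without spawning `gh`.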

tests/test_afk_watcher.py Normal file

@ -0,0 +1,403 @@
"""Integration tests for ``app.afk.watcher`` — the in-flight run driver.
These wire the REAL pure cores (the actual ``run_state_machine.next_action`` and
``phase_checklist.render``) to the in-memory adapter FAKES from ``conftest``
(``FakeT3Client`` / ``FakeTracker`` / ``FakeCIWatcher`` / ``FakeNotifier``). No
test touches a real T3 server, GitHub/Forgejo, the cluster, or Slack; the
watcher is exercised end to end with fakes only at the I/O edges.
What one watch tick must do (the watcher contract), given an in-flight run
``(issue, thread_id, commit, bookkeeping)``:
* assemble a ``RunState`` from ``t3_client.snapshot()`` (the thread's liveness)
+ ``ci_watcher.status(repo, commit)`` (the CI verdict, only when something is
pushed) + the run's own ``pushed`` / ``fix_forward_attempts`` /
``elapsed_seconds`` bookkeeping, and feed it to the pure state machine;
* **CLOSE_SUCCESS** -> ``tracker.close``, drop the in-progress label, post the
DONE checklist, and ring the ``done`` doorbell;
* **ESCALATE_PREPUSH / FREEZE_ESCALATE** -> drop the in-progress label, relabel
``ready-for-human``, ring the ``needs-human`` / ``frozen`` doorbell, post the
checklist; the run is handed back to a human;
* **FIX_FORWARD** -> dispatch a corrective turn (``t3_client.dispatch``), bump
the fix-forward attempt count, keep the run in flight, refresh the checklist;
NOT terminal, so no doorbell and no label churn;
* **WAIT** -> just refresh the progress checklist and keep waiting; no labels,
no close, no doorbell, no dispatch.
"""
import pytest
from app.afk import watcher
from app.afk.notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN
from app.afk.types import CIStatus, Issue
# --------------------------------------------------------------------------- #
# Helpers.
# --------------------------------------------------------------------------- #
READY_FOR_HUMAN = "ready-for-human"
def _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier) -> watcher.Watcher:
    return watcher.Watcher(
        t3_client=fake_t3,
        tracker=fake_tracker,
        ci_watcher=fake_ci,
        notifier=fake_notifier,
    )


def _run(
    issue: Issue,
    thread_id: str = "thread-0",
    commit: str | None = None,
    fix_forward_attempts: int = 0,
    elapsed_seconds: float = 0.0,
) -> watcher.InFlightRun:
    return watcher.InFlightRun(
        issue=issue,
        thread_id=thread_id,
        commit=commit,
        fix_forward_attempts=fix_forward_attempts,
        elapsed_seconds=elapsed_seconds,
    )


# Map the tests' abstract liveness vocab to T3's REAL ``latestTurn.state`` strings
# so call sites stay readable while the snapshot carries the true shape the
# watcher parses (a finished turn is "completed", a failed one "errored",
# "running" is itself real). Unknown values pass through verbatim.
_REAL_STATE = {"idle": "completed", "error": "errored"}


def _snapshot(thread_id: str, status: str) -> dict:
    """A fleet snapshot with one thread whose latest turn is in ``status``.

    Real shape: ``threads[].latestTurn.state`` (not a top-level ``status`` field).
    """
    return {
        "threads": [
            {"id": thread_id, "latestTurn": {"state": _REAL_STATE.get(status, status)}}
        ]
    }


def _labels(fake_tracker):
    return [(op, repo, num, lbl) for (op, repo, num, lbl) in fake_tracker.label_ops]


def _kinds(fake_notifier):
    return [n["kind"] for n in fake_notifier.sent]
# --------------------------------------------------------------------------- #
# WAIT — agent still working, nothing pushed: refresh the checklist, no action.
# --------------------------------------------------------------------------- #
def test_wait_refreshes_checklist_and_does_nothing_else(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "running"))
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue), make_config()
)
assert result.action.value == "wait"
assert result.terminal is False
assert fake_tracker.closed == []
assert _labels(fake_tracker) == [] # no label churn while waiting
assert fake_notifier.sent == [] # no doorbell
assert fake_t3.dispatched == [] # no corrective turn
# The progress checklist was posted as a comment.
assert len(fake_tracker.comments) == 1
repo, num, body = fake_tracker.comments[0]
assert (repo, num) == ("infra", 7)
assert "AFK run progress" in body
def test_wait_when_thread_missing_from_snapshot(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
# No snapshot entry for this thread yet -> thread_status None -> WAIT.
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot({"threads": []})
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue), make_config()
)
assert result.action.value == "wait"
assert result.terminal is False
def test_pushed_ci_pending_waits(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "running"))
# commit present (pushed) but CI not yet decided -> PENDING -> WAIT.
fake_ci.set_status("infra", "deadbeef", CIStatus.PENDING)
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit="deadbeef"), make_config()
)
assert result.action.value == "wait"
assert fake_tracker.closed == []
# --------------------------------------------------------------------------- #
# CLOSE_SUCCESS — pushed + CI green: close, unlabel, DONE checklist, doorbell.
# --------------------------------------------------------------------------- #
def test_close_success_closes_and_unlabels_and_notifies(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit="cafef00d"), make_config()
)
assert result.action.value == "close_success"
assert result.terminal is True
assert fake_tracker.closed == [("infra", 7)]
# in-progress label removed (no ready-for-human on the happy path).
assert ("remove", "infra", 7, "agent-in-progress") in _labels(fake_tracker)
assert ("add", "infra", 7, READY_FOR_HUMAN) not in _labels(fake_tracker)
# done doorbell fired with the thread deep-link target.
assert _kinds(fake_notifier) == [KIND_DONE]
assert fake_notifier.sent[0]["thread_id"] == "thread-0"
assert fake_notifier.sent[0]["issue"] is issue
def test_close_success_posts_done_checklist(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
_watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit="cafef00d"), make_config()
)
# The final checklist shows the run DONE — every phase checked.
body = fake_tracker.comments[-1][2]
assert "Done — issue closed" in body
assert "- [ ]" not in body # nothing left unchecked at DONE
# --------------------------------------------------------------------------- #
# ESCALATE_PREPUSH — agent stalled/errored before any push: hand to a human.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("thread_state", ["errored", "completed"])
def test_escalate_prepush_relabels_and_notifies(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, thread_state
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", thread_state))
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
assert result.action.value == "escalate_prepush"
assert result.terminal is True
assert fake_tracker.closed == [] # NOT closed — needs a human
labels = _labels(fake_tracker)
assert ("remove", "infra", 7, "agent-in-progress") in labels
assert ("add", "infra", 7, READY_FOR_HUMAN) in labels
assert _kinds(fake_notifier) == [KIND_NEEDS_HUMAN]
# --------------------------------------------------------------------------- #
# FREEZE_ESCALATE — pushed, CI red, fix-forward budget exhausted: freeze + page.
# --------------------------------------------------------------------------- #
def test_freeze_escalate_relabels_and_notifies(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
config = make_config(fix_forward_max_attempts=3)
# attempts already at the cap -> budget exhausted -> FREEZE_ESCALATE.
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit="badc0de", fix_forward_attempts=3), config
)
assert result.action.value == "freeze_escalate"
assert result.terminal is True
assert fake_tracker.closed == []
labels = _labels(fake_tracker)
assert ("remove", "infra", 7, "agent-in-progress") in labels
assert ("add", "infra", 7, READY_FOR_HUMAN) in labels
assert _kinds(fake_notifier) == [KIND_FROZEN]
# --------------------------------------------------------------------------- #
# FIX_FORWARD — pushed, CI red, budget remaining: corrective turn, stay in flight.
# --------------------------------------------------------------------------- #
def test_fix_forward_dispatches_corrective_turn(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
config = make_config(fix_forward_max_attempts=5)
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit="badc0de", fix_forward_attempts=1), config
)
assert result.action.value == "fix_forward"
assert result.terminal is False
# A corrective turn was dispatched against the same repo/issue.
assert len(fake_t3.dispatched) == 1
assert (fake_t3.dispatched[0]["repo"], fake_t3.dispatched[0]["issue"]) == ("infra", 7)
# Attempt count advanced and is surfaced on the result for the caller's
# bookkeeping on the next tick.
assert result.fix_forward_attempts == 2
# Not terminal: no close, no ready-for-human, no doorbell.
assert fake_tracker.closed == []
assert ("add", "infra", 7, READY_FOR_HUMAN) not in _labels(fake_tracker)
assert fake_notifier.sent == []
def test_fix_forward_updates_thread_id_to_corrective_turn(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
# The corrective dispatch spawns a new thread; the result carries the new id
# so the next tick polls the right thread.
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, thread_id="thread-old", commit="badc0de"), make_config()
)
assert result.thread_id == "thread-0" # FakeT3Client hands back thread-0
assert result.thread_id != "thread-old"
def test_fix_forward_note_appears_in_checklist(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
_watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit="badc0de", fix_forward_attempts=1), make_config()
)
body = fake_tracker.comments[-1][2]
assert "Fix-forward" in body
# --------------------------------------------------------------------------- #
# Unknown / unrecognised thread status folds to "keep waiting" (fail-safe).
# --------------------------------------------------------------------------- #
def test_unknown_thread_status_waits(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "provisioning")) # not a known status
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
# Unknown status must not escalate or close — treat as "no status yet".
assert result.action.value == "wait"
assert fake_tracker.closed == []
assert fake_notifier.sent == []
# --------------------------------------------------------------------------- #
# Real T3 ``latestTurn.state`` strings map to the right liveness (contract guard
# against the snapshot-shape drift that the previous adapter/fake masked).
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("state", ["running", "in_progress", "pending", "queued", "pendingInit"])
def test_real_in_progress_states_keep_waiting(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, state
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": state}}]})
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
assert result.action.value == "wait" # still working -> keep polling
def test_real_errored_state_escalates_when_nothing_pushed(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
# The real failure state is "errored" (not "error"); with nothing pushed it
# is a pre-push escalation, not a freeze.
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": "errored"}}]})
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
assert result.action.value == "escalate_prepush"
def test_thread_present_but_no_turn_yet_waits(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
# A freshly-created thread has no latestTurn -> no usable status yet -> WAIT.
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot({"threads": [{"id": "thread-0"}]})
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
assert result.action.value == "wait"
# --------------------------------------------------------------------------- #
# Terminal cleanup only happens once / cleanly: a terminal tick posts exactly
# one checklist comment (no double-commenting on the way out).
# --------------------------------------------------------------------------- #
def test_terminal_tick_posts_exactly_one_checklist(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
_watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit="cafef00d"), make_config()
)
assert len(fake_tracker.comments) == 1
# --------------------------------------------------------------------------- #
# CI status is only queried when something is pushed (don't hit CI for an
# unpushed run — there's no commit to check).
# --------------------------------------------------------------------------- #
def test_ci_not_queried_when_nothing_pushed(
    fake_t3, fake_tracker, fake_notifier, make_issue, make_config
):
    class ExplodingCI:
        def status(self, repo, commit):
            raise AssertionError("CI must not be queried with no pushed commit")

    issue = make_issue(number=7, repo="infra")
    fake_t3.set_snapshot(_snapshot("thread-0", "running"))
    result = watcher.Watcher(
        t3_client=fake_t3,
        tracker=fake_tracker,
        ci_watcher=ExplodingCI(),
        notifier=fake_notifier,
    ).tick(_run(issue, commit=None), make_config())
    assert result.action.value == "wait"
# --------------------------------------------------------------------------- #
# ready-for-human label is configurable.
# --------------------------------------------------------------------------- #
def test_ready_for_human_label_is_configurable(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot(_snapshot("thread-0", "error"))
w = watcher.Watcher(
t3_client=fake_t3,
tracker=fake_tracker,
ci_watcher=fake_ci,
notifier=fake_notifier,
ready_for_human_label="needs-eyes",
)
w.tick(_run(issue, commit=None), make_config())
assert ("add", "infra", 7, "needs-eyes") in _labels(fake_tracker)
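Taken together, the watcher tests in this file imply a small decision table over thread liveness, CI verdict, and the fix-forward budget. The sketch below restates that table as code. It is not the real `run_state_machine.next_action`: the enum classes, the `liveness` helper, and the function signature are assumptions, and it deliberately ignores `elapsed_seconds`, which none of these tests exercise.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    CLOSE_SUCCESS = "close_success"
    ESCALATE_PREPUSH = "escalate_prepush"
    FIX_FORWARD = "fix_forward"
    FREEZE_ESCALATE = "freeze_escalate"


class CI(Enum):  # hypothetical stand-in for app.afk.types.CIStatus
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


_IN_PROGRESS = {"running", "in_progress", "pending", "queued", "pendingInit"}


def liveness(state):
    """Fold a raw latestTurn.state string into running / idle / error / None."""
    if state in _IN_PROGRESS:
        return "running"
    if state == "completed":
        return "idle"
    if state == "errored":
        return "error"
    return None  # unknown or missing: treated as "no status yet"


def next_action(thread_state, ci, attempts, max_attempts):
    """ci is None when nothing has been pushed, so CI is never consulted."""
    if ci is None:
        if liveness(thread_state) in ("idle", "error"):
            return Action.ESCALATE_PREPUSH  # stopped before pushing anything
        return Action.WAIT
    if ci is CI.GREEN:
        return Action.CLOSE_SUCCESS
    if ci is CI.RED:
        if attempts < max_attempts:
            return Action.FIX_FORWARD
        return Action.FREEZE_ESCALATE  # budget exhausted
    return Action.WAIT  # CI still pending
```

Unknown states fall through to `WAIT`, which matches the fail-safe behaviour the tests require: an unrecognised status must never close or escalate a run.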


@ -1,174 +1,251 @@
"""Tests for the breakglass app: verb whitelist, SSE translation, auth, routes."""
"""Tests for the breakglass app: session manager (attach model), verb whitelist,
SSE translation, auth, routes."""
import os
os.environ.setdefault("API_BEARER_TOKEN", "test-token")
# Turns chdir into a per-session workspace; point it somewhere writable for tests
# (prod uses the /workspace emptyDir). Must be set before the app imports config.
os.environ.setdefault("BREAKGLASS_SESSIONS_DIR", "/tmp/bg-test-sessions")
import pytest
from fastapi.testclient import TestClient
from app.breakglass import agent_session, pve
from app.breakglass import agent_session, pve, session as sessionmod
from app.breakglass.server import app
# --------------------------------------------------------------------------- #
# PVE verb whitelist — the security boundary mirrored client-side.
# Fakes for the claude subprocess a turn spawns.
# --------------------------------------------------------------------------- #
class _FakeStdout:
    def __init__(self, lines):
        self._lines = [(l + "\n").encode() for l in lines]
        self._i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._i >= len(self._lines):
            raise StopAsyncIteration
        line = self._lines[self._i]
        self._i += 1
        return line


class _FakeStderr:
    async def read(self):
        return b""


class _FakeProc:
    def __init__(self, lines, rc=0):
        self.stdout = _FakeStdout(lines)
        self.stderr = _FakeStderr()
        self.returncode = None
        self._rc = rc

    async def wait(self):
        self.returncode = self._rc
        return self._rc

    def kill(self):
        self.returncode = -9


def _patch_proc(monkeypatch, lines, rc=0):
    async def _fake_spawn(*argv, **kwargs):
        return _FakeProc(lines, rc)

    monkeypatch.setattr(sessionmod.asyncio, "create_subprocess_exec", _fake_spawn)
_TURN_LINES = [
'{"type":"system","subtype":"init","session_id":"s"}',
'{"type":"system","subtype":"thinking_tokens","estimated_tokens":5}',
'{"type":"assistant","message":{"content":[{"type":"text","text":"checking disk"}]}}',
'{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","input":{"command":"df -h"}}]}}',
'{"type":"result","is_error":false,"result":"done","duration_ms":12}',
]
# --------------------------------------------------------------------------- #
# Session: event log + broadcast + replay/Last-Event-ID.
# --------------------------------------------------------------------------- #
def test_add_event_assigns_sequential_ids():
s = sessionmod.Session("s1")
a = s.add_event({"kind": "user", "text": "hi"})
b = s.add_event({"kind": "text", "text": "yo"})
assert a["id"] == 0 and b["id"] == 1
assert [e["kind"] for e in s.events] == ["user", "text"]
def test_subscribe_receives_broadcast():
s = sessionmod.Session("s1")
q = s.subscribe()
s.add_event({"kind": "text", "text": "live"})
assert q.get_nowait()["text"] == "live"
s.unsubscribe(q)
s.add_event({"kind": "text", "text": "after"})
assert q.empty()
@pytest.mark.asyncio
async def test_attach_replays_then_signals_caught_up():
s = sessionmod.Session("s1")
s.add_event({"kind": "user", "text": "diagnose"})
s.add_event({"kind": "text", "text": "looking"})
frames = []
async for frame in sessionmod.attach_stream(s, last_event_id=None):
frames.append(frame)
if "caught-up" in frame:
break
body = "".join(frames)
assert "diagnose" in body and "looking" in body
assert "id: 0" in body and "id: 1" in body
assert "event: caught-up" in frames[-1]
@pytest.mark.asyncio
async def test_attach_reconnect_replays_only_missed():
s = sessionmod.Session("s1")
for i in range(3):
s.add_event({"kind": "text", "text": f"e{i}"}) # ids 0,1,2
frames = []
async for frame in sessionmod.attach_stream(s, last_event_id=0): # already saw id 0
frames.append(frame)
if "caught-up" in frame:
break
body = "".join(frames)
assert "e0" not in body # not re-sent
assert "e1" in body and "e2" in body
# --------------------------------------------------------------------------- #
# Session: running a detached turn (mocked subprocess).
# --------------------------------------------------------------------------- #
@pytest.mark.asyncio
async def test_turn_streams_events_into_log(monkeypatch):
_patch_proc(monkeypatch, _TURN_LINES)
s = sessionmod.Session("s1")
assert s.start_turn("diagnose the devvm") is True
await s._turn # wait for the detached turn to finish
kinds = [e["kind"] for e in s.events]
assert kinds[0] == "user"
assert "session" in kinds and "text" in kinds and "tool" in kinds
assert "result" in kinds and kinds[-1] == "turn_end"
assert "thinking_tokens" not in kinds
@pytest.mark.asyncio
async def test_one_turn_at_a_time(monkeypatch):
_patch_proc(monkeypatch, _TURN_LINES)
s = sessionmod.Session("s1")
assert s.start_turn("first") is True
assert s.start_turn("second") is False # task not done yet
await s._turn
@pytest.mark.asyncio
async def test_resume_after_first_turn(monkeypatch):
captured = {"argvs": []}
async def _fake_spawn(*argv, **kwargs):
captured["argvs"].append(argv)
return _FakeProc(_TURN_LINES)
monkeypatch.setattr(sessionmod.asyncio, "create_subprocess_exec", _fake_spawn)
s = sessionmod.Session("s1")
s.start_turn("first"); await s._turn
s.start_turn("second"); await s._turn
assert "--session-id" in captured["argvs"][0]
assert "--resume" in captured["argvs"][1]
# --------------------------------------------------------------------------- #
# SessionManager.
# --------------------------------------------------------------------------- #
def test_manager_create_get():
m = sessionmod.SessionManager()
s = m.create()
assert m.get(s.id) is s
assert m.get("nope") is None
assert m.get_or_create(s.id) is s
assert m.get_or_create(None).id != s.id
# --------------------------------------------------------------------------- #
# PVE verb whitelist (unchanged security boundary).
# --------------------------------------------------------------------------- #
def test_allowed_verbs_match_host_script():
assert pve.ALLOWED_VERBS == {
"status", "forensics", "reset", "stop", "start", "cycle"
}
assert pve.ALLOWED_VERBS == {"status", "forensics", "reset", "stop", "start", "cycle"}
assert pve.MUTATING_VERBS == {"reset", "stop", "start", "cycle"}
assert pve.MUTATING_VERBS < pve.ALLOWED_VERBS
@pytest.mark.parametrize("bad", [
"rm -rf /", "status; rm -rf /", "status 103", "shutdown", "", "STATUS",
"cycle 999", "$(reboot)", "../start",
])
@pytest.mark.parametrize("bad", ["rm -rf /", "status; reboot", "status 103", "", "STATUS"])
@pytest.mark.asyncio
async def test_run_verb_rejects_non_whitelisted_without_ssh(bad, monkeypatch):
"""A bad verb must be rejected locally — never spawning a subprocess."""
called = False
async def _boom(*a, **k):
nonlocal called
called = True
raise AssertionError("ssh must not run for a rejected verb")
monkeypatch.setattr(pve.asyncio, "create_subprocess_exec", _boom)
result = await pve.run_verb(bad)
assert result["rejected"] is True
assert result["exit_code"] is None
assert called is False
@pytest.mark.asyncio
async def test_run_verb_allowed_invokes_ssh_with_bare_verb(monkeypatch):
captured = {}
class _FakeProc:
returncode = 0
async def communicate(self):
return (b"status: running\n", b"")
async def _fake_exec(*argv, **kwargs):
captured["argv"] = argv
return _FakeProc()
monkeypatch.setattr(pve.asyncio, "create_subprocess_exec", _fake_exec)
result = await pve.run_verb("status")
assert result["rejected"] is False
assert result["exit_code"] == 0
assert "running" in result["stdout"]
# The verb is the LAST argv element, passed as a single token (no shell).
assert captured["argv"][-1] == "status"
assert captured["argv"][0] == "ssh"
# --------------------------------------------------------------------------- #
# stream-json -> UI event translation (pure function).
# translate_event (pure).
# --------------------------------------------------------------------------- #
def test_translate_init_to_session():
ev = agent_session.translate_event(
def test_translate_init_and_noise_and_blocks():
assert agent_session.translate_event(
{"type": "system", "subtype": "init", "session_id": "abc"}
) == {"kind": "session", "session_id": "abc"}
assert agent_session.translate_event({"type": "system", "subtype": "hook_started"}) is None
assert agent_session.translate_event(
{"type": "assistant", "message": {"content": [{"type": "text", "text": "hi"}]}}
) == {"kind": "text", "text": "hi"}
tool = agent_session.translate_event(
{"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "df -h"}}]}}
)
assert ev == {"kind": "session", "session_id": "abc"}
@pytest.mark.parametrize("noise", [
{"type": "system", "subtype": "hook_started"},
{"type": "system", "subtype": "thinking_tokens", "estimated_tokens": 5},
{"type": "user", "message": {"content": []}},
{"type": "unknown"},
])
def test_translate_drops_noise(noise):
assert agent_session.translate_event(noise) is None
def test_translate_assistant_text():
ev = agent_session.translate_event({
"type": "assistant",
"message": {"content": [{"type": "text", "text": "checking disk"}]},
})
assert ev == {"kind": "text", "text": "checking disk"}
def test_translate_assistant_tool_use():
ev = agent_session.translate_event({
"type": "assistant",
"message": {"content": [
{"type": "tool_use", "name": "Bash", "input": {"command": "df -h"}}
]},
})
assert ev["kind"] == "tool"
assert ev["name"] == "Bash"
assert ev["input"]["command"] == "df -h"
def test_translate_result():
ev = agent_session.translate_event({
"type": "result", "is_error": False, "result": "done", "duration_ms": 1234,
})
assert ev == {"kind": "result", "is_error": False, "result": "done", "duration_ms": 1234}
assert tool["kind"] == "tool" and tool["input"]["command"] == "df -h"
# --------------------------------------------------------------------------- #
# Routes + auth.
# --------------------------------------------------------------------------- #
client = TestClient(app)
AUTH = {"Authorization": "Bearer test-token"}
def test_health_no_auth():
r = client.get("/health")
assert r.status_code == 200
assert r.json()["service"] == "claude-breakglass"
assert client.get("/health").json()["service"] == "claude-breakglass"
def test_api_requires_auth():
assert client.post("/api/session").status_code == 401
assert client.get("/api/pve/verbs").status_code == 401
assert client.post("/api/session/x/prompt", json={"prompt": "hi"}).status_code == 401
def test_api_accepts_bearer():
def test_session_create_and_unknown_session_404():
r = client.post("/api/session", headers=AUTH)
assert r.status_code == 200
assert "session_id" in r.json()
assert r.status_code == 200 and "session_id" in r.json()
assert client.post("/api/session/nope/prompt", headers=AUTH, json={"prompt": "x"}).status_code == 404
assert client.post("/api/session/nope/cancel", headers=AUTH).status_code == 404
def test_api_accepts_authentik_header():
r = client.post("/api/session", headers={"X-authentik-username": "me@viktorbarzin.me"})
assert r.status_code == 200
def test_prompt_starts_turn(monkeypatch):
monkeypatch.setattr(sessionmod.Session, "start_turn", lambda self, *a, **k: True)
sid = client.post("/api/session", headers=AUTH).json()["session_id"]
r = client.post(f"/api/session/{sid}/prompt", headers=AUTH, json={"prompt": "diagnose"})
assert r.status_code == 200 and r.json()["status"] == "started"
def test_pve_verb_route_rejects_unknown():
r = client.post("/api/pve/destroy", headers=AUTH)
assert r.status_code == 400
def test_prompt_409_when_turn_active(monkeypatch):
monkeypatch.setattr(sessionmod.Session, "start_turn", lambda self, *a, **k: False)
sid = client.post("/api/session", headers=AUTH).json()["session_id"]
r = client.post(f"/api/session/{sid}/prompt", headers=AUTH, json={"prompt": "x"})
assert r.status_code == 409
def test_pve_verbs_listing():
r = client.get("/api/pve/verbs", headers=AUTH)
assert r.status_code == 200
body = r.json()
assert set(body["verbs"]) == pve.ALLOWED_VERBS
assert set(body["mutating"]) == pve.MUTATING_VERBS
def test_chat_streams_sse(monkeypatch):
async def _fake_turn(session_id, prompt, model=None):
yield {"kind": "session", "session_id": session_id}
yield {"kind": "text", "text": "hello"}
yield {"kind": "result", "is_error": False, "result": "ok"}
monkeypatch.setattr(agent_session, "run_turn", _fake_turn)
r = client.post("/api/chat", headers=AUTH,
json={"session_id": "s1", "prompt": "diagnose"})
assert r.status_code == 200
assert "text/event-stream" in r.headers["content-type"]
body = r.text
assert "hello" in body
assert '"kind": "done"' in body # terminal frame always emitted
def test_pve_verbs_listing_and_unknown_rejected():
assert set(client.get("/api/pve/verbs", headers=AUTH).json()["verbs"]) == pve.ALLOWED_VERBS
assert client.post("/api/pve/destroy", headers=AUTH).status_code == 400
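
The session tests above pin down three behaviours: event ids are sequential from 0, live subscribers receive each new event, and a reconnect with `Last-Event-ID` replays only the events the client missed. Below is a minimal sketch of that contract, assuming an in-memory list and `asyncio.Queue` subscribers. `EventLog`, `replay_after` and `sse_frame` are hypothetical names for illustration, not the project's `Session` or `attach_stream` implementation.

```python
import asyncio
import json


class EventLog:
    """Hypothetical stand-in for the event-log part of a session."""

    def __init__(self):
        self.events = []
        self._subscribers = set()

    def add_event(self, event):
        # The id is the position in the log, so ids start at 0 and never repeat.
        stamped = {**event, "id": len(self.events)}
        self.events.append(stamped)
        # Fan the event out to every live subscriber.
        for queue in self._subscribers:
            queue.put_nowait(stamped)
        return stamped

    def subscribe(self):
        queue = asyncio.Queue()
        self._subscribers.add(queue)
        return queue

    def unsubscribe(self, queue):
        self._subscribers.discard(queue)

    def replay_after(self, last_event_id=None):
        # None means a fresh client: replay everything.
        # Otherwise skip every event up to and including the one already seen.
        start = 0 if last_event_id is None else last_event_id + 1
        return self.events[start:]


def sse_frame(event):
    # One SSE frame per event; the "id:" field is what a browser sends back
    # as Last-Event-ID when it reconnects.
    return f"id: {event['id']}\ndata: {json.dumps(event, ensure_ascii=False)}\n\n"
```

Using the list index as the id keeps replay a plain slice, which is why a reconnect with `last_event_id=0` in the tests never re-sends `e0`.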


@ -0,0 +1,256 @@
"""Tests for the conversational (no-tools, multi-turn) brain endpoint.
This is the portal-assistant "Brain": a lean path that drives the Claude CLI with
a no-tools conversational agent and per-conversation `--resume`, used by the voice
gateway. Unlike /v1/chat/completions it does NOT clone a workspace or run a
tool-enabled agent (see portal-assistant ADR-0002).
"""
import json
from unittest.mock import AsyncMock, patch
import pytest
from httpx import ASGITransport, AsyncClient
from app import conversational
from app.main import app
# --------------------------------------------------------------------------- #
# argv builder
# --------------------------------------------------------------------------- #
def test_conversational_argv_new_session():
argv = conversational_argv_call(resume=False)
assert argv[0] == "claude"
assert "-p" in argv
assert argv[argv.index("--agent") + 1] == "conversational"
# a new conversation opens with --session-id, never --resume
assert argv[argv.index("--session-id") + 1] == "sess-1"
assert "--resume" not in argv
# SECURITY: a public-facing endpoint must NOT skip tool permissions
assert "--dangerously-skip-permissions" not in argv
assert argv[argv.index("--model") + 1] == "sonnet"
assert argv[argv.index("--output-format") + 1] == "json"
# latency: trims project CLAUDE.md/MCP + dynamic system-prompt sections off
# the no-tools voice turn (~45k -> ~23k input tokens, ~1.3s faster TTFT)
assert argv[argv.index("--setting-sources") + 1] == "user"
assert "--exclude-dynamic-system-prompt-sections" in argv
assert argv[-1] == "Hi there"
def test_conversational_argv_resume_continues_session():
argv = conversational_argv_call(resume=True)
# a follow-up turn resumes the existing claude session
assert argv[argv.index("--resume") + 1] == "sess-1"
assert "--session-id" not in argv
def conversational_argv_call(resume: bool):
from app.conversational import conversational_argv
return conversational_argv(
session_id="sess-1", message="Hi there", model="sonnet", resume=resume
)
# --------------------------------------------------------------------------- #
# endpoint
# --------------------------------------------------------------------------- #
class _AsyncLineIter:
"""Async iterator over a list of byte lines — mimics `proc.stdout`."""
def __init__(self, lines: list[bytes]):
self._lines = list(lines)
self._i = 0
def __aiter__(self):
return self
async def __anext__(self):
if self._i >= len(self._lines):
raise StopAsyncIteration
line = self._lines[self._i]
self._i += 1
return line
def _mock_subprocess_returning(output: bytes, returncode: int = 0):
proc = AsyncMock()
lines = [chunk + b"\n" for chunk in output.split(b"\n") if chunk]
proc.stdout = _AsyncLineIter(lines)
proc.stderr = AsyncMock()
proc.stderr.read = AsyncMock(return_value=b"")
proc.wait = AsyncMock(return_value=returncode)
proc.returncode = returncode
return proc
@pytest.fixture(autouse=True)
def _reset_sessions():
conversational.reset_started()
yield
conversational.reset_started()
@pytest.fixture
def auth_header():
return {"Authorization": "Bearer test-token"}
@pytest.mark.asyncio
async def test_conversational_happy_path(auth_header):
"""A message in → the assistant's reply out, keyed to the session."""
cli_output = json.dumps({
"type": "result",
"is_error": False,
"result": "Здравейте! Как мога да помогна?",
"session_id": "sess-1",
}).encode()
mock_proc = _mock_subprocess_returning(cli_output, returncode=0)
with patch("app.conversational.asyncio.create_subprocess_exec", return_value=mock_proc):
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://test") as client:
response = await client.post(
"/v1/conversational",
json={"session_id": "sess-1", "message": "Здравей"},
headers=auth_header,
)
assert response.status_code == 200, response.text
body = response.json()
assert body["session_id"] == "sess-1"
assert body["reply"] == "Здравейте! Как мога да помогна?"
@pytest.mark.asyncio
async def test_conversational_resumes_on_second_turn(auth_header):
"""First turn opens the session (--session-id); a second turn on the same
session id resumes it (--resume); this is what makes it a conversation."""
calls: list[tuple] = []
def fake_spawn(*args, **kwargs):
calls.append(args)
out = json.dumps({"type": "result", "is_error": False, "result": "ok"}).encode()
return _mock_subprocess_returning(out, returncode=0)
with patch("app.conversational.asyncio.create_subprocess_exec", side_effect=fake_spawn):
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://test") as client:
for _ in range(2):
r = await client.post(
"/v1/conversational",
json={"session_id": "sess-X", "message": "hi"},
headers=auth_header,
)
assert r.status_code == 200, r.text
assert "--session-id" in calls[0] and "--resume" not in calls[0]
assert "--resume" in calls[1] and "--session-id" not in calls[1]
@pytest.mark.asyncio
async def test_conversational_requires_auth():
"""No bearer token → 401, same as the other endpoints."""
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://test") as client:
r = await client.post(
"/v1/conversational",
json={"session_id": "s", "message": "hi"},
)
assert r.status_code == 401
@pytest.mark.asyncio
async def test_conversational_returns_503_on_failure(auth_header):
"""A non-zero claude exit surfaces as 503 execution-failed."""
mock_proc = _mock_subprocess_returning(b"", returncode=7)
mock_proc.stderr.read = AsyncMock(return_value=b"boom")
with patch("app.conversational.asyncio.create_subprocess_exec", return_value=mock_proc):
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://test") as client:
r = await client.post(
"/v1/conversational",
json={"session_id": "s", "message": "x"},
headers=auth_header,
)
assert r.status_code == 503
assert r.json()["error"] == "execution failed"
# --------------------------------------------------------------------------- #
# streaming helpers (OpenAI-compatible token relay for the realtime voice agent)
# --------------------------------------------------------------------------- #
from collections import namedtuple # noqa: E402
_Msg = namedtuple("_Msg", "role content")
def test_stream_argv_uses_stream_json_and_is_stateless():
argv = conversational.stream_argv("hello", "sonnet")
assert argv[:2] == ["claude", "-p"]
assert "--agent" in argv and "conversational" in argv
assert "stream-json" in argv
assert "--include-partial-messages" in argv
assert "--verbose" in argv
assert "--model" in argv and "sonnet" in argv
# latency: same lean-context trim as the gateway path
assert argv[argv.index("--setting-sources") + 1] == "user"
assert "--exclude-dynamic-system-prompt-sections" in argv
assert argv[-1] == "hello"
# stateless + no tools
assert "--resume" not in argv and "--session-id" not in argv
assert "--dangerously-skip-permissions" not in argv
def test_delta_text_extracts_content_block_delta():
line = json.dumps({
"type": "stream_event",
"event": {"type": "content_block_delta",
"delta": {"type": "text_delta", "text": "Слон"}},
})
assert conversational.delta_text(line) == "Слон"
def test_delta_text_ignores_non_text_events():
for ev in [
{"type": "system"},
{"type": "stream_event", "event": {"type": "message_start"}},
{"type": "stream_event", "event": {"type": "content_block_delta",
"delta": {"type": "input_json_delta", "partial_json": "{"}}},
{"type": "result"},
]:
assert conversational.delta_text(json.dumps(ev)) is None
assert conversational.delta_text("") is None
assert conversational.delta_text("not json") is None
def test_openai_chunk_valid_sse_and_keeps_cyrillic():
s = conversational.openai_chunk("chatcmpl-x", "sonnet", 123, content="две")
assert s.startswith("data: ") and s.endswith("\n\n")
payload = json.loads(s[len("data: "):].strip())
assert payload["object"] == "chat.completion.chunk"
assert payload["choices"][0]["delta"]["content"] == "две"
assert payload["choices"][0]["finish_reason"] is None
assert "две" in s # not unicode-escaped
def test_openai_chunk_role_and_finish():
role = conversational.openai_chunk("id", "m", 1, role="assistant")
assert json.loads(role[6:].strip())["choices"][0]["delta"] == {"role": "assistant"}
stop = conversational.openai_chunk("id", "m", 1, finish_reason="stop")
c = json.loads(stop[6:].strip())["choices"][0]
assert c["finish_reason"] == "stop" and c["delta"] == {}
def test_synthesise_chat_prompt_keeps_assistant_turns():
msgs = [
_Msg("system", "Be brief."),
_Msg("user", "Здравей"),
_Msg("assistant", "Здравей! Как си?"),
_Msg("user", "Добре, ти?"),
]
p = conversational.synthesise_chat_prompt(msgs)
assert "Be brief." in p
assert "User: Здравей" in p
assert "Assistant: Здравей! Как си?" in p
assert p.strip().endswith("User: Добре, ти?")
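
The argv tests in this file constrain the gateway command line fairly tightly. It starts with `claude -p`, selects the `conversational` agent, uses `--session-id` on the first turn and `--resume` on later ones, trims context with `--setting-sources user` and `--exclude-dynamic-system-prompt-sections`, and ends with the user message. The following is a hedged reconstruction that satisfies those assertions. The flag order between the first and last elements is an assumption, and this is not claimed to be the body of `app.conversational.conversational_argv`.

```python
def conversational_argv(session_id, message, model, resume):
    """Hypothetical reconstruction of the gateway argv, inferred from the tests."""
    argv = ["claude", "-p", "--agent", "conversational"]
    # A new conversation opens with --session-id; follow-up turns use --resume.
    if resume:
        argv += ["--resume", session_id]
    else:
        argv += ["--session-id", session_id]
    argv += ["--model", model, "--output-format", "json"]
    # Lean-context flags named in the tests and the commit message.
    argv += ["--setting-sources", "user", "--exclude-dynamic-system-prompt-sections"]
    # --dangerously-skip-permissions is deliberately absent: the tests treat
    # its presence on this public-facing path as a security failure.
    argv.append(message)  # the tests require the prompt to be the last element
    return argv


first = conversational_argv("sess-1", "Hi there", "sonnet", resume=False)
follow_up = conversational_argv("sess-1", "Hi there", "sonnet", resume=True)
```

Passing the message as its own argv element, rather than interpolating it into a shell string, means user text is never parsed by a shell.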


@ -98,14 +98,15 @@ async def test_chat_completions_happy_path(auth_header):
@pytest.mark.asyncio
async def test_chat_completions_rejects_streaming(auth_header):
"""stream=true is not supported and must 400 with a clear message."""
async def test_chat_completions_streaming_rejects_unsupported_model(auth_header):
"""Streaming is supported now; model validation still runs first, so an
unsupported model 400s before any CLI is spawned."""
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://test") as client:
response = await client.post(
"/v1/chat/completions",
json={
"model": "haiku",
"model": "gpt-4",
"messages": [{"role": "user", "content": "hi"}],
"stream": True,
},
@ -113,7 +114,7 @@ async def test_chat_completions_rejects_streaming(auth_header):
)
assert response.status_code == 400
body = response.json()
assert "streaming not supported" in json.dumps(body).lower()
assert "unsupported model" in json.dumps(body).lower()
@pytest.mark.asyncio
@ -370,3 +371,58 @@ async def test_chat_completions_response_model_echoes_default_when_missing(auth_
)
assert status == 200
assert body["model"] == "sonnet"
def _delta_line(text: str) -> str:
return json.dumps({
"type": "stream_event",
"event": {"type": "content_block_delta",
"delta": {"type": "text_delta", "text": text}},
})
@pytest.mark.asyncio
async def test_chat_completions_streaming_relays_token_sse(auth_header):
"""stream=true relays CLI stream-json token deltas as OpenAI SSE chunks."""
cli_output = "\n".join([
json.dumps({"type": "system"}),
json.dumps({"type": "stream_event", "event": {"type": "message_start"}}),
_delta_line("Две"),
_delta_line(" точки."),
json.dumps({"type": "result", "subtype": "success"}),
]).encode()
mock_proc = _mock_subprocess_returning(cli_output, returncode=0)
with patch("app.main.asyncio.create_subprocess_exec", return_value=mock_proc):
transport = ASGITransport(app=app)
async with AsyncClient(transport=transport, base_url="http://test") as client:
response = await client.post(
"/v1/chat/completions",
json={
"model": "sonnet",
"stream": True,
"messages": [{"role": "user", "content": "Колко е?"}],
},
headers=auth_header,
)
assert response.status_code == 200, response.text
assert response.headers["content-type"].startswith("text/event-stream")
body = response.text
assert "chat.completion.chunk" in body
assert body.rstrip().endswith("data: [DONE]")
# Reassemble the streamed assistant content from the delta chunks.
content = ""
saw_role = False
for line in body.splitlines():
if not line.startswith("data: ") or line.strip() == "data: [DONE]":
continue
payload = json.loads(line[len("data: "):])
assert payload["object"] == "chat.completion.chunk"
delta = payload["choices"][0]["delta"]
if delta.get("role") == "assistant":
saw_role = True
content += delta.get("content", "")
assert saw_role
assert content == "Две точки."
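
The streaming test above expects this sequence: a role chunk, one `chat.completion.chunk` per `text_delta`, a final chunk with `finish_reason` set to `stop`, then `data: [DONE]`. Below is a sketch of that relay under simplifying assumptions. It is a synchronous generator over already-read lines rather than an async read of `proc.stdout`, and `relay` is a hypothetical name. `delta_text` and `openai_chunk` here are re-implementations inferred from the tests, not the project's code.

```python
import json


def delta_text(line):
    """Return the text of a text_delta stream event, or None for anything else."""
    try:
        obj = json.loads(line)
    except (TypeError, ValueError):
        return None  # blank lines and non-JSON noise are ignored
    if not isinstance(obj, dict) or obj.get("type") != "stream_event":
        return None
    event = obj.get("event") or {}
    if event.get("type") != "content_block_delta":
        return None
    delta = event.get("delta") or {}
    if delta.get("type") != "text_delta":
        return None
    return delta.get("text")


def openai_chunk(chunk_id, model, created, content=None, role=None, finish_reason=None):
    """Format one OpenAI-style chat.completion.chunk as an SSE data frame."""
    delta = {}
    if role is not None:
        delta["role"] = role
    if content is not None:
        delta["content"] = content
    payload = {
        "id": chunk_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    # ensure_ascii=False keeps Cyrillic readable instead of \uXXXX escapes.
    return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"


def relay(lines, chunk_id="chatcmpl-demo", model="sonnet", created=0):
    """Turn CLI stream-json lines into the SSE frames the test reassembles."""
    yield openai_chunk(chunk_id, model, created, role="assistant")
    for line in lines:
        text = delta_text(line)
        if text:
            yield openai_chunk(chunk_id, model, created, content=text)
    yield openai_chunk(chunk_id, model, created, finish_reason="stop")
    yield "data: [DONE]\n\n"
```

Filtering on `text_delta` drops `input_json_delta` and lifecycle events such as `message_start`, so only spoken text reaches the voice client.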