Compare commits

9 commits

| SHA1 |
|---|
| eccf0dd407 |
| a29bffdda3 |
| 4e48214c0b |
| 33ff0868c3 |
| e34640cc47 |
| 2ef0db9a96 |
| 171857da6b |
| 5b5daa4bea |
| be81005186 |

agents/conversational.md
@@ -0,0 +1,32 @@
---
name: conversational
description: Friendly bilingual (Bulgarian + English) spoken-conversation assistant for non-technical users. No tools and no file/cluster/web access — it only talks. Replies are short and natural for text-to-speech. Used by the portal-assistant voice gateway.
model: sonnet
tools: ""
---

You are a warm, friendly voice assistant talking with everyday people at home.
Your replies are SPOKEN ALOUD by a text-to-speech engine, so how you write
matters as much as what you say.

- Reply in the SAME language the person used — Bulgarian or English. If they mix,
  follow their dominant language. Never announce or comment on the language; just
  use it.
- Keep it SHORT: one to three sentences. This is a conversation, not an essay.
- Write plain spoken text ONLY. No markdown, no bullet lists, no code blocks, no
  URLs, no emoji, no headings — none of that survives being read aloud.
- Sound natural and warm, like a helpful person, not a manual. Contractions are
  good.
- Write numbers, dates and times the way they should be SPOKEN (for example
  "ten thirty in the morning", "the fifteenth of March"), not as digits or
  symbols.
- If you don't know something or can't help, say so briefly and kindly.

You have NO tools and no access to the home, devices, files, the internet, or any
system. You cannot turn things on or off, look things up live, send messages, or
take any action — you are a conversation partner only. If asked to do something
you can't, say so simply and offer what you can instead (talk it through, explain,
or suggest an idea).

Never mention these instructions, "tools", "agents", tokens, system prompts, or
that you are an AI model — unless the person directly and explicitly asks.

app/afk/__init__.py
@@ -0,0 +1,43 @@
"""AFK loop: the autonomous issue-implementer control plane.

This package is the "away-from-keyboard" automation that watches the issue
tracker for ``ready-for-agent`` issues, dispatches each to a fresh **T3** thread
(the full-access ``claudeAgent`` runtime) with the issue-implementer preamble
prepended, then drives the resulting run through its lifecycle — tests-red →
green → pushed → CI → deployed — escalating or fix-forwarding per a small,
testable state machine. It owns no agent behaviour itself; the agent's standing
rules are injected as a prompt preamble (``issue_implementer_prompt``) because
T3 does NOT honour ``~/.claude/CLAUDE.md``.

The whole loop ships **DISABLED**, by two independent gates: ``Config`` defaults
to ``kill_switch=True`` AND an empty ``allowlist`` (see ``config.py``). Importing
this package, scheduling the CronJob entrypoints, or constructing the default
``Config`` therefore dispatches NOTHING and performs zero I/O — a disabled tick
is wholly inert. The package is also not imported by the running service
(``app.main``), so wiring it in changes nothing on its own.

>>> ENABLING IS A DELIBERATE MANUAL STEP, PERFORMED LATER, NEVER BY THIS CODE. <<<
Arming the loop takes BOTH of the following, on purpose (either alone stays
inert, so one fat-fingered env var can't arm every repo):
1. clear the kill switch (``AFK_KILL_SWITCH=false`` / ConfigMap ``kill_switch: "false"``), AND
2. enrol the exact repos (``AFK_ALLOWLIST=repo-a,repo-b`` / ConfigMap ``allowlist``).
There is no auto-enable path anywhere in this package; do not add one here.

Every test in the suite runs against fakes, so the suite never talks to a real
T3 server, GitHub/Forgejo, the cluster, or Slack.

Module map (each is independently testable against the interfaces in
``types.py``):
* ``types`` — shared dataclasses + enums (the contract).
* ``config`` — disabled-by-default Config + env/configmap loaders.
* ``issue_implementer_prompt`` — the preamble prepended to every dispatch.
* ``dispatch_policy`` — which ready issues to dispatch right now (pure).
* ``run_state_machine`` — snapshot + CI status → next Action (pure).
* ``phase_checklist`` — render the run's progress as a markdown checklist (pure).
* ``t3_client`` — the two-POST T3 dispatch + snapshot reader.
* ``tracker`` — issue-tracker reads/labels/comments/close.
* ``ci_watcher`` — commit → CI status.
* ``notifier`` — escalation/notification sink.
* ``poller`` — CronJob tick #1: select + dispatch ready issues.
* ``watcher`` — CronJob tick #2: drive one in-flight run to a verdict.
"""
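
As a minimal sketch of the two-gate arming rule described above (not code from this package; `armed_for` is a hypothetical helper), the check reduces to a conjunction of the two gates, so flipping either one alone leaves every repo inert:

```python
def armed_for(repo: str, kill_switch: bool, allowlist: list[str]) -> bool:
    """Hypothetical helper: True only when BOTH gates are open for ``repo``."""
    # Gate 1: the kill switch must be cleared (it defaults to True).
    if kill_switch:
        return False
    # Gate 2: the repo must be enrolled by name; an empty allowlist enrols nothing.
    return repo in allowlist


# Print the truth table for one repo across both gate settings.
for kill_switch in (True, False):
    for allowlist in ([], ["repo-a"]):
        print(
            f"kill_switch={kill_switch!s:<5} allowlist={allowlist!s:<10} "
            f"armed={armed_for('repo-a', kill_switch, allowlist)}"
        )
```

Only the row with the kill switch cleared and `repo-a` enrolled comes out armed, which is the "either alone stays inert" property the docstring promises.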

app/afk/ci_watcher.py
@@ -0,0 +1,141 @@
"""CI watcher — fold a pushed commit's pipeline into a single ``CIStatus``.

A commit the agent pushed to ``master`` is only "done" once it has both *built*
and *deployed*: the CI/CD chain is GHA → ghcr → Woodpecker → Keel
(``docs/2026-06-14-afk-implementation-pipeline-design.md``). This adapter
collapses that multi-stage reality into the three-value verdict the state
machine speaks (:class:`~app.afk.types.CIStatus`): ``PENDING`` / ``GREEN`` /
``RED``.

It checks three stages in order and stops at the first that decides the verdict:

1. **build** — the GitHub Actions run for the commit (build + test + lint);
2. **deploy** — the Woodpecker pipeline that ships the built image;
3. **rollout** — the image actually reaching the cluster (Keel/k8s rollout).

Folding rule, applied stage by stage: a ``FAILURE`` anywhere is ``RED`` (and we
short-circuit — a red build is never "rolled out", and we don't bother the later
clients); a stage that hasn't concluded (``NONE`` = no run yet, ``PENDING`` =
in progress) makes the whole verdict ``PENDING`` (the state machine waits on
either); only when *every* stage has succeeded is the commit ``GREEN``.

The three stage clients are **injected**, each behind a tiny structural
:class:`typing.Protocol`, so this module never imports ``gh`` / ``woodpecker`` /
``kubectl`` and the tests drive it entirely with fakes. The rollout client is
**optional** — the pilot keeps cluster/``state.sqlite`` reads optional, so a
watcher built without one treats a green deploy as the terminal ``GREEN``. The
real client wiring (subprocess argv, JSON parsing, kubectl-exec) lives in the
adapters that *implement* these Protocols, not here; keeping this module pure
keeps the folding logic the only thing under test.
"""
from enum import Enum
from typing import Protocol

from .types import CIStatus


class StageResult(Enum):
    """Outcome of one CI/CD stage for a commit, before folding into ``CIStatus``.

    Each injected client returns one of these per ``(repo, commit)``:

    ``NONE`` — no run exists yet for this commit (e.g. the webhook hasn't fired);
    ``PENDING`` — a run exists and is still in progress;
    ``SUCCESS`` — the stage concluded green;
    ``FAILURE`` — the stage concluded red.

    ``NONE`` and ``PENDING`` are distinct on purpose so a client can report
    "nothing here yet" vs "running" even though both fold to ``CIStatus.PENDING``;
    keeping them separate lets callers/log lines tell the two apart.
    """

    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


# --------------------------------------------------------------------------- #
# Injected client Protocols — structural, so any object with the right method
# (real adapter or test fake) satisfies them. No ``Any``: every method is typed
# (repo, commit) -> StageResult.
# --------------------------------------------------------------------------- #
class GitHubChecksClient(Protocol):
    """Reads the GitHub Actions run (build + test + lint) for a commit."""

    def run_conclusion(self, repo: str, commit: str) -> StageResult: ...


class WoodpeckerClient(Protocol):
    """Reads the Woodpecker deploy pipeline triggered for a commit's image."""

    def deploy_conclusion(self, repo: str, commit: str) -> StageResult: ...


class RolloutClient(Protocol):
    """Reads whether the commit's image has rolled out to the cluster."""

    def rollout_status(self, repo: str, commit: str) -> StageResult: ...


class CIWatcher:
    """Folds build → deploy → rollout into a single :class:`CIStatus`.

    Inject the three stage clients (``github`` and ``woodpecker`` are required;
    ``rollout`` is optional — omit it to stop the verdict at the deploy stage,
    matching the pilot's "cluster reads optional" posture). The clients are the
    only I/O surface, so production passes real adapters and tests pass fakes;
    :meth:`status` itself is pure.
    """

    def __init__(
        self,
        github: GitHubChecksClient,
        woodpecker: WoodpeckerClient,
        rollout: RolloutClient | None = None,
    ) -> None:
        self._github = github
        self._woodpecker = woodpecker
        self._rollout = rollout

    def status(self, repo: str, commit: str) -> CIStatus:
        """Return the folded CI verdict for ``commit`` in ``repo``.

        Stages are queried lazily in order and the first decisive one wins: a
        ``FAILURE`` yields ``RED``, an unconcluded stage (``NONE``/``PENDING``)
        yields ``PENDING``, and only when every stage has ``SUCCESS`` does the
        verdict reach ``GREEN``. Short-circuiting is real — a stage is only
        queried if every earlier stage succeeded, so a red/pending build never
        touches the deploy or rollout client (the assertions in the tests, and
        avoiding a needless kubectl-exec, both depend on this). With no rollout
        client the deploy stage is terminal.
        """
        # Each entry is a thunk so a later stage's client is never called once an
        # earlier stage has already decided the verdict.
        probes = [
            lambda: self._github.run_conclusion(repo, commit),
            lambda: self._woodpecker.deploy_conclusion(repo, commit),
        ]
        if self._rollout is not None:
            rollout = self._rollout  # bind for the closure (narrowed, non-None)
            probes.append(lambda: rollout.rollout_status(repo, commit))

        for probe in probes:
            verdict = _stage_verdict(probe())
            if verdict is not None:
                return verdict  # FAILURE → RED, NONE/PENDING → PENDING
        return CIStatus.GREEN


def _stage_verdict(stage: StageResult) -> CIStatus | None:
    """Decisive verdict for a single stage, or ``None`` to "keep going".

    ``FAILURE`` decides ``RED``; an unconcluded stage (``NONE``/``PENDING``)
    decides ``PENDING``; ``SUCCESS`` is non-decisive (``None``) — the next stage
    gets to speak, and only the last stage's success folds to ``GREEN``.
    """
    if stage is StageResult.FAILURE:
        return CIStatus.RED
    if stage in (StageResult.NONE, StageResult.PENDING):
        return CIStatus.PENDING
    return None
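
To see the short-circuit in action without importing `app.afk.types`, here is a self-contained sketch. `CIStatus` is a stand-in whose member values are assumed, `FakeStage` is a hypothetical recording fake, and `fold_status` mirrors the loop in `CIWatcher.status` rather than being the module's own code:

```python
from enum import Enum


class StageResult(Enum):
    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


class CIStatus(Enum):
    # Stand-in for app.afk.types.CIStatus; the member values are assumed.
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


class FakeStage:
    """Hypothetical fake that satisfies all three stage Protocols.

    It returns one canned result and appends its name to a shared log on every
    call, so a test can assert which stages were actually queried.
    """

    def __init__(self, name: str, result: StageResult, log: list[str]) -> None:
        self._name = name
        self._result = result
        self._log = log

    def _answer(self, repo: str, commit: str) -> StageResult:
        self._log.append(self._name)
        return self._result

    # One implementation serves every Protocol method name.
    run_conclusion = deploy_conclusion = rollout_status = _answer


def fold_status(github, woodpecker, rollout, repo: str, commit: str) -> CIStatus:
    """Mirror of ``CIWatcher.status``: lazy thunks, first decisive stage wins."""
    probes = [
        lambda: github.run_conclusion(repo, commit),
        lambda: woodpecker.deploy_conclusion(repo, commit),
    ]
    if rollout is not None:
        probes.append(lambda: rollout.rollout_status(repo, commit))
    for probe in probes:
        stage = probe()
        if stage is StageResult.FAILURE:
            return CIStatus.RED
        if stage in (StageResult.NONE, StageResult.PENDING):
            return CIStatus.PENDING
    return CIStatus.GREEN


log: list[str] = []
verdict = fold_status(
    FakeStage("build", StageResult.FAILURE, log),
    FakeStage("deploy", StageResult.SUCCESS, log),
    FakeStage("rollout", StageResult.SUCCESS, log),
    repo="infra",
    commit="abc123",
)
print(verdict, log)  # CIStatus.RED ['build']
```

Because the fake logs every call, the same pattern lets a test assert that a red build never touches the deploy or rollout client, which is the property the `status` docstring says the tests depend on.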

app/afk/config.py
@@ -0,0 +1,127 @@
"""Config loader for the AFK loop — DISABLED BY DEFAULT.

The whole loop ships off. A bare ``Config()`` (and therefore ``default()``,
``from_env()`` with nothing set, and ``from_configmap({})``) has
``kill_switch=True`` and an empty ``allowlist`` — so nothing is ever
dispatched until an operator deliberately turns it on. Enabling is a TWO-part
manual step, on purpose:

1. set ``AFK_KILL_SWITCH=false`` (or ``kill_switch: "false"`` in the
   ConfigMap), AND
2. populate ``AFK_ALLOWLIST`` with the exact repos that may be automated.

Either alone is inert: the kill switch off with an empty allowlist still
dispatches nothing, and a full allowlist with the kill switch on is frozen.
Both gates exist so a single fat-fingered env var can't accidentally arm the
loop across every repo.

``from_env`` reads process env; ``from_configmap`` reads an already-parsed
string→string mapping (the shape a mounted ConfigMap gives you). They share one
parser so the two paths can't drift. Lists are comma-separated (blank entries
are dropped); booleans accept ``1``/``true``/``yes``/``on`` and
``0``/``false``/``no``/``off`` (case-insensitive), and any other value raises
rather than falling back to a default.

This module owns only *loading* a ``Config`` — the dataclass itself lives in
``types`` and policy decisions live in ``dispatch_policy`` / ``run_state_machine``.
"""
import os
from collections.abc import Mapping

from .types import Config

# Env var names — also the ConfigMap keys (one source of truth for both paths).
ENV_ALLOWLIST = "AFK_ALLOWLIST"
ENV_KILL_SWITCH = "AFK_KILL_SWITCH"
ENV_IN_PROGRESS_LABEL = "AFK_IN_PROGRESS_LABEL"
ENV_READY_LABEL = "AFK_READY_LABEL"
ENV_BUDGET_USD = "AFK_BUDGET_USD"
ENV_FIX_FORWARD_MAX_ATTEMPTS = "AFK_FIX_FORWARD_MAX_ATTEMPTS"
ENV_FIX_FORWARD_MAX_SECONDS = "AFK_FIX_FORWARD_MAX_SECONDS"

# Spellings accepted as boolean true / false (case-insensitive). Anything else
# raises rather than silently defaulting — an unparseable kill-switch value must
# never be guessed safe-or-unsafe.
_TRUE = frozenset({"1", "true", "yes", "on"})
_FALSE = frozenset({"0", "false", "no", "off"})


def default() -> Config:
    """The disabled default Config: kill switch ON, allowlist EMPTY.

    Equivalent to ``Config(allowlist=[], kill_switch=True)``; provided as a named
    entry point so callers don't hardcode the disabled posture themselves.
    """
    return Config(allowlist=[], kill_switch=True)


def from_env(env: Mapping[str, str] | None = None) -> Config:
    """Build a Config from environment variables (defaults to ``os.environ``).

    Unset variables fall back to the disabled/contract defaults, so an
    unconfigured process stays off.
    """
    return _from_mapping(os.environ if env is None else env)


def from_configmap(data: Mapping[str, str]) -> Config:
    """Build a Config from a parsed ConfigMap (string→string mapping).

    Identical semantics to ``from_env`` — same keys, same parser — but sourced
    from a mounted ConfigMap's ``data`` rather than process env. An empty mapping
    yields the disabled default.
    """
    return _from_mapping(data)


# --------------------------------------------------------------------------- #
# Internals — one shared parser so env and ConfigMap paths can't diverge.
# --------------------------------------------------------------------------- #
def _from_mapping(data: Mapping[str, str]) -> Config:
    base = default()
    return Config(
        allowlist=_parse_list(data.get(ENV_ALLOWLIST), base.allowlist),
        kill_switch=_parse_bool(data.get(ENV_KILL_SWITCH), base.kill_switch),
        in_progress_label=_nonempty(data.get(ENV_IN_PROGRESS_LABEL), base.in_progress_label),
        ready_label=_nonempty(data.get(ENV_READY_LABEL), base.ready_label),
        budget_usd=_parse_float(data.get(ENV_BUDGET_USD), base.budget_usd),
        fix_forward_max_attempts=_parse_int(
            data.get(ENV_FIX_FORWARD_MAX_ATTEMPTS), base.fix_forward_max_attempts
        ),
        fix_forward_max_seconds=_parse_int(
            data.get(ENV_FIX_FORWARD_MAX_SECONDS), base.fix_forward_max_seconds
        ),
    )


def _parse_list(raw: str | None, fallback: list[str]) -> list[str]:
    if raw is None:
        return list(fallback)
    return [item.strip() for item in raw.split(",") if item.strip()]


def _parse_bool(raw: str | None, fallback: bool) -> bool:
    if raw is None:
        return fallback
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"unparseable boolean for AFK config: {raw!r}")


def _parse_int(raw: str | None, fallback: int) -> int:
    if raw is None or not raw.strip():
        return fallback
    return int(raw.strip())


def _parse_float(raw: str | None, fallback: float) -> float:
    if raw is None or not raw.strip():
        return fallback
    return float(raw.strip())


def _nonempty(raw: str | None, fallback: str) -> str:
    if raw is None or not raw.strip():
        return fallback
    return raw.strip()

app/afk/dispatch_policy.py
@@ -0,0 +1,118 @@
"""Dispatch policy — the PURE gate deciding which ready issues to run *now*.

``select_dispatchable`` is the loop's first decision each tick: given every
issue the tracker reported ready, the loop config, and the set of repos that
already have an agent in flight, it returns the ordered list of issues to
dispatch this round. It does **no I/O** — no tracker calls, no T3, no clock — so
it is exhaustively unit-testable and the loop stays a thin shell around it.

What it encapsulates (the dispatch predicate from the AFK pipeline design doc):

* **Kill switch** — ``config.kill_switch`` short-circuits to ``[]`` before any
  per-issue work. The whole loop ships disabled; this is the master off.
* **Trust gate** — only ``issue.labeled_by_trusted`` issues are eligible. On a
  private repo the gating label *is* the authorization, so an issue made ready
  by an untrusted/bot actor must never auto-run (prompt-injection defense).
* **Allowlist** — ``issue.repo`` must be in ``config.allowlist``. An empty
  allowlist dispatches nothing even with the kill switch off (the deliberate
  two-gate posture: arming the loop takes both).
* **Per-repo lock** — any repo already in ``in_flight_repos`` is skipped; at
  most one agent runs per repo (two would collide on the working tree).
* **blocked_by gating** — ``issue.blocked_by`` lists the issue numbers of
  blockers that are still OPEN, so a non-empty list means "still blocked" and
  the issue is skipped.
* **One-agent-per-repo within the batch** — because a repo hosts only one
  in-flight agent, a single call returns at most ONE decision per repo: the
  most-urgent eligible issue in that repo wins the slot. (A more-urgent issue
  that is itself ineligible does not consume the slot — the best *eligible*
  candidate does.)
* **Priority ordering** — the surviving per-repo winners are returned
  lowest-``priority``-value-first (P0 before P1 before P2), with a deterministic
  tiebreaker (ascending issue number) so the output is a total, stable order
  independent of input order.

PRIORITY DIRECTION — lower ``Issue.priority`` runs first, matching tracker
conventions (P0/P1 are more urgent than P2) and ``Issue.priority``'s own
docstring in ``types``. The ordering lives here (the one place that consumes
``priority`` for dispatch), so this module is the source of truth for the
direction.

Pure: it never mutates its inputs — the caller's issue list, the config, and the
``in_flight_repos`` set are all left exactly as passed.
"""
from .types import Config, DispatchDecision, Issue


def select_dispatchable(
    issues: list[Issue],
    config: Config,
    in_flight_repos: set[str],
) -> list[DispatchDecision]:
    """Return the ordered issues to dispatch this tick (see module docstring).

    Empty when the kill switch is on, the allowlist excludes everything, or no
    issue clears every gate. At most one decision per repo; ordered
    lowest-priority-value-first (most urgent), ties broken by ascending issue
    number.
    """
    # Kill switch: master off-ramp, evaluated before any per-issue work.
    if config.kill_switch:
        return []

    allowlist = frozenset(config.allowlist)

    # First pass: keep only issues that clear every per-issue gate. Repos already
    # in flight are excluded here, so the lock is enforced before slot selection.
    eligible: list[Issue] = [
        issue
        for issue in issues
        if _is_eligible(issue, allowlist, in_flight_repos)
    ]

    # One slot per repo: among the eligible issues sharing a repo, the best
    # candidate (the global sort order) takes it; the rest are dropped this tick.
    best_per_repo: dict[str, Issue] = {}
    for issue in sorted(eligible, key=_dispatch_sort_key):
        best_per_repo.setdefault(issue.repo, issue)

    # Final order: the per-repo winners, most urgent first (total + stable).
    winners = sorted(best_per_repo.values(), key=_dispatch_sort_key)
    return [DispatchDecision(issue=issue, reason=_reason(issue)) for issue in winners]


# --------------------------------------------------------------------------- #
# Internals.
# --------------------------------------------------------------------------- #
def _is_eligible(
    issue: Issue,
    allowlist: frozenset[str],
    in_flight_repos: set[str],
) -> bool:
    """True iff the issue clears the trust, allowlist, per-repo-lock, and
    blocked_by gates. Kept boolean (not "which gate failed") because the policy
    only ever needs the survivors; reasons are attached to survivors only."""
    if not issue.labeled_by_trusted:
        return False
    if issue.repo not in allowlist:
        return False
    if issue.repo in in_flight_repos:
        return False
    if issue.blocked_by:  # non-empty == at least one OPEN blocker remains
        return False
    return True


def _dispatch_sort_key(issue: Issue) -> tuple[int, int]:
    """Sort key giving a total, deterministic order: lowest ``priority`` value
    first (P0 before P1 — most urgent wins), then lowest issue number as the
    tiebreaker so equal-priority issues never depend on input/iteration order."""
    return (issue.priority, issue.number)


def _reason(issue: Issue) -> str:
    """Human-readable justification, logged and surfaced in notifications, never
    parsed. Records that every gate passed and the priority that ordered it."""
    return (
        f"{issue.repo}#{issue.number}: eligible "
        f"(trusted, allowlisted, unblocked, repo free) — priority {issue.priority}"
    )
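
A worked example of the gates and the one-slot-per-repo rule, written as a condensed, self-contained mirror of `select_dispatchable`. The `Issue` dataclass here is a stand-in carrying only the fields the policy reads, and `select` returns `(repo, number)` pairs instead of `DispatchDecision` objects for brevity:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    # Stand-in for app.afk.types.Issue: only the fields the policy reads.
    repo: str
    number: int
    priority: int
    labeled_by_trusted: bool = True
    blocked_by: tuple[int, ...] = ()


def sort_key(issue: Issue) -> tuple[int, int]:
    # Lowest priority value first, then lowest issue number as the tiebreaker.
    return (issue.priority, issue.number)


def select(issues, allowlist, in_flight_repos, kill_switch=False):
    """Condensed mirror of select_dispatchable, returning (repo, number) pairs."""
    if kill_switch:
        return []
    eligible = [
        issue
        for issue in issues
        if issue.labeled_by_trusted
        and issue.repo in allowlist
        and issue.repo not in in_flight_repos
        and not issue.blocked_by
    ]
    best_per_repo: dict[str, Issue] = {}
    for issue in sorted(eligible, key=sort_key):
        best_per_repo.setdefault(issue.repo, issue)  # first (best) one keeps the slot
    return [(i.repo, i.number) for i in sorted(best_per_repo.values(), key=sort_key)]


issues = [
    Issue("infra", 9, priority=0, blocked_by=(3,)),  # most urgent, but blocked
    Issue("infra", 7, priority=1),
    Issue("infra", 4, priority=2),
    Issue("web", 2, priority=1),
    Issue("web", 1, priority=0, labeled_by_trusted=False),  # untrusted labeller
    Issue("blog", 5, priority=0),  # repo already has an agent in flight
    Issue("docs", 6, priority=0),  # repo not allowlisted
]
print(select(issues, allowlist={"infra", "web", "blog"}, in_flight_repos={"blog"}))
# [('web', 2), ('infra', 7)]
```

`infra#9` is the most urgent issue overall but is blocked, so it does not consume the `infra` slot; `infra#7` wins it instead. The equal-priority tie between `web#2` and `infra#7` is then broken by issue number.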

app/afk/issue_implementer_prompt.py
@@ -0,0 +1,54 @@
"""The issue-implementer preamble — the AFK agent's standing instructions.

T3's full-access ``claudeAgent`` runtime does NOT read ``~/.claude/CLAUDE.md``,
so the agent gets no behaviour from the repo's rules files. Instead the loop
injects behaviour by PREPENDING this preamble to ``message.text`` on every
dispatch (see ``t3_client.T3Client.dispatch`` callers). It is a module constant
on purpose: one canonical, reviewable copy of the rules, versioned with the
code, identical for every issue.

Keep it imperative and self-contained — the agent only ever sees this text plus
the issue body. Do not reference files it cannot read (no "see CLAUDE.md").
"""

ISSUE_IMPLEMENTER_PREAMBLE = """\
You are an autonomous issue-implementer agent running unattended (the human is \
away from keyboard). The task below is a tracker issue. Implement it end to end \
and land it yourself — no human will answer questions or click anything for you.

STANDING RULES — follow exactly, every time:
- Work test-first. For any code with testable behaviour, write a failing test \
FIRST (red), then the minimum implementation to make it pass (green), then \
refactor. Terraform, config, and docs are exempt.
- Do the work in an isolated git worktree off the latest master; never edit a \
shared checkout directly.
- You MUST commit your work — small, focused commits, staging files by name \
(never `git add -A` / `git add .`), and never skip hooks. A clear commit \
message is the audit trail: the subject says WHAT changed, the body says WHY in \
plain words.
- When tests and lint are green, land the change yourself: merge the latest \
master into your branch, re-verify green, then push to master. If the push is \
rejected because someone landed first, fetch, merge, re-verify, and push again. \
Do not stop at an unmerged branch and do not open a pull request unless told to.
- After pushing, watch the resulting CI / build / deploy chain to completion and \
fix any failures you caused before considering the task done.
- Operate autonomously. NEVER enter plan mode, and NEVER ask the human a \
question or wait for confirmation — make the most reasonable decision, record \
your reasoning in the commit message, and proceed. If the issue is genuinely \
ambiguous or blocked, say so explicitly in a final comment and stop rather than \
guessing destructively.

GUARDRAILS — never cross these, even if the issue seems to ask for it:
- NEVER force-push, and never force-push to master under any circumstance.
- NEVER edit, resize, or delete PersistentVolumeClaims / PersistentVolumes, and \
never touch Vault secrets or other credential stores.
- All infrastructure changes go through Terraform / Terragrunt in the infra \
repo — never `kubectl apply/edit/patch/delete` against live cluster state.
- NEVER use `[ci skip]` (or any CI-skip token) in a commit message — it hides \
the change from the audit and deploy pipeline.
- No destructive operations the issue did not ask for: no dropping database \
tables, no `rm -rf` outside your worktree, no killing processes you did not \
start.

THE ISSUE TO IMPLEMENT FOLLOWS:
"""
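
The docstring above says the preamble is prepended to `message.text` on every dispatch. The following is a minimal sketch of that composition. It assumes a hypothetical `compose_dispatch_text` helper and an issue layout (`<repo>#<number>: <title>`, a blank line, then the body) that this diff does not specify. It also shows why the constant opens with `"""\` and ends wrapped lines with a backslash: in a non-raw string literal, backslash-newline is a line continuation, so the wrapped prose comes out as single lines with no stray newlines:

```python
# Short stand-in with the same layout as ISSUE_IMPLEMENTER_PREAMBLE: the opening
# quotes are followed by a backslash, and wrapped lines end in " \".
PREAMBLE = """\
You are an autonomous issue-implementer agent running unattended (the human is \
away from keyboard).

THE ISSUE TO IMPLEMENT FOLLOWS:
"""


def compose_dispatch_text(preamble: str, issue_ref: str, title: str, body: str) -> str:
    """Hypothetical helper: the preamble first, then the issue, as one message text."""
    return f"{preamble}\n{issue_ref}: {title}\n\n{body}\n"


text = compose_dispatch_text(
    PREAMBLE, "infra#42", "Add a readiness probe", "The pod restarts on deploy."
)
print(text)
```

Keeping the preamble a plain module constant means the composed text differs between issues only in the appended issue section.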

app/afk/notifier.py
@@ -0,0 +1,155 @@
"""Terminal-state doorbell for the AFK loop — Slack / ntfy escalation sink.

When a run reaches a *terminal* state the human who is away from keyboard needs
to know: either the work landed (``done``) or it needs them back at the console
(``needs-human`` — the agent stalled/errored before pushing — or ``frozen`` —
the fix-forward budget ran out). This module turns one of those events into a
formatted alert carrying a **deep-link to the T3 thread**, so a tap on the
notification opens the exact conversation the agent ran.

Design, matching the rest of ``app.afk`` and the breakglass code:

* ``Notifier`` owns no transport. The actual Slack/ntfy POST is an injected
  ``sender`` callable (constructor argument). Production wires a real HTTP
  sender; tests inject a recording fake and assert the formatted payload
  without touching the network — the same dependency-injection seam breakglass
  uses for the claude subprocess.
* ``render_notification`` is a pure function that builds the payload; ``notify``
  is just "render, then hand to the sender". Keeping the formatting pure makes
  it unit-testable on its own and guarantees ``notify`` sends exactly what
  ``render_notification`` returns.
* The kind vocabulary is CLOSED: only the three terminal kinds are sendable.
  An unknown kind raises rather than firing a mystery doorbell — a non-terminal
  kind reaching here is a caller bug, not something to paper over.
* The notifier never swallows a sender failure. If Slack is down the exception
  propagates; the loop decides whether to retry or give up, not this adapter.

The whole AFK loop ships DISABLED (see ``config.py``); this module is inert
until the loop is deliberately armed and a real sender is wired in.
"""
from collections.abc import Callable
from dataclasses import dataclass, field

from .types import Issue

# --------------------------------------------------------------------------- #
# Kind vocabulary — the terminal states a run can reach. One source of truth
# shared by callers (the state machine maps Action -> kind) and tests.
# --------------------------------------------------------------------------- #
KIND_DONE = "done"  # landed: merged + CI green, issue closeable
KIND_NEEDS_HUMAN = "needs-human"  # stalled/errored before pushing — pre-push escalation
KIND_FROZEN = "frozen"  # fix-forward budget (attempts/wall-clock) exhausted

#: The only kinds ``notify`` will send. Anything else is a caller bug.
TERMINAL_KINDS: frozenset[str] = frozenset({KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN})

# Default T3 web UI. Threads deep-link off this; overridable per-Notifier so the
# host isn't hardcoded into the formatter (re-IP / staging / tests).
DEFAULT_BASE_URL = "https://t3.viktorbarzin.me"

# Per-kind presentation. The leading marker makes the three distinguishable from
# the title alone in a crowded Slack channel without emoji; priority/tags drive
# how the sender routes it (a successful close is quiet; the two escalations are
# loud and tagged so on-call filters can page on them).
_PRESENTATION: dict[str, tuple[str, str, str, tuple[str, ...]]] = {
    # kind -> (marker, headline, priority, tags)
    KIND_DONE: ("[DONE]", "landed", "low", ("afk", "done")),
    KIND_NEEDS_HUMAN: ("[NEEDS-HUMAN]", "needs a human", "high", ("afk", "escalation", "needs-human")),
    KIND_FROZEN: ("[FROZEN]", "frozen — budget exhausted", "high", ("afk", "escalation", "frozen")),
}

#: A sink that delivers a built notification (HTTP POST in prod, recorder in tests).
Sender = Callable[["Notification"], None]


@dataclass
class Notification:
    """The fully-formatted alert handed to the sender.

    A structured payload (not a raw dict) so the sender can map fields onto its
    own schema — ``title``/``body`` for Slack blocks or an ntfy message,
    ``priority``/``tags`` for routing, ``link`` for the click-through. ``link``
    is ``None`` when there is no thread to point at (e.g. dispatch failed before
    a thread existed); the deep-link is also embedded in ``body`` so it survives
    senders that only carry a plain message.
    """

    kind: str
    issue_ref: str  # "<repo>#<number>", e.g. "infra#42"
    title: str
    body: str
    link: str | None
    priority: str  # "low" | "high" — escalation loudness for the sender
    tags: list[str] = field(default_factory=list)


def _deep_link(base_url: str, thread_id: str | None) -> str | None:
    """Build the T3 thread deep-link, or ``None`` when there is no thread."""
    if not thread_id:
        return None
    return f"{base_url.rstrip('/')}/?thread={thread_id}"


def render_notification(
    kind: str,
    issue: Issue,
    thread_id: str | None,
    detail: str,
    *,
    base_url: str = DEFAULT_BASE_URL,
) -> Notification:
    """Build the :class:`Notification` for a terminal event — pure, no I/O.

    Raises ``ValueError`` if ``kind`` is not one of :data:`TERMINAL_KINDS`: only
    terminal states ring the doorbell, and a non-terminal kind reaching here is a
    bug we surface rather than silently send.
    """
    if kind not in TERMINAL_KINDS:
        raise ValueError(
            f"notifier only sends terminal kinds {sorted(TERMINAL_KINDS)}, got {kind!r}"
        )

    marker, headline, priority, tags = _PRESENTATION[kind]
    issue_ref = f"{issue.repo}#{issue.number}"
    link = _deep_link(base_url, thread_id)

    title = f"{marker} {issue_ref} {headline}"

    body_lines = [detail]
    if link is not None:
        body_lines.append(f"Thread: {link}")
    body = "\n".join(body_lines)

    return Notification(
        kind=kind,
        issue_ref=issue_ref,
        title=title,
        body=body,
        link=link,
        priority=priority,
        tags=list(tags),
    )


class Notifier:
    """Sends terminal-state doorbells through an injected ``sender``.

    The ``sender`` is the only egress: ``notify`` formats the payload (via
    :func:`render_notification`) and hands it over. No transport lives here, so a
    test injects a recording fake and asserts the payload without posting.
    """

    def __init__(self, sender: Sender, *, base_url: str = DEFAULT_BASE_URL) -> None:
        self._sender = sender
        self._base_url = base_url

    def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None:
|
||||
"""Format a terminal-state alert and deliver it via the injected sender.
|
||||
|
||||
Raises ``ValueError`` for a non-terminal ``kind`` (before any send), and
|
||||
lets a sender failure propagate — see the module docstring.
|
||||
"""
|
||||
notification = render_notification(
|
||||
kind, issue, thread_id, detail, base_url=self._base_url
|
||||
)
|
||||
self._sender(notification)
|
||||
app/afk/phase_checklist.py (new file, +116 lines)
@@ -0,0 +1,116 @@

```python
"""Render an AFK run's progress as a live markdown checklist.

``render(current, meta)`` is a PURE function: it maps a ``Phase`` plus a bag of
optional context (``meta``) to a markdown task list, with no I/O and no hidden
state. The loop posts the result as an issue comment so a human glancing at the
tracker can see exactly how far an unattended run has got — worktree created,
test written, green, pushed, CI, deployed, done.

The list always shows all seven lifecycle phases in order. Phases strictly
*before* ``current`` are checked (``- [x]``); ``current`` is marked in-progress
(``- [~]``); later phases are empty (``- [ ]``). ``Phase.DONE`` is terminal — at
that point every line, including DONE itself, is checked.

``meta`` is best-effort decoration only. Recognised keys (all optional):
``repo`` / ``issue`` (header title), ``thread_id`` (header suffix), and
``fix_forward_attempts`` (a note line when non-zero). Unknown keys are ignored,
and a missing key never raises — the checklist degrades gracefully to just the
phase list. Nothing here mutates ``meta``.
"""
from typing import Any

from .types import Phase

# Lifecycle order — the single source of truth for both ordering and the
# checked/active/empty partition. Must stay in sync with ``Phase`` (the
# checklist tests assert every phase appears, so a divergence is caught).
_ORDER: tuple[Phase, ...] = (
    Phase.WORKTREE,
    Phase.TESTS_RED,
    Phase.GREEN,
    Phase.PUSHED,
    Phase.CI,
    Phase.DEPLOYED,
    Phase.DONE,
)

# Human-readable label per phase (what shows on each checklist line).
_LABELS: dict[Phase, str] = {
    Phase.WORKTREE: "Worktree created",
    Phase.TESTS_RED: "Failing test written (TDD red)",
    Phase.GREEN: "Implementation passing (TDD green)",
    Phase.PUSHED: "Pushed to master",
    Phase.CI: "CI green on pushed commit",
    Phase.DEPLOYED: "Deployed / rolled out",
    Phase.DONE: "Done — issue closed",
}

# Task-list markers. ``[~]`` (in-progress) is a common markdown convention and,
# crucially, is neither ``[x]`` nor ``[ ]`` so the active line is always visually
# distinct from a checked or empty box.
_DONE = "- [x]"
_ACTIVE = "- [~]"
_TODO = "- [ ]"


def render(current: Phase, meta: dict[str, Any]) -> str:
    """Render the run's progress checklist as markdown (see module docstring).

    ``current`` is the phase the run is in right now; ``meta`` supplies optional
    header/context fields. Pure: identical inputs yield byte-identical output and
    ``meta`` is never mutated.
    """
    current_index = _ORDER.index(current)
    is_done = current is Phase.DONE

    lines = [_header(meta), ""]
    for index, phase in enumerate(_ORDER):
        lines.append(f"{_marker(index, current_index, is_done)} {_LABELS[phase]}")

    note = _fix_forward_note(meta)
    if note is not None:
        lines.extend(["", note])

    # Trailing newline so the block sits cleanly when concatenated into a comment.
    return "\n".join(lines) + "\n"


def _marker(index: int, current_index: int, is_done: bool) -> str:
    """The checkbox marker for the phase at ``index`` given the current phase.

    Earlier phases are checked; the current phase is in-progress; later phases
    are empty. When the run is DONE, every phase (including DONE) is checked.
    """
    if is_done or index < current_index:
        return _DONE
    if index == current_index:
        return _ACTIVE
    return _TODO


def _header(meta: dict[str, Any]) -> str:
    """The ``###`` title line. Includes ``repo#issue`` when both are present and
    a ``(thread ...)`` suffix when a thread id is known; degrades to a bare title
    otherwise."""
    repo = meta.get("repo")
    issue = meta.get("issue")
    if repo is not None and issue is not None:
        title = f"{repo}#{issue} — AFK run progress"
    else:
        title = "AFK run progress"

    thread_id = meta.get("thread_id")
    if thread_id:
        title = f"{title} (thread {thread_id})"
    return f"### {title}"


def _fix_forward_note(meta: dict[str, Any]) -> str | None:
    """A note line when one or more fix-forward attempts have happened, else
    ``None`` (no line). Zero/absent attempts add nothing — the clean path stays
    uncluttered."""
    attempts = meta.get("fix_forward_attempts")
    if not attempts:
        return None
    plural = "attempt" if attempts == 1 else "attempts"
    return f"_Fix-forward: {attempts} {plural}._"
```
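The checked/active/empty partition in `_marker` can be exercised on its own. The sketch below uses a hypothetical stand-in `Phase` enum (the real one lives in `.types`, which is not shown here; only the member names are taken from the source) and applies the same rule to every phase, which makes the checklist's invariants easy to assert: exactly one `[~]` line for every non-terminal phase, and none once the run is DONE.

```python
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical stand-in: member names match the source, values are arbitrary."""

    WORKTREE = auto()
    TESTS_RED = auto()
    GREEN = auto()
    PUSHED = auto()
    CI = auto()
    DEPLOYED = auto()
    DONE = auto()


# Declaration order above matches the module's _ORDER tuple.
ORDER: tuple[Phase, ...] = tuple(Phase)


def markers(current: Phase) -> list[str]:
    """Apply the same rule as _marker to every phase, in lifecycle order."""
    current_index = ORDER.index(current)
    is_done = current is Phase.DONE
    out: list[str] = []
    for index in range(len(ORDER)):
        if is_done or index < current_index:
            out.append("- [x]")  # strictly before current, or the run is over
        elif index == current_index:
            out.append("- [~]")  # the phase in progress
        else:
            out.append("- [ ]")  # not reached yet
    return out
```

With `current=Phase.CI` and `meta={"repo": "infra", "issue": 42, "fix_forward_attempts": 1}`, the real `render` would therefore emit the header, four checked lines, `- [~] CI green on pushed commit`, two unchecked lines, and the note `_Fix-forward: 1 attempt._`.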
app/afk/poller.py (new file, +166 lines)
@@ -0,0 +1,166 @@

```python
"""CronJob entrypoint: one dispatch tick of the AFK loop.

The poller is the *first half* of the loop — the part that decides what to start.
It runs once per CronJob invocation (the loop is stateless between ticks: the
issue tracker, not in-process memory, is the source of truth for what's already
in flight). Each tick:

1. **kill switch** — if ``config.kill_switch`` is set the tick does NOTHING,
   not even a tracker read. A disabled loop must be inert: zero I/O, zero
   dispatches. (The pure policy also short-circuits on the kill switch, but the
   poller bails first so a disabled CronJob never touches the network.)
2. read the ready set: ``tracker.list_ready(config.allowlist)`` — every open
   issue carrying the ready label across the allowlisted repos.
3. derive the **per-repo lock**: a repo is "in flight" if any ready issue
   already carries ``config.in_progress_label`` (the poller stamps that label
   when it dispatches, so on the next tick the still-open issue re-appears and
   locks the repo). At most one agent per repo — two would collide on the
   working tree.
4. run the pure ``dispatch_policy.select_dispatchable`` over (ready issues,
   config, in-flight repos) to get the ordered set to start this tick.
5. for each decision: ``t3_client.dispatch(repo, issue, prompt)`` to spawn the
   worker thread, THEN ``tracker.add_label(repo, issue, in_progress_label)`` —
   label strictly *after* a successful dispatch, so a dispatch that raises
   never leaves a phantom lock that would freeze the repo forever.

It owns no policy of its own — the decision lives in ``dispatch_policy`` and the
agent's behaviour rides in the dispatched prompt's preamble (``t3_client``). The
two adapters (tracker, T3) are injected behind structural Protocols, so
production wires the real ``Tracker`` / ``T3Client`` and the tests wire the
in-memory fakes; nothing here opens a socket on its own.

DISABLED BY DEFAULT: a freshly-loaded ``Config`` has ``kill_switch=True`` and an
empty allowlist (see ``config.py``), so importing or scheduling this poller
dispatches nothing. Arming the loop — clearing the kill switch AND enrolling a
repo — is a deliberate manual step, performed later, never by this code.
"""
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Protocol

from . import dispatch_policy
from .types import Config, DispatchDecision, Issue


# --------------------------------------------------------------------------- #
# Injected adapter Protocols — the I/O edges. Structural, so the real
# ``Tracker`` / ``T3Client`` and the test fakes both satisfy them with no
# explicit subclassing. Only the methods the poller actually calls appear here.
# --------------------------------------------------------------------------- #
class TrackerPort(Protocol):
    """The slice of ``tracker.Tracker`` the dispatch tick needs."""

    def list_ready(self, repos: list[str]) -> list[Issue]: ...
    def add_label(self, repo: str, issue: int, label: str) -> None: ...


class T3Port(Protocol):
    """The slice of ``t3_client.T3Client`` the dispatch tick needs."""

    def dispatch(self, repo: str, issue: int, prompt: str) -> str: ...


#: The pure dispatch gate's signature, injected so the tick can be tested with a
#: stub policy without reaching into module internals. Defaults to the real one.
DispatchFn = Callable[[list[Issue], Config, set[str]], list[DispatchDecision]]


@dataclass
class Dispatched:
    """One issue the tick actually started, with the T3 thread it spawned.

    Returned (not just logged) so the caller — and the tests — can see exactly
    what was launched. ``thread_id`` is what the watcher half later polls to
    drive this run to completion; ``reason`` carries the policy's human-readable
    justification through unchanged.
    """

    issue: Issue
    thread_id: str
    reason: str


@dataclass
class PollResult:
    """The outcome of one dispatch tick.

    ``dispatched`` is empty whenever the loop is disabled, the allowlist is
    empty, every repo is already in flight, or nothing clears the dispatch gate
    — i.e. the common steady-state of a quiet tick.
    """

    dispatched: list[Dispatched] = field(default_factory=list)


class Poller:
    """Runs one dispatch tick over injected tracker + T3 adapters.

    ``dispatch`` defaults to the real pure ``select_dispatchable`` policy; it is
    injectable purely so a test can substitute a stub without monkeypatching.
    The poller holds no state between ticks — each ``run_once`` is self-contained.
    """

    def __init__(
        self,
        tracker: TrackerPort,
        t3_client: T3Port,
        dispatch: DispatchFn = dispatch_policy.select_dispatchable,
    ) -> None:
        self._tracker = tracker
        self._t3 = t3_client
        self._dispatch = dispatch

    def run_once(self, config: Config) -> PollResult:
        """Execute one dispatch tick (see module docstring). Returns what it
        started; an empty result is the normal quiet-tick outcome."""
        # Kill switch: bail before any I/O — a disabled loop touches nothing.
        if config.kill_switch:
            return PollResult()

        ready = self._tracker.list_ready(config.allowlist)
        in_flight = _in_flight_repos(ready, config.in_progress_label)

        result = PollResult()
        for decision in self._dispatch(ready, config, in_flight):
            issue = decision.issue
            # Dispatch FIRST; only stamp the lock once the thread exists, so a
            # failed dispatch leaves the issue purely ready for the next tick to
            # retry rather than wedged behind a phantom in-progress label.
            thread_id = self._t3.dispatch(
                issue.repo, issue.number, _dispatch_prompt(issue)
            )
            self._tracker.add_label(issue.repo, issue.number, config.in_progress_label)
            result.dispatched.append(
                Dispatched(issue=issue, thread_id=thread_id, reason=decision.reason)
            )
        return result


# --------------------------------------------------------------------------- #
# Internals — pure helpers.
# --------------------------------------------------------------------------- #
def _in_flight_repos(ready: list[Issue], in_progress_label: str) -> set[str]:
    """Repos that already have an agent in flight, read off the ready set.

    A repo is in flight if any of its ready issues still carries the in-progress
    label — the stamp the poller applied on a previous tick's dispatch. Because
    the dispatched issue keeps its ready label until the watcher closes/relabels
    it, it re-appears here and locks the repo until the run finishes.
    """
    return {issue.repo for issue in ready if in_progress_label in issue.labels}


def _dispatch_prompt(issue: Issue) -> str:
    """The turn prompt for one issue's worker thread.

    The full-access agent fetches the issue body itself (it has ``gh``), so the
    prompt only needs to point unambiguously at the concrete ``repo#number``; the
    standing rules are prepended by ``t3_client`` as the issue-implementer
    preamble. Kept deliberately terse — one canonical instruction, no per-issue
    templating to drift.
    """
    return (
        f"Implement issue #{issue.number} in the `{issue.repo}` repository. "
        f"Fetch the issue with `gh issue view {issue.number} --repo {issue.repo}` "
        f"(and its comments) to get the full task, then implement it end to end."
    )
```
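The tick's ordering guarantees (kill switch before any read, per-repo lock derived from the in-progress label, label only after a successful dispatch) can be checked with in-memory fakes. This is a self-contained sketch, not the real `Poller`: `Issue`, `Config` and `Decision` are hypothetical stand-ins carrying only the attributes the tick touches, the label name `afk:in-progress` is made up, and `first_per_repo` is a stub policy (the real `select_dispatchable` rules are not shown here).

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for app.afk.types: only the attributes the tick reads.
@dataclass
class Issue:
    repo: str
    number: int
    labels: list[str] = field(default_factory=list)


@dataclass
class Config:
    allowlist: list[str]
    kill_switch: bool = False
    in_progress_label: str = "afk:in-progress"  # hypothetical label name


@dataclass
class Decision:
    issue: Issue
    reason: str


class FakeTracker:
    """In-memory tracker that records every call it receives."""

    def __init__(self, issues: list[Issue]) -> None:
        self.issues = issues
        self.calls: list[str] = []

    def list_ready(self, repos: list[str]) -> list[Issue]:
        self.calls.append("list_ready")
        return [i for i in self.issues if i.repo in repos]

    def add_label(self, repo: str, issue: int, label: str) -> None:
        self.calls.append(f"add_label {repo}#{issue}")
        for i in self.issues:
            if i.repo == repo and i.number == issue:
                i.labels.append(label)


class FakeT3:
    """Returns sequential thread ids, or raises when ``fail`` is set."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.count = 0

    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        if self.fail:
            raise ConnectionError("T3 unreachable")
        self.count += 1
        return f"thread-{self.count}"


def first_per_repo(ready: list[Issue], config: Config, in_flight: set[str]) -> list[Decision]:
    """Stub policy (hypothetical): lowest-numbered ready issue per idle repo."""
    picked: dict[str, Decision] = {}
    for issue in sorted(ready, key=lambda i: i.number):
        if issue.repo not in in_flight and issue.repo not in picked:
            picked[issue.repo] = Decision(issue, "first ready issue in an idle repo")
    return list(picked.values())


def run_once(tracker: FakeTracker, t3: FakeT3, config: Config) -> list[str]:
    """Same step order as Poller.run_once; returns the started thread ids."""
    if config.kill_switch:
        return []  # no tracker read at all
    ready = tracker.list_ready(config.allowlist)
    in_flight = {i.repo for i in ready if config.in_progress_label in i.labels}
    started: list[str] = []
    for decision in first_per_repo(ready, config, in_flight):
        issue = decision.issue
        thread_id = t3.dispatch(issue.repo, issue.number, "prompt")  # may raise
        tracker.add_label(issue.repo, issue.number, config.in_progress_label)
        started.append(thread_id)
    return started
```

Running two ticks back to back shows the lock at work: the first dispatches `infra#1` and stamps it, and the second sees the stamped issue in the ready set and starts nothing. A raising dispatch leaves the issue unlabelled, so the next tick can retry it.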
app/afk/run_state_machine.py (new file, +84 lines)
@@ -0,0 +1,84 @@

```python
"""Run state machine: assembled ``RunState`` -> next ``Action`` (ADR-0002).

This is the heart of the AFK loop's per-issue control: each tick the loop
assembles a :class:`~app.afk.types.RunState` (thread liveness from the
orchestration snapshot, CI verdict from the watcher, plus its own ``pushed`` /
``fix_forward_attempts`` / ``elapsed_seconds`` bookkeeping) and calls
:func:`next_action` to decide what to do next.

The function is **pure** — it reads only its two arguments, never the clock, the
network, or any global. That keeps the lifecycle policy a plain decision table
the test suite can exhaust combinatorially; the loop owns all the I/O (closing
issues, dispatching corrective turns, escalating) based on the Action returned.

The decision table (first match wins):

* pushed AND CI green -> CLOSE_SUCCESS
  The run is healthy and verified; close the issue. The thread's own status
  is irrelevant once a pushed commit is green.
* pushed AND CI red, budget remaining -> FIX_FORWARD
  A pushed commit broke CI. Dispatch another corrective turn — but only
  while BOTH budgets hold: ``fix_forward_attempts < fix_forward_max_attempts``
  AND ``elapsed_seconds < fix_forward_max_seconds`` (strict; at/over either
  bound is exhausted).
* pushed AND CI red, budget exhausted -> FREEZE_ESCALATE
  Out of fix-forward attempts or wall-clock; stop churning and hand to a
  human with the broken commit left in place.
* not pushed AND thread ERROR/IDLE -> ESCALATE_PREPUSH
  The agent will never reach green: it errored, or its turn finished /
  stalled with nothing pushed. There is no pushed commit to fix forward, so
  escalate before-push (a different remediation path than FREEZE_ESCALATE).
* everything else -> WAIT
  Still in flight: working toward a first push (thread running / unknown), or
  pushed with CI not yet decided. Poll again next tick.
"""
from .types import Action, CIStatus, Config, RunState, ThreadStatus

# Thread states that mean the agent is finished with this turn — it will not push
# any further on its own. Reaching one of these with nothing pushed is terminal
# (escalate), whereas RUNNING / None (no snapshot entry yet) means keep waiting.
_TERMINAL_THREAD_STATES: frozenset[ThreadStatus] = frozenset(
    {ThreadStatus.ERROR, ThreadStatus.IDLE}
)


def next_action(state: RunState, config: Config) -> Action:
    """Decide the next :class:`Action` for one issue's run.

    Pure and total: every reachable ``(thread_status, ci_status, pushed,
    attempts, elapsed)`` combination maps to exactly one Action via the table in
    the module docstring. See that table for the rationale of each branch.
    """
    if state.pushed:
        # A commit is out; the CI verdict on it drives everything from here.
        if state.ci_status is CIStatus.GREEN:
            return Action.CLOSE_SUCCESS
        if state.ci_status is CIStatus.RED:
            return (
                Action.FIX_FORWARD
                if _fix_forward_budget_remaining(state, config)
                else Action.FREEZE_ESCALATE
            )
        # CI pending / not yet reported -> wait for the verdict.
        return Action.WAIT

    # Nothing pushed yet. If the turn is over (errored or gone idle) the run can
    # never reach green on its own -> escalate before-push; otherwise it is still
    # working toward a first push -> wait.
    if state.thread_status in _TERMINAL_THREAD_STATES:
        return Action.ESCALATE_PREPUSH
    return Action.WAIT


def _fix_forward_budget_remaining(state: RunState, config: Config) -> bool:
    """True while another fix-forward turn is allowed.

    Both bounds must hold (strict ``<``): the run has spent fewer than
    ``fix_forward_max_attempts`` corrective turns AND fewer than
    ``fix_forward_max_seconds`` of wall-clock. Hitting either cap exhausts the
    budget.
    """
    return (
        state.fix_forward_attempts < config.fix_forward_max_attempts
        and state.elapsed_seconds < config.fix_forward_max_seconds
    )
```
app/afk/t3_client.py (new file, +264 lines)
@@ -0,0 +1,264 @@

```python
"""Adapter for the in-cluster T3 Code instance — the AFK executor + cockpit.

The control plane keeps the brain; T3 runs the agent. This module is the thin
wire between them, written against T3's **real** orchestration contract
(reverse-engineered from the v0.0.27 binary and verified live against t3-afk on
2026-06-15 — an earlier version of this adapter was written against a guessed
shape that a fake test accepted but the real server rejected with HTTP 400).

The contract, in three facts that shape everything here:

1. **Bare command envelope.** ``POST /api/orchestration/dispatch`` takes a
   single command object whose discriminator is ``type`` (NOT a ``command``
   string, NOT a wrapper). The body *is* the command.
2. **Client-authoritative IDs.** The CLIENT mints ``threadId`` / ``commandId``
   / ``messageId`` (UUIDs) and stamps ``createdAt`` (ISO-8601); the server
   replies ``{"sequence": N}`` and does NOT echo the thread id. So ``dispatch``
   returns the id it generated, never one parsed from the response.
3. **Threads live in a project.** A project's ``workspaceRoot`` is the repo
   checkout the agent runs in (it ``cd``s there and commits there). So a repo
   maps to a project; ``dispatch`` ensures that project exists before creating
   the thread.

Operations (the methods ``poller`` / ``watcher`` call, plus a multi-turn helper):

* ``dispatch(repo, issue, prompt) -> thread_id`` — ensure the repo's project,
  then ``thread.create`` + ``thread.turn.start`` (``ISSUE_IMPLEMENTER_PREAMBLE
  + prompt`` as the user message). Returns the client-minted thread id.
* ``send_turn(thread_id, prompt) -> None`` — a follow-up user turn on an
  existing thread. Multi-turn context is retained (verified live), so this is
  how a conversation continues without spawning a fresh thread.
* ``snapshot() -> dict`` — the fleet read-model (``GET``); the watcher reads
  per-thread ``latestTurn.state`` from it.

The HTTP transport, the bearer provider, the id factory, and the clock are all
**injected**, so production hands in an ``httpx.Client`` + a Vault-backed token
reader + ``uuid4`` + a UTC clock, while tests hand in deterministic fakes. The
bearer is re-read from the provider on **every** request because T3's
``orchestration:operate`` token rotates.
"""
import uuid
from collections.abc import Callable
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Protocol

from .issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE

# Orchestration API paths, relative to the configured base URL.
_DISPATCH_PATH = "/api/orchestration/dispatch"
_SNAPSHOT_PATH = "/api/orchestration/snapshot"

# Pilot-baked execution envelope. ``claudeAgent`` is the embedded Claude Agent
# SDK instance; ``full-access`` is the unattended runtime (bypass-permissions);
# ``default`` interaction mode is normal turns (vs ``plan``). The model is the
# one the pilot validated — tunable via the constructor.
_INSTANCE_ID = "claudeAgent"
_DEFAULT_MODEL = "claude-sonnet-4-6"
_RUNTIME_MODE = "full-access"
_INTERACTION_MODE = "default"

# JSON shapes. Command bodies and the snapshot read-model are open string-keyed
# objects; ``object`` values keep us honest without a bare ``Any``.
type Json = dict[str, object]


def _uuid() -> str:
    """Default id factory: a fresh random UUID string (thread/command/message ids)."""
    return str(uuid.uuid4())


def _now_iso() -> str:
    """Default clock: the current instant as an ISO-8601 UTC timestamp."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ProjectRef:
    """Where a repo's agent runs. ``project_id`` is the stable T3 project id (the
    client mints it, deterministically per repo); ``workspace_root`` is the repo
    checkout directory the project points at (the agent's cwd); ``title`` is the
    human label shown in the cockpit."""

    project_id: str
    workspace_root: str
    title: str


def default_project_resolver(workspace_base: str = "/data") -> "Callable[[str], ProjectRef]":
    """A repo -> :class:`ProjectRef` resolver with stable, deterministic ids.

    ``project_id`` is a UUID5 of the repo (so the same repo always resolves to the
    same project across ticks and restarts — ``dispatch``'s ensure-project step
    is therefore idempotent); ``workspace_root`` is ``<workspace_base>/<slug>``
    where the slug flattens ``owner/name`` to a single path segment. The checkout
    itself (cloning the repo into ``workspace_root``) is an enrollment concern,
    not this adapter's — the agent or a provisioning step populates it.
    """

    def resolve(repo: str) -> ProjectRef:
        slug = repo.replace("/", "__")
        return ProjectRef(
            project_id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"afk-project:{repo}")),
            workspace_root=f"{workspace_base.rstrip('/')}/{slug}",
            title=repo,
        )

    return resolve


class HttpResponse(Protocol):
    """The httpx-shaped response surface this adapter relies on: ``raise_for_status``
    turns a non-2xx into an exception (so a failed command aborts the sequence)
    and ``json`` parses the body."""

    def raise_for_status(self) -> object: ...

    def json(self) -> Json: ...


class HttpClient(Protocol):
    """Minimal injected transport: a JSON ``post`` and a ``get``, both taking
    explicit headers. A strict subset of ``httpx.Client`` so the real client
    passes straight through and tests pass a recorder. ``json`` and ``headers``
    are keyword-only, as they are on ``httpx.Client``; the adapter always passes
    them by keyword."""

    def post(self, url: str, *, json: Json, headers: dict[str, str]) -> HttpResponse: ...

    def get(self, url: str, *, headers: dict[str, str]) -> HttpResponse: ...


class T3Client:
    """Dispatch/snapshot adapter for one in-cluster T3 instance.

    ``base_url`` is the T3 service root (a trailing slash is tolerated); ``http``
    is the injected transport; ``bearer_provider`` returns the current
    ``orchestration:operate`` token, re-read per request; ``project_resolver``
    maps a repo to its :class:`ProjectRef`; ``id_factory`` / ``clock`` are
    injected for deterministic tests (defaulting to ``uuid4`` / UTC now).
    """

    def __init__(
        self,
        base_url: str,
        http: HttpClient,
        bearer_provider: Callable[[], str],
        project_resolver: Callable[[str], ProjectRef] | None = None,
        *,
        id_factory: Callable[[], str] = _uuid,
        clock: Callable[[], str] = _now_iso,
        model: str = _DEFAULT_MODEL,
    ) -> None:
        self._base_url = base_url.rstrip("/")
        self._http = http
        self._bearer_provider = bearer_provider
        self._project_for = project_resolver or default_project_resolver()
        self._id = id_factory
        self._now = clock
        self._model = model

    # ----------------------------------------------------------------- #
    # Public API (the ``t3_client.T3Client`` contract the poller/watcher use).
    # ----------------------------------------------------------------- #
    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        """Spawn one worker thread for ``issue`` of ``repo`` and return its id.

        Ensures the repo's project exists, generates the thread id locally, then
        POSTs ``thread.create`` followed by ``thread.turn.start`` (delivering
        ``ISSUE_IMPLEMENTER_PREAMBLE + prompt``). Any failed POST raises and
        short-circuits the rest of the sequence. The returned id is the one this
        method minted — the server never sends it back.
        """
        project = self._ensure_project(repo)
        thread_id = self._id()

        self._post(self._thread_create_command(thread_id, project))
        self._post(self._turn_command(thread_id, ISSUE_IMPLEMENTER_PREAMBLE + prompt))
        return thread_id

    def send_turn(self, thread_id: str, prompt: str) -> None:
        """Deliver a follow-up user turn to an existing thread (multi-turn).

        Used to continue a conversation — the agent retains the thread's prior
        context across turns. No preamble: the standing rules were already
        delivered on the opening turn.
        """
        self._post(self._turn_command(thread_id, prompt))

    def snapshot(self) -> Json:
        """Return the parsed fleet read-model from ``/api/orchestration/snapshot``."""
        return self._get(_SNAPSHOT_PATH).json()

    # ----------------------------------------------------------------- #
    # Command builders (the real wire shapes).
    # ----------------------------------------------------------------- #
    def _ensure_project(self, repo: str) -> ProjectRef:
        """Make sure the repo's project exists, creating it if absent. Idempotent:
        the resolver's project id is stable per repo, so a project already in the
        snapshot is left untouched (no duplicate, no error)."""
        project = self._project_for(repo)
        existing = {
            p.get("id") for p in self._get(_SNAPSHOT_PATH).json().get("projects", [])
        }
        if project.project_id not in existing:
            self._post(
                {
                    "type": "project.create",
                    "commandId": self._id(),
                    "projectId": project.project_id,
                    "title": project.title,
                    "workspaceRoot": project.workspace_root,
                    "createWorkspaceRootIfMissing": True,
                    "createdAt": self._now(),
                }
            )
        return project

    def _thread_create_command(self, thread_id: str, project: ProjectRef) -> Json:
        return {
            "type": "thread.create",
            "commandId": self._id(),
            "threadId": thread_id,
            "projectId": project.project_id,
            "title": project.title,
            "modelSelection": {"instanceId": _INSTANCE_ID, "model": self._model},
            "runtimeMode": _RUNTIME_MODE,
            "interactionMode": _INTERACTION_MODE,
            "branch": None,
            "worktreePath": None,
            "createdAt": self._now(),
        }

    def _turn_command(self, thread_id: str, text: str) -> Json:
        return {
            "type": "thread.turn.start",
            "commandId": self._id(),
            "threadId": thread_id,
            "message": {
                "messageId": self._id(),
                "role": "user",
                "text": text,
                "attachments": [],
            },
            "runtimeMode": _RUNTIME_MODE,
            "interactionMode": _INTERACTION_MODE,
            "createdAt": self._now(),
        }

    # ----------------------------------------------------------------- #
    # Transport internals.
    # ----------------------------------------------------------------- #
    def _post(self, command: Json) -> HttpResponse:
        resp = self._http.post(self._url(_DISPATCH_PATH), json=command, headers=self._headers())
        resp.raise_for_status()
        return resp

    def _get(self, path: str) -> HttpResponse:
        resp = self._http.get(self._url(path), headers=self._headers())
        resp.raise_for_status()
        return resp

    def _url(self, path: str) -> str:
        return f"{self._base_url}{path}"

    def _headers(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self._bearer_provider()}"}
```
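Two of the contract facts, client-minted ids and an idempotent ensure-project step keyed on a UUID5, are easiest to see against a recording transport. The block below is a simplified, self-contained mirror of the call order in `T3Client.dispatch`, not the adapter itself: the command bodies are trimmed to the discriminating fields, `RecordingHttp` and `FakeResponse` are hypothetical test doubles, and the `t3.invalid` host, the bearer value and the repo names are placeholders. `project_id_for` uses the same `uuid5(NAMESPACE_URL, "afk-project:<repo>")` derivation as `default_project_resolver`.

```python
import itertools
import uuid
from collections.abc import Iterator
from dataclasses import dataclass, field
from typing import Any

DISPATCH_URL = "http://t3.invalid/api/orchestration/dispatch"  # placeholder host
SNAPSHOT_URL = "http://t3.invalid/api/orchestration/snapshot"


def project_id_for(repo: str) -> str:
    """Same derivation as default_project_resolver: UUID5 of "afk-project:<repo>"."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"afk-project:{repo}"))


def sequential_ids() -> Iterator[str]:
    """Deterministic stand-in for the uuid4 id factory."""
    return (f"t-{n}" for n in itertools.count(1))


@dataclass
class FakeResponse:
    body: dict[str, Any] = field(default_factory=dict)

    def raise_for_status(self) -> "FakeResponse":
        return self  # this fake always answers 2xx

    def json(self) -> dict[str, Any]:
        return self.body


@dataclass
class RecordingHttp:
    """Hypothetical test transport; keyword-only json/headers like httpx.Client."""

    project_ids: list[str] = field(default_factory=list)
    log: list[str] = field(default_factory=list)

    def get(self, url: str, *, headers: dict[str, str]) -> FakeResponse:
        self.log.append("GET snapshot")
        return FakeResponse({"projects": [{"id": p} for p in self.project_ids]})

    def post(self, url: str, *, json: dict[str, Any], headers: dict[str, str]) -> FakeResponse:
        self.log.append(f"POST {json['type']}")
        if json["type"] == "project.create":
            self.project_ids.append(json["projectId"])
        return FakeResponse({"sequence": len(self.log)})  # no thread id echoed


def dispatch_sequence(http: RecordingHttp, repo: str, ids: Iterator[str]) -> str:
    """Simplified mirror of T3Client.dispatch's call order (not the adapter itself)."""
    headers = {"Authorization": "Bearer test-token"}
    pid = project_id_for(repo)
    snapshot = http.get(SNAPSHOT_URL, headers=headers).json()
    existing = {p.get("id") for p in snapshot.get("projects", [])}
    if pid not in existing:
        http.post(DISPATCH_URL, json={"type": "project.create", "projectId": pid}, headers=headers)
    thread_id = next(ids)  # client-minted; never parsed from a response
    http.post(DISPATCH_URL, json={"type": "thread.create", "threadId": thread_id, "projectId": pid}, headers=headers)
    http.post(DISPATCH_URL, json={"type": "thread.turn.start", "threadId": thread_id}, headers=headers)
    return thread_id
```

The first dispatch for a repo issues `project.create`; a second dispatch for the same repo finds the same UUID5 id in the snapshot and skips it, which is the idempotence the `_ensure_project` docstring relies on.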
243
app/afk/tracker.py
Normal file
@ -0,0 +1,243 @@
"""Issue-tracker adapter — the loop's read/write port onto GitHub issues.

``Tracker`` is the only place the AFK loop touches the issue tracker. It wraps an
injected ``GitHubClient`` (the port) so the policy/state-machine code — and the
tests — never depend on a real ``gh`` or the network: production injects
``GhCliClient`` (shells out to ``gh`` with no-shell argv); tests inject a fake.

The split is deliberate. The ``GitHubClient`` port speaks only in *primitives*
(list raw issues for a label, fetch a single issue's label events, and the four
mutations). All the loop-specific *decisions* live on ``Tracker``:

* ``labeled_by_trusted`` — decided **fail-closed** from the actor who made the
  most-recent application of the ready label. On private repos only
  collaborators can label, so the label *is* the authorization (design doc,
  "Trigger & dispatch predicate"); an unattributable label is never trusted.
* ``blocked_by`` — the issue numbers in the body's "Blocked by #N" clauses
  (the per-issue dependency the design doc gates dispatch on).
* ``priority`` — read off a ``priority:<n>`` label, lowest wins (lower runs
  first, matching ``Issue.priority`` semantics in ``types``).

Keeping the decisions here, not in the client, is what lets the whole read path
be tested against a thin fake. Mutations (``add_label`` / ``remove_label`` /
``comment`` / ``close``) are pass-throughs the loop drives during a run.
"""
import json
import re
from collections.abc import Callable
from subprocess import PIPE, run
from typing import Protocol, runtime_checkable

from .types import Issue

# Trusted author associations: GitHub tags each issue event actor with their
# association to the repo. Only these may arm an issue for the AFK loop — the
# trust gate from the design doc. Overridable per Tracker for a tighter policy.
DEFAULT_TRUSTED_ASSOCIATIONS: frozenset[str] = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})

# Default gating label; mirrors Config.ready_label so a Tracker built without an
# explicit override matches the production default.
DEFAULT_READY_LABEL = "ready-for-agent"

# "Blocked by #3, #4 and #10" → [3, 4, 10]. We match a "blocked by" lead-in
# (case-insensitive) and then harvest every "#<n>" in the clause that follows,
# up to the next line break — so a bare "#7 for context" elsewhere is ignored.
_BLOCKED_BY_CLAUSE = re.compile(r"blocked\s+by\b([^\n\r]*)", re.IGNORECASE)
_ISSUE_REF = re.compile(r"#(\d+)")

# "priority:2" → 2. Anything non-numeric (e.g. "priority:high") is not a numeric
# priority and is skipped.
_PRIORITY_LABEL = re.compile(r"^priority:(\d+)$")


@runtime_checkable
class GitHubClient(Protocol):
    """The primitive surface ``Tracker`` depends on — one issue tracker, faked
    in tests. Implementations must not embed loop policy; they only fetch raw
    data and perform the four mutations.

    ``list_issues`` returns the ``gh issue list --json number,labels,body`` shape
    (``labels`` is a list of ``{"name": ...}``; ``body`` may be ``None``).
    ``label_events`` returns the ``labeled`` timeline events for one issue, each
    with ``label.name``, ``actor.login`` and ``author_association``.
    """

    def list_issues(self, repo: str, label: str) -> list[dict]: ...
    def label_events(self, repo: str, number: int) -> list[dict]: ...
    def add_label(self, repo: str, number: int, label: str) -> None: ...
    def remove_label(self, repo: str, number: int, label: str) -> None: ...
    def comment(self, repo: str, number: int, body: str) -> None: ...
    def close(self, repo: str, number: int) -> None: ...


class Tracker:
    """Adapter that turns raw issue-tracker data into ``Issue`` records and
    relays mutations, over an injected :class:`GitHubClient`."""

    def __init__(
        self,
        client: GitHubClient,
        ready_label: str = DEFAULT_READY_LABEL,
        trusted_associations: frozenset[str] = DEFAULT_TRUSTED_ASSOCIATIONS,
    ) -> None:
        self.client = client
        self.ready_label = ready_label
        self.trusted_associations = trusted_associations

    # ----------------------------------------------------------------- reads #
    def list_ready(self, repos: list[str]) -> list[Issue]:
        """Every ready-labeled open issue across ``repos``, as ``Issue`` records.

        Ordering follows the client's per-repo order; dispatch ordering by
        priority is the dispatch policy's job, not the tracker's.
        """
        issues: list[Issue] = []
        for repo in repos:
            for raw in self.client.list_issues(repo, self.ready_label):
                issues.append(self._to_issue(repo, raw))
        return issues

    def _to_issue(self, repo: str, raw: dict) -> Issue:
        number = int(raw["number"])
        labels = [lbl["name"] for lbl in raw.get("labels", [])]
        return Issue(
            number=number,
            repo=repo,
            labels=labels,
            blocked_by=_parse_blocked_by(raw.get("body")),
            labeled_by_trusted=self._is_labeled_by_trusted(repo, number),
            priority=_parse_priority(labels),
        )

    def _is_labeled_by_trusted(self, repo: str, number: int) -> bool:
        """True iff the MOST RECENT application of the ready label was made by a
        trusted actor. Fail-closed: no attributable application → not trusted."""
        last_association: str | None = None
        for event in self.client.label_events(repo, number):
            if event.get("event") != "labeled":
                continue
            if (event.get("label") or {}).get("name") != self.ready_label:
                continue
            last_association = event.get("author_association")
        return last_association in self.trusted_associations

    # ------------------------------------------------------------- mutations #
    def add_label(self, repo: str, issue: int, label: str) -> None:
        self.client.add_label(repo, issue, label)

    def remove_label(self, repo: str, issue: int, label: str) -> None:
        self.client.remove_label(repo, issue, label)

    def comment(self, repo: str, issue: int, body: str) -> None:
        self.client.comment(repo, issue, body)

    def close(self, repo: str, issue: int) -> None:
        self.client.close(repo, issue)


# --------------------------------------------------------------------------- #
# Parsing helpers — pure functions, no I/O.
# --------------------------------------------------------------------------- #
def _parse_blocked_by(body: str | None) -> list[int]:
    """Issue numbers referenced in the body's "Blocked by #N" clauses.

    Order-preserving and de-duplicated; bare "#N" mentions outside a "blocked by"
    clause are ignored. A missing/empty body yields ``[]``.
    """
    if not body:
        return []
    seen: dict[int, None] = {}  # insertion-ordered set
    for clause in _BLOCKED_BY_CLAUSE.findall(body):
        for ref in _ISSUE_REF.findall(clause):
            seen.setdefault(int(ref), None)
    return list(seen)


def _parse_priority(labels: list[str]) -> int:
    """Numeric priority from a ``priority:<n>`` label, lowest wins; 0 if none."""
    priorities = [
        int(match.group(1))
        for label in labels
        if (match := _PRIORITY_LABEL.match(label))
    ]
    return min(priorities) if priorities else 0


# --------------------------------------------------------------------------- #
# Concrete client — shells out to `gh`. Injected `run` keeps it testable.
# --------------------------------------------------------------------------- #
def _default_run(argv: list[str]) -> str:
    """Run ``argv`` with no shell and return stdout (text). Raises on non-zero.

    List argv (never a shell string), matching the no-injection-surface pattern
    the breakglass/main subprocess helpers use — the repo/label/body values are
    never interpreted by a shell.
    """
    proc = run(argv, stdout=PIPE, stderr=PIPE, text=True, check=False)
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} failed ({proc.returncode}): {proc.stderr[:200]}")
    return proc.stdout


class GhCliClient:
    """:class:`GitHubClient` backed by the ``gh`` CLI.

    ``repo_owner`` is the GitHub owner/org the sub-project repos live under, so a
    bare repo name (``"infra"``) becomes the ``--repo owner/infra`` slug ``gh``
    wants. ``run`` is the subprocess runner (defaults to the real no-shell one);
    tests inject a fake to capture argv without spawning ``gh``.
    """

    def __init__(self, repo_owner: str, run: Callable[[list[str]], str] = _default_run) -> None:
        self.repo_owner = repo_owner
        self._run = run

    def _slug(self, repo: str) -> str:
        return f"{self.repo_owner}/{repo}"

    def list_issues(self, repo: str, label: str) -> list[dict]:
        out = self._run([
            "gh", "issue", "list", "--repo", self._slug(repo),
            "--label", label, "--state", "open",
            "--json", "number,labels,body", "--limit", "100",
        ])
        return _loads_list(out)

    def label_events(self, repo: str, number: int) -> list[dict]:
        out = self._run([
            "gh", "api",
            f"repos/{self._slug(repo)}/issues/{number}/timeline",
            "--paginate",
            "-H", "Accept: application/vnd.github+json",
        ])
        events = _loads_list(out)
        return [e for e in events if e.get("event") == "labeled"]

    def add_label(self, repo: str, number: int, label: str) -> None:
        self._run([
            "gh", "issue", "edit", str(number), "--repo", self._slug(repo),
            "--add-label", label,
        ])

    def remove_label(self, repo: str, number: int, label: str) -> None:
        self._run([
            "gh", "issue", "edit", str(number), "--repo", self._slug(repo),
            "--remove-label", label,
        ])

    def comment(self, repo: str, number: int, body: str) -> None:
        self._run([
            "gh", "issue", "comment", str(number), "--repo", self._slug(repo),
            "--body", body,
        ])

    def close(self, repo: str, number: int) -> None:
        self._run(["gh", "issue", "close", str(number), "--repo", self._slug(repo)])


def _loads_list(out: str) -> list[dict]:
    """Parse ``gh`` JSON stdout into a list of dicts. Empty stdout → ``[]``."""
    text = out.strip()
    if not text:
        return []
    return json.loads(text)
134
app/afk/types.py
Normal file
@ -0,0 +1,134 @@
"""Shared types for the AFK loop — the contract every module builds against.

Stdlib only (``dataclasses`` + ``enum``), matching the breakglass code: no
pydantic, modern ``X | None`` unions, precise field types. Every other module in
``app.afk`` imports its inputs/outputs from here so the pieces stay aligned; the
module-level docstrings in ``__init__`` list which functions consume which type.

Nothing here has behaviour — these are pure data carriers and closed enums. Keep
it that way: logic lives in ``dispatch_policy`` / ``run_state_machine`` / the
client modules, never on the dataclasses.
"""
from dataclasses import dataclass
from enum import Enum


# --------------------------------------------------------------------------- #
# Enums — closed vocabularies the state machine and clients speak in.
# --------------------------------------------------------------------------- #
class ThreadStatus(Enum):
    """Liveness of a T3 thread, as projected from the orchestration snapshot.

    ``RUNNING`` — the agent is still working the turn; ``IDLE`` — the turn
    finished cleanly (it has gone quiet); ``ERROR`` — the thread/turn failed.
    """

    RUNNING = "running"
    IDLE = "idle"
    ERROR = "error"


class CIStatus(Enum):
    """CI verdict for a pushed commit. ``PENDING`` covers both "no run yet" and
    "in progress" — the state machine waits on either."""

    PENDING = "pending"
    GREEN = "green"
    RED = "red"


class Phase(Enum):
    """Where a single issue's run is in its lifecycle. Ordered: each phase is a
    gate the run passes through on the way to ``DONE``. ``phase_checklist``
    renders these; the loop advances through them as evidence arrives."""

    WORKTREE = "worktree"  # isolated workspace created
    TESTS_RED = "tests_red"  # failing test written first (TDD red)
    GREEN = "green"  # implementation makes tests pass (TDD green)
    PUSHED = "pushed"  # commit(s) pushed to master
    CI = "ci"  # CI pipeline running on the pushed commit
    DEPLOYED = "deployed"  # deploy/rollout reached the cluster
    DONE = "done"  # verified complete; issue can be closed


class Action(Enum):
    """The decision ``run_state_machine.next_action`` returns for one tick.

    ``WAIT`` — nothing to do yet, poll again; ``CLOSE_SUCCESS`` — run is green,
    CI passed, close the issue; ``ESCALATE_PREPUSH`` — the agent errored/stalled
    before pushing anything, hand back to a human; ``FIX_FORWARD`` — CI went red
    on a pushed commit, dispatch another corrective turn; ``FREEZE_ESCALATE`` —
    fix-forward budget exhausted (attempts or wall-clock), stop and escalate.
    """

    WAIT = "wait"
    CLOSE_SUCCESS = "close_success"
    ESCALATE_PREPUSH = "escalate_prepush"
    FIX_FORWARD = "fix_forward"
    FREEZE_ESCALATE = "freeze_escalate"


# --------------------------------------------------------------------------- #
# Data carriers.
# --------------------------------------------------------------------------- #
@dataclass
class Issue:
    """A tracker issue the loop might dispatch.

    ``labeled_by_trusted`` records whether the gating label was applied by a
    trusted identity — the loop must never dispatch an issue made ready by an
    untrusted actor (prompt-injection / drive-by). ``blocked_by`` lists issue
    numbers that must close first; ``priority`` orders the ready set (lower runs
    first, matching tracker conventions).
    """

    number: int
    repo: str
    labels: list[str]
    blocked_by: list[int]
    labeled_by_trusted: bool
    priority: int


@dataclass
class DispatchDecision:
    """An issue the dispatch policy selected to run now, with a human-readable
    ``reason`` (logged + surfaced in notifications, never parsed)."""

    issue: Issue
    reason: str


@dataclass
class Config:
    """Loop configuration. DISABLED BY DEFAULT — ``kill_switch=True`` and an
    empty ``allowlist`` mean a freshly-constructed Config dispatches nothing.
    Enabling is a deliberate manual step (see ``config.from_env`` /
    ``from_configmap``).
    """

    allowlist: list[str]
    kill_switch: bool
    in_progress_label: str = "agent-in-progress"
    ready_label: str = "ready-for-agent"
    budget_usd: float = 100.0
    fix_forward_max_attempts: int = 5
    fix_forward_max_seconds: int = 3600


@dataclass
class RunState:
    """Everything the state machine needs to decide one issue's next move.

    Assembled each tick from the orchestration snapshot (``thread_status``), the
    CI watcher (``ci_status``), and the loop's own bookkeeping (``pushed``,
    ``fix_forward_attempts``, ``elapsed_seconds``). ``thread_status`` /
    ``ci_status`` are ``None`` when not yet known (no snapshot entry / nothing
    pushed to check yet).
    """

    thread_status: ThreadStatus | None
    ci_status: CIStatus | None
    pushed: bool
    fix_forward_attempts: int
    elapsed_seconds: float
355
app/afk/watcher.py
Normal file
@ -0,0 +1,355 @@
"""CronJob entrypoint: drive ONE in-flight AFK run by a single tick.

The watcher is the *second half* of the loop — the part that drives a run the
poller already started through to a terminal state. Given one in-flight run
(``InFlightRun``: the issue, the T3 thread to poll, the pushed commit if any,
and the fix-forward bookkeeping), one ``tick``:

1. **assemble a ``RunState``** from the live edges + the run's bookkeeping:
   * ``thread_status`` — from ``t3_client.snapshot()``, by finding this run's
     thread and mapping its ``latestTurn.state`` (``completed`` → idle,
     ``running``/``in_progress``/``pending`` → running, ``errored`` → error)
     to a ``ThreadStatus`` (missing thread, no turn yet, or any unrecognised
     state folds to ``None`` → "no status yet" → the state machine WAITs; we
     never escalate or close on a status we don't understand);
   * ``ci_status`` — ``ci_watcher.status(repo, commit)`` *only* when a commit
     is pushed (no commit ⇒ nothing to check ⇒ ``None``);
   * ``pushed`` / ``fix_forward_attempts`` / ``elapsed_seconds`` — straight
     from the run.
2. **decide** via the pure ``run_state_machine.next_action`` (it owns the
   lifecycle policy; the watcher owns only the I/O the decision implies).
3. **act** on the returned ``Action``:
   * ``CLOSE_SUCCESS`` → ``tracker.close`` + drop the in-progress label +
     DONE checklist + ``done`` doorbell. The run landed.
   * ``ESCALATE_PREPUSH`` / ``FREEZE_ESCALATE`` → drop the in-progress label,
     add the ``ready-for-human`` label, post the checklist, ring the
     ``needs-human`` / ``frozen`` doorbell. The run is handed to a human; the
     issue is left OPEN (not closed) with the work in place.
   * ``FIX_FORWARD`` → dispatch a corrective turn (``t3_client.dispatch``),
     bump the fix-forward attempt count, refresh the checklist, and keep the
     run in flight (NOT terminal: no label churn, no doorbell — the notifier
     only speaks terminal kinds). The new thread id rides back on the result
     so the next tick polls the corrective turn.
   * ``WAIT`` → just refresh the progress checklist and keep waiting.

Every adapter (T3, tracker, CI, notifier) is injected behind a structural
Protocol, so production wires the real clients and the tests wire the in-memory
fakes; this module opens no socket and reads no message bodies. (The pilot keeps
T3 ``state.sqlite`` message-body reads out of the core loop — snapshot status +
CI status are all the state machine needs — so this watcher never execs into the
pod; that observability nicety is a separate, optional concern.)

DISABLED BY DEFAULT applies transitively: the poller never starts a run while
the loop is off (``config.kill_switch`` / empty allowlist — see ``config.py``),
so with the shipped defaults there is never an ``InFlightRun`` to tick.
"""
from dataclasses import dataclass
from typing import Protocol

from . import phase_checklist, run_state_machine
from .notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN
from .poller import T3Port as _DispatchPort  # dispatch(repo, issue, prompt) -> id
from .types import Action, CIStatus, Config, Issue, Phase, RunState, ThreadStatus

# T3 ``latestTurn.state`` -> ThreadStatus. The real snapshot reports a thread's
# liveness as the state of its latest turn (verified against t3-afk v0.0.27):
# ``completed`` == the turn finished cleanly (agent is idle, awaiting input);
# any not-yet-finished state (``running``/``in_progress``/``pending``/``queued``/
# ``pendingInit``) == still working; ``errored`` == the turn failed. Anything not
# in here (a state T3 adds later, or a malformed/absent entry) maps to None —
# "no usable status yet" — so the state machine waits rather than acting on
# something it can't interpret.
_THREAD_STATUS_BY_STRING: dict[str, ThreadStatus] = {
    "completed": ThreadStatus.IDLE,
    "running": ThreadStatus.RUNNING,
    "in_progress": ThreadStatus.RUNNING,
    "pending": ThreadStatus.RUNNING,
    "queued": ThreadStatus.RUNNING,
    "pendingInit": ThreadStatus.RUNNING,
    "errored": ThreadStatus.ERROR,
}

# Action -> the terminal doorbell kind to ring. Only the terminal actions appear;
# WAIT / FIX_FORWARD are non-terminal and ring nothing (the notifier rejects a
# non-terminal kind on purpose — see ``notifier.TERMINAL_KINDS``).
_TERMINAL_KIND_BY_ACTION: dict[Action, str] = {
    Action.CLOSE_SUCCESS: KIND_DONE,
    Action.ESCALATE_PREPUSH: KIND_NEEDS_HUMAN,
    Action.FREEZE_ESCALATE: KIND_FROZEN,
}

# Default label applied when a run is handed back to a human. Mirrors the
# tracker's ``ready-for-agent`` convention; overridable per-Watcher.
DEFAULT_READY_FOR_HUMAN_LABEL = "ready-for-human"


# --------------------------------------------------------------------------- #
# Injected adapter Protocols — structural, so the real clients and the test
# fakes both satisfy them with no subclassing. Only the methods the watcher
# actually calls appear. ``DispatchPort`` is reused from ``poller``.
# --------------------------------------------------------------------------- #
class SnapshotPort(_DispatchPort, Protocol):
    """T3 surface the watcher needs: ``dispatch`` (for the corrective turn) plus
    ``snapshot`` (for thread liveness)."""

    def snapshot(self) -> dict: ...


class TrackerPort(Protocol):
    """The slice of ``tracker.Tracker`` the watch tick needs."""

    def add_label(self, repo: str, issue: int, label: str) -> None: ...
    def remove_label(self, repo: str, issue: int, label: str) -> None: ...
    def comment(self, repo: str, issue: int, body: str) -> None: ...
    def close(self, repo: str, issue: int) -> None: ...


class CIPort(Protocol):
    """The slice of ``ci_watcher.CIWatcher`` the watch tick needs."""

    def status(self, repo: str, commit: str) -> CIStatus: ...


class NotifierPort(Protocol):
    """The slice of ``notifier.Notifier`` the watch tick needs."""

    def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None: ...


@dataclass
class InFlightRun:
    """One run the watcher is driving, as the loop tracks it between ticks.

    ``thread_id`` is the T3 thread to poll this tick; ``commit`` is the pushed
    commit CI watches (``None`` until the agent has pushed). ``fix_forward_attempts``
    and ``elapsed_seconds`` are the loop's own bookkeeping, fed straight into the
    assembled ``RunState`` — ``pushed`` is derived as ``commit is not None``.
    """

    issue: Issue
    thread_id: str
    commit: str | None
    fix_forward_attempts: int = 0
    elapsed_seconds: float = 0.0


@dataclass
class TickResult:
    """The outcome of one watch tick.

    ``action`` is the state machine's verdict; ``terminal`` is True iff the run
    reached an end state (closed or handed to a human) and should no longer be
    ticked. ``thread_id`` / ``fix_forward_attempts`` carry the (possibly updated)
    bookkeeping the caller threads into the next ``InFlightRun`` — they change
    only on a FIX_FORWARD (new corrective thread, incremented attempts) and are
    otherwise echoed back unchanged.
    """

    action: Action
    terminal: bool
    thread_id: str
    fix_forward_attempts: int


class Watcher:
    """Drives one in-flight run per ``tick`` over injected adapters.

    The three escalation-vs-success decisions live in the pure
    ``run_state_machine``; this class only performs the I/O each decision
    implies. ``ready_for_human_label`` is the label stamped on a run handed back
    to a human (default :data:`DEFAULT_READY_FOR_HUMAN_LABEL`).
    """

    def __init__(
        self,
        t3_client: SnapshotPort,
        tracker: TrackerPort,
        ci_watcher: CIPort,
        notifier: NotifierPort,
        ready_for_human_label: str = DEFAULT_READY_FOR_HUMAN_LABEL,
    ) -> None:
        self._t3 = t3_client
        self._tracker = tracker
        self._ci = ci_watcher
        self._notifier = notifier
        self._ready_for_human_label = ready_for_human_label

    def tick(self, run: InFlightRun, config: Config) -> TickResult:
        """Drive ``run`` one step (see module docstring)."""
        state = self._assemble_state(run)
        action = run_state_machine.next_action(state, config)

        if action is Action.CLOSE_SUCCESS:
            return self._close_success(run, config)
        if action in (Action.ESCALATE_PREPUSH, Action.FREEZE_ESCALATE):
            return self._escalate(run, state, action, config)
        if action is Action.FIX_FORWARD:
            return self._fix_forward(run, state)
        # WAIT: still in flight — just show progress and poll again next tick.
        return self._wait(run, state, action)

    # ----------------------------------------------------------------- #
    # RunState assembly.
    # ----------------------------------------------------------------- #
    def _assemble_state(self, run: InFlightRun) -> RunState:
        thread_status = self._thread_status(run.thread_id)
        # Only fold CI when there's a commit to check — an unpushed run has no
        # pipeline, and we must not query CI (the assertion in the tests, and
        # avoiding a needless API call, both rely on this).
        ci_status = (
            self._ci.status(run.issue.repo, run.commit)
            if run.commit is not None
            else None
        )
        return RunState(
            thread_status=thread_status,
            ci_status=ci_status,
            pushed=run.commit is not None,
            fix_forward_attempts=run.fix_forward_attempts,
            elapsed_seconds=run.elapsed_seconds,
        )

    def _thread_status(self, thread_id: str) -> ThreadStatus | None:
        """This thread's liveness from the fleet snapshot, or ``None`` when the
        thread is absent, has no turn yet, or its ``latestTurn.state`` is one we
        don't recognise. Liveness is the state of the thread's latest turn (the
        real snapshot shape), not a top-level ``status`` field."""
        for thread in self._t3.snapshot().get("threads", []):
            if thread.get("id") == thread_id:
                latest_turn = thread.get("latestTurn") or {}
                return _THREAD_STATUS_BY_STRING.get(latest_turn.get("state"))
        return None

    # ----------------------------------------------------------------- #
    # Per-action handlers.
    # ----------------------------------------------------------------- #
    def _close_success(self, run: InFlightRun, config: Config) -> TickResult:
        """Landed: close the issue, drop the lock, post DONE, ring the doorbell."""
        self._post_checklist(run, Phase.DONE)
        self._tracker.remove_label(
            run.issue.repo, run.issue.number, config.in_progress_label
        )
        self._tracker.close(run.issue.repo, run.issue.number)
        self._notify(run, Action.CLOSE_SUCCESS, "Run landed: pushed and CI green.")
        return _terminal(Action.CLOSE_SUCCESS, run)

    def _escalate(
        self, run: InFlightRun, state: RunState, action: Action, config: Config
    ) -> TickResult:
        """Hand back to a human: drop the lock, add ready-for-human, post the
        checklist, ring the matching doorbell. The issue stays OPEN."""
        self._post_checklist(run, _phase_for(state))
        self._tracker.remove_label(
            run.issue.repo, run.issue.number, config.in_progress_label
        )
        self._tracker.add_label(
            run.issue.repo, run.issue.number, self._ready_for_human_label
        )
        self._notify(run, action, _escalation_detail(action, state))
        return _terminal(action, run)

    def _fix_forward(self, run: InFlightRun, state: RunState) -> TickResult:
        """CI red with budget left: dispatch a corrective turn and stay in flight.

        Not terminal — no doorbell (the notifier only speaks terminal kinds) and
        no label churn (the in-progress lock stays put). The corrective dispatch
        spawns a fresh thread; its id and the incremented attempt count ride back
        so the next tick tracks the right thread.
        """
        attempts = run.fix_forward_attempts + 1
        new_thread_id = self._t3.dispatch(
            run.issue.repo, run.issue.number, _fix_forward_prompt(run)
        )
        self._post_checklist(run, Phase.CI, fix_forward_attempts=attempts)
        return TickResult(
            action=Action.FIX_FORWARD,
            terminal=False,
            thread_id=new_thread_id,
            fix_forward_attempts=attempts,
        )

    def _wait(self, run: InFlightRun, state: RunState, action: Action) -> TickResult:
        """Still working: refresh the progress checklist, change nothing else."""
        self._post_checklist(run, _phase_for(state))
        return TickResult(
            action=action,
            terminal=False,
            thread_id=run.thread_id,
            fix_forward_attempts=run.fix_forward_attempts,
        )

    # ----------------------------------------------------------------- #
    # I/O helpers.
    # ----------------------------------------------------------------- #
    def _post_checklist(
        self, run: InFlightRun, phase: Phase, *, fix_forward_attempts: int | None = None
    ) -> None:
        attempts = run.fix_forward_attempts if fix_forward_attempts is None else fix_forward_attempts
        body = phase_checklist.render(
            phase,
            {
                "repo": run.issue.repo,
                "issue": run.issue.number,
                "thread_id": run.thread_id,
                "fix_forward_attempts": attempts,
            },
        )
        self._tracker.comment(run.issue.repo, run.issue.number, body)

    def _notify(self, run: InFlightRun, action: Action, detail: str) -> None:
        self._notifier.notify(
            _TERMINAL_KIND_BY_ACTION[action], run.issue, run.thread_id, detail
        )


# --------------------------------------------------------------------------- #
# Pure helpers.
# --------------------------------------------------------------------------- #
def _terminal(action: Action, run: InFlightRun) -> TickResult:
    """A terminal :class:`TickResult` echoing the run's bookkeeping unchanged."""
    return TickResult(
        action=action,
        terminal=True,
        thread_id=run.thread_id,
        fix_forward_attempts=run.fix_forward_attempts,
    )


def _phase_for(state: RunState) -> Phase:
    """Best-effort current lifecycle phase from the evidence in ``state``.

    The checklist is decoration only (the loop reads no agent message bodies), so
    this maps the observable signals — pushed? CI verdict? — onto the closest
    phase: nothing pushed ⇒ still working toward the implementation (GREEN);
    pushed ⇒ the CI phase is where attention sits until it goes green. A green CI
    is rendered as DONE by the close path, not here.
    """
    if not state.pushed:
        return Phase.GREEN
    if state.ci_status is CIStatus.GREEN:
        return Phase.DEPLOYED
    return Phase.CI


def _escalation_detail(action: Action, state: RunState) -> str:
    """Human-readable escalation reason for the doorbell + logs (never parsed)."""
    if action is Action.ESCALATE_PREPUSH:
        return (
            "Agent stalled or errored before pushing any commit "
            f"(thread {state.thread_status.value if state.thread_status else 'unknown'}). "
            "Handed back for a human."
        )
    return (
        "Fix-forward budget exhausted with CI still red "
        f"({state.fix_forward_attempts} attempts, {state.elapsed_seconds:.0f}s). "
        "Frozen for a human."
    )


def _fix_forward_prompt(run: InFlightRun) -> str:
    """The corrective-turn prompt: point the agent at the red CI on its commit."""
    return (
        f"CI is RED on your pushed commit {run.commit} for issue #{run.issue.number} "
        f"in `{run.issue.repo}`. Investigate the failing run, fix the cause, and "
        f"push the fix to master. Then watch CI again until it is green."
    )
|
|
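The signal-to-phase mapping in `_phase_for` depends only on two fields, so it can be exercised in isolation. A minimal sketch, assuming simplified stand-ins for `Phase`, `CIStatus` and `RunState` (only the members this helper reads; the real types carry more, such as `thread_status` and `fix_forward_attempts`):

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Phase(Enum):  # stand-in: only the phases the mapping can return
    GREEN = "green"
    CI = "ci"
    DEPLOYED = "deployed"


class CIStatus(Enum):  # stand-in: the real enum's members may differ
    PENDING = "pending"
    RED = "red"
    GREEN = "green"


@dataclass
class RunState:  # stand-in holding only the two signals the mapping reads
    pushed: bool
    ci_status: CIStatus | None = None


def phase_for(state: RunState) -> Phase:
    """Same decision order as _phase_for: the push first, then the CI verdict."""
    if not state.pushed:
        return Phase.GREEN      # nothing pushed: still implementing
    if state.ci_status is CIStatus.GREEN:
        return Phase.DEPLOYED   # pushed and CI green
    return Phase.CI             # pushed, CI pending, red or unknown
```

Because the push check comes first, a green verdict never outranks "nothing pushed yet": `RunState(pushed=False, ci_status=CIStatus.GREEN)` still maps to `Phase.GREEN`.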
@@ -1,26 +1,13 @@
"""Drive the breakglass Claude agent and stream its work to the browser.
"""Claude CLI argv + stream-json → UI-event translation for the breakglass agent.

Each chat turn runs ``claude -p --output-format stream-json`` in the session's
persistent workspace; the first turn opens the session with ``--session-id`` and
later turns ``--resume`` it, so the conversation has memory across turns. The
CLI's JSON events are translated to a small, stable SSE vocabulary the UI
renders (``session`` / ``text`` / ``tool`` / ``result`` / ``error``) — we do not
leak the raw event firehose to the client.

Subprocesses use ``asyncio.create_subprocess_exec`` (list argv, no shell): the
prompt and ids are argv elements, never interpreted by a shell.
The session lifecycle (running turns, attaching clients) lives in ``session.py``;
this module is just the two helpers it builds on:

* ``_turn_argv`` — the no-shell list argv for one ``claude -p`` turn.
* ``translate_event`` — map a raw stream-json event to the small UI vocabulary
  (session / text / tool / result), dropping the hook/thinking-token noise.
"""
import asyncio
import json
import os
from subprocess import PIPE
from typing import AsyncIterator

from . import config

# Sessions we've already opened (so the next turn resumes instead of re-creating).
_started: set[str] = set()


def _turn_argv(session_id: str, prompt: str, resume: bool, model: str) -> list[str]:
    argv = [
@@ -66,7 +53,7 @@ def translate_event(obj: dict) -> dict | None:
                })
        if not events:
            return None
        # The server flattens a "batch" into individual SSE frames.
        # The session log flattens a "batch" into individual events.
        return events[0] if len(events) == 1 else {"kind": "batch", "events": events}

    if etype == "result":
@@ -78,68 +65,3 @@ def translate_event(obj: dict) -> dict | None:
        }

    return None


async def run_turn(
    session_id: str, prompt: str, model: str | None = None
) -> AsyncIterator[dict]:
    """Run one chat turn, yielding translated UI events as they arrive."""
    resume = session_id in _started
    model = model or config.DEFAULT_MODEL
    workspace = os.path.join(config.SESSIONS_DIR, session_id)
    os.makedirs(workspace, exist_ok=True)

    argv = _turn_argv(session_id, prompt, resume, model)
    proc = await asyncio.create_subprocess_exec(
        *argv, cwd=workspace, stdout=PIPE, stderr=PIPE,
    )
    _started.add(session_id)
    assert proc.stdout is not None and proc.stderr is not None

    try:
        async def _pump() -> AsyncIterator[dict]:
            async for raw in proc.stdout:
                line = raw.decode(errors="replace").strip()
                if not line:
                    continue
                try:
                    obj = json.loads(line)
                except json.JSONDecodeError:
                    continue
                ev = translate_event(obj)
                if ev is None:
                    continue
                if ev.get("kind") == "batch":
                    for sub in ev["events"]:
                        yield sub
                else:
                    yield ev

        async for ev in _with_timeout(_pump(), config.TURN_TIMEOUT_SECONDS):
            yield ev
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        yield {"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"}
        return

    await proc.wait()
    if proc.returncode not in (0, None):
        err = (await proc.stderr.read()).decode(errors="replace")
        yield {"kind": "error", "error": err.strip()[:500] or f"exit {proc.returncode}"}


async def _with_timeout(agen: AsyncIterator[dict], timeout: float) -> AsyncIterator[dict]:
    """Yield from an async generator but raise TimeoutError if the WHOLE turn
    exceeds ``timeout`` seconds (a wedged agent shouldn't stream forever)."""
    loop = asyncio.get_event_loop()
    deadline = loop.time() + timeout
    it = agen.__aiter__()
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise asyncio.TimeoutError
        try:
            yield await asyncio.wait_for(it.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            return
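Both the removed `run_turn` and the new `Session._run_turn` in `session.py` read the CLI's stdout with the same decode, translate and flatten loop. The sketch below isolates that loop over any async source of byte lines. `toy_translate` is a hypothetical stand-in for `translate_event`, whose full mapping is not shown in this diff; the batch shape `{"kind": "batch", "events": [...]}` is the one `translate_event` returns.

```python
from __future__ import annotations

import asyncio
import json
from typing import AsyncIterator, Callable


def toy_translate(obj: dict) -> dict | None:
    """Hypothetical stand-in for translate_event; the real mapping differs."""
    if obj.get("type") == "text":
        return {"kind": "text", "text": obj["text"]}
    if obj.get("type") == "multi":
        return {"kind": "batch", "events": [{"kind": "text", "text": t} for t in obj["texts"]]}
    return None  # everything else is treated as noise and dropped


async def pump(
    raw_lines: AsyncIterator[bytes], translate: Callable[[dict], dict | None]
) -> AsyncIterator[dict]:
    """Decode JSON lines, translate each one, and flatten batches into events."""
    async for raw in raw_lines:
        line = raw.decode(errors="replace").strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip stray non-JSON output instead of failing the turn
        ev = translate(obj)
        if ev is None:
            continue
        if ev.get("kind") == "batch":
            for sub in ev["events"]:
                yield sub
        else:
            yield ev


async def fake_stdout() -> AsyncIterator[bytes]:
    """Simulated CLI stdout: valid events mixed with junk and noise."""
    for line in (
        b'{"type": "text", "text": "hi"}\n',
        b"not json\n",
        b"\n",
        b'{"type": "hook"}\n',
        b'{"type": "multi", "texts": ["a", "b"]}\n',
    ):
        yield line


async def collect() -> list[dict]:
    return [ev async for ev in pump(fake_stdout(), toy_translate)]
```

`asyncio.run(collect())` produces three text events: the non-JSON line, the blank line and the unrecognised `hook` event are dropped, and the batch is flattened into two events.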
@@ -25,6 +25,9 @@ MAX_CONCURRENT_TURNS = int(os.environ.get("BREAKGLASS_MAX_CONCURRENT_TURNS", "2"
TURN_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_TURN_TIMEOUT_SECONDS", "1800"))
# A single PVE power verb must return fast; a wedged host shouldn't hang the UI.
PVE_VERB_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_PVE_VERB_TIMEOUT_SECONDS", "120"))
# How long an idle attach stream waits before emitting an SSE keepalive comment
# (keeps proxies/CDN from closing the long-lived connection).
SSE_KEEPALIVE_SECONDS = int(os.environ.get("BREAKGLASS_SSE_KEEPALIVE_SECONDS", "20"))

# Auth. The app sits behind the ingress `auth = "required"` resilience proxy
# (Authentik SSO, basic-auth fallback when Authentik is down). We additionally
@@ -1,38 +1,44 @@
"""Breakglass FastAPI app — the in-cluster emergency recovery UI.

The chat uses the tmux/attach model (see session.py): the server owns the
conversation; clients attach over SSE and the turn keeps running if they
disconnect.

Routes:
  GET /health — liveness (no auth)
  GET / — the single-page UI (static)
  POST /api/session — open a chat session, returns {session_id}
  POST /api/chat — run one turn, streams SSE events (text/tool/result)
  POST /api/pve/{verb} — LLM-independent PVE power verb (manual buttons)
  GET /api/pve/verbs — list allowed verbs + which mutate
  GET /health — liveness (no auth)
  GET / — the single-page UI (static)
  POST /api/session — create a session, returns {session_id}
  GET /api/session/{id}/stream — ATTACH (SSE): replay + live tail
  POST /api/session/{id}/prompt — run a turn (detached; survives disconnect)
  POST /api/session/{id}/cancel — stop the in-flight turn
  GET /api/pve/verbs — list allowed verbs + which mutate
  POST /api/pve/{verb} — LLM-independent PVE power verb (buttons)

Everything under /api requires auth (edge Authentik header or bearer token).
"""
import json
import os
import uuid

from fastapi import Depends, FastAPI, HTTPException
from fastapi import Depends, FastAPI, Header, HTTPException
from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel, Field

from . import agent_session, config, pve
from . import config, pve
from .auth import require_auth
from .session import SessionManager, attach_stream

app = FastAPI(title="Claude Breakglass")

_STATIC_DIR = os.path.join(os.path.dirname(__file__), "static")

manager = SessionManager()


class SessionResponse(BaseModel):
    session_id: str


class ChatRequest(BaseModel):
    session_id: str
class PromptRequest(BaseModel):
    prompt: str = Field(..., min_length=1)
    model: str | None = None

@@ -44,30 +50,53 @@ async def health():

@app.post("/api/session", response_model=SessionResponse)
async def open_session(_identity: str = Depends(require_auth)):
    # Claude wants a UUID for --session-id.
    return SessionResponse(session_id=str(uuid.uuid4()))
    return SessionResponse(session_id=manager.create().id)


@app.post("/api/chat")
async def chat(req: ChatRequest, _identity: str = Depends(require_auth)):
    """Stream one chat turn as Server-Sent Events. The browser reads the
    response body incrementally (fetch + ReadableStream)."""

    async def _sse():
        try:
            async for ev in agent_session.run_turn(req.session_id, req.prompt, req.model):
                yield f"data: {json.dumps(ev)}\n\n"
        except Exception as exc:  # noqa: BLE001 — surface any failure to the UI
            yield f"data: {json.dumps({'kind': 'error', 'error': str(exc)[:500]})}\n\n"
        yield f"data: {json.dumps({'kind': 'done'})}\n\n"

@app.get("/api/session/{session_id}/stream")
async def attach(
    session_id: str,
    _identity: str = Depends(require_auth),
    last_event_id: str | None = Header(default=None, alias="Last-Event-ID"),
):
    """Attach to a session (SSE). Replays the conversation so far, then tails
    live. On an EventSource auto-reconnect the browser sends Last-Event-ID, so we
    replay only what was missed."""
    session = manager.get(session_id)
    if session is None:
        raise HTTPException(status_code=404, detail="session not found")
    try:
        leid = int(last_event_id) if last_event_id is not None else None
    except ValueError:
        leid = None
    return StreamingResponse(
        _sse(),
        attach_stream(session, leid),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no", "Connection": "keep-alive"},
    )


@app.post("/api/session/{session_id}/prompt")
async def prompt(session_id: str, req: PromptRequest, _identity: str = Depends(require_auth)):
    """Start a turn. It runs DETACHED (keeps going if the client disconnects);
    output is delivered via the attach stream, not this response."""
    session = manager.get(session_id)
    if session is None:
        raise HTTPException(status_code=404, detail="session not found")
    if not session.start_turn(req.prompt, req.model):
        raise HTTPException(status_code=409, detail="a turn is already running")
    return {"status": "started"}


@app.post("/api/session/{session_id}/cancel")
async def cancel(session_id: str, _identity: str = Depends(require_auth)):
    session = manager.get(session_id)
    if session is None:
        raise HTTPException(status_code=404, detail="session not found")
    cancelled = await session.cancel()
    return {"cancelled": cancelled}


@app.get("/api/pve/verbs")
async def pve_verbs(_identity: str = Depends(require_auth)):
    return {
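The attach route speaks plain Server-Sent Events: `id:`, `event:` and `data:` fields, `:`-prefixed comment frames for keepalive, and a blank line ending each frame. A simplified, hypothetical `parse_sse` (LF line endings only, no `retry:` handling, whole body in memory) shows how frames shaped like those from `attach_stream` and `_sse_frame` in `session.py` look to a client, and which id it would echo back as the `Last-Event-ID` header that `attach` parses into `leid`.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class SSEEvent:
    event: str          # "message" unless the frame carried an `event:` field
    data: str
    id: str | None      # the frame's own `id:` field, if any


def parse_sse(body: str) -> tuple[list[SSEEvent], str | None]:
    """Parse a complete SSE body into events.

    Also returns the last `id:` seen, i.e. what a reconnecting client would
    send back as the Last-Event-ID header.
    """
    events: list[SSEEvent] = []
    last_id: str | None = None
    for block in body.split("\n\n"):           # a blank line ends each frame
        name, data_lines, frame_id = "message", [], None
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue                       # comment frame, e.g. ": keepalive"
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]              # drop one leading space after the colon
            if field == "event":
                name = value
            elif field == "data":
                data_lines.append(value)
            elif field == "id":
                frame_id = value
        if frame_id is not None:
            last_id = frame_id
        if data_lines:                         # frames without data dispatch nothing
            events.append(SSEEvent(name, "\n".join(data_lines), frame_id))
    return events, last_id


# Replay frame, caught-up marker, keepalive comment, then a live frame.
body = (
    'id: 0\ndata: {"kind": "user", "text": "hi", "id": 0}\n\n'
    "event: caught-up\ndata: {}\n\n"
    ": keepalive\n\n"
    'id: 1\ndata: {"kind": "turn_end", "id": 1}\n\n'
)
events, last_id = parse_sse(body)
kinds = [json.loads(e.data).get("kind") for e in events]
```

The `caught-up` marker carries no `id:`, so it does not move the resume point; only logged events do, which keeps `last_event_id + 1` aligned with the replay log on reconnect.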
app/breakglass/session.py (201 changed lines)
@@ -0,0 +1,201 @@
"""Attachable server-side sessions — the tmux model for the breakglass chat.

Instead of the client owning conversation state, the SERVER owns it and clients
*attach*. A turn runs as a detached task that keeps going if the client
disconnects (you can background the phone / hit a tunnel blip and the agent
keeps working); its output is appended to a per-session event log and broadcast
to every attached subscriber. A client attaches over SSE, gets the log replayed
(or only the part it missed, via Last-Event-ID), then tails live — exactly like
re-attaching to a tmux session. ``EventSource`` reconnects natively, so the
"re-attach" needs zero client logic.

This module owns the lifecycle; ``agent_session`` still provides the claude
argv + the stream-json→UI-event translation (all subprocesses use the no-shell
list-argv form), and ``config`` the knobs.
"""
import asyncio
import json
import os
import uuid
from subprocess import PIPE
from typing import AsyncIterator

from . import agent_session, config


class Session:
    """One conversation. Owns the replay log + live subscribers + the in-flight
    turn. The claude ``session_id`` is reused with ``--resume`` so the agent
    keeps its own context across turns."""

    def __init__(self, session_id: str):
        self.id = session_id
        # The replay log: every UI event, in order. Index in the list IS the
        # SSE event id, so a reconnecting client replays only what it missed.
        self.events: list[dict] = []
        self._subscribers: set[asyncio.Queue] = set()
        self._turn: asyncio.Task | None = None
        self._proc: asyncio.subprocess.Process | None = None
        self._started = False  # has claude opened this session id yet?

    # ── event log + fan-out ────────────────────────────────────────────────
    def add_event(self, event: dict) -> dict:
        """Append an event to the log and broadcast it to attached clients."""
        stored = {**event, "id": len(self.events)}
        self.events.append(stored)
        for q in list(self._subscribers):
            q.put_nowait(stored)
        return stored

    def subscribe(self) -> asyncio.Queue:
        q: asyncio.Queue = asyncio.Queue()
        self._subscribers.add(q)
        return q

    def unsubscribe(self, q: asyncio.Queue) -> None:
        self._subscribers.discard(q)

    @property
    def turn_active(self) -> bool:
        return self._turn is not None and not self._turn.done()

    # ── running a turn (detached from any client) ──────────────────────────
    def start_turn(self, prompt: str, model: str | None = None) -> bool:
        """Kick off a turn as a background task. Returns False if one is already
        running (one turn at a time per session)."""
        if self.turn_active:
            return False
        self.add_event({"kind": "user", "text": prompt})
        self._turn = asyncio.create_task(self._run_turn(prompt, model))
        return True

    async def _run_turn(self, prompt: str, model: str | None) -> None:
        model = model or config.DEFAULT_MODEL
        resume = self._started
        argv = agent_session._turn_argv(self.id, prompt, resume, model)
        try:
            self._proc = await asyncio.create_subprocess_exec(
                *argv, cwd=_workspace_for(self.id), stdout=PIPE, stderr=PIPE,
            )
        except Exception as exc:  # noqa: BLE001
            self.add_event({"kind": "error", "error": f"could not start agent: {exc}"})
            self.add_event({"kind": "turn_end"})
            return
        self._started = True
        assert self._proc.stdout is not None and self._proc.stderr is not None

        try:
            async def _pump():
                async for raw in self._proc.stdout:
                    line = raw.decode(errors="replace").strip()
                    if not line:
                        continue
                    try:
                        obj = json.loads(line)
                    except json.JSONDecodeError:
                        continue
                    ev = agent_session.translate_event(obj)
                    if ev is None:
                        continue
                    if ev.get("kind") == "batch":
                        for sub in ev["events"]:
                            self.add_event(sub)
                    else:
                        self.add_event(ev)

            await asyncio.wait_for(_pump(), timeout=config.TURN_TIMEOUT_SECONDS)
            await self._proc.wait()
            if self._proc.returncode not in (0, None):
                err = (await self._proc.stderr.read()).decode(errors="replace")
                self.add_event({"kind": "error", "error": err.strip()[:500] or f"exit {self._proc.returncode}"})
        except asyncio.TimeoutError:
            await self._kill_proc()
            self.add_event({"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"})
        except asyncio.CancelledError:
            await self._kill_proc()
            self.add_event({"kind": "cancelled"})
            raise
        finally:
            self._proc = None
            self.add_event({"kind": "turn_end"})

    async def _kill_proc(self) -> None:
        if self._proc and self._proc.returncode is None:
            try:
                self._proc.kill()
                await self._proc.wait()
            except ProcessLookupError:
                pass

    async def cancel(self) -> bool:
        """Stop the in-flight turn. Returns True if a turn was cancelled."""
        if not self.turn_active:
            return False
        await self._kill_proc()
        if self._turn:
            self._turn.cancel()
            try:
                await self._turn
            except (asyncio.CancelledError, Exception):  # noqa: BLE001
                pass
        return True


def _workspace_for(session_id: str) -> str:
    path = os.path.join(config.SESSIONS_DIR, session_id)
    os.makedirs(path, exist_ok=True)
    return path


class SessionManager:
    """Holds all live sessions. The breakglass is single-operator, so callers
    typically reuse one persistent session; multiple are still supported."""

    def __init__(self):
        self.sessions: dict[str, Session] = {}

    def create(self) -> Session:
        sid = str(uuid.uuid4())
        s = Session(sid)
        self.sessions[sid] = s
        return s

    def get(self, session_id: str) -> Session | None:
        return self.sessions.get(session_id)

    def get_or_create(self, session_id: str | None) -> Session:
        if session_id and session_id in self.sessions:
            return self.sessions[session_id]
        return self.create()


async def attach_stream(session: Session, last_event_id: int | None) -> AsyncIterator[str]:
    """Yield SSE frames for an attached client: first the replay (everything, or
    only events after ``last_event_id`` on a reconnect), then live events as they
    arrive. Each frame carries an ``id:`` so EventSource resumes precisely."""
    q = session.subscribe()
    try:
        start = 0 if last_event_id is None else last_event_id + 1
        backlog = session.events[start:]
        for ev in backlog:
            yield _sse_frame(ev)
        # Tell the client the replay is done and it's now live.
        yield "event: caught-up\ndata: {}\n\n"

        seen = backlog[-1]["id"] if backlog else (last_event_id if last_event_id is not None else -1)
        while True:
            try:
                ev = await asyncio.wait_for(q.get(), timeout=config.SSE_KEEPALIVE_SECONDS)
            except asyncio.TimeoutError:
                yield ": keepalive\n\n"  # comment frame keeps the connection warm
                continue
            if ev["id"] <= seen:
                continue
            seen = ev["id"]
            yield _sse_frame(ev)
    finally:
        session.unsubscribe(q)


def _sse_frame(event: dict) -> str:
    return f"id: {event['id']}\ndata: {json.dumps(event)}\n\n"
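The replay-then-tail contract of `attach_stream` fits in a few lines once SSE framing and keepalives are removed. A minimal sketch, assuming a hypothetical `ToySession` that keeps only the log and subscriber queues of `Session`; the real generator additionally wraps `q.get()` in a `SSE_KEEPALIVE_SECONDS` timeout and emits the `caught-up` marker.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator


class ToySession:
    """Hypothetical stripped-down Session: an append-only log plus fan-out."""

    def __init__(self) -> None:
        self.events: list[dict] = []
        self.subscribers: set[asyncio.Queue] = set()

    def add_event(self, event: dict) -> None:
        stored = {**event, "id": len(self.events)}  # list index doubles as event id
        self.events.append(stored)
        for q in list(self.subscribers):
            q.put_nowait(stored)


async def attach(session: ToySession, last_event_id: int | None) -> AsyncIterator[dict]:
    """Replay what the client missed, then tail live events."""
    q: asyncio.Queue = asyncio.Queue()
    session.subscribers.add(q)  # subscribe before slicing the log so no event falls in a gap
    try:
        start = 0 if last_event_id is None else last_event_id + 1
        backlog = session.events[start:]
        for ev in backlog:
            yield ev
        seen = backlog[-1]["id"] if backlog else (-1 if last_event_id is None else last_event_id)
        while True:
            ev = await q.get()
            if ev["id"] <= seen:  # defensive: skip anything the replay already covered
                continue
            seen = ev["id"]
            yield ev
    finally:
        session.subscribers.discard(q)


async def demo() -> tuple[list[int], int]:
    s = ToySession()
    for text in ("a", "b", "c"):
        s.add_event({"kind": "text", "text": text})  # ids 0, 1, 2
    stream = attach(s, last_event_id=0)  # the client already has event 0
    got = [(await stream.__anext__())["id"], (await stream.__anext__())["id"]]
    s.add_event({"kind": "turn_end"})  # id 3, delivered live through the queue
    got.append((await stream.__anext__())["id"])
    await stream.aclose()  # runs the finally block, which unsubscribes the queue
    return got, len(s.subscribers)
```

`asyncio.run(demo())` returns `([1, 2, 3], 0)`: events 1 and 2 come from the replay, event 3 arrives live, and closing the stream leaves no subscriber behind, which is the same cleanup `attach_stream` relies on when a client disconnects.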
app/breakglass/static/apple-touch-icon.png (binary; 30 KiB)
app/breakglass/static/assets/index-BoWC1Onq.css (1 changed line)
app/breakglass/static/assets/index-CLbKo1Yx.js (6 changed lines)
app/breakglass/static/icon-192.png (binary; 28 KiB)
app/breakglass/static/icon-512.png (binary; 48 KiB)
app/breakglass/static/icon.svg (64 changed lines; 2.5 KiB)
@@ -0,0 +1,64 @@
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="devvm breakglass">
  <defs>
    <!-- layered near-black surface, matching the app theme -->
    <radialGradient id="bg" cx="68%" cy="22%" r="92%">
      <stop offset="0%" stop-color="#12303a"/>
      <stop offset="42%" stop-color="#0b0f14"/>
      <stop offset="100%" stop-color="#06080b"/>
    </radialGradient>
    <linearGradient id="steel" x1="0" y1="0" x2="1" y2="1">
      <stop offset="0%" stop-color="#7df0f3"/>
      <stop offset="55%" stop-color="#3dd1d6"/>
      <stop offset="100%" stop-color="#1f6f72"/>
    </linearGradient>
    <filter id="glow" x="-40%" y="-40%" width="180%" height="180%">
      <feGaussianBlur stdDeviation="7" result="b"/>
      <feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge>
    </filter>
  </defs>

  <!-- rounded-square field (safe for maskable: art kept within central ~80%) -->
  <rect width="512" height="512" rx="112" fill="url(#bg)"/>
  <rect x="6" y="6" width="500" height="500" rx="108" fill="none" stroke="#1c2530" stroke-width="3"/>
  <!-- faint scanline texture -->
  <g opacity="0.05" stroke="#ffffff" stroke-width="2">
    <line x1="0" y1="148" x2="512" y2="148"/>
    <line x1="0" y1="220" x2="512" y2="220"/>
    <line x1="0" y1="292" x2="512" y2="292"/>
    <line x1="0" y1="364" x2="512" y2="364"/>
  </g>

  <!-- fracture burst (amber): the "break the glass" radiating cracks -->
  <g stroke="#f5b657" stroke-width="9" stroke-linecap="round" stroke-linejoin="round"
     fill="none" opacity="0.92" filter="url(#glow)">
    <path d="M256 256 L142 132"/>
    <path d="M256 256 L120 250"/>
    <path d="M256 256 L150 372"/>
    <path d="M256 256 L372 380"/>
    <path d="M256 256 L392 246"/>
    <path d="M256 256 L360 138"/>
    <!-- cross-cracks -->
    <path d="M186 196 L150 250"/>
    <path d="M210 320 L172 318" opacity="0.7"/>
    <path d="M326 318 L356 350" opacity="0.7"/>
  </g>

  <!-- wrench, struck across the burst (cyan steel) -->
  <g filter="url(#glow)">
    <path fill="url(#steel)" stroke="#0e3133" stroke-width="6" stroke-linejoin="round"
          d="M344 150
             a62 62 0 0 0 -82 76
             L150 338
             a26 26 0 0 0 0 37
             l11 11
             a26 26 0 0 0 37 0
             l112 -112
             a62 62 0 0 0 76 -82
             l-41 41
             l-40 -11
             l-11 -40
             z"/>
    <!-- handle highlight -->
    <path d="M171 350 l128 -128" stroke="#bdf6f8" stroke-width="7" stroke-linecap="round" opacity="0.6"/>
  </g>
</svg>
@@ -2,12 +2,31 @@
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <!-- viewport-fit=cover so the app paints edge-to-edge and we can honour the
         notch/home-indicator via env(safe-area-inset-*). maximum-scale + no
         user-scaling keeps the cockpit layout stable under stress on mobile. -->
    <meta
      name="viewport"
      content="width=device-width, initial-scale=1.0, viewport-fit=cover, maximum-scale=1.0"
    />
    <meta name="color-scheme" content="dark" />
    <meta name="robots" content="noindex, nofollow" />

    <!-- PWA / installable. theme-color tints the mobile status bar to the dark
         theme; black-translucent lets the app draw under the iOS status bar. -->
    <meta name="theme-color" content="#06080b" />
    <link rel="manifest" href="./manifest.webmanifest" />
    <meta name="apple-mobile-web-app-capable" content="yes" />
    <meta name="mobile-web-app-capable" content="yes" />
    <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
    <meta name="apple-mobile-web-app-title" content="breakglass" />
    <link rel="apple-touch-icon" href="./apple-touch-icon.png" />
    <link rel="icon" type="image/svg+xml" href="./icon.svg" />
    <link rel="icon" type="image/png" sizes="192x192" href="./icon-192.png" />

    <title>devvm breakglass</title>
    <script type="module" crossorigin src="./assets/index-DjaW81Sq.js"></script>
    <link rel="stylesheet" crossorigin href="./assets/index-DWHIP1Zw.css">
    <script type="module" crossorigin src="./assets/index-CLbKo1Yx.js"></script>
    <link rel="stylesheet" crossorigin href="./assets/index-BoWC1Onq.css">
  </head>
  <body>
    <div id="app"></div>
app/breakglass/static/manifest.webmanifest (31 changed lines)
@@ -0,0 +1,31 @@
{
  "name": "devvm breakglass",
  "short_name": "breakglass",
  "description": "Emergency recovery console for the devvm — chat with a repair agent or power-cycle the VM directly.",
  "start_url": "./",
  "scope": "./",
  "display": "standalone",
  "orientation": "portrait",
  "background_color": "#06080b",
  "theme_color": "#06080b",
  "icons": [
    {
      "src": "./icon.svg",
      "type": "image/svg+xml",
      "sizes": "any",
      "purpose": "any maskable"
    },
    {
      "src": "./icon-192.png",
      "type": "image/png",
      "sizes": "192x192",
      "purpose": "any maskable"
    },
    {
      "src": "./icon-512.png",
      "type": "image/png",
      "sizes": "512x512",
      "purpose": "any maskable"
    }
  ]
}
app/conversational.py (220 changed lines)
@@ -0,0 +1,220 @@
"""Conversational Brain — drives the Claude CLI for the portal-assistant gateway.

A lean, no-tools, multi-turn path (portal-assistant ADR-0002): no workspace clone,
no tool-enabled agent, and NO --dangerously-skip-permissions. Per-conversation
continuity comes from the Claude CLI's own --session-id / --resume, so the gateway
only has to hand us a stable session id per conversation.
"""
import asyncio
import json
import os
from subprocess import PIPE

CONVERSATIONAL_AGENT = "conversational"
# A spoken chat turn is short; a turn that runs longer than this is wedged.
CONVERSATIONAL_TIMEOUT_SECONDS = int(
    os.environ.get("CONVERSATIONAL_TIMEOUT_SECONDS", "120")
)

# Latency: the conversational agent is no-tools (ADR-0002), so the CLI's default
# project context — this repo's CLAUDE.md, the MCP server configs, local settings
# — plus the dynamic system-prompt sections are pure overhead on a voice turn.
# Measured 2026-06-21: the default load is ~45k input tokens/turn -> ~3.4s TTFT;
# restricting settings to `user` and excluding the dynamic sections roughly
# halves the context (~23k) and cuts TTFT to ~2.1s (~1.3s/turn faster) with no
# change to the reply. Applies to BOTH the gateway (json) and realtime (stream)
# paths, since both run the same no-tools conversational turn.
_LEAN_CONTEXT_FLAGS = [
    "--setting-sources", "user",
    "--exclude-dynamic-system-prompt-sections",
]

# Session ids the Claude CLI has already opened in THIS process, so a follow-up
# turn resumes instead of re-opening. In-memory + single-replica: a pod restart
# clears this AND the CLI's emptyDir session state together, so they stay in sync.
_started: set[str] = set()


def reset_started() -> None:
    """Forget all opened sessions (used by tests)."""
    _started.clear()


def conversational_argv(
    session_id: str, message: str, model: str, resume: bool
) -> list[str]:
    """Build the argv for one conversational turn.

    A new conversation opens the session with --session-id; subsequent turns
    continue it with --resume so Claude keeps its own context. We never pass
    --dangerously-skip-permissions: the conversational agent has no tools and the
    endpoint is public-facing, so nothing may be auto-permitted.
    """
    argv = [
        "claude", "-p",
        "--agent", CONVERSATIONAL_AGENT,
        "--output-format", "json",
        "--model", model,
        *_LEAN_CONTEXT_FLAGS,
    ]
    argv += ["--resume", session_id] if resume else ["--session-id", session_id]
    argv.append(message)
    return argv


def extract_reply(output_lines: list[str]) -> str:
    """Pull the final assistant text out of `claude -p --output-format json`.

    The CLI emits one JSON object with the final message under `result`; fall
    back to the raw text if it isn't parseable so callers always get something.
    """
    raw = "".join(output_lines).strip()
    if not raw:
        return ""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return raw
    if isinstance(parsed, dict):
        for key in ("result", "content", "text"):
            value = parsed.get(key)
            if isinstance(value, str) and value:
                return value
    return raw
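The open-versus-resume bookkeeping used by `conversational_argv` and `run_turn` is easy to get subtly wrong, so here it is in isolation. A minimal sketch, assuming hypothetical helpers `session_flags`, `record_exit` and `fake_turn` that mirror only the `_started` logic (no subprocess is spawned):

```python
from __future__ import annotations

_started: set[str] = set()  # session ids the CLI has successfully opened


def session_flags(session_id: str) -> list[str]:
    """Open a Claude session on the first turn, resume it on later ones."""
    if session_id in _started:
        return ["--resume", session_id]
    return ["--session-id", session_id]


def record_exit(session_id: str, exit_code: int) -> None:
    """Only a successful turn marks the session as opened, so a failed first
    turn is retried with --session-id instead of resuming a session that may
    not exist."""
    if exit_code == 0:
        _started.add(session_id)


def fake_turn(session_id: str, exit_code: int) -> list[str]:
    """Simulate one turn: pick the flags, then record how the CLI exited."""
    flags = session_flags(session_id)
    record_exit(session_id, exit_code)
    return flags
```

A failed first attempt leaves the id unmarked, so the retry opens the session again; only after a clean exit does the next turn switch to `--resume`.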

async def run_turn(session_id: str, message: str, model: str) -> dict:
    """Run one conversational turn and return {exit_code, reply, stderr}.

    Resumes the Claude session if we've opened it before; otherwise opens it.
    The session is only marked opened on success so a failed first turn can be
    retried cleanly as a new one.
    """
    resume = session_id in _started
    argv = conversational_argv(session_id, message, model, resume)

    proc = await asyncio.create_subprocess_exec(*argv, stdout=PIPE, stderr=PIPE)
    assert proc.stdout is not None and proc.stderr is not None

    output_lines: list[str] = []
    async for line in proc.stdout:
        output_lines.append(line.decode(errors="replace"))
    stderr = await proc.stderr.read()
    await proc.wait()

    if proc.returncode == 0:
        _started.add(session_id)

    return {
        "exit_code": proc.returncode,
        "reply": extract_reply(output_lines),
        "stderr": stderr.decode(errors="replace"),
    }


# ---------------------------------------------------------------------------
# Streaming (OpenAI-compatible) path — token-level deltas for the realtime
# voice agent. Pipecat's OpenAILLMService streams from /v1/chat/completions and
# re-sends the FULL history each turn, so this path is STATELESS: the whole
# dialogue goes in the prompt and we run a fresh CLI with stream-json to relay
# incremental tokens as OpenAI chat-completion SSE chunks. (run_turn above stays
# the session-based path for the non-streaming gateway.)
# ---------------------------------------------------------------------------


def stream_argv(prompt: str, model: str) -> list[str]:
    """Argv for a STREAMING conversational turn (token deltas via stream-json).

    Stateless — the full conversation is in `prompt` (no --session-id/--resume).
    `--include-partial-messages` makes the CLI emit `content_block_delta` token
    events; `--verbose` is required by the CLI for stream-json under --print. No
    --dangerously-skip-permissions: the conversational agent has no tools.
    """
    return [
        "claude", "-p",
        "--agent", CONVERSATIONAL_AGENT,
        "--model", model,
        "--output-format", "stream-json",
        "--include-partial-messages",
        "--verbose",
        *_LEAN_CONTEXT_FLAGS,
        prompt,
    ]


def delta_text(line: str) -> str | None:
    """Extract the incremental assistant text from one stream-json line.

    Returns the text of a `content_block_delta` / `text_delta` event, or None
    for any other event (system, message_start, content_block_stop, result) or
    an unparseable line.
    """
    line = line.strip()
    if not line:
        return None
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or event.get("type") != "stream_event":
        return None
    inner = event.get("event") or {}
    if inner.get("type") != "content_block_delta":
        return None
    delta = inner.get("delta") or {}
    if delta.get("type") == "text_delta":
        return delta.get("text") or None
    return None


def openai_chunk(
    completion_id: str,
    model: str,
    created: int,
    *,
    role: str | None = None,
    content: str | None = None,
    finish_reason: str | None = None,
) -> str:
    """Format one OpenAI `chat.completion.chunk` as an SSE `data:` line.

    ensure_ascii=False keeps Cyrillic (Bulgarian) intact on the wire.
    """
    delta: dict[str, str] = {}
    if role is not None:
        delta["role"] = role
    if content is not None:
        delta["content"] = content
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"
|
||||
def synthesise_chat_prompt(messages) -> str:
|
||||
"""Flatten OpenAI chat messages into a dialogue prompt for the conversational
|
||||
agent, KEEPING prior assistant turns.
|
||||
|
||||
Pipecat re-sends the full message history every call, so multi-turn context
|
||||
is preserved here (statelessly) by replaying the dialogue. Each message is a
|
||||
duck-typed object with `.role` and `.content`. System messages become a
|
||||
preamble; user/assistant turns are rendered as a `User:`/`Assistant:`
|
||||
dialogue ending on the latest user turn.
|
||||
"""
|
||||
system = [m.content for m in messages if m.role == "system" and m.content]
|
||||
turns = []
|
||||
for m in messages:
|
||||
if m.role == "user" and m.content:
|
||||
turns.append("User: " + m.content)
|
||||
elif m.role == "assistant" and m.content:
|
||||
turns.append("Assistant: " + m.content)
|
||||
parts = []
|
||||
if system:
|
||||
parts.append("\n\n".join(system))
|
||||
if turns:
|
||||
parts.append("\n".join(turns))
|
||||
return "\n\n".join(parts).strip()
|
||||
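
`run_turn` reads `proc.stdout` all the way to EOF and only then reads `proc.stderr`. Suppose the CLI wrote a large amount to stderr before closing stdout. Once the stderr buffer filled, reading from that pipe would pause, and the child could block mid-write while the parent is still waiting on stdout. The sketch below shows one way to avoid that: drain both pipes concurrently with `asyncio.gather`. It assumes roughly the same result shape as `run_turn`. `run_and_collect` is a hypothetical helper, not part of the service.

```python
import asyncio
import sys
from asyncio.subprocess import PIPE


async def run_and_collect(argv: list[str]) -> dict:
    """Run argv, draining stdout (line by line) and stderr concurrently.

    Hypothetical helper: same result shape as run_turn minus the reply
    extraction, so neither pipe can back up while the other is being read.
    """
    proc = await asyncio.create_subprocess_exec(*argv, stdout=PIPE, stderr=PIPE)
    assert proc.stdout is not None and proc.stderr is not None

    async def read_lines(stream: asyncio.StreamReader) -> list[str]:
        # Decode each line with errors="replace", as run_turn does.
        return [line.decode(errors="replace") async for line in stream]

    # Both readers run at once; gather returns when both streams reach EOF.
    stdout_lines, stderr_bytes = await asyncio.gather(
        read_lines(proc.stdout), proc.stderr.read()
    )
    await proc.wait()
    return {
        "exit_code": proc.returncode,
        "stdout_lines": stdout_lines,
        "stderr": stderr_bytes.decode(errors="replace"),
    }


if __name__ == "__main__":
    # A child that writes ~500 KB to stderr before its only stdout line.
    noisy = [sys.executable, "-c",
             "import sys; sys.stderr.write('x' * 500_000); print('done')"]
    result = asyncio.run(run_and_collect(noisy))
    print(result["exit_code"], result["stdout_lines"], len(result["stderr"]))
```

`proc.communicate()` would also avoid the stall. However, it returns stdout as one bytes object rather than line by line, so the per-line shape that `extract_reply` consumes would have to be rebuilt with `splitlines(keepends=True)`.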

129
app/main.py

@ -2,6 +2,8 @@ import asyncio
import hmac
import json
import os
import shutil
import tempfile
import time
import uuid
from contextlib import asynccontextmanager

@ -10,9 +12,11 @@ from subprocess import PIPE
from typing import Any, Literal

from fastapi import FastAPI, HTTPException, Header
from fastapi.responses import JSONResponse
from fastapi.responses import JSONResponse, StreamingResponse
from pydantic import BaseModel, Field

from app import conversational

app = FastAPI(title="Claude Agent Service")

API_TOKEN = os.environ.get("API_BEARER_TOKEN", "")

@ -104,6 +108,15 @@ class ChatCompletionsRequest(BaseModel):
    model_config = {"extra": "allow"}


class ConversationalRequest(BaseModel):
    # The portal-assistant gateway owns the conversation; it hands us a stable
    # session id (for Claude --resume) plus the next user message. Model is
    # selectable per request, same as the OpenAI-compat path.
    session_id: str
    message: str
    model: str | None = None


def verify_token(authorization: str | None):
    # Reject everything when the service is unconfigured. compare_digest("", "")
    # returns True, so without this guard an empty API_TOKEN would happily

@ -435,9 +448,6 @@ async def chat_completions(
):
    verify_token(authorization)

    if request.stream:
        raise HTTPException(status_code=400, detail="streaming not supported")

    model = request.model if request.model is not None else DEFAULT_MODEL
    if model not in SUPPORTED_MODELS:
        return JSONResponse(

@ -448,6 +458,64 @@ async def chat_completions(
            },
        )

    # Streaming path (the realtime voice agent / Pipecat). Token-level deltas via
    # the conversational (no-tools) agent in stream-json mode, relayed as
    # OpenAI chat.completion.chunk SSE. Stateless: the full history is in the
    # prompt (the client re-sends it each turn). No workspace clone — the
    # conversational agent reads no files.
    if request.stream:
        if not _reserve_queue_slot():
            return JSONResponse(
                status_code=503,
                content={"error": "execution failed", "detail": "queue full"},
            )
        prompt = conversational.synthesise_chat_prompt(request.messages)
        completion_id = "chatcmpl-" + uuid.uuid4().hex[:24]
        created = int(time.time())
        spawn = asyncio.create_subprocess_exec  # bound alias (keeps subprocess use tidy)

        async def event_stream():
            workspace = tempfile.mkdtemp(prefix="conv-stream-")
            proc = None
            try:
                async with _execution_slot():
                    proc = await spawn(
                        *conversational.stream_argv(prompt, model),
                        cwd=workspace, stdout=PIPE, stderr=PIPE,
                    )
                    assert proc.stdout is not None
                    yield conversational.openai_chunk(
                        completion_id, model, created, role="assistant"
                    )
                    try:
                        async with asyncio.timeout(
                            conversational.CONVERSATIONAL_TIMEOUT_SECONDS
                        ):
                            async for raw in proc.stdout:
                                text = conversational.delta_text(
                                    raw.decode(errors="replace")
                                )
                                if text:
                                    yield conversational.openai_chunk(
                                        completion_id, model, created, content=text
                                    )
                    except asyncio.TimeoutError:
                        pass  # wedged turn — close the stream cleanly
                    yield conversational.openai_chunk(
                        completion_id, model, created, finish_reason="stop"
                    )
                    yield "data: [DONE]\n\n"
            finally:
                if proc is not None and proc.returncode is None:
                    try:
                        proc.kill()
                        await proc.wait()
                    except ProcessLookupError:
                        pass
                shutil.rmtree(workspace, ignore_errors=True)

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    prompt = _synthesise_prompt(request.messages)

    if not _reserve_queue_slot():

@ -510,3 +578,56 @@ async def chat_completions(
            "total_tokens": 0,
        },
    }


@app.post("/v1/conversational")
async def conversational_turn(
    request: ConversationalRequest,
    authorization: str | None = Header(default=None),
):
    """Lean, multi-turn conversational Brain for the portal-assistant gateway.

    Drives a no-tools conversational agent with per-conversation --resume — no
    workspace clone, no tools (see portal-assistant ADR-0002). Returns the
    assistant's reply text keyed to the caller's session id.
    """
    verify_token(authorization)

    model = request.model if request.model is not None else DEFAULT_MODEL
    if model not in SUPPORTED_MODELS:
        return JSONResponse(
            status_code=400,
            content={"error": "unsupported model", "supported": sorted(SUPPORTED_MODELS)},
        )

    if not _reserve_queue_slot():
        return JSONResponse(
            status_code=503,
            content={"error": "execution failed", "detail": "queue full"},
        )

    try:
        async with _execution_slot():
            result = await asyncio.wait_for(
                conversational.run_turn(request.session_id, request.message, model),
                timeout=conversational.CONVERSATIONAL_TIMEOUT_SECONDS,
            )
    except asyncio.TimeoutError:
        return JSONResponse(
            status_code=503,
            content={"error": "execution failed", "detail": "agent timed out"},
        )
    except Exception as exc:  # noqa: BLE001
        return JSONResponse(
            status_code=503,
            content={"error": "execution failed", "detail": _one_line(str(exc))},
        )

    if result["exit_code"] != 0:
        detail = _one_line(result.get("stderr") or "") or f"exit {result['exit_code']}"
        return JSONResponse(
            status_code=503,
            content={"error": "execution failed", "detail": detail},
        )

    return {"session_id": request.session_id, "reply": result["reply"]}


259
docs/2026-06-14-afk-implementation-pipeline-design.md
Normal file

@ -0,0 +1,259 @@
# AFK implementation pipeline — design

**Date:** 2026-06-14
**Status:** proposed — pilot pending (see "Pilot" below; no code yet)
**Scope:** A new autonomous path that turns a triaged `ready-for-agent` issue
into tested, deployed code with no human at the keyboard. `claude-agent-service`
becomes the **control plane**; a dedicated in-cluster **T3 Code** instance
becomes the **executor + cockpit**. Touches: `claude-agent-service` (new
poller + dispatch + watcher), a new T3 stack in `infra/`, a shared SSD-NFS
volume, and the per-repo issue trackers.

> Provenance: this design is the output of a long grilling session
> (2026-06-14). It records the decisions *and* the alternatives that were
> considered and dropped, so the reasoning survives. The three hardest-to-reverse
> calls are split into ADRs 0002–0004.

## Problem

Today the development flow is **grill-with-docs → to-prd → to-issues → triage →
implement**, and *every* stage is human-in-the-loop (HITL), including
implementation. The owner wants the HITL boundary to stop at **design + spec**:
once an issue is triaged `ready-for-agent`, an agent should pick it up and
implement it **AFK** (away from keyboard) — write it test-first, push it, and
see it through to a healthy deploy — escalating to a human only when it genuinely
can't proceed.

Two gaps block this today:

- The only existing issue→agent automation is the **infra `issue-responder`**,
  which fires on `user-report`/`feature-request` labels on the `infra` repo
  only — not on `ready-for-agent`, not on the other sub-project repos that the
  general design flow produces.
- `claude-agent-service` only ever clones `infra`, runs one-shot fire-and-forget
  `claude -p` jobs (no session, no live stream, no attach), and has no
  multi-repo checkout. The owner wants to *watch and steer* in-flight work, which
  the batch model can't offer.

## Goal

- HITL covers design + spec only. Publishing `ready-for-agent` issues is the
  release signal (the `to-issues` quiz is the review gate).
- An autonomous loop picks up unblocked `ready-for-agent` issues from
  **enrolled** repos, implements them test-first, and lands them — pushing
  straight to `master` so CI deploys them (see ADR 0002 for the risk posture).
- The owner can **see all in-flight workers and converse with any of them** from
  one UI — the T3 cockpit (see ADR 0003).
- Reuse before building: lean on the existing CI/CD chain, the design skills, T3
  Code's multi-agent cockpit, and the persistence/worktree machinery — rather
  than hand-building a session console and a bespoke runtime.

## Design

### Roles: control plane vs executor + cockpit

| Concern | Owner |
|---|---|
| When to start, which issue, the prompt, the safety envelope | **claude-agent-service** (control plane) — poller + watcher |
| Running the agent (Claude Agent SDK), the worktree, the fleet UI | **T3 Code** (executor + cockpit) — one dedicated in-cluster instance |
| Build → image → deploy → rollout | existing CI/CD (GHA → ghcr → Woodpecker → Keel) |
| Issue queue + state | the per-repo GitHub issue trackers |

The pivotal constraint that forces this split: **T3 can only display sessions it
launched itself** — it has no command to adopt an externally-started session. So
"viewable in T3" ⟺ "launched by T3". To keep `claude-agent-service` in charge
*and* get the fleet view, the control plane **dispatches into T3** rather than
running `claude` itself. See ADR 0003.

### End-to-end flow

```
HUMAN (interactive session)
  /grill-with-docs → /to-prd → /to-issues → /triage
    └ produces ready-for-agent issues (dependency-ordered), labeled by a
      trusted collaborator. Publishing them = the release signal.
══════════════════════ HANDOFF ══════════════════════
CONTROL PLANE (claude-agent-service, in-cluster)
  poller CronJob (every few min):
    for repo in allowlist:
      skip repo if it already has an agent-in-progress issue (per-repo lock)
      pick highest-priority ready-for-agent issue where:
        • all "Blocked by" closed   • labeled by a trusted collaborator
      → stamp agent-in-progress
      → POST /api/orchestration/dispatch (thread.turn.start + bootstrap:
          create thread, prepare worktree, run setup, deliver the prompt)
EXECUTOR + COCKPIT (dedicated T3 instance, in-cluster)
  runs the issue-implementer agent (our prompt) in the worktree:
    read issue + AGENT-BRIEF + repo CONTEXT.md/ADRs → TDD red-green-refactor
    → commit (paraphrase issue, "Closes #N", AFK trailer) → push master
  watcher (control plane) polls GET /api/orchestration/snapshot + CI:
    ├─ healthy ──────► comment + close issue, drop lock, notify ✅
    ├─ pre-push block ► do NOT push, relabel ready-for-human, escalate
    └─ post-push red ► fix-forward (≤5 attempts / 60 min)
         ├─ recovers ► healthy
         └─ exhausts ► FREEZE broken (preserve forensics),
                       relabel ready-for-human, hard page
```

### Trigger & dispatch predicate

A poller CronJob drives pickup. It mirrors the existing `beads-dispatcher`
pattern and stays in-cluster because neither the service nor T3 has public
ingress. It dispatches issue *I* in repo *R* iff **all** hold:

- `R` is in the **allowlist** ConfigMap, and the **kill switch** is off;
- `I` has label `ready-for-agent`, applied by a **trusted collaborator** (the
  trust gate — on private repos only collaborators can label, so the label *is*
  the authorization; external/bot issues never auto-run);
- every issue in `I`'s "Blocked by" is closed;
- `R` has no issue currently labeled `agent-in-progress` (the per-repo lock).

On dispatch it stamps `agent-in-progress`; on any terminal outcome it removes it.

### Concurrency & locking

**Parallel across repos, serial within a repo.** Multiple repos progress at
once; at most one agent runs per repo, because two agents in one repo would
collide on the working tree. The `agent-in-progress` label enforces this as a
per-repo lock. This limit is a starting value and can be raised later.
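
Under this rule, one poller pass picks at most one eligible issue per repo and skips repos that already hold the lock. The sketch below assumes a hypothetical numeric `priority` (lower means more urgent), because the design does not fix a priority scheme. `Candidate` and `plan_pass` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    repo: str
    number: int
    priority: int   # hypothetical: lower value = higher priority
    eligible: bool  # result of the dispatch predicate for this issue


def plan_pass(candidates: list[Candidate], locked_repos: set[str]) -> dict[str, int]:
    """Pick at most one issue per unlocked repo: parallel across, serial within."""
    picks: dict[str, Candidate] = {}
    for c in candidates:
        if c.repo in locked_repos or not c.eligible:
            continue
        best = picks.get(c.repo)
        # Keep the highest-priority candidate; break ties on the lower issue number.
        if best is None or (c.priority, c.number) < (best.priority, best.number):
            picks[c.repo] = c
    return {repo: c.number for repo, c in picks.items()}
```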

### Merge & failure posture — see ADR 0002

- **Always push to master** (no PR gate). Tests-green is the merge gate; CI +
  rollback are the safety net, matching the human allow-then-audit model.
- **Pre-push** failure (can't get green / blocked / would need a disallowed op):
  do *not* push; relabel `ready-for-human`; comment what was tried; page.
- **Post-push** failure (CI build or rollout red): **fix-forward** up to **5
  attempts or 60 minutes**, then if still red **freeze in the broken state**
  (preserve forensics — do not auto-revert), relabel `ready-for-human`, hard
  page. The owner explicitly chose debuggability over availability here.
- **Budget:** `max_budget_usd = 100` per issue (time/attempt caps usually bite
  first).
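
The fix-forward cap reads as "5 attempts or 60 minutes, whichever comes first". The sketch below writes that loop with injected callables so it can be tested with a fake clock. `ci_green` is assumed to block until the latest CI run and rollout have settled. `push_fix` stands in for one corrective agent turn plus push. Both callables, and `fix_forward` itself, are hypothetical.

```python
import time
from collections.abc import Callable


def fix_forward(
    ci_green: Callable[[], bool],
    push_fix: Callable[[int], None],
    max_attempts: int = 5,
    max_seconds: float = 60 * 60,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return "healthy" once CI is green, or "frozen" when either cap is hit.

    On "frozen" the caller leaves the broken state in place, relabels the
    issue ready-for-human and pages; nothing here reverts.
    """
    start = clock()
    attempts = 0
    while not ci_green():
        if attempts >= max_attempts or clock() - start >= max_seconds:
            return "frozen"
        attempts += 1
        push_fix(attempts)  # one corrective commit per attempt
    return "healthy"
```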

### Build/test environment & worktrees — see ADR 0004

The agent must run the target repo's test suite (TDD gate) before pushing.
Therefore:

- **Local toolchains scoped to the allowlist** — the executor image carries only
  the *enrolled* repos' runtimes; the toolchain set grows in lockstep with the
  allowlist.
- **Persistent per-repo checkout + `git worktree` per issue** on a shared
  **SSD-NFS** volume, so git objects, installed deps, and package-manager caches
  stay warm across jobs. This **supersedes** the throwaway `git clone --local`
  model from `2026-06-02-parallel-execution-design.md`; that rejection was
  correct for *concurrent* same-repo jobs, but the serial-within-repo choice
  here removes the `.git` contention it guarded against (ADR 0004). It pays off
  precisely because `to-issues` clusters many slices in one repo, processed
  serially — slice N reuses the warm checkout slice 1 paid for.
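
The sketch below covers the git side of this layout. It assumes a hypothetical directory layout (one `checkout` per repo, with a `worktrees` directory beside it), a hypothetical branch naming scheme, and `origin/master` as the base. The commands themselves are standard `git fetch`, `git worktree add -b` and `git worktree remove`. The functions only build argv lists, so they can be tested without a repository.

```python
from pathlib import Path


def worktree_commands(checkout: Path, worktrees: Path, issue: int,
                      base: str = "origin/master") -> list[list[str]]:
    """Argv lists to refresh a persistent checkout and add a per-issue worktree."""
    branch = f"afk/issue-{issue}"       # hypothetical branch naming
    path = worktrees / f"issue-{issue}"
    return [
        # Refresh the warm checkout's objects; deps and caches stay on the volume.
        ["git", "-C", str(checkout), "fetch", "--prune", "origin"],
        # New branch + working tree for this issue, sharing the checkout's .git.
        ["git", "-C", str(checkout), "worktree", "add", "-b", branch, str(path), base],
    ]


def cleanup_commands(checkout: Path, worktrees: Path, issue: int) -> list[list[str]]:
    """Argv lists to drop an issue's worktree after a healthy landing."""
    path = worktrees / f"issue-{issue}"
    return [
        ["git", "-C", str(checkout), "worktree", "remove", str(path)],
        ["git", "-C", str(checkout), "worktree", "prune"],
    ]
```

Only the healthy path should run the cleanup. On a freeze, the worktree stays in place as part of the preserved forensic state. `git worktree add -b` fails if the branch already exists, which gives a cheap extra guard against a second dispatch of the same issue.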

### T3 integration: thin dispatch — see ADR 0003

The control plane holds a capability-scoped **`orchestration:operate`** bearer
token (minted via `t3 auth`, stored in Vault, refreshed for the 1-hour expiry)
and calls T3's HTTP API:

- `POST /api/orchestration/dispatch` → `thread.turn.start` with a `bootstrap`
  that creates the thread, prepares the worktree, optionally runs a setup
  script, and delivers the prompt — one call spawns a worktree-isolated worker.
- `GET /api/orchestration/snapshot` → the full fleet read-model (per-thread
  `running`/`idle`/`error`, `hasPendingUserInput`, `hasPendingApprovals`,
  `branch`, `worktreePath`). T3 has **no outbound webhooks**, so the watcher
  **polls** this to drive CI-watch, freeze, and label transitions.

The AFK *behavior and safety* (issue-implementer prompt, guardrails, always-push,
fix-forward/freeze, issue integration) live in **our** thin layer, so T3 is a
**swappable, version-pinned backend** — never Keel-auto-upgraded, reversible to a
self-hosted runtime if it goes sideways.
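
T3 has no outbound webhooks, so on each poll the watcher turns the snapshot into per-thread actions. The sketch below works over an assumed JSON shape. The top-level `threads` list and the `id` and `status` keys are guesses to check against the pinned T3 version. The state values and the two `hasPending*` flags come from the list above. The action names are hypothetical.

```python
from typing import Any


def classify_thread(thread: dict[str, Any]) -> str:
    """Map one thread from the snapshot to a watcher action."""
    if thread.get("hasPendingUserInput") or thread.get("hasPendingApprovals"):
        return "needs-human"  # stalled on a question or an approval gate
    status = thread.get("status")  # assumed key holding running/idle/error
    if status == "error":
        return "escalate"
    if status == "running":
        return "wait"
    if status == "idle":
        return "check-ci"  # turn finished: watch CI / rollout for its branch
    return "unknown"


def plan_actions(snapshot: dict[str, Any]) -> dict[str, str]:
    """Thread id -> action for every thread in the (assumed) snapshot shape."""
    return {t["id"]: classify_thread(t) for t in snapshot.get("threads", [])}
```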

### Observability & interaction

The "active sessions layer" and the "attach and converse" surface **converge
into one screen — the T3 cockpit**: a live list of all worker threads grouped by
project; click one to stream its transcript and send it a turn. This dissolves
the earlier intermediate ideas of a generalized-breakglass console and a
raw-tmux hybrid attach — T3 provides converse / approve / resume natively
(`thread.user-input.respond`, `thread.approval.respond`).

Cross-system, durable signals the control plane still emits:

- **Phase-checklist comment** on the issue, edited in place as phases complete
  (worktree → tests-red → green → pushed → CI → deployed). Durable, low-noise,
  lives on the issue, doubles as audit trail.
- **Loki** logs labeled `{repo, issue}` for deep-dive.
- **Presence** claim per running session (`repo:<name>`, purpose `AFK #N`),
  heartbeated — so AFK work shows up next to human sessions in the layer the
  prompt hook already injects.
- **Doorbell**: Slack / ntfy ping on terminal states, deep-linking into the T3
  thread. Notify, not control — the dedicated-Slack-control-plane idea is
  dropped in favour of the T3 cockpit.

### Safety envelope

- **Trust gate** — only collaborator-labeled `ready-for-agent` issues run.
- **Allowlist** — a repo is untouchable until enrolled (prereqs: tests + GHA
  CI + `CONTEXT.md`). Start with 1–2 repos; expand deliberately.
- **Kill switch** — one ConfigMap flag pauses all pickup (the Keel
  scale-to-0 reflex, built in from day one).
- **Per-repo lock** — ≤1 agent per repo.
- **Guardrails** (reused from `issue-responder`) — no PVC/PV deletes, no direct
  Vault edits, no force-push to master, infra changes Terraform-only, never
  `[ci skip]`.
- **Identity & audit** — shared service identity; each commit body paraphrases
  the issue and carries `Closes #N` + an AFK-agent trailer, so the commit
  message stays the audit trail.
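
The identity-and-audit rule and the `[ci skip]` guardrail can both be checked before every push. The sketch below uses a hypothetical trailer line (`AFK-Agent: issue-implementer`). Its skip pattern covers the common `[skip ci]` spellings, which may not match every CI system's list exactly.

```python
import re

AFK_TRAILER = "AFK-Agent: issue-implementer"  # hypothetical trailer line
_CI_SKIP = re.compile(
    r"\[(ci skip|skip ci|no ci|skip actions|actions skip)\]", re.IGNORECASE
)


def check_commit_message(message: str) -> None:
    """Reject messages that would break the audit trail or skip CI."""
    if _CI_SKIP.search(message):
        raise ValueError("AFK commits must never skip CI")
    if not re.search(r"^Closes #\d+$", message, re.MULTILINE):
        raise ValueError("missing 'Closes #N'")
    if AFK_TRAILER not in message:
        raise ValueError("missing AFK trailer")


def build_commit_message(subject: str, paraphrase: str, issue: int) -> str:
    """Subject, issue paraphrase, `Closes #N`, then the AFK trailer line."""
    message = f"{subject}\n\n{paraphrase}\n\nCloses #{issue}\n{AFK_TRAILER}\n"
    check_commit_message(message)
    return message
```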

## Parameters (chosen starting values — all tunable)

| Knob | Value |
|---|---|
| Merge gate | always push to master |
| Post-push failure | fix-forward, then freeze-broken |
| Fix-forward cap | 5 attempts **or** 60 minutes |
| Per-issue budget | `max_budget_usd = 100` |
| Concurrency | parallel across repos, serial within a repo |
| Repo scope | opt-in allowlist, start small |
| Progress detail | phase-checklist on issue + Loki logs |
| Alert channel | Slack (+ ntfy), as a doorbell into T3 |
| Executor | dedicated in-cluster T3 (thin dispatch), version-pinned |

## Pilot — validate before wiring the poller

The thin model rests on five unknowns. Stand up the dedicated T3 instance and
drive a couple of allowlist-repo issues **by hand** via the dispatch API to
confirm each, *before* building the poller and committing the architecture:

1. **Per-thread custom agent + skip-permissions** — can a dispatched thread
   carry *our* `issue-implementer` system prompt and run unattended without
   stalling on T3's approval gating? *(biggest unknown)*
2. **Dispatch auth** — mint `orchestration:operate`, store in Vault, refresh the
   1-hour token.
3. **Status/completion** — drive CI-watch/freeze/labels purely from polling
   `GET /api/orchestration/snapshot`.
4. **Worktree reconciliation** — T3's native `prepareWorktree` vs our
   persistent-checkout-with-warm-caches; pick one or make them cooperate on the
   volume.
5. **The in-cluster T3 pod** — headless `t3 serve --no-browser`, version-pinned
   and **Keel-excluded**, internal ingress + Authentik, with tokens / toolchains
   / SSD volume / `claude auth` provisioned.
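
Pilot item 2 needs a token that is re-minted before its one-hour expiry. The sketch below is a small cache that refreshes a few minutes early. `mint` is a hypothetical callable that returns the token and its lifetime in seconds; it could wrap whatever `t3 auth` invocation or Vault read the pilot settles on. The class name and the 300-second margin are also assumptions.

```python
import time
from collections.abc import Callable


class RefreshingToken:
    """Cache a short-lived bearer token and re-mint it before it expires."""

    def __init__(self, mint: Callable[[], tuple[str, float]],
                 margin: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._mint = mint
        self._margin = margin  # refresh this many seconds before expiry
        self._clock = clock
        self._token: str | None = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, lifetime = self._mint()
            self._expires_at = now + lifetime
        return self._token

    def header(self) -> dict[str, str]:
        """Authorization header for a dispatch or snapshot request."""
        return {"Authorization": f"Bearer {self.get()}"}
```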

## Relationship to prior decisions

- **Supersedes** the worktree rejection in
  `2026-06-02-parallel-execution-design.md` (contextualized, not contradicted —
  ADR 0004).
- **Drops** two intermediate ideas explored and rejected this session:
  evolving `claude-agent-service` into its own session/tmux/worktree runtime,
  and building a bespoke breakglass-generalized console — both replaced by T3.
- **Reuses** the `issue-responder` guardrails, the CI/CD chain, the
  `beads-dispatcher` CronJob pattern, presence, Loki, and the design skills.

## Out of scope / open questions

- Raw-terminal "take-over" of a worker (T3 is a GUI cockpit, not a terminal); if
  ever needed, that's a separate add-on.
- Multi-tenant T3 (it is single-operator by design — fine, it matches the shared
  service identity).
- Cross-repo dependency orchestration beyond per-issue "Blocked by".
- T3 Code is pre-1.0 (~v0.0.x) and churny; the version-pin + Keel-exclude +
  swappable-backend discipline (ADR 0003) is the mitigation.

69
docs/adr/0002-afk-autonomous-merge-and-failure-posture.md
Normal file

@ -0,0 +1,69 @@
# AFK agents push straight to master; failures fix-forward then freeze, not revert

The AFK implementation pipeline (see
`docs/2026-06-14-afk-implementation-pipeline-design.md`) lets an autonomous
agent land code with no human at the keyboard. The owner deliberately chose the
most hands-off posture: **AFK-written code pushes straight to `master`** (which
then deploys via the existing CI/CD chain) with **no pull-request review gate**,
and when a deploy breaks, the agent **fixes forward and then freezes the broken
state** rather than auto-reverting. This ADR records that risk posture and why it
was chosen over the safer alternatives, because it is surprising and not cheap to
walk back once callers and habits depend on it.

## Status

accepted (2026-06-14) — posture decided; enforced once the pipeline ships
(pilot-gated).

## Context

`master` on every enrolled repo deploys continuously (GHA build → ghcr →
Woodpecker → Keel). So "where AFK code lands" is really "what reaches a live
deploy without a human looking". The owner weighed three merge gates and three
post-push failure responses and picked the autonomy-maximizing end of both,
accepting the blast radius explicitly.

## Considered options — merge gate

- **Always push to master (chosen).** Tests-green is the gate; CI + rollback are
  the safety net. Matches the existing human allow-then-audit model (non-admins
  already push straight to master). Most hands-off.
- **Adaptive (push if confident, else PR)** — rejected as the *default* though it
  is what `issue-responder` does; the owner wanted full hands-off, not a
  confidence-gated PR for otherwise-working code.
- **Always open a PR** — rejected: reintroduces a human merge step on every
  issue, i.e. "AFK implementation, human merge" — not the goal.

## Considered options — post-push failure (CI/rollout goes red after a green push)

- **Fix-forward then freeze (chosen).** Iterate with corrective commits up to
  **5 attempts or 60 minutes**; if still red, **leave the broken state in place**
  (do not revert), relabel the issue `ready-for-human`, and hard-page. Same
  forensics-first instinct as the breakglass (ADR 0001): preserve the exact
  failing state for debugging rather than auto-cleaning it away.
- **Auto-revert + escalate** — rejected (was the recommendation): restores green
  fastest, but destroys the forensic state the owner wants to inspect.
- **Alert and freeze immediately (no fix-forward)** — rejected: gives up on
  transient/env-drift failures a corrective commit would clear.

Pre-push failure (can't reach green, blocked, or would need a disallowed op) is
not a dilemma: the agent does **not** push, relabels `ready-for-human`, comments
what it tried, and pages.

## Consequences

- An unreviewed logic error can deploy before any human sees it; rollback (not
  review) is the safety net. Bounded by: tests-as-gate, the start-small
  allowlist, the per-repo lock, and the kill switch.
- A frozen-broken deploy can sit unhealthy until the owner answers the page —
  availability is traded for debuggability, by explicit choice. Acceptable
  because enrolled repos are non-critical by the allowlist prerequisite, and the
  owner is paged hard (Slack + ntfy).
- Fix-forward can stack up to 5 commits on a bad change before freezing; the
  60-minute cap bounds the churn window.
- Per-issue spend is capped at `max_budget_usd = 100`.
- Guardrails still hold underneath this posture: no PVC/PV deletes, no direct
  Vault edits, no force-push, infra changes Terraform-only, never `[ci skip]`.
- Reversible: tightening to adaptive/PR or to auto-revert is a config + watcher
  change, not a re-architecture — but callers/habits will have formed around
  "it just lands", so flag loudly if reversing.

70
docs/adr/0003-t3-thin-executor-and-cockpit.md
Normal file
|
|
@ -0,0 +1,70 @@
|
|||
# AFK workers run inside a dedicated T3 Code instance; claude-agent-service dispatches into it
|
||||
|
||||
The owner wants one UI to see and converse with every in-flight AFK worker, and
|
||||
named **T3 Code** (the self-hosted multi-agent cockpit already running at
|
||||
`t3.viktorbarzin.me`) as that UI. Research into T3's source
|
||||
(`pingdotgg/t3code`, ~v0.0.27) found it is genuinely built for this — a fleet of
|
||||
worker "threads" with a live read-model and a scoped HTTP dispatch API — **but**
|
||||
it can only display sessions **it launched itself**; there is no command to adopt
|
||||
a session another process started. So "viewable in T3" ⟺ "launched by T3". This
|
||||
ADR records the resulting architecture: `claude-agent-service` stays the
|
||||
**control plane** and **dispatches into a dedicated, in-cluster T3 instance**
|
||||
which is the **executor + cockpit**. The agent runs inside T3; we keep the brain.
|
||||
|
||||
## Status
|
||||
|
||||
accepted (2026-06-14) — direction decided; **gated on a pilot** (the five
|
||||
unknowns in the design doc) before the poller is wired and the architecture is
|
||||
committed.
|
||||
|
||||
## Why T3, and why "thin"
|
||||
|
||||
T3 provides, out of the box, what we would otherwise hand-build: a three-panel
|
||||
fleet cockpit (`projects → threads → conversation`), an
|
||||
`OrchestrationReadModel` with per-thread live status, and
|
||||
`POST /api/orchestration/dispatch` whose `thread.turn.start` + `bootstrap` can
|
||||
**create a thread, prepare a git worktree, run a setup script, and deliver a
|
||||
prompt in one call** — exactly the worker-spawn primitive. Converse / approve /
|
||||
resume are native (`thread.user-input.respond`, `thread.approval.respond`). For
Claude it embeds `@anthropic-ai/claude-agent-sdk`.

"Thin" = the AFK *behavior and safety* (the `issue-implementer` prompt,
guardrails, always-push, fix-forward/freeze, CI-watch, issue integration) live
in **our** layer (the poller + watcher), not in T3. T3 is a **swappable backend**
we drive over its API.

## Considered options

- **Thin: claude-agent-service dispatches into T3 (chosen).** The control plane
  calls T3's dispatch API; T3 runs the agent in a worktree and shows it. We get
  the fleet view, keep the brain, and build the least. Cost: execution moves into
  the T3 pod, so T3's runtime is in the *hot path* (not just the window).
- **claude-agent-service runs the agent, T3 only displays it**: rejected because
  it is impossible. T3 cannot adopt an externally-started session
  (`thread.session.set` is server-internal; there is no external-session-id
  field). This is the constraint that shaped the whole decision.
- **Deep: claude-agent-service as a custom T3 provider (ACP-style)**: rejected
  for now. It keeps the runtime ours with a T3 UI, but means building and
  maintaining a provider against a pre-1.0, internal, no-contributions
  interface, which is effectively a fork. Revisit only if "thin" proves too
  limiting.
- **Skip T3; build our own console** (generalized breakglass + tmux): rejected.
  It is the most stable option and fully in-house, but it abandons the owner's
  explicit "see workers in T3" goal and means owning a session console forever.

## Consequences

- A **dedicated in-cluster T3 instance** (a pod, consistent with the earlier
  in-cluster-over-devvm substrate choice) is the worker host, separate from the
  per-user devvm T3 instances. It needs the SSD worktree volume, git/Anthropic
  tokens, toolchains, `claude auth`, and an internal Authentik-gated ingress.
- T3's runtime is now in the **execution hot path**: its maturity affects
  whether work *runs*, not only whether it can be *seen*. Mitigations: **pin the
  version and exclude it from Keel** (its churn and hard-cutover auth migrations
  make auto-upgrade a Keel-class hazard), keep the integration thin and the
  backend swappable, and **pilot** the five unknowns first.
- T3 is **single-operator**, which is fine here: it matches the already-accepted
  shared service identity for AFK work.
- T3 has no outbound webhooks, so the watcher **polls**
  `GET /api/orchestration/snapshot`.
- This supersedes the intermediate ideas of evolving `claude-agent-service` into
  its own session/tmux/worktree runtime and building a bespoke attach console.
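
Because T3 pushes nothing, the watcher has to derive change events by comparing successive snapshots. A minimal JavaScript sketch, assuming Node 18+ (global `fetch`) and a hypothetical response shape `{ threads: [{ id, status }] }`; the real schema of `/api/orchestration/snapshot` is not described in this ADR, so treat the field names and the 5-second interval as placeholders:

```javascript
// Sketch only: the snapshot shape { threads: [{ id, status }] } is an
// assumption, not T3's documented response format.

// Compare two snapshots; report threads that are new or changed status.
function diffSnapshots(prev, next) {
  const before = new Map((prev?.threads ?? []).map((t) => [t.id, t.status]));
  const changes = [];
  for (const t of next?.threads ?? []) {
    const old = before.get(t.id);
    if (old !== t.status) {
      changes.push({ id: t.id, from: old ?? null, to: t.status });
    }
  }
  return changes;
}

// Poll the snapshot endpoint every intervalMs and pass each change to
// onChange. Returns a function that stops the loop.
function pollSnapshots(baseUrl, onChange, intervalMs = 5000) {
  let prev = null;
  let stopped = false;
  let timer = null;

  async function tick() {
    try {
      const res = await fetch(new URL('/api/orchestration/snapshot', baseUrl));
      if (res.ok) {
        const next = await res.json();
        for (const change of diffSnapshots(prev, next)) onChange(change);
        prev = next;
      }
    } catch (err) {
      // A failed poll keeps the last good snapshot; the next tick retries.
      console.error('snapshot poll failed:', String(err));
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();
  return () => {
    stopped = true;
    clearTimeout(timer);
  };
}
```

The first poll reports every thread as new (`from: null`), which doubles as the watcher's bootstrap; the returned stop function lets the watcher shut the loop down cleanly.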

@@ -0,0 +1,68 @@
# Implementation agents use persistent per-repo checkouts + git worktrees, reversing the throwaway-clone rule for this path

`2026-06-02-parallel-execution-design.md` deliberately **rejected git worktrees**
and chose throwaway `git clone --local` per job, "because worktrees share one
`.git` → agents that `git commit`/`pull` still contend — not truly independent".
The AFK implementation pipeline
(`docs/2026-06-14-afk-implementation-pipeline-design.md`) **reverses that for its
own path**: each enrolled repo gets a **persistent checkout**, and each issue
runs in a **`git worktree`** off it, on a shared **SSD-NFS** volume. This ADR
records why the earlier rejection does not apply here, so that the two decisions
read as complementary, not contradictory.

## Status

accepted (2026-06-14), for the AFK implementation path only; the existing
job-runner (recruiter-triage, nextcloud-todos, etc.) keeps throwaway clones.

## Why the 2026-06-02 rejection doesn't bind this path

The rejection's premise was **concurrent jobs in the same checkout** contending
on `.git/index.lock` and racing `git pull`. The AFK pipeline's concurrency model
is **serial within a repo, parallel only across repos** (an ADR-adjacent decision
in the design doc): at most one agent ever touches a given repo's `.git` at a
time, and different repos are different checkouts. The contention the rejection
guarded against cannot occur here. With that removed, worktrees become the
*better* choice, because they unlock cache reuse the throwaway model can't.
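
The concurrency rule above reduces to a simple dispatch check: an issue may start only if no agent currently holds its repo. A minimal JavaScript sketch, assuming a hypothetical queue of `{ issue, repo }` items in priority order, a list of repos that already have a running agent, and an optional global cap `maxParallel` (none of these names come from the design doc):

```javascript
// Sketch: decide which queued issues may start now under the rule
// "serial within a repo, parallel only across repos".
// The { issue, repo } item shape and the maxParallel cap are assumptions.
function pickDispatchable(queue, runningRepos, maxParallel = Infinity) {
  const busy = new Set(runningRepos); // repos whose .git an agent already holds
  const picked = [];
  for (const item of queue) {
    if (busy.size >= maxParallel) break; // total agents (running + new) capped
    if (busy.has(item.repo)) continue;   // same repo: wait for the current agent
    busy.add(item.repo);                 // at most one new agent per repo
    picked.push(item);
  }
  return picked;
}
```

Because a repo can appear in `busy` only once, two issues from the same repo can never hold its `.git` at the same time, which is exactly the precondition that makes shared worktrees safe here.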

## Considered options

- **Persistent checkout + worktree per issue, on SSD-NFS (chosen).** Warm git
  objects, **persisted `node_modules`/venv/build caches**, and shared
  package-manager caches survive across jobs, so the TDD loop stops reinstalling
  deps every run. This compounds with `to-issues` clustering many slices in one
  repo, processed serially: slice N reuses slice 1's warm tree.
- **Throwaway `git clone --local` per job (status quo elsewhere)**: rejected for
  this path. It is correct for the concurrent job-runner, but it re-pays the
  dependency install on every issue, which dominates wall-clock time for an
  implement-test-fix-forward loop.
- **`cp -a` of a warm tree**: rejected (same reason as 2026-06-02). It copies
  accumulated caches, which blows up disk usage, and gives no git isolation.
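
Concretely, each issue gets a worktree off the repo's persistent checkout instead of a fresh clone. A minimal JavaScript sketch of the git invocations, assuming a hypothetical layout (`<root>/<repo>/checkout` for the persistent clone, `<root>/<repo>/worktrees/issue-<n>` per issue) and a hypothetical `afk/issue-<n>` branch name; `git -C`, `git fetch` and `git worktree add -b <branch> <path> <commit-ish>` are standard git:

```javascript
// Sketch: git argument lists that prepare one issue's worktree.
// The directory layout and the afk/issue-<n> branch name are assumptions.
function worktreePlan(root, repo, issue, base = 'main') {
  const checkout = `${root}/${repo}/checkout`;
  const worktree = `${root}/${repo}/worktrees/issue-${issue}`;
  const branch = `afk/issue-${issue}`;
  return {
    worktree,
    branch,
    setup: [
      // refresh the persistent checkout; objects are warm, so only new ones transfer
      ['-C', checkout, 'fetch', 'origin'],
      // new branch off the freshly fetched base, checked out in its own directory
      ['-C', checkout, 'worktree', 'add', '-b', branch, worktree, `origin/${base}`],
    ],
  };
}
```

Each argument list can be run with `execFileSync('git', args)` from Node's `node:child_process`. Fetching first matters because the checkout is never re-cloned, so `origin/<base>` is only as current as the last fetch.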

## Considered options — storage

- **SSD-NFS (chosen).** The current `/persistent` PVC is `5Gi` **HDD NFS**
  (`nfs-truenas` → `/srv/nfs`) and unused; git checkouts and `node_modules` are
  death-by-small-files on HDD NFS, and 5Gi is too small. Provision an SSD-backed
  NFS class over `/srv/nfs-ssd` (other apps already use that path) at a realistic
  size (tens of GB).
- **HDD NFS / `/persistent` as-is**: rejected; too slow for many small files, and
  too small.
- **Local block (proxmox-lvm)**: rejected. Faster, but still HDD-backed and
  node-pinned (RWO), so lost on reschedule; NFS RWX survives rescheduling, and
  the volume also holds session state.

## Consequences

- One **SSD-NFS volume** holds, per enrolled repo: the persistent checkout, the
  warm dep/package caches, and (under ADR 0003) the worktrees T3 prepares. Cache
  env (`pip`, `GOMODCACHE`/`GOCACHE`, `PNPM_HOME`/npm, cargo) must be wired to
  it; today caching is off (`pip --no-cache-dir`, no cache envs set).
- Housekeeping the throwaway model didn't need: `git fetch` before each
  `worktree add`, periodic `git worktree prune` + `git gc`, and cache eviction if
  the volume fills.
- **`infra` stays on its own path**: it is git-crypt encrypted, and editing
  encrypted files from a worktree is disallowed; the persistent-worktree model is
  for the non-`infra` app repos in the allowlist.
- Open reconciliation (pilot): whether T3's native `prepareWorktree` writes into
  this volume and our persistent checkouts, or we manage the checkout and point
  T3 at it. Resolve before committing the architecture.
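
The cache wiring and housekeeping above could look roughly like the following JavaScript sketch. `PIP_CACHE_DIR`, `GOMODCACHE`, `GOCACHE`, `npm_config_cache`, `PNPM_HOME` and `CARGO_HOME` are the tools' own environment variables; the `<root>/cache/...` layout, the `cacheEnv`/`housekeepingPlan`/`shouldEvict` helpers and the 0.9 eviction threshold are illustrative assumptions:

```javascript
// Sketch: env vars that point package-manager caches at the shared SSD-NFS
// volume. The variable names are the tools' own; the layout is assumed.
function cacheEnv(root) {
  const cache = `${root}/cache`;
  return {
    PIP_CACHE_DIR: `${cache}/pip`,    // pip (jobs must also drop --no-cache-dir)
    GOMODCACHE: `${cache}/go/mod`,    // Go module downloads
    GOCACHE: `${cache}/go/build`,     // Go build cache
    npm_config_cache: `${cache}/npm`, // npm download cache
    PNPM_HOME: `${cache}/pnpm`,       // pnpm home directory
    CARGO_HOME: `${cache}/cargo`,     // cargo registry and git caches
  };
}

// Periodic per-repo housekeeping: forget worktrees whose directories are
// gone, then let git repack and prune old unreachable objects.
function housekeepingPlan(checkout) {
  return [
    ['-C', checkout, 'worktree', 'prune'],
    ['-C', checkout, 'gc'],
  ];
}

// Evict caches once the volume passes a usage threshold (0.9 is arbitrary).
function shouldEvict(usedBytes, capacityBytes, threshold = 0.9) {
  return capacityBytes > 0 && usedBytes / capacityBytes >= threshold;
}
```

A job would then spawn its toolchain with `{ env: { ...process.env, ...cacheEnv(root) } }`, so every agent in every worktree shares the same warm caches.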

@@ -2,9 +2,28 @@
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<!-- viewport-fit=cover so the app paints edge-to-edge and we can honour the
notch/home-indicator via env(safe-area-inset-*). maximum-scale + no
user-scaling keeps the cockpit layout stable under stress on mobile. -->
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, viewport-fit=cover, maximum-scale=1.0"
/>
<meta name="color-scheme" content="dark" />
<meta name="robots" content="noindex, nofollow" />

<!-- PWA / installable. theme-color tints the mobile status bar to the dark
theme; black-translucent lets the app draw under the iOS status bar. -->
<meta name="theme-color" content="#06080b" />
<link rel="manifest" href="./manifest.webmanifest" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
<meta name="apple-mobile-web-app-title" content="breakglass" />
<link rel="apple-touch-icon" href="./apple-touch-icon.png" />
<link rel="icon" type="image/svg+xml" href="./icon.svg" />
<link rel="icon" type="image/png" sizes="192x192" href="./icon-192.png" />

<title>devvm breakglass</title>
</head>
<body>

BIN
frontend/public/apple-touch-icon.png
Normal file
Size: 30 KiB
BIN
frontend/public/icon-192.png
Normal file
Size: 28 KiB
BIN
frontend/public/icon-512.png
Normal file
Size: 48 KiB
64
frontend/public/icon.svg
Normal file

@@ -0,0 +1,64 @@
<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="devvm breakglass">
<defs>
<!-- layered near-black surface, matching the app theme -->
<radialGradient id="bg" cx="68%" cy="22%" r="92%">
<stop offset="0%" stop-color="#12303a"/>
<stop offset="42%" stop-color="#0b0f14"/>
<stop offset="100%" stop-color="#06080b"/>
</radialGradient>
<linearGradient id="steel" x1="0" y1="0" x2="1" y2="1">
<stop offset="0%" stop-color="#7df0f3"/>
<stop offset="55%" stop-color="#3dd1d6"/>
<stop offset="100%" stop-color="#1f6f72"/>
</linearGradient>
<filter id="glow" x="-40%" y="-40%" width="180%" height="180%">
<feGaussianBlur stdDeviation="7" result="b"/>
<feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge>
</filter>
</defs>

<!-- rounded-square field (safe for maskable: art kept within central ~80%) -->
<rect width="512" height="512" rx="112" fill="url(#bg)"/>
<rect x="6" y="6" width="500" height="500" rx="108" fill="none" stroke="#1c2530" stroke-width="3"/>
<!-- faint scanline texture -->
<g opacity="0.05" stroke="#ffffff" stroke-width="2">
<line x1="0" y1="148" x2="512" y2="148"/>
<line x1="0" y1="220" x2="512" y2="220"/>
<line x1="0" y1="292" x2="512" y2="292"/>
<line x1="0" y1="364" x2="512" y2="364"/>
</g>

<!-- fracture burst (amber): the "break the glass" radiating cracks -->
<g stroke="#f5b657" stroke-width="9" stroke-linecap="round" stroke-linejoin="round"
fill="none" opacity="0.92" filter="url(#glow)">
<path d="M256 256 L142 132"/>
<path d="M256 256 L120 250"/>
<path d="M256 256 L150 372"/>
<path d="M256 256 L372 380"/>
<path d="M256 256 L392 246"/>
<path d="M256 256 L360 138"/>
<!-- cross-cracks -->
<path d="M186 196 L150 250"/>
<path d="M210 320 L172 318" opacity="0.7"/>
<path d="M326 318 L356 350" opacity="0.7"/>
</g>

<!-- wrench, struck across the burst (cyan steel) -->
<g filter="url(#glow)">
<path fill="url(#steel)" stroke="#0e3133" stroke-width="6" stroke-linejoin="round"
d="M344 150
a62 62 0 0 0 -82 76
L150 338
a26 26 0 0 0 0 37
l11 11
a26 26 0 0 0 37 0
l112 -112
a62 62 0 0 0 76 -82
l-41 41
l-40 -11
l-11 -40
z"/>
<!-- handle highlight -->
<path d="M171 350 l128 -128" stroke="#bdf6f8" stroke-width="7" stroke-linecap="round" opacity="0.6"/>
</g>
</svg>

Size: 2.5 KiB
31
frontend/public/manifest.webmanifest
Normal file

@@ -0,0 +1,31 @@
{
"name": "devvm breakglass",
"short_name": "breakglass",
"description": "Emergency recovery console for the devvm — chat with a repair agent or power-cycle the VM directly.",
"start_url": "./",
"scope": "./",
"display": "standalone",
"orientation": "portrait",
"background_color": "#06080b",
"theme_color": "#06080b",
"icons": [
{
"src": "./icon.svg",
"type": "image/svg+xml",
"sizes": "any",
"purpose": "any maskable"
},
{
"src": "./icon-192.png",
"type": "image/png",
"sizes": "192x192",
"purpose": "any maskable"
},
{
"src": "./icon-512.png",
"type": "image/png",
"sizes": "512x512",
"purpose": "any maskable"
}
]
}

@@ -1,100 +1,294 @@
<script>
import { onMount } from 'svelte';
import { openSession } from './lib/api.js';
import { onMount, onDestroy } from 'svelte';
import {
openSession,
attachStream,
sendPrompt,
cancelTurn,
loadSessionId,
saveSessionId,
clearSessionId,
} from './lib/api.js';
import { createTranscript, reduceEvent } from './lib/transcript.js';
import Chat from './Chat.svelte';
import VmControls from './VmControls.svelte';

// ── session lifecycle ────────────────────────────────────────────────────
// ── lifecycle state ───────────────────────────────────────────────────────
// link: connecting | attached | error (the EventSource to the session)
let link = $state('connecting');
let linkError = $state('');
let sessionId = $state('');
let sessionState = $state('connecting'); // connecting | ready | error
let sessionError = $state('');
let streaming = $state(false);
let caughtUp = $state(false); // replay drained → live tailing
let turnActive = $state(false); // a turn is running (Stop shown, Send off)
let sending = $state(false); // a prompt POST is in flight

// Mobile: the VM controls live in a slide-up sheet. Desktop: a side column
// (CSS hides the toggle and pins the sheet open as a column ≥900px).
// The transcript is folded with a plain mutable object; we bump `rev` to
// notify the view of in-place mutations (cheaper than cloning the whole
// message list on every streamed token). `tx` is $state too, so REASSIGNING
// it (reset / new session) also propagates to the Chat prop. $state.raw keeps
// the object un-proxied so the hot per-token path stays a plain mutation.
let tx = $state.raw(createTranscript());
let rev = $state(0);

let es = null; // the live EventSource

// Mobile: VM controls live in a slide-up sheet. Desktop (≥900px): a column.
let showControls = $state(false);

async function newSession() {
sessionState = 'connecting';
sessionError = '';
try {
sessionId = await openSession();
sessionState = 'ready';
} catch (err) {
sessionState = 'error';
sessionError = err instanceof Error ? err.message : String(err);
function resetTranscript() {
tx = createTranscript();
rev++;
}

function onEvent(ev) {
if (reduceEvent(tx, ev)) {
// turn liveness tracks the folder's view of the stream, so a turn started
// in ANOTHER tab (or before a reload) still flips us into "active".
turnActive = tx.activeUserSeen;
rev++;
}
}

onMount(newSession);

function onLiveSession(id) {
if (id) sessionId = id;
function closeStream() {
if (es) {
es.close();
es = null;
}
}

const shortId = $derived(sessionId ? sessionId.slice(0, 8) : '────────');
const dotState = $derived(
sessionState === 'error' ? 'error' : streaming ? 'busy' : sessionState === 'ready' ? 'ready' : 'idle'
function attach(id) {
closeStream();
sessionId = id;
caughtUp = false;
link = 'connecting';
linkError = '';
es = attachStream(id, {
onOpen: () => {
// a successful (re)connection clears any prior transient error
if (link !== 'attached') link = 'attached';
linkError = '';
},
onCaughtUp: () => {
caughtUp = true;
link = 'attached';
},
onEvent,
onError: () => {
// EventSource auto-reconnects on a transient drop (readyState
// CONNECTING). Only a terminal CLOSED state is a hard failure. The
// server keeps the turn running regardless, so we surface a soft note
// and let the browser retry.
if (es && es.readyState === EventSource.CLOSED) {
link = 'error';
linkError = 'lost the connection to the session — retrying…';
// a closed source won't retry itself; re-attach to the same id.
setTimeout(() => {
if (sessionId === id) attach(id);
}, 1500);
} else {
link = 'connecting';
}
},
});
}

async function bootstrap() {
link = 'connecting';
linkError = '';
resetTranscript();
const existing = loadSessionId();
if (existing) {
// Reuse the persisted id and attach. If it's gone (pod restart → 404 on
// the stream), the EventSource errors; we detect the 404-shaped close and
// mint a fresh session below.
attach(existing);
// Probe liveness: if the attach can't open within a grace window AND the
// id is stale, create a new one. We rely on onError(CLOSED) for the 404.
return;
}
await createFresh();
}

async function createFresh() {
try {
link = 'connecting';
const id = await openSession();
saveSessionId(id);
attach(id);
} catch (err) {
link = 'error';
linkError = err instanceof Error ? err.message : String(err);
}
}

// "New session": archive the local id, mint a new one, re-attach.
async function newSession() {
if (turnActive || sending) return;
closeStream();
clearSessionId();
resetTranscript();
turnActive = false;
await createFresh();
}

// Send a prompt (typed or a preset). Output arrives via the attach stream.
async function submitPrompt(prompt) {
const text = (prompt || '').trim();
if (!text || turnActive || sending) return;
if (!sessionId) {
await createFresh();
if (!sessionId) return;
}
sending = true;
turnActive = true; // optimistic: the working indicator shows immediately
try {
const res = await sendPrompt({ session_id: sessionId, prompt: text });
if (res.status === 'busy') {
flash = 'A turn is already running.';
// turn really is active; keep the indicator, the stream will end it.
} else if (res.status === 'gone') {
// session evaporated (pod restart). Re-create and resend once.
clearSessionId();
await createFresh();
if (sessionId) await sendPrompt({ session_id: sessionId, prompt: text });
}
} catch (err) {
flash = err instanceof Error ? err.message : String(err);
turnActive = tx.activeUserSeen; // back off the optimistic flag on failure
} finally {
sending = false;
}
}

async function stopTurn() {
if (!sessionId) return;
try {
await cancelTurn(sessionId);
// turn_end / cancelled events arrive via the stream and flip turnActive.
} catch (err) {
flash = err instanceof Error ? err.message : String(err);
}
}

// a transient toast (409 / network blips), auto-cleared
let flash = $state('');
let flashTimer;
$effect(() => {
if (flash) {
clearTimeout(flashTimer);
flashTimer = setTimeout(() => (flash = ''), 4200);
}
});

onMount(bootstrap);
onDestroy(closeStream);

// ── header status lamp ──────────────────────────────────────────────────
// One quietly-living "system pulse": idle/connecting (cyan breathe),
// working (amber pulse), error (steady red — the ONLY non-power red, used
// sparingly for the lamp because connection loss IS the emergency here).
const lamp = $derived(
link === 'error'
? 'error'
: turnActive
? 'working'
: link === 'attached'
? 'live'
: 'connecting'
);
const lampLabel = $derived(
{
error: 'link down',
working: 'agent working',
live: 'attached',
connecting: 'connecting',
}[lamp]
);
const shortId = $derived(sessionId ? sessionId.slice(0, 8) : '········');
</script>

<div class="shell">
<header class="rail">
<header class="rail rise-in" style="--d:0ms">
<div class="rail-title">
<span class="glyph" aria-hidden="true">🔧</span>
<h1>devvm <span class="accent">breakglass</span></h1>
<span class="brand-mark" aria-hidden="true">
<!-- breakglass glyph: a wrench struck through a fracture line -->
<svg viewBox="0 0 24 24" width="22" height="22" fill="none" stroke="currentColor"
stroke-width="1.6" stroke-linecap="round" stroke-linejoin="round">
<path d="M15.5 5.5a3.6 3.6 0 0 0-4.7 4.4L4 16.7 7.3 20l6.8-6.8a3.6 3.6 0 0 0 4.4-4.7l-2.2 2.2-2.2-.6-.6-2.2 2-2.6Z" />
<path class="frac" d="M3 3l3.2 4.1L4.4 8.6 7 12" stroke-dasharray="2 2.4" />
</svg>
</span>
<h1>devvm<span class="accent"> breakglass</span></h1>
</div>

<div class="rail-right">
<span class="rail-status">
<span class="dot dot--{dotState}" aria-hidden="true"></span>
{#if sessionState === 'error'}
<span class="session-bad">offline</span>
{:else if sessionState === 'connecting'}
<span class="session-meta">connecting…</span>
{:else}
<code class="session-id" title={sessionId}>{shortId}</code>
{/if}
<span class="lamp-wrap" title={lampLabel}>
<span class="lamp lamp--{lamp}" aria-hidden="true"></span>
<span class="lamp-text lamp-text--{lamp}">
{#if lamp === 'error'}
link down
{:else if lamp === 'working'}
working
{:else if lamp === 'live'}
<code class="sid">{shortId}</code>
{:else}
connecting
{/if}
</span>
</span>

<!-- Mobile-only: open the VM control sheet. Hidden on desktop (column). -->
<button
class="controls-toggle"
class="rail-btn rail-btn--vm"
onclick={() => (showControls = true)}
aria-label="Open direct VM controls"
>
⚡ <span class="controls-toggle-label">VM</span>
<span class="bolt" aria-hidden="true">⚡</span><span class="rail-btn-label">VM</span>
</button>

<button
class="new-session"
class="rail-btn"
onclick={newSession}
disabled={streaming || sessionState === 'connecting'}
title={streaming ? 'wait for the current turn to finish' : 'start a fresh session'}
disabled={turnActive || sending || link === 'connecting'}
title={turnActive ? 'wait for the current turn to finish' : 'archive this session and start fresh'}
>
New
</button>
</div>
</header>

{#if sessionState === 'error'}
<div class="rail-error" role="alert">
Can't reach the breakglass backend — {sessionError}. The cluster or network
may be down. The <strong>⚡ VM</strong> power controls still work without the chat.
{#if link === 'error'}
<div class="rail-note" role="alert">
<span>{linkError || "Can't reach the breakglass backend."}</span>
<span class="rail-note-aside">The <strong>⚡ VM</strong> power controls still work without the chat.</span>
<button class="rail-note-retry" onclick={bootstrap}>Reconnect</button>
</div>
{/if}

{#if flash}
<div class="toast" role="status">{flash}</div>
{/if}

<main class="stage">
<section class="chat-pane" aria-label="Recovery chat">
<section class="chat-pane rise-in" style="--d:80ms" aria-label="Recovery chat">
<Chat
{sessionId}
sessionReady={sessionState === 'ready'}
{onLiveSession}
onStreamingChange={(v) => (streaming = v)}
{tx}
{rev}
{caughtUp}
{turnActive}
sending={sending}
linkState={link}
onSubmit={submitPrompt}
onStop={stopTurn}
/>
</section>

<aside class="controls-pane" class:open={showControls} aria-label="Direct VM control">
<aside
class="controls-pane rise-in"
class:open={showControls}
style="--d:160ms"
aria-label="Direct VM control"
>
<div class="sheet-grip" aria-hidden="true"></div>
<div class="controls-head">
<span class="controls-head-title">Direct VM control</span>

@@ -104,7 +298,6 @@
</aside>
</main>

<!-- backdrop behind the mobile sheet -->
<button
class="sheet-backdrop"
class:show={showControls}

@@ -119,43 +312,51 @@
height: 100%;
display: flex;
flex-direction: column;
max-width: 1500px;
max-width: 1520px;
margin: 0 auto;
/* honour the notch on landscape / edge-to-edge */
padding-left: var(--safe-left);
padding-right: var(--safe-right);
}

/* ── status rail (compact, single row on mobile) ─────────────────────── */
/* ── status rail ───────────────────────────────────────────────────────── */
.rail {
display: flex;
align-items: center;
justify-content: space-between;
gap: 10px;
padding: 10px 14px;
padding: max(10px, var(--safe-top)) 14px 10px;
border-bottom: 1px solid var(--line);
background:
linear-gradient(180deg, rgba(61, 209, 214, 0.03), transparent 60%),
linear-gradient(180deg, rgba(255, 255, 255, 0.015), transparent);
flex: none;
}
.rail-title {
display: flex;
align-items: baseline;
gap: 9px;
align-items: center;
gap: 10px;
min-width: 0;
}
.glyph {
font-size: 17px;
transform: translateY(2px);
filter: saturate(0.85);
.brand-mark {
color: var(--cyan);
display: inline-flex;
filter: drop-shadow(0 0 10px rgba(61, 209, 214, 0.35));
flex: none;
}
.brand-mark .frac { color: var(--amber); stroke: var(--amber); opacity: 0.85; }
h1 {
margin: 0;
font-family: var(--mono);
font-size: 16px;
font-weight: 600;
letter-spacing: 0.02em;
letter-spacing: 0.04em;
color: var(--ink);
white-space: nowrap;
}
.accent {
color: var(--cyan);
text-shadow: 0 0 18px rgba(61, 209, 214, 0.35);
text-shadow: 0 0 18px rgba(61, 209, 214, 0.4);
}

.rail-right {

@@ -164,90 +365,158 @@
gap: 8px;
flex: none;
}
.rail-status {

/* the living system-pulse lamp */
.lamp-wrap {
display: inline-flex;
align-items: center;
gap: 7px;
gap: 8px;
padding: 0 4px;
font-family: var(--mono);
font-size: 12px;
}
.session-id {
color: var(--cyan);
letter-spacing: 0.04em;
}
.session-meta {
color: var(--amber);
}
.session-bad {
color: var(--danger-bright);
}

.dot {
width: 9px;
height: 9px;
.lamp {
position: relative;
width: 10px;
height: 10px;
border-radius: 50%;
flex: none;
background: var(--ink-faint);
}
.dot--ready {
/* a soft halo ring that pulses outward — the "instrument is powered" tell */
.lamp::after {
content: '';
position: absolute;
inset: -4px;
border-radius: 50%;
border: 1px solid currentColor;
opacity: 0;
}
.lamp--live {
background: var(--cyan);
box-shadow: 0 0 10px 1px rgba(61, 209, 214, 0.6);
animation: breathe 3.4s ease-in-out infinite;
color: var(--cyan);
box-shadow: 0 0 10px 1px rgba(61, 209, 214, 0.65);
animation: lamp-breathe 3.6s ease-in-out infinite;
}
.dot--busy {
.lamp--live::after { animation: lamp-ring 3.6s ease-out infinite; }
.lamp--connecting {
background: var(--cyan-dim);
color: var(--cyan);
animation: lamp-blink 1.4s ease-in-out infinite;
}
.lamp--working {
background: var(--amber);
color: var(--amber);
box-shadow: 0 0 10px 1px rgba(245, 182, 87, 0.7);
animation: pulse 1s ease-in-out infinite;
animation: lamp-pulse 1s ease-in-out infinite;
}
.dot--error {
.lamp--working::after { animation: lamp-ring 1s ease-out infinite; }
.lamp--error {
background: var(--danger);
color: var(--danger);
box-shadow: 0 0 10px 1px var(--danger-glow);
animation: lamp-pulse 1.2s ease-in-out infinite;
}
@keyframes breathe { 0%, 100% { opacity: 0.55; } 50% { opacity: 1; } }
@keyframes pulse {
0%, 100% { transform: scale(0.82); opacity: 0.7; }
50% { transform: scale(1.15); opacity: 1; }
@keyframes lamp-breathe { 0%, 100% { opacity: 0.6; } 50% { opacity: 1; } }
@keyframes lamp-blink { 0%, 100% { opacity: 0.35; } 50% { opacity: 0.9; } }
@keyframes lamp-pulse {
0%, 100% { transform: scale(0.82); opacity: 0.75; }
50% { transform: scale(1.12); opacity: 1; }
}
@keyframes lamp-ring {
0% { opacity: 0.5; transform: scale(0.6); }
70% { opacity: 0; transform: scale(1.8); }
100% { opacity: 0; transform: scale(1.8); }
}
.lamp-text {
letter-spacing: 0.04em;
color: var(--ink-dim);
max-width: 88px;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.lamp-text--live .sid { color: var(--cyan); letter-spacing: 0.06em; }
.lamp-text--working { color: var(--amber); }
.lamp-text--error { color: var(--danger-bright); }
.lamp-text--connecting { color: var(--ink-faint); }
.sid { font-family: var(--mono); }
/* On the tightest phones the title + lamp text + two buttons crowd; keep the
living dot (the system pulse) and drop the text label until there's room. */
@media (max-width: 439px) {
.lamp-text { display: none; }
.lamp-wrap { padding: 0; }
}

/* touch-friendly buttons */
.controls-toggle,
.new-session {
min-height: 40px;
padding: 0 13px;
/* rail buttons — touch-first (≥44px tall via padding + line height) */
.rail-btn {
min-height: 44px;
padding: 0 14px;
border-radius: var(--radius-sm);
border: 1px solid var(--line-strong);
background: var(--bg-2);
color: var(--ink-dim);
font-size: 13px;
letter-spacing: 0.02em;
letter-spacing: 0.03em;
display: inline-flex;
align-items: center;
gap: 5px;
gap: 6px;
transition: border-color 0.15s, background 0.15s, color 0.15s;
}
.controls-toggle {
border-color: #5a4a2a;
.rail-btn:hover:not(:disabled) { border-color: var(--line-bright); color: var(--ink); }
.rail-btn:active:not(:disabled) { background: var(--bg-3); }
.rail-btn:disabled { opacity: 0.42; }
.rail-btn--vm {
border-color: var(--amber-dim);
color: var(--amber);
}
.controls-toggle:active,
.new-session:active {
background: var(--bg-3);
}
.new-session:disabled {
opacity: 0.45;
}
.rail-btn--vm:hover:not(:disabled) { border-color: var(--amber); color: var(--amber); }
.bolt { font-size: 13px; line-height: 1; }

.rail-error {
.rail-note {
margin: 10px 12px 0;
padding: 11px 14px;
padding: 10px 13px;
border: 1px solid var(--danger-deep);
border-left-width: 3px;
background: rgba(255, 77, 77, 0.07);
color: #ffd5d5;
color: #ffd9d9;
border-radius: var(--radius-sm);
font-size: 13px;
line-height: 1.5;
display: flex;
flex-wrap: wrap;
align-items: center;
gap: 6px 12px;
flex: none;
}
.rail-note-aside { color: #f0b8b8; }
.rail-note-aside strong { color: #fff; font-family: var(--mono); }
.rail-note-retry {
margin-left: auto;
border: 1px solid var(--danger-deep);
background: transparent;
color: var(--danger-bright);
border-radius: 6px;
padding: 6px 12px;
font-size: 12px;
min-height: 36px;
}
.rail-note-retry:hover { background: rgba(255, 77, 77, 0.12); }

.toast {
margin: 10px 12px 0;
padding: 9px 13px;
border: 1px solid var(--line-strong);
border-left: 3px solid var(--amber);
background: var(--bg-2);
color: var(--amber);
border-radius: var(--radius-sm);
font-family: var(--mono);
font-size: 12.5px;
line-height: 1.45;
flex: none;
animation: rise-in 0.28s ease-out both;
}

/* ── stage ───────────────────────────────────────────────────────────── */
.stage {

@@ -271,31 +540,37 @@
right: 0;
|
||||
bottom: 0;
|
||||
z-index: 40;
|
||||
max-height: 86dvh;
|
||||
overflow-y: auto;
|
||||
max-height: 88dvh;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
background: var(--bg-1);
|
||||
border-top: 1px solid var(--line-strong);
|
||||
border-radius: 16px 16px 0 0;
|
||||
box-shadow: 0 -18px 40px rgba(0, 0, 0, 0.55);
|
||||
padding: 8px 14px calc(14px + env(safe-area-inset-bottom));
|
||||
transform: translateY(101%);
|
||||
transition: transform 0.26s cubic-bezier(0.32, 0.72, 0, 1);
|
||||
border-radius: var(--radius-lg) var(--radius-lg) 0 0;
|
||||
box-shadow: var(--shadow-sheet);
|
||||
padding: 8px 14px calc(14px + var(--safe-bottom));
|
||||
transform: translateY(102%);
|
||||
transition: transform 0.3s cubic-bezier(0.32, 0.72, 0, 1);
|
||||
/* the rise-in entrance is for the desktop column; the sheet is transform-
|
||||
controlled, so cancel the shared keyframe here. */
|
||||
animation: none !important;
|
||||
}
|
||||
.controls-pane.open {
|
||||
transform: translateY(0);
|
||||
}
|
||||
.sheet-grip {
|
||||
width: 38px;
|
||||
width: 40px;
|
||||
height: 4px;
|
||||
border-radius: 99px;
|
||||
background: var(--line-strong);
|
||||
background: var(--line-bright);
|
||||
margin: 4px auto 10px;
|
||||
flex: none;
|
||||
}
|
||||
.controls-head {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: space-between;
|
||||
margin-bottom: 10px;
|
||||
flex: none;
|
||||
}
|
||||
.controls-head-title {
|
||||
font-family: var(--mono);
|
||||
|
|
@@ -305,14 +580,15 @@
color: var(--amber);
}
.sheet-close {
width: 34px;
height: 34px;
width: 40px;
height: 40px;
border-radius: var(--radius-sm);
border: 1px solid var(--line-strong);
background: var(--bg-2);
color: var(--ink-dim);
font-size: 14px;
}
.sheet-close:active { background: var(--bg-3); }

.sheet-backdrop {
position: fixed;
@@ -320,40 +596,40 @@
z-index: 30;
border: 0;
padding: 0;
background: rgba(0, 0, 0, 0.55);
background: rgba(2, 4, 7, 0.62);
backdrop-filter: blur(1.5px);
opacity: 0;
pointer-events: none;
transition: opacity 0.22s;
transition: opacity 0.24s;
}
.sheet-backdrop.show {
opacity: 1;
pointer-events: auto;
}

/* ── desktop: controls become a static side column, sheet chrome gone ── */
/* ── desktop: controls become a static side column ─────────────────────── */
@media (min-width: 900px) {
.rail {
padding: 14px 18px;
}
.rail { padding: 14px 18px; }
h1 { font-size: 19px; }
.stage {
display: grid;
grid-template-columns: minmax(0, 1fr) 372px;
grid-template-columns: minmax(0, 1fr) 384px;
gap: 16px;
padding: 16px 18px 18px;
}
.chat-pane { display: flex; }
.controls-toggle { display: none; }
.rail-btn--vm { display: none; }
.controls-pane {
position: static;
max-height: none;
overflow: visible;
transform: none;
box-shadow: none;
border: none;
border-radius: 0;
padding: 0;
z-index: auto;
animation: rise-in 0.5s cubic-bezier(0.22, 0.61, 0.36, 1) both !important;
animation-delay: var(--d, 0ms) !important;
}
.sheet-grip,
.controls-head,

@@ -1,128 +1,105 @@
<script>
import { tick } from 'svelte';
import { streamChat } from './lib/api.js';
import ToolChip from './ToolChip.svelte';

let {
sessionId = '',
sessionReady = false,
onLiveSession = (/** @type {string} */ _id) => {},
onStreamingChange = (/** @type {boolean} */ _v) => {},
tx, // the folded transcript state (plain object, see lib/transcript.js)
rev = 0, // bumped on every in-place mutation to retrigger reactivity
caughtUp = false, // replay drained → staggered reveal may run
turnActive = false, // a turn is running: show Stop, hide Send
sending = false, // a prompt POST is in flight (brief)
linkState = 'connecting', // connecting | attached | error
onSubmit = (/** @type {string} */ _p) => {},
onStop = () => {},
} = $props();

/**
* Message model. A user message is plain text. An assistant message is an
* ordered list of parts so streamed prose and tool chips interleave in the
* exact order the agent emitted them:
* { role:'assistant', parts:[{type:'text',text}|{type:'tool',name,command}],
* result?: {is_error, text, duration_ms}, error?: string }
* @type {Array<any>}
*/
let messages = $state([]);
// The five quick-action presets — the mobile win: one tap, no typing.
const PRESETS = [
{
label: 'Triage',
icon: '◑',
prompt:
'Triage the devvm: uptime, load, memory, swap, disk usage, failed systemd units, and the last 30 lines of dmesg. Summarize what\'s wrong.',
},
{
label: 'Memory / OOM',
icon: '▦',
prompt:
'Check devvm memory pressure: free -h, top memory consumers, any recent OOM-kills in dmesg/journal, and swap usage. Is it OOMing?',
},
{
label: 'Disk',
icon: '▤',
prompt:
'What\'s filling the devvm disk? df -h, then the biggest directories/files under the fullest mount. Anything safe to clear?',
},
{
label: 'Services',
icon: '⚙',
prompt:
'List failed or stuck systemd units on the devvm (systemctl --failed) and show the status + recent journal lines for any that are down.',
},
{
label: 'QEMU wedged?',
icon: '◫',
prompt:
'Is the devvm\'s QEMU wedged (I/O stall)? Check guest responsiveness over SSH, then ssh pve forensics for VM 102\'s qm status/QMP/guest-agent. Tell me if a cycle is needed.',
},
];

let draft = $state('');
let streaming = $state(false);
let scroller; // the scroll viewport
let scroller;
let inputEl;
let pinnedToBottom = true; // auto-scroll only while the user is at the bottom
let pinnedToBottom = true;

const canSend = $derived(sessionReady && !streaming && draft.trim().length > 0);
// re-derive the message list whenever the folder mutates (rev bump). The
// transcript is folded with in-place mutation on a $state.raw object, so no
// reference changes on its own — we depend on `rev` explicitly and rebuild
// fresh objects (message + its parts array) so Svelte's keyed {#each} re-
// renders streamed prose/chips on every token. Transcripts are small; the
// per-token copy is cheap and keeps the hot streaming path bug-free.
const messages = $derived(
rev >= 0 && tx
? tx.messages.map((m) =>
m.role === 'assistant' ? { ...m, parts: m.parts.slice() } : { ...m }
)
: []
);
const isEmpty = $derived(messages.length === 0);
const canSend = $derived(linkState !== 'error' && !turnActive && draft.trim().length > 0);
const inputReady = $derived(!turnActive);

// ── scrolling ─────────────────────────────────────────────────────────────
// ── auto-scroll (only while pinned to the bottom) ─────────────────────────
function onScroll() {
if (!scroller) return;
const gap = scroller.scrollHeight - scroller.scrollTop - scroller.clientHeight;
pinnedToBottom = gap < 60;
pinnedToBottom = gap < 64;
}
async function scrollToBottom(force = false) {
if (!force && !pinnedToBottom) return;
await tick();
if (scroller) scroller.scrollTop = scroller.scrollHeight;
}

// ── streaming a turn ────────────────────────────────────────────────────────
function lastAssistant() {
return messages[messages.length - 1];
}

function appendText(text) {
const msg = lastAssistant();
const parts = msg.parts;
const tail = parts[parts.length - 1];
if (tail && tail.type === 'text') {
tail.text += text;
} else {
parts.push({ type: 'text', text });
}
messages = messages; // notify Svelte of the in-place mutation
}

function handleEvent(ev) {
switch (ev?.kind) {
case 'session':
onLiveSession(ev.session_id);
break;
case 'text':
if (ev.text) appendText(ev.text);
break;
case 'tool': {
// Bash carries a `command`; other tools just show their name.
const command =
ev.input && typeof ev.input.command === 'string' ? ev.input.command : '';
lastAssistant().parts.push({ type: 'tool', name: ev.name || 'tool', command });
messages = messages;
break;
}
case 'result':
lastAssistant().result = {
is_error: Boolean(ev.is_error),
text: typeof ev.result === 'string' ? ev.result : '',
duration_ms: typeof ev.duration_ms === 'number' ? ev.duration_ms : null,
};
messages = messages;
break;
case 'error':
lastAssistant().error = ev.error || 'unknown error';
messages = messages;
break;
case 'done':
// handled by the stream completing; nothing to render
break;
default:
break;
}
// any transcript change → keep the view pinned if the user is at the bottom
$effect(() => {
rev; // track
scrollToBottom();
});

function fire(prompt) {
if (turnActive) return;
pinnedToBottom = true;
onSubmit(prompt);
scrollToBottom(true);
}

async function send() {
const prompt = draft.trim();
if (!prompt || streaming || !sessionReady) return;

messages.push({ role: 'user', text: prompt });
messages.push({ role: 'assistant', parts: [] });
messages = messages;
function send() {
const text = draft.trim();
if (!text || turnActive) return;
draft = '';
streaming = true;
onStreamingChange(true);
pinnedToBottom = true;
await scrollToBottom(true);

try {
await streamChat({ session_id: sessionId, prompt }, handleEvent);
} catch (err) {
// Network/transport failure (backend down, connection dropped mid-stream).
const msg = lastAssistant();
if (msg && msg.role === 'assistant' && !msg.error) {
msg.error =
(err instanceof Error ? err.message : String(err)) +
' — the connection to the agent failed.';
messages = messages;
}
} finally {
streaming = false;
onStreamingChange(false);
await scrollToBottom();
inputEl?.focus();
}
fire(text);
// restore single-row height after clearing
tick().then(() => inputEl?.focus());
}

function onKeydown(e) {
@@ -130,7 +107,7 @@
e.preventDefault();
send();
}
// Shift+Enter falls through to insert a newline.
// Shift+Enter → newline (default behaviour)
}

function fmtDuration(ms) {
@@ -139,7 +116,12 @@
return `${(ms / 1000).toFixed(ms < 10000 ? 1 : 0)} s`;
}

const isEmpty = $derived(messages.length === 0);
// a freshly-attached transcript reveals with a brief stagger; cap the delay
// so a long replay doesn't animate forever.
function revealDelay(i) {
if (!caughtUp) return 0;
return Math.min(i, 6) * 45;
}
</script>

<div class="chat">
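The auto-scroll and reveal logic in the script above reduces to two pure functions. The stream counts as "pinned" while the distance from the bottom stays under a threshold (64 px in the new code, 60 px in the old). Each revealed row is delayed 45 ms per index, capped at six steps, and only once the replay has caught up. The sketch below is in plain JavaScript outside Svelte. `isPinned` and `staggerDelay` are hypothetical names, and the DOM measurements are passed in as numbers instead of being read from the scroll element.

```javascript
// Hypothetical pure helpers mirroring onScroll() and revealDelay() above.
// They take plain numbers so the logic can be checked without a DOM or Svelte.

const PIN_THRESHOLD_PX = 64; // the new code uses 64 (the old code used 60)
const STAGGER_STEP_MS = 45;
const STAGGER_MAX_STEPS = 6;

/**
 * True while the viewport is within `threshold` px of the bottom,
 * i.e. while new content should auto-scroll into view.
 */
function isPinned(scrollHeight, scrollTop, clientHeight, threshold = PIN_THRESHOLD_PX) {
  const gap = scrollHeight - scrollTop - clientHeight;
  return gap < threshold;
}

/**
 * Reveal delay for the i-th row: 0 until the replay has caught up,
 * then 45 ms per row, capped so a long transcript does not animate forever.
 */
function staggerDelay(i, caughtUp) {
  if (!caughtUp) return 0;
  return Math.min(i, STAGGER_MAX_STEPS) * STAGGER_STEP_MS;
}

// Usage: a 2000 px transcript in a 600 px viewport.
console.log(isPinned(2000, 1400, 600)); // true: at the very bottom
console.log(isPinned(2000, 1000, 600)); // false: scrolled up by 400 px
console.log([0, 1, 6, 20].map((i) => staggerDelay(i, true))); // delays 0, 45, 270, 270
```

The threshold gives a little slack, so a user who is a few pixels short of the bottom still counts as pinned. Scrolling up deliberately stops the auto-scroll.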
@@ -150,41 +132,58 @@

<div class="stream" bind:this={scroller} onscroll={onScroll}>
{#if isEmpty}
<div class="empty">
<div class="empty-mark">⌁</div>
<p class="empty-title">The agent is standing by.</p>
<div class="empty" class:dim={linkState === 'connecting'}>
<div class="empty-mark" aria-hidden="true">⌁</div>
<p class="empty-title">
{#if linkState === 'error'}
The agent is unreachable.
{:else if linkState === 'connecting'}
Attaching to the session…
{:else}
The agent is standing by.
{/if}
</p>
<p class="empty-sub">
Describe the symptom — "devvm is unreachable", "disk full", "ssh hangs"
— and it will connect over SSH, investigate, and stream its work here.
For a hard power action when the agent can't help, use
<strong>Direct VM control</strong>.
{#if linkState === 'error'}
The cluster or network may be down. You can still power-cycle the VM
with <strong>⚡ Direct VM control</strong> — it needs no agent.
{:else}
Tap a preset below or describe the symptom — "devvm unreachable",
"disk full", "ssh hangs" — and it will connect over SSH, investigate,
and stream its work here. For a hard power action, use
<strong>⚡ Direct VM control</strong>.
{/if}
</p>
</div>
{/if}

{#each messages as msg, i (i)}
{#each messages as msg (msg.key)}
{#if msg.role === 'user'}
<div class="row row--user">
<div class="row row--user rise-in" style="--d:{revealDelay(0)}ms">
<div class="bubble bubble--user">{msg.text}</div>
</div>
{:else}
<div class="row row--assistant">
<div class="row row--assistant rise-in" style="--d:{revealDelay(0)}ms">
<div class="bubble bubble--assistant">
{#if msg.parts.length === 0 && !msg.result && !msg.error}
{#if msg.parts.length === 0 && !msg.result && !msg.error && !msg.cancelled}
<span class="thinking" aria-label="working">
<span></span><span></span><span></span>
</span>
{/if}
{#each msg.parts as part, j (j)}
{#if part.type === 'text'}
<span class="prose">{part.text}</span>
{:else}
<ToolChip name={part.name} command={part.command} />
{/if}
{#if part.type === 'text'}<span class="prose">{part.text}</span>{:else}<ToolChip name={part.name} command={part.command} />{/if}
{/each}

{#if msg.error}
<div class="turn-note turn-note--error">⚠ {msg.error}</div>
<div class="turn-note turn-note--error">
<span class="turn-note-tag">error</span>
<span class="turn-note-body">{msg.error}</span>
</div>
{:else if msg.cancelled}
<div class="turn-note turn-note--muted">
<span class="turn-note-tag">stopped</span>
<span class="turn-note-body">turn cancelled</span>
</div>
{:else if msg.result}
<div class="turn-note {msg.result.is_error ? 'turn-note--error' : 'turn-note--ok'}">
<span class="turn-note-tag">{msg.result.is_error ? 'failed' : 'done'}</span>
@@ -200,36 +199,61 @@
{/each}
</div>

<form
class="composer"
onsubmit={(e) => {
e.preventDefault();
send();
}}
>
{#if streaming}
<div class="working-bar" aria-live="polite">
<span class="working-dots"><span></span><span></span><span></span></span>
agent working — streaming live
</div>
{/if}
<div class="composer-row">
<textarea
bind:this={inputEl}
bind:value={draft}
onkeydown={onKeydown}
placeholder={sessionReady
? 'Describe the problem… (Enter to send · Shift+Enter for a new line)'
: 'Waiting for a session…'}
rows="1"
disabled={!sessionReady || streaming}
spellcheck="false"
></textarea>
<button type="submit" class="send" disabled={!canSend}>
{streaming ? '…' : 'Send'}
</button>
<div class="dock">
<!-- quick-action preset bar: horizontally scrollable, one-tap prompts -->
<div class="presets" role="group" aria-label="Quick actions">
{#each PRESETS as p (p.label)}
<button
class="preset"
onclick={() => fire(p.prompt)}
disabled={turnActive || linkState === 'error'}
title={p.prompt}
>
<span class="preset-icon" aria-hidden="true">{p.icon}</span>
<span class="preset-label">{p.label}</span>
</button>
{/each}
</div>
</form>

<form
class="composer"
onsubmit={(e) => {
e.preventDefault();
send();
}}
>
{#if turnActive}
<div class="working-bar" aria-live="polite">
<span class="working-dots"><span></span><span></span><span></span></span>
<span>agent working — streaming live</span>
</div>
{/if}
<div class="composer-row">
<textarea
bind:this={inputEl}
bind:value={draft}
onkeydown={onKeydown}
placeholder={inputReady
? 'Describe the problem… (Enter to send · Shift+Enter for a new line)'
: 'A turn is running — Stop it to type, or wait…'}
rows="1"
disabled={!inputReady}
spellcheck="false"
enterkeyhint="send"
></textarea>
{#if turnActive}
<button type="button" class="stop" onclick={onStop} title="Stop the running turn">
<span class="stop-glyph" aria-hidden="true"></span>
Stop
</button>
{:else}
<button type="submit" class="send" disabled={!canSend}>
{sending ? '···' : 'Send'}
</button>
{/if}
</div>
</form>
</div>
</div>

<style>
@@ -249,9 +273,10 @@
display: flex;
align-items: baseline;
gap: 12px;
padding: 13px 18px;
padding: 12px 18px;
border-bottom: 1px solid var(--line);
background: linear-gradient(180deg, rgba(255, 255, 255, 0.015), transparent);
background: linear-gradient(180deg, rgba(255, 255, 255, 0.018), transparent);
flex: none;
}
.chat-head-label {
font-family: var(--mono);
@@ -263,13 +288,16 @@
.chat-head-hint {
font-size: 12px;
color: var(--ink-faint);
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}

.stream {
flex: 1;
min-height: 0;
overflow-y: auto;
padding: 20px 18px 8px;
padding: 20px 16px 10px;
display: flex;
flex-direction: column;
gap: 14px;
@@ -279,23 +307,27 @@
/* empty state */
.empty {
margin: auto;
max-width: 460px;
max-width: 470px;
text-align: center;
padding: 28px 12px;
padding: 24px 14px;
color: var(--ink-dim);
}
.empty.dim { opacity: 0.8; }
.empty-mark {
font-size: 40px;
font-size: 42px;
color: var(--cyan-dim);
line-height: 1;
margin-bottom: 14px;
text-shadow: 0 0 24px rgba(61, 209, 214, 0.25);
text-shadow: 0 0 26px rgba(61, 209, 214, 0.3);
animation: lamp-breathe 3.6s ease-in-out infinite;
}
@keyframes lamp-breathe { 0%, 100% { opacity: 0.7; } 50% { opacity: 1; } }
.empty-title {
font-family: var(--mono);
color: var(--ink);
font-size: 15px;
margin: 0 0 8px;
letter-spacing: 0.01em;
}
.empty-sub {
font-size: 13px;
@@ -303,32 +335,23 @@
color: var(--ink-faint);
margin: 0;
}
.empty-sub strong {
color: var(--ink-dim);
font-weight: 600;
}
.empty-sub strong { color: var(--ink-dim); font-weight: 600; }

.row {
display: flex;
}
.row--user {
justify-content: flex-end;
}
.row--assistant {
justify-content: flex-start;
}
.row { display: flex; }
.row--user { justify-content: flex-end; }
.row--assistant { justify-content: flex-start; }

.bubble {
max-width: 86%;
max-width: 88%;
border-radius: 13px;
padding: 11px 14px;
font-size: 14px;
line-height: 1.6;
line-height: 1.62;
word-wrap: break-word;
overflow-wrap: anywhere;
}
.bubble--user {
background: linear-gradient(180deg, #15333a, #0f262c);
background: linear-gradient(180deg, #123036, #0d2329);
border: 1px solid var(--cyan-dim);
color: #d8f6f7;
border-bottom-right-radius: 4px;
@@ -341,12 +364,9 @@
border-bottom-left-radius: 4px;
color: var(--ink);
}
/* prose renders inline so text and tool chips share the same flow */
.prose {
white-space: pre-wrap;
}
.prose { white-space: pre-wrap; }

/* in-flight assistant "thinking" dots */
/* in-flight "thinking" dots */
.thinking,
.working-dots {
display: inline-flex;
@@ -363,19 +383,15 @@
animation: blink 1.2s infinite ease-in-out;
}
.thinking span:nth-child(2),
.working-dots span:nth-child(2) {
animation-delay: 0.18s;
}
.working-dots span:nth-child(2) { animation-delay: 0.18s; }
.thinking span:nth-child(3),
.working-dots span:nth-child(3) {
animation-delay: 0.36s;
}
.working-dots span:nth-child(3) { animation-delay: 0.36s; }
@keyframes blink {
0%, 80%, 100% { opacity: 0.25; transform: translateY(0); }
40% { opacity: 1; transform: translateY(-2px); }
}

/* turn result / error footer inside the assistant bubble */
/* turn result / error / stopped footer inside the assistant bubble */
.turn-note {
margin-top: 10px;
padding: 7px 10px;
@@ -396,9 +412,16 @@
color: #bff5d3;
}
.turn-note--error {
background: rgba(255, 77, 77, 0.08);
border: 1px solid var(--danger-deep);
color: #ffd5d5;
/* the error tint here is amber-leaning text on a faint warm wash, NOT the
reserved power-action red — a turn error is not a destructive action. */
background: rgba(245, 182, 87, 0.06);
border: 1px solid var(--amber-dim);
color: #f7d49a;
}
.turn-note--muted {
background: rgba(255, 255, 255, 0.02);
border: 1px solid var(--line-strong);
color: var(--ink-faint);
}
.turn-note-tag {
text-transform: uppercase;
@@ -409,20 +432,55 @@
border: 1px solid currentColor;
opacity: 0.85;
}
.turn-note-body {
flex: 1;
min-width: 0;
}
.turn-note-time {
margin-left: auto;
color: var(--ink-faint);
.turn-note-body { flex: 1; min-width: 0; }
.turn-note-time { margin-left: auto; color: var(--ink-faint); }

/* ── dock: presets + composer, pinned to the bottom ────────────────────── */
.dock {
flex: none;
border-top: 1px solid var(--line);
background: linear-gradient(0deg, rgba(255, 255, 255, 0.015), transparent);
}

/* ── composer ─────────────────────────────────────────────────────────── */
.presets {
display: flex;
gap: 8px;
overflow-x: auto;
padding: 11px 12px 4px;
scrollbar-width: none;
-webkit-overflow-scrolling: touch;
/* fade the right edge to hint there's more to scroll */
mask-image: linear-gradient(90deg, transparent 0, #000 14px, #000 calc(100% - 18px), transparent 100%);
}
.presets::-webkit-scrollbar { display: none; }
.preset {
flex: none;
min-height: 38px;
display: inline-flex;
align-items: center;
gap: 7px;
padding: 0 13px;
border-radius: 999px;
border: 1px solid var(--line-strong);
background: var(--bg-2);
color: var(--ink-dim);
font-family: var(--mono);
font-size: 12.5px;
letter-spacing: 0.02em;
white-space: nowrap;
transition: border-color 0.15s, color 0.15s, background 0.15s, transform 0.06s;
}
.preset:hover:not(:disabled) {
border-color: var(--cyan-dim);
color: var(--ink);
background: var(--bg-3);
}
.preset:active:not(:disabled) { transform: translateY(1px); }
.preset:disabled { opacity: 0.4; }
.preset-icon { color: var(--cyan); font-size: 12px; }

.composer {
border-top: 1px solid var(--line);
padding: 12px;
background: linear-gradient(0deg, rgba(255, 255, 255, 0.012), transparent);
padding: 8px 12px calc(12px + var(--safe-bottom));
}
.working-bar {
display: flex;
@@ -431,7 +489,7 @@
font-family: var(--mono);
font-size: 12px;
color: var(--amber);
padding: 0 4px 9px;
padding: 2px 4px 9px;
letter-spacing: 0.02em;
}
.composer-row {
@@ -442,13 +500,13 @@
textarea {
flex: 1;
resize: none;
max-height: 168px;
max-height: 160px;
min-height: 48px;
background: var(--bg-2);
color: var(--ink);
border: 1px solid var(--line-strong);
border-radius: var(--radius-sm);
padding: 12px 13px;
padding: 13px 13px;
font-family: var(--sans);
/* 16px: anything smaller makes iOS Safari auto-zoom on focus (mobile is the
primary client) — the zoom then shifts the composer out of view. */
@@ -458,39 +516,60 @@
transition: border-color 0.15s, box-shadow 0.15s;
field-sizing: content; /* progressive: auto-grows where supported */
}
textarea::placeholder {
color: var(--ink-faint);
}
textarea::placeholder { color: var(--ink-faint); }
textarea:focus {
border-color: var(--cyan-dim);
box-shadow: 0 0 0 3px rgba(61, 209, 214, 0.12);
}
textarea:disabled {
opacity: 0.55;
}
textarea:disabled { opacity: 0.55; }

.send {
.send,
.stop {
flex: none;
align-self: stretch;
min-width: 78px;
min-width: 82px;
min-height: 48px;
padding: 0 18px;
border-radius: var(--radius-sm);
border: 1px solid var(--cyan-dim);
background: linear-gradient(180deg, #19474b, #103539);
color: #d8f6f7;
font-size: 13px;
font-weight: 600;
letter-spacing: 0.04em;
transition: filter 0.15s, border-color 0.15s, opacity 0.15s;
letter-spacing: 0.05em;
transition: filter 0.15s, border-color 0.15s, opacity 0.15s, background 0.15s;
}
.send:hover:not(:disabled) {
filter: brightness(1.22);
border-color: var(--cyan);
.send {
border: 1px solid var(--cyan-dim);
background: linear-gradient(180deg, #16464a, #0e3438);
color: #d8f6f7;
}
.send:hover:not(:disabled) { filter: brightness(1.24); border-color: var(--cyan); }
.send:disabled {
opacity: 0.4;
background: var(--bg-2);
border-color: var(--line-strong);
color: var(--ink-faint);
}
/* Stop is NOT red — red is reserved for destructive VM power. Stop is a calm
neutral control with a square "halt" glyph. */
.stop {
display: inline-flex;
align-items: center;
justify-content: center;
gap: 8px;
border: 1px solid var(--line-bright);
background: var(--bg-3);
color: var(--ink);
}
.stop:hover { border-color: var(--ink-faint); filter: brightness(1.1); }
.stop-glyph {
width: 10px;
height: 10px;
border-radius: 2px;
background: var(--amber);
box-shadow: 0 0 8px rgba(245, 182, 87, 0.55);
animation: lamp-pulse 1s ease-in-out infinite;
}
@keyframes lamp-pulse {
0%, 100% { transform: scale(0.85); opacity: 0.8; }
50% { transform: scale(1.08); opacity: 1; }
}
</style>

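The message model in the script above describes an assistant message as an ordered list of `text` and `tool` parts, plus an optional `result` or `error`. That model can be folded from the event stream with a small pure reducer. The sketch below follows the removed `handleEvent`/`appendText` logic: consecutive text deltas merge into the tail part, and each tool event starts a new chip. The name `foldEvent` and the choice to return a fresh message per event, instead of mutating in place, are assumptions. The new code delegates folding to `lib/transcript.js`, and its contents are not part of this diff.

```javascript
// Hypothetical pure reducer: fold one streamed event into an assistant message.
// Mirrors the removed handleEvent/appendText logic, but returns a new message
// instead of mutating in place (so a keyed {#each} sees a fresh object).

/**
 * @param {{role: string, parts: Array<object>, result?: object, error?: string}} msg
 * @param {{kind: string}} ev
 */
function foldEvent(msg, ev) {
  const parts = msg.parts.slice();
  switch (ev?.kind) {
    case 'text': {
      if (!ev.text) return msg;
      const tail = parts[parts.length - 1];
      if (tail && tail.type === 'text') {
        // merge consecutive deltas into one prose run
        parts[parts.length - 1] = { ...tail, text: tail.text + ev.text };
      } else {
        parts.push({ type: 'text', text: ev.text });
      }
      return { ...msg, parts };
    }
    case 'tool': {
      // Bash carries a `command`; other tools only show their name
      const command =
        ev.input && typeof ev.input.command === 'string' ? ev.input.command : '';
      parts.push({ type: 'tool', name: ev.name || 'tool', command });
      return { ...msg, parts };
    }
    case 'result':
      return {
        ...msg,
        parts,
        result: {
          is_error: Boolean(ev.is_error),
          text: typeof ev.result === 'string' ? ev.result : '',
          duration_ms: typeof ev.duration_ms === 'number' ? ev.duration_ms : null,
        },
      };
    case 'error':
      return { ...msg, parts, error: ev.error || 'unknown error' };
    default:
      // 'session', 'done' and unknown kinds do not change the rendered message
      return msg;
  }
}

// Usage: prose, a tool chip, more prose, then the turn result.
const events = [
  { kind: 'text', text: 'Checking ' },
  { kind: 'text', text: 'disk usage.' },
  { kind: 'tool', name: 'Bash', input: { command: 'df -h' } },
  { kind: 'text', text: 'Root is 97% full.' },
  { kind: 'result', is_error: false, result: 'ok', duration_ms: 4200 },
];
const folded = events.reduce(foldEvent, { role: 'assistant', parts: [] });
console.log(folded.parts.map((p) => p.type).join(',')); // text,tool,text
```

Merging adjacent text deltas keeps one `<span class="prose">` per run of prose rather than one per token. A tool event in between is what splits the run, which preserves the emitted order.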
@@ -293,7 +293,8 @@
align-items: center;
justify-content: center;
gap: 8px;
padding: 9px 15px;
min-height: 44px; /* touch target */
padding: 10px 16px;
border-radius: var(--radius-sm);
font-size: 13px;
font-weight: 600;
@@ -408,7 +409,8 @@
}
.confirm-yes {
flex: 1;
padding: 9px;
min-height: 44px;
padding: 10px;
border-radius: var(--radius-sm);
border: 1px solid var(--danger-bright);
background: var(--danger);
@@ -424,7 +426,8 @@
}
.confirm-no {
flex: 1;
padding: 9px;
min-height: 44px;
padding: 10px;
border-radius: var(--radius-sm);
border: 1px solid var(--line-strong);
background: var(--bg-2);

@ -1,48 +1,70 @@
|
|||
/* ───────────────────────────────────────────────────────────────────────────
|
||||
devvm breakglass — global theme
|
||||
A recovery console: dark, high-contrast, terminal-adjacent. Calm by default;
|
||||
danger is the only loud thing on the screen. No external fonts/CDNs — system
|
||||
monospace carries the identity, system sans carries readable prose.
|
||||
Emergency recovery console / instrument panel. Dark, high-contrast, monospace
|
||||
identity, calm by default. Danger (red) is reserved EXCLUSIVELY for the
|
||||
destructive VM power actions — nothing else on the screen is ever red. No
|
||||
external fonts/CDNs (air-gapped cluster): a refined system-monospace stack
|
||||
carries the identity, system-sans carries readable prose. Distinctiveness is
|
||||
earned through composition, the living "system pulse" lamp, motion, hairlines,
|
||||
and the reserved danger treatment — not through a downloaded typeface.
|
||||
─────────────────────────────────────────────────────────────────────────── */

:root {
  /* Surfaces — a near-black slate with cool undertone, layered for depth. */
  --bg-0: #07090c; /* page base */
  --bg-1: #0c1015; /* panel */
  --bg-2: #11171e; /* raised panel / input */
  --bg-3: #161d26; /* chips, hover */
  --bg-term: #06080a; /* command-output panels */
  /* Surfaces — a near-black slate with a cool undertone, layered for depth. */
  --bg-0: #06080b; /* page base (darkened from #07090c for crisper AA) */
  --bg-1: #0b0f14; /* panel */
  --bg-2: #10161d; /* raised panel / input */
  --bg-3: #161e27; /* chips, hover */
  --bg-term: #05070a; /* command-output panels */

  /* Hairlines & text */
  --line: #1d2630;
  --line: #1c2530;
  --line-strong: #2a3744;
  --ink: #e6edf3; /* primary text */
  --ink-dim: #9bb0c0; /* secondary text */
  --ink-faint: #5d7185; /* labels, meta */
  --line-bright: #3a4a5a;
  --ink: #e9eff5; /* primary text */
  --ink-dim: #9bb0c0; /* secondary text — 8.0:1 on bg-2 */
  /* labels/meta — was #5d7185 (3.6:1, fails AA). Lifted to 6.1:1 on bg-2. */
  --ink-faint: #8499ab;

  /* Accents */
  --cyan: #3dd1d6; /* "system alive" — links, focus, session dot */
  /* Accents — the "alive" cyan is the spine of the calm palette. */
  --cyan: #3dd1d6; /* "system alive" — links, focus, session pulse */
  --cyan-bright: #62e3e7;
  --cyan-dim: #1f6f72;
  --cyan-deep: #0e3133;
  --amber: #f5b657; /* working / in-flight */
  --amber-dim: #6a5226;
  --green: #5ddb8e; /* healthy exit */
  --green-dim: #1f5f3d;

  /* Danger — reserved EXCLUSIVELY for mutating actions. Nothing else is red. */
  /* Danger — reserved EXCLUSIVELY for mutating power actions. Nothing else red. */
  --danger: #ff4d4d;
  --danger-bright: #ff6363;
  --danger-deep: #7a1717;
  --danger-glow: rgba(255, 77, 77, 0.35);

  --radius: 10px;
  --radius-sm: 7px;
  --radius: 11px;
  --radius-sm: 8px;
  --radius-lg: 16px;

  --mono: ui-monospace, "JetBrains Mono", "SF Mono", "Cascadia Code",
    "Fira Code", Menlo, Consolas, "Liberation Mono", monospace;
  /* A refined, deliberately-ordered monospace stack. We lead with faces that
     have real character (Berkeley Mono / JetBrains / Cascadia / SF Mono) and
     fall back gracefully — but ship nothing; whatever the device has carries
     the cockpit-readout identity. */
  --mono: "Berkeley Mono", ui-monospace, "JetBrains Mono", "SF Mono",
    "Cascadia Code", "Fira Code", "Source Code Pro", Menlo, Consolas,
    "Liberation Mono", monospace;
  --sans: ui-sans-serif, system-ui, -apple-system, "Segoe UI", Roboto,
    "Helvetica Neue", Arial, sans-serif;

  --shadow-panel: 0 1px 0 rgba(255, 255, 255, 0.02) inset,
    0 16px 40px -24px rgba(0, 0, 0, 0.9);
  --shadow-panel: 0 1px 0 rgba(255, 255, 255, 0.025) inset,
    0 18px 44px -26px rgba(0, 0, 0, 0.95);
  --shadow-sheet: 0 -22px 48px -12px rgba(0, 0, 0, 0.7);

  /* Safe-area shorthands (notch / home-indicator). 0px fallback off-device. */
  --safe-top: env(safe-area-inset-top, 0px);
  --safe-bottom: env(safe-area-inset-bottom, 0px);
  --safe-left: env(safe-area-inset-left, 0px);
  --safe-right: env(safe-area-inset-right, 0px);

  color-scheme: dark;
}
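The :root comments quote WCAG contrast figures for the text colours (3.6:1 failing AA, 6.1:1 and 8.0:1 passing, all "on bg-2"). Below is a minimal sketch for re-checking such numbers, assuming the WCAG 2.x relative-luminance and contrast-ratio definitions. The function names are illustrative, and nothing here ships with the UI. By this formula the quoted ratios match the earlier --bg-2 value, #11171e. Against the new #10161d, each ratio comes out marginally higher.

```javascript
// Sketch: WCAG 2.x contrast ratio for "#rrggbb" colours. Not part of this
// repo; handy for re-checking ratios quoted in the :root comments.

// Convert an 8-bit sRGB channel to linear light.
function channelToLinear(c8) {
  const c = c8 / 255;
  return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

// Relative luminance (0 = black, 1 = white) of a "#rrggbb" colour.
function relativeLuminance(hex) {
  const m = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  const [r, g, b] = m.slice(1).map((h) => channelToLinear(parseInt(h, 16)));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio from 1:1 to 21:1; argument order does not matter.
function contrastRatio(fgHex, bgHex) {
  const a = relativeLuminance(fgHex);
  const b = relativeLuminance(bgHex);
  const [lighter, darker] = a > b ? [a, b] : [b, a];
  return (lighter + 0.05) / (darker + 0.05);
}

const AA_NORMAL_TEXT = 4.5; // WCAG AA minimum for normal-size text

// Old --ink-faint, new --ink-faint and --ink-dim against the previous --bg-2.
for (const ink of ['#5d7185', '#8499ab', '#9bb0c0']) {
  const ratio = contrastRatio(ink, '#11171e');
  console.log(`${ink}: ${ratio.toFixed(1)}:1`, ratio >= AA_NORMAL_TEXT ? 'passes AA' : 'fails AA');
}
```

The loop prints one line per ink colour. Only the old --ink-faint falls below the 4.5:1 AA threshold for normal-size text.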
@@ -55,23 +77,24 @@ html,
body {
  margin: 0;
  height: 100%;
  /* The page itself never scrolls — the chat stream scrolls internally. This
     keeps the composer pinned and stops iOS rubber-banding the whole UI. */
  /* The page itself never scrolls — only the chat stream scrolls internally.
     This keeps the composer pinned and stops iOS rubber-banding the whole UI. */
  overflow: hidden;
  overscroll-behavior: none;
}

body {
  background-color: var(--bg-0);
  /* Atmosphere: a soft cyan corner-glow over a faint scanline weave, so the
     surface reads like backlit equipment rather than flat #000. */
  /* Atmosphere: a soft cyan corner-glow + a faint warm counter-glow over a
     hairline scanline weave, so the surface reads as backlit equipment rather
     than flat black. Fixed so it doesn't drift when the chat scrolls. */
  background-image:
    radial-gradient(120% 80% at 85% -10%, rgba(61, 209, 214, 0.07), transparent 55%),
    radial-gradient(90% 70% at 10% 110%, rgba(245, 182, 87, 0.04), transparent 50%),
    radial-gradient(120% 78% at 86% -12%, rgba(61, 209, 214, 0.08), transparent 55%),
    radial-gradient(90% 70% at 8% 112%, rgba(245, 182, 87, 0.045), transparent 52%),
    repeating-linear-gradient(
      0deg,
      rgba(255, 255, 255, 0.012) 0px,
      rgba(255, 255, 255, 0.012) 1px,
      rgba(255, 255, 255, 0.013) 0px,
      rgba(255, 255, 255, 0.013) 1px,
      transparent 1px,
      transparent 3px
    );
@@ -84,8 +107,8 @@ body {

#app {
  /* 100dvh (dynamic viewport height) — NOT 100vh/100% — so the composer at the
     bottom is never hidden behind a mobile browser's address/tool bar. Mobile is
     the primary client for this tool. 100vh is the fallback for old engines. */
     bottom is never hidden behind a mobile browser's address/tool bar. 100vh is
     the fallback for engines without dvh. Mobile is the primary client. */
  height: 100vh;
  height: 100dvh;
}
@@ -94,7 +117,6 @@ button {
  font-family: var(--mono);
  cursor: pointer;
}

button:disabled {
  cursor: not-allowed;
}
@@ -119,10 +141,26 @@ button:disabled {
  background-clip: content-box;
}
*::-webkit-scrollbar-thumb:hover {
  background: #3a4a5a;
  background: var(--line-bright);
  background-clip: content-box;
}

/* ── Shared motion primitives ──────────────────────────────────────────────
   One well-orchestrated entrance beats scattered micro-interactions: panels
   and rows rise a few px with a soft fade, staggered via --d on each element. */
@keyframes rise-in {
  from { opacity: 0; transform: translateY(8px); }
  to { opacity: 1; transform: translateY(0); }
}
@keyframes fade-in {
  from { opacity: 0; }
  to { opacity: 1; }
}
.rise-in {
  animation: rise-in 0.5s cubic-bezier(0.22, 0.61, 0.36, 1) both;
  animation-delay: var(--d, 0ms);
}

@media (prefers-reduced-motion: reduce) {
  *,
  *::before,
@@ -1,8 +1,41 @@
// Same-origin API client. Auth is handled entirely by the edge proxy
// (Authentik / basic-auth / bearer) — this UI never sends or stores a token.
import { readEventStream } from './sse.js';
// Same-origin API client for the breakglass UI.
//
// Auth is handled entirely by the edge proxy (Authentik / basic-auth / bearer):
// this UI never sends or stores a token, and builds no login screen.
//
// The chat uses the tmux/attach model. The conversation lives SERVER-SIDE; we
// only persist the session_id locally and ATTACH to it over an EventSource. The
// browser's native EventSource auto-reconnects and sends Last-Event-ID, and the
// server resumes from there — so there is ZERO reconnect logic here. We just
// render events idempotently by id (see transcript.js).

/** Open a fresh chat session. @returns {Promise<string>} session_id */
const SESSION_KEY = 'breakglass.session_id';

/** Read the persisted session id, or '' if none. */
export function loadSessionId() {
  try {
    return localStorage.getItem(SESSION_KEY) || '';
  } catch {
    return '';
  }
}

/** Persist the session id (best-effort; private-mode storage may throw). */
export function saveSessionId(id) {
  try {
    if (id) localStorage.setItem(SESSION_KEY, id);
    else localStorage.removeItem(SESSION_KEY);
  } catch {
    /* ignore — storage is a convenience, not a requirement */
  }
}

/** Forget the persisted session id (the "New session" archive step). */
export function clearSessionId() {
  saveSessionId('');
}
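loadSessionId and saveSessionId swallow storage exceptions. A browser with storage disabled therefore behaves as if no session had been saved. Below is a sketch of the same best-effort pattern with the Storage-like object passed in, so the behaviour can be exercised outside a browser. makeSessionStore, memoryStorage and throwingStorage are hypothetical names, not part of api.js.

```javascript
// Sketch (hypothetical refactor, not the repo's code): best-effort session-id
// persistence with the storage object injected instead of the global.
const SESSION_KEY = 'breakglass.session_id';

function makeSessionStore(storage) {
  return {
    load() {
      try {
        return storage.getItem(SESSION_KEY) || '';
      } catch {
        return ''; // storage unavailable: behave as "no saved session"
      }
    },
    save(id) {
      try {
        if (id) storage.setItem(SESSION_KEY, id);
        else storage.removeItem(SESSION_KEY);
      } catch {
        // best-effort only: losing the id just means opening a new session
      }
    },
    clear() {
      this.save('');
    },
  };
}

// A tiny in-memory stand-in for localStorage.
function memoryStorage() {
  const m = new Map();
  return {
    getItem: (k) => (m.has(k) ? m.get(k) : null),
    setItem: (k, v) => void m.set(k, String(v)),
    removeItem: (k) => void m.delete(k),
  };
}

// A storage that throws on every call, as some restricted browsing modes do.
const throwingStorage = {
  getItem() { throw new Error('storage unavailable'); },
  setItem() { throw new Error('storage unavailable'); },
  removeItem() { throw new Error('storage unavailable'); },
};
```

Injecting the storage is only a testing convenience. The module itself uses the global localStorage directly.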

/** Open a fresh server-side session. @returns {Promise<string>} session_id */
export async function openSession() {
  const res = await fetch('/api/session', {
    method: 'POST',
@@ -19,30 +52,89 @@ export async function openSession() {
}

/**
 * Run one chat turn. Streams events to onEvent until the backend sends
 * {kind:"done"} and the connection closes. Pass an AbortSignal to cancel.
 * Attach to a session's event stream. Returns the live EventSource so the
 * caller can close() it. Events arrive as:
 * - default `message` events: .data is JSON {kind, id, ...}
 * - a named `caught-up` event once the replay is drained (.data is {})
 * - native `error` events while reconnecting (EventSource retries itself)
 *
 * @param {{session_id: string, prompt: string, model?: string, signal?: AbortSignal}} opts
 * @param {(event: object) => void} onEvent
 * @param {string} sessionId
 * @param {{
 *   onEvent: (e: object) => void,
 *   onCaughtUp?: () => void,
 *   onOpen?: () => void,
 *   onError?: (e: Event) => void,
 * }} handlers
 * @returns {EventSource}
 */
export async function streamChat({ session_id, prompt, model, signal }, onEvent) {
  const payload = { session_id, prompt };
  if (model) payload.model = model;
export function attachStream(sessionId, { onEvent, onCaughtUp, onOpen, onError }) {
  const es = new EventSource(`/api/session/${encodeURIComponent(sessionId)}/stream`);

  const res = await fetch('/api/chat', {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      accept: 'text/event-stream',
    },
    body: JSON.stringify(payload),
    signal,
  });
  await readEventStream(res, onEvent);
  es.onopen = () => onOpen?.();

  es.onmessage = (e) => {
    if (!e || typeof e.data !== 'string' || e.data === '') return;
    let obj;
    try {
      obj = JSON.parse(e.data);
    } catch {
      // A malformed frame must not abort an in-progress recovery stream.
      return;
    }
    // EventSource exposes the SSE `id:` line as e.lastEventId. The server also
    // embeds id in the JSON; prefer the JSON id, fall back to lastEventId.
    if ((obj.id == null || obj.id === '') && e.lastEventId) obj.id = e.lastEventId;
    onEvent(obj);
  };

  es.addEventListener('caught-up', () => onCaughtUp?.());

  es.onerror = (e) => {
    // EventSource auto-reconnects on a transient drop (readyState CONNECTING);
    // we only surface a hard, terminal failure (readyState CLOSED).
    onError?.(e);
  };

  return es;
}
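The onmessage handler in attachStream is the whole client-side attach protocol. It parses the JSON, skips malformed frames, and backfills `id` from the SSE `id:` line. Below is a sketch of that logic as pure functions, so it can be tested without a browser. normalizeFrame and isTerminal are hypothetical names. There is one deliberate difference: a payload that parses to a non-object (for example `null`) is skipped here, whereas the handler in api.js would throw when it reads `obj.id`.

```javascript
// Sketch: attachStream's per-frame logic as pure functions (hypothetical
// names; api.js does this inline in es.onmessage / es.onerror).

/**
 * @param {string} data        MessageEvent.data
 * @param {string} lastEventId MessageEvent.lastEventId ('' when no id was sent)
 * @returns {object|null}      the event to fold, or null to skip this frame
 */
function normalizeFrame(data, lastEventId) {
  if (typeof data !== 'string' || data === '') return null;
  let obj;
  try {
    obj = JSON.parse(data);
  } catch {
    return null; // malformed frame: skip it, keep the stream alive
  }
  if (obj === null || typeof obj !== 'object') return null; // not an event object
  // Prefer the id embedded in the JSON; fall back to the SSE `id:` line.
  if ((obj.id == null || obj.id === '') && lastEventId) obj.id = lastEventId;
  return obj;
}

// EventSource.readyState values defined by the HTML standard.
const ReadyState = Object.freeze({ CONNECTING: 0, OPEN: 1, CLOSED: 2 });

/** True only once the browser has stopped retrying the connection. */
function isTerminal(readyState) {
  return readyState === ReadyState.CLOSED;
}
```

The es.onerror comment says only terminal failures are surfaced. As written, however, the handler forwards every `error` event, including the ones fired while EventSource is reconnecting. If the documented behaviour is wanted, a guard such as `if (isTerminal(es.readyState)) onError?.(e);` would provide it. Otherwise the filtering has to happen in the caller.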

/**
 * List the PVE power verbs and which of them mutate VM state.
 * Start a turn. Output arrives via the attach stream, NOT this response.
 * @param {{session_id: string, prompt: string, model?: string}} opts
 * @returns {Promise<{status:'started'|'busy'|'gone'}>}
 *   started — accepted; busy — 409 (a turn already runs); gone — 404 (re-create).
 */
export async function sendPrompt({ session_id, prompt, model }) {
  const payload = { prompt };
  if (model) payload.model = model;
  const res = await fetch(`/api/session/${encodeURIComponent(session_id)}/prompt`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (res.status === 409) return { status: 'busy' };
  if (res.status === 404) return { status: 'gone' };
  if (!res.ok) throw new Error(`could not start the turn (HTTP ${res.status})`);
  return { status: 'started' };
}
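sendPrompt turns the two expected refusals into return values instead of exceptions: 409 means a turn is already running, and 404 means the session is gone. Below is a sketch of that mapping as a pure function, plus one possible reaction per outcome. promptOutcome and nextStep are hypothetical helpers. The reactions are assumptions about UI behaviour, not something api.js prescribes.

```javascript
// Sketch (hypothetical helpers, not in api.js): sendPrompt's HTTP-status
// handling as a pure function, so each branch can be checked without fetch().
function promptOutcome(httpStatus) {
  if (httpStatus === 409) return { status: 'busy' }; // a turn is already running
  if (httpStatus === 404) return { status: 'gone' }; // unknown session: re-create it
  // fetch()'s res.ok means a status in the range 200..299.
  if (httpStatus < 200 || httpStatus > 299) {
    throw new Error(`could not start the turn (HTTP ${httpStatus})`);
  }
  return { status: 'started' }; // output arrives on the attach stream
}

// One way a caller might react (the reactions are illustrative assumptions).
function nextStep(httpStatus) {
  let outcome;
  try {
    outcome = promptOutcome(httpStatus);
  } catch (err) {
    return `show error: ${err.message}`;
  }
  switch (outcome.status) {
    case 'busy':
      return 'offer Stop (cancelTurn) or wait for turn_end';
    case 'gone':
      return 'clear the stored id, open a new session, resend';
    default:
      return 'render output from the attach stream';
  }
}
```

Keeping busy and gone as values lets the caller branch with a switch and reserve try/catch for genuine failures such as a 5xx.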

/**
 * Cancel the in-flight turn (the Stop button).
 * @param {string} sessionId
 * @returns {Promise<boolean>} whether a turn was cancelled
 */
export async function cancelTurn(sessionId) {
  const res = await fetch(`/api/session/${encodeURIComponent(sessionId)}/cancel`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
  });
  if (!res.ok) throw new Error(`could not stop the turn (HTTP ${res.status})`);
  const body = await res.json().catch(() => ({}));
  return Boolean(body.cancelled);
}

/**
 * List the PVE power verbs and which mutate VM state.
 * @returns {Promise<{verbs: string[], mutating: string[]}>}
 */
export async function fetchVerbs() {
@@ -58,27 +150,26 @@ export async function fetchVerbs() {
}

/**
 * Run a PVE power verb directly (no AI in the path). The backend returns 200
 * on success and 502 when the verb's exit code is non-zero, but the JSON body
 * carries {verb, exit_code, stdout, stderr, rejected} in BOTH cases — so we
 * read the body regardless of HTTP status and let the caller style on
 * exit_code / rejected.
 * Run a PVE power verb directly (no AI in the path). The backend returns 200 on
 * success and 502 when the verb's exit code is non-zero, but the JSON body
 * carries {verb, exit_code, stdout, stderr, rejected} in BOTH cases — so we read
 * the body regardless of HTTP status and let the caller style on exit_code.
 *
 * @param {string} verb
 * @returns {Promise<{verb: string, exit_code: number|null, stdout: string, stderr: string, rejected: boolean}>}
 * @returns {Promise<{verb:string, exit_code:number|null, stdout:string, stderr:string, rejected:boolean}>}
 */
export async function runVerb(verb) {
  const res = await fetch(`/api/pve/${encodeURIComponent(verb)}`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
  });
  // 400 = unknown verb (FastAPI HTTPException) — has {detail}, not the verb shape.
  let body;
  try {
    body = await res.json();
  } catch {
    throw new Error(`VM control '${verb}' failed (HTTP ${res.status}, no body)`);
  }
  // 400 = unknown verb (FastAPI HTTPException) — has {detail}, not the verb shape.
  if (res.status === 400) {
    throw new Error(body?.detail || `'${verb}' was rejected by the server`);
  }
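runVerb reads the JSON body whatever the HTTP status. It does this because 200 and 502 share the verb-result shape, while 400 carries FastAPI's {detail}. The rest of runVerb falls outside this hunk, so the sketch covers only the branches shown, recast as a pure function. interpretVerbResponse and its `tone` field are hypothetical, added to illustrate styling on exit_code and rejected rather than on the HTTP status.

```javascript
// Sketch (hypothetical helper): runVerb's response handling as a pure
// function. `body` is the parsed JSON, or undefined if res.json() failed.
function interpretVerbResponse(verb, httpStatus, body) {
  if (body == null) {
    throw new Error(`VM control '${verb}' failed (HTTP ${httpStatus}, no body)`);
  }
  // 400 = unknown verb: FastAPI's {detail}, not the verb-result shape.
  if (httpStatus === 400) {
    throw new Error(body.detail || `'${verb}' was rejected by the server`);
  }
  // 200 and 502 share {verb, exit_code, stdout, stderr, rejected}; style on
  // those fields, not on the HTTP status. `tone` is an illustrative extra.
  const succeeded = !body.rejected && body.exit_code === 0;
  return { ...body, tone: succeeded ? 'ok' : 'failed' };
}
```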
@@ -1,150 +0,0 @@
// SSE frame parsing — the load-bearing core of the breakglass UI.
//
// The /api/chat endpoint returns a text/event-stream that we read with
// fetch() + response.body.getReader() (NOT EventSource, which cannot POST).
// The backend emits one frame per event as:
//
// data: {json}\n\n
//
// getReader() hands us bytes at arbitrary boundaries: a single frame can be
// split across reads, and one read can contain several frames. So we keep a
// rolling text buffer, split it on the blank-line frame delimiter, and only
// hand back the JSON payload of *complete* frames. Per the SSE spec a frame may
// carry multiple `data:` lines (joined with "\n"); the backend emits single
// line JSON today, but we handle the general case so a future multi-line
// payload can't silently corrupt the stream.

/**
 * Parse a single SSE event block (the text between blank lines) into its data
 * payload string, or null if the block carries no `data:` field (e.g. a bare
 * comment or a `:` heartbeat).
 * @param {string} block
 * @returns {string|null}
 */
export function dataFromEventBlock(block) {
  const dataLines = [];
  for (const rawLine of block.split('\n')) {
    const line = rawLine.replace(/\r$/, '');
    if (line.startsWith(':')) continue; // SSE comment / heartbeat
    if (line === 'data:' || line === 'data') {
      dataLines.push('');
    } else if (line.startsWith('data:')) {
      // Spec: a single leading space after the colon is stripped.
      let v = line.slice('data:'.length);
      if (v.startsWith(' ')) v = v.slice(1);
      dataLines.push(v);
    }
    // field lines we don't care about (event:, id:, retry:) are ignored
  }
  if (dataLines.length === 0) return null;
  return dataLines.join('\n');
}

/**
 * A stateful splitter that turns an arbitrary sequence of decoded text chunks
 * into a sequence of complete SSE event-block strings. Frames are delimited by
 * a blank line; we tolerate both "\n\n" and "\r\n\r\n".
 */
export class SSEFrameSplitter {
  constructor() {
    this.buffer = '';
  }

  /**
   * Feed a decoded text chunk; returns the event blocks that are now complete.
   * Any trailing partial frame stays buffered for the next chunk.
   * @param {string} chunk
   * @returns {string[]} complete event blocks (text between delimiters)
   */
  push(chunk) {
    this.buffer += chunk;
    const blocks = [];
    // Normalise CRLF delimiters to LF so a single split rule covers both.
    let idx;
    // Process every complete frame currently in the buffer.
    while ((idx = this._nextDelimiter()) !== -1) {
      const block = this.buffer.slice(0, idx.start);
      this.buffer = this.buffer.slice(idx.end);
      if (block.length > 0) blocks.push(block);
    }
    return blocks;
  }

  /**
   * On stream end, return whatever complete-looking content remains. A
   * well-behaved backend always terminates the last frame with a blank line,
   * so this is usually empty — but if the connection closed mid-trailing-frame
   * with a parseable block, surface it rather than dropping data.
   * @returns {string[]}
   */
  flush() {
    const rest = this.buffer.trim();
    this.buffer = '';
    return rest ? [rest] : [];
  }

  _nextDelimiter() {
    // Find the earliest of "\n\n", "\r\n\r\n", "\r\r".
    const candidates = [
      { token: '\r\n\r\n', i: this.buffer.indexOf('\r\n\r\n') },
      { token: '\n\n', i: this.buffer.indexOf('\n\n') },
      { token: '\r\r', i: this.buffer.indexOf('\r\r') },
    ].filter((c) => c.i !== -1);
    if (candidates.length === 0) return -1;
    candidates.sort((a, b) => a.i - b.i);
    const { token, i } = candidates[0];
    return { start: i, end: i + token.length };
  }
}

/**
 * Read an SSE Response body to completion, invoking onEvent for every parsed
 * JSON event object. Resolves when the stream ends. Throws if the response is
 * not ok or has no readable body (caller shows the error inline).
 *
 * @param {Response} response a fetch() Response with a streaming body
 * @param {(event: object) => void} onEvent called per parsed JSON event
 */
export async function readEventStream(response, onEvent) {
  if (!response.ok) {
    throw new Error(`server returned ${response.status} ${response.statusText}`);
  }
  if (!response.body) {
    throw new Error('response has no readable body (streaming unsupported)');
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  const splitter = new SSEFrameSplitter();

  const handleBlock = (block) => {
    const payload = dataFromEventBlock(block);
    if (payload == null || payload.trim() === '') return;
    let obj;
    try {
      obj = JSON.parse(payload);
    } catch {
      // A malformed frame must not abort an in-progress recovery stream;
      // skip it and keep reading.
      return;
    }
    onEvent(obj);
  };

  try {
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      const text = decoder.decode(value, { stream: true });
      for (const block of splitter.push(text)) handleBlock(block);
    }
  } finally {
    reader.releaseLock?.();
  }
  // Drain any trailing bytes the decoder held, then any final frame.
  const tail = decoder.decode();
  if (tail) {
    for (const block of splitter.push(tail)) handleBlock(block);
  }
  for (const block of splitter.flush()) handleBlock(block);
}
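readEventStream decodes each chunk with `decoder.decode(value, { stream: true })` and then flushes with a final `decoder.decode()`. This matters because getReader() can split a multi-byte UTF-8 character across two reads. Below is a small self-contained demonstration using the standard TextEncoder and TextDecoder. The frame text and the decodeChunks helper are made up for illustration.

```javascript
// Sketch: why readEventStream passes { stream: true } to TextDecoder. A
// reader chunk can end in the middle of a multi-byte UTF-8 character; a
// streaming decoder buffers the partial bytes until the next chunk.
const frameBytes = new TextEncoder().encode('data: {"text":"\u00e9"}\n\n');
// "\u00e9" encodes as the two bytes 0xC3 0xA9; cut between them.
const splitAt = frameBytes.indexOf(0xc3) + 1;
const pieces = [frameBytes.slice(0, splitAt), frameBytes.slice(splitAt)];

function decodeChunks(chunks, streaming) {
  const decoder = new TextDecoder(); // UTF-8, non-fatal by default
  let text = '';
  for (const chunk of chunks) {
    text += streaming ? decoder.decode(chunk, { stream: true }) : decoder.decode(chunk);
  }
  if (streaming) text += decoder.decode(); // flush anything still buffered
  return text;
}

console.log(decodeChunks(pieces, true)); // the original frame, intact
console.log(decodeChunks(pieces, false)); // U+FFFD replacement chars where the accented e was
```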
@@ -1,152 +0,0 @@
// Standalone test of the SSE frame parser — no test framework, just node.
// Run: node src/lib/sse.test.mjs (exits non-zero on any failure)
//
// These pin the protocol described in the API contract: frames are
// `data: {json}\n\n`, the event `kind` is one of session/text/tool/result/
// error/done, and bytes arrive at arbitrary boundaries via getReader().
import { SSEFrameSplitter, dataFromEventBlock, readEventStream } from './sse.js';

let failures = 0;
function ok(name, cond) {
  if (cond) {
    console.log(` ok ${name}`);
  } else {
    failures++;
    console.error(`FAIL ${name}`);
  }
}
function eq(name, got, want) {
  const g = JSON.stringify(got);
  const w = JSON.stringify(want);
  ok(`${name} (got ${g})`, g === w);
}

// --- dataFromEventBlock ---------------------------------------------------
eq(
  'extracts JSON payload from a data: line',
  dataFromEventBlock('data: {"kind":"text","text":"hi"}'),
  '{"kind":"text","text":"hi"}'
);
eq(
  'strips exactly one space after the colon',
  dataFromEventBlock('data:  leading-space-kept'),
  ' leading-space-kept'
);
eq('ignores comment/heartbeat lines', dataFromEventBlock(': keep-alive'), null);
eq(
  'joins multi-line data fields with newline',
  dataFromEventBlock('data: line1\ndata: line2'),
  'line1\nline2'
);

// --- SSEFrameSplitter: whole frames --------------------------------------
{
  const s = new SSEFrameSplitter();
  const blocks = s.push('data: {"kind":"session","session_id":"abc"}\n\n');
  eq('one complete frame yields one block', blocks, [
    'data: {"kind":"session","session_id":"abc"}',
  ]);
}

// --- SSEFrameSplitter: multiple frames in one chunk ----------------------
{
  const s = new SSEFrameSplitter();
  const blocks = s.push(
    'data: {"kind":"text","text":"a"}\n\ndata: {"kind":"text","text":"b"}\n\n'
  );
  eq('two frames in one chunk yield two blocks', blocks.length, 2);
  eq('first block', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"a"}');
  eq('second block', dataFromEventBlock(blocks[1]), '{"kind":"text","text":"b"}');
}

// --- SSEFrameSplitter: frame split across chunks -------------------------
{
  const s = new SSEFrameSplitter();
  let blocks = s.push('data: {"kind":"te');
  eq('partial frame yields nothing yet', blocks, []);
  blocks = s.push('xt","text":"split"}\n\n');
  eq('completing the frame yields it whole', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"split"}');
}

// --- SSEFrameSplitter: delimiter split across chunks ---------------------
{
  const s = new SSEFrameSplitter();
  let blocks = s.push('data: {"kind":"done"}\n');
  eq('frame held while delimiter incomplete', blocks, []);
  blocks = s.push('\n');
  eq('frame released once blank line completes', dataFromEventBlock(blocks[0]), '{"kind":"done"}');
}

// --- SSEFrameSplitter: CRLF delimiters -----------------------------------
{
  const s = new SSEFrameSplitter();
  const blocks = s.push('data: {"kind":"text","text":"crlf"}\r\n\r\n');
  eq('CRLF-delimited frame parses', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"crlf"}');
}

// --- end-to-end via readEventStream over a mock streaming Response --------
function mockResponse(chunks) {
  const enc = new TextEncoder();
  let i = 0;
  return {
    ok: true,
    status: 200,
    body: {
      getReader() {
        return {
          read() {
            if (i < chunks.length) {
              return Promise.resolve({ value: enc.encode(chunks[i++]), done: false });
            }
            return Promise.resolve({ value: undefined, done: true });
          },
          releaseLock() {},
        };
      },
    },
  };
}

await (async () => {
  // A realistic turn, deliberately chopped at ugly boundaries:
  // - the session frame split mid-JSON
  // - two text frames glued together
  // - a tool frame
  // - a result frame and the terminal done frame in one chunk
  const chunks = [
    'data: {"kind":"sess',
    'ion","session_id":"S1"}\n\n',
    'data: {"kind":"text","text":"checking "}\n\ndata: {"kind":"text","text":"disk"}\n\n',
    'data: {"kind":"tool","name":"Bash","input":{"command":"df -h"}}\n\n',
    'data: {"kind":"result","is_error":false,"result":"ok","duration_ms":12}\n\ndata: {"kind":"done"}\n\n',
  ];
  const events = [];
  await readEventStream(mockResponse(chunks), (e) => events.push(e));

  eq('event count', events.length, 6);
  eq('1: session id', events[0], { kind: 'session', session_id: 'S1' });
  eq('2: first text', events[1], { kind: 'text', text: 'checking ' });
  eq('3: second text', events[2], { kind: 'text', text: 'disk' });
  eq('4: tool kind+name', { kind: events[3].kind, name: events[3].name }, { kind: 'tool', name: 'Bash' });
  eq('4: tool command', events[3].input.command, 'df -h');
  eq('5: result', events[4], { kind: 'result', is_error: false, result: 'ok', duration_ms: 12 });
  eq('6: done terminal', events[5], { kind: 'done' });
})();

// malformed frame in the middle must be skipped, not abort the stream
await (async () => {
  const chunks = [
    'data: {"kind":"text","text":"before"}\n\n',
    'data: {this is not json}\n\n',
    'data: {"kind":"done"}\n\n',
  ];
  const events = [];
  await readEventStream(mockResponse(chunks), (e) => events.push(e));
  eq('malformed frame skipped, stream continues', events.map((e) => e.kind), ['text', 'done']);
})();

if (failures) {
  console.error(`\n${failures} assertion(s) FAILED`);
  process.exit(1);
}
console.log('\nall SSE parser assertions passed');
|
||||
196
frontend/src/lib/transcript.js
Normal file
|
|
@ -0,0 +1,196 @@
|
|||
// transcript.js — the load-bearing core of the breakglass UI.
|
||||
//
|
||||
// The attach stream (EventSource) replays the conversation-so-far and then
|
||||
// tails live. Replayed events are byte-identical to live ones, and on a
|
||||
// reconnect the server re-replays from Last-Event-ID — so the SAME event id can
|
||||
// arrive more than once. This module folds a flat, possibly-duplicated event
|
||||
// sequence into an ordered list of render-ready messages, idempotently.
|
||||
//
|
||||
// Contract (every default `message` event's .data is one of these JSON shapes):
|
||||
// {kind:"user", text, id} → opens a USER bubble
|
||||
// {kind:"session", session_id, id} → informational (agent's session id)
|
||||
// {kind:"text", text, id} → assistant prose; concatenated
|
||||
// {kind:"tool", name, input, id} → inline tool chip (Bash → command)
|
||||
// {kind:"result", is_error, result, duration_ms, id} → closes the bubble
|
||||
// {kind:"error", error, id} → error note on the bubble
|
||||
// {kind:"cancelled", id} → muted "stopped" note
|
||||
// {kind:"turn_end", id} → the turn finished
|
||||
//
|
||||
// Grouping: a `user` event opens a user message; the session/text/tool events
|
||||
// that follow build ONE assistant message; result/error/cancelled annotate it;
|
||||
// turn_end ends it. Assistant events with no preceding user (e.g. a session
|
||||
// banner on a fresh attach) still get an assistant message so nothing is lost.
|
||||
//
|
||||
// Idempotency: every event carries a monotonic integer-ish id. We track the
|
||||
// max id folded so far and DROP any event whose id we've already passed — a
|
||||
// reconnect replay therefore never double-renders. Ids are compared
|
||||
// numerically when both parse as numbers, else as strings (defensive).
|
||||
|
||||
/** @typedef {{type:'text',text:string}|{type:'tool',name:string,command:string,raw:any}} Part */
|
||||
/**
|
||||
* @typedef {Object} Message
|
||||
* @property {'user'|'assistant'} role
|
||||
* @property {string} key stable key for keyed {#each}
|
||||
* @property {string} [text] user text
|
||||
* @property {Part[]} [parts] assistant parts, in emit order
|
||||
* @property {{is_error:boolean,text:string,duration_ms:number|null}} [result]
|
||||
* @property {string} [error]
|
||||
* @property {boolean} [cancelled]
|
||||
* @property {boolean} [ended] turn_end seen for this message
|
||||
*/
|
||||
|
||||
/** Compare two ids; numeric when both look numeric, else lexicographic. */
|
||||
export function idGreater(a, b) {
|
||||
const na = Number(a);
|
||||
const nb = Number(b);
|
||||
if (Number.isFinite(na) && Number.isFinite(nb) && `${a}`.trim() !== '' && `${b}`.trim() !== '') {
|
||||
return na > nb;
|
||||
}
|
||||
return String(a) > String(b);
|
||||
}
|
||||
|
||||
/**
|
||||
* Create an empty transcript-folding state.
|
||||
* @returns {{messages: Message[], maxId: any, sawId: boolean, openAssistant: Message|null, activeUserSeen: boolean}}
|
||||
*/
|
||||
export function createTranscript() {
|
||||
return {
|
||||
messages: [],
|
||||
maxId: null,
|
||||
sawId: false,
|
||||
openAssistant: null,
|
||||
// a turn is "active" once a user event (or local prompt) has no following
|
||||
// turn_end; the UI reads `active` from reduceEvent's return.
|
||||
activeUserSeen: false,
|
||||
};
|
||||
}
|
||||
|
||||
function bubbleKey(prefix, id, fallbackIndex) {
|
||||
if (id != null && `${id}`.trim() !== '') return `${prefix}:${id}`;
|
||||
return `${prefix}:idx:${fallbackIndex}`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Should this event be applied, given the max id folded so far? Updates and
|
||||
* returns the new max. Events WITHOUT an id are always applied (and don't move
|
||||
* the watermark) — the protocol always carries ids, but we never drop data on a
|
||||
* malformed frame.
|
||||
* @returns {{apply:boolean, maxId:any}}
|
||||
*/
|
||||
export function admit(maxId, id) {
|
||||
if (id == null || `${id}`.trim() === '') return { apply: true, maxId };
|
||||
if (maxId == null) return { apply: true, maxId: id };
|
||||
if (idGreater(id, maxId)) return { apply: true, maxId: id };
|
||||
return { apply: false, maxId }; // already seen — dedupe
|
||||
}
|
||||
|
||||
/**
|
||||
* Fold one event into the transcript state, mutating `state` in place.
|
||||
* Returns true if the state changed (so callers can trigger a re-render).
|
||||
*
|
||||
* @param {ReturnType<typeof createTranscript>} state
|
||||
* @param {any} ev parsed event object ({kind, id, ...})
|
||||
* @returns {boolean} changed
|
||||
*/
|
||||
export function reduceEvent(state, ev) {
|
||||
if (!ev || typeof ev !== 'object') return false;
|
||||
const { apply, maxId } = admit(state.maxId, ev.id);
|
||||
  state.maxId = maxId;
  if (!apply) return false;
  if (ev.id != null && `${ev.id}`.trim() !== '') state.sawId = true;

  const ensureAssistant = () => {
    if (!state.openAssistant) {
      const msg = {
        role: 'assistant',
        key: bubbleKey('a', ev.id, state.messages.length),
        parts: [],
        ended: false,
      };
      state.messages.push(msg);
      state.openAssistant = msg;
    }
    return state.openAssistant;
  };

  switch (ev.kind) {
    case 'user': {
      // A new user turn. Close any dangling assistant bubble first.
      state.openAssistant = null;
      state.messages.push({
        role: 'user',
        key: bubbleKey('u', ev.id, state.messages.length),
        text: typeof ev.text === 'string' ? ev.text : '',
      });
      state.activeUserSeen = true;
      return true;
    }
    case 'session': {
      // Informational — does not itself render a part, but it does open the
      // assistant bubble for the turn so subsequent text lands in one place.
      ensureAssistant();
      return true;
    }
    case 'text': {
      if (typeof ev.text !== 'string' || ev.text === '') return false;
      const msg = ensureAssistant();
      const tail = msg.parts[msg.parts.length - 1];
      if (tail && tail.type === 'text') {
        tail.text += ev.text; // concatenate consecutive prose
      } else {
        msg.parts.push({ type: 'text', text: ev.text });
      }
      return true;
    }
    case 'tool': {
      const msg = ensureAssistant();
      const command =
        ev.input && typeof ev.input.command === 'string' ? ev.input.command : '';
      msg.parts.push({
        type: 'tool',
        name: typeof ev.name === 'string' && ev.name ? ev.name : 'tool',
        command,
        raw: ev.input ?? null,
      });
      return true;
    }
    case 'result': {
      const msg = ensureAssistant();
      msg.result = {
        is_error: Boolean(ev.is_error),
        text: typeof ev.result === 'string' ? ev.result : '',
        duration_ms: typeof ev.duration_ms === 'number' ? ev.duration_ms : null,
      };
      return true;
    }
    case 'error': {
      const msg = ensureAssistant();
      msg.error = typeof ev.error === 'string' && ev.error ? ev.error : 'unknown error';
      return true;
    }
    case 'cancelled': {
      const msg = ensureAssistant();
      msg.cancelled = true;
      return true;
    }
    case 'turn_end': {
      if (state.openAssistant) state.openAssistant.ended = true;
      state.openAssistant = null;
      state.activeUserSeen = false;
      return true;
    }
    default:
      return false;
  }
}

/**
 * Convenience: fold an array of events into a fresh transcript (used by tests
 * and by a from-scratch render). Returns the final state.
 * @param {any[]} events
 */
export function foldAll(events) {
  const state = createTranscript();
  for (const ev of events) reduceEvent(state, ev);
  return state;
}
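The grouping rules in `reduceEvent` above (a `user` event closes any open assistant bubble, consecutive `text` events concatenate into one part, a `tool` event becomes its own part, and `turn_end` marks the bubble ended) can be restated as a short Python sketch. It is an illustration only, not the `transcript.js` code: `fold_events` is a hypothetical name, plain dicts stand in for the JS message objects, and the dedupe watermark (`admit`), bubble keys, and the `result`/`error`/`cancelled` annotations are deliberately left out.

```python
from __future__ import annotations

from typing import Any


def fold_events(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group a flat event stream into user/assistant messages (sketch)."""
    messages: list[dict[str, Any]] = []
    open_assistant: dict[str, Any] | None = None

    def ensure_assistant() -> dict[str, Any]:
        # Lazily open one assistant bubble per turn, mirroring ensureAssistant().
        nonlocal open_assistant
        if open_assistant is None:
            open_assistant = {"role": "assistant", "parts": [], "ended": False}
            messages.append(open_assistant)
        return open_assistant

    for ev in events:
        kind = ev.get("kind")
        if kind == "user":
            open_assistant = None  # a new user turn closes any dangling bubble
            messages.append({"role": "user", "text": ev.get("text", "")})
        elif kind == "session":
            ensure_assistant()  # opens the bubble without adding a part
        elif kind == "text" and isinstance(ev.get("text"), str) and ev["text"]:
            parts = ensure_assistant()["parts"]
            if parts and parts[-1]["type"] == "text":
                parts[-1]["text"] += ev["text"]  # concatenate consecutive prose
            else:
                parts.append({"type": "text", "text": ev["text"]})
        elif kind == "tool":
            command = (ev.get("input") or {}).get("command", "")
            ensure_assistant()["parts"].append({"type": "tool", "command": command})
        elif kind == "turn_end":
            if open_assistant is not None:
                open_assistant["ended"] = True
            open_assistant = None
        # Other kinds (result, error, cancelled, unknown) are ignored in this sketch.
    return messages
```

Keeping the "open bubble" as a single nullable reference is what lets a `session` banner, several `text` chunks, and a `tool` call from one turn land in the same assistant message.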
162
frontend/src/lib/transcript.test.mjs
Normal file
@@ -0,0 +1,162 @@
// Standalone test of the transcript folder — no test framework, just node.
// Run: node src/lib/transcript.test.mjs (exits non-zero on any failure)
//
// These pin the attach-model contract: events carry monotonic ids, a reconnect
// re-replays already-seen ids (which MUST be deduped), and events group into
// user/assistant messages with consecutive prose concatenated.
import {
  admit,
  idGreater,
  reduceEvent,
  createTranscript,
  foldAll,
} from './transcript.js';

let failures = 0;
function ok(name, cond) {
  if (cond) {
    console.log(` ok ${name}`);
  } else {
    failures++;
    console.error(`FAIL ${name}`);
  }
}
function eq(name, got, want) {
  const g = JSON.stringify(got);
  const w = JSON.stringify(want);
  ok(`${name} (got ${g})`, g === w);
}

// --- id comparison --------------------------------------------------------
ok('idGreater numeric', idGreater(10, 9) === true);
ok('idGreater numeric not', idGreater(2, 10) === false); // not string "2" > "10"
ok('idGreater string fallback', idGreater('b', 'a') === true);

// --- admit / dedupe watermark --------------------------------------------
{
  let { apply, maxId } = admit(null, 1);
  eq('first id admitted', { apply, maxId }, { apply: true, maxId: 1 });
  ({ apply, maxId } = admit(5, 5));
  ok('equal id rejected (already seen)', apply === false && maxId === 5);
  ({ apply, maxId } = admit(5, 3));
  ok('lower id rejected', apply === false && maxId === 5);
  ({ apply, maxId } = admit(5, 6));
  ok('higher id admitted, watermark moves', apply === true && maxId === 6);
  ({ apply, maxId } = admit(5, undefined));
  ok('id-less event always admitted, watermark held', apply === true && maxId === 5);
}

// --- a full turn groups into user + one assistant bubble ------------------
{
  const events = [
    { kind: 'user', text: 'triage it', id: 1 },
    { kind: 'session', session_id: 'S1', id: 2 },
    { kind: 'text', text: 'Checking ', id: 3 },
    { kind: 'text', text: 'disk usage.', id: 4 },
    { kind: 'tool', name: 'Bash', input: { command: 'df -h' }, id: 5 },
    { kind: 'result', is_error: false, result: 'ok', duration_ms: 1200, id: 6 },
    { kind: 'turn_end', id: 7 },
  ];
  const s = foldAll(events);
  eq('two messages: user + assistant', s.messages.length, 2);
  eq('first is user with text', { r: s.messages[0].role, t: s.messages[0].text }, { r: 'user', t: 'triage it' });
  const a = s.messages[1];
  eq('assistant role', a.role, 'assistant');
  // consecutive text concatenated into ONE part; tool is a separate part
  eq('parts: one concatenated text + one tool', a.parts.map((p) => p.type), ['text', 'tool']);
  eq('prose concatenated in order', a.parts[0].text, 'Checking disk usage.');
  eq('tool command captured', a.parts[1].command, 'df -h');
  eq('result attached', { e: a.result.is_error, ms: a.result.duration_ms }, { e: false, ms: 1200 });
  ok('turn ended', a.ended === true);
  ok('no longer active after turn_end', s.activeUserSeen === false);
}

// --- reconnect replay: re-feeding the SAME events must NOT double-render --
{
  const events = [
    { kind: 'user', text: 'hi', id: 1 },
    { kind: 'text', text: 'hello', id: 2 },
    { kind: 'turn_end', id: 3 },
  ];
  const s = createTranscript();
  for (const e of events) reduceEvent(s, e);
  // simulate an EventSource reconnect that re-replays everything from the top
  for (const e of events) reduceEvent(s, e);
  eq('still exactly two messages after replay', s.messages.length, 2);
  eq('assistant prose not doubled', s.messages[1].parts[0].text, 'hello');
}

// --- a partial replay (Last-Event-ID resume) continues the same bubble ----
{
  const s = createTranscript();
  reduceEvent(s, { kind: 'user', text: 'go', id: 1 });
  reduceEvent(s, { kind: 'text', text: 'part-A ', id: 2 });
  // reconnect: server resumes after id 2; we must drop id<=2 if re-sent and
  // keep appending to the open assistant bubble.
  reduceEvent(s, { kind: 'text', text: 'part-A ', id: 2 }); // dup, dropped
  reduceEvent(s, { kind: 'text', text: 'part-B', id: 3 }); // new, appended
  reduceEvent(s, { kind: 'turn_end', id: 4 });
  eq('resume appended to same bubble', s.messages[1].parts[0].text, 'part-A part-B');
  eq('still two messages', s.messages.length, 2);
}

// --- error / cancelled annotate the open bubble ---------------------------
{
  const s = foldAll([
    { kind: 'user', text: 'x', id: 1 },
    { kind: 'text', text: 'working', id: 2 },
    { kind: 'error', error: 'ssh timeout', id: 3 },
    { kind: 'turn_end', id: 4 },
  ]);
  eq('error note on assistant bubble', s.messages[1].error, 'ssh timeout');
}
{
  const s = foldAll([
    { kind: 'user', text: 'x', id: 1 },
    { kind: 'cancelled', id: 2 },
    { kind: 'turn_end', id: 3 },
  ]);
  ok('cancelled flag on assistant bubble', s.messages[1].cancelled === true);
}

// --- active state: a user event with no turn_end means a turn is running ---
{
  const s = createTranscript();
  reduceEvent(s, { kind: 'user', text: 'go', id: 1 });
  reduceEvent(s, { kind: 'text', text: '...', id: 2 });
  ok('active while no turn_end', s.activeUserSeen === true);
  reduceEvent(s, { kind: 'turn_end', id: 3 });
  ok('inactive after turn_end', s.activeUserSeen === false);
}

// --- assistant-only stream (session banner on a fresh attach) still renders -
{
  const s = foldAll([
    { kind: 'session', session_id: 'S1', id: 1 },
    { kind: 'text', text: 'standing by', id: 2 },
    { kind: 'turn_end', id: 3 },
  ]);
  eq('lone assistant message created', s.messages.length, 1);
  eq('assistant prose present', s.messages[0].parts[0].text, 'standing by');
}

// --- two sequential turns produce two assistant bubbles -------------------
{
  const s = foldAll([
    { kind: 'user', text: 'q1', id: 1 },
    { kind: 'text', text: 'a1', id: 2 },
    { kind: 'turn_end', id: 3 },
    { kind: 'user', text: 'q2', id: 4 },
    { kind: 'text', text: 'a2', id: 5 },
    { kind: 'turn_end', id: 6 },
  ]);
  eq('four messages (u,a,u,a)', s.messages.map((m) => m.role), ['user', 'assistant', 'user', 'assistant']);
  eq('second answer in its own bubble', s.messages[3].parts[0].text, 'a2');
  ok('message keys are unique', new Set(s.messages.map((m) => m.key)).size === 4);
}

if (failures) {
  console.error(`\n${failures} assertion(s) FAILED`);
  process.exit(1);
}
console.log('\nall transcript assertions passed');
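The watermark contract these tests pin (a strictly higher id is admitted and advances the watermark, an equal or lower id is dropped as already seen, an id-less event is always applied without moving the watermark, and ids compare numerically with a string fallback) can be restated as a small Python sketch. It is not the `transcript.js` source, which is not shown here: `admit` and `id_greater` are hypothetical Python counterparts of the JS `admit`/`idGreater`, and treating a whitespace-only id as id-less is an assumption borrowed from the `sawId` check in `reduceEvent`.

```python
from __future__ import annotations

from typing import Optional, Union

EventId = Union[int, str]


def _as_number(value: EventId) -> Optional[float]:
    """Return the id as a number if it parses as one, else None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def id_greater(a: EventId, b: EventId) -> bool:
    """Numeric comparison when both ids are numeric, string comparison otherwise."""
    na, nb = _as_number(a), _as_number(b)
    if na is not None and nb is not None:
        return na > nb  # so 2 > 10 is False, unlike the string comparison "2" > "10"
    return str(a) > str(b)


def admit(
    max_id: Optional[EventId], event_id: Optional[EventId]
) -> tuple[bool, Optional[EventId]]:
    """Decide whether to apply an event; return (apply, new watermark)."""
    if event_id is None or str(event_id).strip() == "":
        return True, max_id  # id-less events always apply; watermark held
    if max_id is None or id_greater(event_id, max_id):
        return True, event_id  # first or newer id: apply and advance
    return False, max_id  # equal or lower id: already seen, drop
```

Because the watermark only ever moves forward, replaying the whole stream after a reconnect is harmless: every replayed id compares less than or equal to the watermark and is dropped.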
@@ -43,3 +43,186 @@ def drain():
                break
            await asyncio.sleep(0.01)
    return _drain


# --------------------------------------------------------------------------- #
# AFK loop fixtures.
#
# Shared factories + in-memory fakes for the app.afk modules. EVERYTHING the AFK
# tests touch is faked here — no test ever reaches a real T3 server, GitHub /
# Forgejo, or the cluster. The fakes implement the module interfaces from the
# contract and record their calls so tests can assert on them.
# --------------------------------------------------------------------------- #
from app.afk.types import (  # noqa: E402 (after the env setup above, like app_main)
    CIStatus,
    Config,
    Issue,
    RunState,
    ThreadStatus,
)


@pytest.fixture
def make_issue():
    """Factory for ``Issue``. Defaults to a clean, dispatchable issue (trusted
    label, nothing blocking); override any field per test."""
    def _make(
        number: int = 1,
        repo: str = "infra",
        labels: list[str] | None = None,
        blocked_by: list[int] | None = None,
        labeled_by_trusted: bool = True,
        priority: int = 0,
    ) -> Issue:
        return Issue(
            number=number,
            repo=repo,
            labels=["ready-for-agent"] if labels is None else labels,
            blocked_by=[] if blocked_by is None else blocked_by,
            labeled_by_trusted=labeled_by_trusted,
            priority=priority,
        )
    return _make


@pytest.fixture
def make_config():
    """Factory for ``Config``. Defaults to an ENABLED config (kill switch off,
    a one-repo allowlist) so policy/state-machine tests exercise real behaviour;
    the disabled production default is covered separately in the config tests."""
    def _make(
        allowlist: list[str] | None = None,
        kill_switch: bool = False,
        **overrides,
    ) -> Config:
        return Config(
            allowlist=["infra"] if allowlist is None else allowlist,
            kill_switch=kill_switch,
            **overrides,
        )
    return _make


@pytest.fixture
def make_run_state():
    """Factory for ``RunState``. Defaults to a freshly-dispatched run (thread
    running, nothing pushed, no CI, no fix-forward attempts yet)."""
    def _make(
        thread_status: ThreadStatus | None = ThreadStatus.RUNNING,
        ci_status: CIStatus | None = None,
        pushed: bool = False,
        fix_forward_attempts: int = 0,
        elapsed_seconds: float = 0.0,
    ) -> RunState:
        return RunState(
            thread_status=thread_status,
            ci_status=ci_status,
            pushed=pushed,
            fix_forward_attempts=fix_forward_attempts,
            elapsed_seconds=elapsed_seconds,
        )
    return _make


class FakeT3Client:
    """In-memory stand-in for ``t3_client.T3Client``. Records each dispatch and
    hands back a deterministic thread id; ``snapshot`` returns whatever was
    staged via ``set_snapshot``."""

    def __init__(self) -> None:
        self.dispatched: list[dict] = []
        self._snapshot: dict = {"threads": []}
        self._next_id = 0

    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        thread_id = f"thread-{self._next_id}"
        self._next_id += 1
        self.dispatched.append(
            {"repo": repo, "issue": issue, "prompt": prompt, "thread_id": thread_id}
        )
        return thread_id

    def snapshot(self) -> dict:
        return self._snapshot

    def set_snapshot(self, snapshot: dict) -> None:
        self._snapshot = snapshot


class FakeTracker:
    """In-memory stand-in for ``tracker.Tracker``. ``list_ready`` returns issues
    staged via ``seed``; label/comment/close just record their calls."""

    def __init__(self) -> None:
        self._ready: dict[str, list[Issue]] = {}
        self.label_ops: list[tuple[str, str, int, str]] = []  # (op, repo, issue, label)
        self.comments: list[tuple[str, int, str]] = []
        self.closed: list[tuple[str, int]] = []

    def seed(self, repo: str, issues: list[Issue]) -> None:
        self._ready[repo] = issues

    def list_ready(self, repos: list[str]) -> list[Issue]:
        out: list[Issue] = []
        for repo in repos:
            out.extend(self._ready.get(repo, []))
        return out

    def add_label(self, repo: str, issue: int, label: str) -> None:
        self.label_ops.append(("add", repo, issue, label))

    def remove_label(self, repo: str, issue: int, label: str) -> None:
        self.label_ops.append(("remove", repo, issue, label))

    def comment(self, repo: str, issue: int, body: str) -> None:
        self.comments.append((repo, issue, body))

    def close(self, repo: str, issue: int) -> None:
        self.closed.append((repo, issue))


class FakeCIWatcher:
    """In-memory stand-in for ``ci_watcher.CIWatcher``. Returns the status staged
    per ``(repo, commit)`` via ``set_status``; unknown commits read PENDING."""

    def __init__(self) -> None:
        self._statuses: dict[tuple[str, str], CIStatus] = {}

    def set_status(self, repo: str, commit: str, status: CIStatus) -> None:
        self._statuses[(repo, commit)] = status

    def status(self, repo: str, commit: str) -> CIStatus:
        return self._statuses.get((repo, commit), CIStatus.PENDING)


class FakeNotifier:
    """In-memory stand-in for ``notifier.Notifier``. Records every notification
    so tests can assert escalations fired with the right kind/detail."""

    def __init__(self) -> None:
        self.sent: list[dict] = []

    def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None:
        self.sent.append(
            {"kind": kind, "issue": issue, "thread_id": thread_id, "detail": detail}
        )


@pytest.fixture
def fake_t3() -> FakeT3Client:
    return FakeT3Client()


@pytest.fixture
def fake_tracker() -> FakeTracker:
    return FakeTracker()


@pytest.fixture
def fake_ci() -> FakeCIWatcher:
    return FakeCIWatcher()


@pytest.fixture
def fake_notifier() -> FakeNotifier:
    return FakeNotifier()
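These fakes can be swapped in because the AFK modules are consumed through their interfaces rather than their concrete classes. One way to make that structural contract explicit is `typing.Protocol`. The sketch below assumes a hypothetical `CIWatcherLike` protocol and a hypothetical consumer `wait_verdict` (neither is taken from the `app.afk` source), and uses a local `CIStatus` stand-in with only the three members the tests reference (its values are invented), so it runs without the real package.

```python
from __future__ import annotations

import enum
from typing import Protocol, runtime_checkable


class CIStatus(enum.Enum):
    # Local stand-in for app.afk.types.CIStatus; member values are assumptions.
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


@runtime_checkable
class CIWatcherLike(Protocol):
    # Hypothetical interface: anything with a matching status() method fits.
    def status(self, repo: str, commit: str) -> CIStatus: ...


class FakeCIWatcher:
    """Same shape as the conftest fake: staged statuses, PENDING otherwise."""

    def __init__(self) -> None:
        self._statuses: dict[tuple[str, str], CIStatus] = {}

    def set_status(self, repo: str, commit: str, status: CIStatus) -> None:
        self._statuses[(repo, commit)] = status

    def status(self, repo: str, commit: str) -> CIStatus:
        return self._statuses.get((repo, commit), CIStatus.PENDING)


def wait_verdict(ci: CIWatcherLike, repo: str, commit: str) -> str:
    """Hypothetical consumer that depends only on the protocol, never the fake."""
    return ci.status(repo, commit).name
```

Note that an `isinstance` check against a `runtime_checkable` protocol only confirms that the method exists; it does not validate parameter or return types, so a static type checker is still what enforces the full signature.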
285
tests/test_afk_ci_watcher.py
Normal file
@@ -0,0 +1,285 @@
"""Tests for ``app.afk.ci_watcher`` — the commit → ``CIStatus`` adapter.

The watcher folds two independent signals into one verdict the state machine
reads: the **GHA run** for a pushed commit (build/test/lint) and the
**deploy/rollout** that reaches the cluster (Woodpecker pipeline → Keel/k8s
rollout). The CI/CD chain is GHA → ghcr → Woodpecker → Keel
(``docs/2026-06-14-afk-implementation-pipeline-design.md``), so a commit is only
truly GREEN once *both* the build passed AND its image actually rolled out.

Every test injects FAKE clients — no test ever shells out to ``gh``,
``woodpecker``, or ``kubectl``, or reaches the network. The fakes implement the
``ci_watcher`` client Protocols and return staged ``StageResult`` values per
``(repo, commit)``; the watcher's only job is to query them and fold the result,
so the folding table is what these tests pin.
"""
import pytest

from app.afk.ci_watcher import (
    CIWatcher,
    StageResult,
)
from app.afk.types import CIStatus


# --------------------------------------------------------------------------- #
# Fakes for the three injected clients.
#
# Each maps (repo, commit) → StageResult and records every query, so tests can
# assert both the folded verdict AND that short-circuiting skips later stages
# (a RED build must not even ask the rollout client).
# --------------------------------------------------------------------------- #
class _FakeStageClient:
    """A recording stand-in for any of the three stage clients. ``default`` is
    returned for an unstaged ``(repo, commit)`` — defaults to ``PENDING`` so an
    un-seeded stage reads "not done yet", never a false GREEN."""

    def __init__(self, default: StageResult = StageResult.PENDING) -> None:
        self._results: dict[tuple[str, str], StageResult] = {}
        self._default = default
        self.queries: list[tuple[str, str]] = []

    def set(self, repo: str, commit: str, result: StageResult) -> None:
        self._results[(repo, commit)] = result

    def _lookup(self, repo: str, commit: str) -> StageResult:
        self.queries.append((repo, commit))
        return self._results.get((repo, commit), self._default)


class FakeGitHubChecks(_FakeStageClient):
    def run_conclusion(self, repo: str, commit: str) -> StageResult:
        return self._lookup(repo, commit)


class FakeWoodpecker(_FakeStageClient):
    def deploy_conclusion(self, repo: str, commit: str) -> StageResult:
        return self._lookup(repo, commit)


class FakeRollout(_FakeStageClient):
    def rollout_status(self, repo: str, commit: str) -> StageResult:
        return self._lookup(repo, commit)


# --------------------------------------------------------------------------- #
# Fixtures.
# --------------------------------------------------------------------------- #
REPO = "infra"
COMMIT = "deadbeefcafe"


@pytest.fixture
def gha() -> FakeGitHubChecks:
    return FakeGitHubChecks()


@pytest.fixture
def woodpecker() -> FakeWoodpecker:
    return FakeWoodpecker()


@pytest.fixture
def rollout() -> FakeRollout:
    return FakeRollout()


@pytest.fixture
def watcher(gha, woodpecker, rollout) -> CIWatcher:
    return CIWatcher(github=gha, woodpecker=woodpecker, rollout=rollout)


def _stage_all(gha, woodpecker, rollout, *, build, deploy, roll) -> None:
    """Stage all three clients for the canonical ``(REPO, COMMIT)`` at once."""
    gha.set(REPO, COMMIT, build)
    woodpecker.set(REPO, COMMIT, deploy)
    rollout.set(REPO, COMMIT, roll)


# --------------------------------------------------------------------------- #
# StageResult vocabulary.
# --------------------------------------------------------------------------- #
def test_stageresult_has_the_four_outcomes():
    assert {s.name for s in StageResult} == {"NONE", "PENDING", "SUCCESS", "FAILURE"}


# --------------------------------------------------------------------------- #
# The happy path: every stage green ⇒ GREEN.
# --------------------------------------------------------------------------- #
def test_all_stages_success_is_green(watcher, gha, woodpecker, rollout):
    _stage_all(gha, woodpecker, rollout,
               build=StageResult.SUCCESS,
               deploy=StageResult.SUCCESS,
               roll=StageResult.SUCCESS)
    assert watcher.status(REPO, COMMIT) is CIStatus.GREEN


# --------------------------------------------------------------------------- #
# GHA build stage gates everything below it.
# --------------------------------------------------------------------------- #
def test_build_failure_is_red(watcher, gha):
    gha.set(REPO, COMMIT, StageResult.FAILURE)
    assert watcher.status(REPO, COMMIT) is CIStatus.RED


@pytest.mark.parametrize("build", [StageResult.NONE, StageResult.PENDING])
def test_build_not_yet_concluded_is_pending(watcher, gha, build):
    # No run yet (NONE) and in-progress (PENDING) both read PENDING — the state
    # machine waits on either.
    gha.set(REPO, COMMIT, build)
    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING


def test_build_failure_short_circuits_before_deploy_and_rollout(
    watcher, gha, woodpecker, rollout
):
    gha.set(REPO, COMMIT, StageResult.FAILURE)
    # Even if later stages would (nonsensically) be green, a red build wins...
    woodpecker.set(REPO, COMMIT, StageResult.SUCCESS)
    rollout.set(REPO, COMMIT, StageResult.SUCCESS)
    assert watcher.status(REPO, COMMIT) is CIStatus.RED
    # ...and the later clients are never even queried.
    assert woodpecker.queries == []
    assert rollout.queries == []


def test_build_pending_short_circuits_before_deploy_and_rollout(
    watcher, gha, woodpecker, rollout
):
    gha.set(REPO, COMMIT, StageResult.PENDING)
    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
    assert woodpecker.queries == []
    assert rollout.queries == []


# --------------------------------------------------------------------------- #
# Deploy (Woodpecker) stage — only consulted once the build is green.
# --------------------------------------------------------------------------- #
def test_deploy_failure_is_red_even_with_green_build(watcher, gha, woodpecker):
    gha.set(REPO, COMMIT, StageResult.SUCCESS)
    woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
    assert watcher.status(REPO, COMMIT) is CIStatus.RED


@pytest.mark.parametrize("deploy", [StageResult.NONE, StageResult.PENDING])
def test_deploy_not_yet_concluded_is_pending(watcher, gha, woodpecker, deploy):
    gha.set(REPO, COMMIT, StageResult.SUCCESS)
    woodpecker.set(REPO, COMMIT, deploy)
    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING


def test_deploy_failure_short_circuits_before_rollout(
    watcher, gha, woodpecker, rollout
):
    gha.set(REPO, COMMIT, StageResult.SUCCESS)
    woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
    rollout.set(REPO, COMMIT, StageResult.SUCCESS)
    assert watcher.status(REPO, COMMIT) is CIStatus.RED
    assert rollout.queries == []
    # The build WAS consulted (it had to pass to reach deploy).
    assert gha.queries == [(REPO, COMMIT)]


# --------------------------------------------------------------------------- #
# Rollout stage — the final gate. Green build + green deploy is still only
# PENDING until the image actually reaches the cluster.
# --------------------------------------------------------------------------- #
def test_rollout_failure_is_red(watcher, gha, woodpecker, rollout):
    _stage_all(gha, woodpecker, rollout,
               build=StageResult.SUCCESS,
               deploy=StageResult.SUCCESS,
               roll=StageResult.FAILURE)
    assert watcher.status(REPO, COMMIT) is CIStatus.RED


@pytest.mark.parametrize("roll", [StageResult.NONE, StageResult.PENDING])
def test_green_build_and_deploy_but_unfinished_rollout_is_pending(
    watcher, gha, woodpecker, rollout, roll
):
    _stage_all(gha, woodpecker, rollout,
               build=StageResult.SUCCESS,
               deploy=StageResult.SUCCESS,
               roll=roll)
    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING


def test_green_requires_all_three_stages_consulted(
    watcher, gha, woodpecker, rollout
):
    _stage_all(gha, woodpecker, rollout,
               build=StageResult.SUCCESS,
               deploy=StageResult.SUCCESS,
               roll=StageResult.SUCCESS)
    assert watcher.status(REPO, COMMIT) is CIStatus.GREEN
    assert gha.queries == [(REPO, COMMIT)]
    assert woodpecker.queries == [(REPO, COMMIT)]
    assert rollout.queries == [(REPO, COMMIT)]


# --------------------------------------------------------------------------- #
# Plumbing: the commit and repo are passed through verbatim to every client,
# and an entirely un-seeded commit reads PENDING (not GREEN, not RED).
# --------------------------------------------------------------------------- #
def test_repo_and_commit_passed_through_to_clients(watcher, gha):
    gha.set("realestate-crawler", "abc123", StageResult.FAILURE)
    assert watcher.status("realestate-crawler", "abc123") is CIStatus.RED
    assert gha.queries == [("realestate-crawler", "abc123")]


def test_unknown_commit_defaults_to_pending(watcher):
    # Nothing staged anywhere ⇒ the build stage reads PENDING by default ⇒ the
    # whole verdict is PENDING. A never-pushed/just-pushed commit is never a
    # false GREEN.
    assert watcher.status(REPO, "never-seen") is CIStatus.PENDING


# --------------------------------------------------------------------------- #
# The default rollout client is OPTIONAL — per the pilot facts, state.sqlite /
# kubectl reads are optional, so a CIWatcher built without a rollout client must
# still work, treating "build green + deploy green" as the terminal GREEN.
# --------------------------------------------------------------------------- #
def test_rollout_client_is_optional_deploy_green_is_green(gha, woodpecker):
    w = CIWatcher(github=gha, woodpecker=woodpecker)  # no rollout client
    gha.set(REPO, COMMIT, StageResult.SUCCESS)
    woodpecker.set(REPO, COMMIT, StageResult.SUCCESS)
    assert w.status(REPO, COMMIT) is CIStatus.GREEN


def test_rollout_client_optional_still_honours_build_and_deploy_failures(
    gha, woodpecker
):
    w = CIWatcher(github=gha, woodpecker=woodpecker)
    gha.set(REPO, COMMIT, StageResult.SUCCESS)
    woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
    assert w.status(REPO, COMMIT) is CIStatus.RED


# --------------------------------------------------------------------------- #
# Full folding table: exhaustive over (build, deploy, rollout) so the
# precedence rules can never silently drift. Stages are read in order and the
# first non-SUCCESS stage decides the verdict (FAILURE ⇒ RED, PENDING/NONE ⇒
# PENDING); only all-SUCCESS ⇒ GREEN.
# --------------------------------------------------------------------------- #
_N, _P, _S, _F = (
    StageResult.NONE,
    StageResult.PENDING,
    StageResult.SUCCESS,
    StageResult.FAILURE,
)


def _expected(build: StageResult, deploy: StageResult, roll: StageResult) -> CIStatus:
    # Reference fold, independent of the implementation, evaluated stage by stage.
    for stage in (build, deploy, roll):
        if stage is _F:
            return CIStatus.RED
        if stage in (_N, _P):
            return CIStatus.PENDING
    return CIStatus.GREEN


@pytest.mark.parametrize("build", [_N, _P, _S, _F])
@pytest.mark.parametrize("deploy", [_N, _P, _S, _F])
@pytest.mark.parametrize("roll", [_N, _P, _S, _F])
def test_full_folding_table(watcher, gha, woodpecker, rollout, build, deploy, roll):
    _stage_all(gha, woodpecker, rollout, build=build, deploy=deploy, roll=roll)
    assert watcher.status(REPO, COMMIT) is _expected(build, deploy, roll)
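The fold these tests pin can be sketched as an ordered, lazy scan over the stage queries. This is an assumed shape, not the `CIWatcher` implementation: `fold_stages` is a hypothetical helper, `StageResult` and `CIStatus` are local stand-ins whose member names come from the tests (their values are invented), and each stage is a zero-argument callable so a later client is only queried when every earlier stage returned `SUCCESS`, which is the behaviour the short-circuit tests check through `queries`. A `None` entry models an absent optional client, such as a watcher built without a rollout client.

```python
from __future__ import annotations

import enum
from typing import Callable, Optional, Sequence


class StageResult(enum.Enum):
    # Stand-in mirroring the four outcomes the tests enumerate.
    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


class CIStatus(enum.Enum):
    # Stand-in for app.afk.types.CIStatus.
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


def fold_stages(stages: Sequence[Optional[Callable[[], StageResult]]]) -> CIStatus:
    """Query stages in order; the first non-SUCCESS stage decides the verdict."""
    for query in stages:
        if query is None:
            continue  # optional client not configured: skip this stage
        result = query()  # only reached if every earlier stage succeeded
        if result is StageResult.FAILURE:
            return CIStatus.RED
        if result is not StageResult.SUCCESS:
            return CIStatus.PENDING  # NONE or PENDING: not done yet
    return CIStatus.GREEN
```

A watcher could then pass something like `[lambda: github.run_conclusion(repo, commit), lambda: woodpecker.deploy_conclusion(repo, commit), None if rollout is None else lambda: rollout.rollout_status(repo, commit)]`, using the method names the fakes above expose.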
374
tests/test_afk_dispatch_policy.py
Normal file
@@ -0,0 +1,374 @@
|
|||
"""Tests for ``app.afk.dispatch_policy.select_dispatchable`` — the pure gate that
|
||||
turns a pile of ready issues into the ordered set the loop may dispatch *now*.
|
||||
|
||||
The function is PURE (no IO), so every test here is a plain in-memory call over
|
||||
the fakes/factories in ``conftest`` (``make_issue`` / ``make_config``); nothing
|
||||
touches a real T3 server, tracker, or cluster. The suite walks the full
|
||||
dispatchability matrix — trust gate, allowlist, per-repo lock, blocked_by,
|
||||
kill switch — plus the priority ordering and the one-agent-per-repo invariant.
|
||||
|
||||
Ordering contract under test: **lower ``priority`` value first** (P0 before P1
|
||||
before P2 — most urgent wins), matching tracker conventions and
|
||||
``Issue.priority``'s own docstring, with a deterministic tiebreaker (ascending
|
||||
issue number) so the output is stable regardless of input order.
|
||||
"""
|
||||
import itertools
|
||||
|
||||
import pytest
|
||||
|
||||
from app.afk import dispatch_policy
|
||||
from app.afk.types import DispatchDecision, Issue
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Helpers — keep assertions terse and intent-revealing.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def _selected_numbers(decisions: list[DispatchDecision]) -> list[int]:
|
||||
"""The issue numbers, in the order the policy returned them."""
|
||||
return [d.issue.number for d in decisions]
|
||||
|
||||
|
||||
def _selected_set(decisions: list[DispatchDecision]) -> set[int]:
|
||||
return {d.issue.number for d in decisions}
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Return shape & purity.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_returns_list_of_dispatch_decisions(make_issue, make_config):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
decisions = dispatch_policy.select_dispatchable([issue], make_config(), set())
|
||||
assert isinstance(decisions, list)
|
||||
assert len(decisions) == 1
|
||||
assert isinstance(decisions[0], DispatchDecision)
|
||||
assert decisions[0].issue is issue
|
||||
assert isinstance(decisions[0].reason, str) and decisions[0].reason # non-empty
|
||||
|
||||
|
||||
def test_empty_input_yields_empty_output(make_config):
|
||||
assert dispatch_policy.select_dispatchable([], make_config(), set()) == []
|
||||
|
||||
|
||||
def test_does_not_mutate_inputs(make_issue, make_config):
|
||||
issues = [make_issue(number=1, priority=0), make_issue(number=2, priority=9)]
|
||||
issues_snapshot = list(issues)
|
||||
config = make_config(allowlist=["infra"])
|
||||
in_flight: set[str] = set()
|
||||
|
||||
dispatch_policy.select_dispatchable(issues, config, in_flight)
|
||||
|
||||
# Caller's list (and its order) and the lock set are left untouched.
|
||||
assert issues == issues_snapshot
|
||||
assert [i.number for i in issues] == [1, 2]
|
||||
assert in_flight == set()
|
||||
assert config.allowlist == ["infra"]
|
||||
|
||||
|
||||
def test_decision_wraps_the_same_issue_object(make_issue, make_config):
|
||||
issue = make_issue(number=42)
|
||||
[decision] = dispatch_policy.select_dispatchable([issue], make_config(), set())
|
||||
assert decision.issue is issue # identity, not a copy
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
# Kill switch — highest-precedence short-circuit.
# --------------------------------------------------------------------------- #
def test_kill_switch_returns_empty_even_with_perfect_issues(make_issue, make_config):
    issues = [make_issue(number=n, repo="infra") for n in range(1, 6)]
    config = make_config(allowlist=["infra"], kill_switch=True)
    assert dispatch_policy.select_dispatchable(issues, config, set()) == []


def test_kill_switch_off_dispatches(make_issue, make_config):
    issue = make_issue(repo="infra")
    config = make_config(allowlist=["infra"], kill_switch=False)
    assert len(dispatch_policy.select_dispatchable([issue], config, set())) == 1


def test_production_default_config_dispatches_nothing(make_issue):
    """The shipped default (kill switch ON, empty allowlist) is inert: even a
    pristine, trusted issue is never selected."""
    from app.afk import config as afk_config

    issue = make_issue(repo="infra")
    assert dispatch_policy.select_dispatchable([issue], afk_config.default(), set()) == []


# --------------------------------------------------------------------------- #
# Trust gate.
# --------------------------------------------------------------------------- #
def test_untrusted_issue_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra", labeled_by_trusted=False)
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_trusted_issue_is_eligible(make_issue, make_config):
    issue = make_issue(repo="infra", labeled_by_trusted=True)
    assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1


def test_trust_gate_filters_only_untrusted(make_issue, make_config):
    trusted = make_issue(number=1, repo="infra", labeled_by_trusted=True)
    untrusted = make_issue(number=2, repo="infra", labeled_by_trusted=False)
    decisions = dispatch_policy.select_dispatchable(
        [trusted, untrusted], make_config(allowlist=["infra"]), set()
    )
    assert _selected_set(decisions) == {1}


# --------------------------------------------------------------------------- #
# Allowlist membership.
# --------------------------------------------------------------------------- #
def test_repo_not_in_allowlist_is_skipped(make_issue, make_config):
    issue = make_issue(repo="some-other-repo")
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_empty_allowlist_dispatches_nothing(make_issue, make_config):
    issue = make_issue(repo="infra")
    # kill switch off but allowlist empty -> still inert (the two-gate posture).
    config = make_config(allowlist=[], kill_switch=False)
    assert dispatch_policy.select_dispatchable([issue], config, set()) == []


def test_allowlist_selects_only_listed_repos(make_issue, make_config):
    a = make_issue(number=1, repo="infra")
    b = make_issue(number=2, repo="realestate-crawler")
    c = make_issue(number=3, repo="not-allowed")
    decisions = dispatch_policy.select_dispatchable(
        [a, b, c], make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert _selected_set(decisions) == {1, 2}


# --------------------------------------------------------------------------- #
# Per-repo lock (in_flight_repos).
# --------------------------------------------------------------------------- #
def test_repo_already_in_flight_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra")
    decisions = dispatch_policy.select_dispatchable(
        [issue], make_config(allowlist=["infra"]), in_flight_repos={"infra"}
    )
    assert decisions == []


def test_in_flight_lock_is_per_repo(make_issue, make_config):
    locked = make_issue(number=1, repo="infra")
    free = make_issue(number=2, repo="realestate-crawler")
    decisions = dispatch_policy.select_dispatchable(
        [locked, free],
        make_config(allowlist=["infra", "realestate-crawler"]),
        in_flight_repos={"infra"},
    )
    assert _selected_set(decisions) == {2}  # only the unlocked repo's issue runs


def test_all_repos_in_flight_dispatches_nothing(make_issue, make_config):
    a = make_issue(number=1, repo="infra")
    b = make_issue(number=2, repo="realestate-crawler")
    decisions = dispatch_policy.select_dispatchable(
        [a, b],
        make_config(allowlist=["infra", "realestate-crawler"]),
        in_flight_repos={"infra", "realestate-crawler"},
    )
    assert decisions == []


# --------------------------------------------------------------------------- #
# One-agent-per-repo invariant — at most ONE decision per repo per call.
#
# The whole design serialises agents within a repo (two would collide on the
# working tree). A single call must therefore never hand back two issues for the
# same repo, even when both are eligible and the repo is not yet in-flight.
# --------------------------------------------------------------------------- #
def test_at_most_one_decision_per_repo(make_issue, make_config):
    urgent = make_issue(number=1, repo="infra", priority=1)
    minor = make_issue(number=2, repo="infra", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [urgent, minor], make_config(allowlist=["infra"]), set()
    )
    assert len(decisions) == 1
    assert decisions[0].issue.number == 1  # most urgent (lowest value) wins the slot


def test_one_decision_per_repo_across_many_repos(make_issue, make_config):
    issues = [
        make_issue(number=10, repo="infra", priority=1),
        make_issue(number=11, repo="infra", priority=5),
        make_issue(number=20, repo="realestate-crawler", priority=3),
        make_issue(number=21, repo="realestate-crawler", priority=2),
    ]
    decisions = dispatch_policy.select_dispatchable(
        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    # One per repo, each the repo's most urgent (lowest-value) eligible issue:
    # infra -> #10 (p1 < p5); realestate-crawler -> #21 (p2 < p3).
    assert _selected_set(decisions) == {10, 21}
    repos = [d.issue.repo for d in decisions]
    assert len(repos) == len(set(repos))  # no repo appears twice


def test_ineligible_higher_priority_does_not_consume_repo_slot(make_issue, make_config):
    """A more-urgent issue that is itself ineligible (e.g. blocked) must not
    suppress a less-urgent *eligible* issue in the same repo — the slot goes to
    the best ELIGIBLE candidate, not merely the most urgent one."""
    blocked_urgent = make_issue(number=1, repo="infra", priority=1, blocked_by=[99])
    ready_minor = make_issue(number=2, repo="infra", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [blocked_urgent, ready_minor], make_config(allowlist=["infra"]), set()
    )
    assert _selected_numbers(decisions) == [2]


# --------------------------------------------------------------------------- #
# blocked_by gating — blocked_by holds OPEN blocker numbers.
# --------------------------------------------------------------------------- #
def test_blocked_issue_is_skipped(make_issue, make_config):
    issue = make_issue(repo="infra", blocked_by=[101])
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_unblocked_issue_with_empty_blocked_by_is_eligible(make_issue, make_config):
    issue = make_issue(repo="infra", blocked_by=[])
    assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1


@pytest.mark.parametrize("blockers", [[1], [1, 2], [5, 6, 7]])
def test_any_open_blocker_blocks(make_issue, make_config, blockers):
    issue = make_issue(repo="infra", blocked_by=blockers)
    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []


def test_blocked_filters_only_blocked(make_issue, make_config):
    ready = make_issue(number=1, repo="infra", blocked_by=[])
    blocked = make_issue(number=2, repo="realestate-crawler", blocked_by=[7])
    decisions = dispatch_policy.select_dispatchable(
        [ready, blocked], make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert _selected_set(decisions) == {1}


# --------------------------------------------------------------------------- #
# Priority ordering — lower priority value first, deterministic tiebreaker.
# --------------------------------------------------------------------------- #
def test_lower_priority_value_first(make_issue, make_config):
    p1 = make_issue(number=1, repo="infra", priority=1)
    p5 = make_issue(number=2, repo="realestate-crawler", priority=5)
    p9 = make_issue(number=3, repo="SparkyFitness", priority=9)
    decisions = dispatch_policy.select_dispatchable(
        [p1, p9, p5],
        make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
        set(),
    )
    assert _selected_numbers(decisions) == [1, 2, 3]  # priorities 1, 5, 9


def test_ordering_independent_of_input_order(make_issue, make_config):
    """Whatever order the caller supplies issues in, the dispatch order is the
    same — sorted purely by the policy, not by arrival."""
    base = [
        ("infra", 10, 2),
        ("realestate-crawler", 20, 8),
        ("SparkyFitness", 30, 5),
        ("health", 40, 1),
    ]
    allow = ["infra", "realestate-crawler", "SparkyFitness", "health"]
    config = make_config(allowlist=allow)
    expected = [40, 10, 30, 20]  # priorities 1,2,5,8 (most urgent first)

    for perm in itertools.permutations(base):
        issues = [make_issue(number=n, repo=r, priority=p) for (r, n, p) in perm]
        decisions = dispatch_policy.select_dispatchable(issues, config, set())
        assert _selected_numbers(decisions) == expected


def test_priority_ties_break_deterministically_by_issue_number(make_issue, make_config):
    """Equal priority across different repos -> a stable, total order. We tie-break
    on ascending issue number so the result never depends on dict/set iteration
    or input order."""
    a = make_issue(number=30, repo="infra", priority=5)
    b = make_issue(number=10, repo="realestate-crawler", priority=5)
    c = make_issue(number=20, repo="SparkyFitness", priority=5)
    config = make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"])

    for perm in itertools.permutations([a, b, c]):
        decisions = dispatch_policy.select_dispatchable(list(perm), config, set())
        assert _selected_numbers(decisions) == [10, 20, 30]


def test_negative_and_zero_priorities_order_correctly(make_issue, make_config):
    neg = make_issue(number=1, repo="infra", priority=-5)
    zero = make_issue(number=2, repo="realestate-crawler", priority=0)
    pos = make_issue(number=3, repo="SparkyFitness", priority=3)
    decisions = dispatch_policy.select_dispatchable(
        [neg, zero, pos],
        make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
        set(),
    )
    assert _selected_numbers(decisions) == [1, 2, 3]  # -5 < 0 < 3 (most urgent first)


# --------------------------------------------------------------------------- #
# Reasons — human-readable, never parsed, but must be present and sensible.
# --------------------------------------------------------------------------- #
def test_every_decision_has_a_nonempty_reason(make_issue, make_config):
    issues = [
        make_issue(number=1, repo="infra", priority=3),
        make_issue(number=2, repo="realestate-crawler", priority=1),
    ]
    decisions = dispatch_policy.select_dispatchable(
        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
    )
    assert decisions  # sanity
    assert all(d.reason.strip() for d in decisions)


# --------------------------------------------------------------------------- #
# Combined matrix — every gate together. A single eligible needle in a haystack
# of issues that each trip exactly one gate.
# --------------------------------------------------------------------------- #
def test_only_the_fully_eligible_issue_survives_all_gates(make_issue, make_config):
    config = make_config(allowlist=["infra", "realestate-crawler"], kill_switch=False)
    in_flight = {"realestate-crawler"}  # this repo is locked

    issues = [
        make_issue(number=1, repo="infra", priority=5),  # ELIGIBLE
        make_issue(number=2, repo="not-allowed", priority=9),  # allowlist
        make_issue(number=3, repo="infra", priority=9, labeled_by_trusted=False),  # trust
        make_issue(number=4, repo="infra", priority=9, blocked_by=[1]),  # blocked
        make_issue(number=5, repo="realestate-crawler", priority=9),  # repo locked
    ]
    decisions = dispatch_policy.select_dispatchable(issues, config, in_flight)
    assert _selected_numbers(decisions) == [1]
    assert decisions[0].issue.repo == "infra"


@pytest.mark.parametrize("trusted", [True, False])
@pytest.mark.parametrize("allowed", [True, False])
@pytest.mark.parametrize("blocked", [True, False])
@pytest.mark.parametrize("locked", [True, False])
@pytest.mark.parametrize("killed", [True, False])
def test_full_eligibility_matrix(
    make_issue, make_config, trusted, allowed, blocked, locked, killed
):
    """Exhaustive truth table: an issue is dispatched iff ALL gates pass and the
    kill switch is off. 2**5 = 32 cases, single issue so ordering is moot."""
    issue = make_issue(
        number=1,
        repo="infra",
        priority=0,
        labeled_by_trusted=trusted,
        blocked_by=[99] if blocked else [],
    )
    config = make_config(
        allowlist=["infra"] if allowed else ["other-repo"],
        kill_switch=killed,
    )
    in_flight = {"infra"} if locked else set()

    decisions = dispatch_policy.select_dispatchable([issue], config, in_flight)

    should_dispatch = trusted and allowed and not blocked and not locked and not killed
    assert (len(decisions) == 1) is should_dispatch
    if should_dispatch:
        assert decisions[0].issue is issue
|
||||
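The dispatch-policy tests above pin the selection rule down tightly enough that it can be read back as code. What follows is a minimal sketch of a `select_dispatchable` that would satisfy them. It is not the real `app.afk.dispatch_policy` implementation: `Issue`, `Config` and `DispatchDecision` are simplified hypothetical stand-ins, and the reason wording is invented. The ordering choice that matters is that every per-issue gate runs before any repo slot is claimed. That is what lets a ready issue take the slot ahead of a more urgent but blocked one.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    # Hypothetical stand-in for app.afk.types.Issue (only the fields the policy reads).
    number: int
    repo: str
    priority: int = 0  # lower value = more urgent
    labeled_by_trusted: bool = True
    blocked_by: list[int] = field(default_factory=list)  # OPEN blocker numbers


@dataclass
class Config:
    # Hypothetical config; defaults mirror the inert shipped posture the tests describe.
    allowlist: list[str] = field(default_factory=list)
    kill_switch: bool = True


@dataclass
class DispatchDecision:
    issue: Issue
    reason: str


def select_dispatchable(
    issues: list[Issue], config: Config, in_flight_repos: set[str]
) -> list[DispatchDecision]:
    # Kill switch: highest precedence, nothing else is looked at.
    if config.kill_switch:
        return []

    allowed = set(config.allowlist)
    # Apply every per-issue gate BEFORE any repo slot is claimed.
    eligible = [
        issue
        for issue in issues
        if issue.labeled_by_trusted
        and issue.repo in allowed
        and issue.repo not in in_flight_repos
        and not issue.blocked_by
    ]

    decisions: list[DispatchDecision] = []
    claimed: set[str] = set()
    # sorted() builds a new list, so the caller's list and its order stay untouched.
    for issue in sorted(eligible, key=lambda i: (i.priority, i.number)):
        if issue.repo in claimed:
            continue  # at most one agent per repo per call
        claimed.add(issue.repo)
        reason = f"{issue.repo}#{issue.number}: priority {issue.priority}, all gates passed"
        decisions.append(DispatchDecision(issue, reason))
    return decisions
```

Sorting on the `(priority, number)` tuple produces the total order the tie-break tests expect, so the result never depends on input order.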
198
tests/test_afk_notifier.py
Normal file
@ -0,0 +1,198 @@
"""Tests for ``app.afk.notifier`` — the terminal-state doorbell.
|
||||
|
||||
The notifier's whole job is to format a human-facing alert (Slack / ntfy) with a
|
||||
deep-link back to the T3 thread when a run reaches a terminal state — done,
|
||||
needs-human, or frozen — and hand it to an injected sender. Every test here
|
||||
injects a recording fake sender, so nothing is ever POSTed: we assert the
|
||||
*formatted payload* per kind, plus the deep-link, the kind vocabulary, and the
|
||||
guardrails (no thread → no link, unknown kind rejected, sender called exactly
|
||||
once with the return value being None).
|
||||
|
||||
No real Slack/ntfy/T3 is touched — consistent with the rest of the AFK suite.
|
||||
"""
|
||||
import pytest
|
||||
|
||||
from app.afk import notifier as notifier_mod
|
||||
from app.afk.notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN, Notification, Notifier
|
||||
from app.afk.types import Issue
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
# A recording sender — captures the Notification instead of posting it.
# --------------------------------------------------------------------------- #
class RecordingSender:
    """Injectable stand-in for the real Slack/ntfy POST. Records each payload so
    a test can assert the formatting without any network."""

    def __init__(self) -> None:
        self.sent: list[Notification] = []

    def __call__(self, notification: Notification) -> None:
        self.sent.append(notification)


@pytest.fixture
def sender() -> RecordingSender:
    return RecordingSender()


def _issue(number: int = 42, repo: str = "infra") -> Issue:
    return Issue(
        number=number,
        repo=repo,
        labels=["ready-for-agent"],
        blocked_by=[],
        labeled_by_trusted=True,
        priority=0,
    )


# --------------------------------------------------------------------------- #
# Kind vocabulary — the three terminal states, and nothing else.
# --------------------------------------------------------------------------- #
def test_terminal_kinds_are_exactly_the_three_terminal_states():
    assert KIND_DONE == "done"
    assert KIND_NEEDS_HUMAN == "needs-human"
    assert KIND_FROZEN == "frozen"
    assert notifier_mod.TERMINAL_KINDS == {KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN}


# --------------------------------------------------------------------------- #
# Dispatch mechanics — sender injected, called exactly once, returns None.
# --------------------------------------------------------------------------- #
def test_notify_calls_sender_exactly_once_and_returns_none(sender):
    n = Notifier(sender)
    result = n.notify(KIND_DONE, _issue(), "thread-7", "all green")
    assert result is None
    assert len(sender.sent) == 1


def test_notify_does_not_post_anything_itself(sender):
    """The Notifier must never reach the network on its own — all egress goes
    through the injected sender. This is checked indirectly: the recording
    sender must receive exactly one Notification."""
    n = Notifier(sender)
    n.notify(KIND_FROZEN, _issue(), "thread-1", "budget exhausted")
    # Nothing other than the injected sender ran: exactly one recorded payload,
    # and it is the Notification dataclass (not a raw dict / HTTP response).
    assert len(sender.sent) == 1
    assert isinstance(sender.sent[0], Notification)


# --------------------------------------------------------------------------- #
# Deep-link — every payload links back to the T3 thread (when there is one).
# --------------------------------------------------------------------------- #
def test_payload_deep_links_to_the_t3_thread(sender):
    n = Notifier(sender, base_url="https://t3.viktorbarzin.me")
    n.notify(KIND_DONE, _issue(), "thread-abc", "done")
    payload = sender.sent[0]
    assert payload.link == "https://t3.viktorbarzin.me/?thread=thread-abc"
    # The link is also surfaced in the human-readable body so it survives
    # senders that drop structured fields (e.g. a plain ntfy message).
    assert "https://t3.viktorbarzin.me/?thread=thread-abc" in payload.body


def test_base_url_trailing_slash_is_normalised(sender):
    n = Notifier(sender, base_url="https://t3.viktorbarzin.me/")
    n.notify(KIND_DONE, _issue(), "thread-x", "done")
    assert sender.sent[0].link == "https://t3.viktorbarzin.me/?thread=thread-x"


def test_no_thread_id_means_no_link(sender):
    """A run can reach 'needs-human' before any thread exists (e.g. dispatch
    itself failed). Without a thread there is nothing to deep-link to, so the
    link is None — but the doorbell still fires."""
    n = Notifier(sender)
    n.notify(KIND_NEEDS_HUMAN, _issue(), None, "dispatch failed")
    payload = sender.sent[0]
    assert payload.link is None
    assert len(sender.sent) == 1
    # No dangling "/?thread=" fragment leaks into the body either.
    assert "?thread=" not in payload.body


# --------------------------------------------------------------------------- #
# Per-kind formatting — title / body / priority / tags differ per terminal kind.
# --------------------------------------------------------------------------- #
def test_done_payload_is_informational(sender):
    n = Notifier(sender)
    n.notify(KIND_DONE, _issue(number=7, repo="infra"), "thread-7", "merged + CI green")
    p = sender.sent[0]
    assert p.kind == KIND_DONE
    assert p.issue_ref == "infra#7"
    assert "infra#7" in p.title
    assert "merged + CI green" in p.body
    # A successful close is informational, not an escalation.
    assert p.priority == "low"
    assert "escalation" not in p.tags


def test_needs_human_payload_is_an_escalation(sender):
    n = Notifier(sender)
    n.notify(KIND_NEEDS_HUMAN, _issue(number=9, repo="claude-agent-service"), "thread-9", "errored before push")
    p = sender.sent[0]
    assert p.kind == KIND_NEEDS_HUMAN
    assert p.issue_ref == "claude-agent-service#9"
    assert "claude-agent-service#9" in p.title
    assert "errored before push" in p.body
    assert p.priority == "high"
    assert "escalation" in p.tags


def test_frozen_payload_is_an_escalation(sender):
    n = Notifier(sender)
    n.notify(KIND_FROZEN, _issue(number=3, repo="infra"), "thread-3", "fix-forward budget exhausted")
    p = sender.sent[0]
    assert p.kind == KIND_FROZEN
    assert "infra#3" in p.title
    assert "fix-forward budget exhausted" in p.body
    assert p.priority == "high"
    assert "escalation" in p.tags


def test_titles_distinguish_the_three_kinds(sender):
    """An operator skimming a Slack channel must tell the three apart from the
    title alone, without reading the body."""
    n = Notifier(sender)
    n.notify(KIND_DONE, _issue(), "t", "x")
    n.notify(KIND_NEEDS_HUMAN, _issue(), "t", "x")
    n.notify(KIND_FROZEN, _issue(), "t", "x")
    titles = [p.title for p in sender.sent]
    assert len({t.split(" ")[0] for t in titles}) == 3  # distinct leading marker per kind


# --------------------------------------------------------------------------- #
# Guardrail — only terminal kinds are sendable. An unknown kind is a bug.
# --------------------------------------------------------------------------- #
def test_unknown_kind_raises_and_sends_nothing(sender):
    n = Notifier(sender)
    with pytest.raises(ValueError):
        n.notify("running", _issue(), "thread-1", "still working")
    assert sender.sent == []


# --------------------------------------------------------------------------- #
# Pure formatter — render_notification builds the payload independently of any
# sender, so the formatting is unit-testable on its own.
# --------------------------------------------------------------------------- #
def test_render_notification_is_pure_and_matches_notify(sender):
    issue = _issue(number=11, repo="infra")
    built = notifier_mod.render_notification(
        KIND_FROZEN, issue, "thread-11", "stuck", base_url="https://t3.viktorbarzin.me"
    )
    assert isinstance(built, Notification)
    assert built.link == "https://t3.viktorbarzin.me/?thread=thread-11"
    # notify() must produce the identical payload it hands the sender.
    Notifier(sender, base_url="https://t3.viktorbarzin.me").notify(
        KIND_FROZEN, issue, "thread-11", "stuck"
    )
    assert sender.sent[0] == built


def test_sender_exception_propagates(sender):
    """If the sender fails (Slack down), the notifier does not swallow it — the
    loop decides what to do with a failed doorbell, not this adapter."""
    def boom(_notification: Notification) -> None:
        raise RuntimeError("slack 503")

    n = Notifier(boom)
    with pytest.raises(RuntimeError, match="slack 503"):
        n.notify(KIND_DONE, _issue(), "thread-1", "done")
|
||||
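The phase-checklist tests in the next file pin `render` to exact snapshots, so its behaviour is easy to sketch. The version below is a hypothetical implementation built on a stand-in `Phase` enum, not the real `app.afk.phase_checklist` code. Two details are assumptions: the fallback header used when `repo` or `issue` is missing, and the trailing fix-forward line. The snapshots only cover calls that pass both `repo` and `issue` and no `fix_forward_attempts`.

```python
import enum


class Phase(enum.Enum):
    # Hypothetical stand-in for app.afk.types.Phase; members in lifecycle order.
    WORKTREE = "worktree"
    TESTS_RED = "tests-red"
    GREEN = "green"
    PUSHED = "pushed"
    CI = "ci"
    DEPLOYED = "deployed"
    DONE = "done"


# Human labels copied from the snapshot strings ("\u2014" is the dash they use).
_LABELS = {
    Phase.WORKTREE: "Worktree created",
    Phase.TESTS_RED: "Failing test written (TDD red)",
    Phase.GREEN: "Implementation passing (TDD green)",
    Phase.PUSHED: "Pushed to master",
    Phase.CI: "CI green on pushed commit",
    Phase.DEPLOYED: "Deployed / rolled out",
    Phase.DONE: "Done \u2014 issue closed",
}


def render(current: Phase, meta: dict) -> str:
    """Pure: builds the checklist from its arguments only and never mutates meta."""
    order = list(Phase)  # Enum iteration follows definition (lifecycle) order.
    idx = order.index(current)

    title = "AFK run progress"
    if "repo" in meta and "issue" in meta:
        title = f"{meta['repo']}#{meta['issue']} \u2014 {title}"
    if meta.get("thread_id"):
        title += f" (thread {meta['thread_id']})"

    lines = []
    for i, phase in enumerate(order):
        if i < idx or current is Phase.DONE:
            mark = "x"  # finished (at DONE every box is checked, DONE included)
        elif i == idx:
            mark = "~"  # in progress
        else:
            mark = " "  # not started
        lines.append(f"- [{mark}] {_LABELS[phase]}")

    out = f"### {title}\n\n" + "\n".join(lines) + "\n"
    # Assumed placement for the optional fix-forward counter; other keys are ignored.
    if "fix_forward_attempts" in meta:
        out += f"\nFix-forward attempts: {meta['fix_forward_attempts']}\n"
    return out
```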
247
tests/test_afk_phase_checklist.py
Normal file
@ -0,0 +1,247 @@
"""Tests for ``app.afk.phase_checklist`` — the live progress checklist.
|
||||
|
||||
``render(current, meta)`` is PURE: same inputs → byte-identical markdown, no I/O.
|
||||
It draws the seven-phase lifecycle (worktree → tests-red → green → pushed → CI →
|
||||
deployed → done) as a markdown task list, with phases *before* ``current`` checked
|
||||
off, ``current`` marked in-progress, and later phases left empty.
|
||||
|
||||
Style matches the existing suite: plain ``assert`` functions, parametrized cases,
|
||||
and a couple of full-output snapshots so the rendered shape is pinned, not just
|
||||
its line count.
|
||||
"""
|
||||
import pytest
|
||||
|
||||
from app.afk.phase_checklist import render
|
||||
from app.afk.types import Phase
|
||||
|
||||
|
||||
# Lifecycle order, mirrored from the contract so a reordering of the enum that
# the renderer didn't track shows up as a test failure rather than silent drift.
PHASES_IN_ORDER = [
    Phase.WORKTREE,
    Phase.TESTS_RED,
    Phase.GREEN,
    Phase.PUSHED,
    Phase.CI,
    Phase.DEPLOYED,
    Phase.DONE,
]


# --------------------------------------------------------------------------- #
# Structure: one line per phase, in order, always all seven.
# --------------------------------------------------------------------------- #
def _checklist_lines(out: str) -> list[str]:
    """The markdown task-list lines (``- [ ]`` / ``- [x]`` ...), in order."""
    return [ln for ln in out.splitlines() if ln.lstrip().startswith("- [")]


def test_renders_a_string():
    assert isinstance(render(Phase.WORKTREE, {}), str)


@pytest.mark.parametrize("current", PHASES_IN_ORDER)
def test_every_phase_has_exactly_one_checklist_line(current):
    lines = _checklist_lines(render(current, {}))
    assert len(lines) == len(PHASES_IN_ORDER)


@pytest.mark.parametrize("current", PHASES_IN_ORDER)
def test_checklist_lines_are_in_lifecycle_order(current):
    lines = _checklist_lines(render(current, {}))
    # Each phase's human label appears, and in the lifecycle order.
    positions = [
        next(i for i, ln in enumerate(lines) if _has_label(ln, phase))
        for phase in PHASES_IN_ORDER
    ]
    assert positions == sorted(positions)


def _has_label(line: str, phase: Phase) -> bool:
    """Whether a checklist line carries ``phase``'s headline word (case-insensitive
    substring — the test asserts the label is *present*, not its exact decoration)."""
    return _phase_label(phase).lower() in line.lower()


def _phase_label(phase: Phase) -> str:
    """The headline word(s) the renderer must use for a phase. Loose on purpose:
    the test asserts the label is *present*, not the exact decoration."""
    return {
        Phase.WORKTREE: "worktree",
        Phase.TESTS_RED: "test",
        Phase.GREEN: "green",
        Phase.PUSHED: "push",
        Phase.CI: "CI",
        Phase.DEPLOYED: "deploy",
        Phase.DONE: "done",
    }[phase]


# --------------------------------------------------------------------------- #
# Check/in-progress/empty partitioning around ``current``.
# --------------------------------------------------------------------------- #
def _classify(line: str) -> str:
    """Bucket a checklist line by its marker: 'done' ``[x]``, 'todo' ``[ ]``, or
    'active' (anything else, e.g. an in-progress glyph)."""
    body = line.lstrip()
    if body.startswith("- [x]"):
        return "done"
    if body.startswith("- [ ]"):
        return "todo"
    return "active"


@pytest.mark.parametrize("idx,current", list(enumerate(PHASES_IN_ORDER)))
def test_earlier_checked_current_active_later_empty(idx, current):
    lines = _checklist_lines(render(current, {}))
    buckets = [_classify(ln) for ln in lines]

    # Everything strictly before the current phase is checked off.
    assert all(b == "done" for b in buckets[:idx]), buckets

    if current is Phase.DONE:
        # Terminal phase: the whole list is checked, nothing left active/empty.
        assert all(b == "done" for b in buckets), buckets
    else:
        # The current phase is the single in-progress marker...
        assert buckets[idx] == "active", buckets
        assert buckets.count("active") == 1, buckets
        # ...and every phase after it is still an empty checkbox.
        assert all(b == "todo" for b in buckets[idx + 1 :]), buckets


def test_first_phase_has_nothing_checked_before_it():
    lines = _checklist_lines(render(Phase.WORKTREE, {}))
    assert _classify(lines[0]) == "active"
    assert "done" not in [_classify(ln) for ln in lines]


def test_done_checks_every_phase_including_done():
    lines = _checklist_lines(render(Phase.DONE, {}))
    assert all(_classify(ln) == "done" for ln in lines)
    # The DONE line itself is checked, not merely the ones before it.
    done_line = next(ln for ln in lines if _has_label(ln, Phase.DONE))
    assert _classify(done_line) == "done"


# --------------------------------------------------------------------------- #
# Active-phase emphasis: the current phase is visually distinguishable.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("current", [p for p in PHASES_IN_ORDER if p is not Phase.DONE])
def test_active_phase_line_differs_from_todo_and_done_markers(current):
    lines = _checklist_lines(render(current, {}))
    active = [ln for ln in lines if _classify(ln) == "active"]
    assert len(active) == 1
    # Not a plain checkbox in either state.
    assert not active[0].lstrip().startswith("- [x]")
    assert not active[0].lstrip().startswith("- [ ]")


# --------------------------------------------------------------------------- #
# meta rendering: optional context is surfaced, omission never explodes.
# --------------------------------------------------------------------------- #
def test_meta_empty_does_not_raise_and_still_lists_phases():
    out = render(Phase.GREEN, {})
    assert _checklist_lines(out)  # non-empty


def test_meta_issue_and_repo_appear_in_output():
    out = render(Phase.GREEN, {"repo": "infra", "issue": 42})
    assert "infra" in out
    assert "42" in out


def test_meta_thread_id_appears_when_present():
    out = render(Phase.PUSHED, {"thread_id": "thread-7"})
    assert "thread-7" in out


def test_meta_thread_id_absent_is_silent():
    out = render(Phase.PUSHED, {})
    assert "thread-" not in out


def test_meta_fix_forward_attempt_surfaced():
    out = render(Phase.CI, {"fix_forward_attempts": 3})
    assert "3" in out


def test_meta_unknown_keys_are_ignored():
    # An unexpected key must not crash or leak its raw value as a stray line.
    out = render(Phase.WORKTREE, {"totally_unknown_field": "should-not-appear"})
    assert "should-not-appear" not in out


# --------------------------------------------------------------------------- #
# Determinism + idempotence (it's pure).
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("current", PHASES_IN_ORDER)
def test_render_is_deterministic(current):
    meta = {"repo": "infra", "issue": 9, "thread_id": "thread-1"}
    assert render(current, meta) == render(current, meta)


def test_render_does_not_mutate_meta():
    meta = {"repo": "infra", "issue": 1}
    before = dict(meta)
    render(Phase.GREEN, meta)
    assert meta == before


# --------------------------------------------------------------------------- #
# Snapshots: pin the exact rendered shape for three representative phases. If the
# format changes intentionally, update these strings; an accidental change to
# wording/markers/order fails here loudly.
# --------------------------------------------------------------------------- #
WORKTREE_SNAPSHOT = """\
### infra#7 — AFK run progress

- [~] Worktree created
- [ ] Failing test written (TDD red)
- [ ] Implementation passing (TDD green)
- [ ] Pushed to master
- [ ] CI green on pushed commit
- [ ] Deployed / rolled out
- [ ] Done — issue closed
"""


def test_snapshot_worktree_phase():
    out = render(Phase.WORKTREE, {"repo": "infra", "issue": 7})
    assert out == WORKTREE_SNAPSHOT


CI_SNAPSHOT = """\
### infra#7 — AFK run progress (thread thread-3)

- [x] Worktree created
- [x] Failing test written (TDD red)
- [x] Implementation passing (TDD green)
- [x] Pushed to master
- [~] CI green on pushed commit
- [ ] Deployed / rolled out
- [ ] Done — issue closed
"""


def test_snapshot_ci_phase_with_thread():
    out = render(Phase.CI, {"repo": "infra", "issue": 7, "thread_id": "thread-3"})
    assert out == CI_SNAPSHOT


DONE_SNAPSHOT = """\
### infra#7 — AFK run progress

- [x] Worktree created
- [x] Failing test written (TDD red)
- [x] Implementation passing (TDD green)
- [x] Pushed to master
- [x] CI green on pushed commit
- [x] Deployed / rolled out
- [x] Done — issue closed
"""


def test_snapshot_done_phase():
    out = render(Phase.DONE, {"repo": "infra", "issue": 7})
    assert out == DONE_SNAPSHOT
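The snapshots above pin the checklist format closely enough to sketch a renderer that would satisfy them. This is a minimal sketch, not the module under test. `RED` and `DEPLOYED` are assumed member names for the two phases these tests never reference directly. Printing `fix_forward_attempts` as a trailing `Fix-forward attempts: N` line is also a hypothetical choice, since the tests only require that the number appears somewhere in the output.

```python
from enum import Enum


class Phase(Enum):
    # Values are the checklist labels from the snapshots. RED and DEPLOYED are
    # hypothetical member names; the tests above only use the other five.
    WORKTREE = "Worktree created"
    RED = "Failing test written (TDD red)"
    GREEN = "Implementation passing (TDD green)"
    PUSHED = "Pushed to master"
    CI = "CI green on pushed commit"
    DEPLOYED = "Deployed / rolled out"
    DONE = "Done \u2014 issue closed"


# Enum iteration follows definition order, so this is the run's phase order.
PHASES_IN_ORDER = list(Phase)


def render(current: Phase, meta: dict) -> str:
    """Pure: reads only known meta keys, never mutates meta, does no I/O."""
    title = "AFK run progress"
    if "repo" in meta and "issue" in meta:
        title = f"{meta['repo']}#{meta['issue']} \u2014 {title}"
    if meta.get("thread_id"):
        title += f" (thread {meta['thread_id']})"
    lines = [f"### {title}", ""]
    position = PHASES_IN_ORDER.index(current)
    for i, phase in enumerate(PHASES_IN_ORDER):
        if current is Phase.DONE or i < position:
            marker = "x"  # completed (DONE ticks every box, itself included)
        elif i == position:
            marker = "~"  # the active phase: neither ticked nor empty
        else:
            marker = " "  # not started yet
        lines.append(f"- [{marker}] {phase.value}")
    if "fix_forward_attempts" in meta:
        # Hypothetical placement for the optional attempts counter.
        lines += ["", f"Fix-forward attempts: {meta['fix_forward_attempts']}"]
    return "\n".join(lines) + "\n"
```

Because the function only looks up the keys it knows about, unknown meta keys are ignored automatically. The input dict is never written to, so both the determinism test and the no-mutation test hold by construction.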
270
tests/test_afk_poller.py
Normal file
@ -0,0 +1,270 @@
"""Integration tests for ``app.afk.poller`` — the CronJob dispatch tick.

Unlike the unit suites, these wire the REAL pure cores (the actual
``dispatch_policy.select_dispatchable``) to the in-memory adapter FAKES from
``conftest`` (``FakeTracker`` / ``FakeT3Client``). No test touches a real T3
server, GitHub/Forgejo, or the cluster — the poller is exercised end to end with
fakes standing in only for the I/O edges.

What the tick must do (the poller contract):

* **kill switch** — a disabled config dispatches nothing AND never calls the
  tracker or T3 (the CronJob does no I/O when the loop is off);
* read the ready set via ``tracker.list_ready(config.allowlist)``;
* derive the **per-repo lock** from the ready set itself — a repo with an issue
  already carrying the ``in_progress_label`` is in flight and is skipped (the
  CronJob is stateless between ticks, so the tracker is the source of truth);
* run the real ``select_dispatchable`` over (ready issues, config, in-flight
  repos) and, for each decision, ``t3_client.dispatch(...)`` then
  ``tracker.add_label(repo, issue, in_progress_label)`` — label AFTER a
  successful dispatch so a dispatch failure never leaves a phantom lock.
"""
import pytest

from app.afk import poller
from app.afk.types import Config


# --------------------------------------------------------------------------- #
# Helpers.
# --------------------------------------------------------------------------- #
def _poller(fake_tracker, fake_t3) -> poller.Poller:
    """A Poller wired to the conftest fakes and the real dispatch policy."""
    return poller.Poller(tracker=fake_tracker, t3_client=fake_t3)


def _dispatched_pairs(fake_t3) -> set[tuple[str, int]]:
    return {(d["repo"], d["issue"]) for d in fake_t3.dispatched}


def _added_in_progress(fake_tracker, label: str = "agent-in-progress") -> set[tuple[str, int]]:
    return {
        (repo, issue)
        for (op, repo, issue, lbl) in fake_tracker.label_ops
        if op == "add" and lbl == label
    }


# --------------------------------------------------------------------------- #
# Kill switch — no dispatch, no I/O at all.
# --------------------------------------------------------------------------- #
def test_kill_switch_dispatches_nothing(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=1, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=True)

    result = _poller(fake_tracker, fake_t3).run_once(config)

    assert result.dispatched == []
    assert fake_t3.dispatched == []


def test_kill_switch_does_not_even_read_the_tracker(fake_t3):
    """When the loop is off the CronJob must do zero I/O — not a single tracker
    or T3 call. A tracker that explodes if touched proves it."""
    class ExplodingTracker:
        def list_ready(self, repos):
            raise AssertionError("tracker must not be read when kill switch is on")

    config = Config(allowlist=["infra"], kill_switch=True)
    result = poller.Poller(tracker=ExplodingTracker(), t3_client=fake_t3).run_once(config)
    assert result.dispatched == []


# --------------------------------------------------------------------------- #
# Empty allowlist — loop enabled (kill switch off) but nothing to run.
# --------------------------------------------------------------------------- #
def test_empty_allowlist_dispatches_nothing(fake_tracker, fake_t3, make_issue):
    # list_ready([]) returns nothing, and even if it didn't, the policy gates on
    # the (empty) allowlist. This is the shipped default posture.
    config = Config(allowlist=[], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []
    assert fake_t3.dispatched == []


# --------------------------------------------------------------------------- #
# Happy path — one ready issue gets dispatched and labelled.
# --------------------------------------------------------------------------- #
def test_dispatches_a_ready_issue(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)

    result = _poller(fake_tracker, fake_t3).run_once(config)

    assert _dispatched_pairs(fake_t3) == {("infra", 7)}
    assert len(result.dispatched) == 1
    assert result.dispatched[0].thread_id == "thread-0"
    assert result.dispatched[0].issue.number == 7


def test_labels_in_progress_after_dispatch(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)

    _poller(fake_tracker, fake_t3).run_once(config)

    assert _added_in_progress(fake_tracker) == {("infra", 7)}


def test_in_progress_label_honours_config_override(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False, in_progress_label="busy")

    _poller(fake_tracker, fake_t3).run_once(config)

    assert _added_in_progress(fake_tracker, "busy") == {("infra", 7)}


def test_dispatch_prompt_references_the_issue(fake_tracker, fake_t3, make_issue):
    """The agent runs full-access and fetches the body itself, so the prompt the
    poller sends must at minimum point at the concrete repo#issue."""
    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)

    _poller(fake_tracker, fake_t3).run_once(config)

    prompt = fake_t3.dispatched[0]["prompt"]
    assert "7" in prompt and "infra" in prompt
    assert prompt.strip()  # non-empty


# --------------------------------------------------------------------------- #
# Per-repo lock — an issue already carrying the in-progress label means an agent
# is in flight on that repo, so the repo is skipped this tick.
# --------------------------------------------------------------------------- #
def test_repo_with_in_progress_issue_is_locked(fake_tracker, fake_t3, make_issue):
    in_flight = make_issue(
        number=1, repo="infra", labels=["ready-for-agent", "agent-in-progress"]
    )
    waiting = make_issue(number=2, repo="infra", labels=["ready-for-agent"])
    fake_tracker.seed("infra", [in_flight, waiting])
    config = Config(allowlist=["infra"], kill_switch=False)

    result = _poller(fake_tracker, fake_t3).run_once(config)

    # Repo already busy → nothing new dispatched, no new in-progress label.
    assert result.dispatched == []
    assert fake_t3.dispatched == []
    assert _added_in_progress(fake_tracker) == set()


def test_lock_is_per_repo_not_global(fake_tracker, fake_t3, make_issue):
    # infra is busy; a different repo is free and should still dispatch.
    fake_tracker.seed(
        "infra",
        [make_issue(number=1, repo="infra", labels=["ready-for-agent", "agent-in-progress"])],
    )
    fake_tracker.seed("dotfiles", [make_issue(number=2, repo="dotfiles")])
    config = Config(allowlist=["infra", "dotfiles"], kill_switch=False)

    result = _poller(fake_tracker, fake_t3).run_once(config)

    assert _dispatched_pairs(fake_t3) == {("dotfiles", 2)}
    assert {d.issue.repo for d in result.dispatched} == {"dotfiles"}


def test_custom_in_progress_label_drives_the_lock(fake_tracker, fake_t3, make_issue):
    # The lock keys off config.in_progress_label, not the hardcoded default.
    fake_tracker.seed(
        "infra",
        [make_issue(number=1, repo="infra", labels=["ready-for-agent", "busy"])],
    )
    config = Config(allowlist=["infra"], kill_switch=False, in_progress_label="busy")
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []


# --------------------------------------------------------------------------- #
# One dispatch per repo per tick (the policy's one-agent-per-repo invariant,
# observed through the poller): the most urgent (lowest-value) eligible issue
# wins the slot.
# --------------------------------------------------------------------------- #
def test_one_dispatch_per_repo_per_tick(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed(
        "infra",
        [
            make_issue(number=1, repo="infra", priority=1),  # most urgent (lowest value)
            make_issue(number=2, repo="infra", priority=9),
            make_issue(number=3, repo="infra", priority=5),
        ],
    )
    config = Config(allowlist=["infra"], kill_switch=False)

    _poller(fake_tracker, fake_t3).run_once(config)

    assert _dispatched_pairs(fake_t3) == {("infra", 1)}
    assert _added_in_progress(fake_tracker) == {("infra", 1)}


# --------------------------------------------------------------------------- #
# Gating still applies through the poller (the pure policy enforces it; the
# poller must not bypass it).
# --------------------------------------------------------------------------- #
def test_untrusted_issue_is_not_dispatched(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed(
        "infra", [make_issue(number=1, repo="infra", labeled_by_trusted=False)]
    )
    config = Config(allowlist=["infra"], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []
    assert fake_t3.dispatched == []


def test_blocked_issue_is_not_dispatched(fake_tracker, fake_t3, make_issue):
    fake_tracker.seed(
        "infra", [make_issue(number=2, repo="infra", blocked_by=[1])]
    )
    config = Config(allowlist=["infra"], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []


def test_repo_outside_allowlist_is_not_dispatched(fake_tracker, fake_t3, make_issue):
    # list_ready only queries the allowlist, but even if a stray repo's issues
    # arrive, the policy's allowlist gate drops them.
    fake_tracker.seed("secret", [make_issue(number=1, repo="secret")])
    config = Config(allowlist=["infra"], kill_switch=False)
    result = _poller(fake_tracker, fake_t3).run_once(config)
    assert result.dispatched == []


# --------------------------------------------------------------------------- #
# Dispatch failure must not leave a phantom lock (label only AFTER success).
# --------------------------------------------------------------------------- #
def test_dispatch_failure_does_not_label_in_progress(fake_tracker, make_issue):
    class FailingT3:
        def __init__(self):
            self.dispatched = []

        def dispatch(self, repo, issue, prompt):
            raise RuntimeError("T3 down")

    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
    config = Config(allowlist=["infra"], kill_switch=False)

    with pytest.raises(RuntimeError):
        poller.Poller(tracker=fake_tracker, t3_client=FailingT3()).run_once(config)

    # No in-progress label was applied — the issue stays purely ready, so the
    # next tick retries it rather than treating it as locked.
    assert _added_in_progress(fake_tracker) == set()


# --------------------------------------------------------------------------- #
# list_ready is called with exactly the allowlist (not all repos).
# --------------------------------------------------------------------------- #
def test_queries_only_the_allowlisted_repos(fake_t3, make_issue):
    seen_repos: list[list[str]] = []

    class RecordingTracker:
        def list_ready(self, repos):
            seen_repos.append(list(repos))
            return []

        def add_label(self, *a):  # pragma: no cover - not reached here
            raise AssertionError("nothing to label")

    config = Config(allowlist=["infra", "dotfiles"], kill_switch=False)
    poller.Poller(tracker=RecordingTracker(), t3_client=fake_t3).run_once(config)

    assert seen_repos == [["infra", "dotfiles"]]
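The poller contract in the module docstring can be read as a short loop. The code below is a minimal sketch of a tick that would satisfy these tests; it is not `app.afk.poller`. Several pieces are assumptions. `select_dispatchable` here is a simplified stand-in for the real dispatch policy. The `Issue`, `Dispatched` and `TickResult` shapes, the `priority` default and the prompt wording are invented for illustration. `DemoTracker` and `DemoT3` are hypothetical fakes used only for the usage demo.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    repo: str
    number: int
    labels: list[str] = field(default_factory=lambda: ["ready-for-agent"])
    priority: int = 5  # lower value = more urgent (assumed default)
    labeled_by_trusted: bool = True
    blocked_by: list[int] = field(default_factory=list)


@dataclass
class Config:
    allowlist: list[str]
    kill_switch: bool
    in_progress_label: str = "agent-in-progress"


@dataclass
class Dispatched:
    issue: Issue
    thread_id: str


@dataclass
class TickResult:
    dispatched: list[Dispatched] = field(default_factory=list)


def select_dispatchable(issues, config, in_flight):
    """Simplified stand-in for the real policy: allowlisted, trusted, unblocked
    issues on free repos, at most one per repo, lowest priority value wins."""
    best: dict[str, Issue] = {}
    for issue in issues:
        if issue.repo not in config.allowlist or issue.repo in in_flight:
            continue
        if not issue.labeled_by_trusted or issue.blocked_by:
            continue
        held = best.get(issue.repo)
        if held is None or issue.priority < held.priority:
            best[issue.repo] = issue
    return list(best.values())


class Poller:
    def __init__(self, tracker, t3_client):
        self.tracker = tracker
        self.t3_client = t3_client

    def run_once(self, config: Config) -> TickResult:
        result = TickResult()
        if config.kill_switch:
            return result  # loop off: zero tracker or T3 I/O
        ready = self.tracker.list_ready(config.allowlist)
        # Stateless per-repo lock derived from the tracker's own labels.
        in_flight = {i.repo for i in ready if config.in_progress_label in i.labels}
        for issue in select_dispatchable(ready, config, in_flight):
            prompt = f"Implement {issue.repo}#{issue.number}."
            thread_id = self.t3_client.dispatch(
                repo=issue.repo, issue=issue.number, prompt=prompt
            )
            # Label only after dispatch returned: a raise above leaves no phantom lock.
            self.tracker.add_label(issue.repo, issue.number, config.in_progress_label)
            result.dispatched.append(Dispatched(issue, thread_id))
        return result


class DemoTracker:  # hypothetical in-memory fake for the demo
    def __init__(self, issues):
        self.issues, self.labels = issues, []

    def list_ready(self, repos):
        return [i for i in self.issues if i.repo in repos]

    def add_label(self, repo, issue, label):
        self.labels.append((repo, issue, label))


class DemoT3:  # hypothetical in-memory fake for the demo
    def __init__(self):
        self.calls = []

    def dispatch(self, repo, issue, prompt):
        self.calls.append((repo, issue, prompt))
        return f"thread-{len(self.calls) - 1}"
```

The ordering inside the loop is what the failure test pins down. `add_label` runs only after `dispatch` returns, so if T3 raises, the exception propagates while the issue is still unlabelled, and the next tick retries it.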
190
tests/test_afk_run_state_machine.py
Normal file
@ -0,0 +1,190 @@
"""Tests for ``app.afk.run_state_machine.next_action`` — the pure decision
function that turns one assembled ``RunState`` into the next ``Action``.

The function encodes ADR-0002's run lifecycle:

* healthy (pushed AND CI green)        -> CLOSE_SUCCESS
* cannot reach green before push (errored /
  stalled with nothing pushed)         -> ESCALATE_PREPUSH
* pushed but CI red, budget remaining  -> FIX_FORWARD
* pushed but CI red, budget exhausted  -> FREEZE_ESCALATE
* anything still in flight             -> WAIT

It is PURE: no I/O, no clock, no globals — it reads only its two arguments, so
every case is a plain table assertion. ``make_config`` / ``make_run_state`` come
from ``conftest.py`` (config defaults to ENABLED, run state to a fresh dispatch).
"""
import pytest

from app.afk.run_state_machine import next_action
from app.afk.types import Action, CIStatus, ThreadStatus


# --------------------------------------------------------------------------- #
# Healthy terminal: pushed + CI green -> close, regardless of thread status.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
    "thread_status",
    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
def test_pushed_and_green_closes_success(make_config, make_run_state, thread_status):
    state = make_run_state(
        thread_status=thread_status, ci_status=CIStatus.GREEN, pushed=True
    )
    assert next_action(state, make_config()) is Action.CLOSE_SUCCESS


# --------------------------------------------------------------------------- #
# Pre-push escalation: nothing pushed and the turn is no longer going to push
# (errored, or finished/stalled clean) -> hand back to a human.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("thread_status", [ThreadStatus.ERROR, ThreadStatus.IDLE])
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
def test_not_pushed_terminal_thread_escalates_prepush(
    make_config, make_run_state, thread_status, ci_status
):
    state = make_run_state(
        thread_status=thread_status, ci_status=ci_status, pushed=False
    )
    assert next_action(state, make_config()) is Action.ESCALATE_PREPUSH


# --------------------------------------------------------------------------- #
# Still working toward a first push -> WAIT (not yet an escalation).
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("thread_status", [ThreadStatus.RUNNING, None])
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
def test_not_pushed_in_flight_waits(
    make_config, make_run_state, thread_status, ci_status
):
    state = make_run_state(
        thread_status=thread_status, ci_status=ci_status, pushed=False
    )
    assert next_action(state, make_config()) is Action.WAIT


# --------------------------------------------------------------------------- #
# Pushed, CI not yet decided -> WAIT for the verdict, whatever the thread does.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
    "thread_status",
    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
def test_pushed_ci_pending_waits(
    make_config, make_run_state, thread_status, ci_status
):
    state = make_run_state(
        thread_status=thread_status, ci_status=ci_status, pushed=True
    )
    assert next_action(state, make_config()) is Action.WAIT


# --------------------------------------------------------------------------- #
# Pushed + CI red: fix-forward while BOTH budgets remain, else freeze.
# Boundaries are strict-less-than on attempts AND elapsed; at/over either freezes.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
    ("attempts", "elapsed", "expected"),
    [
        # fresh red, plenty of budget -> fix forward
        (0, 0.0, Action.FIX_FORWARD),
        (1, 10.0, Action.FIX_FORWARD),
        # one attempt below the cap and one second below the clock -> still fix forward
        (4, 3599.0, Action.FIX_FORWARD),
        # attempts hit the cap (5) -> freeze
        (5, 0.0, Action.FREEZE_ESCALATE),
        (6, 0.0, Action.FREEZE_ESCALATE),
        # clock hits the cap (3600s) -> freeze even with attempts to spare
        (0, 3600.0, Action.FREEZE_ESCALATE),
        (0, 7200.0, Action.FREEZE_ESCALATE),
        # both exhausted -> freeze
        (5, 3600.0, Action.FREEZE_ESCALATE),
    ],
)
def test_pushed_red_fix_forward_until_budget_exhausted(
    make_config, make_run_state, attempts, elapsed, expected
):
    state = make_run_state(
        thread_status=ThreadStatus.IDLE,
        ci_status=CIStatus.RED,
        pushed=True,
        fix_forward_attempts=attempts,
        elapsed_seconds=elapsed,
    )
    assert next_action(state, make_config()) is expected


# --------------------------------------------------------------------------- #
# Fix-forward budget is honoured from config, not hardcoded.
# --------------------------------------------------------------------------- #
def test_fix_forward_attempts_cap_comes_from_config(make_config, make_run_state):
    config = make_config(fix_forward_max_attempts=2)
    red = dict(thread_status=ThreadStatus.IDLE, ci_status=CIStatus.RED, pushed=True)
    assert next_action(make_run_state(fix_forward_attempts=1, **red), config) is Action.FIX_FORWARD
    assert next_action(make_run_state(fix_forward_attempts=2, **red), config) is Action.FREEZE_ESCALATE


def test_fix_forward_seconds_cap_comes_from_config(make_config, make_run_state):
    config = make_config(fix_forward_max_seconds=120)
    red = dict(thread_status=ThreadStatus.IDLE, ci_status=CIStatus.RED, pushed=True)
    assert next_action(make_run_state(elapsed_seconds=119.0, **red), config) is Action.FIX_FORWARD
    assert next_action(make_run_state(elapsed_seconds=120.0, **red), config) is Action.FREEZE_ESCALATE


# --------------------------------------------------------------------------- #
# Per spec, a pushed commit with red CI is decided only by (pushed AND red) plus
# the remaining budget; thread status does not gate it, even while RUNNING a fix.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize(
    "thread_status",
    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
def test_pushed_red_with_budget_fixes_forward_for_any_thread_status(
    make_config, make_run_state, thread_status
):
    state = make_run_state(
        thread_status=thread_status,
        ci_status=CIStatus.RED,
        pushed=True,
        fix_forward_attempts=0,
        elapsed_seconds=0.0,
    )
    assert next_action(state, make_config()) is Action.FIX_FORWARD


# --------------------------------------------------------------------------- #
# Full cross-product sanity sweep: next_action is TOTAL — it returns a real
# Action for every reachable combination, and matches the reference table.
# --------------------------------------------------------------------------- #
def _expected(thread_status, ci_status, pushed):
    """Reference implementation of the decision table, written independently of
    the module under test, to cross-check every combination."""
    if pushed and ci_status is CIStatus.GREEN:
        return Action.CLOSE_SUCCESS
    if pushed and ci_status is CIStatus.RED:
        return Action.FIX_FORWARD  # budget always available in this sweep
    if not pushed and thread_status in (ThreadStatus.ERROR, ThreadStatus.IDLE):
        return Action.ESCALATE_PREPUSH
    return Action.WAIT


@pytest.mark.parametrize(
    "thread_status",
    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
)
@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING, CIStatus.GREEN, CIStatus.RED])
@pytest.mark.parametrize("pushed", [True, False])
def test_decision_table_is_total(
    make_config, make_run_state, thread_status, ci_status, pushed
):
    state = make_run_state(
        thread_status=thread_status,
        ci_status=ci_status,
        pushed=pushed,
        fix_forward_attempts=0,
        elapsed_seconds=0.0,
    )
    result = next_action(state, make_config())
    assert isinstance(result, Action)
    assert result is _expected(thread_status, ci_status, pushed)
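The decision table these tests encode fits in a short pure function. This is a sketch, not `app.afk.run_state_machine`. The enum members and the config and state field names follow the tests. The defaults of 5 attempts and 3600 seconds are inferred from the boundary rows of the fix-forward table and are assumptions about what `make_config` supplies. The `RunState` and `Config` dataclasses are hypothetical stand-ins for the real types.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Action(Enum):
    CLOSE_SUCCESS = auto()
    ESCALATE_PREPUSH = auto()
    FIX_FORWARD = auto()
    FREEZE_ESCALATE = auto()
    WAIT = auto()


class CIStatus(Enum):
    PENDING = auto()
    GREEN = auto()
    RED = auto()


class ThreadStatus(Enum):
    RUNNING = auto()
    IDLE = auto()
    ERROR = auto()


@dataclass(frozen=True)
class Config:
    fix_forward_max_attempts: int = 5  # assumed default, inferred from the tests
    fix_forward_max_seconds: float = 3600.0  # assumed default, inferred from the tests


@dataclass(frozen=True)
class RunState:
    thread_status: Optional[ThreadStatus] = None
    ci_status: Optional[CIStatus] = None
    pushed: bool = False
    fix_forward_attempts: int = 0
    elapsed_seconds: float = 0.0


def next_action(state: RunState, config: Config) -> Action:
    # Healthy terminal wins first, whatever the thread is doing.
    if state.pushed and state.ci_status is CIStatus.GREEN:
        return Action.CLOSE_SUCCESS
    # Pushed + red: fix forward only while BOTH budgets are strictly unspent.
    if state.pushed and state.ci_status is CIStatus.RED:
        within_budget = (
            state.fix_forward_attempts < config.fix_forward_max_attempts
            and state.elapsed_seconds < config.fix_forward_max_seconds
        )
        return Action.FIX_FORWARD if within_budget else Action.FREEZE_ESCALATE
    # Nothing pushed and the thread has stopped (errored or idle): escalate.
    if not state.pushed and state.thread_status in (ThreadStatus.ERROR, ThreadStatus.IDLE):
        return Action.ESCALATE_PREPUSH
    return Action.WAIT
```

Checking the two pushed branches before the pre-push rule means a pushed run can never take the pre-push escalation path. The strict `<` comparisons produce the freeze exactly at the caps, as the boundary rows require.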
265
tests/test_afk_t3_client.py
Normal file
@ -0,0 +1,265 @@
"""Tests for ``app.afk.t3_client`` — the in-cluster T3 dispatch/snapshot adapter.

Everything runs against an in-memory FAKE HTTP transport; no test touches a real
T3 server. These assertions pin the **real** orchestration wire contract
(reverse-engineered from T3 v0.0.27 and verified live against t3-afk on
2026-06-15) — deliberately strict, because the previous version of this adapter
passed a laxer fake while the real server rejected its commands with HTTP 400.
The fake therefore *rejects* a command without a ``type`` discriminator, so a
regression to the old ``{"command": "..."}`` shape fails loudly here.

Pinned facts:
* the dispatch body is a BARE command keyed by ``type`` (not ``command``);
* the CLIENT mints ``threadId``/``commandId``/``messageId`` + ``createdAt``;
  ``dispatch`` returns the id it generated (the server replies ``{sequence}``);
* a thread lives in a project, so ``dispatch`` ensures the repo's project
  (snapshot GET → ``project.create`` iff absent) before ``thread.create``;
* ``ISSUE_IMPLEMENTER_PREAMBLE`` is prepended to the opening turn's text;
* ``send_turn`` posts a follow-up turn (no preamble) on an existing thread;
* every request carries ``Authorization: Bearer <token>``, re-read per call.
"""
import pytest

from app.afk import t3_client
from app.afk.issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE

_MODEL = "claude-sonnet-4-6"


# --------------------------------------------------------------------------- #
# Fake HTTP transport — httpx-shaped, but it ENFORCES the command envelope so a
# malformed command (the old bug) raises instead of silently passing.
# --------------------------------------------------------------------------- #
class FakeResponse:
    def __init__(self, payload: dict, status_code: int = 200) -> None:
        self._payload = payload
        self.status_code = status_code

    def json(self) -> dict:
        return self._payload

    def raise_for_status(self) -> None:
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")


class FakeHttp:
    """Records each POST/GET; GETs replay staged snapshots (default: no projects,
    so ``dispatch`` creates one). POST bodies are validated as real commands."""

    def __init__(self, get_responses: list[dict] | None = None) -> None:
        self.get_responses = list(get_responses or [])
        self.posts: list[dict] = []
        self.gets: list[dict] = []

    def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
        assert isinstance(json.get("type"), str) and json["type"], (
            f"command must carry a non-empty `type` discriminator, got {json!r}"
        )
        self.posts.append({"url": url, "json": json, "headers": headers})
        return FakeResponse({"sequence": len(self.posts)})  # the real server reply

    def get(self, url: str, headers: dict) -> FakeResponse:
        self.gets.append({"url": url, "headers": headers})
        body = self.get_responses.pop(0) if self.get_responses else {"projects": []}
        return FakeResponse(body)

    # Convenience views over recorded POSTs, keyed by command type.
    def commands(self, type_: str) -> list[dict]:
        return [c["json"] for c in self.posts if c["json"]["type"] == type_]


def _ids():
    """Deterministic id factory: id-1, id-2, … so tests can reason about minting."""
    n = {"i": 0}

    def f() -> str:
        n["i"] += 1
        return f"id-{n['i']}"

    return f


def _resolver(repo: str) -> t3_client.ProjectRef:
    """Predictable repo -> project mapping for assertions."""
    return t3_client.ProjectRef(f"proj-{repo}", f"/data/{repo}", repo)


def _client(http: FakeHttp, *, base_url="http://t3-afk:8080", token="tok-1", **kw):
    return t3_client.T3Client(
        base_url=base_url,
        http=http,
        bearer_provider=lambda: token,
        project_resolver=_resolver,
        id_factory=kw.pop("id_factory", _ids()),
        clock=kw.pop("clock", lambda: "2026-06-15T00:00:00+00:00"),
        model=_MODEL,
    )


def _dispatch(http: FakeHttp, *, repo="infra", issue=42, prompt="Do the thing.", **kw):
    return _client(http, **kw).dispatch(repo=repo, issue=issue, prompt=prompt)


# --------------------------------------------------------------------------- #
# dispatch — ensure-project, then create, then turn.
# --------------------------------------------------------------------------- #
def test_dispatch_ensures_project_then_creates_thread_then_turn_when_project_absent():
    http = FakeHttp(get_responses=[{"projects": []}])
    _dispatch(http)
    # one snapshot GET (the existence check) + three POSTs in order.
    assert len(http.gets) == 1
    types = [c["json"]["type"] for c in http.posts]
    assert types == ["project.create", "thread.create", "thread.turn.start"]
    for call in http.posts:
        assert call["url"] == "http://t3-afk:8080/api/orchestration/dispatch"


def test_dispatch_skips_project_create_when_project_already_exists():
    http = FakeHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
    _dispatch(http, repo="infra")
    types = [c["json"]["type"] for c in http.posts]
    assert types == ["thread.create", "thread.turn.start"]  # idempotent: no re-create


def test_dispatch_uses_type_discriminator_not_command_string():
    # Regression guard for the original bug: discriminator is `type`, and there is
    # no legacy top-level `command` string key on any command.
    http = FakeHttp()
    _dispatch(http)
    for c in http.posts:
        assert "type" in c["json"]
        assert not isinstance(c["json"].get("command"), str)


# --------------------------------------------------------------------------- #
# dispatch — thread.create real field set.
# --------------------------------------------------------------------------- #
def test_thread_create_carries_real_required_fields():
    http = FakeHttp()
    _dispatch(http, repo="infra")
    create = http.commands("thread.create")[0]
    assert create["projectId"] == "proj-infra"
    assert create["modelSelection"] == {"instanceId": "claudeAgent", "model": _MODEL}
    assert create["runtimeMode"] == "full-access"
    assert create["interactionMode"] == "default"
    # NullOr fields are present (not omitted) — the schema requires the keys.
    assert create["branch"] is None
    assert create["worktreePath"] is None
    # client-minted identity + timestamp.
    assert isinstance(create["commandId"], str) and create["commandId"]
    assert isinstance(create["threadId"], str) and create["threadId"]
    assert create["createdAt"] == "2026-06-15T00:00:00+00:00"


def test_dispatch_returns_client_minted_thread_id_not_a_server_value():
    http = FakeHttp()
    returned = _dispatch(http)
    create = http.commands("thread.create")[0]
    turn = http.commands("thread.turn.start")[0]
    # The returned id is the one WE put on thread.create (server only sends {sequence}).
    assert returned == create["threadId"] == turn["threadId"]


# --------------------------------------------------------------------------- #
|
||||
# dispatch — thread.turn.start real message shape + preamble.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_turn_message_has_real_shape_and_prepends_preamble():
|
||||
http = FakeHttp()
|
||||
_dispatch(http, prompt="Implement issue 42 body here.")
|
||||
turn = http.commands("thread.turn.start")[0]
|
||||
msg = turn["message"]
|
||||
assert msg["role"] == "user"
|
||||
assert isinstance(msg["messageId"], str) and msg["messageId"]
|
||||
assert msg["attachments"] == []
|
||||
assert msg["text"] == ISSUE_IMPLEMENTER_PREAMBLE + "Implement issue 42 body here."
|
||||
assert turn["runtimeMode"] == "full-access"
|
||||
assert turn["interactionMode"] == "default"
|
||||
|
||||
|
||||
def test_preamble_only_on_turn_not_on_create():
|
||||
http = FakeHttp()
|
||||
_dispatch(http)
|
||||
assert "message" not in http.commands("thread.create")[0]
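The dispatch tests above pin the exact payloads of the two commands. Below is a minimal sketch of how those payloads could be assembled. It uses only the fields the tests assert; the real schema may require more. `build_thread_create`, `build_turn_start` and the `new_id` factory are hypothetical names, not the actual `t3_client` API.

```python
from typing import Callable

IdFactory = Callable[[], str]


def build_thread_create(project_id: str, model: str, created_at: str,
                        new_id: IdFactory) -> dict:
    """A thread.create command: ids are minted client-side, and the NullOr
    fields are sent as explicit None rather than omitted."""
    return {
        "type": "thread.create",  # the discriminator, not a "command" string
        "commandId": new_id(),
        "threadId": new_id(),  # this client-minted id is what dispatch returns
        "projectId": project_id,
        "modelSelection": {"instanceId": "claudeAgent", "model": model},
        "runtimeMode": "full-access",
        "interactionMode": "default",
        "branch": None,
        "worktreePath": None,
        "createdAt": created_at,
    }


def build_turn_start(thread_id: str, prompt: str, new_id: IdFactory,
                     preamble: str = "") -> dict:
    """A thread.turn.start command. Pass a preamble only for the first turn;
    follow-up turns leave it empty so the prompt is sent verbatim."""
    return {
        "type": "thread.turn.start",
        "commandId": new_id(),
        "threadId": thread_id,
        "message": {
            "messageId": new_id(),
            "role": "user",
            "text": preamble + prompt,
            "attachments": [],
        },
        "runtimeMode": "full-access",
        "interactionMode": "default",
    }
```

Minting `threadId` on the client lets dispatch return an id immediately. Per the test comment, the server only replies with a sequence number.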


# --------------------------------------------------------------------------- #
# send_turn — follow-up turn on an existing thread (multi-turn), no preamble.
# --------------------------------------------------------------------------- #
def test_send_turn_posts_single_turn_to_existing_thread_without_preamble():
    http = FakeHttp()
    _client(http).send_turn("thread-xyz", "Just this follow-up.")
    assert [c["json"]["type"] for c in http.posts] == ["thread.turn.start"]
    turn = http.commands("thread.turn.start")[0]
    assert turn["threadId"] == "thread-xyz"
    assert turn["message"]["text"] == "Just this follow-up."  # verbatim, no preamble
    assert http.gets == []  # no project work for a follow-up


# --------------------------------------------------------------------------- #
# Auth — bearer on every request, re-read per call.
# --------------------------------------------------------------------------- #
def test_every_request_sends_bearer():
    http = FakeHttp()
    _dispatch(http, token="secret-token")
    for call in http.posts:
        assert call["headers"]["Authorization"] == "Bearer secret-token"
    for call in http.gets:
        assert call["headers"]["Authorization"] == "Bearer secret-token"


def test_bearer_is_reread_per_request_so_rotation_is_honoured():
    tokens = iter(["tok-A", "tok-B", "tok-C", "tok-D", "tok-E"])
    http = FakeHttp()
    client = t3_client.T3Client(
        base_url="http://t3-afk:8080",
        http=http,
        bearer_provider=lambda: next(tokens),
        project_resolver=_resolver,
        id_factory=_ids(),
        clock=lambda: "t",
    )
    client.dispatch(repo="infra", issue=1, prompt="x")
    # GET(ensure) then POST(project.create) then POST(create) then POST(turn) —
    # each pulled a fresh token in call order.
    assert http.gets[0]["headers"]["Authorization"] == "Bearer tok-A"
    assert http.posts[0]["headers"]["Authorization"] == "Bearer tok-B"
    assert http.posts[1]["headers"]["Authorization"] == "Bearer tok-C"
    assert http.posts[2]["headers"]["Authorization"] == "Bearer tok-D"
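The rotation test above only passes if the client asks `bearer_provider` for a token on each request, not once at construction. Below is a minimal sketch of that pattern; `BearerHeaders` is a hypothetical helper name.

```python
from typing import Callable


class BearerHeaders:
    """Builds the Authorization header by asking the provider on every call."""

    def __init__(self, bearer_provider: Callable[[], str]) -> None:
        self._provider = bearer_provider

    def __call__(self) -> dict:
        # No caching: a rotated token is used on the very next request.
        return {"Authorization": f"Bearer {self._provider()}"}
```

In a pod, the provider could re-read a mounted secret file on each call. A token rotated on disk would then be honoured without restarting the process. The file path is deployment-specific and is not shown in the source.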


# --------------------------------------------------------------------------- #
# snapshot — GET + parse.
# --------------------------------------------------------------------------- #
def test_snapshot_gets_endpoint_and_returns_parsed_body():
    fleet = {"threads": [{"id": "t1", "latestTurn": {"state": "running"}}], "projects": []}
    http = FakeHttp(get_responses=[fleet])
    result = _client(http).snapshot()
    assert result == fleet
    assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"
    assert http.posts == []


# --------------------------------------------------------------------------- #
# base_url normalisation + error surfacing.
# --------------------------------------------------------------------------- #
def test_trailing_slash_in_base_url_is_normalised():
    http = FakeHttp()
    client = _client(http, base_url="http://t3-afk:8080/")
    client.dispatch(repo="infra", issue=1, prompt="x")
    assert http.posts[0]["url"] == "http://t3-afk:8080/api/orchestration/dispatch"
    assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"


def test_dispatch_raises_and_short_circuits_when_a_post_errors():
    class ErroringHttp(FakeHttp):
        def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
            super().post(url, json, headers)  # validates + records
            return FakeResponse({}, status_code=500)

    http = ErroringHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
    with pytest.raises(RuntimeError):
        _dispatch(http, repo="infra")
    # Project already existed, so the FIRST post is thread.create — and it failed,
    # so thread.turn.start never fired.
    assert [c["json"]["type"] for c in http.posts] == ["thread.create"]
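The last two tests pin two small behaviours. A trailing slash on `base_url` must not double up in request URLs. The first error status must raise before any later command is sent. Below is a minimal sketch of both behaviours; `join_url` and `post_in_order` are hypothetical names, and `post` is assumed to return the HTTP status code.

```python
from typing import Callable


def join_url(base_url: str, path: str) -> str:
    """Join without doubling or dropping the slash between base and path."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def post_in_order(post: Callable[[dict], int], commands: list) -> None:
    """POST commands one by one; the first error status raises and stops the rest."""
    for cmd in commands:
        status = post(cmd)
        if status >= 400:
            # Raising here is what keeps thread.turn.start from firing after
            # a failed thread.create.
            raise RuntimeError(f"{cmd['type']} failed with HTTP {status}")
```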

tests/test_afk_t3_live.py
@ -0,0 +1,92 @@
"""LIVE smoke test for ``app.afk.t3_client`` against a real T3 instance.

Skipped by default. The unit tests (``test_afk_t3_client``) pin the wire shape
against a contract-accurate fake; this file proves the *same code* actually talks
to a live T3. It is the guard that a green suite really means "wired to T3",
which the earlier fake-only suite did NOT provide (it stayed green while the
real server answered with HTTP 400).

It is opt-in because the orchestration API is in-cluster (ClusterIP + an
Authentik-gated ingress), so it can't run in CI without cluster access. Run it
from inside the cluster, or via a port-forward, with a bearer minted on the pod::

    # bearer (on the t3-afk pod, as the node user):
    #   t3 auth session issue --token-only --base-dir /data/t3 --ttl 30m
    kubectl -n t3-afk port-forward deploy/t3-afk 3773:3773 &
    T3_AFK_BASE_URL=http://127.0.0.1:3773 T3_AFK_TOKEN=<bearer> \
        python3 -m pytest tests/test_afk_t3_live.py -v

The read-only snapshot check is always safe. The full dispatch round-trip
(create thread + turn + verify it appears, then delete it) only runs with
``T3_AFK_SMOKE_DISPATCH=1`` since it spends a (tiny) agent turn.
"""
import os
import time

import pytest

from app.afk import t3_client

_BASE_URL = os.environ.get("T3_AFK_BASE_URL")
_TOKEN = os.environ.get("T3_AFK_TOKEN")

pytestmark = pytest.mark.skipif(
    not (_BASE_URL and _TOKEN),
    reason="set T3_AFK_BASE_URL + T3_AFK_TOKEN to run the live T3 smoke test",
)


def _real_client():
    import httpx  # local import so the module imports fine without httpx installed

    return t3_client.T3Client(
        base_url=_BASE_URL,
        http=httpx.Client(timeout=30.0),
        bearer_provider=lambda: _TOKEN,
    )


def test_live_snapshot_has_the_real_shape():
    """A real snapshot parses and carries the keys the watcher/adapter depend on:
    ``threads`` + ``projects``, every thread has an ``id``, and no thread has a
    top-level ``status`` (liveness lives under ``latestTurn.state``)."""
    snap = _real_client().snapshot()
    assert isinstance(snap, dict)
    assert "threads" in snap and "projects" in snap
    for thread in snap["threads"]:
        assert "id" in thread
        # liveness lives under latestTurn.state (the contract this suite guards)
        assert "status" not in thread, "real threads have no top-level status field"


@pytest.mark.skipif(
    os.environ.get("T3_AFK_SMOKE_DISPATCH") != "1",
    reason="set T3_AFK_SMOKE_DISPATCH=1 to run the dispatch round-trip (spends a turn)",
)
def test_live_dispatch_round_trip_then_cleanup():
    """End-to-end against the real server: ``dispatch`` (ensure-project + create +
    turn) succeeds and the new thread shows up in the snapshot. Cleans up the
    thread it created so the cockpit isn't littered, even when the check fails."""
    import httpx

    repo = "afk-smoke/roundtrip"
    client = _real_client()
    thread_id = client.dispatch(repo, 1, "Reply with just: ok. Do not use any tools.")
    assert isinstance(thread_id, str) and thread_id

    try:
        # The thread must appear in the fleet read-model (poll briefly — dispatch is
        # accepted asynchronously).
        found = False
        for _ in range(10):
            if any(t.get("id") == thread_id for t in client.snapshot().get("threads", [])):
                found = True
                break
            time.sleep(1.0)
        assert found, f"dispatched thread {thread_id} never appeared in the snapshot"
    finally:
        # Cleanup: delete the throwaway thread (raw command — not part of the adapter).
        # It runs in ``finally`` so a failed visibility check still cleans up.
        httpx.post(
            f"{_BASE_URL.rstrip('/')}/api/orchestration/dispatch",
            headers={"Authorization": f"Bearer {_TOKEN}"},
            json={"type": "thread.delete", "commandId": t3_client._uuid(), "threadId": thread_id},
            timeout=30.0,
        ).raise_for_status()
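The round-trip test polls the snapshot up to ten times, one second apart, because dispatch is accepted asynchronously. The same bounded-retry loop can be factored into a small helper with an injectable sleep, so it can be unit-tested without real waiting. `wait_until` is a hypothetical name, not part of `app.afk`.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], attempts: int = 10,
               delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll ``predicate`` up to ``attempts`` times, sleeping ``delay`` between tries."""
    for i in range(attempts):
        if predicate():
            return True
        if i < attempts - 1:  # no pointless sleep after the final check
            sleep(delay)
    return False
```

In the live test, the predicate would be `lambda: any(t.get("id") == thread_id for t in client.snapshot().get("threads", []))`. The inline loop sleeps after every failed check, including the last one. Skipping that final sleep saves one `delay` when the thread never appears.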

tests/test_afk_tracker.py
@ -0,0 +1,493 @@
"""Tests for ``app.afk.tracker`` — the GitHub issues adapter.

The ``Tracker`` is the loop's read/write port onto the issue tracker. It wraps
an injected GitHub client (the real one shells out to ``gh``; here we inject a
FAKE that records calls and replays staged data) and holds all the *business*
logic the loop depends on: turning raw issues into ``Issue`` records with
``blocked_by`` parsed, ``labeled_by_trusted`` decided fail-closed from the label
event actor, and ``priority`` read off a priority label. No test here reaches a
real ``gh``, GitHub/Forgejo, or the network.
"""
import pytest

from app.afk.tracker import (
    DEFAULT_TRUSTED_ASSOCIATIONS,
    GitHubClient,
    Tracker,
)
from app.afk.types import Issue


# --------------------------------------------------------------------------- #
# Fake GitHub client — the injected port. Records every mutating call and
# replays issues / label-events staged per repo. Implements the GitHubClient
# Protocol the Tracker depends on.
# --------------------------------------------------------------------------- #
class FakeGitHub:
    def __init__(self) -> None:
        # repo -> list of raw issue dicts (gh issue list --json shape)
        self._issues: dict[str, list[dict]] = {}
        # (repo, number) -> list of label-event dicts (who added which label)
        self._events: dict[tuple[str, int], list[dict]] = {}
        # recorded mutations
        self.labels_added: list[tuple[str, int, str]] = []
        self.labels_removed: list[tuple[str, int, str]] = []
        self.comments: list[tuple[str, int, str]] = []
        self.closed: list[tuple[str, int]] = []

    # --- staging helpers (test-only) --- #
    def seed_issues(self, repo: str, issues: list[dict]) -> None:
        self._issues[repo] = issues

    def seed_label_events(self, repo: str, number: int, events: list[dict]) -> None:
        self._events[(repo, number)] = events

    # --- GitHubClient surface --- #
    def list_issues(self, repo: str, label: str) -> list[dict]:
        return [
            issue
            for issue in self._issues.get(repo, [])
            if label in [lbl["name"] for lbl in issue.get("labels", [])]
        ]

    def label_events(self, repo: str, number: int) -> list[dict]:
        return list(self._events.get((repo, number), []))

    def add_label(self, repo: str, number: int, label: str) -> None:
        self.labels_added.append((repo, number, label))

    def remove_label(self, repo: str, number: int, label: str) -> None:
        self.labels_removed.append((repo, number, label))

    def comment(self, repo: str, number: int, body: str) -> None:
        self.comments.append((repo, number, body))

    def close(self, repo: str, number: int) -> None:
        self.closed.append((repo, number))


# --------------------------------------------------------------------------- #
# Raw-issue / event builders matching the gh JSON shapes the real client emits.
# --------------------------------------------------------------------------- #
def _raw_issue(
    number: int = 1,
    labels: list[str] | None = None,
    body: str = "",
) -> dict:
    return {
        "number": number,
        "labels": [{"name": name} for name in (labels or ["ready-for-agent"])],
        "body": body,
    }


def _label_event(label: str, association: str = "OWNER", actor: str = "viktorbarzin") -> dict:
    # Mirrors the `gh api .../timeline` "labeled" event shape we care about.
    return {
        "event": "labeled",
        "label": {"name": label},
        "actor": {"login": actor},
        "author_association": association,
    }


@pytest.fixture
def gh() -> FakeGitHub:
    return FakeGitHub()


@pytest.fixture
def tracker(gh: FakeGitHub) -> Tracker:
    return Tracker(gh)


# --------------------------------------------------------------------------- #
# Construction / contract.
# --------------------------------------------------------------------------- #
def test_tracker_wraps_injected_client(gh: FakeGitHub):
    t = Tracker(gh)
    assert t.client is gh


def test_fake_satisfies_protocol(gh: FakeGitHub):
    # The fake must be usable where a GitHubClient is expected (structural typing).
    assert isinstance(gh, GitHubClient)


def test_default_trusted_associations_are_collaborator_or_above():
    assert DEFAULT_TRUSTED_ASSOCIATIONS == frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


# --------------------------------------------------------------------------- #
# list_ready — the read path that builds Issue records.
# --------------------------------------------------------------------------- #
def test_list_ready_returns_issue_objects(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=7)])
    gh.seed_label_events("infra", 7, [_label_event("ready-for-agent")])

    issues = tracker.list_ready(["infra"])

    assert len(issues) == 1
    issue = issues[0]
    assert isinstance(issue, Issue)
    assert issue.number == 7
    assert issue.repo == "infra"
    assert issue.labels == ["ready-for-agent"]


def test_list_ready_spans_multiple_repos(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_issues("crawler", [_raw_issue(number=2)])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
    gh.seed_label_events("crawler", 2, [_label_event("ready-for-agent")])

    issues = tracker.list_ready(["infra", "crawler"])

    assert {(i.repo, i.number) for i in issues} == {("infra", 1), ("crawler", 2)}


def test_list_ready_empty_when_no_ready_issues(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=1, labels=["bug"])])
    assert tracker.list_ready(["infra"]) == []


def test_list_ready_queries_with_configured_ready_label(gh: FakeGitHub):
    # A Tracker built with a custom ready label must query the client for *that*
    # label, not the default.
    seen: dict[str, str] = {}

    class _RecordingGitHub(FakeGitHub):
        def list_issues(self, repo: str, label: str) -> list[dict]:
            seen["label"] = label
            return super().list_issues(repo, label)

    rec = _RecordingGitHub()
    rec.seed_issues("infra", [_raw_issue(number=1, labels=["queue-me"])])
    rec.seed_label_events("infra", 1, [_label_event("queue-me")])
    t = Tracker(rec, ready_label="queue-me")

    issues = t.list_ready(["infra"])

    assert seen["label"] == "queue-me"
    assert len(issues) == 1


# --------------------------------------------------------------------------- #
# Trust gate — labeled_by_trusted is decided from the label-event actor,
# fail-closed.
# --------------------------------------------------------------------------- #
def test_owner_labeled_issue_is_trusted(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association="OWNER")])

    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True


@pytest.mark.parametrize("association", ["MEMBER", "COLLABORATOR"])
def test_collaborator_and_member_are_trusted(gh: FakeGitHub, tracker: Tracker, association: str):
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association=association)])

    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True


@pytest.mark.parametrize("association", ["NONE", "CONTRIBUTOR", "FIRST_TIME_CONTRIBUTOR", ""])
def test_untrusted_association_is_not_trusted(gh: FakeGitHub, tracker: Tracker, association: str):
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association=association)])

    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False


def test_missing_label_event_is_not_trusted(gh: FakeGitHub, tracker: Tracker):
    # The issue carries the ready label, but no event records WHO applied it —
    # fail closed: an unattributable label is never trusted.
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_label_events("infra", 1, [])

    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False


def test_trust_uses_latest_application_of_ready_label(gh: FakeGitHub, tracker: Tracker):
    # If the ready label was removed and re-added, the MOST RECENT application
    # decides trust — a trusted re-label after an untrusted one is trusted.
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_label_events(
        "infra",
        1,
        [
            _label_event("ready-for-agent", association="NONE", actor="drive-by"),
            _label_event("ready-for-agent", association="OWNER", actor="viktorbarzin"),
        ],
    )

    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True


def test_trust_ignores_events_for_other_labels(gh: FakeGitHub, tracker: Tracker):
    # A trusted actor labeling something else must not make the ready label trusted.
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_label_events(
        "infra",
        1,
        [
            _label_event("priority:high", association="OWNER"),
            _label_event("ready-for-agent", association="NONE", actor="drive-by"),
        ],
    )

    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False


def test_custom_trusted_associations_override_default(gh: FakeGitHub):
    # Tighten the trust set to OWNER only: a COLLABORATOR label is no longer trusted.
    t = Tracker(gh, trusted_associations=frozenset({"OWNER"}))
    gh.seed_issues("infra", [_raw_issue(number=1)])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association="COLLABORATOR")])

    assert t.list_ready(["infra"])[0].labeled_by_trusted is False
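The trust-gate tests above reduce to one rule. Find the most recent `labeled` event for the ready label. Trust it only if that event's `author_association` is in the trusted set. Fail closed when no such event exists. Below is a minimal sketch of that rule. It assumes events arrive oldest first, as the fixtures are ordered. `labeled_by_trusted` as a free function is a hypothetical name, not the `Tracker` internals.

```python
from typing import Iterable

DEFAULT_TRUSTED = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def labeled_by_trusted(events: Iterable[dict], ready_label: str,
                       trusted: frozenset = DEFAULT_TRUSTED) -> bool:
    latest = None
    # Assumed chronological (oldest first), so the last match is the latest application.
    for ev in events:
        label = (ev.get("label") or {}).get("name")
        if ev.get("event") == "labeled" and label == ready_label:
            latest = ev
    if latest is None:
        return False  # unattributable label: fail closed
    return latest.get("author_association", "") in trusted
```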


# --------------------------------------------------------------------------- #
# blocked_by — parsed from the issue body's "Blocked by" references.
# --------------------------------------------------------------------------- #
def test_blocked_by_empty_when_body_has_no_references(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=1, body="just implement the thing")])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].blocked_by == []


def test_blocked_by_parses_single_reference(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=5, body="Blocked by #3")])
    gh.seed_label_events("infra", 5, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].blocked_by == [3]


def test_blocked_by_parses_multiple_references(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=9, body="Blocked by #3, #4 and #10")])
    gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].blocked_by == [3, 4, 10]


def test_blocked_by_is_case_insensitive_and_dedupes(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=9, body="blocked BY #3 and Blocked by #3, #4")])
    gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].blocked_by == [3, 4]


def test_blocked_by_ignores_plain_issue_mentions(gh: FakeGitHub, tracker: Tracker):
    # A bare "#7" that is not part of a "Blocked by" clause is NOT a blocker.
    gh.seed_issues("infra", [_raw_issue(number=9, body="See #7 for context. Blocked by #3")])
    gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].blocked_by == [3]


def test_blocked_by_tolerates_missing_body(gh: FakeGitHub, tracker: Tracker):
    issue = _raw_issue(number=1)
    issue["body"] = None  # gh returns null for an empty body
    gh.seed_issues("infra", [issue])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].blocked_by == []
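The `blocked_by` tests describe a small grammar:

* a case-insensitive "Blocked by" clause, followed by `#N` references separated by commas or "and";
* duplicates are dropped;
* bare `#N` mentions outside the clause are ignored.

Below is one regex-based sketch that satisfies those cases. It is not necessarily how `app.afk.tracker` does it, and `parse_blocked_by` is a hypothetical name. The optional colon after "by" is an extra tolerance assumed here.

```python
import re
from typing import Optional

# "blocked by" (any case, optional colon) followed by a run of #N refs joined by "," or "and".
_CLAUSE = re.compile(r"blocked\s+by:?\s*((?:#\d+(?:\s*(?:,|and)\s*)?)+)", re.IGNORECASE)
_REF = re.compile(r"#(\d+)")


def parse_blocked_by(body: Optional[str]) -> list:
    seen: dict = {}  # dict keys keep first-seen order while deduping
    for clause in _CLAUSE.finditer(body or ""):
        for num in _REF.findall(clause.group(1)):
            seen.setdefault(int(num), None)
    return list(seen)
```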


# --------------------------------------------------------------------------- #
# priority — read off a priority label (lower number runs first).
# --------------------------------------------------------------------------- #
def test_priority_defaults_to_zero_without_priority_label(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=1, labels=["ready-for-agent"])])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].priority == 0


def test_priority_read_from_priority_label(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues("infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:2"])])
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].priority == 2


def test_priority_lowest_label_wins_when_several(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues(
        "infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:5", "priority:1"])]
    )
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].priority == 1


def test_priority_ignores_non_numeric_priority_label(gh: FakeGitHub, tracker: Tracker):
    gh.seed_issues(
        "infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:high"])]
    )
    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])

    assert tracker.list_ready(["infra"])[0].priority == 0
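Priority comes from labels of the form `priority:N`. The smallest numeric value wins, non-numeric values such as `priority:high` are ignored, and an issue without a priority label gets `0`. Below is a sketch of that rule; `priority_from_labels` is a hypothetical function name.

```python
import re

_PRIORITY = re.compile(r"priority:(\d+)")


def priority_from_labels(labels: list, default: int = 0) -> int:
    # fullmatch rejects "priority:high" and "priority:2x"; only pure digits count.
    nums = [int(m.group(1)) for name in labels if (m := _PRIORITY.fullmatch(name))]
    return min(nums, default=default)  # lower number runs first
```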


# --------------------------------------------------------------------------- #
# Mutations delegate to the injected client.
# --------------------------------------------------------------------------- #
def test_add_label_delegates(gh: FakeGitHub, tracker: Tracker):
    tracker.add_label("infra", 7, "agent-in-progress")
    assert gh.labels_added == [("infra", 7, "agent-in-progress")]


def test_remove_label_delegates(gh: FakeGitHub, tracker: Tracker):
    tracker.remove_label("infra", 7, "agent-in-progress")
    assert gh.labels_removed == [("infra", 7, "agent-in-progress")]


def test_comment_delegates(gh: FakeGitHub, tracker: Tracker):
    tracker.comment("infra", 7, "phase: tests-red done")
    assert gh.comments == [("infra", 7, "phase: tests-red done")]


def test_close_delegates(gh: FakeGitHub, tracker: Tracker):
    tracker.close("infra", 7)
    assert gh.closed == [("infra", 7)]


# --------------------------------------------------------------------------- #
# The concrete gh-CLI-backed client builds no-shell argv and parses JSON; we
# inject a fake runner so no real `gh` is ever spawned.
# --------------------------------------------------------------------------- #
from app.afk.tracker import GhCliClient  # noqa: E402


class _FakeRunner:
    """Stand-in for the subprocess runner GhCliClient shells out through.

    Records every argv and returns staged stdout per command, so we can pin the
    exact `gh` invocations without spawning a process.
    """

    def __init__(self, responses: dict[tuple[str, ...], str] | None = None) -> None:
        self.calls: list[tuple[str, ...]] = []
        self._responses = responses or {}

    def __call__(self, argv: list[str]) -> str:
        key = tuple(argv)
        self.calls.append(key)
        return self._responses.get(key, "")


def test_gh_cli_list_issues_builds_no_shell_argv_and_parses_json():
    argv = (
        "gh", "issue", "list", "--repo", "owner/infra",
        "--label", "ready-for-agent", "--state", "open",
        "--json", "number,labels,body", "--limit", "100",
    )
    runner = _FakeRunner({argv: '[{"number": 4, "labels": [{"name": "ready-for-agent"}], "body": "x"}]'})
    client = GhCliClient(repo_owner="owner", run=runner)

    issues = client.list_issues("infra", "ready-for-agent")

    assert runner.calls == [argv]
    assert issues == [{"number": 4, "labels": [{"name": "ready-for-agent"}], "body": "x"}]


def test_gh_cli_list_issues_empty_output_is_empty_list():
    runner = _FakeRunner()  # returns "" for everything
    client = GhCliClient(repo_owner="owner", run=runner)
    assert client.list_issues("infra", "ready-for-agent") == []


def test_gh_cli_label_events_filters_labeled_events():
    timeline = (
        '[{"event": "commented"},'
        ' {"event": "labeled", "label": {"name": "ready-for-agent"},'
        ' "actor": {"login": "viktorbarzin"}, "author_association": "OWNER"}]'
    )
    argv = (
        "gh", "api",
        "repos/owner/infra/issues/4/timeline",
        "--paginate",
        "-H", "Accept: application/vnd.github+json",
    )
    runner = _FakeRunner({argv: timeline})
    client = GhCliClient(repo_owner="owner", run=runner)

    events = client.label_events("infra", 4)

    assert runner.calls == [argv]
    assert [e["event"] for e in events] == ["labeled"]
    assert events[0]["label"]["name"] == "ready-for-agent"


def test_gh_cli_add_label_builds_argv():
    runner = _FakeRunner()
    client = GhCliClient(repo_owner="owner", run=runner)
    client.add_label("infra", 4, "agent-in-progress")
    assert runner.calls == [
        ("gh", "issue", "edit", "4", "--repo", "owner/infra", "--add-label", "agent-in-progress")
    ]


def test_gh_cli_remove_label_builds_argv():
    runner = _FakeRunner()
    client = GhCliClient(repo_owner="owner", run=runner)
    client.remove_label("infra", 4, "agent-in-progress")
    assert runner.calls == [
        ("gh", "issue", "edit", "4", "--repo", "owner/infra", "--remove-label", "agent-in-progress")
    ]


def test_gh_cli_comment_builds_argv():
    runner = _FakeRunner()
    client = GhCliClient(repo_owner="owner", run=runner)
    client.comment("infra", 4, "phase update")
    assert runner.calls == [
        ("gh", "issue", "comment", "4", "--repo", "owner/infra", "--body", "phase update")
    ]


def test_gh_cli_close_builds_argv():
    runner = _FakeRunner()
    client = GhCliClient(repo_owner="owner", run=runner)
    client.close("infra", 4)
    assert runner.calls == [
        ("gh", "issue", "close", "4", "--repo", "owner/infra")
    ]


def test_gh_cli_end_to_end_through_tracker():
    # Wire the gh-CLI client (fake runner) behind a real Tracker and confirm a
    # full read produces a correctly-decoded, trusted, blocked Issue.
    list_argv = (
        "gh", "issue", "list", "--repo", "owner/infra",
        "--label", "ready-for-agent", "--state", "open",
        "--json", "number,labels,body", "--limit", "100",
    )
    timeline_argv = (
        "gh", "api",
        "repos/owner/infra/issues/12/timeline",
        "--paginate",
        "-H", "Accept: application/vnd.github+json",
    )
    runner = _FakeRunner({
        list_argv: (
            '[{"number": 12,'
            ' "labels": [{"name": "ready-for-agent"}, {"name": "priority:3"}],'
            ' "body": "Blocked by #11"}]'
        ),
        timeline_argv: (
            '[{"event": "labeled", "label": {"name": "ready-for-agent"},'
            ' "actor": {"login": "viktorbarzin"}, "author_association": "OWNER"}]'
        ),
    })
    tracker = Tracker(GhCliClient(repo_owner="owner", run=runner))

    issue = tracker.list_ready(["infra"])[0]

    assert issue.number == 12
    assert issue.repo == "infra"
    assert issue.blocked_by == [11]
    assert issue.priority == 3
    assert issue.labeled_by_trusted is True
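The gh-CLI tests pin the argv but inject a fake runner. Below is a minimal sketch of what a real runner and one read call could look like, using Python's standard `subprocess` module. `run_gh` and `list_issues` are hypothetical names. The argv mirrors the one `test_gh_cli_list_issues_builds_no_shell_argv_and_parses_json` expects.

```python
import json
import subprocess
from typing import Callable, List


def run_gh(argv: List[str]) -> str:
    # List argv and no shell=True: every element reaches gh as one argument.
    proc = subprocess.run(argv, capture_output=True, text=True, check=True)
    return proc.stdout


def list_issues(run: Callable[[List[str]], str], owner: str, repo: str,
                label: str) -> list:
    out = run([
        "gh", "issue", "list", "--repo", f"{owner}/{repo}",
        "--label", label, "--state", "open",
        "--json", "number,labels,body", "--limit", "100",
    ])
    # Empty stdout means "no issues", not a JSON parse error.
    return json.loads(out) if out.strip() else []
```

Passing a list without `shell=True` means that a label or body containing `;` or `$(...)` is never interpreted by a shell. `check=True` turns a non-zero `gh` exit into `subprocess.CalledProcessError`.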

tests/test_afk_watcher.py
@ -0,0 +1,403 @@
"""Integration tests for ``app.afk.watcher`` — the in-flight run driver.

These wire the REAL pure cores (the actual ``run_state_machine.next_action`` and
``phase_checklist.render``) to the in-memory adapter FAKES from ``conftest``
(``FakeT3Client`` / ``FakeTracker`` / ``FakeCIWatcher`` / ``FakeNotifier``). No
test touches a real T3 server, GitHub/Forgejo, the cluster, or Slack — the
watcher is exercised end to end with fakes only at the I/O edges.

What one watch tick must do (the watcher contract), given an in-flight run
``(issue, thread_id, commit, bookkeeping)``:

* assemble a ``RunState`` from ``t3_client.snapshot()`` (the thread's liveness)
  + ``ci_watcher.status(repo, commit)`` (the CI verdict, only when something is
  pushed) + the run's own ``pushed`` / ``fix_forward_attempts`` /
  ``elapsed_seconds`` bookkeeping, and feed it to the pure state machine;
* **CLOSE_SUCCESS** → ``tracker.close``, drop the in-progress label, post the
  DONE checklist, and ring the ``done`` doorbell;
* **ESCALATE_PREPUSH / FREEZE_ESCALATE** → drop the in-progress label, relabel
  ``ready-for-human``, ring the ``needs-human`` / ``frozen`` doorbell, post the
  checklist — the run is handed back to a human;
* **FIX_FORWARD** → dispatch a corrective turn (``t3_client.dispatch``), bump
  the fix-forward attempt count, keep the run in flight, refresh the checklist;
  NOT terminal, so no doorbell and no label churn;
* **WAIT** → just refresh the progress checklist and keep waiting; no labels,
  no close, no doorbell, no dispatch.
"""
|
||||
import pytest
|
||||
|
||||
from app.afk import watcher
|
||||
from app.afk.notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN
|
||||
from app.afk.types import CIStatus, Issue
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Helpers.
|
||||
# --------------------------------------------------------------------------- #
|
||||
READY_FOR_HUMAN = "ready-for-human"
|
||||
|
||||
|
||||
def _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier) -> watcher.Watcher:
|
||||
return watcher.Watcher(
|
||||
t3_client=fake_t3,
|
||||
tracker=fake_tracker,
|
||||
ci_watcher=fake_ci,
|
||||
notifier=fake_notifier,
|
||||
)
|
||||
|
||||
|
||||
def _run(
|
||||
issue: Issue,
|
||||
thread_id: str = "thread-0",
|
||||
commit: str | None = None,
|
||||
fix_forward_attempts: int = 0,
|
||||
elapsed_seconds: float = 0.0,
|
||||
) -> watcher.InFlightRun:
|
||||
return watcher.InFlightRun(
|
||||
issue=issue,
|
||||
thread_id=thread_id,
|
||||
commit=commit,
|
||||
fix_forward_attempts=fix_forward_attempts,
|
||||
elapsed_seconds=elapsed_seconds,
|
||||
)
|
||||
|
||||
|
||||
# Map the tests' abstract liveness vocab to T3's REAL ``latestTurn.state`` strings
|
||||
# so call sites stay readable while the snapshot carries the true shape the
|
||||
# watcher parses (a finished turn is "completed", a failed one "errored",
|
||||
# "running" is itself real). Unknown values pass through verbatim.
|
||||
_REAL_STATE = {"idle": "completed", "error": "errored"}
|
||||
|
||||
|
||||
def _snapshot(thread_id: str, status: str) -> dict:
|
||||
"""A fleet snapshot with one thread whose latest turn is in ``status`` — real
|
||||
shape ``threads[].latestTurn.state`` (not a top-level ``status`` field)."""
|
||||
return {
|
||||
"threads": [
|
||||
{"id": thread_id, "latestTurn": {"state": _REAL_STATE.get(status, status)}}
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
def _labels(fake_tracker):
|
||||
return [(op, repo, num, lbl) for (op, repo, num, lbl) in fake_tracker.label_ops]
|
||||
|
||||
|
||||
def _kinds(fake_notifier):
|
||||
return [n["kind"] for n in fake_notifier.sent]
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# WAIT — agent still working, nothing pushed: refresh the checklist, no action.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_wait_refreshes_checklist_and_does_nothing_else(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "running"))
|
||||
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue), make_config()
|
||||
)
|
||||
|
||||
assert result.action.value == "wait"
|
||||
assert result.terminal is False
|
||||
assert fake_tracker.closed == []
|
||||
assert _labels(fake_tracker) == [] # no label churn while waiting
|
||||
assert fake_notifier.sent == [] # no doorbell
|
||||
assert fake_t3.dispatched == [] # no corrective turn
|
||||
# The progress checklist was posted as a comment.
|
||||
assert len(fake_tracker.comments) == 1
|
||||
repo, num, body = fake_tracker.comments[0]
|
||||
assert (repo, num) == ("infra", 7)
|
||||
assert "AFK run progress" in body
|
||||
|
||||
|
||||
def test_wait_when_thread_missing_from_snapshot(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
# No snapshot entry for this thread yet -> thread_status None -> WAIT.
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot({"threads": []})
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue), make_config()
|
||||
)
|
||||
assert result.action.value == "wait"
|
||||
assert result.terminal is False
|
||||
|
||||
|
||||
def test_pushed_ci_pending_waits(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "running"))
|
||||
# commit present (pushed) but CI not yet decided -> PENDING -> WAIT.
|
||||
fake_ci.set_status("infra", "deadbeef", CIStatus.PENDING)
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit="deadbeef"), make_config()
|
||||
)
|
||||
assert result.action.value == "wait"
|
||||
assert fake_tracker.closed == []
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# CLOSE_SUCCESS — pushed + CI green: close, unlabel, DONE checklist, doorbell.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_close_success_closes_and_unlabels_and_notifies(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
|
||||
fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
|
||||
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit="cafef00d"), make_config()
|
||||
)
|
||||
|
||||
assert result.action.value == "close_success"
|
||||
assert result.terminal is True
|
||||
assert fake_tracker.closed == [("infra", 7)]
|
||||
# in-progress label removed (no ready-for-human on the happy path).
|
||||
assert ("remove", "infra", 7, "agent-in-progress") in _labels(fake_tracker)
|
||||
assert ("add", "infra", 7, READY_FOR_HUMAN) not in _labels(fake_tracker)
|
||||
# done doorbell fired with the thread deep-link target.
|
||||
assert _kinds(fake_notifier) == [KIND_DONE]
|
||||
assert fake_notifier.sent[0]["thread_id"] == "thread-0"
|
||||
assert fake_notifier.sent[0]["issue"] is issue
|
||||
|
||||
|
||||
def test_close_success_posts_done_checklist(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
|
||||
fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
|
||||
|
||||
_watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit="cafef00d"), make_config()
|
||||
)
|
||||
|
||||
# The final checklist shows the run DONE — every phase checked.
|
||||
body = fake_tracker.comments[-1][2]
|
||||
assert "Done — issue closed" in body
|
||||
assert "- [ ]" not in body # nothing left unchecked at DONE
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# ESCALATE_PREPUSH — agent stalled/errored before any push: hand to a human.
|
||||
# --------------------------------------------------------------------------- #
|
||||
@pytest.mark.parametrize("thread_state", ["errored", "completed"])
|
||||
def test_escalate_prepush_relabels_and_notifies(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, thread_state
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", thread_state))
|
||||
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit=None), make_config()
|
||||
)
|
||||
|
||||
assert result.action.value == "escalate_prepush"
|
||||
assert result.terminal is True
|
||||
assert fake_tracker.closed == [] # NOT closed — needs a human
|
||||
labels = _labels(fake_tracker)
|
||||
assert ("remove", "infra", 7, "agent-in-progress") in labels
|
||||
assert ("add", "infra", 7, READY_FOR_HUMAN) in labels
|
||||
assert _kinds(fake_notifier) == [KIND_NEEDS_HUMAN]
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# FREEZE_ESCALATE — pushed, CI red, fix-forward budget exhausted: freeze + page.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_freeze_escalate_relabels_and_notifies(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
|
||||
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
|
||||
config = make_config(fix_forward_max_attempts=3)
|
||||
|
||||
# attempts already at the cap -> budget exhausted -> FREEZE_ESCALATE.
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit="badc0de", fix_forward_attempts=3), config
|
||||
)
|
||||
|
||||
assert result.action.value == "freeze_escalate"
|
||||
assert result.terminal is True
|
||||
assert fake_tracker.closed == []
|
||||
labels = _labels(fake_tracker)
|
||||
assert ("remove", "infra", 7, "agent-in-progress") in labels
|
||||
assert ("add", "infra", 7, READY_FOR_HUMAN) in labels
|
||||
assert _kinds(fake_notifier) == [KIND_FROZEN]
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# FIX_FORWARD — pushed, CI red, budget remaining: corrective turn, stay in flight.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_fix_forward_dispatches_corrective_turn(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
|
||||
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
|
||||
config = make_config(fix_forward_max_attempts=5)
|
||||
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit="badc0de", fix_forward_attempts=1), config
|
||||
)
|
||||
|
||||
assert result.action.value == "fix_forward"
|
||||
assert result.terminal is False
|
||||
# A corrective turn was dispatched against the same repo/issue.
|
||||
assert len(fake_t3.dispatched) == 1
|
||||
assert (fake_t3.dispatched[0]["repo"], fake_t3.dispatched[0]["issue"]) == ("infra", 7)
|
||||
# Attempt count advanced and is surfaced on the result for the caller's
|
||||
# bookkeeping on the next tick.
|
||||
assert result.fix_forward_attempts == 2
|
||||
# Not terminal: no close, no ready-for-human, no doorbell.
|
||||
assert fake_tracker.closed == []
|
||||
assert ("add", "infra", 7, READY_FOR_HUMAN) not in _labels(fake_tracker)
|
||||
assert fake_notifier.sent == []
|
||||
|
||||
|
||||
def test_fix_forward_updates_thread_id_to_corrective_turn(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
# The corrective dispatch spawns a new thread; the result carries the new id
|
||||
# so the next tick polls the right thread.
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
|
||||
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, thread_id="thread-old", commit="badc0de"), make_config()
|
||||
)
|
||||
assert result.thread_id == "thread-0" # FakeT3Client hands back thread-0
|
||||
assert result.thread_id != "thread-old"
|
||||
|
||||
|
||||
def test_fix_forward_note_appears_in_checklist(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
|
||||
fake_ci.set_status("infra", "badc0de", CIStatus.RED)
|
||||
_watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit="badc0de", fix_forward_attempts=1), make_config()
|
||||
)
|
||||
body = fake_tracker.comments[-1][2]
|
||||
assert "Fix-forward" in body
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Unknown / unrecognised thread status folds to "keep waiting" (fail-safe).
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_unknown_thread_status_waits(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "provisioning")) # not a known status
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit=None), make_config()
|
||||
)
|
||||
# Unknown status must not escalate or close — treat as "no status yet".
|
||||
assert result.action.value == "wait"
|
||||
assert fake_tracker.closed == []
|
||||
assert fake_notifier.sent == []
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Real T3 ``latestTurn.state`` strings map to the right liveness (contract guard
|
||||
# against the snapshot-shape drift that the previous adapter/fake masked).
|
||||
# --------------------------------------------------------------------------- #
|
||||
@pytest.mark.parametrize("state", ["running", "in_progress", "pending", "queued", "pendingInit"])
|
||||
def test_real_in_progress_states_keep_waiting(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, state
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": state}}]})
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit=None), make_config()
|
||||
)
|
||||
assert result.action.value == "wait" # still working -> keep polling
|
||||
|
||||
|
||||
def test_real_errored_state_escalates_when_nothing_pushed(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
# The real failure state is "errored" (not "error"); with nothing pushed it
|
||||
# is a pre-push escalation, not a freeze.
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": "errored"}}]})
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit=None), make_config()
|
||||
)
|
||||
assert result.action.value == "escalate_prepush"
|
||||
|
||||
|
||||
def test_thread_present_but_no_turn_yet_waits(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
# A freshly-created thread has no latestTurn -> no usable status yet -> WAIT.
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot({"threads": [{"id": "thread-0"}]})
|
||||
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit=None), make_config()
|
||||
)
|
||||
assert result.action.value == "wait"
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Terminal cleanup only happens once / cleanly: a terminal tick posts exactly
|
||||
# one checklist comment (no double-commenting on the way out).
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_terminal_tick_posts_exactly_one_checklist(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
|
||||
fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
|
||||
_watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
|
||||
_run(issue, commit="cafef00d"), make_config()
|
||||
)
|
||||
assert len(fake_tracker.comments) == 1
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# CI status is only queried when something is pushed (don't hit CI for an
|
||||
# unpushed run — there's no commit to check).
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_ci_not_queried_when_nothing_pushed(
|
||||
fake_t3, fake_tracker, fake_notifier, make_issue, make_config
|
||||
):
|
||||
class ExplodingCI:
|
||||
def status(self, repo, commit):
|
||||
raise AssertionError("CI must not be queried with no pushed commit")
|
||||
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "running"))
|
||||
result = watcher.Watcher(
|
||||
t3_client=fake_t3,
|
||||
tracker=fake_tracker,
|
||||
ci_watcher=ExplodingCI(),
|
||||
notifier=fake_notifier,
|
||||
).tick(_run(issue, commit=None), make_config())
|
||||
assert result.action.value == "wait"
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# ready-for-human label is configurable.
|
||||
# --------------------------------------------------------------------------- #
|
||||
def test_ready_for_human_label_is_configurable(
|
||||
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
|
||||
):
|
||||
issue = make_issue(number=7, repo="infra")
|
||||
fake_t3.set_snapshot(_snapshot("thread-0", "error"))
|
||||
w = watcher.Watcher(
|
||||
t3_client=fake_t3,
|
||||
tracker=fake_tracker,
|
||||
ci_watcher=fake_ci,
|
||||
notifier=fake_notifier,
|
||||
ready_for_human_label="needs-eyes",
|
||||
)
|
||||
w.tick(_run(issue, commit=None), make_config())
|
||||
assert ("add", "infra", 7, "needs-eyes") in _labels(fake_tracker)
|
||||
@@ -1,174 +1,251 @@
"""Tests for the breakglass app: verb whitelist, SSE translation, auth, routes."""
"""Tests for the breakglass app: session manager (attach model), verb whitelist,
SSE translation, auth, routes."""
import os

os.environ.setdefault("API_BEARER_TOKEN", "test-token")
# Turns chdir into a per-session workspace; point it somewhere writable for tests
# (prod uses the /workspace emptyDir). Must be set before the app imports config.
os.environ.setdefault("BREAKGLASS_SESSIONS_DIR", "/tmp/bg-test-sessions")

import pytest
from fastapi.testclient import TestClient

from app.breakglass import agent_session, pve
from app.breakglass import agent_session, pve, session as sessionmod
from app.breakglass.server import app


# --------------------------------------------------------------------------- #
# PVE verb whitelist — the security boundary mirrored client-side.
# Fakes for the claude subprocess a turn spawns.
# --------------------------------------------------------------------------- #
class _FakeStdout:
    def __init__(self, lines):
        self._lines = [(l + "\n").encode() for l in lines]
        self._i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._i >= len(self._lines):
            raise StopAsyncIteration
        line = self._lines[self._i]
        self._i += 1
        return line


class _FakeStderr:
    async def read(self):
        return b""


class _FakeProc:
    def __init__(self, lines, rc=0):
        self.stdout = _FakeStdout(lines)
        self.stderr = _FakeStderr()
        self.returncode = None
        self._rc = rc

    async def wait(self):
        self.returncode = self._rc
        return self._rc

    def kill(self):
        self.returncode = -9


def _patch_proc(monkeypatch, lines, rc=0):
    async def _fake_spawn(*argv, **kwargs):
        return _FakeProc(lines, rc)
    monkeypatch.setattr(sessionmod.asyncio, "create_subprocess_exec", _fake_spawn)


_TURN_LINES = [
    '{"type":"system","subtype":"init","session_id":"s"}',
    '{"type":"system","subtype":"thinking_tokens","estimated_tokens":5}',
    '{"type":"assistant","message":{"content":[{"type":"text","text":"checking disk"}]}}',
    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","input":{"command":"df -h"}}]}}',
    '{"type":"result","is_error":false,"result":"done","duration_ms":12}',
]


# --------------------------------------------------------------------------- #
# Session: event log + broadcast + replay/Last-Event-ID.
# --------------------------------------------------------------------------- #
def test_add_event_assigns_sequential_ids():
    s = sessionmod.Session("s1")
    a = s.add_event({"kind": "user", "text": "hi"})
    b = s.add_event({"kind": "text", "text": "yo"})
    assert a["id"] == 0 and b["id"] == 1
    assert [e["kind"] for e in s.events] == ["user", "text"]


def test_subscribe_receives_broadcast():
    s = sessionmod.Session("s1")
    q = s.subscribe()
    s.add_event({"kind": "text", "text": "live"})
    assert q.get_nowait()["text"] == "live"
    s.unsubscribe(q)
    s.add_event({"kind": "text", "text": "after"})
    assert q.empty()


@pytest.mark.asyncio
async def test_attach_replays_then_signals_caught_up():
    s = sessionmod.Session("s1")
    s.add_event({"kind": "user", "text": "diagnose"})
    s.add_event({"kind": "text", "text": "looking"})
    frames = []
    async for frame in sessionmod.attach_stream(s, last_event_id=None):
        frames.append(frame)
        if "caught-up" in frame:
            break
    body = "".join(frames)
    assert "diagnose" in body and "looking" in body
    assert "id: 0" in body and "id: 1" in body
    assert "event: caught-up" in frames[-1]


@pytest.mark.asyncio
async def test_attach_reconnect_replays_only_missed():
    s = sessionmod.Session("s1")
    for i in range(3):
        s.add_event({"kind": "text", "text": f"e{i}"})  # ids 0,1,2
    frames = []
    async for frame in sessionmod.attach_stream(s, last_event_id=0):  # already saw id 0
        frames.append(frame)
        if "caught-up" in frame:
            break
    body = "".join(frames)
    assert "e0" not in body  # not re-sent
    assert "e1" in body and "e2" in body


# --------------------------------------------------------------------------- #
# Session: running a detached turn (mocked subprocess).
# --------------------------------------------------------------------------- #
@pytest.mark.asyncio
async def test_turn_streams_events_into_log(monkeypatch):
    _patch_proc(monkeypatch, _TURN_LINES)
    s = sessionmod.Session("s1")
    assert s.start_turn("diagnose the devvm") is True
    await s._turn  # wait for the detached turn to finish
    kinds = [e["kind"] for e in s.events]
    assert kinds[0] == "user"
    assert "session" in kinds and "text" in kinds and "tool" in kinds
    assert "result" in kinds and kinds[-1] == "turn_end"
    assert "thinking_tokens" not in kinds


@pytest.mark.asyncio
async def test_one_turn_at_a_time(monkeypatch):
    _patch_proc(monkeypatch, _TURN_LINES)
    s = sessionmod.Session("s1")
    assert s.start_turn("first") is True
    assert s.start_turn("second") is False  # task not done yet
    await s._turn


@pytest.mark.asyncio
async def test_resume_after_first_turn(monkeypatch):
    captured = {"argvs": []}

    async def _fake_spawn(*argv, **kwargs):
        captured["argvs"].append(argv)
        return _FakeProc(_TURN_LINES)

    monkeypatch.setattr(sessionmod.asyncio, "create_subprocess_exec", _fake_spawn)
    s = sessionmod.Session("s1")
    s.start_turn("first"); await s._turn
    s.start_turn("second"); await s._turn
    assert "--session-id" in captured["argvs"][0]
    assert "--resume" in captured["argvs"][1]


# --------------------------------------------------------------------------- #
# SessionManager.
# --------------------------------------------------------------------------- #
def test_manager_create_get():
    m = sessionmod.SessionManager()
    s = m.create()
    assert m.get(s.id) is s
    assert m.get("nope") is None
    assert m.get_or_create(s.id) is s
    assert m.get_or_create(None).id != s.id


# --------------------------------------------------------------------------- #
# PVE verb whitelist (unchanged security boundary).
# --------------------------------------------------------------------------- #
def test_allowed_verbs_match_host_script():
    assert pve.ALLOWED_VERBS == {
        "status", "forensics", "reset", "stop", "start", "cycle"
    }
    assert pve.ALLOWED_VERBS == {"status", "forensics", "reset", "stop", "start", "cycle"}
    assert pve.MUTATING_VERBS == {"reset", "stop", "start", "cycle"}
    assert pve.MUTATING_VERBS < pve.ALLOWED_VERBS


@pytest.mark.parametrize("bad", [
    "rm -rf /", "status; rm -rf /", "status 103", "shutdown", "", "STATUS",
    "cycle 999", "$(reboot)", "../start",
])
@pytest.mark.parametrize("bad", ["rm -rf /", "status; reboot", "status 103", "", "STATUS"])
@pytest.mark.asyncio
async def test_run_verb_rejects_non_whitelisted_without_ssh(bad, monkeypatch):
    """A bad verb must be rejected locally — never spawning a subprocess."""
    called = False

    async def _boom(*a, **k):
        nonlocal called
        called = True
        raise AssertionError("ssh must not run for a rejected verb")

    monkeypatch.setattr(pve.asyncio, "create_subprocess_exec", _boom)
    result = await pve.run_verb(bad)
    assert result["rejected"] is True
    assert result["exit_code"] is None
    assert called is False


@pytest.mark.asyncio
async def test_run_verb_allowed_invokes_ssh_with_bare_verb(monkeypatch):
    captured = {}

    class _FakeProc:
        returncode = 0

        async def communicate(self):
            return (b"status: running\n", b"")

    async def _fake_exec(*argv, **kwargs):
        captured["argv"] = argv
        return _FakeProc()

    monkeypatch.setattr(pve.asyncio, "create_subprocess_exec", _fake_exec)
    result = await pve.run_verb("status")
    assert result["rejected"] is False
    assert result["exit_code"] == 0
    assert "running" in result["stdout"]
    # The verb is the LAST argv element, passed as a single token (no shell).
    assert captured["argv"][-1] == "status"
    assert captured["argv"][0] == "ssh"


# --------------------------------------------------------------------------- #
# stream-json -> UI event translation (pure function).
# translate_event (pure).
# --------------------------------------------------------------------------- #

def test_translate_init_to_session():
    ev = agent_session.translate_event(
def test_translate_init_and_noise_and_blocks():
    assert agent_session.translate_event(
        {"type": "system", "subtype": "init", "session_id": "abc"}
    ) == {"kind": "session", "session_id": "abc"}
    assert agent_session.translate_event({"type": "system", "subtype": "hook_started"}) is None
    assert agent_session.translate_event(
        {"type": "assistant", "message": {"content": [{"type": "text", "text": "hi"}]}}
    ) == {"kind": "text", "text": "hi"}
    tool = agent_session.translate_event(
        {"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "df -h"}}]}}
    )
    assert ev == {"kind": "session", "session_id": "abc"}


@pytest.mark.parametrize("noise", [
    {"type": "system", "subtype": "hook_started"},
    {"type": "system", "subtype": "thinking_tokens", "estimated_tokens": 5},
    {"type": "user", "message": {"content": []}},
    {"type": "unknown"},
])
def test_translate_drops_noise(noise):
    assert agent_session.translate_event(noise) is None


def test_translate_assistant_text():
    ev = agent_session.translate_event({
        "type": "assistant",
        "message": {"content": [{"type": "text", "text": "checking disk"}]},
    })
    assert ev == {"kind": "text", "text": "checking disk"}


def test_translate_assistant_tool_use():
    ev = agent_session.translate_event({
        "type": "assistant",
        "message": {"content": [
            {"type": "tool_use", "name": "Bash", "input": {"command": "df -h"}}
        ]},
    })
    assert ev["kind"] == "tool"
    assert ev["name"] == "Bash"
    assert ev["input"]["command"] == "df -h"


def test_translate_result():
    ev = agent_session.translate_event({
        "type": "result", "is_error": False, "result": "done", "duration_ms": 1234,
    })
    assert ev == {"kind": "result", "is_error": False, "result": "done", "duration_ms": 1234}
    assert tool["kind"] == "tool" and tool["input"]["command"] == "df -h"


# --------------------------------------------------------------------------- #
# Routes + auth.
# --------------------------------------------------------------------------- #

client = TestClient(app)
AUTH = {"Authorization": "Bearer test-token"}


def test_health_no_auth():
    r = client.get("/health")
    assert r.status_code == 200
    assert r.json()["service"] == "claude-breakglass"
    assert client.get("/health").json()["service"] == "claude-breakglass"


def test_api_requires_auth():
    assert client.post("/api/session").status_code == 401
    assert client.get("/api/pve/verbs").status_code == 401
    assert client.post("/api/session/x/prompt", json={"prompt": "hi"}).status_code == 401


def test_api_accepts_bearer():
def test_session_create_and_unknown_session_404():
    r = client.post("/api/session", headers=AUTH)
    assert r.status_code == 200
    assert "session_id" in r.json()
    assert r.status_code == 200 and "session_id" in r.json()
    assert client.post("/api/session/nope/prompt", headers=AUTH, json={"prompt": "x"}).status_code == 404
    assert client.post("/api/session/nope/cancel", headers=AUTH).status_code == 404


def test_api_accepts_authentik_header():
    r = client.post("/api/session", headers={"X-authentik-username": "me@viktorbarzin.me"})
    assert r.status_code == 200
def test_prompt_starts_turn(monkeypatch):
    monkeypatch.setattr(sessionmod.Session, "start_turn", lambda self, *a, **k: True)
    sid = client.post("/api/session", headers=AUTH).json()["session_id"]
    r = client.post(f"/api/session/{sid}/prompt", headers=AUTH, json={"prompt": "diagnose"})
    assert r.status_code == 200 and r.json()["status"] == "started"


def test_pve_verb_route_rejects_unknown():
    r = client.post("/api/pve/destroy", headers=AUTH)
    assert r.status_code == 400
def test_prompt_409_when_turn_active(monkeypatch):
    monkeypatch.setattr(sessionmod.Session, "start_turn", lambda self, *a, **k: False)
    sid = client.post("/api/session", headers=AUTH).json()["session_id"]
    r = client.post(f"/api/session/{sid}/prompt", headers=AUTH, json={"prompt": "x"})
    assert r.status_code == 409


def test_pve_verbs_listing():
    r = client.get("/api/pve/verbs", headers=AUTH)
    assert r.status_code == 200
    body = r.json()
    assert set(body["verbs"]) == pve.ALLOWED_VERBS
    assert set(body["mutating"]) == pve.MUTATING_VERBS


def test_chat_streams_sse(monkeypatch):
    async def _fake_turn(session_id, prompt, model=None):
        yield {"kind": "session", "session_id": session_id}
        yield {"kind": "text", "text": "hello"}
        yield {"kind": "result", "is_error": False, "result": "ok"}

    monkeypatch.setattr(agent_session, "run_turn", _fake_turn)
    r = client.post("/api/chat", headers=AUTH,
                    json={"session_id": "s1", "prompt": "diagnose"})
    assert r.status_code == 200
    assert "text/event-stream" in r.headers["content-type"]
    body = r.text
    assert "hello" in body
    assert '"kind": "done"' in body  # terminal frame always emitted
def test_pve_verbs_listing_and_unknown_rejected():
    assert set(client.get("/api/pve/verbs", headers=AUTH).json()["verbs"]) == pve.ALLOWED_VERBS
    assert client.post("/api/pve/destroy", headers=AUTH).status_code == 400
|
|
|
|||
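`test_chat_streams_sse` requires the `/api/chat` stream to end with a `done` frame even though the fake `run_turn` never yields one. A minimal sketch of one way to guarantee that, assuming the route JSON-encodes each event from an async iterator as an SSE `data:` line. `sse_frames`, `_frame` and the `error` frame shape are hypothetical and not taken from the service:

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List


def _frame(event: Dict) -> str:
    # Default json.dumps separators yield `"kind": "done"`, the substring the test checks.
    return f"data: {json.dumps(event)}\n\n"


async def sse_frames(events: AsyncIterator[Dict]) -> AsyncIterator[str]:
    """Hypothetical helper: relay agent events as SSE frames, always ending with done."""
    try:
        async for event in events:
            yield _frame(event)
    except Exception as exc:
        # A failing turn becomes an error frame instead of a truncated stream.
        yield _frame({"kind": "error", "error": str(exc)})
    # Reached after normal exhaustion and after a handled error alike.
    yield _frame({"kind": "done"})


async def _collect(agen: AsyncIterator[str]) -> List[str]:
    return [frame async for frame in agen]


async def _ok_turn():
    yield {"kind": "text", "text": "hello"}


async def _failing_turn():
    yield {"kind": "text", "text": "hello"}
    raise RuntimeError("boom")


ok_frames = asyncio.run(_collect(sse_frames(_ok_turn())))
failed_frames = asyncio.run(_collect(sse_frames(_failing_turn())))
```

Emitting `done` after the `try` rather than inside a `finally` avoids yielding while the generator is being closed (for example on client disconnect), which Python rejects with a `RuntimeError`; `GeneratorExit` and cancellation are not caught by `except Exception`, so they still propagate.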
256
tests/test_conversational.py
Normal file
@ -0,0 +1,256 @@
"""Tests for the conversational (no-tools, multi-turn) brain endpoint.

This is the portal-assistant "Brain": a lean path that drives the Claude CLI with
a no-tools conversational agent and per-conversation `--resume`, used by the voice
gateway. Unlike /v1/chat/completions it does NOT clone a workspace or run a
tool-enabled agent (see portal-assistant ADR-0002).
"""
import json
from unittest.mock import AsyncMock, patch

import pytest
from httpx import ASGITransport, AsyncClient

from app import conversational
from app.main import app


# --------------------------------------------------------------------------- #
# argv builder
# --------------------------------------------------------------------------- #
def test_conversational_argv_new_session():
    argv = conversational_argv_call(resume=False)
    assert argv[0] == "claude"
    assert "-p" in argv
    assert argv[argv.index("--agent") + 1] == "conversational"
    # a new conversation opens with --session-id, never --resume
    assert argv[argv.index("--session-id") + 1] == "sess-1"
    assert "--resume" not in argv
    # SECURITY: a public-facing endpoint must NOT skip tool permissions
    assert "--dangerously-skip-permissions" not in argv
    assert argv[argv.index("--model") + 1] == "sonnet"
    assert argv[argv.index("--output-format") + 1] == "json"
    # latency: trims project CLAUDE.md/MCP + dynamic system-prompt sections off
    # the no-tools voice turn (~45k -> ~23k input tokens, ~1.3s faster TTFT)
    assert argv[argv.index("--setting-sources") + 1] == "user"
    assert "--exclude-dynamic-system-prompt-sections" in argv
    assert argv[-1] == "Hi there"


def test_conversational_argv_resume_continues_session():
    argv = conversational_argv_call(resume=True)
    # a follow-up turn resumes the existing claude session
    assert argv[argv.index("--resume") + 1] == "sess-1"
    assert "--session-id" not in argv


def conversational_argv_call(resume: bool):
    from app.conversational import conversational_argv
    return conversational_argv(
        session_id="sess-1", message="Hi there", model="sonnet", resume=resume
    )


# --------------------------------------------------------------------------- #
# endpoint
# --------------------------------------------------------------------------- #
class _AsyncLineIter:
    """Async iterator over a list of byte lines — mimics `proc.stdout`."""

    def __init__(self, lines: list[bytes]):
        self._lines = list(lines)
        self._i = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._i >= len(self._lines):
            raise StopAsyncIteration
        line = self._lines[self._i]
        self._i += 1
        return line


def _mock_subprocess_returning(output: bytes, returncode: int = 0):
    proc = AsyncMock()
    lines = [chunk + b"\n" for chunk in output.split(b"\n") if chunk]
    proc.stdout = _AsyncLineIter(lines)
    proc.stderr = AsyncMock()
    proc.stderr.read = AsyncMock(return_value=b"")
    proc.wait = AsyncMock(return_value=returncode)
    proc.returncode = returncode
    return proc


@pytest.fixture(autouse=True)
def _reset_sessions():
    conversational.reset_started()
    yield
    conversational.reset_started()


@pytest.fixture
def auth_header():
    return {"Authorization": "Bearer test-token"}


@pytest.mark.asyncio
async def test_conversational_happy_path(auth_header):
    """A message in → the assistant's reply out, keyed to the session."""
    cli_output = json.dumps({
        "type": "result",
        "is_error": False,
        "result": "Здравейте! Как мога да помогна?",
        "session_id": "sess-1",
    }).encode()
    mock_proc = _mock_subprocess_returning(cli_output, returncode=0)

    with patch("app.conversational.asyncio.create_subprocess_exec", return_value=mock_proc):
        transport = ASGITransport(app=app)
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            response = await client.post(
                "/v1/conversational",
                json={"session_id": "sess-1", "message": "Здравей"},
                headers=auth_header,
            )

    assert response.status_code == 200, response.text
    body = response.json()
    assert body["session_id"] == "sess-1"
    assert body["reply"] == "Здравейте! Как мога да помогна?"


@pytest.mark.asyncio
async def test_conversational_resumes_on_second_turn(auth_header):
    """First turn opens the session (--session-id); a second turn on the same
    session id resumes it (--resume) — this is what makes it a conversation."""
    calls: list[tuple] = []

    def fake_spawn(*args, **kwargs):
        calls.append(args)
        out = json.dumps({"type": "result", "is_error": False, "result": "ok"}).encode()
        return _mock_subprocess_returning(out, returncode=0)

    with patch("app.conversational.asyncio.create_subprocess_exec", side_effect=fake_spawn):
        transport = ASGITransport(app=app)
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            for _ in range(2):
                r = await client.post(
                    "/v1/conversational",
                    json={"session_id": "sess-X", "message": "hi"},
                    headers=auth_header,
                )
                assert r.status_code == 200, r.text

    assert "--session-id" in calls[0] and "--resume" not in calls[0]
    assert "--resume" in calls[1] and "--session-id" not in calls[1]


@pytest.mark.asyncio
async def test_conversational_requires_auth():
    """No bearer token → 401, same as the other endpoints."""
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        r = await client.post(
            "/v1/conversational",
            json={"session_id": "s", "message": "hi"},
        )
        assert r.status_code == 401


@pytest.mark.asyncio
async def test_conversational_returns_503_on_failure(auth_header):
    """A non-zero claude exit surfaces as 503 execution-failed."""
    mock_proc = _mock_subprocess_returning(b"", returncode=7)
    mock_proc.stderr.read = AsyncMock(return_value=b"boom")

    with patch("app.conversational.asyncio.create_subprocess_exec", return_value=mock_proc):
        transport = ASGITransport(app=app)
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            r = await client.post(
                "/v1/conversational",
                json={"session_id": "s", "message": "x"},
                headers=auth_header,
            )
            assert r.status_code == 503
            assert r.json()["error"] == "execution failed"


# --------------------------------------------------------------------------- #
# streaming helpers (OpenAI-compatible token relay for the realtime voice agent)
# --------------------------------------------------------------------------- #
from collections import namedtuple  # noqa: E402

_Msg = namedtuple("_Msg", "role content")


def test_stream_argv_uses_stream_json_and_is_stateless():
    argv = conversational.stream_argv("hello", "sonnet")
    assert argv[:2] == ["claude", "-p"]
    assert "--agent" in argv and "conversational" in argv
    assert "stream-json" in argv
    assert "--include-partial-messages" in argv
    assert "--verbose" in argv
    assert "--model" in argv and "sonnet" in argv
    # latency: same lean-context trim as the gateway path
    assert argv[argv.index("--setting-sources") + 1] == "user"
    assert "--exclude-dynamic-system-prompt-sections" in argv
    assert argv[-1] == "hello"
    # stateless + no tools
    assert "--resume" not in argv and "--session-id" not in argv
    assert "--dangerously-skip-permissions" not in argv


def test_delta_text_extracts_content_block_delta():
    line = json.dumps({
        "type": "stream_event",
        "event": {"type": "content_block_delta",
                  "delta": {"type": "text_delta", "text": "Слон"}},
    })
    assert conversational.delta_text(line) == "Слон"


def test_delta_text_ignores_non_text_events():
    for ev in [
        {"type": "system"},
        {"type": "stream_event", "event": {"type": "message_start"}},
        {"type": "stream_event", "event": {"type": "content_block_delta",
                                           "delta": {"type": "input_json_delta", "partial_json": "{"}}},
        {"type": "result"},
    ]:
        assert conversational.delta_text(json.dumps(ev)) is None
    assert conversational.delta_text("") is None
    assert conversational.delta_text("not json") is None


def test_openai_chunk_valid_sse_and_keeps_cyrillic():
    s = conversational.openai_chunk("chatcmpl-x", "sonnet", 123, content="две")
    assert s.startswith("data: ") and s.endswith("\n\n")
    payload = json.loads(s[len("data: "):].strip())
    assert payload["object"] == "chat.completion.chunk"
    assert payload["choices"][0]["delta"]["content"] == "две"
    assert payload["choices"][0]["finish_reason"] is None
    assert "две" in s  # not unicode-escaped


def test_openai_chunk_role_and_finish():
    role = conversational.openai_chunk("id", "m", 1, role="assistant")
    assert json.loads(role[6:].strip())["choices"][0]["delta"] == {"role": "assistant"}
    stop = conversational.openai_chunk("id", "m", 1, finish_reason="stop")
    c = json.loads(stop[6:].strip())["choices"][0]
    assert c["finish_reason"] == "stop" and c["delta"] == {}


def test_synthesise_chat_prompt_keeps_assistant_turns():
    msgs = [
        _Msg("system", "Be brief."),
        _Msg("user", "Здравей"),
        _Msg("assistant", "Здравей! Как си?"),
        _Msg("user", "Добре, ти?"),
    ]
    p = conversational.synthesise_chat_prompt(msgs)
    assert "Be brief." in p
    assert "User: Здравей" in p
    assert "Assistant: Здравей! Как си?" in p
    assert p.strip().endswith("User: Добре, ти?")
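The argv tests pin down how a gateway turn reaches the CLI: a new conversation opens with `--session-id`, a follow-up on the same id uses `--resume`, and the message is always the last argument. The streaming helpers instead stay stateless and rebuild context on every call by flattening the message history into one prompt. A minimal sketch of both, reconstructed only from the assertions above and not from `app.conversational` itself. `build_argv`, `argv_for_turn`, `mark_started` and `Message` are hypothetical names, the flag order is an assumption, and recording a session only after a successful turn is a choice this sketch makes rather than something the tests show:

```python
from typing import List, NamedTuple, Sequence, Set

# Hypothetical in-process record of conversation ids that already have a
# Claude session (the tests clear the real one with `reset_started()`).
_started: Set[str] = set()


def reset_started() -> None:
    _started.clear()


def build_argv(session_id: str, message: str, model: str, resume: bool) -> List[str]:
    argv = [
        "claude", "-p",
        "--agent", "conversational",  # the no-tools agent definition
        "--model", model,
        "--output-format", "json",
        # Lean context for the voice turn, as asserted in the tests.
        "--setting-sources", "user",
        "--exclude-dynamic-system-prompt-sections",
    ]
    # First turn creates the session under a known id; later turns resume it.
    argv += ["--resume", session_id] if resume else ["--session-id", session_id]
    # Deliberately no --dangerously-skip-permissions on a public endpoint.
    argv.append(message)  # the prompt is always the final argument
    return argv


def argv_for_turn(session_id: str, message: str, model: str = "sonnet") -> List[str]:
    return build_argv(session_id, message, model, resume=session_id in _started)


def mark_started(session_id: str) -> None:
    # Call only after the CLI exits 0, so a failed first turn retries with --session-id.
    _started.add(session_id)


class Message(NamedTuple):
    role: str
    content: str


def synthesise_chat_prompt(messages: Sequence[Message]) -> str:
    # Stateless path: system text first, then the transcript, newest turn last.
    system = [m.content for m in messages if m.role == "system"]
    labels = {"user": "User", "assistant": "Assistant"}
    turns = [f"{labels[m.role]}: {m.content}" for m in messages if m.role in labels]
    return "\n\n".join(system + turns)


first = argv_for_turn("sess-X", "hi")
mark_started("sess-X")
second = argv_for_turn("sess-X", "hi again")
prompt = synthesise_chat_prompt([
    Message("system", "Be brief."),
    Message("user", "Hello"),
    Message("assistant", "Hi! How are you?"),
    Message("user", "Fine, you?"),
])
```

The stateful path keeps each request small because the CLI reloads prior turns from its own session, while the flattened prompt trades a longer input for not needing any server-side session state.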
@ -98,14 +98,15 @@ async def test_chat_completions_happy_path(auth_header):


@pytest.mark.asyncio
async def test_chat_completions_rejects_streaming(auth_header):
    """stream=true is not supported and must 400 with a clear message."""
async def test_chat_completions_streaming_rejects_unsupported_model(auth_header):
    """Streaming is supported now; model validation still runs first, so an
    unsupported model 400s before any CLI is spawned."""
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post(
            "/v1/chat/completions",
            json={
                "model": "haiku",
                "model": "gpt-4",
                "messages": [{"role": "user", "content": "hi"}],
                "stream": True,
            },
@ -113,7 +114,7 @@ async def test_chat_completions_rejects_streaming(auth_header):
        )
    assert response.status_code == 400
    body = response.json()
    assert "streaming not supported" in json.dumps(body).lower()
    assert "unsupported model" in json.dumps(body).lower()


@pytest.mark.asyncio
@ -370,3 +371,58 @@ async def test_chat_completions_response_model_echoes_default_when_missing(auth_
    )
    assert status == 200
    assert body["model"] == "sonnet"


def _delta_line(text: str) -> str:
    return json.dumps({
        "type": "stream_event",
        "event": {"type": "content_block_delta",
                  "delta": {"type": "text_delta", "text": text}},
    })


@pytest.mark.asyncio
async def test_chat_completions_streaming_relays_token_sse(auth_header):
    """stream=true relays CLI stream-json token deltas as OpenAI SSE chunks."""
    cli_output = "\n".join([
        json.dumps({"type": "system"}),
        json.dumps({"type": "stream_event", "event": {"type": "message_start"}}),
        _delta_line("Две"),
        _delta_line(" точки."),
        json.dumps({"type": "result", "subtype": "success"}),
    ]).encode()
    mock_proc = _mock_subprocess_returning(cli_output, returncode=0)

    with patch("app.main.asyncio.create_subprocess_exec", return_value=mock_proc):
        transport = ASGITransport(app=app)
        async with AsyncClient(transport=transport, base_url="http://test") as client:
            response = await client.post(
                "/v1/chat/completions",
                json={
                    "model": "sonnet",
                    "stream": True,
                    "messages": [{"role": "user", "content": "Колко е?"}],
                },
                headers=auth_header,
            )

    assert response.status_code == 200, response.text
    assert response.headers["content-type"].startswith("text/event-stream")
    body = response.text
    assert "chat.completion.chunk" in body
    assert body.rstrip().endswith("data: [DONE]")

    # Reassemble the streamed assistant content from the delta chunks.
    content = ""
    saw_role = False
    for line in body.splitlines():
        if not line.startswith("data: ") or line.strip() == "data: [DONE]":
            continue
        payload = json.loads(line[len("data: "):])
        assert payload["object"] == "chat.completion.chunk"
        delta = payload["choices"][0]["delta"]
        if delta.get("role") == "assistant":
            saw_role = True
        content += delta.get("content", "")
    assert saw_role
    assert content == "Две точки."
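`test_chat_completions_streaming_relays_token_sse` feeds CLI `stream-json` lines through `/v1/chat/completions` and expects an assistant role chunk, one content chunk per text delta, and a closing `data: [DONE]`. A minimal sketch of that relay, assuming a synchronous iterable of already-decoded lines (the service reads `proc.stdout` asynchronously). `delta_text` and `openai_chunk` here mirror the behaviour the unit tests check but are reconstructions, not the module's source; `relay` and `_text_delta_line` are hypothetical, and the `id`/`created`/`index` fields follow the OpenAI `chat.completion.chunk` shape:

```python
import json
from typing import Iterable, Iterator, Optional


def delta_text(line: str) -> Optional[str]:
    """Return the text of a stream-json text_delta event, or None for anything else."""
    try:
        obj = json.loads(line)
    except (ValueError, TypeError):  # empty or non-JSON line
        return None
    if not isinstance(obj, dict) or obj.get("type") != "stream_event":
        return None
    event = obj.get("event")
    if not isinstance(event, dict) or event.get("type") != "content_block_delta":
        return None
    delta = event.get("delta")
    if not isinstance(delta, dict) or delta.get("type") != "text_delta":
        return None  # e.g. input_json_delta from tool use
    text = delta.get("text")
    return text if isinstance(text, str) else None


def openai_chunk(completion_id: str, model: str, created: int, content: Optional[str] = None,
                 role: Optional[str] = None, finish_reason: Optional[str] = None) -> str:
    """Format one OpenAI-style chat.completion.chunk as an SSE data frame."""
    delta = {}
    if role is not None:
        delta["role"] = role
    if content is not None:
        delta["content"] = content
    payload = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    # ensure_ascii=False keeps Cyrillic readable instead of \uXXXX escapes.
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def relay(lines: Iterable[str], completion_id: str, model: str, created: int) -> Iterator[str]:
    """Hypothetical relay: role chunk, one chunk per text delta, stop chunk, [DONE]."""
    yield openai_chunk(completion_id, model, created, role="assistant")
    for line in lines:
        text = delta_text(line)
        if text:
            yield openai_chunk(completion_id, model, created, content=text)
    yield openai_chunk(completion_id, model, created, finish_reason="stop")
    yield "data: [DONE]\n\n"


def _text_delta_line(text: str) -> str:
    return json.dumps({"type": "stream_event",
                       "event": {"type": "content_block_delta",
                                 "delta": {"type": "text_delta", "text": text}}})


cli_lines = [
    json.dumps({"type": "system"}),
    _text_delta_line("Two"),
    _text_delta_line(" points."),
    json.dumps({"type": "result", "subtype": "success"}),
]
body = "".join(relay(cli_lines, "chatcmpl-x", "sonnet", 123))
```

Non-text events (system, message start, tool-input deltas, the final result) are simply dropped, so the voice client only ever receives text it can speak, bracketed by the role and stop chunks.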