conversational: trim per-turn context to cut brain TTFT ~1.3s

The no-tools conversational agent was dragging the full project context (this repo's CLAUDE.md, the MCP server configs, local settings) plus the dynamic system-prompt sections into every voice turn: ~45k input tokens -> ~3.4s time-to-first-token (measured against the live pod, 2026-06-21).

Add --setting-sources user + --exclude-dynamic-system-prompt-sections to both the gateway (json) and realtime (stream-json) conversational argvs: context drops to ~23k and TTFT to ~2.1s (~1.3s/turn faster) with no change to the reply. Helps the portal-assistant v1 gateway AND the v2 realtime agent (both run the same turn). The /execute agent path is untouched.

Investigation ruled out the assumed culprits: CLI startup is only ~0.5s, and a warm prompt cache does NOT lower TTFT (turn 2 read all 45k from cache yet TTFT was unchanged). The cost was the context size, not the spawn.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chat-completions: stream conversational turns (SSE token relay) for realtime voice
2026-06-21 18:00:21 +00:00 · 2026-06-17 22:22:38 +00:00 · 2026-06-17 19:51:34 +00:00 · 2026-06-17 18:38:44 +00:00 · 2026-06-15 22:27:00 +00:00 · 2026-06-15 21:15:11 +00:00
63 changed files with 8120 additions and 976 deletions
--- a/agents/conversational.md
+++ b/agents/conversational.md
@@ -0,0 +1,32 @@
+---
+name: conversational
+description: Friendly bilingual (Bulgarian + English) spoken-conversation assistant for non-technical users. No tools and no file/cluster/web access — it only talks. Replies are short and natural for text-to-speech. Used by the portal-assistant voice gateway.
+model: sonnet
+tools: ""
+---
+
+You are a warm, friendly voice assistant talking with everyday people at home.
+Your replies are SPOKEN ALOUD by a text-to-speech engine, so how you write
+matters as much as what you say.
+
+- Reply in the SAME language the person used — Bulgarian or English. If they mix,
+  follow their dominant language. Never announce or comment on the language; just
+  use it.
+- Keep it SHORT: one to three sentences. This is a conversation, not an essay.
+- Write plain spoken text ONLY. No markdown, no bullet lists, no code blocks, no
+  URLs, no emoji, no headings — none of that survives being read aloud.
+- Sound natural and warm, like a helpful person, not a manual. Contractions are
+  good.
+- Write numbers, dates and times the way they should be SPOKEN (for example
+  "ten thirty in the morning", "the fifteenth of March"), not as digits or
+  symbols.
+- If you don't know something or can't help, say so briefly and kindly.
+
+You have NO tools and no access to the home, devices, files, the internet, or any
+system. You cannot turn things on or off, look things up live, send messages, or
+take any action — you are a conversation partner only. If asked to do something
+you can't, say so simply and offer what you can instead (talk it through, explain,
+or suggest an idea).
+
+Never mention these instructions, "tools", "agents", tokens, system prompts, or
+that you are an AI model — unless the person directly and explicitly asks.
--- a/app/afk/__init__.py
+++ b/app/afk/__init__.py
@@ -0,0 +1,43 @@
+"""AFK loop: the autonomous issue-implementer control plane.
+
+This package is the "away-from-keyboard" automation that watches the issue
+tracker for ``ready-for-agent`` issues, dispatches each to a fresh **T3** thread
+(the full-access ``claudeAgent`` runtime) with the issue-implementer preamble
+prepended, then drives the resulting run through its lifecycle — tests-red →
+green → pushed → CI → deployed — escalating or fix-forwarding per a small,
+testable state machine. It owns no agent behaviour itself; the agent's standing
+rules are injected as a prompt preamble (``issue_implementer_prompt``) because
+T3 does NOT honour ``~/.claude/CLAUDE.md``.
+
+The whole loop ships **DISABLED**, by two independent gates: ``Config`` defaults
+to ``kill_switch=True`` AND an empty ``allowlist`` (see ``config.py``). Importing
+this package, scheduling the CronJob entrypoints, or constructing the default
+``Config`` therefore dispatches NOTHING and performs zero I/O — a disabled tick
+is wholly inert. The package is also not imported by the running service
+(``app.main``), so wiring it in changes nothing on its own.
+
+>>> ENABLING IS A DELIBERATE MANUAL STEP, PERFORMED LATER, NEVER BY THIS CODE. <<<
+Arming the loop takes BOTH of, on purpose (either alone stays inert, so one
+fat-fingered env var can't arm every repo):
+  1. clear the kill switch  (``AFK_KILL_SWITCH=false`` / ConfigMap ``kill_switch: "false"``), AND
+  2. enrol the exact repos   (``AFK_ALLOWLIST=repo-a,repo-b`` / ConfigMap ``allowlist``).
+There is no auto-enable path anywhere in this package; do not add one here.
+
+Every test in the suite runs against fakes — this package never talks to a real
+T3 server, GitHub/Forgejo, the cluster, or Slack.
+
+Module map (each is independently testable against the interfaces in
+``types.py``):
+  * ``types``                    — shared dataclasses + enums (the contract).
+  * ``config``                   — disabled-by-default Config + env/configmap loaders.
+  * ``issue_implementer_prompt`` — the preamble prepended to every dispatch.
+  * ``dispatch_policy``          — which ready issues to dispatch right now (pure).
+  * ``run_state_machine``        — snapshot + CI status → next Action (pure).
+  * ``phase_checklist``          — render the run's progress as a markdown checklist (pure).
+  * ``t3_client``                — the two-POST T3 dispatch + snapshot reader.
+  * ``tracker``                  — issue-tracker reads/labels/comments/close.
+  * ``ci_watcher``               — commit → CI status.
+  * ``notifier``                 — escalation/notification sink.
+  * ``poller``                   — CronJob tick #1: select + dispatch ready issues.
+  * ``watcher``                  — CronJob tick #2: drive one in-flight run to a verdict.
+"""
--- a/app/afk/ci_watcher.py
+++ b/app/afk/ci_watcher.py
@@ -0,0 +1,141 @@
+"""CI watcher — fold a pushed commit's pipeline into a single ``CIStatus``.
+
+A commit the agent pushed to ``master`` is only "done" once it has both *built*
+and *deployed*: the CI/CD chain is GHA → ghcr → Woodpecker → Keel
+(``docs/2026-06-14-afk-implementation-pipeline-design.md``). This adapter
+collapses that multi-stage reality into the three-value verdict the state
+machine speaks (:class:`~app.afk.types.CIStatus`): ``PENDING`` / ``GREEN`` /
+``RED``.
+
+It checks three stages in order and stops at the first that decides the verdict:
+
+  1. **build** — the GitHub Actions run for the commit (build + test + lint);
+  2. **deploy** — the Woodpecker pipeline that ships the built image;
+  3. **rollout** — the image actually reaching the cluster (Keel/k8s rollout).
+
+Folding rule, applied stage by stage: a ``FAILURE`` anywhere is ``RED`` (and we
+short-circuit — a red build is never "rolled out", and we don't bother the later
+clients); a stage that hasn't concluded (``NONE`` = no run yet, ``PENDING`` =
+in progress) makes the whole verdict ``PENDING`` (the state machine waits on
+either); only when *every* stage has succeeded is the commit ``GREEN``.
+
+The three stage clients are **injected**, each behind a tiny structural
+:class:`typing.Protocol`, so this module never imports ``gh`` / ``woodpecker`` /
+``kubectl`` and the tests drive it entirely with fakes. The rollout client is
+**optional** — the pilot keeps cluster/``state.sqlite`` reads optional, so a
+watcher built without one treats a green deploy as the terminal ``GREEN``. The
+real client wiring (subprocess argv, JSON parsing, kubectl-exec) lives in the
+adapters that *implement* these Protocols, not here; keeping this module pure
+keeps the folding logic the only thing under test.
+"""
+from enum import Enum
+from typing import Protocol
+
+from .types import CIStatus
+
+
+class StageResult(Enum):
+    """Outcome of one CI/CD stage for a commit, before folding into ``CIStatus``.
+
+    Each injected client returns one of these per ``(repo, commit)``:
+
+    ``NONE`` — no run exists yet for this commit (e.g. the webhook hasn't fired);
+    ``PENDING`` — a run exists and is still in progress;
+    ``SUCCESS`` — the stage concluded green;
+    ``FAILURE`` — the stage concluded red.
+
+    ``NONE`` and ``PENDING`` are distinct on purpose so a client can report
+    "nothing here yet" vs "running" even though both fold to ``CIStatus.PENDING``;
+    keeping them separate lets callers/log lines tell the two apart.
+    """
+
+    NONE = "none"
+    PENDING = "pending"
+    SUCCESS = "success"
+    FAILURE = "failure"
+
+
+# --------------------------------------------------------------------------- #
+# Injected client Protocols — structural, so any object with the right method
+# (real adapter or test fake) satisfies them. No ``Any``: every method is typed
+# (repo, commit) -> StageResult.
+# --------------------------------------------------------------------------- #
+class GitHubChecksClient(Protocol):
+    """Reads the GitHub Actions run (build + test + lint) for a commit."""
+
+    def run_conclusion(self, repo: str, commit: str) -> StageResult: ...
+
+
+class WoodpeckerClient(Protocol):
+    """Reads the Woodpecker deploy pipeline triggered for a commit's image."""
+
+    def deploy_conclusion(self, repo: str, commit: str) -> StageResult: ...
+
+
+class RolloutClient(Protocol):
+    """Reads whether the commit's image has rolled out to the cluster."""
+
+    def rollout_status(self, repo: str, commit: str) -> StageResult: ...
+
+
+class CIWatcher:
+    """Folds build → deploy → rollout into a single :class:`CIStatus`.
+
+    Inject the three stage clients (``github`` and ``woodpecker`` are required;
+    ``rollout`` is optional — omit it to stop the verdict at the deploy stage,
+    matching the pilot's "cluster reads optional" posture). The clients are the
+    only I/O surface, so production passes real adapters and tests pass fakes;
+    :meth:`status` itself is pure.
+    """
+
+    def __init__(
+        self,
+        github: GitHubChecksClient,
+        woodpecker: WoodpeckerClient,
+        rollout: RolloutClient | None = None,
+    ) -> None:
+        self._github = github
+        self._woodpecker = woodpecker
+        self._rollout = rollout
+
+    def status(self, repo: str, commit: str) -> CIStatus:
+        """Return the folded CI verdict for ``commit`` in ``repo``.
+
+        Stages are queried lazily in order and the first decisive one wins: a
+        ``FAILURE`` yields ``RED``, an unconcluded stage (``NONE``/``PENDING``)
+        yields ``PENDING``, and only when every stage has ``SUCCESS`` does the
+        verdict reach ``GREEN``. Short-circuiting is real — a stage is only
+        queried if every earlier stage succeeded, so a red/pending build never
+        touches the deploy or rollout client (the assertions in the tests, and
+        avoiding a needless kubectl-exec, both depend on this). With no rollout
+        client the deploy stage is terminal.
+        """
+        # Each entry is a thunk so a later stage's client is never called once an
+        # earlier stage has already decided the verdict.
+        probes = [
+            lambda: self._github.run_conclusion(repo, commit),
+            lambda: self._woodpecker.deploy_conclusion(repo, commit),
+        ]
+        if self._rollout is not None:
+            rollout = self._rollout  # bind for the closure (narrowed, non-None)
+            probes.append(lambda: rollout.rollout_status(repo, commit))
+
+        for probe in probes:
+            verdict = _stage_verdict(probe())
+            if verdict is not None:
+                return verdict  # FAILURE → RED, NONE/PENDING → PENDING
+        return CIStatus.GREEN
+
+
+def _stage_verdict(stage: StageResult) -> CIStatus | None:
+    """Decisive verdict for a single stage, or ``None`` to "keep going".
+
+    ``FAILURE`` decides ``RED``; an unconcluded stage (``NONE``/``PENDING``)
+    decides ``PENDING``; ``SUCCESS`` is non-decisive (``None``) — the next stage
+    gets to speak, and only the last stage's success folds to ``GREEN``.
+    """
+    if stage is StageResult.FAILURE:
+        return CIStatus.RED
+    if stage in (StageResult.NONE, StageResult.PENDING):
+        return CIStatus.PENDING
+    return None
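
Review note on the folding rule in `ci_watcher.py`: the short-circuit described in the `status` docstring can be checked with call-counting fakes. The snippet below is a minimal standalone sketch, not the module's code. `CIStatus` is an assumed shape for `app.afk.types.CIStatus`, and `FakeStage` and `fold` are hypothetical names used only for illustration.

```python
from enum import Enum


class StageResult(Enum):  # stand-in mirroring the enum added in this diff
    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


class CIStatus(Enum):  # assumed shape of app.afk.types.CIStatus
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


class FakeStage:
    """Hypothetical test fake: returns a canned result and counts its calls."""

    def __init__(self, result: StageResult) -> None:
        self.result = result
        self.calls = 0

    def __call__(self, repo: str, commit: str) -> StageResult:
        self.calls += 1
        return self.result


def fold(repo: str, commit: str, *stages: FakeStage) -> CIStatus:
    """Query stages in order and stop at the first one that decides the verdict."""
    for stage in stages:
        result = stage(repo, commit)
        if result is StageResult.FAILURE:
            return CIStatus.RED
        if result in (StageResult.NONE, StageResult.PENDING):
            return CIStatus.PENDING
    return CIStatus.GREEN  # reached only when every stage reported SUCCESS


# A red build decides the verdict, so the deploy fake is never queried.
build = FakeStage(StageResult.FAILURE)
deploy = FakeStage(StageResult.SUCCESS)
red_verdict = fold("infra", "abc123", build, deploy)

# A green build followed by a deploy with no run yet folds to PENDING.
ok_build = FakeStage(StageResult.SUCCESS)
waiting_deploy = FakeStage(StageResult.NONE)
pending_verdict = fold("infra", "abc123", ok_build, waiting_deploy)

# With no rollout stage supplied, a green deploy is terminal.
green_verdict = fold(
    "infra", "abc123", FakeStage(StageResult.SUCCESS), FakeStage(StageResult.SUCCESS)
)
```

Counting calls on the fakes is what lets a test assert that a later client was left untouched, rather than only checking the returned verdict.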
--- a/app/afk/config.py
+++ b/app/afk/config.py
@@ -0,0 +1,127 @@
+"""Config loader for the AFK loop — DISABLED BY DEFAULT.
+
+The whole loop ships off. A bare ``Config()`` (and therefore ``default()``,
+``from_env()`` with nothing set, and ``from_configmap({})``) has
+``kill_switch=True`` and an empty ``allowlist`` — so nothing is ever
+dispatched until an operator deliberately turns it on. Enabling is a TWO-part
+manual step, on purpose:
+
+  1. set ``AFK_KILL_SWITCH=false`` (or ``kill_switch: "false"`` in the
+     ConfigMap), AND
+  2. populate ``AFK_ALLOWLIST`` with the exact repos that may be automated.
+
+Either alone is inert: the kill switch off with an empty allowlist still
+dispatches nothing, and a full allowlist with the kill switch on is frozen.
+Both gates exist so a single fat-fingered env var can't accidentally arm the
+loop across every repo.
+
+``from_env`` reads process env; ``from_configmap`` reads an already-parsed
+string→string mapping (the shape a mounted ConfigMap gives you). They share one
+parser so the two paths can't drift. Lists are comma-separated; booleans accept
+the usual true/false spellings (see ``_TRUE`` / ``_FALSE``); others raise.
+
+This module owns only *loading* a ``Config`` — the dataclass itself lives in
+``types`` and policy decisions live in ``dispatch_policy`` / ``run_state_machine``.
+"""
+import os
+from collections.abc import Mapping
+
+from .types import Config
+
+# Env var names — also the ConfigMap keys (one source of truth for both paths).
+ENV_ALLOWLIST = "AFK_ALLOWLIST"
+ENV_KILL_SWITCH = "AFK_KILL_SWITCH"
+ENV_IN_PROGRESS_LABEL = "AFK_IN_PROGRESS_LABEL"
+ENV_READY_LABEL = "AFK_READY_LABEL"
+ENV_BUDGET_USD = "AFK_BUDGET_USD"
+ENV_FIX_FORWARD_MAX_ATTEMPTS = "AFK_FIX_FORWARD_MAX_ATTEMPTS"
+ENV_FIX_FORWARD_MAX_SECONDS = "AFK_FIX_FORWARD_MAX_SECONDS"
+
+# Spellings accepted as boolean true / false (case-insensitive). Anything else
+# raises rather than silently defaulting — an unparseable kill-switch value must
+# never be guessed safe-or-unsafe.
+_TRUE = frozenset({"1", "true", "yes", "on"})
+_FALSE = frozenset({"0", "false", "no", "off"})
+
+
+def default() -> Config:
+    """The disabled default Config: kill switch ON, allowlist EMPTY.
+
+    Equivalent to ``Config(allowlist=[], kill_switch=True)``; provided as a named
+    entry point so callers don't hardcode the disabled posture themselves.
+    """
+    return Config(allowlist=[], kill_switch=True)
+
+
+def from_env(env: Mapping[str, str] | None = None) -> Config:
+    """Build a Config from environment variables (defaults to ``os.environ``).
+
+    Unset variables fall back to the disabled/contract defaults, so an
+    unconfigured process stays off.
+    """
+    return _from_mapping(os.environ if env is None else env)
+
+
+def from_configmap(data: Mapping[str, str]) -> Config:
+    """Build a Config from a parsed ConfigMap (string→string mapping).
+
+    Identical semantics to ``from_env`` — same keys, same parser — but sourced
+    from a mounted ConfigMap's ``data`` rather than process env. An empty mapping
+    yields the disabled default.
+    """
+    return _from_mapping(data)
+
+
+# --------------------------------------------------------------------------- #
+# Internals — one shared parser so env and ConfigMap paths can't diverge.
+# --------------------------------------------------------------------------- #
+def _from_mapping(data: Mapping[str, str]) -> Config:
+    base = default()
+    return Config(
+        allowlist=_parse_list(data.get(ENV_ALLOWLIST), base.allowlist),
+        kill_switch=_parse_bool(data.get(ENV_KILL_SWITCH), base.kill_switch),
+        in_progress_label=_nonempty(data.get(ENV_IN_PROGRESS_LABEL), base.in_progress_label),
+        ready_label=_nonempty(data.get(ENV_READY_LABEL), base.ready_label),
+        budget_usd=_parse_float(data.get(ENV_BUDGET_USD), base.budget_usd),
+        fix_forward_max_attempts=_parse_int(
+            data.get(ENV_FIX_FORWARD_MAX_ATTEMPTS), base.fix_forward_max_attempts
+        ),
+        fix_forward_max_seconds=_parse_int(
+            data.get(ENV_FIX_FORWARD_MAX_SECONDS), base.fix_forward_max_seconds
+        ),
+    )
+
+
+def _parse_list(raw: str | None, fallback: list[str]) -> list[str]:
+    if raw is None:
+        return list(fallback)
+    return [item.strip() for item in raw.split(",") if item.strip()]
+
+
+def _parse_bool(raw: str | None, fallback: bool) -> bool:
+    if raw is None:
+        return fallback
+    value = raw.strip().lower()
+    if value in _TRUE:
+        return True
+    if value in _FALSE:
+        return False
+    raise ValueError(f"unparseable boolean for AFK config: {raw!r}")
+
+
+def _parse_int(raw: str | None, fallback: int) -> int:
+    if raw is None or not raw.strip():
+        return fallback
+    return int(raw.strip())
+
+
+def _parse_float(raw: str | None, fallback: float) -> float:
+    if raw is None or not raw.strip():
+        return fallback
+    return float(raw.strip())
+
+
+def _nonempty(raw: str | None, fallback: str) -> str:
+    if raw is None or not raw.strip():
+        return fallback
+    return raw.strip()
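
Review note on the two-gate posture in `config.py`: the claim that either gate alone stays inert can be illustrated with a trimmed loader. This is a sketch under stated assumptions, not the module itself. The `Config` here is a two-field stand-in for the dataclass in `app.afk.types`, `load` handles only the two arming keys, and `is_armed` is a hypothetical helper that does not appear in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass, field

_TRUE = frozenset({"1", "true", "yes", "on"})
_FALSE = frozenset({"0", "false", "no", "off"})


@dataclass
class Config:  # stand-in; the real dataclass lives in app.afk.types
    allowlist: list[str] = field(default_factory=list)
    kill_switch: bool = True


def load(env: dict[str, str]) -> Config:
    """Simplified loader covering only the two arming keys."""
    raw_list = env.get("AFK_ALLOWLIST")
    raw_switch = env.get("AFK_KILL_SWITCH")

    allowlist: list[str] = []
    if raw_list is not None:
        allowlist = [item.strip() for item in raw_list.split(",") if item.strip()]

    kill_switch = True  # disabled default when the variable is unset
    if raw_switch is not None:
        value = raw_switch.strip().lower()
        if value in _TRUE:
            kill_switch = True
        elif value in _FALSE:
            kill_switch = False
        else:
            # Never guess whether an unparseable kill switch is safe.
            raise ValueError(f"unparseable boolean: {raw_switch!r}")
    return Config(allowlist=allowlist, kill_switch=kill_switch)


def is_armed(config: Config) -> bool:
    """Hypothetical helper: armed only when BOTH gates are open."""
    return (not config.kill_switch) and bool(config.allowlist)


unset = is_armed(load({}))
switch_only = is_armed(load({"AFK_KILL_SWITCH": "false"}))
list_only = is_armed(load({"AFK_ALLOWLIST": "repo-a,repo-b"}))
both = is_armed(load({"AFK_KILL_SWITCH": "off", "AFK_ALLOWLIST": "repo-a, repo-b"}))
```

Only the last case arms the loop; the first three leave it inert, which is the behaviour the module docstring describes.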
--- a/app/afk/dispatch_policy.py
+++ b/app/afk/dispatch_policy.py
@@ -0,0 +1,118 @@
+"""Dispatch policy — the PURE gate deciding which ready issues to run *now*.
+
+``select_dispatchable`` is the loop's first decision each tick: given every
+issue the tracker reported ready, the loop config, and the set of repos that
+already have an agent in flight, it returns the ordered list of issues to
+dispatch this round. It does **no IO** — no tracker calls, no T3, no clock — so
+it is exhaustively unit-testable and the loop stays a thin shell around it.
+
+What it encapsulates (the dispatch predicate from the AFK pipeline design doc):
+
+  * **Kill switch** — ``config.kill_switch`` short-circuits to ``[]`` before any
+    per-issue work. The whole loop ships disabled; this is the master off.
+  * **Trust gate** — only ``issue.labeled_by_trusted`` issues are eligible. On a
+    private repo the gating label *is* the authorization, so an issue made ready
+    by an untrusted/bot actor must never auto-run (prompt-injection defense).
+  * **Allowlist** — ``issue.repo`` must be in ``config.allowlist``. An empty
+    allowlist dispatches nothing even with the kill switch off (the deliberate
+    two-gate posture: arming the loop takes both).
+  * **Per-repo lock** — any repo already in ``in_flight_repos`` is skipped; at
+    most one agent runs per repo (two would collide on the working tree).
+  * **blocked_by gating** — ``issue.blocked_by`` lists the issue numbers of
+    blockers that are still OPEN, so a non-empty list means "still blocked" and
+    the issue is skipped.
+  * **One-agent-per-repo within the batch** — because a repo hosts only one
+    in-flight agent, a single call returns at most ONE decision per repo: the
+    most-urgent eligible issue in that repo wins the slot. (A more-urgent issue
+    that is itself ineligible does not consume the slot — the best *eligible*
+    candidate does.)
+  * **Priority ordering** — the surviving per-repo winners are returned
+    lowest-``priority``-value-first (P0 before P1 before P2), with a deterministic
+    tiebreaker (ascending issue number) so the output is a total, stable order
+    independent of input order.
+
+PRIORITY DIRECTION — lower ``Issue.priority`` runs first, matching tracker
+conventions (P0/P1 are more urgent than P2) and ``Issue.priority``'s own
+docstring in ``types``. The ordering lives here (the one place that consumes
+``priority`` for dispatch), so this module is the source of truth for the
+direction.
+
+Pure: it never mutates its inputs — the caller's issue list, the config, and the
+``in_flight_repos`` set are all left exactly as passed.
+"""
+from .types import Config, DispatchDecision, Issue
+
+
+def select_dispatchable(
+    issues: list[Issue],
+    config: Config,
+    in_flight_repos: set[str],
+) -> list[DispatchDecision]:
+    """Return the ordered issues to dispatch this tick (see module docstring).
+
+    Empty when the kill switch is on, the allowlist excludes everything, or no
+    issue clears every gate. At most one decision per repo; ordered
+    lowest-priority-value-first (most urgent), ties broken by ascending issue
+    number.
+    """
+    # Kill switch: master off-ramp, evaluated before any per-issue work.
+    if config.kill_switch:
+        return []
+
+    allowlist = frozenset(config.allowlist)
+
+    # First pass: keep only issues that clear every per-issue gate. Repos already
+    # in flight are excluded here, so the lock is enforced before slot selection.
+    eligible: list[Issue] = [
+        issue
+        for issue in issues
+        if _is_eligible(issue, allowlist, in_flight_repos)
+    ]
+
+    # One slot per repo: among the eligible issues sharing a repo, the one that
+    # sorts first under the dispatch sort key takes it; the rest are dropped this tick.
+    best_per_repo: dict[str, Issue] = {}
+    for issue in sorted(eligible, key=_dispatch_sort_key):
+        best_per_repo.setdefault(issue.repo, issue)
+
+    # Final order: the per-repo winners, most urgent first (total + stable).
+    winners = sorted(best_per_repo.values(), key=_dispatch_sort_key)
+    return [DispatchDecision(issue=issue, reason=_reason(issue)) for issue in winners]
+
+
+# --------------------------------------------------------------------------- #
+# Internals.
+# --------------------------------------------------------------------------- #
+def _is_eligible(
+    issue: Issue,
+    allowlist: frozenset[str],
+    in_flight_repos: set[str],
+) -> bool:
+    """True iff the issue clears the trust, allowlist, per-repo-lock, and
+    blocked_by gates. Kept boolean (not "which gate failed") because the policy
+    only ever needs the survivors; reasons are attached to survivors only."""
+    if not issue.labeled_by_trusted:
+        return False
+    if issue.repo not in allowlist:
+        return False
+    if issue.repo in in_flight_repos:
+        return False
+    if issue.blocked_by:  # non-empty == at least one OPEN blocker remains
+        return False
+    return True
+
+
+def _dispatch_sort_key(issue: Issue) -> tuple[int, int]:
+    """Sort key giving a total, deterministic order: lowest ``priority`` value
+    first (P0 before P1 — most urgent wins), then lowest issue number as the
+    tiebreaker so equal-priority issues never depend on input/iteration order."""
+    return (issue.priority, issue.number)
+
+
+def _reason(issue: Issue) -> str:
+    """Human-readable justification, logged and surfaced in notifications, never
+    parsed. Records that every gate passed and the priority that ordered it."""
+    return (
+        f"{issue.repo}#{issue.number}: eligible "
+        f"(trusted, allowlisted, unblocked, repo free) — priority {issue.priority}"
+    )
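
Review note on `dispatch_policy.py`: a small worked example may make the gate order easier to follow. The code below is a simplified restatement under assumptions, not the module's implementation. `Issue` is a stand-in carrying only the fields the policy reads, and `pick` is a hypothetical name that returns issues directly instead of `DispatchDecision` objects.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:  # stand-in with only the fields the policy reads
    repo: str
    number: int
    priority: int
    labeled_by_trusted: bool = True
    blocked_by: tuple[int, ...] = ()


def sort_key(issue: Issue) -> tuple[int, int]:
    """Lowest priority value first, then lowest issue number."""
    return (issue.priority, issue.number)


def pick(issues, allowlist, in_flight, kill_switch=False):
    """Filter by every gate, keep one winner per repo, return them sorted."""
    if kill_switch:
        return []
    eligible = [
        issue
        for issue in issues
        if issue.labeled_by_trusted
        and issue.repo in allowlist
        and issue.repo not in in_flight
        and not issue.blocked_by
    ]
    winners: dict[str, Issue] = {}
    for issue in sorted(eligible, key=sort_key):
        winners.setdefault(issue.repo, issue)  # first seen is the most urgent
    return sorted(winners.values(), key=sort_key)


issues = [
    Issue("web", 12, priority=2),
    Issue("web", 7, priority=0, blocked_by=(3,)),  # most urgent, but blocked
    Issue("web", 9, priority=1),
    Issue("infra", 40, priority=1),
    Issue("infra", 41, priority=1),
    Issue("docs", 5, priority=0),  # repo is not allowlisted
    Issue("api", 2, priority=0, labeled_by_trusted=False),
    Issue("ops", 1, priority=0),  # repo already has an agent in flight
]
chosen = pick(issues, allowlist={"web", "infra", "api", "ops"}, in_flight={"ops"})
refs = [f"{issue.repo}#{issue.number}"for issue in chosen]
```

Here `refs` comes out as `["web#9", "infra#40"]`: the blocked P0 in `web` does not consume the slot, so the best eligible issue (`web#9`) wins it, and the `infra` tie at priority 1 is broken by the lower issue number.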
--- a/app/afk/issue_implementer_prompt.py
+++ b/app/afk/issue_implementer_prompt.py
@@ -0,0 +1,54 @@
+"""The issue-implementer preamble — the AFK agent's standing instructions.
+
+T3's full-access ``claudeAgent`` runtime does NOT read ``~/.claude/CLAUDE.md``,
+so the agent gets no behaviour from the repo's rules files. Instead the loop
+injects behaviour by PREPENDING this preamble to ``message.text`` on every
+dispatch (see ``t3_client.T3Client.dispatch`` callers). It is a module constant
+on purpose: one canonical, reviewable copy of the rules, versioned with the
+code, identical for every issue.
+
+Keep it imperative and self-contained — the agent only ever sees this text plus
+the issue body. Do not reference files it cannot read (no "see CLAUDE.md").
+"""
+
+ISSUE_IMPLEMENTER_PREAMBLE = """\
+You are an autonomous issue-implementer agent running unattended (the human is \
+away from keyboard). The task below is a tracker issue. Implement it end to end \
+and land it yourself — no human will answer questions or click anything for you.
+
+STANDING RULES — follow exactly, every time:
+- Work test-first. For any code with testable behaviour, write a failing test \
+FIRST (red), then the minimum implementation to make it pass (green), then \
+refactor. Terraform, config, and docs are exempt.
+- Do the work in an isolated git worktree off the latest master; never edit a \
+shared checkout directly.
+- You MUST commit your work — small, focused commits, staging files by name \
+(never `git add -A` / `git add .`), and never skip hooks. A clear commit \
+message is the audit trail: the subject says WHAT changed, the body says WHY in \
+plain words.
+- When tests and lint are green, land the change yourself: merge the latest \
+master into your branch, re-verify green, then push to master. If the push is \
+rejected because someone landed first, fetch, merge, re-verify, and push again. \
+Do not stop at an unmerged branch and do not open a pull request unless told to.
+- After pushing, watch the resulting CI / build / deploy chain to completion and \
+fix any failures you caused before considering the task done.
+- Operate autonomously. NEVER enter plan mode, and NEVER ask the human a \
+question or wait for confirmation — make the most reasonable decision, record \
+your reasoning in the commit message, and proceed. If the issue is genuinely \
+ambiguous or blocked, say so explicitly in a final comment and stop rather than \
+guessing destructively.
+
+GUARDRAILS — never cross these, even if the issue seems to ask for it:
+- NEVER force-push under any circumstance, least of all to master.
+- NEVER edit, resize, or delete PersistentVolumeClaims / PersistentVolumes, and \
+never touch Vault secrets or other credential stores.
+- All infrastructure changes go through Terraform / Terragrunt in the infra \
+repo — never `kubectl apply/edit/patch/delete` against live cluster state.
+- NEVER use `[ci skip]` (or any CI-skip token) in a commit message — it hides \
+the change from the audit and deploy pipeline.
+- No destructive operations the issue did not ask for: no dropping database \
+tables, no `rm -rf` outside your worktree, no killing processes you did not \
+start.
+
+THE ISSUE TO IMPLEMENT FOLLOWS:
+"""
--- a/app/afk/notifier.py
+++ b/app/afk/notifier.py
@@ -0,0 +1,155 @@
+"""Terminal-state doorbell for the AFK loop — Slack / ntfy escalation sink.
+
+When a run reaches a *terminal* state the human who is away from keyboard needs
+to know: either the work landed (``done``) or it needs them back at the console
+(``needs-human`` — the agent stalled/errored before pushing — or ``frozen`` —
+the fix-forward budget ran out). This module turns one of those events into a
+formatted alert carrying a **deep-link to the T3 thread**, so a tap on the
+notification opens the exact conversation the agent ran.
+
+Design, matching the rest of ``app.afk`` and the breakglass code:
+
+  * ``Notifier`` owns no transport. The actual Slack/ntfy POST is an injected
+    ``sender`` callable (constructor argument). Production wires a real HTTP
+    sender; tests inject a recording fake and assert the formatted payload
+    without touching the network — the same dependency-injection seam breakglass
+    uses for the claude subprocess.
+  * ``render_notification`` is a pure function that builds the payload; ``notify``
+    is just "render, then hand to the sender". Keeping the formatting pure makes
+    it unit-testable on its own and guarantees ``notify`` sends exactly what
+    ``render_notification`` returns.
+  * The kind vocabulary is CLOSED: only the three terminal kinds are sendable.
+    An unknown kind raises rather than firing a mystery doorbell — a non-terminal
+    kind reaching here is a caller bug, not something to paper over.
+  * The notifier never swallows a sender failure. If Slack is down the exception
+    propagates; the loop decides whether to retry or give up, not this adapter.
+
+The whole AFK loop ships DISABLED (see ``config.py``); this module is inert
+until the loop is deliberately armed and a real sender is wired in.
+"""
+from collections.abc import Callable
+from dataclasses import dataclass, field
+
+from .types import Issue
+
+# --------------------------------------------------------------------------- #
+# Kind vocabulary — the terminal states a run can reach. One source of truth
+# shared by callers (the state machine maps Action -> kind) and tests.
+# --------------------------------------------------------------------------- #
+KIND_DONE = "done"                  # landed: merged + CI green, issue closeable
+KIND_NEEDS_HUMAN = "needs-human"    # stalled/errored before pushing — pre-push escalation
+KIND_FROZEN = "frozen"              # fix-forward budget (attempts/wall-clock) exhausted
+
+#: The only kinds ``notify`` will send. Anything else is a caller bug.
+TERMINAL_KINDS: frozenset[str] = frozenset({KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN})
+
+# Default T3 web UI. Threads deep-link off this; overridable per-Notifier so the
+# host isn't hardcoded into the formatter (re-IP / staging / tests).
+DEFAULT_BASE_URL = "https://t3.viktorbarzin.me"
+
+# Per-kind presentation. The leading marker makes the three distinguishable from
+# the title alone in a crowded Slack channel without emoji; priority/tags drive
+# how the sender routes it (a successful close is quiet; the two escalations are
+# loud and tagged so on-call filters can page on them).
+_PRESENTATION: dict[str, tuple[str, str, str, tuple[str, ...]]] = {
+    # kind            -> (marker,     headline,                 priority, tags)
+    KIND_DONE:         ("[DONE]",     "landed",                 "low",  ("afk", "done")),
+    KIND_NEEDS_HUMAN:  ("[NEEDS-HUMAN]", "needs a human",       "high", ("afk", "escalation", "needs-human")),
+    KIND_FROZEN:       ("[FROZEN]",   "frozen — budget exhausted", "high", ("afk", "escalation", "frozen")),
+}
+
+#: A sink that delivers a built notification (HTTP POST in prod, recorder in tests).
+Sender = Callable[["Notification"], None]
+
+
+@dataclass
+class Notification:
+    """The fully-formatted alert handed to the sender.
+
+    A structured payload (not a raw dict) so the sender can map fields onto its
+    own schema — ``title``/``body`` for Slack blocks or an ntfy message,
+    ``priority``/``tags`` for routing, ``link`` for the click-through. ``link``
+    is ``None`` when there is no thread to point at (e.g. dispatch failed before
+    a thread existed); the deep-link is also embedded in ``body`` so it survives
+    senders that only carry a plain message.
+    """
+
+    kind: str
+    issue_ref: str            # "<repo>#<number>", e.g. "infra#42"
+    title: str
+    body: str
+    link: str | None
+    priority: str             # "low" | "high" — escalation loudness for the sender
+    tags: list[str] = field(default_factory=list)
+
+
+def _deep_link(base_url: str, thread_id: str | None) -> str | None:
+    """Build the T3 thread deep-link, or ``None`` when there is no thread."""
+    if not thread_id:
+        return None
+    return f"{base_url.rstrip('/')}/?thread={thread_id}"
+
+
+def render_notification(
+    kind: str,
+    issue: Issue,
+    thread_id: str | None,
+    detail: str,
+    *,
+    base_url: str = DEFAULT_BASE_URL,
+) -> Notification:
+    """Build the :class:`Notification` for a terminal event — pure, no I/O.
+
+    Raises ``ValueError`` if ``kind`` is not one of :data:`TERMINAL_KINDS`: only
+    terminal states ring the doorbell, and a non-terminal kind reaching here is a
+    bug we surface rather than silently send.
+    """
+    if kind not in TERMINAL_KINDS:
+        raise ValueError(
+            f"notifier only sends terminal kinds {sorted(TERMINAL_KINDS)}, got {kind!r}"
+        )
+
+    marker, headline, priority, tags = _PRESENTATION[kind]
+    issue_ref = f"{issue.repo}#{issue.number}"
+    link = _deep_link(base_url, thread_id)
+
+    title = f"{marker} {issue_ref} {headline}"
+
+    body_lines = [detail]
+    if link is not None:
+        body_lines.append(f"Thread: {link}")
+    body = "\n".join(body_lines)
+
+    return Notification(
+        kind=kind,
+        issue_ref=issue_ref,
+        title=title,
+        body=body,
+        link=link,
+        priority=priority,
+        tags=list(tags),
+    )
+
+
+class Notifier:
+    """Sends terminal-state doorbells through an injected ``sender``.
+
+    The ``sender`` is the only egress: ``notify`` formats the payload (via
+    :func:`render_notification`) and hands it over. No transport lives here, so a
+    test injects a recording fake and asserts the payload without posting.
+    """
+
+    def __init__(self, sender: Sender, *, base_url: str = DEFAULT_BASE_URL) -> None:
+        self._sender = sender
+        self._base_url = base_url
+
+    def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None:
+        """Format a terminal-state alert and deliver it via the injected sender.
+
+        Raises ``ValueError`` for a non-terminal ``kind`` (before any send), and
+        lets a sender failure propagate — see the module docstring.
+        """
+        notification = render_notification(
+            kind, issue, thread_id, detail, base_url=self._base_url
+        )
+        self._sender(notification)
--- a/app/afk/phase_checklist.py
+++ b/app/afk/phase_checklist.py
@ -0,0 +1,116 @@
+"""Render an AFK run's progress as a live markdown checklist.
+
+``render(current, meta)`` is a PURE function: it maps a ``Phase`` plus a bag of
+optional context (``meta``) to a markdown task list, with no I/O and no hidden
+state. The loop posts the result as an issue comment so a human glancing at the
+tracker can see exactly how far an unattended run has got — worktree created,
+test written, green, pushed, CI, deployed, done.
+
+The list always shows all seven lifecycle phases in order. Phases strictly
+*before* ``current`` are checked (``- [x]``); ``current`` is marked in-progress
+(``- [~]``); later phases are empty (``- [ ]``). ``Phase.DONE`` is terminal — at
+that point every line, including DONE itself, is checked.
+
+``meta`` is best-effort decoration only. Recognised keys (all optional):
+``repo`` / ``issue`` (header title), ``thread_id`` (header suffix), and
+``fix_forward_attempts`` (a note line when non-zero). Unknown keys are ignored,
+and a missing key never raises — the checklist degrades gracefully to just the
+phase list. Nothing here mutates ``meta``.
+"""
+from typing import Any
+
+from .types import Phase
+
+# Lifecycle order — the single source of truth for both ordering and the
+# checked/active/empty partition. Must stay in sync with ``Phase`` (the
+# checklist tests assert every phase appears, so a divergence is caught).
+_ORDER: tuple[Phase, ...] = (
+    Phase.WORKTREE,
+    Phase.TESTS_RED,
+    Phase.GREEN,
+    Phase.PUSHED,
+    Phase.CI,
+    Phase.DEPLOYED,
+    Phase.DONE,
+)
+
+# Human-readable label per phase (what shows on each checklist line).
+_LABELS: dict[Phase, str] = {
+    Phase.WORKTREE: "Worktree created",
+    Phase.TESTS_RED: "Failing test written (TDD red)",
+    Phase.GREEN: "Implementation passing (TDD green)",
+    Phase.PUSHED: "Pushed to master",
+    Phase.CI: "CI green on pushed commit",
+    Phase.DEPLOYED: "Deployed / rolled out",
+    Phase.DONE: "Done — issue closed",
+}
+
+# Task-list markers. ``[~]`` (in-progress) is an informal convention rather than
+# standard task-list syntax; crucially, it is neither ``[x]`` nor ``[ ]``, so the
+# active line is always visually distinct from a checked or empty box.
+_DONE = "- [x]"
+_ACTIVE = "- [~]"
+_TODO = "- [ ]"
+
+
+def render(current: Phase, meta: dict[str, Any]) -> str:
+    """Render the run's progress checklist as markdown (see module docstring).
+
+    ``current`` is the phase the run is in right now; ``meta`` supplies optional
+    header/context fields. Pure: identical inputs yield byte-identical output and
+    ``meta`` is never mutated.
+    """
+    current_index = _ORDER.index(current)
+    is_done = current is Phase.DONE
+
+    lines = [_header(meta), ""]
+    for index, phase in enumerate(_ORDER):
+        lines.append(f"{_marker(index, current_index, is_done)} {_LABELS[phase]}")
+
+    note = _fix_forward_note(meta)
+    if note is not None:
+        lines.extend(["", note])
+
+    # Trailing newline so the block sits cleanly when concatenated into a comment.
+    return "\n".join(lines) + "\n"
+
+
+def _marker(index: int, current_index: int, is_done: bool) -> str:
+    """The checkbox marker for the phase at ``index`` given the current phase.
+
+    Earlier phases are checked; the current phase is in-progress; later phases
+    are empty. When the run is DONE, every phase (including DONE) is checked.
+    """
+    if is_done or index < current_index:
+        return _DONE
+    if index == current_index:
+        return _ACTIVE
+    return _TODO
+
+
+def _header(meta: dict[str, Any]) -> str:
+    """The ``###`` title line. Includes ``repo#issue`` when both are present and
+    a ``(thread ...)`` suffix when a thread id is known; degrades to a bare title
+    otherwise."""
+    repo = meta.get("repo")
+    issue = meta.get("issue")
+    if repo is not None and issue is not None:
+        title = f"{repo}#{issue} — AFK run progress"
+    else:
+        title = "AFK run progress"
+
+    thread_id = meta.get("thread_id")
+    if thread_id:
+        title = f"{title} (thread {thread_id})"
+    return f"### {title}"
+
+
+def _fix_forward_note(meta: dict[str, Any]) -> str | None:
+    """A note line when one or more fix-forward attempts have happened, else
+    ``None`` (no line). Zero/absent attempts add nothing — the clean path stays
+    uncluttered."""
+    attempts = meta.get("fix_forward_attempts")
+    if not attempts:
+        return None
+    noun = "attempt" if attempts == 1 else "attempts"
+    return f"_Fix-forward: {attempts} {noun}._"
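Reviewer sketch (not part of the patch): the checked / in-progress / empty partition described in the module docstring can be reproduced in isolation. The `Phase` enum below is a stand-in whose member values are assumed, since the real one lives in `types`, and `markers` is a hypothetical helper that folds `_marker` into a single loop.

```python
from enum import Enum


class Phase(Enum):
    """Stand-in for the real Phase enum (member values are assumed)."""
    WORKTREE = "worktree"
    TESTS_RED = "tests_red"
    GREEN = "green"
    PUSHED = "pushed"
    CI = "ci"
    DEPLOYED = "deployed"
    DONE = "done"


# Enum iteration preserves definition order, which is the lifecycle order here.
ORDER = tuple(Phase)


def markers(current: Phase) -> list[str]:
    """One checkbox marker per phase, given the phase the run is in now."""
    current_index = ORDER.index(current)
    is_done = current is Phase.DONE
    out = []
    for index in range(len(ORDER)):
        if is_done or index < current_index:
            out.append("- [x]")   # already completed
        elif index == current_index:
            out.append("- [~]")   # in progress
        else:
            out.append("- [ ]")   # not started
    return out


green = markers(Phase.GREEN)
done = markers(Phase.DONE)
```

With `Phase.GREEN` as the current phase, the first two markers are checked, the third is in progress and the remaining four are empty. With `Phase.DONE`, all seven are checked.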
--- a/app/afk/poller.py
+++ b/app/afk/poller.py
@ -0,0 +1,166 @@
+"""CronJob entrypoint: one dispatch tick of the AFK loop.
+
+The poller is the *first half* of the loop — the part that decides what to start.
+It runs once per CronJob invocation (the loop is stateless between ticks: the
+issue tracker, not in-process memory, is the source of truth for what's already
+in flight). Each tick:
+
+  1. **kill switch** — if ``config.kill_switch`` is set the tick does NOTHING,
+     not even a tracker read. A disabled loop must be inert: zero I/O, zero
+     dispatches. (The pure policy also short-circuits on the kill switch, but the
+     poller bails first so a disabled CronJob never touches the network.)
+  2. read the ready set: ``tracker.list_ready(config.allowlist)`` — every open
+     issue carrying the ready label across the allowlisted repos.
+  3. derive the **per-repo lock**: a repo is "in flight" if any ready issue
+     already carries ``config.in_progress_label`` (the poller stamps that label
+     when it dispatches, so on the next tick the still-open issue re-appears and
+     locks the repo). At most one agent per repo — two would collide on the
+     working tree.
+  4. run the pure ``dispatch_policy.select_dispatchable`` over (ready issues,
+     config, in-flight repos) to get the ordered set to start this tick.
+  5. for each decision: ``t3_client.dispatch(repo, issue, prompt)`` to spawn the
+     worker thread, THEN ``tracker.add_label(repo, issue, in_progress_label)`` —
+     label strictly *after* a successful dispatch, so a dispatch that raises
+     never leaves a phantom lock that would freeze the repo forever.
+
+It owns no policy of its own — the decision lives in ``dispatch_policy`` and the
+agent's behaviour rides in the dispatched prompt's preamble (``t3_client``). The
+two adapters (tracker, T3) are injected behind structural Protocols, so
+production wires the real ``Tracker`` / ``T3Client`` and the tests wire the
+in-memory fakes; nothing here opens a socket on its own.
+
+DISABLED BY DEFAULT: a freshly-loaded ``Config`` has ``kill_switch=True`` and an
+empty allowlist (see ``config.py``), so importing or scheduling this poller
+dispatches nothing. Arming the loop — clearing the kill switch AND enrolling a
+repo — is a deliberate manual step, performed later, never by this code.
+"""
+from collections.abc import Callable
+from dataclasses import dataclass, field
+from typing import Protocol
+
+from . import dispatch_policy
+from .types import Config, DispatchDecision, Issue
+
+
+# --------------------------------------------------------------------------- #
+# Injected adapter Protocols — the I/O edges. Structural, so the real
+# ``Tracker`` / ``T3Client`` and the test fakes both satisfy them with no
+# explicit subclassing. Only the methods the poller actually calls appear here.
+# --------------------------------------------------------------------------- #
+class TrackerPort(Protocol):
+    """The slice of ``tracker.Tracker`` the dispatch tick needs."""
+
+    def list_ready(self, repos: list[str]) -> list[Issue]: ...
+    def add_label(self, repo: str, issue: int, label: str) -> None: ...
+
+
+class T3Port(Protocol):
+    """The slice of ``t3_client.T3Client`` the dispatch tick needs."""
+
+    def dispatch(self, repo: str, issue: int, prompt: str) -> str: ...
+
+
+#: The pure dispatch gate's signature, injected so the tick can be tested with a
+#: stub policy without reaching into module internals. Defaults to the real one.
+DispatchFn = Callable[[list[Issue], Config, set[str]], list[DispatchDecision]]
+
+
+@dataclass
+class Dispatched:
+    """One issue the tick actually started, with the T3 thread it spawned.
+
+    Returned (not just logged) so the caller — and the tests — can see exactly
+    what was launched. ``thread_id`` is what the watcher half later polls to
+    drive this run to completion; ``reason`` carries the policy's human-readable
+    justification through unchanged.
+    """
+
+    issue: Issue
+    thread_id: str
+    reason: str
+
+
+@dataclass
+class PollResult:
+    """The outcome of one dispatch tick.
+
+    ``dispatched`` is empty whenever the loop is disabled, the allowlist is
+    empty, every repo is already in flight, or nothing clears the dispatch gate
+    — i.e. the common steady-state of a quiet tick.
+    """
+
+    dispatched: list[Dispatched] = field(default_factory=list)
+
+
+class Poller:
+    """Runs one dispatch tick over injected tracker + T3 adapters.
+
+    ``dispatch`` defaults to the real pure ``select_dispatchable`` policy; it is
+    injectable purely so a test can substitute a stub without monkeypatching.
+    The poller holds no state between ticks — each ``run_once`` is self-contained.
+    """
+
+    def __init__(
+        self,
+        tracker: TrackerPort,
+        t3_client: T3Port,
+        dispatch: DispatchFn = dispatch_policy.select_dispatchable,
+    ) -> None:
+        self._tracker = tracker
+        self._t3 = t3_client
+        self._dispatch = dispatch
+
+    def run_once(self, config: Config) -> PollResult:
+        """Execute one dispatch tick (see module docstring). Returns what it
+        started; an empty result is the normal quiet-tick outcome."""
+        # Kill switch: bail before any I/O — a disabled loop touches nothing.
+        if config.kill_switch:
+            return PollResult()
+
+        ready = self._tracker.list_ready(config.allowlist)
+        in_flight = _in_flight_repos(ready, config.in_progress_label)
+
+        result = PollResult()
+        for decision in self._dispatch(ready, config, in_flight):
+            issue = decision.issue
+            # Dispatch FIRST; only stamp the lock once the thread exists, so a
+            # failed dispatch leaves the issue purely ready for the next tick to
+            # retry rather than wedged behind a phantom in-progress label.
+            thread_id = self._t3.dispatch(
+                issue.repo, issue.number, _dispatch_prompt(issue)
+            )
+            self._tracker.add_label(issue.repo, issue.number, config.in_progress_label)
+            result.dispatched.append(
+                Dispatched(issue=issue, thread_id=thread_id, reason=decision.reason)
+            )
+        return result
+
+
+# --------------------------------------------------------------------------- #
+# Internals — pure helpers.
+# --------------------------------------------------------------------------- #
+def _in_flight_repos(ready: list[Issue], in_progress_label: str) -> set[str]:
+    """Repos that already have an agent in flight, read off the ready set.
+
+    A repo is in flight if any of its ready issues still carries the in-progress
+    label — the stamp the poller applied on a previous tick's dispatch. Because
+    the dispatched issue keeps its ready label until the watcher closes/relabels
+    it, it re-appears here and locks the repo until the run finishes.
+    """
+    return {issue.repo for issue in ready if in_progress_label in issue.labels}
+
+
+def _dispatch_prompt(issue: Issue) -> str:
+    """The turn prompt for one issue's worker thread.
+
+    The full-access agent fetches the issue body itself (it has ``gh``), so the
+    prompt only needs to point unambiguously at the concrete ``repo#number``; the
+    standing rules are prepended by ``t3_client`` as the issue-implementer
+    preamble. Kept deliberately terse — one canonical instruction, no per-issue
+    templating to drift.
+    """
+    return (
+        f"Implement issue #{issue.number} in the `{issue.repo}` repository. "
+        f"Fetch the issue with `gh issue view {issue.number} --repo {issue.repo}` "
+        f"(and its comments) to get the full task, then implement it end to end."
+    )
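Reviewer sketch (not part of the patch): step 5 of the tick, dispatch first and label second, is what prevents a phantom lock. The fakes below are hypothetical and deliberately smaller than the real ports, and the label value `"in-progress"` is an assumption, since the actual `in_progress_label` default is not shown in this diff.

```python
class FakeTracker:
    """Hypothetical in-memory tracker: remembers every label it was asked to add."""

    def __init__(self) -> None:
        self.labels: list[tuple[str, int, str]] = []

    def add_label(self, repo: str, issue: int, label: str) -> None:
        self.labels.append((repo, issue, label))


class FlakyT3:
    """Hypothetical T3 fake that fails for one repo and succeeds for the rest."""

    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        if repo == "broken":
            raise RuntimeError("dispatch failed")
        return f"thread-{repo}-{issue}"


def start(tracker: FakeTracker, t3: FlakyT3, repo: str, issue: int) -> str:
    """Same ordering as run_once: the label is stamped only after dispatch returns."""
    thread_id = t3.dispatch(repo, issue, "prompt")   # may raise
    tracker.add_label(repo, issue, "in-progress")    # reached only on success
    return thread_id


tracker = FakeTracker()
ok_thread = start(tracker, FlakyT3(), "infra", 42)

try:
    start(tracker, FlakyT3(), "broken", 7)
    dispatch_failed = False
except RuntimeError:
    dispatch_failed = True
```

After both calls only the successful issue carries the lock label, so the failed one stays purely ready and the next tick can retry it.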
--- a/app/afk/run_state_machine.py
+++ b/app/afk/run_state_machine.py
@ -0,0 +1,84 @@
+"""Run state machine: assembled ``RunState`` -> next ``Action`` (ADR-0002).
+
+This is the heart of the AFK loop's per-issue control: each tick the loop
+assembles a :class:`~app.afk.types.RunState` (thread liveness from the
+orchestration snapshot, CI verdict from the watcher, plus its own ``pushed`` /
+``fix_forward_attempts`` / ``elapsed_seconds`` bookkeeping) and calls
+:func:`next_action` to decide what to do next.
+
+The function is **pure** — it reads only its two arguments, never the clock, the
+network, or any global. That keeps the lifecycle policy a plain decision table
+the test suite can exhaust combinatorially; the loop owns all the I/O (closing
+issues, dispatching corrective turns, escalating) based on the Action returned.
+
+The decision table (first match wins):
+
+  * pushed AND CI green                         -> CLOSE_SUCCESS
+      The run is healthy and verified; close the issue. The thread's own status
+      is irrelevant once a pushed commit is green.
+  * pushed AND CI red, budget remaining         -> FIX_FORWARD
+      A pushed commit broke CI. Dispatch another corrective turn — but only
+      while BOTH budgets hold: ``fix_forward_attempts < fix_forward_max_attempts``
+      AND ``elapsed_seconds < fix_forward_max_seconds`` (strict; at/over either
+      bound is exhausted).
+  * pushed AND CI red, budget exhausted         -> FREEZE_ESCALATE
+      Out of fix-forward attempts or wall-clock; stop churning and hand to a
+      human with the broken commit left in place.
+  * not pushed AND thread ERROR/IDLE            -> ESCALATE_PREPUSH
+      The agent will never reach green: it errored, or its turn finished /
+      stalled with nothing pushed. There is no pushed commit to fix forward, so
+      escalate before-push (a different remediation path than FREEZE_ESCALATE).
+  * everything else                             -> WAIT
+      Still in flight: working toward a first push (thread running / unknown), or
+      pushed with CI not yet decided. Poll again next tick.
+"""
+from .types import Action, CIStatus, Config, RunState, ThreadStatus
+
+# Thread states that mean the agent is finished with this turn — it will not push
+# any further on its own. Reaching one of these with nothing pushed is terminal
+# (escalate), whereas RUNNING / None (no snapshot entry yet) means keep waiting.
+_TERMINAL_THREAD_STATES: frozenset[ThreadStatus] = frozenset(
+    {ThreadStatus.ERROR, ThreadStatus.IDLE}
+)
+
+
+def next_action(state: RunState, config: Config) -> Action:
+    """Decide the next :class:`Action` for one issue's run.
+
+    Pure and total: every reachable ``(thread_status, ci_status, pushed,
+    attempts, elapsed)`` combination maps to exactly one Action via the table in
+    the module docstring. See that table for the rationale of each branch.
+    """
+    if state.pushed:
+        # A commit is out; the CI verdict on it drives everything from here.
+        if state.ci_status is CIStatus.GREEN:
+            return Action.CLOSE_SUCCESS
+        if state.ci_status is CIStatus.RED:
+            return (
+                Action.FIX_FORWARD
+                if _fix_forward_budget_remaining(state, config)
+                else Action.FREEZE_ESCALATE
+            )
+        # CI pending / not yet reported -> wait for the verdict.
+        return Action.WAIT
+
+    # Nothing pushed yet. If the turn is over (errored or gone idle) the run can
+    # never reach green on its own -> escalate before-push; otherwise it is still
+    # working toward a first push -> wait.
+    if state.thread_status in _TERMINAL_THREAD_STATES:
+        return Action.ESCALATE_PREPUSH
+    return Action.WAIT
+
+
+def _fix_forward_budget_remaining(state: RunState, config: Config) -> bool:
+    """True while another fix-forward turn is allowed.
+
+    Both bounds must hold (strict ``<``): the run has spent fewer than
+    ``fix_forward_max_attempts`` corrective turns AND fewer than
+    ``fix_forward_max_seconds`` of wall-clock. Hitting either cap exhausts the
+    budget.
+    """
+    return (
+        state.fix_forward_attempts < config.fix_forward_max_attempts
+        and state.elapsed_seconds < config.fix_forward_max_seconds
+    )
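Reviewer sketch (not part of the patch): the strict `<` on both budgets means a run sitting exactly on either cap is already exhausted. The `Budget` dataclass and the cap values below are made up for illustration; the real limits come from `Config`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Stand-in for the two Config caps (values used below are invented)."""
    max_attempts: int
    max_seconds: float


def budget_remaining(attempts: int, elapsed: float, budget: Budget) -> bool:
    """True only while BOTH bounds hold strictly."""
    return attempts < budget.max_attempts and elapsed < budget.max_seconds


budget = Budget(max_attempts=3, max_seconds=600.0)

# (attempts, elapsed_seconds) pairs probing each boundary.
cases = [(2, 599.0), (3, 10.0), (0, 600.0)]
results = {case: budget_remaining(case[0], case[1], budget) for case in cases}
```

Only the first case leaves budget for another fix-forward turn. The second has hit the attempt cap and the third has hit the wall-clock cap, so in the decision table both would map to `FREEZE_ESCALATE`.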
--- a/app/afk/t3_client.py
+++ b/app/afk/t3_client.py
@ -0,0 +1,264 @@
+"""Adapter for the in-cluster T3 Code instance — the AFK executor + cockpit.
+
+The control plane keeps the brain; T3 runs the agent. This module is the thin
+wire between them, written against T3's **real** orchestration contract
+(reverse-engineered from the v0.0.27 binary and verified live against t3-afk on
+2026-06-15; an earlier version of this adapter targeted a guessed shape that a
+fake-backed test accepted but the real server rejects with a 400).
+
+The contract, in three facts that shape everything here:
+
+  1. **Bare command envelope.** ``POST /api/orchestration/dispatch`` takes a
+     single command object whose discriminator is ``type`` (NOT a ``command``
+     string, NOT a wrapper). The body *is* the command.
+  2. **Client-authoritative IDs.** The CLIENT mints ``threadId`` / ``commandId``
+     / ``messageId`` (UUIDs) and stamps ``createdAt`` (ISO-8601); the server
+     replies ``{"sequence": N}`` and does NOT echo the thread id. So ``dispatch``
+     returns the id it generated, never one parsed from the response.
+  3. **Threads live in a project.** A project's ``workspaceRoot`` is the repo
+     checkout the agent runs in (it ``cd``s there and commits there). So a repo
+     maps to a project; ``dispatch`` ensures that project exists before creating
+     the thread.
+
+Operations (the methods ``poller`` / ``watcher`` call, plus a multi-turn helper):
+
+  * ``dispatch(repo, issue, prompt) -> thread_id`` — ensure the repo's project,
+    then ``thread.create`` + ``thread.turn.start`` (``ISSUE_IMPLEMENTER_PREAMBLE
+    + prompt`` as the user message). Returns the client-minted thread id.
+  * ``send_turn(thread_id, prompt) -> None`` — a follow-up user turn on an
+    existing thread. Multi-turn context is retained (verified live), so this is
+    how a conversation continues without spawning a fresh thread.
+  * ``snapshot() -> dict`` — the fleet read-model (``GET``); the watcher reads
+    per-thread ``latestTurn.state`` from it.
+
+The HTTP transport, the bearer provider, the id factory, and the clock are all
+**injected**, so production hands in an ``httpx.Client`` + a Vault-backed token
+reader + ``uuid4`` + a UTC clock, while tests hand in deterministic fakes. The
+bearer is re-read from the provider on **every** request because T3's
+``orchestration:operate`` token rotates.
+"""
+import uuid
+from collections.abc import Callable
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Protocol
+
+from .issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE
+
+# Orchestration API paths, relative to the configured base URL.
+_DISPATCH_PATH = "/api/orchestration/dispatch"
+_SNAPSHOT_PATH = "/api/orchestration/snapshot"
+
+# Pilot-baked execution envelope. ``claudeAgent`` is the embedded Claude Agent
+# SDK instance; ``full-access`` is the unattended runtime (bypass-permissions);
+# ``default`` interaction mode is normal turns (vs ``plan``). The model is the
+# one the pilot validated — tunable via the constructor.
+_INSTANCE_ID = "claudeAgent"
+_DEFAULT_MODEL = "claude-sonnet-4-6"
+_RUNTIME_MODE = "full-access"
+_INTERACTION_MODE = "default"
+
+# JSON shapes. Command bodies and the snapshot read-model are open string-keyed
+# objects; ``object`` values keep us honest without a bare ``Any``.
+type Json = dict[str, object]
+
+
+def _uuid() -> str:
+    """Default id factory: a fresh random UUID string (thread/command/message ids)."""
+    return str(uuid.uuid4())
+
+
+def _now_iso() -> str:
+    """Default clock: the current instant as an ISO-8601 UTC timestamp."""
+    return datetime.now(timezone.utc).isoformat()
+
+
+@dataclass(frozen=True)
+class ProjectRef:
+    """Where a repo's agent runs. ``project_id`` is the stable T3 project id (the
+    client mints it, deterministically per repo); ``workspace_root`` is the repo
+    checkout directory the project points at (the agent's cwd); ``title`` is the
+    human label shown in the cockpit."""
+
+    project_id: str
+    workspace_root: str
+    title: str
+
+
+def default_project_resolver(workspace_base: str = "/data") -> Callable[[str], ProjectRef]:
+    """A repo -> :class:`ProjectRef` resolver with stable, deterministic ids.
+
+    ``project_id`` is a UUID5 of the repo (so the same repo always resolves to the
+    same project across ticks and restarts — ``dispatch``'s ensure-project step
+    is therefore idempotent); ``workspace_root`` is ``<workspace_base>/<slug>``
+    where the slug flattens ``owner/name`` to a single path segment. The checkout
+    itself (cloning the repo into ``workspace_root``) is an enrollment concern,
+    not this adapter's — the agent or a provisioning step populates it.
+    """
+
+    def resolve(repo: str) -> ProjectRef:
+        slug = repo.replace("/", "__")
+        return ProjectRef(
+            project_id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"afk-project:{repo}")),
+            workspace_root=f"{workspace_base.rstrip('/')}/{slug}",
+            title=repo,
+        )
+
+    return resolve
+
+
+class HttpResponse(Protocol):
+    """The httpx-shaped response surface this adapter relies on: ``raise_for_status``
+    turns a non-2xx into an exception (so a failed command aborts the sequence)
+    and ``json`` parses the body."""
+
+    def raise_for_status(self) -> object: ...
+
+    def json(self) -> Json: ...
+
+
+class HttpClient(Protocol):
+    """Minimal injected transport: a JSON ``post`` and a ``get``, both taking
+    explicit headers. A strict subset of ``httpx.Client`` so the real client
+    passes straight through and tests pass a recorder."""
+
+    def post(self, url: str, json: Json, headers: dict[str, str]) -> HttpResponse: ...
+
+    def get(self, url: str, headers: dict[str, str]) -> HttpResponse: ...
+
+
+class T3Client:
+    """Dispatch/snapshot adapter for one in-cluster T3 instance.
+
+    ``base_url`` is the T3 service root (a trailing slash is tolerated); ``http``
+    is the injected transport; ``bearer_provider`` returns the current
+    ``orchestration:operate`` token, re-read per request; ``project_resolver``
+    maps a repo to its :class:`ProjectRef`; ``id_factory`` / ``clock`` are
+    injected for deterministic tests (defaulting to ``uuid4`` / UTC now).
+    """
+
+    def __init__(
+        self,
+        base_url: str,
+        http: HttpClient,
+        bearer_provider: Callable[[], str],
+        project_resolver: Callable[[str], ProjectRef] | None = None,
+        *,
+        id_factory: Callable[[], str] = _uuid,
+        clock: Callable[[], str] = _now_iso,
+        model: str = _DEFAULT_MODEL,
+    ) -> None:
+        self._base_url = base_url.rstrip("/")
+        self._http = http
+        self._bearer_provider = bearer_provider
+        self._project_for = project_resolver or default_project_resolver()
+        self._id = id_factory
+        self._now = clock
+        self._model = model
+
+    # ----------------------------------------------------------------- #
+    # Public API (the ``t3_client.T3Client`` contract the poller/watcher use).
+    # ----------------------------------------------------------------- #
+    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
+        """Spawn one worker thread for ``issue`` of ``repo`` and return its id.
+
+        Ensures the repo's project exists, generates the thread id locally, then
+        POSTs ``thread.create`` followed by ``thread.turn.start`` (delivering
+        ``ISSUE_IMPLEMENTER_PREAMBLE + prompt``). Any failed POST raises and
+        short-circuits the rest of the sequence. The returned id is the one this
+        method minted — the server never sends it back.
+        """
+        project = self._ensure_project(repo)
+        thread_id = self._id()
+
+        self._post(self._thread_create_command(thread_id, project))
+        self._post(self._turn_command(thread_id, ISSUE_IMPLEMENTER_PREAMBLE + prompt))
+        return thread_id
+
+    def send_turn(self, thread_id: str, prompt: str) -> None:
+        """Deliver a follow-up user turn to an existing thread (multi-turn).
+
+        Used to continue a conversation — the agent retains the thread's prior
+        context across turns. No preamble: the standing rules were already
+        delivered on the opening turn.
+        """
+        self._post(self._turn_command(thread_id, prompt))
+
+    def snapshot(self) -> Json:
+        """Return the parsed fleet read-model from ``/api/orchestration/snapshot``."""
+        return self._get(_SNAPSHOT_PATH).json()
+
+    # ----------------------------------------------------------------- #
+    # Command builders (the real wire shapes).
+    # ----------------------------------------------------------------- #
+    def _ensure_project(self, repo: str) -> ProjectRef:
+        """Make sure the repo's project exists, creating it if absent. Idempotent:
+        the resolver's project id is stable per repo, so a project already in the
+        snapshot is left untouched (no duplicate, no error)."""
+        project = self._project_for(repo)
+        existing = {
+            p.get("id") for p in self._get(_SNAPSHOT_PATH).json().get("projects", [])
+        }
+        if project.project_id not in existing:
+            self._post(
+                {
+                    "type": "project.create",
+                    "commandId": self._id(),
+                    "projectId": project.project_id,
+                    "title": project.title,
+                    "workspaceRoot": project.workspace_root,
+                    "createWorkspaceRootIfMissing": True,
+                    "createdAt": self._now(),
+                }
+            )
+        return project
+
+    def _thread_create_command(self, thread_id: str, project: ProjectRef) -> Json:
+        return {
+            "type": "thread.create",
+            "commandId": self._id(),
+            "threadId": thread_id,
+            "projectId": project.project_id,
+            "title": project.title,
+            "modelSelection": {"instanceId": _INSTANCE_ID, "model": self._model},
+            "runtimeMode": _RUNTIME_MODE,
+            "interactionMode": _INTERACTION_MODE,
+            "branch": None,
+            "worktreePath": None,
+            "createdAt": self._now(),
+        }
+
+    def _turn_command(self, thread_id: str, text: str) -> Json:
+        return {
+            "type": "thread.turn.start",
+            "commandId": self._id(),
+            "threadId": thread_id,
+            "message": {
+                "messageId": self._id(),
+                "role": "user",
+                "text": text,
+                "attachments": [],
+            },
+            "runtimeMode": _RUNTIME_MODE,
+            "interactionMode": _INTERACTION_MODE,
+            "createdAt": self._now(),
+        }
+
+    # ----------------------------------------------------------------- #
+    # Transport internals.
+    # ----------------------------------------------------------------- #
+    def _post(self, command: Json) -> HttpResponse:
+        resp = self._http.post(self._url(_DISPATCH_PATH), json=command, headers=self._headers())
+        resp.raise_for_status()
+        return resp
+
+    def _get(self, path: str) -> HttpResponse:
+        resp = self._http.get(self._url(path), headers=self._headers())
+        resp.raise_for_status()
+        return resp
+
+    def _url(self, path: str) -> str:
+        return f"{self._base_url}{path}"
+
+    def _headers(self) -> dict[str, str]:
+        return {"Authorization": f"Bearer {self._bearer_provider()}"}
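Reviewer sketch (not part of the patch): two properties of this adapter are easy to check in isolation, namely that project ids are stable per repo and that injected id and clock factories make a command body fully deterministic. The helper names (`project_id`, `workspace_root`, `fake_id`, `fake_clock`, `turn_command`) and the repo `owner/name` are hypothetical. The command is trimmed to a subset of the real fields.

```python
import itertools
import uuid


def project_id(repo: str) -> str:
    """UUID5 of the repo, using the same namespace and prefix as the resolver."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"afk-project:{repo}"))


def workspace_root(repo: str, base: str = "/data") -> str:
    """Flatten owner/name into a single path segment under the base directory."""
    return f"{base.rstrip('/')}/{repo.replace('/', '__')}"


_counter = itertools.count(1)


def fake_id() -> str:
    """Deterministic id factory for tests: id-1, id-2, ..."""
    return f"id-{next(_counter)}"


def fake_clock() -> str:
    """Fixed timestamp so command bodies compare equal across runs."""
    return "2026-06-15T00:00:00+00:00"


def turn_command(thread_id: str, text: str) -> dict[str, object]:
    """Trimmed thread.turn.start body: the discriminator is the `type` key."""
    return {
        "type": "thread.turn.start",
        "commandId": fake_id(),
        "threadId": thread_id,
        "message": {"messageId": fake_id(), "role": "user", "text": text, "attachments": []},
        "createdAt": fake_clock(),
    }


command = turn_command("thread-1", "hello")
```

Because dict displays evaluate their values in source order, `commandId` receives the first minted id and `messageId` the second, which lets a test pin the whole body.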
--- a/app/afk/tracker.py
+++ b/app/afk/tracker.py
@ -0,0 +1,243 @@
+"""Issue-tracker adapter — the loop's read/write port onto GitHub issues.
+
+``Tracker`` is the only place the AFK loop touches the issue tracker. It wraps an
+injected ``GitHubClient`` (the port) so the policy/state-machine code — and the
+tests — never depend on a real ``gh`` or the network: production injects
+``GhCliClient`` (shells out to ``gh`` with no-shell argv); tests inject a fake.
+
+The split is deliberate. The ``GitHubClient`` port speaks only in *primitives*
+(list raw issues for a label, fetch a single issue's label events, and the four
+mutations). All the loop-specific *decisions* live on ``Tracker``:
+
+  * ``labeled_by_trusted`` — decided **fail-closed** from the actor who made the
+    most-recent application of the ready label. On private repos only
+    collaborators can label, so the label *is* the authorization (design doc,
+    "Trigger & dispatch predicate"); an unattributable label is never trusted.
+  * ``blocked_by`` — the issue numbers in the body's "Blocked by #N" clauses
+    (the per-issue dependency the design doc gates dispatch on).
+  * ``priority`` — read off a ``priority:<n>`` label, lowest wins (lower runs
+    first, matching ``Issue.priority`` semantics in ``types``).
+
+Keeping the decisions here, not in the client, is what lets the whole read path
+be tested against a thin fake. Mutations (``add_label`` / ``remove_label`` /
+``comment`` / ``close``) are pass-throughs the loop drives during a run.
+"""
+import json
+import re
+from collections.abc import Callable
+from subprocess import PIPE, run
+from typing import Protocol, runtime_checkable
+
+from .types import Issue
+
+# Trusted author associations: GitHub tags each issue event actor with their
+# association to the repo. Only these may arm an issue for the AFK loop — the
+# trust gate from the design doc. Overridable per Tracker for a tighter policy.
+DEFAULT_TRUSTED_ASSOCIATIONS: frozenset[str] = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})
+
+# Default gating label; mirrors Config.ready_label so a Tracker built without an
+# explicit override matches the production default.
+DEFAULT_READY_LABEL = "ready-for-agent"
+
+# "Blocked by #3, #4 and #10" → [3, 4, 10]. We match a "blocked by" lead-in
+# (case-insensitive) and then harvest every "#<n>" in the clause that follows,
+# up to the next line break — so a bare "#7 for context" elsewhere is ignored.
+_BLOCKED_BY_CLAUSE = re.compile(r"blocked\s+by\b([^\n\r]*)", re.IGNORECASE)
+_ISSUE_REF = re.compile(r"#(\d+)")
+
+# "priority:2" → 2. Anything non-numeric (e.g. "priority:high") is not a numeric
+# priority and is skipped.
+_PRIORITY_LABEL = re.compile(r"^priority:(\d+)$")
+
+
+@runtime_checkable
+class GitHubClient(Protocol):
+    """The primitive surface ``Tracker`` depends on — one issue tracker, faked
+    in tests. Implementations must not embed loop policy; they only fetch raw
+    data and perform the four mutations.
+
+    ``list_issues`` returns the ``gh issue list --json number,labels,body`` shape
+    (``labels`` is a list of ``{"name": ...}``; ``body`` may be ``None``).
+    ``label_events`` returns the ``labeled`` timeline events for one issue, each
+    with ``label.name``, ``actor.login`` and ``author_association``.
+    """
+
+    def list_issues(self, repo: str, label: str) -> list[dict]: ...
+    def label_events(self, repo: str, number: int) -> list[dict]: ...
+    def add_label(self, repo: str, number: int, label: str) -> None: ...
+    def remove_label(self, repo: str, number: int, label: str) -> None: ...
+    def comment(self, repo: str, number: int, body: str) -> None: ...
+    def close(self, repo: str, number: int) -> None: ...
+
+
+class Tracker:
+    """Adapter that turns raw issue-tracker data into ``Issue`` records and
+    relays mutations, over an injected :class:`GitHubClient`."""
+
+    def __init__(
+        self,
+        client: GitHubClient,
+        ready_label: str = DEFAULT_READY_LABEL,
+        trusted_associations: frozenset[str] = DEFAULT_TRUSTED_ASSOCIATIONS,
+    ) -> None:
+        self.client = client
+        self.ready_label = ready_label
+        self.trusted_associations = trusted_associations
+
+    # ----------------------------------------------------------------- reads #
+    def list_ready(self, repos: list[str]) -> list[Issue]:
+        """Every ready-labeled open issue across ``repos``, as ``Issue`` records.
+
+        Ordering follows the client's per-repo order; dispatch ordering by
+        priority is the dispatch policy's job, not the tracker's.
+        """
+        issues: list[Issue] = []
+        for repo in repos:
+            for raw in self.client.list_issues(repo, self.ready_label):
+                issues.append(self._to_issue(repo, raw))
+        return issues
+
+    def _to_issue(self, repo: str, raw: dict) -> Issue:
+        number = int(raw["number"])
+        labels = [lbl["name"] for lbl in raw.get("labels", [])]
+        return Issue(
+            number=number,
+            repo=repo,
+            labels=labels,
+            blocked_by=_parse_blocked_by(raw.get("body")),
+            labeled_by_trusted=self._is_labeled_by_trusted(repo, number),
+            priority=_parse_priority(labels),
+        )
+
+    def _is_labeled_by_trusted(self, repo: str, number: int) -> bool:
+        """True iff the MOST RECENT application of the ready label was made by a
+        trusted actor. Fail-closed: no attributable application → not trusted."""
+        last_association: str | None = None
+        for event in self.client.label_events(repo, number):
+            if event.get("event") != "labeled":
+                continue
+            if (event.get("label") or {}).get("name") != self.ready_label:
+                continue
+            last_association = event.get("author_association")
+        return last_association in self.trusted_associations
+
+    # ------------------------------------------------------------- mutations #
+    def add_label(self, repo: str, issue: int, label: str) -> None:
+        self.client.add_label(repo, issue, label)
+
+    def remove_label(self, repo: str, issue: int, label: str) -> None:
+        self.client.remove_label(repo, issue, label)
+
+    def comment(self, repo: str, issue: int, body: str) -> None:
+        self.client.comment(repo, issue, body)
+
+    def close(self, repo: str, issue: int) -> None:
+        self.client.close(repo, issue)
+
+
+# --------------------------------------------------------------------------- #
+# Parsing helpers — pure functions, no I/O.
+# --------------------------------------------------------------------------- #
+def _parse_blocked_by(body: str | None) -> list[int]:
+    """Issue numbers referenced in the body's "Blocked by #N" clauses.
+
+    Order-preserving and de-duplicated; bare "#N" mentions outside a "blocked by"
+    clause are ignored. A missing/empty body yields ``[]``.
+    """
+    if not body:
+        return []
+    seen: dict[int, None] = {}  # insertion-ordered set
+    for clause in _BLOCKED_BY_CLAUSE.findall(body):
+        for ref in _ISSUE_REF.findall(clause):
+            seen.setdefault(int(ref), None)
+    return list(seen)
+
+
+def _parse_priority(labels: list[str]) -> int:
+    """Numeric priority from a ``priority:<n>`` label, lowest wins; 0 if none."""
+    priorities = [
+        int(match.group(1))
+        for label in labels
+        if (match := _PRIORITY_LABEL.match(label))
+    ]
+    return min(priorities) if priorities else 0
+
+
+# --------------------------------------------------------------------------- #
+# Concrete client — shells out to `gh`. Injected `run` keeps it testable.
+# --------------------------------------------------------------------------- #
+def _default_run(argv: list[str]) -> str:
+    """Run ``argv`` with no shell and return stdout (text). Raises on non-zero.
+
+    List argv (never a shell string), matching the no-injection-surface pattern
+    the breakglass/main subprocess helpers use — the repo/label/body values are
+    never interpreted by a shell.
+    """
+    proc = run(argv, stdout=PIPE, stderr=PIPE, text=True, check=False)
+    if proc.returncode != 0:
+        raise RuntimeError(f"{argv[0]} failed ({proc.returncode}): {proc.stderr[:200]}")
+    return proc.stdout
+
+
+class GhCliClient:
+    """:class:`GitHubClient` backed by the ``gh`` CLI.
+
+    ``repo_owner`` is the GitHub owner/org the sub-project repos live under, so a
+    bare repo name (``"infra"``) becomes the ``--repo owner/infra`` slug ``gh``
+    wants. ``run`` is the subprocess runner (defaults to the real no-shell one);
+    tests inject a fake to capture argv without spawning ``gh``.
+    """
+
+    def __init__(self, repo_owner: str, run: Callable[[list[str]], str] = _default_run) -> None:
+        self.repo_owner = repo_owner
+        self._run = run
+
+    def _slug(self, repo: str) -> str:
+        return f"{self.repo_owner}/{repo}"
+
+    def list_issues(self, repo: str, label: str) -> list[dict]:
+        out = self._run([
+            "gh", "issue", "list", "--repo", self._slug(repo),
+            "--label", label, "--state", "open",
+            "--json", "number,labels,body", "--limit", "100",
+        ])
+        return _loads_list(out)
+
+    def label_events(self, repo: str, number: int) -> list[dict]:
+        out = self._run([
+            "gh", "api",
+            f"repos/{self._slug(repo)}/issues/{number}/timeline",
+            "--paginate",
+            "-H", "Accept: application/vnd.github+json",
+        ])
+        events = _loads_list(out)
+        return [e for e in events if e.get("event") == "labeled"]
+
+    def add_label(self, repo: str, number: int, label: str) -> None:
+        self._run([
+            "gh", "issue", "edit", str(number), "--repo", self._slug(repo),
+            "--add-label", label,
+        ])
+
+    def remove_label(self, repo: str, number: int, label: str) -> None:
+        self._run([
+            "gh", "issue", "edit", str(number), "--repo", self._slug(repo),
+            "--remove-label", label,
+        ])
+
+    def comment(self, repo: str, number: int, body: str) -> None:
+        self._run([
+            "gh", "issue", "comment", str(number), "--repo", self._slug(repo),
+            "--body", body,
+        ])
+
+    def close(self, repo: str, number: int) -> None:
+        self._run(["gh", "issue", "close", str(number), "--repo", self._slug(repo)])
+
+
+def _loads_list(out: str) -> list[dict]:
+    """Parse ``gh`` JSON stdout into a list of dicts. Empty stdout → ``[]``."""
+    text = out.strip()
+    if not text:
+        return []
+    return json.loads(text)
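
Review note on the trust gate in `tracker.py`: the fail-closed rule is easier to check against concrete event lists. The block below is a minimal standalone sketch of that rule, not the module's code. The names `labeled_by_trusted`, `labeled`, and the sample events are illustrative assumptions, and it assumes the events arrive in chronological order, as the loop in `_is_labeled_by_trusted` implies.

```python
# Sketch only: mirrors the fail-closed trust check described for Tracker.
# `labeled_by_trusted`, `labeled` and the sample events are hypothetical names.
TRUSTED = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def labeled_by_trusted(events, ready_label="ready-for-agent", trusted=TRUSTED):
    """True only if the latest application of `ready_label` came from a trusted actor."""
    last_association = None
    for event in events:
        if event.get("event") != "labeled":
            continue
        if (event.get("label") or {}).get("name") != ready_label:
            continue
        # Later events overwrite earlier ones, so the most recent one decides.
        last_association = event.get("author_association")
    # None (no attributable application) is never in the trusted set.
    return last_association in trusted


def labeled(name, association):
    """Build one fake `labeled` event in the shape the port documents."""
    return {"event": "labeled", "label": {"name": name}, "author_association": association}


# A trusted actor armed the issue, then an untrusted actor re-applied the label.
owner_then_stranger = [labeled("ready-for-agent", "OWNER"), labeled("ready-for-agent", "NONE")]
# The same two events in the opposite order.
stranger_then_owner = list(reversed(owner_then_stranger))

print(labeled_by_trusted(owner_then_stranger))
print(labeled_by_trusted(stranger_then_owner))
print(labeled_by_trusted([labeled("bug", "OWNER")]))
print(labeled_by_trusted([]))
```

Only the second call passes the gate. A re-application by an untrusted actor revokes trust, and an issue with no attributable ready-label event is never trusted.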
--- a/app/afk/types.py
+++ b/app/afk/types.py
@ -0,0 +1,134 @@
+"""Shared types for the AFK loop — the contract every module builds against.
+
+Stdlib only (``dataclasses`` + ``enum``), matching the breakglass code: no
+pydantic, modern ``X | None`` unions, precise field types. Every other module in
+``app.afk`` imports its inputs/outputs from here so the pieces stay aligned; the
+module-level docstrings in ``__init__`` list which functions consume which type.
+
+Nothing here has behaviour — these are pure data carriers and closed enums. Keep
+it that way: logic lives in ``dispatch_policy`` / ``run_state_machine`` / the
+client modules, never on the dataclasses.
+"""
+from dataclasses import dataclass
+from enum import Enum
+
+
+# --------------------------------------------------------------------------- #
+# Enums — closed vocabularies the state machine and clients speak in.
+# --------------------------------------------------------------------------- #
+class ThreadStatus(Enum):
+    """Liveness of a T3 thread, as projected from the orchestration snapshot.
+
+    ``RUNNING`` — the agent is still working the turn; ``IDLE`` — the turn
+    finished cleanly (it has gone quiet); ``ERROR`` — the thread/turn failed.
+    """
+
+    RUNNING = "running"
+    IDLE = "idle"
+    ERROR = "error"
+
+
+class CIStatus(Enum):
+    """CI verdict for a pushed commit. ``PENDING`` covers both "no run yet" and
+    "in progress" — the state machine waits on either."""
+
+    PENDING = "pending"
+    GREEN = "green"
+    RED = "red"
+
+
+class Phase(Enum):
+    """Where a single issue's run is in its lifecycle. Ordered: each phase is a
+    gate the run passes through on the way to ``DONE``. ``phase_checklist``
+    renders these; the loop advances through them as evidence arrives."""
+
+    WORKTREE = "worktree"      # isolated workspace created
+    TESTS_RED = "tests_red"    # failing test written first (TDD red)
+    GREEN = "green"            # implementation makes tests pass (TDD green)
+    PUSHED = "pushed"          # commit(s) pushed to master
+    CI = "ci"                  # CI pipeline running on the pushed commit
+    DEPLOYED = "deployed"      # deploy/rollout reached the cluster
+    DONE = "done"              # verified complete; issue can be closed
+
+
+class Action(Enum):
+    """The decision ``run_state_machine.next_action`` returns for one tick.
+
+    ``WAIT`` — nothing to do yet, poll again; ``CLOSE_SUCCESS`` — run is green,
+    CI passed, close the issue; ``ESCALATE_PREPUSH`` — the agent errored/stalled
+    before pushing anything, hand back to a human; ``FIX_FORWARD`` — CI went red
+    on a pushed commit, dispatch another corrective turn; ``FREEZE_ESCALATE`` —
+    fix-forward budget exhausted (attempts or wall-clock), stop and escalate.
+    """
+
+    WAIT = "wait"
+    CLOSE_SUCCESS = "close_success"
+    ESCALATE_PREPUSH = "escalate_prepush"
+    FIX_FORWARD = "fix_forward"
+    FREEZE_ESCALATE = "freeze_escalate"
+
+
+# --------------------------------------------------------------------------- #
+# Data carriers.
+# --------------------------------------------------------------------------- #
+@dataclass
+class Issue:
+    """A tracker issue the loop might dispatch.
+
+    ``labeled_by_trusted`` records whether the gating label was applied by a
+    trusted identity — the loop must never dispatch an issue made ready by an
+    untrusted actor (prompt-injection / drive-by). ``blocked_by`` lists issue
+    numbers that must close first; ``priority`` orders the ready set (lower runs
+    first, matching tracker conventions).
+    """
+
+    number: int
+    repo: str
+    labels: list[str]
+    blocked_by: list[int]
+    labeled_by_trusted: bool
+    priority: int
+
+
+@dataclass
+class DispatchDecision:
+    """An issue the dispatch policy selected to run now, with a human-readable
+    ``reason`` (logged + surfaced in notifications, never parsed)."""
+
+    issue: Issue
+    reason: str
+
+
+@dataclass
+class Config:
+    """Loop configuration. DISABLED BY DEFAULT — ``kill_switch=True`` and an
+    empty ``allowlist`` mean a Config dispatches nothing. Both fields are
+    required here (no dataclass defaults); enabling is a deliberate manual
+    step (see ``config.from_env`` / ``from_configmap``).
+    """
+
+    allowlist: list[str]
+    kill_switch: bool
+    in_progress_label: str = "agent-in-progress"
+    ready_label: str = "ready-for-agent"
+    budget_usd: float = 100.0
+    fix_forward_max_attempts: int = 5
+    fix_forward_max_seconds: int = 3600
+
+
+@dataclass
+class RunState:
+    """Everything the state machine needs to decide one issue's next move.
+
+    Assembled each tick from the orchestration snapshot (``thread_status``), the
+    CI watcher (``ci_status``), and the loop's own bookkeeping (``pushed``,
+    ``fix_forward_attempts``, ``elapsed_seconds``). ``thread_status`` /
+    ``ci_status`` are ``None`` when not yet known (no snapshot entry / nothing
+    pushed to check yet).
+    """
+
+    thread_status: ThreadStatus | None
+    ci_status: CIStatus | None
+    pushed: bool
+    fix_forward_attempts: int
+    elapsed_seconds: float
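
Review note on `types.py`: the `Action` docstring describes five verdicts, but `run_state_machine.next_action` itself is not part of this excerpt. The block below is a hypothetical sketch of one decision order consistent with those docstrings, to show how `RunState` fields could map onto `Action` values. The function name `sketch_next_action`, the ordering of the checks, and the choice to treat an idle thread with nothing pushed as "stalled" are all assumptions, not the real policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ThreadStatus(Enum):
    RUNNING = "running"
    IDLE = "idle"
    ERROR = "error"


class CIStatus(Enum):
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


class Action(Enum):
    WAIT = "wait"
    CLOSE_SUCCESS = "close_success"
    ESCALATE_PREPUSH = "escalate_prepush"
    FIX_FORWARD = "fix_forward"
    FREEZE_ESCALATE = "freeze_escalate"


@dataclass
class RunState:
    thread_status: Optional[ThreadStatus]
    ci_status: Optional[CIStatus]
    pushed: bool
    fix_forward_attempts: int
    elapsed_seconds: float


def sketch_next_action(state, max_attempts=5, max_seconds=3600):
    """Hypothetical decision order; NOT the real run_state_machine.next_action."""
    if not state.pushed:
        # Assumption: errored, or idle with nothing pushed, counts as stalled.
        if state.thread_status in (ThreadStatus.ERROR, ThreadStatus.IDLE):
            return Action.ESCALATE_PREPUSH
        return Action.WAIT  # running or unknown status: keep polling
    if state.ci_status is CIStatus.GREEN:
        return Action.CLOSE_SUCCESS
    if state.ci_status is CIStatus.RED:
        out_of_budget = (
            state.fix_forward_attempts >= max_attempts
            or state.elapsed_seconds >= max_seconds
        )
        return Action.FREEZE_ESCALATE if out_of_budget else Action.FIX_FORWARD
    return Action.WAIT  # CI pending or not yet known


red_with_budget = RunState(ThreadStatus.IDLE, CIStatus.RED, True, 1, 120.0)
red_out_of_attempts = RunState(ThreadStatus.IDLE, CIStatus.RED, True, 5, 120.0)
print(sketch_next_action(red_with_budget).value)
print(sketch_next_action(red_out_of_attempts).value)
```

The defaults for `max_attempts` and `max_seconds` copy `Config.fix_forward_max_attempts` and `Config.fix_forward_max_seconds` from the diff. Whether the real budget check uses `>=` or `>` is not visible here.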
--- a/app/afk/watcher.py
+++ b/app/afk/watcher.py
@ -0,0 +1,355 @@
+"""CronJob entrypoint: drive ONE in-flight AFK run by a single tick.
+
+The watcher is the *second half* of the loop — the part that drives a run the
+poller already started through to a terminal state. Given one in-flight run
+(``InFlightRun``: the issue, the T3 thread to poll, the pushed commit if any,
+and the fix-forward bookkeeping), one ``tick``:
+
+  1. **assemble a ``RunState``** from the live edges + the run's bookkeeping:
+       * ``thread_status`` — from ``t3_client.snapshot()``, by finding this run's
+         thread and mapping its ``latestTurn.state`` to a ``ThreadStatus`` via
+         ``_THREAD_STATUS_BY_STRING`` (``completed`` → idle, the listed
+         in-progress states → running, ``errored`` → error). A missing thread, no
+         turn yet, or an unrecognised state folds to ``None`` ("no status yet"),
+         so the state machine WAITs; we never escalate or close on such a status;
+       * ``ci_status`` — ``ci_watcher.status(repo, commit)`` *only* when a commit
+         is pushed (no commit ⇒ nothing to check ⇒ ``None``);
+       * ``pushed`` / ``fix_forward_attempts`` / ``elapsed_seconds`` — straight
+         from the run.
+  2. **decide** via the pure ``run_state_machine.next_action`` (it owns the
+     lifecycle policy; the watcher owns only the I/O the decision implies).
+  3. **act** on the returned ``Action``:
+       * ``CLOSE_SUCCESS`` → ``tracker.close`` + drop the in-progress label +
+         DONE checklist + ``done`` doorbell. The run landed.
+       * ``ESCALATE_PREPUSH`` / ``FREEZE_ESCALATE`` → drop the in-progress label,
+         add the ``ready-for-human`` label, post the checklist, ring the
+         ``needs-human`` / ``frozen`` doorbell. The run is handed to a human; the
+         issue is left OPEN (not closed) with the work in place.
+       * ``FIX_FORWARD`` → dispatch a corrective turn (``t3_client.dispatch``),
+         bump the fix-forward attempt count, refresh the checklist, and keep the
+         run in flight (NOT terminal: no label churn, no doorbell — the notifier
+         only speaks terminal kinds). The new thread id rides back on the result
+         so the next tick polls the corrective turn.
+       * ``WAIT`` → just refresh the progress checklist and keep waiting.
+
+Every adapter (T3, tracker, CI, notifier) is injected behind a structural
+Protocol, so production wires the real clients and the tests wire the in-memory
+fakes; this module opens no socket and reads no message bodies. (The pilot keeps
+T3 ``state.sqlite`` message-body reads out of the core loop — snapshot status +
+CI status are all the state machine needs — so this watcher never execs into the
+pod; that observability nicety is a separate, optional concern.)
+
+DISABLED BY DEFAULT applies transitively: the poller never starts a run while
+the loop is off (``config.kill_switch`` / empty allowlist — see ``config.py``),
+so with the shipped defaults there is never an ``InFlightRun`` to tick.
+"""
+from dataclasses import dataclass
+from typing import Protocol
+
+from . import phase_checklist, run_state_machine
+from .notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN
+from .poller import T3Port as _DispatchPort  # dispatch(repo, issue, prompt) -> id
+from .types import Action, CIStatus, Config, Issue, Phase, RunState, ThreadStatus
+
+# T3 ``latestTurn.state`` -> ThreadStatus. The real snapshot reports a thread's
+# liveness as the state of its latest turn (verified against t3-afk v0.0.27):
+# ``completed`` == the turn finished cleanly (agent is idle, awaiting input);
+# any not-yet-finished state (``running``/``in_progress``/``pending``/``queued``/
+# ``pendingInit``) == still working; ``errored`` == the turn failed. Anything not
+# in here (a state T3 adds later, or a malformed/absent entry) maps to None —
+# "no usable status yet" — so the state machine waits rather than acting on
+# something it can't interpret.
+_THREAD_STATUS_BY_STRING: dict[str, ThreadStatus] = {
+    "completed": ThreadStatus.IDLE,
+    "running": ThreadStatus.RUNNING,
+    "in_progress": ThreadStatus.RUNNING,
+    "pending": ThreadStatus.RUNNING,
+    "queued": ThreadStatus.RUNNING,
+    "pendingInit": ThreadStatus.RUNNING,
+    "errored": ThreadStatus.ERROR,
+}
+
+# Action -> the terminal doorbell kind to ring. Only the terminal actions appear;
+# WAIT / FIX_FORWARD are non-terminal and ring nothing (the notifier rejects a
+# non-terminal kind on purpose — see ``notifier.TERMINAL_KINDS``).
+_TERMINAL_KIND_BY_ACTION: dict[Action, str] = {
+    Action.CLOSE_SUCCESS: KIND_DONE,
+    Action.ESCALATE_PREPUSH: KIND_NEEDS_HUMAN,
+    Action.FREEZE_ESCALATE: KIND_FROZEN,
+}
+
+# Default label applied when a run is handed back to a human. Mirrors the
+# tracker's ``ready-for-agent`` convention; overridable per-Watcher.
+DEFAULT_READY_FOR_HUMAN_LABEL = "ready-for-human"
+
+
+# --------------------------------------------------------------------------- #
+# Injected adapter Protocols — structural, so the real clients and the test
+# fakes both satisfy them with no subclassing. Only the methods the watcher
+# actually calls appear. ``_DispatchPort`` is ``poller.T3Port``, imported above.
+# --------------------------------------------------------------------------- #
+class SnapshotPort(_DispatchPort, Protocol):
+    """T3 surface the watcher needs: ``dispatch`` (for the corrective turn) plus
+    ``snapshot`` (for thread liveness)."""
+
+    def snapshot(self) -> dict: ...
+
+
+class TrackerPort(Protocol):
+    """The slice of ``tracker.Tracker`` the watch tick needs."""
+
+    def add_label(self, repo: str, issue: int, label: str) -> None: ...
+    def remove_label(self, repo: str, issue: int, label: str) -> None: ...
+    def comment(self, repo: str, issue: int, body: str) -> None: ...
+    def close(self, repo: str, issue: int) -> None: ...
+
+
+class CIPort(Protocol):
+    """The slice of ``ci_watcher.CIWatcher`` the watch tick needs."""
+
+    def status(self, repo: str, commit: str) -> CIStatus: ...
+
+
+class NotifierPort(Protocol):
+    """The slice of ``notifier.Notifier`` the watch tick needs."""
+
+    def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None: ...
+
+
+@dataclass
+class InFlightRun:
+    """One run the watcher is driving, as the loop tracks it between ticks.
+
+    ``thread_id`` is the T3 thread to poll this tick; ``commit`` is the pushed
+    commit CI watches (``None`` until the agent has pushed). ``fix_forward_attempts``
+    and ``elapsed_seconds`` are the loop's own bookkeeping, fed straight into the
+    assembled ``RunState`` — ``pushed`` is derived as ``commit is not None``.
+    """
+
+    issue: Issue
+    thread_id: str
+    commit: str | None
+    fix_forward_attempts: int = 0
+    elapsed_seconds: float = 0.0
+
+
+@dataclass
+class TickResult:
+    """The outcome of one watch tick.
+
+    ``action`` is the state machine's verdict; ``terminal`` is True iff the run
+    reached an end state (closed or handed to a human) and should no longer be
+    ticked. ``thread_id`` / ``fix_forward_attempts`` carry the (possibly updated)
+    bookkeeping the caller threads into the next ``InFlightRun`` — they change
+    only on a FIX_FORWARD (new corrective thread, incremented attempts) and are
+    otherwise echoed back unchanged.
+    """
+
+    action: Action
+    terminal: bool
+    thread_id: str
+    fix_forward_attempts: int
+
+
+class Watcher:
+    """Drives one in-flight run per ``tick`` over injected adapters.
+
+    All lifecycle decisions (wait, close, fix forward, escalate) live in the pure
+    ``run_state_machine``; this class only performs the I/O each decision
+    implies. ``ready_for_human_label`` is the label stamped on a run handed back
+    to a human (default :data:`DEFAULT_READY_FOR_HUMAN_LABEL`).
+    """
+
+    def __init__(
+        self,
+        t3_client: SnapshotPort,
+        tracker: TrackerPort,
+        ci_watcher: CIPort,
+        notifier: NotifierPort,
+        ready_for_human_label: str = DEFAULT_READY_FOR_HUMAN_LABEL,
+    ) -> None:
+        self._t3 = t3_client
+        self._tracker = tracker
+        self._ci = ci_watcher
+        self._notifier = notifier
+        self._ready_for_human_label = ready_for_human_label
+
+    def tick(self, run: InFlightRun, config: Config) -> TickResult:
+        """Drive ``run`` one step (see module docstring)."""
+        state = self._assemble_state(run)
+        action = run_state_machine.next_action(state, config)
+
+        if action is Action.CLOSE_SUCCESS:
+            return self._close_success(run, config)
+        if action in (Action.ESCALATE_PREPUSH, Action.FREEZE_ESCALATE):
+            return self._escalate(run, state, action, config)
+        if action is Action.FIX_FORWARD:
+            return self._fix_forward(run, state)
+        # WAIT: still in flight — just show progress and poll again next tick.
+        return self._wait(run, state, action)
+
+    # ----------------------------------------------------------------- #
+    # RunState assembly.
+    # ----------------------------------------------------------------- #
+    def _assemble_state(self, run: InFlightRun) -> RunState:
+        thread_status = self._thread_status(run.thread_id)
+        # Only query CI when there's a commit to check: an unpushed run has no
+        # pipeline, so the call would be a needless API request. The tests also
+        # assert that CI is never queried before a push.
+        ci_status = (
+            self._ci.status(run.issue.repo, run.commit)
+            if run.commit is not None
+            else None
+        )
+        return RunState(
+            thread_status=thread_status,
+            ci_status=ci_status,
+            pushed=run.commit is not None,
+            fix_forward_attempts=run.fix_forward_attempts,
+            elapsed_seconds=run.elapsed_seconds,
+        )
+
+    def _thread_status(self, thread_id: str) -> ThreadStatus | None:
+        """This thread's liveness from the fleet snapshot, or ``None`` when the
+        thread is absent, has no turn yet, or its ``latestTurn.state`` is one we
+        don't recognise. Liveness is the state of the thread's latest turn (the
+        real snapshot shape), not a top-level ``status`` field."""
+        for thread in self._t3.snapshot().get("threads", []):
+            if thread.get("id") == thread_id:
+                latest_turn = thread.get("latestTurn") or {}
+                return _THREAD_STATUS_BY_STRING.get(latest_turn.get("state"))
+        return None
+
+    # ----------------------------------------------------------------- #
+    # Per-action handlers.
+    # ----------------------------------------------------------------- #
+    def _close_success(self, run: InFlightRun, config: Config) -> TickResult:
+        """Landed: close the issue, drop the lock, post DONE, ring the doorbell."""
+        self._post_checklist(run, Phase.DONE)
+        self._tracker.remove_label(
+            run.issue.repo, run.issue.number, config.in_progress_label
+        )
+        self._tracker.close(run.issue.repo, run.issue.number)
+        self._notify(run, Action.CLOSE_SUCCESS, "Run landed: pushed and CI green.")
+        return _terminal(Action.CLOSE_SUCCESS, run)
+
+    def _escalate(
+        self, run: InFlightRun, state: RunState, action: Action, config: Config
+    ) -> TickResult:
+        """Hand back to a human: drop the lock, add ready-for-human, post the
+        checklist, ring the matching doorbell. The issue stays OPEN."""
+        self._post_checklist(run, _phase_for(state))
+        self._tracker.remove_label(
+            run.issue.repo, run.issue.number, config.in_progress_label
+        )
+        self._tracker.add_label(
+            run.issue.repo, run.issue.number, self._ready_for_human_label
+        )
+        self._notify(run, action, _escalation_detail(action, state))
+        return _terminal(action, run)
+
+    def _fix_forward(self, run: InFlightRun, state: RunState) -> TickResult:
+        """CI red with budget left: dispatch a corrective turn and stay in flight.
+
+        Not terminal — no doorbell (the notifier only speaks terminal kinds) and
+        no label churn (the in-progress lock stays put). The corrective dispatch
+        spawns a fresh thread; its id and the incremented attempt count ride back
+        so the next tick tracks the right thread.
+        """
+        attempts = run.fix_forward_attempts + 1
+        new_thread_id = self._t3.dispatch(
+            run.issue.repo, run.issue.number, _fix_forward_prompt(run)
+        )
+        self._post_checklist(run, Phase.CI, fix_forward_attempts=attempts)
+        return TickResult(
+            action=Action.FIX_FORWARD,
+            terminal=False,
+            thread_id=new_thread_id,
+            fix_forward_attempts=attempts,
+        )
+
+    def _wait(self, run: InFlightRun, state: RunState, action: Action) -> TickResult:
+        """Still working: refresh the progress checklist, change nothing else."""
+        self._post_checklist(run, _phase_for(state))
+        return TickResult(
+            action=action,
+            terminal=False,
+            thread_id=run.thread_id,
+            fix_forward_attempts=run.fix_forward_attempts,
+        )
+
+    # ----------------------------------------------------------------- #
+    # I/O helpers.
+    # ----------------------------------------------------------------- #
+    def _post_checklist(
+        self, run: InFlightRun, phase: Phase, *, fix_forward_attempts: int | None = None
+    ) -> None:
+        attempts = run.fix_forward_attempts if fix_forward_attempts is None else fix_forward_attempts
+        body = phase_checklist.render(
+            phase,
+            {
+                "repo": run.issue.repo,
+                "issue": run.issue.number,
+                "thread_id": run.thread_id,
+                "fix_forward_attempts": attempts,
+            },
+        )
+        self._tracker.comment(run.issue.repo, run.issue.number, body)
+
+    def _notify(self, run: InFlightRun, action: Action, detail: str) -> None:
+        self._notifier.notify(
+            _TERMINAL_KIND_BY_ACTION[action], run.issue, run.thread_id, detail
+        )
+
+
+# --------------------------------------------------------------------------- #
+# Pure helpers.
+# --------------------------------------------------------------------------- #
+def _terminal(action: Action, run: InFlightRun) -> TickResult:
+    """A terminal :class:`TickResult` echoing the run's bookkeeping unchanged."""
+    return TickResult(
+        action=action,
+        terminal=True,
+        thread_id=run.thread_id,
+        fix_forward_attempts=run.fix_forward_attempts,
+    )
+
+
+def _phase_for(state: RunState) -> Phase:
+    """Best-effort current lifecycle phase from the evidence in ``state``.
+
+    The checklist is decoration only (the loop reads no agent message bodies), so
+    this maps the observable signals (pushed? CI verdict?) onto the closest
+    phase: nothing pushed ⇒ still working toward the implementation (GREEN);
+    pushed but CI not green ⇒ CI; pushed with CI green ⇒ DEPLOYED. DONE is never
+    returned here: the close path renders it directly.
+    """
+    if not state.pushed:
+        return Phase.GREEN
+    if state.ci_status is CIStatus.GREEN:
+        return Phase.DEPLOYED
+    return Phase.CI
+
+
+def _escalation_detail(action: Action, state: RunState) -> str:
+    """Human-readable escalation reason for the doorbell + logs (never parsed)."""
+    if action is Action.ESCALATE_PREPUSH:
+        return (
+            "Agent stalled or errored before pushing any commit "
+            f"(thread {state.thread_status.value if state.thread_status else 'unknown'}). "
+            "Handed back for a human."
+        )
+    return (
+        "Fix-forward budget exhausted with CI still red "
+        f"({state.fix_forward_attempts} attempts, {state.elapsed_seconds:.0f}s). "
+        "Frozen for a human."
+    )
+
+
+def _fix_forward_prompt(run: InFlightRun) -> str:
+    """The corrective-turn prompt: point the agent at the red CI on its commit."""
+    return (
+        f"CI is RED on your pushed commit {run.commit} for issue #{run.issue.number} "
+        f"in `{run.issue.repo}`. Investigate the failing run, fix the cause, and "
+        "push the fix to master. Then watch CI again until it is green."
+    )
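
Review note on `watcher.py`: the "unknown status folds to `None`" rule is the safety property of `_thread_status`, so it helps to see it run against a sample snapshot. This is a minimal standalone sketch that copies the mapping table from the diff. The function name `thread_status`, the thread ids, and the `"paused"` state are made up for illustration, and the snapshot shape (`threads` → `id` / `latestTurn.state`) is taken from the code above rather than from T3 documentation.

```python
from enum import Enum
from typing import Optional


class ThreadStatus(Enum):
    RUNNING = "running"
    IDLE = "idle"
    ERROR = "error"


# Same table as _THREAD_STATUS_BY_STRING in the diff.
STATUS_BY_STATE = {
    "completed": ThreadStatus.IDLE,
    "running": ThreadStatus.RUNNING,
    "in_progress": ThreadStatus.RUNNING,
    "pending": ThreadStatus.RUNNING,
    "queued": ThreadStatus.RUNNING,
    "pendingInit": ThreadStatus.RUNNING,
    "errored": ThreadStatus.ERROR,
}


def thread_status(snapshot: dict, thread_id: str) -> Optional[ThreadStatus]:
    """Liveness of one thread, or None when it cannot be interpreted."""
    for thread in snapshot.get("threads", []):
        if thread.get("id") == thread_id:
            # A thread with no turn yet may carry latestTurn = None.
            latest_turn = thread.get("latestTurn") or {}
            # Unknown or missing states fall through to None via dict.get.
            return STATUS_BY_STATE.get(latest_turn.get("state"))
    return None  # thread not in the snapshot at all


# Hypothetical snapshot; "paused" stands in for a state the table does not know.
snapshot = {
    "threads": [
        {"id": "t-1", "latestTurn": {"state": "completed"}},
        {"id": "t-2", "latestTurn": {"state": "queued"}},
        {"id": "t-3", "latestTurn": None},
        {"id": "t-4", "latestTurn": {"state": "paused"}},
    ]
}

for tid in ("t-1", "t-2", "t-3", "t-4", "t-missing"):
    print(tid, thread_status(snapshot, tid))
```

Only `t-1` and `t-2` yield a status. The other three cases return `None`, which the module docstring says the state machine treats as "wait".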
--- a/app/breakglass/agent_session.py
+++ b/app/breakglass/agent_session.py
@ -1,26 +1,13 @@
-"""Drive the breakglass Claude agent and stream its work to the browser.
+"""Claude CLI argv + stream-json → UI-event translation for the breakglass agent.

-Each chat turn runs ``claude -p --output-format stream-json`` in the session's
-persistent workspace; the first turn opens the session with ``--session-id`` and
-later turns ``--resume`` it, so the conversation has memory across turns. The
-CLI's JSON events are translated to a small, stable SSE vocabulary the UI
-renders (``session`` / ``text`` / ``tool`` / ``result`` / ``error``) — we do not
-leak the raw event firehose to the client.
-
-Subprocesses use ``asyncio.create_subprocess_exec`` (list argv, no shell): the
-prompt and ids are argv elements, never interpreted by a shell.
+The session lifecycle (running turns, attaching clients) lives in ``session.py``;
+this module is just the two helpers it builds on:
+  * ``_turn_argv`` — the no-shell list argv for one ``claude -p`` turn.
+  * ``translate_event`` — map a raw stream-json event to the small UI vocabulary
+    (session / text / tool / result), dropping the hook/thinking-token noise.
 """
-import asyncio
-import json
-import os
-from subprocess import PIPE
-from typing import AsyncIterator
-
 from . import config

-# Sessions we've already opened (so the next turn resumes instead of re-creating).
-_started: set[str] = set()
-

 def _turn_argv(session_id: str, prompt: str, resume: bool, model: str) -> list[str]:
    argv = [
@ -66,7 +53,7 @@ def translate_event(obj: dict) -> dict | None:
                })
        if not events:
            return None
-        # The server flattens a "batch" into individual SSE frames.
+        # The session log flattens a "batch" into individual events.
        return events[0] if len(events) == 1 else {"kind": "batch", "events": events}

    if etype == "result":
@ -78,68 +65,3 @@ def translate_event(obj: dict) -> dict | None:
        }

    return None
-
-
-async def run_turn(
-    session_id: str, prompt: str, model: str | None = None
-) -> AsyncIterator[dict]:
-    """Run one chat turn, yielding translated UI events as they arrive."""
-    resume = session_id in _started
-    model = model or config.DEFAULT_MODEL
-    workspace = os.path.join(config.SESSIONS_DIR, session_id)
-    os.makedirs(workspace, exist_ok=True)
-
-    argv = _turn_argv(session_id, prompt, resume, model)
-    proc = await asyncio.create_subprocess_exec(
-        *argv, cwd=workspace, stdout=PIPE, stderr=PIPE,
-    )
-    _started.add(session_id)
-    assert proc.stdout is not None and proc.stderr is not None
-
-    try:
-        async def _pump() -> AsyncIterator[dict]:
-            async for raw in proc.stdout:
-                line = raw.decode(errors="replace").strip()
-                if not line:
-                    continue
-                try:
-                    obj = json.loads(line)
-                except json.JSONDecodeError:
-                    continue
-                ev = translate_event(obj)
-                if ev is None:
-                    continue
-                if ev.get("kind") == "batch":
-                    for sub in ev["events"]:
-                        yield sub
-                else:
-                    yield ev
-
-        async for ev in _with_timeout(_pump(), config.TURN_TIMEOUT_SECONDS):
-            yield ev
-    except asyncio.TimeoutError:
-        proc.kill()
-        await proc.wait()
-        yield {"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"}
-        return
-
-    await proc.wait()
-    if proc.returncode not in (0, None):
-        err = (await proc.stderr.read()).decode(errors="replace")
-        yield {"kind": "error", "error": err.strip()[:500] or f"exit {proc.returncode}"}
-
-
-async def _with_timeout(agen: AsyncIterator[dict], timeout: float) -> AsyncIterator[dict]:
-    """Yield from an async generator but raise TimeoutError if the WHOLE turn
-    exceeds ``timeout`` seconds (a wedged agent shouldn't stream forever)."""
-    loop = asyncio.get_event_loop()
-    deadline = loop.time() + timeout
-    it = agen.__aiter__()
-    while True:
-        remaining = deadline - loop.time()
-        if remaining <= 0:
-            raise asyncio.TimeoutError
-        try:
-            yield await asyncio.wait_for(it.__anext__(), timeout=remaining)
-        except StopAsyncIteration:
-            return
--- a/app/breakglass/config.py
+++ b/app/breakglass/config.py
@ -25,6 +25,9 @@ MAX_CONCURRENT_TURNS = int(os.environ.get("BREAKGLASS_MAX_CONCURRENT_TURNS", "2"
 TURN_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_TURN_TIMEOUT_SECONDS", "1800"))
 # A single PVE power verb must return fast; a wedged host shouldn't hang the UI.
 PVE_VERB_TIMEOUT_SECONDS = int(os.environ.get("BREAKGLASS_PVE_VERB_TIMEOUT_SECONDS", "120"))
+# How long an idle attach stream waits before emitting an SSE keepalive comment
+# (keeps proxies/CDN from closing the long-lived connection).
+SSE_KEEPALIVE_SECONDS = int(os.environ.get("BREAKGLASS_SSE_KEEPALIVE_SECONDS", "20"))

 # Auth. The app sits behind the ingress `auth = "required"` resilience proxy
 # (Authentik SSO, basic-auth fallback when Authentik is down). We additionally
--- a/app/breakglass/server.py
+++ b/app/breakglass/server.py
@ -1,38 +1,44 @@
 """Breakglass FastAPI app — the in-cluster emergency recovery UI.

+The chat uses the tmux/attach model (see session.py): the server owns the
+conversation; clients attach over SSE and the turn keeps running if they
+disconnect.
+
 Routes:
-  GET  /health                 — liveness (no auth)
-  GET  /                       — the single-page UI (static)
-  POST /api/session            — open a chat session, returns {session_id}
-  POST /api/chat               — run one turn, streams SSE events (text/tool/result)
-  POST /api/pve/{verb}         — LLM-independent PVE power verb (manual buttons)
-  GET  /api/pve/verbs          — list allowed verbs + which mutate
+  GET  /health                        — liveness (no auth)
+  GET  /                              — the single-page UI (static)
+  POST /api/session                   — create a session, returns {session_id}
+  GET  /api/session/{id}/stream       — ATTACH (SSE): replay + live tail
+  POST /api/session/{id}/prompt       — run a turn (detached; survives disconnect)
+  POST /api/session/{id}/cancel       — stop the in-flight turn
+  GET  /api/pve/verbs                 — list allowed verbs + which mutate
+  POST /api/pve/{verb}                — LLM-independent PVE power verb (buttons)

 Everything under /api requires auth (edge Authentik header or bearer token).
 """
-import json
 import os
-import uuid

-from fastapi import Depends, FastAPI, HTTPException
+from fastapi import Depends, FastAPI, Header, HTTPException
 from fastapi.responses import FileResponse, JSONResponse, StreamingResponse
 from fastapi.staticfiles import StaticFiles
 from pydantic import BaseModel, Field

-from . import agent_session, config, pve
+from . import config, pve
 from .auth import require_auth
+from .session import SessionManager, attach_stream

 app = FastAPI(title="Claude Breakglass")

 _STATIC_DIR = os.path.join(os.path.dirname(__file__), "static")

+manager = SessionManager()
+

 class SessionResponse(BaseModel):
    session_id: str


-class ChatRequest(BaseModel):
-    session_id: str
+class PromptRequest(BaseModel):
    prompt: str = Field(..., min_length=1)
    model: str | None = None

@ -44,30 +50,53 @@ async def health():

 @app.post("/api/session", response_model=SessionResponse)
 async def open_session(_identity: str = Depends(require_auth)):
-    # Claude wants a UUID for --session-id.
-    return SessionResponse(session_id=str(uuid.uuid4()))
+    return SessionResponse(session_id=manager.create().id)


-@app.post("/api/chat")
-async def chat(req: ChatRequest, _identity: str = Depends(require_auth)):
-    """Stream one chat turn as Server-Sent Events. The browser reads the
-    response body incrementally (fetch + ReadableStream)."""
-
-    async def _sse():
-        try:
-            async for ev in agent_session.run_turn(req.session_id, req.prompt, req.model):
-                yield f"data: {json.dumps(ev)}\n\n"
-        except Exception as exc:  # noqa: BLE001 — surface any failure to the UI
-            yield f"data: {json.dumps({'kind': 'error', 'error': str(exc)[:500]})}\n\n"
-        yield f"data: {json.dumps({'kind': 'done'})}\n\n"
-
+@app.get("/api/session/{session_id}/stream")
+async def attach(
+    session_id: str,
+    _identity: str = Depends(require_auth),
+    last_event_id: str | None = Header(default=None, alias="Last-Event-ID"),
+):
+    """Attach to a session (SSE). Replays the conversation so far, then tails
+    live. On an EventSource auto-reconnect the browser sends Last-Event-ID, so we
+    replay only what was missed."""
+    session = manager.get(session_id)
+    if session is None:
+        raise HTTPException(status_code=404, detail="session not found")
+    try:
+        leid = int(last_event_id) if last_event_id is not None else None
+    except ValueError:
+        leid = None
    return StreamingResponse(
-        _sse(),
+        attach_stream(session, leid),
        media_type="text/event-stream",
-        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no", "Connection": "keep-alive"},
    )


+@app.post("/api/session/{session_id}/prompt")
+async def prompt(session_id: str, req: PromptRequest, _identity: str = Depends(require_auth)):
+    """Start a turn. It runs DETACHED (keeps going if the client disconnects);
+    output is delivered via the attach stream, not this response."""
+    session = manager.get(session_id)
+    if session is None:
+        raise HTTPException(status_code=404, detail="session not found")
+    if not session.start_turn(req.prompt, req.model):
+        raise HTTPException(status_code=409, detail="a turn is already running")
+    return {"status": "started"}
+
+
+@app.post("/api/session/{session_id}/cancel")
+async def cancel(session_id: str, _identity: str = Depends(require_auth)):
+    session = manager.get(session_id)
+    if session is None:
+        raise HTTPException(status_code=404, detail="session not found")
+    cancelled = await session.cancel()
+    return {"cancelled": cancelled}
+
+
 @app.get("/api/pve/verbs")
 async def pve_verbs(_identity: str = Depends(require_auth)):
    return {
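Review note (not part of the patch): a minimal sketch of how a non-browser client might read the attach stream, assuming the three frame shapes that `attach_stream` in `session.py` emits in this change (`id:` + `data:` event frames, the `event: caught-up` marker, and `: keepalive` comments). `parse_sse_frame` is a hypothetical helper name, and the sketch assumes each frame carries a single `data:` line, which holds here because `json.dumps` output contains no newlines.

```python
import json
from typing import Optional


def parse_sse_frame(frame: str) -> Optional[dict]:
    """Parse one SSE frame (the text up to the blank line).

    Returns None for a comment-only frame such as the keepalive.
    """
    fields = {}
    for line in frame.strip("\n").split("\n"):
        if not line or line.startswith(":"):
            continue  # SSE comment line, e.g. ": keepalive"
        name, _, value = line.partition(":")
        # A single leading space after the colon is not part of the value.
        fields[name] = value[1:] if value.startswith(" ") else value
    if not fields:
        return None
    return {
        "event": fields.get("event", "message"),
        "id": int(fields["id"]) if "id" in fields else None,
        "data": json.loads(fields.get("data", "null")),
    }


event_frame = parse_sse_frame('id: 3\ndata: {"kind": "text", "id": 3}\n\n')
caught_up = parse_sse_frame("event: caught-up\ndata: {}\n\n")
keepalive = parse_sse_frame(": keepalive\n\n")
```

A client that remembers the last non-None `id` it parsed can send it back as `Last-Event-ID` when it reconnects; a browser `EventSource` does this on its own.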
--- a/app/breakglass/session.py
+++ b/app/breakglass/session.py
@ -0,0 +1,201 @@
+"""Attachable server-side sessions — the tmux model for the breakglass chat.
+
+Instead of the client owning conversation state, the SERVER owns it and clients
+*attach*. A turn runs as a detached task that keeps going if the client
+disconnects (you can background the phone / hit a tunnel blip and the agent
+keeps working); its output is appended to a per-session event log and broadcast
+to every attached subscriber. A client attaches over SSE, gets the log replayed
+(or only the part it missed, via Last-Event-ID), then tails live — exactly like
+re-attaching to a tmux session. ``EventSource`` reconnects natively, so the
+"re-attach" needs zero client logic.
+
+This module owns the lifecycle; ``agent_session`` still provides the claude
+argv + the stream-json→UI-event translation (all subprocesses use the no-shell
+list-argv form), and ``config`` the knobs.
+"""
+import asyncio
+import json
+import os
+import uuid
+from subprocess import PIPE
+from typing import AsyncIterator
+
+from . import agent_session, config
+
+
+class Session:
+    """One conversation. Owns the replay log + live subscribers + the in-flight
+    turn. The claude ``session_id`` is reused with ``--resume`` so the agent
+    keeps its own context across turns."""
+
+    def __init__(self, session_id: str):
+        self.id = session_id
+        # The replay log: every UI event, in order. Index in the list IS the
+        # SSE event id, so a reconnecting client replays only what it missed.
+        self.events: list[dict] = []
+        self._subscribers: set[asyncio.Queue] = set()
+        self._turn: asyncio.Task | None = None
+        self._proc: asyncio.subprocess.Process | None = None
+        self._started = False  # has claude opened this session id yet?
+
+    # ── event log + fan-out ────────────────────────────────────────────────
+    def add_event(self, event: dict) -> dict:
+        """Append an event to the log and broadcast it to attached clients."""
+        stored = {**event, "id": len(self.events)}
+        self.events.append(stored)
+        for q in list(self._subscribers):
+            q.put_nowait(stored)
+        return stored
+
+    def subscribe(self) -> asyncio.Queue:
+        q: asyncio.Queue = asyncio.Queue()
+        self._subscribers.add(q)
+        return q
+
+    def unsubscribe(self, q: asyncio.Queue) -> None:
+        self._subscribers.discard(q)
+
+    @property
+    def turn_active(self) -> bool:
+        return self._turn is not None and not self._turn.done()
+
+    # ── running a turn (detached from any client) ──────────────────────────
+    def start_turn(self, prompt: str, model: str | None = None) -> bool:
+        """Kick off a turn as a background task. Returns False if one is already
+        running (one turn at a time per session)."""
+        if self.turn_active:
+            return False
+        self.add_event({"kind": "user", "text": prompt})
+        self._turn = asyncio.create_task(self._run_turn(prompt, model))
+        return True
+
+    async def _run_turn(self, prompt: str, model: str | None) -> None:
+        model = model or config.DEFAULT_MODEL
+        resume = self._started
+        argv = agent_session._turn_argv(self.id, prompt, resume, model)
+        try:
+            self._proc = await asyncio.create_subprocess_exec(
+                *argv, cwd=_workspace_for(self.id), stdout=PIPE, stderr=PIPE,
+            )
+        except Exception as exc:  # noqa: BLE001
+            self.add_event({"kind": "error", "error": f"could not start agent: {exc}"})
+            self.add_event({"kind": "turn_end"})
+            return
+        self._started = True
+        assert self._proc.stdout is not None and self._proc.stderr is not None
+
+        try:
+            async def _pump():
+                async for raw in self._proc.stdout:
+                    line = raw.decode(errors="replace").strip()
+                    if not line:
+                        continue
+                    try:
+                        obj = json.loads(line)
+                    except json.JSONDecodeError:
+                        continue
+                    ev = agent_session.translate_event(obj)
+                    if ev is None:
+                        continue
+                    if ev.get("kind") == "batch":
+                        for sub in ev["events"]:
+                            self.add_event(sub)
+                    else:
+                        self.add_event(ev)
+
+            await asyncio.wait_for(_pump(), timeout=config.TURN_TIMEOUT_SECONDS)
+            await self._proc.wait()
+            if self._proc.returncode not in (0, None):
+                err = (await self._proc.stderr.read()).decode(errors="replace")
+                self.add_event({"kind": "error", "error": err.strip()[:500] or f"exit {self._proc.returncode}"})
+        except asyncio.TimeoutError:
+            await self._kill_proc()
+            self.add_event({"kind": "error", "error": f"turn timed out after {config.TURN_TIMEOUT_SECONDS}s"})
+        except asyncio.CancelledError:
+            await self._kill_proc()
+            self.add_event({"kind": "cancelled"})
+            raise
+        finally:
+            self._proc = None
+            self.add_event({"kind": "turn_end"})
+
+    async def _kill_proc(self) -> None:
+        if self._proc and self._proc.returncode is None:
+            try:
+                self._proc.kill()
+                await self._proc.wait()
+            except ProcessLookupError:
+                pass
+
+    async def cancel(self) -> bool:
+        """Stop the in-flight turn. Returns True if a turn was cancelled."""
+        if not self.turn_active:
+            return False
+        # The task's CancelledError handler kills the process and logs "cancelled".
+        if self._turn:
+            self._turn.cancel()
+            try:
+                await self._turn
+            except (asyncio.CancelledError, Exception):  # noqa: BLE001
+                pass
+        return True
+
+
+def _workspace_for(session_id: str) -> str:
+    path = os.path.join(config.SESSIONS_DIR, session_id)
+    os.makedirs(path, exist_ok=True)
+    return path
+
+
+class SessionManager:
+    """Holds all live sessions. The breakglass is single-operator, so callers
+    typically reuse one persistent session; multiple are still supported."""
+
+    def __init__(self):
+        self.sessions: dict[str, Session] = {}
+
+    def create(self) -> Session:
+        sid = str(uuid.uuid4())
+        s = Session(sid)
+        self.sessions[sid] = s
+        return s
+
+    def get(self, session_id: str) -> Session | None:
+        return self.sessions.get(session_id)
+
+    def get_or_create(self, session_id: str | None) -> Session:
+        if session_id and session_id in self.sessions:
+            return self.sessions[session_id]
+        return self.create()
+
+
+async def attach_stream(session: Session, last_event_id: int | None) -> AsyncIterator[str]:
+    """Yield SSE frames for an attached client: first the replay (everything, or
+    only events after ``last_event_id`` on a reconnect), then live events as they
+    arrive. Each frame carries an ``id:`` so EventSource resumes precisely."""
+    q = session.subscribe()
+    try:
+        start = 0 if last_event_id is None else last_event_id + 1
+        backlog = session.events[start:]
+        for ev in backlog:
+            yield _sse_frame(ev)
+        # Tell the client the replay is done and it's now live.
+        yield "event: caught-up\ndata: {}\n\n"
+
+        seen = backlog[-1]["id"] if backlog else (last_event_id if last_event_id is not None else -1)
+        while True:
+            try:
+                ev = await asyncio.wait_for(q.get(), timeout=config.SSE_KEEPALIVE_SECONDS)
+            except asyncio.TimeoutError:
+                yield ": keepalive\n\n"  # comment frame keeps the connection warm
+                continue
+            if ev["id"] <= seen:
+                continue
+            seen = ev["id"]
+            yield _sse_frame(ev)
+    finally:
+        session.unsubscribe(q)
+
+
+def _sse_frame(event: dict) -> str:
+    return f"id: {event['id']}\ndata: {json.dumps(event)}\n\n"
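Review note (not part of the patch): the resume arithmetic in `attach_stream` can be checked in isolation. The sketch below is a stand-alone model, assuming (as `Session.add_event` does) that an event's `id` equals its index in the log. `make_log` and `replay_from` are hypothetical names used only for illustration.

```python
from typing import Optional


def make_log(kinds: list) -> list:
    """Build a log the way Session.add_event does: id == list index."""
    log = []
    for kind in kinds:
        log.append({"kind": kind, "id": len(log)})
    return log


def replay_from(events: list, last_event_id: Optional[int]) -> list:
    """Return the events a (re)attaching client still has to see."""
    # Mirrors the slice in attach_stream: no Last-Event-ID replays everything,
    # otherwise replay starts just after the last id the client received.
    start = 0 if last_event_id is None else last_event_id + 1
    return events[start:]


log = make_log(["user", "text", "tool", "turn_end"])
first_attach = replay_from(log, None)  # fresh attach: the whole log
reconnect = replay_from(log, 1)        # client already saw ids 0 and 1
up_to_date = replay_from(log, 3)       # nothing was missed
```

Because the id is just the list index, a reconnect needs no per-client bookkeeping on the server; the trade-off is that the log must stay append-only for the lifetime of the session.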
--- a/app/breakglass/static/apple-touch-icon.png
+++ b/app/breakglass/static/apple-touch-icon.png
--- a/app/breakglass/static/assets/index-BoWC1Onq.css
+++ b/app/breakglass/static/assets/index-BoWC1Onq.css
--- a/app/breakglass/static/assets/index-CLbKo1Yx.js
+++ b/app/breakglass/static/assets/index-CLbKo1Yx.js
--- a/app/breakglass/static/assets/index-DWHIP1Zw.css
+++ b/app/breakglass/static/assets/index-DWHIP1Zw.css
--- a/app/breakglass/static/assets/index-DjaW81Sq.js
+++ b/app/breakglass/static/assets/index-DjaW81Sq.js
--- a/app/breakglass/static/icon-192.png
+++ b/app/breakglass/static/icon-192.png
--- a/app/breakglass/static/icon-512.png
+++ b/app/breakglass/static/icon-512.png
--- a/app/breakglass/static/icon.svg
+++ b/app/breakglass/static/icon.svg
@ -0,0 +1,64 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="devvm breakglass">
+  <defs>
+    <!-- layered near-black surface, matching the app theme -->
+    <radialGradient id="bg" cx="68%" cy="22%" r="92%">
+      <stop offset="0%" stop-color="#12303a"/>
+      <stop offset="42%" stop-color="#0b0f14"/>
+      <stop offset="100%" stop-color="#06080b"/>
+    </radialGradient>
+    <linearGradient id="steel" x1="0" y1="0" x2="1" y2="1">
+      <stop offset="0%" stop-color="#7df0f3"/>
+      <stop offset="55%" stop-color="#3dd1d6"/>
+      <stop offset="100%" stop-color="#1f6f72"/>
+    </linearGradient>
+    <filter id="glow" x="-40%" y="-40%" width="180%" height="180%">
+      <feGaussianBlur stdDeviation="7" result="b"/>
+      <feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge>
+    </filter>
+  </defs>
+
+  <!-- rounded-square field (safe for maskable: art kept within central ~80%) -->
+  <rect width="512" height="512" rx="112" fill="url(#bg)"/>
+  <rect x="6" y="6" width="500" height="500" rx="108" fill="none" stroke="#1c2530" stroke-width="3"/>
+  <!-- faint scanline texture -->
+  <g opacity="0.05" stroke="#ffffff" stroke-width="2">
+    <line x1="0" y1="148" x2="512" y2="148"/>
+    <line x1="0" y1="220" x2="512" y2="220"/>
+    <line x1="0" y1="292" x2="512" y2="292"/>
+    <line x1="0" y1="364" x2="512" y2="364"/>
+  </g>
+
+  <!-- fracture burst (amber): the "break the glass" radiating cracks -->
+  <g stroke="#f5b657" stroke-width="9" stroke-linecap="round" stroke-linejoin="round"
+     fill="none" opacity="0.92" filter="url(#glow)">
+    <path d="M256 256 L142 132"/>
+    <path d="M256 256 L120 250"/>
+    <path d="M256 256 L150 372"/>
+    <path d="M256 256 L372 380"/>
+    <path d="M256 256 L392 246"/>
+    <path d="M256 256 L360 138"/>
+    <!-- cross-cracks -->
+    <path d="M186 196 L150 250"/>
+    <path d="M210 320 L172 318" opacity="0.7"/>
+    <path d="M326 318 L356 350" opacity="0.7"/>
+  </g>
+
+  <!-- wrench, struck across the burst (cyan steel) -->
+  <g filter="url(#glow)">
+    <path fill="url(#steel)" stroke="#0e3133" stroke-width="6" stroke-linejoin="round"
+      d="M344 150
+         a62 62 0 0 0 -82 76
+         L150 338
+         a26 26 0 0 0 0 37
+         l11 11
+         a26 26 0 0 0 37 0
+         l112 -112
+         a62 62 0 0 0 76 -82
+         l-41 41
+         l-40 -11
+         l-11 -40
+         z"/>
+    <!-- handle highlight -->
+    <path d="M171 350 l128 -128" stroke="#bdf6f8" stroke-width="7" stroke-linecap="round" opacity="0.6"/>
+  </g>
+</svg>
--- a/app/breakglass/static/index.html
+++ b/app/breakglass/static/index.html
@ -2,12 +2,31 @@
 <html lang="en">
  <head>
    <meta charset="UTF-8" />
-    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <!-- viewport-fit=cover so the app paints edge-to-edge and we can honour the
+         notch/home-indicator via env(safe-area-inset-*). maximum-scale=1.0 caps
+         zoom so the cockpit layout stays stable under stress on mobile. -->
+    <meta
+      name="viewport"
+      content="width=device-width, initial-scale=1.0, viewport-fit=cover, maximum-scale=1.0"
+    />
    <meta name="color-scheme" content="dark" />
    <meta name="robots" content="noindex, nofollow" />
+
+    <!-- PWA / installable. theme-color tints the mobile status bar to the dark
+         theme; black-translucent lets the app draw under the iOS status bar. -->
+    <meta name="theme-color" content="#06080b" />
+    <link rel="manifest" href="./manifest.webmanifest" />
+    <meta name="apple-mobile-web-app-capable" content="yes" />
+    <meta name="mobile-web-app-capable" content="yes" />
+    <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
+    <meta name="apple-mobile-web-app-title" content="breakglass" />
+    <link rel="apple-touch-icon" href="./apple-touch-icon.png" />
+    <link rel="icon" type="image/svg+xml" href="./icon.svg" />
+    <link rel="icon" type="image/png" sizes="192x192" href="./icon-192.png" />
+
    <title>devvm breakglass</title>
-    <script type="module" crossorigin src="./assets/index-DjaW81Sq.js"></script>
-    <link rel="stylesheet" crossorigin href="./assets/index-DWHIP1Zw.css">
+    <script type="module" crossorigin src="./assets/index-CLbKo1Yx.js"></script>
+    <link rel="stylesheet" crossorigin href="./assets/index-BoWC1Onq.css">
  </head>
  <body>
    <div id="app"></div>
--- a/app/breakglass/static/manifest.webmanifest
+++ b/app/breakglass/static/manifest.webmanifest
@ -0,0 +1,31 @@
+{
+  "name": "devvm breakglass",
+  "short_name": "breakglass",
+  "description": "Emergency recovery console for the devvm — chat with a repair agent or power-cycle the VM directly.",
+  "start_url": "./",
+  "scope": "./",
+  "display": "standalone",
+  "orientation": "portrait",
+  "background_color": "#06080b",
+  "theme_color": "#06080b",
+  "icons": [
+    {
+      "src": "./icon.svg",
+      "type": "image/svg+xml",
+      "sizes": "any",
+      "purpose": "any maskable"
+    },
+    {
+      "src": "./icon-192.png",
+      "type": "image/png",
+      "sizes": "192x192",
+      "purpose": "any maskable"
+    },
+    {
+      "src": "./icon-512.png",
+      "type": "image/png",
+      "sizes": "512x512",
+      "purpose": "any maskable"
+    }
+  ]
+}
--- a/app/conversational.py
+++ b/app/conversational.py
@ -0,0 +1,220 @@
+"""Conversational Brain — drives the Claude CLI for the portal-assistant gateway.
+
+A lean, no-tools, multi-turn path (portal-assistant ADR-0002): no workspace clone,
+no tool-enabled agent, and NO --dangerously-skip-permissions. Per-conversation
+continuity comes from the Claude CLI's own --session-id / --resume, so the gateway
+only has to hand us a stable session id per conversation.
+"""
+import asyncio
+import json
+import os
+from subprocess import PIPE
+
+CONVERSATIONAL_AGENT = "conversational"
+# A spoken chat turn is short; a turn that runs longer than this is wedged.
+CONVERSATIONAL_TIMEOUT_SECONDS = int(
+    os.environ.get("CONVERSATIONAL_TIMEOUT_SECONDS", "120")
+)
+
+# Latency: the conversational agent is no-tools (ADR-0002), so the CLI's default
+# project context — this repo's CLAUDE.md, the MCP server configs, local settings
+# — plus the dynamic system-prompt sections are pure overhead on a voice turn.
+# Measured 2026-06-21: the default load is ~45k input tokens/turn -> ~3.4s TTFT;
+# restricting settings to `user` and excluding the dynamic sections roughly
+# halves the context (~23k) and cuts TTFT to ~2.1s (~1.3s/turn faster) with no
+# change to the reply. Applies to BOTH the gateway (json) and realtime (stream)
+# paths, since both run the same no-tools conversational turn.
+_LEAN_CONTEXT_FLAGS = [
+    "--setting-sources", "user",
+    "--exclude-dynamic-system-prompt-sections",
+]
+
+# Session ids the Claude CLI has already opened in THIS process, so a follow-up
+# turn resumes instead of re-opening. In-memory + single-replica: a pod restart
+# clears this AND the CLI's emptyDir session state together, so they stay in sync.
+_started: set[str] = set()
+
+
+def reset_started() -> None:
+    """Forget all opened sessions (used by tests)."""
+    _started.clear()
+
+
+def conversational_argv(
+    session_id: str, message: str, model: str, resume: bool
+) -> list[str]:
+    """Build the argv for one conversational turn.
+
+    A new conversation opens the session with --session-id; subsequent turns
+    continue it with --resume so Claude keeps its own context. We never pass
+    --dangerously-skip-permissions: the conversational agent has no tools and the
+    endpoint is public-facing, so nothing may be auto-permitted.
+    """
+    argv = [
+        "claude", "-p",
+        "--agent", CONVERSATIONAL_AGENT,
+        "--output-format", "json",
+        "--model", model,
+        *_LEAN_CONTEXT_FLAGS,
+    ]
+    argv += ["--resume", session_id] if resume else ["--session-id", session_id]
+    argv.append(message)
+    return argv
+
+
+def extract_reply(output_lines: list[str]) -> str:
+    """Pull the final assistant text out of `claude -p --output-format json`.
+
+    The CLI emits one JSON object with the final message under `result`; fall
+    back to the raw text if it isn't parseable so callers always get something.
+    """
+    raw = "".join(output_lines).strip()
+    if not raw:
+        return ""
+    try:
+        parsed = json.loads(raw)
+    except json.JSONDecodeError:
+        return raw
+    if isinstance(parsed, dict):
+        for key in ("result", "content", "text"):
+            value = parsed.get(key)
+            if isinstance(value, str) and value:
+                return value
+    return raw
+
+
+async def run_turn(session_id: str, message: str, model: str) -> dict:
+    """Run one conversational turn and return {exit_code, reply, stderr}.
+
+    Resumes the Claude session if we've opened it before; otherwise opens it.
+    The session is only marked opened on success so a failed first turn can be
+    retried cleanly as a new one.
+    """
+    resume = session_id in _started
+    argv = conversational_argv(session_id, message, model, resume)
+
+    proc = await asyncio.create_subprocess_exec(*argv, stdout=PIPE, stderr=PIPE)
+    assert proc.stdout is not None and proc.stderr is not None
+
+    output_lines: list[str] = []
+    async for line in proc.stdout:
+        output_lines.append(line.decode(errors="replace"))
+    stderr = await proc.stderr.read()
+    await proc.wait()
+
+    if proc.returncode == 0:
+        _started.add(session_id)
+
+    return {
+        "exit_code": proc.returncode,
+        "reply": extract_reply(output_lines),
+        "stderr": stderr.decode(errors="replace"),
+    }
+
+
+# ---------------------------------------------------------------------------
+# Streaming (OpenAI-compatible) path — token-level deltas for the realtime
+# voice agent. Pipecat's OpenAILLMService streams from /v1/chat/completions and
+# re-sends the FULL history each turn, so this path is STATELESS: the whole
+# dialogue goes in the prompt and we run a fresh CLI with stream-json to relay
+# incremental tokens as OpenAI chat-completion SSE chunks. (run_turn above stays
+# the session-based path for the non-streaming gateway.)
+# ---------------------------------------------------------------------------
+
+
+def stream_argv(prompt: str, model: str) -> list[str]:
+    """Argv for a STREAMING conversational turn (token deltas via stream-json).
+
+    Stateless — the full conversation is in `prompt` (no --session-id/--resume).
+    `--include-partial-messages` makes the CLI emit `content_block_delta` token
+    events; `--verbose` is required by the CLI for stream-json under --print. No
+    --dangerously-skip-permissions: the conversational agent has no tools.
+    """
+    return [
+        "claude", "-p",
+        "--agent", CONVERSATIONAL_AGENT,
+        "--model", model,
+        "--output-format", "stream-json",
+        "--include-partial-messages",
+        "--verbose",
+        *_LEAN_CONTEXT_FLAGS,
+        prompt,
+    ]
+
+
+def delta_text(line: str) -> str | None:
+    """Extract the incremental assistant text from one stream-json line.
+
+    Returns the text of a `content_block_delta` / `text_delta` event, or None
+    for any other event (system, message_start, content_block_stop, result) or
+    an unparseable line.
+    """
+    line = line.strip()
+    if not line:
+        return None
+    try:
+        event = json.loads(line)
+    except json.JSONDecodeError:
+        return None
+    if not isinstance(event, dict) or event.get("type") != "stream_event":
+        return None
+    inner = event.get("event") or {}
+    if inner.get("type") != "content_block_delta":
+        return None
+    delta = inner.get("delta") or {}
+    if delta.get("type") == "text_delta":
+        return delta.get("text") or None
+    return None
+
+
+def openai_chunk(
+    completion_id: str,
+    model: str,
+    created: int,
+    *,
+    role: str | None = None,
+    content: str | None = None,
+    finish_reason: str | None = None,
+) -> str:
+    """Format one OpenAI `chat.completion.chunk` as an SSE `data:` line.
+
+    ensure_ascii=False keeps Cyrillic (Bulgarian) intact on the wire.
+    """
+    delta: dict[str, str] = {}
+    if role is not None:
+        delta["role"] = role
+    if content is not None:
+        delta["content"] = content
+    payload = {
+        "id": completion_id,
+        "object": "chat.completion.chunk",
+        "created": created,
+        "model": model,
+        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
+    }
+    return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"
+
+
+def synthesise_chat_prompt(messages) -> str:
+    """Flatten OpenAI chat messages into a dialogue prompt for the conversational
+    agent, KEEPING prior assistant turns.
+
+    Pipecat re-sends the full message history every call, so multi-turn context
+    is preserved here (statelessly) by replaying the dialogue. Each message is a
+    duck-typed object with `.role` and `.content`. System messages become a
+    preamble; user/assistant turns are rendered as a `User:`/`Assistant:`
+    dialogue ending on the latest user turn.
+    """
+    system = [m.content for m in messages if m.role == "system" and m.content]
+    turns = []
+    for m in messages:
+        if m.role == "user" and m.content:
+            turns.append("User: " + m.content)
+        elif m.role == "assistant" and m.content:
+            turns.append("Assistant: " + m.content)
+    parts = []
+    if system:
+        parts.append("\n\n".join(system))
+    if turns:
+        parts.append("\n".join(turns))
+    return "\n\n".join(parts).strip()
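Review note (not part of the patch): a small sketch of the token filtering the streaming path relies on. `delta_text` below is a condensed copy of the function added in this file. The sample lines are hypothetical: they contain only the fields `delta_text` inspects, and real `stream-json` output from the CLI may carry more fields and other event types.

```python
import json
from typing import Optional


def delta_text(line: str) -> Optional[str]:
    """Condensed copy of delta_text from the patch: text deltas only."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or event.get("type") != "stream_event":
        return None
    inner = event.get("event") or {}
    if inner.get("type") != "content_block_delta":
        return None
    delta = inner.get("delta") or {}
    if delta.get("type") == "text_delta":
        return delta.get("text") or None
    return None


# Hypothetical stream-json lines (assumed shapes, not captured CLI output).
sample_lines = [
    json.dumps({"type": "system", "subtype": "init"}),
    json.dumps({"type": "stream_event", "event": {
        "type": "content_block_delta",
        "delta": {"type": "text_delta", "text": "Здравей"}}}),
    json.dumps({"type": "stream_event", "event": {
        "type": "content_block_delta",
        "delta": {"type": "text_delta", "text": ", world"}}}),
    "not json at all",
    json.dumps({"type": "result", "result": "Здравей, world"}),
]

# Only the two text deltas survive; everything else is dropped.
tokens = [t for t in map(delta_text, sample_lines) if t is not None]
reply = "".join(tokens)
```

Each surviving token would then be wrapped by `openai_chunk(...)` and written to the SSE response, so the final `result` line is deliberately ignored to avoid sending the reply twice.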
--- a/app/main.py
+++ b/app/main.py
@ -2,6 +2,8 @@ import asyncio
 import hmac
 import json
 import os
+import shutil
+import tempfile
 import time
 import uuid
 from contextlib import asynccontextmanager
@ -10,9 +12,11 @@ from subprocess import PIPE
 from typing import Any, Literal

 from fastapi import FastAPI, HTTPException, Header
-from fastapi.responses import JSONResponse
+from fastapi.responses import JSONResponse, StreamingResponse
 from pydantic import BaseModel, Field

+from app import conversational
+
 app = FastAPI(title="Claude Agent Service")

 API_TOKEN = os.environ.get("API_BEARER_TOKEN", "")
@@ -104,6 +108,15 @@ class ChatCompletionsRequest(BaseModel):
    model_config = {"extra": "allow"}


+class ConversationalRequest(BaseModel):
+    # The portal-assistant gateway owns the conversation; it hands us a stable
+    # session id (for Claude --resume) plus the next user message. Model is
+    # selectable per request, same as the OpenAI-compat path.
+    session_id: str
+    message: str
+    model: str | None = None
+
+
 def verify_token(authorization: str | None):
    # Reject everything when the service is unconfigured. compare_digest("", "")
    # returns True, so without this guard an empty API_TOKEN would happily
@@ -435,9 +448,6 @@ async def chat_completions(
 ):
    verify_token(authorization)

-    if request.stream:
-        raise HTTPException(status_code=400, detail="streaming not supported")
-
    model = request.model if request.model is not None else DEFAULT_MODEL
    if model not in SUPPORTED_MODELS:
        return JSONResponse(
@@ -448,6 +458,64 @@ async def chat_completions(
            },
        )

+    # Streaming path (the realtime voice agent / Pipecat). Token-level deltas via
+    # the conversational (no-tools) agent in stream-json mode, relayed as
+    # OpenAI chat.completion.chunk SSE. Stateless: the full history is in the
+    # prompt (the client re-sends it each turn). No workspace clone — the
+    # conversational agent reads no files.
+    if request.stream:
+        if not _reserve_queue_slot():
+            return JSONResponse(
+                status_code=503,
+                content={"error": "execution failed", "detail": "queue full"},
+            )
+        prompt = conversational.synthesise_chat_prompt(request.messages)
+        completion_id = "chatcmpl-" + uuid.uuid4().hex[:24]
+        created = int(time.time())
+        spawn = asyncio.create_subprocess_exec  # bound alias (keeps subprocess use tidy)
+
+        async def event_stream():
+            workspace = tempfile.mkdtemp(prefix="conv-stream-")
+            proc = None
+            try:
+                async with _execution_slot():
+                    proc = await spawn(
+                        *conversational.stream_argv(prompt, model),
+                        cwd=workspace, stdout=PIPE, stderr=asyncio.subprocess.DEVNULL,
+                    )
+                    assert proc.stdout is not None
+                    yield conversational.openai_chunk(
+                        completion_id, model, created, role="assistant"
+                    )
+                    try:
+                        async with asyncio.timeout(
+                            conversational.CONVERSATIONAL_TIMEOUT_SECONDS
+                        ):
+                            async for raw in proc.stdout:
+                                text = conversational.delta_text(
+                                    raw.decode(errors="replace")
+                                )
+                                if text:
+                                    yield conversational.openai_chunk(
+                                        completion_id, model, created, content=text
+                                    )
+                    except asyncio.TimeoutError:
+                        pass  # wedged turn — close the stream cleanly
+                    yield conversational.openai_chunk(
+                        completion_id, model, created, finish_reason="stop"
+                    )
+                    yield "data: [DONE]\n\n"
+            finally:
+                if proc is not None and proc.returncode is None:
+                    try:
+                        proc.kill()
+                        await proc.wait()
+                    except ProcessLookupError:
+                        pass
+                shutil.rmtree(workspace, ignore_errors=True)
+
+        return StreamingResponse(event_stream(), media_type="text/event-stream")
+
    prompt = _synthesise_prompt(request.messages)

    if not _reserve_queue_slot():
@@ -510,3 +578,56 @@ async def chat_completions(
            "total_tokens": 0,
        },
    }
+
+
+@app.post("/v1/conversational")
+async def conversational_turn(
+    request: ConversationalRequest,
+    authorization: str | None = Header(default=None),
+):
+    """Lean, multi-turn conversational Brain for the portal-assistant gateway.
+
+    Drives a no-tools conversational agent with per-conversation --resume — no
+    workspace clone, no tools (see portal-assistant ADR-0002). Returns the
+    assistant's reply text keyed to the caller's session id.
+    """
+    verify_token(authorization)
+
+    model = request.model if request.model is not None else DEFAULT_MODEL
+    if model not in SUPPORTED_MODELS:
+        return JSONResponse(
+            status_code=400,
+            content={"error": "unsupported model", "supported": sorted(SUPPORTED_MODELS)},
+        )
+
+    if not _reserve_queue_slot():
+        return JSONResponse(
+            status_code=503,
+            content={"error": "execution failed", "detail": "queue full"},
+        )
+
+    try:
+        async with _execution_slot():
+            result = await asyncio.wait_for(
+                conversational.run_turn(request.session_id, request.message, model),
+                timeout=conversational.CONVERSATIONAL_TIMEOUT_SECONDS,
+            )
+    except asyncio.TimeoutError:
+        return JSONResponse(
+            status_code=503,
+            content={"error": "execution failed", "detail": "agent timed out"},
+        )
+    except Exception as exc:  # noqa: BLE001
+        return JSONResponse(
+            status_code=503,
+            content={"error": "execution failed", "detail": _one_line(str(exc))},
+        )
+
+    if result["exit_code"] != 0:
+        detail = _one_line(result.get("stderr") or "") or f"exit {result['exit_code']}"
+        return JSONResponse(
+            status_code=503,
+            content={"error": "execution failed", "detail": detail},
+        )
+
+    return {"session_id": request.session_id, "reply": result["reply"]}
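Reviewer note: the streaming branch emits a role chunk, one chunk per text delta, a `finish_reason="stop"` chunk, and a final `data: [DONE]` frame. A minimal client-side sketch of reassembling the reply from those frames follows. The helper names `collect_reply` and `frame`, the id `chatcmpl-example`, and the sample text are assumptions for illustration; only the frame layout is taken from `openai_chunk` in the patch.

```python
import json


def collect_reply(sse_lines):
    """Join the `content` deltas carried by chat.completion.chunk SSE frames."""
    pieces = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separators and anything that is not a data frame
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(data)["choices"][0]["delta"]
        pieces.append(delta.get("content", ""))  # role-only and final chunks add nothing
    return "".join(pieces)


def frame(delta, finish_reason=None):
    """Build one frame in the same layout the service emits (illustrative values)."""
    payload = {
        "id": "chatcmpl-example",
        "object": "chat.completion.chunk",
        "created": 0,
        "model": "sonnet",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return "data: " + json.dumps(payload, ensure_ascii=False)


frames = [
    frame({"role": "assistant"}),
    frame({"content": "Здра"}),
    frame({"content": "вей!"}),
    frame({}, finish_reason="stop"),
    "data: [DONE]",
]
reply = collect_reply(frames)
print(reply)
```

Because the service serialises with `ensure_ascii=False`, Bulgarian text travels as UTF-8 rather than `\u` escapes; a client should decode the stream as UTF-8 before splitting it into lines.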
--- a/docs/2026-06-14-afk-implementation-pipeline-design.md
+++ b/docs/2026-06-14-afk-implementation-pipeline-design.md
@@ -0,0 +1,259 @@
+# AFK implementation pipeline — design
+
+**Date:** 2026-06-14
+**Status:** proposed — pilot pending (see "Pilot" below; no code yet)
+**Scope:** A new autonomous path that turns a triaged `ready-for-agent` issue
+into tested, deployed code with no human at the keyboard. `claude-agent-service`
+becomes the **control plane**; a dedicated in-cluster **T3 Code** instance
+becomes the **executor + cockpit**. Touches: `claude-agent-service` (new poller +
+dispatch + watcher), a new T3 stack in `infra/`, a shared SSD-NFS volume, and
+the per-repo issue trackers.
+
+> Provenance: this design is the output of a long grilling session
+> (2026-06-14). It records the decisions *and* the alternatives that were
+> considered and dropped, so the reasoning survives. The three hardest-to-reverse
+> calls are split into ADRs 0002–0004.
+
+## Problem
+
+Today the development flow is **grill-with-docs → to-prd → to-issues → triage →
+implement**, and *every* stage is human-in-the-loop (HITL), including
+implementation. The owner wants the HITL boundary to stop at **design + spec**:
+once an issue is triaged `ready-for-agent`, an agent should pick it up and
+implement it **AFK** (away from keyboard) — write it test-first, push it, and
+see it through to a healthy deploy — escalating to a human only when it genuinely
+can't proceed.
+
+Two gaps block this today:
+
+- The only existing issue→agent automation is the **infra `issue-responder`**,
+  which fires on `user-report`/`feature-request` labels on the `infra` repo
+  only — not on `ready-for-agent`, not on the other sub-project repos that the
+  general design flow produces.
+- `claude-agent-service` only ever clones `infra`, runs one-shot fire-and-forget
+  `claude -p` jobs (no session, no live stream, no attach), and has no
+  multi-repo checkout. The owner wants to *watch and steer* in-flight work, which
+  the batch model can't offer.
+
+## Goal
+
+- HITL covers design + spec only. Publishing `ready-for-agent` issues is the
+  release signal (the `to-issues` quiz is the review gate).
+- An autonomous loop picks up unblocked `ready-for-agent` issues from
+  **enrolled** repos, implements them test-first, and lands them — pushing
+  straight to `master` so CI deploys them (see ADR 0002 for the risk posture).
+- The owner can **see all in-flight workers and converse with any of them** from
+  one UI — the T3 cockpit (see ADR 0003).
+- Reuse before building: lean on the existing CI/CD chain, the design skills, T3
+  Code's multi-agent cockpit, and the persistence/worktree machinery — rather
+  than hand-building a session console and a bespoke runtime.
+
+## Design
+
+### Roles: control plane vs executor + cockpit
+
+| Concern | Owner |
+|---|---|
+| When to start, which issue, the prompt, the safety envelope | **claude-agent-service** (control plane) — poller + watcher |
+| Running the agent (Claude Agent SDK), the worktree, the fleet UI | **T3 Code** (executor + cockpit) — one dedicated in-cluster instance |
+| Build → image → deploy → rollout | existing CI/CD (GHA → ghcr → Woodpecker → Keel) |
+| Issue queue + state | the per-repo GitHub issue trackers |
+
+The pivotal constraint that forces this split: **T3 can only display sessions it
+launched itself** — it has no command to adopt an externally-started session. So
+"viewable in T3" ⟺ "launched by T3". To keep `claude-agent-service` in charge
+*and* get the fleet view, the control plane **dispatches into T3** rather than
+running `claude` itself. See ADR 0003.
+
+### End-to-end flow
+
+```
+HUMAN (interactive session)
+  /grill-with-docs → /to-prd → /to-issues → /triage
+     └ produces ready-for-agent issues (dependency-ordered), labeled by a
+       trusted collaborator. Publishing them = the release signal.
+══════════════════════ HANDOFF ══════════════════════
+CONTROL PLANE  (claude-agent-service, in-cluster)
+  poller CronJob (every few min):
+    for repo in allowlist:
+      skip repo if it already has an agent-in-progress issue   (per-repo lock)
+      pick highest-priority ready-for-agent issue where:
+        • all "Blocked by" closed   • labeled by a trusted collaborator
+      → stamp agent-in-progress
+      → POST /api/orchestration/dispatch  (thread.turn.start + bootstrap:
+            create thread, prepare worktree, run setup, deliver the prompt)
+EXECUTOR + COCKPIT  (dedicated T3 instance, in-cluster)
+  runs the issue-implementer agent (our prompt) in the worktree:
+    read issue + AGENT-BRIEF + repo CONTEXT.md/ADRs → TDD red-green-refactor
+    → commit (paraphrase issue, "Closes #N", AFK trailer) → push master
+  watcher (control plane) polls GET /api/orchestration/snapshot + CI:
+    ├─ healthy ──────► comment + close issue, drop lock, notify ✅
+    ├─ pre-push block ► do NOT push, relabel ready-for-human, escalate
+    └─ post-push red ► fix-forward (≤5 attempts / 60 min)
+                         ├─ recovers ► healthy
+                         └─ exhausts ► FREEZE broken (preserve forensics),
+                                       relabel ready-for-human, hard page
+```
+
+### Trigger & dispatch predicate
+
+A poller CronJob (mirrors the existing `beads-dispatcher` pattern; stays
+in-cluster because neither the service nor T3 has public ingress). It dispatches
+issue *I* in repo *R* iff **all** hold:
+
+- `R` is in the **allowlist** ConfigMap, and the **kill switch** is off;
+- `I` has label `ready-for-agent`, applied by a **trusted collaborator** (the
+  trust gate — on private repos only collaborators can label, so the label *is*
+  the authorization; external/bot issues never auto-run);
+- every issue in `I`'s "Blocked by" is closed;
+- `R` has no issue currently labeled `agent-in-progress` (the per-repo lock).
+
+On dispatch it stamps `agent-in-progress`; on any terminal outcome it removes it.
+
+### Concurrency & locking
+
+**Parallel across repos, serial within a repo.** Multiple repos progress at
+once; at most one agent per repo (two agents in one repo would collide on the
+working tree). Enforced by the `agent-in-progress` label as a per-repo lock.
+One agent per repo is the starting value; raise it later.
+
+### Merge & failure posture — see ADR 0002
+
+- **Always push to master** (no PR gate). Tests-green is the merge gate; CI +
+  rollback are the safety net, matching the human allow-then-audit model.
+- **Pre-push** failure (can't get green / blocked / would need a disallowed op):
+  do *not* push; relabel `ready-for-human`; comment what was tried; page.
+- **Post-push** failure (CI build or rollout red): **fix-forward** up to **5
+  attempts or 60 minutes**, then if still red **freeze in the broken state**
+  (preserve forensics — do not auto-revert), relabel `ready-for-human`, hard
+  page. The owner explicitly chose debuggability over availability here.
+- **Budget:** `max_budget_usd = 100` per issue (time/attempt caps usually bite
+  first).
+
+### Build/test environment & worktrees — see ADR 0004
+
+The agent must run the target repo's test suite (TDD gate) before pushing.
+Therefore:
+
+- **Local toolchains scoped to the allowlist** — the executor image carries only
+  the *enrolled* repos' runtimes; the toolchain set grows in lockstep with the
+  allowlist.
+- **Persistent per-repo checkout + `git worktree` per issue** on a shared
+  **SSD-NFS** volume, so git objects, installed deps, and package-manager caches
+  stay warm across jobs. This **supersedes** the throwaway `git clone --local`
+  model from `2026-06-02-parallel-execution-design.md`; that rejection was
+  correct for *concurrent* same-repo jobs, but the serial-within-repo choice
+  here removes the `.git` contention it guarded against (ADR 0004). It pays off
+  precisely because `to-issues` clusters many slices in one repo, processed
+  serially — slice N reuses the warm checkout slice 1 paid for.
+
+### T3 integration: thin dispatch — see ADR 0003
+
+The control plane holds a capability-scoped **`orchestration:operate`** bearer
+token (minted via `t3 auth`, stored in Vault, refreshed for the 1-hour expiry)
+and calls T3's HTTP API:
+
+- `POST /api/orchestration/dispatch` → `thread.turn.start` with a `bootstrap`
+  that creates the thread, prepares the worktree, optionally runs a setup
+  script, and delivers the prompt — one call spawns a worktree-isolated worker.
+- `GET /api/orchestration/snapshot` → the full fleet read-model (per-thread
+  `running`/`idle`/`error`, `hasPendingUserInput`, `hasPendingApprovals`,
+  `branch`, `worktreePath`). T3 has **no outbound webhooks**, so the watcher
+  **polls** this to drive CI-watch, freeze, and label transitions.
+
+The AFK *behavior and safety* (issue-implementer prompt, guardrails, always-push,
+fix-forward/freeze, issue integration) live in **our** thin layer, so T3 is a
+**swappable, version-pinned backend** — never Keel-auto-upgraded, reversible to a
+self-hosted runtime if it goes sideways.
+
+### Observability & interaction
+
+The "active sessions layer" and the "attach and converse" surface **converge
+into one screen — the T3 cockpit**: a live list of all worker threads grouped by
+project; click one to stream its transcript and send it a turn. This dissolves
+the earlier intermediate ideas of a generalized-breakglass console and a
+raw-tmux hybrid attach — T3 provides converse / approve / resume natively
+(`thread.user-input.respond`, `thread.approval.respond`).
+
+Cross-system, durable signals the control plane still emits:
+
+- **Phase-checklist comment** on the issue, edited in place as phases complete
+  (worktree → tests-red → green → pushed → CI → deployed). Durable, low-noise,
+  lives on the issue, doubles as audit trail.
+- **Loki** logs labeled `{repo, issue}` for deep-dive.
+- **Presence** claim per running session (`repo:<name>`, purpose `AFK #N`),
+  heartbeated — so AFK work shows up next to human sessions in the layer the
+  prompt hook already injects.
+- **Doorbell**: Slack / ntfy ping on terminal states, deep-linking into the T3
+  thread. Notify, not control — the dedicated-Slack-control-plane idea is
+  dropped in favour of the T3 cockpit.
+
+### Safety envelope
+
+- **Trust gate** — only collaborator-labeled `ready-for-agent` issues run.
+- **Allowlist** — a repo is untouchable until enrolled (prereqs: tests + GHA CI
+  + `CONTEXT.md`). Start with 1–2 repos; expand deliberately.
+- **Kill switch** — one ConfigMap flag pauses all pickup (the Keel
+  scale-to-0 reflex, built in from day one).
+- **Per-repo lock** — ≤1 agent per repo.
+- **Guardrails** (reused from `issue-responder`) — no PVC/PV deletes, no direct
+  Vault edits, no force-push to master, infra changes Terraform-only, never
+  `[ci skip]`.
+- **Identity & audit** — shared service identity; each commit body paraphrases
+  the issue and carries `Closes #N` + an AFK-agent trailer, so the commit
+  message stays the audit trail.
+
+## Parameters (chosen starting values — all tunable)
+
+| Knob | Value |
+|---|---|
+| Merge gate | always push to master |
+| Post-push failure | fix-forward, then freeze-broken |
+| Fix-forward cap | 5 attempts **or** 60 minutes |
+| Per-issue budget | `max_budget_usd = 100` |
+| Concurrency | parallel across repos, serial within a repo |
+| Repo scope | opt-in allowlist, start small |
+| Progress detail | phase-checklist on issue + Loki logs |
+| Alert channel | Slack (+ ntfy), as a doorbell into T3 |
+| Executor | dedicated in-cluster T3 (thin dispatch), version-pinned |
+
+## Pilot — validate before wiring the poller
+
+The thin model rests on five unknowns. Stand up the dedicated T3 instance and
+drive a couple of allowlist-repo issues **by hand** via the dispatch API to
+confirm each, *before* building the poller and committing the architecture:
+
+1. **Per-thread custom agent + skip-permissions** — can a dispatched thread
+   carry *our* `issue-implementer` system prompt and run unattended without
+   stalling on T3's approval gating? *(biggest unknown)*
+2. **Dispatch auth** — mint `orchestration:operate`, store in Vault, refresh the
+   1-hour token.
+3. **Status/completion** — drive CI-watch/freeze/labels purely from polling
+   `GET /api/orchestration/snapshot`.
+4. **Worktree reconciliation** — T3's native `prepareWorktree` vs our
+   persistent-checkout-with-warm-caches; pick one or make them cooperate on the
+   volume.
+5. **The in-cluster T3 pod** — headless `t3 serve --no-browser`, version-pinned
+   and **Keel-excluded**, internal ingress + Authentik, with tokens / toolchains
+   / SSD volume / `claude auth` provisioned.
+
+## Relationship to prior decisions
+
+- **Supersedes** the worktree rejection in
+  `2026-06-02-parallel-execution-design.md` (contextualized, not contradicted —
+  ADR 0004).
+- **Drops** two intermediate ideas explored and rejected this session:
+  evolving `claude-agent-service` into its own session/tmux/worktree runtime,
+  and building a bespoke breakglass-generalized console — both replaced by T3.
+- **Reuses** the `issue-responder` guardrails, the CI/CD chain, the
+  `beads-dispatcher` CronJob pattern, presence, Loki, and the design skills.
+
+## Out of scope / open questions
+
+- Raw-terminal "take-over" of a worker (T3 is a GUI cockpit, not a terminal); if
+  ever needed, that's a separate add-on.
+- Multi-tenant T3 (it is single-operator by design — fine, it matches the shared
+  service identity).
+- Cross-repo dependency orchestration beyond per-issue "Blocked by".
+- T3 Code is pre-1.0 (~v0.0.x) and churny; the version-pin + Keel-exclude +
+  swappable-backend discipline (ADR 0003) is the mitigation.
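Reviewer note: the dispatch predicate in the design doc (allowlist, kill switch, trust gate, blockers, per-repo lock) is compact enough to state as one function. The sketch below is one possible reading, not the poller's implementation (the doc says no code exists yet). The `Issue` shape, the field names, and the repo name `example-app` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    number: int
    labels: frozenset
    labeled_by_collaborator: bool  # trust gate: who applied ready-for-agent
    open_blockers: int = 0         # still-open issues listed under "Blocked by"


def should_dispatch(issue, repo, repo_issues, allowlist, kill_switch_on):
    """Return True only when every clause of the dispatch predicate holds."""
    if kill_switch_on or repo not in allowlist:
        return False
    if "ready-for-agent" not in issue.labels or not issue.labeled_by_collaborator:
        return False
    if issue.open_blockers > 0:
        return False
    # Per-repo lock: no issue in this repo may already be in progress.
    return not any("agent-in-progress" in other.labels for other in repo_issues)


ready = Issue(7, frozenset({"ready-for-agent"}), labeled_by_collaborator=True)
busy = Issue(3, frozenset({"agent-in-progress"}), labeled_by_collaborator=True)
allow = {"example-app"}

print(should_dispatch(ready, "example-app", [ready], allow, False))        # True
print(should_dispatch(ready, "example-app", [ready, busy], allow, False))  # False
```

Checking the kill switch first keeps the pause cheap: when it is on, no issue data needs to be consulted at all.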
--- a/docs/adr/0002-afk-autonomous-merge-and-failure-posture.md
+++ b/docs/adr/0002-afk-autonomous-merge-and-failure-posture.md
@@ -0,0 +1,69 @@
+# AFK agents push straight to master; failures fix-forward then freeze, not revert
+
+The AFK implementation pipeline (see
+`docs/2026-06-14-afk-implementation-pipeline-design.md`) lets an autonomous
+agent land code with no human at the keyboard. The owner deliberately chose the
+most hands-off posture: **AFK-written code pushes straight to `master`** (which
+then deploys via the existing CI/CD chain) with **no pull-request review gate**,
+and when a deploy breaks, the agent **fixes forward and then freezes the broken
+state** rather than auto-reverting. This ADR records that risk posture and why it
+was chosen over the safer alternatives, because it is surprising and not cheap to
+walk back once callers and habits depend on it.
+
+## Status
+
+accepted (2026-06-14) — posture decided; enforced once the pipeline ships
+(pilot-gated).
+
+## Context
+
+`master` on every enrolled repo deploys continuously (GHA build → ghcr →
+Woodpecker → Keel). So "where AFK code lands" is really "what reaches a live
+deploy without a human looking". The owner weighed three merge gates and three
+post-push failure responses and picked the autonomy-maximizing end of both,
+accepting the blast radius explicitly.
+
+## Considered options — merge gate
+
+- **Always push to master (chosen).** Tests-green is the gate; CI + rollback are
+  the safety net. Matches the existing human allow-then-audit model (non-admins
+  already push straight to master). Most hands-off.
+- **Adaptive (push if confident, else PR)** — rejected as the *default* though it
+  is what `issue-responder` does; the owner wanted full hands-off, not a
+  confidence-gated PR for otherwise-working code.
+- **Always open a PR** — rejected: reintroduces a human merge step on every
+  issue, i.e. "AFK implementation, human merge" — not the goal.
+
+## Considered options — post-push failure (CI/rollout goes red after a green push)
+
+- **Fix-forward then freeze (chosen).** Iterate with corrective commits up to
+  **5 attempts or 60 minutes**; if still red, **leave the broken state in place**
+  (do not revert), relabel the issue `ready-for-human`, and hard-page. Same
+  forensics-first instinct as the breakglass (ADR 0001): preserve the exact
+  failing state for debugging rather than auto-cleaning it away.
+- **Auto-revert + escalate** — rejected (was the recommendation): restores green
+  fastest, but destroys the forensic state the owner wants to inspect.
+- **Alert and freeze immediately (no fix-forward)** — rejected: gives up on
+  transient/env-drift failures a corrective commit would clear.
+
+Pre-push failure (can't reach green, blocked, or would need a disallowed op) is
+not a dilemma: the agent does **not** push, relabels `ready-for-human`, comments
+what it tried, and pages.
+
+## Consequences
+
+- An unreviewed logic error can deploy before any human sees it; rollback (not
+  review) is the safety net. Bounded by: tests-as-gate, the start-small
+  allowlist, the per-repo lock, and the kill switch.
+- A frozen-broken deploy can sit unhealthy until the owner answers the page —
+  availability is traded for debuggability, by explicit choice. Acceptable
+  because enrolled repos are non-critical by the allowlist prerequisite, and the
+  owner is paged hard (Slack + ntfy).
+- Fix-forward can stack up to 5 commits on a bad change before freezing; the
+  60-minute cap bounds the churn window.
+- Per-issue spend is capped at `max_budget_usd = 100`.
+- Guardrails still hold underneath this posture: no PVC/PV deletes, no direct
+  Vault edits, no force-push, infra changes Terraform-only, never `[ci skip]`.
+- Reversible: tightening to adaptive/PR or to auto-revert is a config + watcher
+  change, not a re-architecture — but callers/habits will have formed around
+  "it just lands", so flag loudly if reversing.
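Reviewer note: the post-push rule in this ADR ("5 attempts or 60 minutes", then freeze) can be sketched as a small decision function. This is an interpretation under stated assumptions: the function name, the action strings, and the choice to freeze once either cap is reached (rather than exceeded) are not specified by the ADR.

```python
from datetime import datetime, timedelta

MAX_ATTEMPTS = 5                    # fix-forward cap from the ADR
MAX_WINDOW = timedelta(minutes=60)  # time cap from the ADR


def next_action(ci_green, attempts_made, first_red_at, now):
    """Decide what the watcher does after a post-push CI or rollout result."""
    if ci_green:
        return "healthy"      # comment, close the issue, drop the lock
    if attempts_made >= MAX_ATTEMPTS or now - first_red_at >= MAX_WINDOW:
        return "freeze"       # leave the broken state in place, relabel, page
    return "fix-forward"      # try another corrective commit


start = datetime(2026, 6, 14, 12, 0)
print(next_action(False, 0, start, start))                          # fix-forward
print(next_action(False, 5, start, start + timedelta(minutes=10)))  # freeze
print(next_action(False, 2, start, start + timedelta(minutes=60)))  # freeze
```

Note that "freeze" deliberately has no revert step, which matches the forensics-first choice recorded above.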
--- a/docs/adr/0003-t3-thin-executor-and-cockpit.md
+++ b/docs/adr/0003-t3-thin-executor-and-cockpit.md
@@ -0,0 +1,70 @@
+# AFK workers run inside a dedicated T3 Code instance; claude-agent-service dispatches into it
+
+The owner wants one UI to see and converse with every in-flight AFK worker, and
+named **T3 Code** (the self-hosted multi-agent cockpit already running at
+`t3.viktorbarzin.me`) as that UI. Research into T3's source
+(`pingdotgg/t3code`, ~v0.0.27) found it is genuinely built for this — a fleet of
+worker "threads" with a live read-model and a scoped HTTP dispatch API — **but**
+it can only display sessions **it launched itself**; there is no command to adopt
+a session another process started. So "viewable in T3" ⟺ "launched by T3". This
+ADR records the resulting architecture: `claude-agent-service` stays the
+**control plane** and **dispatches into a dedicated, in-cluster T3 instance**
+which is the **executor + cockpit**. The agent runs inside T3; we keep the brain.
+
+## Status
+
+accepted (2026-06-14) — direction decided; **gated on a pilot** (the five
+unknowns in the design doc) before the poller is wired and the architecture is
+committed.
+
+## Why T3, and why "thin"
+
+T3 provides, out of the box, what we would otherwise hand-build: a three-panel
+fleet cockpit (`projects → threads → conversation`), an
+`OrchestrationReadModel` with per-thread live status, and
+`POST /api/orchestration/dispatch` whose `thread.turn.start` + `bootstrap` can
+**create a thread, prepare a git worktree, run a setup script, and deliver a
+prompt in one call** — exactly the worker-spawn primitive. Converse / approve /
+resume are native (`thread.user-input.respond`, `thread.approval.respond`). For
+Claude it embeds `@anthropic-ai/claude-agent-sdk`.
+
+"Thin" = the AFK *behavior and safety* (the `issue-implementer` prompt,
+guardrails, always-push, fix-forward/freeze, CI-watch, issue integration) live
+in **our** layer (the poller + watcher), not in T3. T3 is a **swappable backend**
+we drive over its API.
+
+## Considered options
+
+- **Thin: claude-agent-service dispatches into T3 (chosen).** Control plane calls
+  T3's dispatch API; T3 runs the agent in a worktree and shows it. Get the fleet
+  view, keep the brain, least to build. Cost: execution moves into the T3 pod, so
+  T3's runtime is in the *hot path* (not just the window).
+- **claude-agent-service runs the agent, T3 only displays it** — rejected because
+  it is impossible: T3 cannot adopt an externally-started session
+  (`thread.session.set` is server-internal; no external-session-id field). This
+  is the constraint that shaped the whole decision.
+- **Deep: claude-agent-service as a custom T3 provider (ACP-style)** — rejected
+  for now: keeps the runtime ours with a T3 UI, but means building and
+  maintaining a provider against a pre-1.0, internal, no-contributions interface
+  — effectively a fork. Revisit only if "thin" proves too limiting.
+- **Skip T3; build our own console** (generalized breakglass + tmux) — rejected:
+  most stable and fully in-house, but abandons the owner's explicit "see workers
+  in T3" goal and means owning a session console forever.
+
+## Consequences
+
+- A **dedicated in-cluster T3 instance** (a pod, consistent with the earlier
+  in-cluster-over-devvm substrate choice) is the worker host, separate from the
+  per-user devvm T3 instances. It needs the SSD worktree volume, git/Anthropic
+  tokens, toolchains, `claude auth`, and an internal Authentik-gated ingress.
+- T3's runtime is now in the **execution hot path** — its maturity affects
+  whether work *runs*, not only whether it can be *seen*. Mitigations: **pin the
+  version and exclude it from Keel** (its churn + hard-cutover auth migrations
+  make auto-upgrade a Keel-class hazard), keep the integration thin and the
+  backend swappable, and **pilot** the five unknowns first.
+- T3 is **single-operator** — fine here: it matches the already-accepted shared
+  service identity for AFK work.
+- No outbound webhooks from T3 → the watcher **polls**
+  `GET /api/orchestration/snapshot`.
+- This supersedes the intermediate ideas of evolving `claude-agent-service` into
+  its own session/tmux/worktree runtime and building a bespoke attach console.
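Reviewer note: since T3 has no outbound webhooks, the watcher has to derive its next step from each polled thread record. The sketch below shows one way to classify a single record. The ADR names the status values (`running`/`idle`/`error`) and the flags `hasPendingUserInput` and `hasPendingApprovals`, but the exact JSON layout is not given, so the `status` key, the dict shape, and the returned action names are all assumptions to verify in the pilot.

```python
def classify(thread):
    """Map one snapshot thread record (assumed to be a dict) to a watcher action."""
    if thread.get("status") == "error":
        return "escalate"   # relabel ready-for-human and page
    if thread.get("hasPendingUserInput") or thread.get("hasPendingApprovals"):
        return "stalled"    # an unattended worker is waiting on a human
    if thread.get("status") == "running":
        return "wait"       # still working; poll again later
    return "check-ci"       # idle with nothing pending: go look at CI


# Hypothetical records in the assumed shape.
print(classify({"status": "running"}))                             # wait
print(classify({"status": "running", "hasPendingApprovals": True}))  # stalled
print(classify({"status": "idle"}))                                # check-ci
```

Checking the pending flags before the running status matters for the first pilot unknown: a thread blocked on approval gating may still report as running.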
--- a/docs/adr/0004-persistent-worktrees-for-implementation-agents.md
+++ b/docs/adr/0004-persistent-worktrees-for-implementation-agents.md
@@ -0,0 +1,68 @@
+# Implementation agents use persistent per-repo checkouts + git worktrees, reversing the throwaway-clone rule for this path
+
+`2026-06-02-parallel-execution-design.md` deliberately **rejected git worktrees**
+and chose throwaway `git clone --local` per job, "because worktrees share one
+`.git` → agents that `git commit`/`pull` still contend — not truly independent".
+The AFK implementation pipeline
+(`docs/2026-06-14-afk-implementation-pipeline-design.md`) **reverses that for its
+own path**: each enrolled repo gets a **persistent checkout**, and each issue
+runs in a **`git worktree`** off it, on a shared **SSD-NFS** volume. This ADR
+records why the earlier rejection does not apply here — so the two decisions
+read as complementary, not contradictory.
+
+## Status
+
+accepted (2026-06-14) — for the AFK implementation path only; the existing
+job-runner (recruiter-triage, nextcloud-todos, etc.) keeps throwaway clones.
+
+## Why the 2026-06-02 rejection doesn't bind this path
+
+The rejection's premise was **concurrent jobs in the same checkout** contending
+on `.git/index.lock` and racing `git pull`. The AFK pipeline's concurrency model
+is **serial within a repo, parallel only across repos** (ADR-adjacent decision in
+the design doc): at most one agent ever touches a given repo's `.git` at a time,
+and different repos are different checkouts. The contention the rejection guarded
+against cannot occur here. With that removed, worktrees become the *better*
+choice because they unlock cache reuse the throwaway model can't.
+
+## Considered options
+
+- **Persistent checkout + worktree per issue, on SSD-NFS (chosen).** Warm git
+  objects, **persisted `node_modules`/venv/build caches**, and shared
+  package-manager caches survive across jobs, so the TDD loop stops reinstalling
+  deps every run. Compounds with `to-issues` clustering many slices in one repo,
+  processed serially — slice N reuses slice 1's warm tree.
+- **Throwaway `git clone --local` per job (status quo elsewhere)** — rejected for
+  this path: correct for the concurrent job-runner, but re-pays dependency
+  install on every issue, which dominates wall-clock for an
+  implement-test-fix-forward loop.
+- **`cp -a` of a warm tree** — rejected (same reason as 2026-06-02): copies
+  accumulated caches → disk blowup, and no git isolation.
+
+## Considered options — storage
+
+- **SSD-NFS (chosen).** The current `/persistent` PVC is `5Gi` **HDD NFS**
+  (`nfs-truenas` → `/srv/nfs`) and unused; git checkouts + `node_modules` are
+  death-by-small-files on HDD NFS and 5Gi is too small. Provision an SSD-backed
+  NFS class over `/srv/nfs-ssd` (other apps already use that path) at a realistic
+  size (tens of GB).
+- **HDD NFS / `/persistent` as-is** — rejected: too slow for many small files,
+  too small.
+- **Local block (proxmox-lvm)** — rejected: faster but HDD and node-pinned (RWO),
+  lost on reschedule; NFS RWX survives and the volume also holds session state.
+
+## Consequences
+
+- One **SSD-NFS volume** holds, per enrolled repo: the persistent checkout, the
+  warm dep/package caches, and (under ADR 0003) the worktrees T3 prepares. Cache
+  env (`pip`, `GOMODCACHE`/`GOCACHE`, `PNPM_HOME`/npm, cargo) must be wired to it
+  — today caching is off (`pip --no-cache-dir`, no cache envs set).
+- Housekeeping the throwaway model didn't need: `git fetch` before each
+  `worktree add`, periodic `git worktree prune` + `git gc`, and cache eviction if
+  the volume fills.
+- **`infra` stays on its own path** — it is git-crypt, and editing encrypted
+  files from a worktree is disallowed; the persistent-worktree model is for the
+  non-`infra` app repos in the allowlist.
+- Open reconciliation (pilot): whether T3's native `prepareWorktree` writes into
+  this volume + our persistent checkouts, or we manage the checkout and point T3
+  at it. Resolve before committing the architecture.
--- a/frontend/index.html
+++ b/frontend/index.html
@ -2,9 +2,28 @@
 <html lang="en">
  <head>
    <meta charset="UTF-8" />
-    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <!-- viewport-fit=cover so the app paints edge-to-edge and we can honour the
+         notch/home-indicator via env(safe-area-inset-*). maximum-scale=1.0 caps
+         zoom so the cockpit layout stays stable under stress on mobile. -->
+    <meta
+      name="viewport"
+      content="width=device-width, initial-scale=1.0, viewport-fit=cover, maximum-scale=1.0"
+    />
    <meta name="color-scheme" content="dark" />
    <meta name="robots" content="noindex, nofollow" />
+
+    <!-- PWA / installable. theme-color tints the mobile status bar to the dark
+         theme; black-translucent lets the app draw under the iOS status bar. -->
+    <meta name="theme-color" content="#06080b" />
+    <link rel="manifest" href="./manifest.webmanifest" />
+    <meta name="apple-mobile-web-app-capable" content="yes" />
+    <meta name="mobile-web-app-capable" content="yes" />
+    <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent" />
+    <meta name="apple-mobile-web-app-title" content="breakglass" />
+    <link rel="apple-touch-icon" href="./apple-touch-icon.png" />
+    <link rel="icon" type="image/svg+xml" href="./icon.svg" />
+    <link rel="icon" type="image/png" sizes="192x192" href="./icon-192.png" />
+
    <title>devvm breakglass</title>
  </head>
  <body>
--- a/frontend/public/apple-touch-icon.png
+++ b/frontend/public/apple-touch-icon.png
--- a/frontend/public/icon-192.png
+++ b/frontend/public/icon-192.png
--- a/frontend/public/icon-512.png
+++ b/frontend/public/icon-512.png
--- a/frontend/public/icon.svg
+++ b/frontend/public/icon.svg
@ -0,0 +1,64 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="512" height="512" viewBox="0 0 512 512" role="img" aria-label="devvm breakglass">
+  <defs>
+    <!-- layered near-black surface, matching the app theme -->
+    <radialGradient id="bg" cx="68%" cy="22%" r="92%">
+      <stop offset="0%" stop-color="#12303a"/>
+      <stop offset="42%" stop-color="#0b0f14"/>
+      <stop offset="100%" stop-color="#06080b"/>
+    </radialGradient>
+    <linearGradient id="steel" x1="0" y1="0" x2="1" y2="1">
+      <stop offset="0%" stop-color="#7df0f3"/>
+      <stop offset="55%" stop-color="#3dd1d6"/>
+      <stop offset="100%" stop-color="#1f6f72"/>
+    </linearGradient>
+    <filter id="glow" x="-40%" y="-40%" width="180%" height="180%">
+      <feGaussianBlur stdDeviation="7" result="b"/>
+      <feMerge><feMergeNode in="b"/><feMergeNode in="SourceGraphic"/></feMerge>
+    </filter>
+  </defs>
+
+  <!-- rounded-square field (safe for maskable: art kept within central ~80%) -->
+  <rect width="512" height="512" rx="112" fill="url(#bg)"/>
+  <rect x="6" y="6" width="500" height="500" rx="108" fill="none" stroke="#1c2530" stroke-width="3"/>
+  <!-- faint scanline texture -->
+  <g opacity="0.05" stroke="#ffffff" stroke-width="2">
+    <line x1="0" y1="148" x2="512" y2="148"/>
+    <line x1="0" y1="220" x2="512" y2="220"/>
+    <line x1="0" y1="292" x2="512" y2="292"/>
+    <line x1="0" y1="364" x2="512" y2="364"/>
+  </g>
+
+  <!-- fracture burst (amber): the "break the glass" radiating cracks -->
+  <g stroke="#f5b657" stroke-width="9" stroke-linecap="round" stroke-linejoin="round"
+     fill="none" opacity="0.92" filter="url(#glow)">
+    <path d="M256 256 L142 132"/>
+    <path d="M256 256 L120 250"/>
+    <path d="M256 256 L150 372"/>
+    <path d="M256 256 L372 380"/>
+    <path d="M256 256 L392 246"/>
+    <path d="M256 256 L360 138"/>
+    <!-- cross-cracks -->
+    <path d="M186 196 L150 250"/>
+    <path d="M210 320 L172 318" opacity="0.7"/>
+    <path d="M326 318 L356 350" opacity="0.7"/>
+  </g>
+
+  <!-- wrench, struck across the burst (cyan steel) -->
+  <g filter="url(#glow)">
+    <path fill="url(#steel)" stroke="#0e3133" stroke-width="6" stroke-linejoin="round"
+      d="M344 150
+         a62 62 0 0 0 -82 76
+         L150 338
+         a26 26 0 0 0 0 37
+         l11 11
+         a26 26 0 0 0 37 0
+         l112 -112
+         a62 62 0 0 0 76 -82
+         l-41 41
+         l-40 -11
+         l-11 -40
+         z"/>
+    <!-- handle highlight -->
+    <path d="M171 350 l128 -128" stroke="#bdf6f8" stroke-width="7" stroke-linecap="round" opacity="0.6"/>
+  </g>
+</svg>
--- a/frontend/public/manifest.webmanifest
+++ b/frontend/public/manifest.webmanifest
@ -0,0 +1,31 @@
+{
+  "name": "devvm breakglass",
+  "short_name": "breakglass",
+  "description": "Emergency recovery console for the devvm — chat with a repair agent or power-cycle the VM directly.",
+  "start_url": "./",
+  "scope": "./",
+  "display": "standalone",
+  "orientation": "portrait",
+  "background_color": "#06080b",
+  "theme_color": "#06080b",
+  "icons": [
+    {
+      "src": "./icon.svg",
+      "type": "image/svg+xml",
+      "sizes": "any",
+      "purpose": "any maskable"
+    },
+    {
+      "src": "./icon-192.png",
+      "type": "image/png",
+      "sizes": "192x192",
+      "purpose": "any maskable"
+    },
+    {
+      "src": "./icon-512.png",
+      "type": "image/png",
+      "sizes": "512x512",
+      "purpose": "any maskable"
+    }
+  ]
+}
--- a/frontend/src/App.svelte
+++ b/frontend/src/App.svelte
@ -1,100 +1,294 @@
 <script>
-  import { onMount } from 'svelte';
-  import { openSession } from './lib/api.js';
+  import { onMount, onDestroy } from 'svelte';
+  import {
+    openSession,
+    attachStream,
+    sendPrompt,
+    cancelTurn,
+    loadSessionId,
+    saveSessionId,
+    clearSessionId,
+  } from './lib/api.js';
+  import { createTranscript, reduceEvent } from './lib/transcript.js';
  import Chat from './Chat.svelte';
  import VmControls from './VmControls.svelte';

-  // ── session lifecycle ────────────────────────────────────────────────────
+  // ── lifecycle state ───────────────────────────────────────────────────────
+  // link: connecting | attached | error  (the EventSource to the session)
+  let link = $state('connecting');
+  let linkError = $state('');
  let sessionId = $state('');
-  let sessionState = $state('connecting'); // connecting | ready | error
-  let sessionError = $state('');
-  let streaming = $state(false);
+  let caughtUp = $state(false); // replay drained → live tailing
+  let turnActive = $state(false); // a turn is running (Stop shown, Send off)
+  let sending = $state(false); // a prompt POST is in flight

-  // Mobile: the VM controls live in a slide-up sheet. Desktop: a side column
-  // (CSS hides the toggle and pins the sheet open as a column ≥900px).
+  // The transcript is folded with a plain mutable object; we bump `rev` to
+  // notify the view of in-place mutations (cheaper than cloning the whole
+  // message list on every streamed token). `tx` is $state too, so REASSIGNING
+  // it (reset / new session) also propagates to the Chat prop. $state.raw keeps
+  // the object un-proxied so the hot per-token path stays a plain mutation.
+  let tx = $state.raw(createTranscript());
+  let rev = $state(0);
+
+  let es = null; // the live EventSource
+
+  // Mobile: VM controls live in a slide-up sheet. Desktop (≥900px): a column.
  let showControls = $state(false);

-  async function newSession() {
-    sessionState = 'connecting';
-    sessionError = '';
-    try {
-      sessionId = await openSession();
-      sessionState = 'ready';
-    } catch (err) {
-      sessionState = 'error';
-      sessionError = err instanceof Error ? err.message : String(err);
+  function resetTranscript() {
+    tx = createTranscript();
+    rev++;
+  }
+
+  function onEvent(ev) {
+    if (reduceEvent(tx, ev)) {
+      // turn liveness tracks the folder's view of the stream, so a turn started
+      // in ANOTHER tab (or before a reload) still flips us into "active".
+      turnActive = tx.activeUserSeen;
+      rev++;
    }
  }

-  onMount(newSession);
-
-  function onLiveSession(id) {
-    if (id) sessionId = id;
+  function closeStream() {
+    if (es) {
+      es.close();
+      es = null;
+    }
  }

-  const shortId = $derived(sessionId ? sessionId.slice(0, 8) : '────────');
-  const dotState = $derived(
-    sessionState === 'error' ? 'error' : streaming ? 'busy' : sessionState === 'ready' ? 'ready' : 'idle'
+  function attach(id) {
+    closeStream();
+    sessionId = id;
+    caughtUp = false;
+    link = 'connecting';
+    linkError = '';
+    es = attachStream(id, {
+      onOpen: () => {
+        // a successful (re)connection clears any prior transient error
+        if (link !== 'attached') link = 'attached';
+        linkError = '';
+      },
+      onCaughtUp: () => {
+        caughtUp = true;
+        link = 'attached';
+      },
+      onEvent,
+      onError: () => {
+        // EventSource auto-reconnects on a transient drop (readyState
+        // CONNECTING). Only a terminal CLOSED state is a hard failure. The
+        // server keeps the turn running regardless, so we surface a soft note
+        // and let the browser retry.
+        if (es && es.readyState === EventSource.CLOSED) {
+          link = 'error';
+          linkError = 'lost the connection to the session — retrying…';
+          // a closed source won't retry itself; re-attach to the same id.
+          setTimeout(() => {
+            if (sessionId === id) attach(id);
+          }, 1500);
+        } else {
+          link = 'connecting';
+        }
+      },
+    });
+  }
+
+  async function bootstrap() {
+    link = 'connecting';
+    linkError = '';
+    resetTranscript();
+    const existing = loadSessionId();
+    if (existing) {
+      // Reuse the persisted id and attach. If it's gone (pod restart → 404 on
+      // the stream), the EventSource ends up CLOSED and onError in attach()
+      // re-attaches to the same id on a timer.
+      attach(existing);
+      // NOTE: no stale-id detection is implemented here yet; the only path
+      // that mints a replacement session is the 'gone' branch in submitPrompt.
+      return;
+    }
+    await createFresh();
+  }
+
+  async function createFresh() {
+    try {
+      link = 'connecting';
+      const id = await openSession();
+      saveSessionId(id);
+      attach(id);
+    } catch (err) {
+      link = 'error';
+      linkError = err instanceof Error ? err.message : String(err);
+    }
+  }
+
+  // "New session": archive the local id, mint a new one, re-attach.
+  async function newSession() {
+    if (turnActive || sending) return;
+    closeStream();
+    clearSessionId();
+    resetTranscript();
+    turnActive = false;
+    await createFresh();
+  }
+
+  // Send a prompt (typed or a preset). Output arrives via the attach stream.
+  async function submitPrompt(prompt) {
+    const text = (prompt || '').trim();
+    if (!text || turnActive || sending) return;
+    if (!sessionId) {
+      await createFresh();
+      if (!sessionId) return;
+    }
+    sending = true;
+    turnActive = true; // optimistic: the working indicator shows immediately
+    try {
+      const res = await sendPrompt({ session_id: sessionId, prompt: text });
+      if (res.status === 'busy') {
+        flash = 'A turn is already running.';
+        // turn really is active; keep the indicator, the stream will end it.
+      } else if (res.status === 'gone') {
+        // session evaporated (pod restart). Re-create; resend once on success.
+        clearSessionId();
+        await createFresh();
+        if (link !== 'error') await sendPrompt({ session_id: sessionId, prompt: text });
+      }
+    } catch (err) {
+      flash = err instanceof Error ? err.message : String(err);
+      turnActive = tx.activeUserSeen; // back off the optimistic flag on failure
+    } finally {
+      sending = false;
+    }
+  }
+
+  async function stopTurn() {
+    if (!sessionId) return;
+    try {
+      await cancelTurn(sessionId);
+      // turn_end / cancelled events arrive via the stream and flip turnActive.
+    } catch (err) {
+      flash = err instanceof Error ? err.message : String(err);
+    }
+  }
+
+  // a transient toast (409 / network blips), auto-cleared
+  let flash = $state('');
+  let flashTimer;
+  $effect(() => {
+    if (flash) {
+      clearTimeout(flashTimer);
+      flashTimer = setTimeout(() => (flash = ''), 4200);
+    }
+  });
+
+  onMount(bootstrap);
+  onDestroy(closeStream);
+
+  // ── header status lamp ──────────────────────────────────────────────────
+  // One quietly-living "system pulse": live (cyan breathe), connecting (dim
+  // cyan blink), working (amber pulse), error (pulsing red; the ONLY non-power
+  // red, used sparingly because connection loss IS the emergency here).
+  const lamp = $derived(
+    link === 'error'
+      ? 'error'
+      : turnActive
+        ? 'working'
+        : link === 'attached'
+          ? 'live'
+          : 'connecting'
  );
+  const lampLabel = $derived(
+    {
+      error: 'link down',
+      working: 'agent working',
+      live: 'attached',
+      connecting: 'connecting',
+    }[lamp]
+  );
+  const shortId = $derived(sessionId ? sessionId.slice(0, 8) : '········');
 </script>

 <div class="shell">
-  <header class="rail">
+  <header class="rail rise-in" style="--d:0ms">
    <div class="rail-title">
-      <span class="glyph" aria-hidden="true">🔧</span>
-      <h1>devvm <span class="accent">breakglass</span></h1>
+      <span class="brand-mark" aria-hidden="true">
+        <!-- breakglass glyph: a wrench struck through a fracture line -->
+        <svg viewBox="0 0 24 24" width="22" height="22" fill="none" stroke="currentColor"
+          stroke-width="1.6" stroke-linecap="round" stroke-linejoin="round">
+          <path d="M15.5 5.5a3.6 3.6 0 0 0-4.7 4.4L4 16.7 7.3 20l6.8-6.8a3.6 3.6 0 0 0 4.4-4.7l-2.2 2.2-2.2-.6-.6-2.2 2-2.6Z" />
+          <path class="frac" d="M3 3l3.2 4.1L4.4 8.6 7 12" stroke-dasharray="2 2.4" />
+        </svg>
+      </span>
+      <h1>devvm<span class="accent"> breakglass</span></h1>
    </div>

    <div class="rail-right">
-      <span class="rail-status">
-        <span class="dot dot--{dotState}" aria-hidden="true"></span>
-        {#if sessionState === 'error'}
-          <span class="session-bad">offline</span>
-        {:else if sessionState === 'connecting'}
-          <span class="session-meta">connecting…</span>
-        {:else}
-          <code class="session-id" title={sessionId}>{shortId}</code>
-        {/if}
+      <span class="lamp-wrap" title={lampLabel}>
+        <span class="lamp lamp--{lamp}" aria-hidden="true"></span>
+        <span class="lamp-text lamp-text--{lamp}">
+          {#if lamp === 'error'}
+            link down
+          {:else if lamp === 'working'}
+            working
+          {:else if lamp === 'live'}
+            <code class="sid">{shortId}</code>
+          {:else}
+            connecting
+          {/if}
+        </span>
      </span>

      <!-- Mobile-only: open the VM control sheet. Hidden on desktop (column). -->
      <button
-        class="controls-toggle"
+        class="rail-btn rail-btn--vm"
        onclick={() => (showControls = true)}
        aria-label="Open direct VM controls"
      >
-        ⚡ <span class="controls-toggle-label">VM</span>
+        <span class="bolt" aria-hidden="true">⚡</span><span class="rail-btn-label">VM</span>
      </button>

      <button
-        class="new-session"
+        class="rail-btn"
        onclick={newSession}
-        disabled={streaming || sessionState === 'connecting'}
-        title={streaming ? 'wait for the current turn to finish' : 'start a fresh session'}
+        disabled={turnActive || sending || link === 'connecting'}
+        title={turnActive ? 'wait for the current turn to finish' : 'archive this session and start fresh'}
      >
        New
      </button>
    </div>
  </header>

-  {#if sessionState === 'error'}
-    <div class="rail-error" role="alert">
-      Can't reach the breakglass backend — {sessionError}. The cluster or network
-      may be down. The <strong>⚡ VM</strong> power controls still work without the chat.
+  {#if link === 'error'}
+    <div class="rail-note" role="alert">
+      <span>{linkError || "Can't reach the breakglass backend."}</span>
+      <span class="rail-note-aside">The <strong>⚡ VM</strong> power controls still work without the chat.</span>
+      <button class="rail-note-retry" onclick={bootstrap}>Reconnect</button>
    </div>
  {/if}

+  {#if flash}
+    <div class="toast" role="status">{flash}</div>
+  {/if}
+
  <main class="stage">
-    <section class="chat-pane" aria-label="Recovery chat">
+    <section class="chat-pane rise-in" style="--d:80ms" aria-label="Recovery chat">
      <Chat
-        {sessionId}
-        sessionReady={sessionState === 'ready'}
-        {onLiveSession}
-        onStreamingChange={(v) => (streaming = v)}
+        {tx}
+        {rev}
+        {caughtUp}
+        {turnActive}
+        sending={sending}
+        linkState={link}
+        onSubmit={submitPrompt}
+        onStop={stopTurn}
      />
    </section>

-    <aside class="controls-pane" class:open={showControls} aria-label="Direct VM control">
+    <aside
+      class="controls-pane rise-in"
+      class:open={showControls}
+      style="--d:160ms"
+      aria-label="Direct VM control"
+    >
      <div class="sheet-grip" aria-hidden="true"></div>
      <div class="controls-head">
        <span class="controls-head-title">Direct VM control</span>
@ -104,7 +298,6 @@
    </aside>
  </main>

-  <!-- backdrop behind the mobile sheet -->
  <button
    class="sheet-backdrop"
    class:show={showControls}
@ -119,43 +312,51 @@
    height: 100%;
    display: flex;
    flex-direction: column;
-    max-width: 1500px;
+    max-width: 1520px;
    margin: 0 auto;
+    /* honour the notch on landscape / edge-to-edge */
+    padding-left: var(--safe-left);
+    padding-right: var(--safe-right);
  }

-  /* ── status rail (compact, single row on mobile) ─────────────────────── */
+  /* ── status rail ───────────────────────────────────────────────────────── */
  .rail {
    display: flex;
    align-items: center;
    justify-content: space-between;
    gap: 10px;
-    padding: 10px 14px;
+    padding: max(10px, var(--safe-top)) 14px 10px;
    border-bottom: 1px solid var(--line);
+    background:
+      linear-gradient(180deg, rgba(61, 209, 214, 0.03), transparent 60%),
+      linear-gradient(180deg, rgba(255, 255, 255, 0.015), transparent);
    flex: none;
  }
  .rail-title {
    display: flex;
-    align-items: baseline;
-    gap: 9px;
+    align-items: center;
+    gap: 10px;
    min-width: 0;
  }
-  .glyph {
-    font-size: 17px;
-    transform: translateY(2px);
-    filter: saturate(0.85);
+  .brand-mark {
+    color: var(--cyan);
+    display: inline-flex;
+    filter: drop-shadow(0 0 10px rgba(61, 209, 214, 0.35));
+    flex: none;
  }
+  .brand-mark .frac { color: var(--amber); stroke: var(--amber); opacity: 0.85; }
  h1 {
    margin: 0;
    font-family: var(--mono);
    font-size: 16px;
    font-weight: 600;
-    letter-spacing: 0.02em;
+    letter-spacing: 0.04em;
    color: var(--ink);
    white-space: nowrap;
  }
  .accent {
    color: var(--cyan);
-    text-shadow: 0 0 18px rgba(61, 209, 214, 0.35);
+    text-shadow: 0 0 18px rgba(61, 209, 214, 0.4);
  }

  .rail-right {
@ -164,90 +365,158 @@
    gap: 8px;
    flex: none;
  }
-  .rail-status {
+
+  /* the living system-pulse lamp */
+  .lamp-wrap {
    display: inline-flex;
    align-items: center;
-    gap: 7px;
+    gap: 8px;
+    padding: 0 4px;
    font-family: var(--mono);
    font-size: 12px;
  }
-  .session-id {
-    color: var(--cyan);
-    letter-spacing: 0.04em;
-  }
-  .session-meta {
-    color: var(--amber);
-  }
-  .session-bad {
-    color: var(--danger-bright);
-  }
-
-  .dot {
-    width: 9px;
-    height: 9px;
+  .lamp {
+    position: relative;
+    width: 10px;
+    height: 10px;
    border-radius: 50%;
    flex: none;
    background: var(--ink-faint);
  }
-  .dot--ready {
+  /* a soft halo ring that pulses outward — the "instrument is powered" tell */
+  .lamp::after {
+    content: '';
+    position: absolute;
+    inset: -4px;
+    border-radius: 50%;
+    border: 1px solid currentColor;
+    opacity: 0;
+  }
+  .lamp--live {
    background: var(--cyan);
-    box-shadow: 0 0 10px 1px rgba(61, 209, 214, 0.6);
-    animation: breathe 3.4s ease-in-out infinite;
+    color: var(--cyan);
+    box-shadow: 0 0 10px 1px rgba(61, 209, 214, 0.65);
+    animation: lamp-breathe 3.6s ease-in-out infinite;
  }
-  .dot--busy {
+  .lamp--live::after { animation: lamp-ring 3.6s ease-out infinite; }
+  .lamp--connecting {
+    background: var(--cyan-dim);
+    color: var(--cyan);
+    animation: lamp-blink 1.4s ease-in-out infinite;
+  }
+  .lamp--working {
    background: var(--amber);
+    color: var(--amber);
    box-shadow: 0 0 10px 1px rgba(245, 182, 87, 0.7);
-    animation: pulse 1s ease-in-out infinite;
+    animation: lamp-pulse 1s ease-in-out infinite;
  }
-  .dot--error {
+  .lamp--working::after { animation: lamp-ring 1s ease-out infinite; }
+  .lamp--error {
    background: var(--danger);
+    color: var(--danger);
    box-shadow: 0 0 10px 1px var(--danger-glow);
+    animation: lamp-pulse 1.2s ease-in-out infinite;
  }
-  @keyframes breathe { 0%, 100% { opacity: 0.55; } 50% { opacity: 1; } }
-  @keyframes pulse {
-    0%, 100% { transform: scale(0.82); opacity: 0.7; }
-    50% { transform: scale(1.15); opacity: 1; }
+  @keyframes lamp-breathe { 0%, 100% { opacity: 0.6; } 50% { opacity: 1; } }
+  @keyframes lamp-blink { 0%, 100% { opacity: 0.35; } 50% { opacity: 0.9; } }
+  @keyframes lamp-pulse {
+    0%, 100% { transform: scale(0.82); opacity: 0.75; }
+    50% { transform: scale(1.12); opacity: 1; }
+  }
+  @keyframes lamp-ring {
+    0% { opacity: 0.5; transform: scale(0.6); }
+    70% { opacity: 0; transform: scale(1.8); }
+    100% { opacity: 0; transform: scale(1.8); }
+  }
+  .lamp-text {
+    letter-spacing: 0.04em;
+    color: var(--ink-dim);
+    max-width: 88px;
+    overflow: hidden;
+    text-overflow: ellipsis;
+    white-space: nowrap;
+  }
+  .lamp-text--live .sid { color: var(--cyan); letter-spacing: 0.06em; }
+  .lamp-text--working { color: var(--amber); }
+  .lamp-text--error { color: var(--danger-bright); }
+  .lamp-text--connecting { color: var(--ink-faint); }
+  .sid { font-family: var(--mono); }
+  /* On the tightest phones the title + lamp text + two buttons crowd; keep the
+     living dot (the system pulse) and drop the text label until there's room. */
+  @media (max-width: 439px) {
+    .lamp-text { display: none; }
+    .lamp-wrap { padding: 0; }
  }

-  /* touch-friendly buttons */
-  .controls-toggle,
-  .new-session {
-    min-height: 40px;
-    padding: 0 13px;
+  /* rail buttons: touch-first (≥44px tall via min-height) */
+  .rail-btn {
+    min-height: 44px;
+    padding: 0 14px;
    border-radius: var(--radius-sm);
    border: 1px solid var(--line-strong);
    background: var(--bg-2);
    color: var(--ink-dim);
    font-size: 13px;
-    letter-spacing: 0.02em;
+    letter-spacing: 0.03em;
    display: inline-flex;
    align-items: center;
-    gap: 5px;
+    gap: 6px;
+    transition: border-color 0.15s, background 0.15s, color 0.15s;
  }
-  .controls-toggle {
-    border-color: #5a4a2a;
+  .rail-btn:hover:not(:disabled) { border-color: var(--line-bright); color: var(--ink); }
+  .rail-btn:active:not(:disabled) { background: var(--bg-3); }
+  .rail-btn:disabled { opacity: 0.42; }
+  .rail-btn--vm {
+    border-color: var(--amber-dim);
    color: var(--amber);
  }
-  .controls-toggle:active,
-  .new-session:active {
-    background: var(--bg-3);
-  }
-  .new-session:disabled {
-    opacity: 0.45;
-  }
+  .rail-btn--vm:hover:not(:disabled) { border-color: var(--amber); color: var(--amber); }
+  .bolt { font-size: 13px; line-height: 1; }

-  .rail-error {
+  .rail-note {
    margin: 10px 12px 0;
-    padding: 11px 14px;
+    padding: 10px 13px;
    border: 1px solid var(--danger-deep);
    border-left-width: 3px;
    background: rgba(255, 77, 77, 0.07);
-    color: #ffd5d5;
+    color: #ffd9d9;
    border-radius: var(--radius-sm);
    font-size: 13px;
    line-height: 1.5;
+    display: flex;
+    flex-wrap: wrap;
+    align-items: center;
+    gap: 6px 12px;
    flex: none;
  }
+  .rail-note-aside { color: #f0b8b8; }
+  .rail-note-aside strong { color: #fff; font-family: var(--mono); }
+  .rail-note-retry {
+    margin-left: auto;
+    border: 1px solid var(--danger-deep);
+    background: transparent;
+    color: var(--danger-bright);
+    border-radius: 6px;
+    padding: 6px 12px;
+    font-size: 12px;
+    min-height: 36px;
+  }
+  .rail-note-retry:hover { background: rgba(255, 77, 77, 0.12); }
+
+  .toast {
+    margin: 10px 12px 0;
+    padding: 9px 13px;
+    border: 1px solid var(--line-strong);
+    border-left: 3px solid var(--amber);
+    background: var(--bg-2);
+    color: var(--amber);
+    border-radius: var(--radius-sm);
+    font-family: var(--mono);
+    font-size: 12.5px;
+    line-height: 1.45;
+    flex: none;
+    animation: rise-in 0.28s ease-out both;
+  }

  /* ── stage ───────────────────────────────────────────────────────────── */
  .stage {
@ -271,31 +540,37 @@
    right: 0;
    bottom: 0;
    z-index: 40;
-    max-height: 86dvh;
-    overflow-y: auto;
+    max-height: 88dvh;
+    display: flex;
+    flex-direction: column;
    background: var(--bg-1);
    border-top: 1px solid var(--line-strong);
-    border-radius: 16px 16px 0 0;
-    box-shadow: 0 -18px 40px rgba(0, 0, 0, 0.55);
-    padding: 8px 14px calc(14px + env(safe-area-inset-bottom));
-    transform: translateY(101%);
-    transition: transform 0.26s cubic-bezier(0.32, 0.72, 0, 1);
+    border-radius: var(--radius-lg) var(--radius-lg) 0 0;
+    box-shadow: var(--shadow-sheet);
+    padding: 8px 14px calc(14px + var(--safe-bottom));
+    transform: translateY(102%);
+    transition: transform 0.3s cubic-bezier(0.32, 0.72, 0, 1);
+    /* the rise-in entrance is for the desktop column; the sheet is transform-
+       controlled, so cancel the shared keyframe here. */
+    animation: none !important;
  }
  .controls-pane.open {
    transform: translateY(0);
  }
  .sheet-grip {
-    width: 38px;
+    width: 40px;
    height: 4px;
    border-radius: 99px;
-    background: var(--line-strong);
+    background: var(--line-bright);
    margin: 4px auto 10px;
+    flex: none;
  }
  .controls-head {
    display: flex;
    align-items: center;
    justify-content: space-between;
    margin-bottom: 10px;
+    flex: none;
  }
  .controls-head-title {
    font-family: var(--mono);
@ -305,14 +580,15 @@
    color: var(--amber);
  }
  .sheet-close {
-    width: 34px;
-    height: 34px;
+    width: 40px;
+    height: 40px;
    border-radius: var(--radius-sm);
    border: 1px solid var(--line-strong);
    background: var(--bg-2);
    color: var(--ink-dim);
    font-size: 14px;
  }
+  .sheet-close:active { background: var(--bg-3); }

  .sheet-backdrop {
    position: fixed;
@ -320,40 +596,40 @@
    z-index: 30;
    border: 0;
    padding: 0;
-    background: rgba(0, 0, 0, 0.55);
+    background: rgba(2, 4, 7, 0.62);
+    backdrop-filter: blur(1.5px);
    opacity: 0;
    pointer-events: none;
-    transition: opacity 0.22s;
+    transition: opacity 0.24s;
  }
  .sheet-backdrop.show {
    opacity: 1;
    pointer-events: auto;
  }

-  /* ── desktop: controls become a static side column, sheet chrome gone ── */
+  /* ── desktop: controls become a static side column ─────────────────────── */
  @media (min-width: 900px) {
-    .rail {
-      padding: 14px 18px;
-    }
+    .rail { padding: 14px 18px; }
    h1 { font-size: 19px; }
    .stage {
      display: grid;
-      grid-template-columns: minmax(0, 1fr) 372px;
+      grid-template-columns: minmax(0, 1fr) 384px;
      gap: 16px;
      padding: 16px 18px 18px;
    }
    .chat-pane { display: flex; }
-    .controls-toggle { display: none; }
+    .rail-btn--vm { display: none; }
    .controls-pane {
      position: static;
      max-height: none;
-      overflow: visible;
      transform: none;
      box-shadow: none;
      border: none;
      border-radius: 0;
      padding: 0;
      z-index: auto;
+      animation: rise-in 0.5s cubic-bezier(0.22, 0.61, 0.36, 1) both !important;
+      animation-delay: var(--d, 0ms) !important;
    }
    .sheet-grip,
    .controls-head,
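
Note on the reconnect path in `attach()` in the App.svelte change above: a terminal CLOSED state always re-attaches to the same id after 1500 ms, and the EventSource error event carries no HTTP status, so a session id that no longer exists on the server looks the same as a network drop and would be retried indefinitely. One possible refinement, sketched here as a pure function so it can be exercised outside the component, caps the re-attach attempts and then falls back to a fresh session. The names `decideOnStreamError`, `closedCount`, `maxReattach` and the action strings are hypothetical and not part of this change; the readyState numbers are the standard EventSource constants.

```javascript
// Standard EventSource readyState values, copied here so the sketch also runs
// outside a browser (where the EventSource global may be missing).
const READY = { CONNECTING: 0, OPEN: 1, CLOSED: 2 };

/**
 * Hypothetical helper: map an EventSource error to the next action.
 *  - 'wait'     : the browser is already retrying on its own; do nothing
 *  - 'reattach' : the source is closed; open a new one for the same id
 *  - 'fresh'    : repeated closes suggest the id is stale; mint a new session
 *
 * @param {number} readyState   es.readyState at the time of the error
 * @param {number} closedCount  consecutive CLOSED errors seen so far
 * @param {number} maxReattach  assumed cap before giving up on the id
 */
function decideOnStreamError(readyState, closedCount, maxReattach = 3) {
  if (readyState !== READY.CLOSED) return 'wait';
  return closedCount < maxReattach ? 'reattach' : 'fresh';
}

// Assumed wiring, mirroring the shape of the onError callback in attach():
//   onError: () => {
//     const action = decideOnStreamError(es.readyState, closedCount);
//     if (action === 'reattach') { closedCount++; setTimeout(() => attach(id), 1500); }
//     else if (action === 'fresh') { closedCount = 0; clearSessionId(); createFresh(); }
//   }

console.log(decideOnStreamError(READY.CONNECTING, 0)); // wait
console.log(decideOnStreamError(READY.CLOSED, 0));     // reattach
console.log(decideOnStreamError(READY.CLOSED, 3));     // fresh
```

In this sketch `closedCount` would be reset to 0 from `onOpen`, so only consecutive failures count toward the cap. The cap of 3 is an arbitrary placeholder, not a measured value.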
--- a/frontend/src/Chat.svelte
+++ b/frontend/src/Chat.svelte
@ -1,128 +1,105 @@
 <script>
  import { tick } from 'svelte';
-  import { streamChat } from './lib/api.js';
  import ToolChip from './ToolChip.svelte';

  let {
-    sessionId = '',
-    sessionReady = false,
-    onLiveSession = (/** @type {string} */ _id) => {},
-    onStreamingChange = (/** @type {boolean} */ _v) => {},
+    tx, // the folded transcript state (plain object, see lib/transcript.js)
+    rev = 0, // bumped on every in-place mutation to retrigger reactivity
+    caughtUp = false, // replay drained → staggered reveal may run
+    turnActive = false, // a turn is running: show Stop, hide Send
+    sending = false, // a prompt POST is in flight (brief)
+    linkState = 'connecting', // connecting | attached | error
+    onSubmit = (/** @type {string} */ _p) => {},
+    onStop = () => {},
  } = $props();

-  /**
-   * Message model. A user message is plain text. An assistant message is an
-   * ordered list of parts so streamed prose and tool chips interleave in the
-   * exact order the agent emitted them:
-   *   { role:'assistant', parts:[{type:'text',text}|{type:'tool',name,command}],
-   *     result?: {is_error, text, duration_ms}, error?: string }
-   * @type {Array<any>}
-   */
-  let messages = $state([]);
+  // The five quick-action presets — the mobile win: one tap, no typing.
+  const PRESETS = [
+    {
+      label: 'Triage',
+      icon: '◑',
+      prompt:
+        'Triage the devvm: uptime, load, memory, swap, disk usage, failed systemd units, and the last 30 lines of dmesg. Summarize what\'s wrong.',
+    },
+    {
+      label: 'Memory / OOM',
+      icon: '▦',
+      prompt:
+        'Check devvm memory pressure: free -h, top memory consumers, any recent OOM-kills in dmesg/journal, and swap usage. Is it OOMing?',
+    },
+    {
+      label: 'Disk',
+      icon: '▤',
+      prompt:
+        'What\'s filling the devvm disk? df -h, then the biggest directories/files under the fullest mount. Anything safe to clear?',
+    },
+    {
+      label: 'Services',
+      icon: '⚙',
+      prompt:
+        'List failed or stuck systemd units on the devvm (systemctl --failed) and show the status + recent journal lines for any that are down.',
+    },
+    {
+      label: 'QEMU wedged?',
+      icon: '◫',
+      prompt:
+        'Is the devvm\'s QEMU wedged (I/O stall)? Check guest responsiveness over SSH, then ssh pve forensics for VM 102\'s qm status/QMP/guest-agent. Tell me if a cycle is needed.',
+    },
+  ];
+
  let draft = $state('');
-  let streaming = $state(false);
-  let scroller; // the scroll viewport
+  let scroller;
  let inputEl;
-  let pinnedToBottom = true; // auto-scroll only while the user is at the bottom
+  let pinnedToBottom = true;

-  const canSend = $derived(sessionReady && !streaming && draft.trim().length > 0);
+  // re-derive the message list whenever the folder mutates (rev bump). The
+  // transcript is folded with in-place mutation on a $state.raw object, so no
+  // reference changes on its own — we depend on `rev` explicitly and rebuild
+  // fresh objects (message + its parts array) so Svelte's keyed {#each} re-
+  // renders streamed prose/chips on every token. Transcripts are small; the
+  // per-token copy is cheap and keeps the hot streaming path bug-free.
+  const messages = $derived(
+    rev >= 0 && tx
+      ? tx.messages.map((m) =>
+          m.role === 'assistant' ? { ...m, parts: m.parts.slice() } : { ...m }
+        )
+      : []
+  );
+  const isEmpty = $derived(messages.length === 0);
+  const canSend = $derived(linkState !== 'error' && !turnActive && draft.trim().length > 0);
+  const inputReady = $derived(!turnActive);

-  // ── scrolling ─────────────────────────────────────────────────────────────
+  // ── auto-scroll (only while pinned to the bottom) ─────────────────────────
  function onScroll() {
    if (!scroller) return;
    const gap = scroller.scrollHeight - scroller.scrollTop - scroller.clientHeight;
-    pinnedToBottom = gap < 60;
+    pinnedToBottom = gap < 64;
  }
  async function scrollToBottom(force = false) {
    if (!force && !pinnedToBottom) return;
    await tick();
    if (scroller) scroller.scrollTop = scroller.scrollHeight;
  }
-
-  // ── streaming a turn ────────────────────────────────────────────────────────
-  function lastAssistant() {
-    return messages[messages.length - 1];
-  }
-
-  function appendText(text) {
-    const msg = lastAssistant();
-    const parts = msg.parts;
-    const tail = parts[parts.length - 1];
-    if (tail && tail.type === 'text') {
-      tail.text += text;
-    } else {
-      parts.push({ type: 'text', text });
-    }
-    messages = messages; // notify Svelte of the in-place mutation
-  }
-
-  function handleEvent(ev) {
-    switch (ev?.kind) {
-      case 'session':
-        onLiveSession(ev.session_id);
-        break;
-      case 'text':
-        if (ev.text) appendText(ev.text);
-        break;
-      case 'tool': {
-        // Bash carries a `command`; other tools just show their name.
-        const command =
-          ev.input && typeof ev.input.command === 'string' ? ev.input.command : '';
-        lastAssistant().parts.push({ type: 'tool', name: ev.name || 'tool', command });
-        messages = messages;
-        break;
-      }
-      case 'result':
-        lastAssistant().result = {
-          is_error: Boolean(ev.is_error),
-          text: typeof ev.result === 'string' ? ev.result : '',
-          duration_ms: typeof ev.duration_ms === 'number' ? ev.duration_ms : null,
-        };
-        messages = messages;
-        break;
-      case 'error':
-        lastAssistant().error = ev.error || 'unknown error';
-        messages = messages;
-        break;
-      case 'done':
-        // handled by the stream completing; nothing to render
-        break;
-      default:
-        break;
-    }
+  // any transcript change → keep the view pinned if the user is at the bottom
+  $effect(() => {
+    rev; // track
    scrollToBottom();
+  });
+
+  function fire(prompt) {
+    if (turnActive) return;
+    pinnedToBottom = true;
+    onSubmit(prompt);
+    scrollToBottom(true);
  }

-  async function send() {
-    const prompt = draft.trim();
-    if (!prompt || streaming || !sessionReady) return;
-
-    messages.push({ role: 'user', text: prompt });
-    messages.push({ role: 'assistant', parts: [] });
-    messages = messages;
+  function send() {
+    const text = draft.trim();
+    if (!text || turnActive || linkState === 'error') return; // mirror canSend: Enter bypasses the disabled Send button
    draft = '';
-    streaming = true;
-    onStreamingChange(true);
-    pinnedToBottom = true;
-    await scrollToBottom(true);
-
-    try {
-      await streamChat({ session_id: sessionId, prompt }, handleEvent);
-    } catch (err) {
-      // Network/transport failure (backend down, connection dropped mid-stream).
-      const msg = lastAssistant();
-      if (msg && msg.role === 'assistant' && !msg.error) {
-        msg.error =
-          (err instanceof Error ? err.message : String(err)) +
-          ' — the connection to the agent failed.';
-        messages = messages;
-      }
-    } finally {
-      streaming = false;
-      onStreamingChange(false);
-      await scrollToBottom();
-      inputEl?.focus();
-    }
+    fire(text);
+    // refocus after the cleared draft renders; field-sizing shrinks the box back to one row
+    tick().then(() => inputEl?.focus());
  }

  function onKeydown(e) {
@ -130,7 +107,7 @@
      e.preventDefault();
      send();
    }
-    // Shift+Enter falls through to insert a newline.
+    // Shift+Enter → newline (default behaviour)
  }

  function fmtDuration(ms) {
@ -139,7 +116,12 @@
    return `${(ms / 1000).toFixed(ms < 10000 ? 1 : 0)} s`;
  }

-  const isEmpty = $derived(messages.length === 0);
+  // a freshly-attached transcript reveals with a brief stagger; cap the delay
+  // so a long replay doesn't animate forever.
+  function revealDelay(i) {
+    if (!caughtUp) return 0;
+    return Math.min(i, 6) * 45;
+  }
 </script>

 <div class="chat">
@ -150,41 +132,58 @@

  <div class="stream" bind:this={scroller} onscroll={onScroll}>
    {#if isEmpty}
-      <div class="empty">
-        <div class="empty-mark">⌁</div>
-        <p class="empty-title">The agent is standing by.</p>
+      <div class="empty" class:dim={linkState === 'connecting'}>
+        <div class="empty-mark" aria-hidden="true">⌁</div>
+        <p class="empty-title">
+          {#if linkState === 'error'}
+            The agent is unreachable.
+          {:else if linkState === 'connecting'}
+            Attaching to the session…
+          {:else}
+            The agent is standing by.
+          {/if}
+        </p>
        <p class="empty-sub">
-          Describe the symptom — "devvm is unreachable", "disk full", "ssh hangs"
-          — and it will connect over SSH, investigate, and stream its work here.
-          For a hard power action when the agent can't help, use
-          <strong>Direct VM control</strong>.
+          {#if linkState === 'error'}
+            The cluster or network may be down. You can still power-cycle the VM
+            with <strong>⚡ Direct VM control</strong> — it needs no agent.
+          {:else}
+            Tap a preset below or describe the symptom — "devvm unreachable",
+            "disk full", "ssh hangs" — and it will connect over SSH, investigate,
+            and stream its work here. For a hard power action, use
+            <strong>⚡ Direct VM control</strong>.
+          {/if}
        </p>
      </div>
    {/if}

-    {#each messages as msg, i (i)}
+    {#each messages as msg (msg.key)}
      {#if msg.role === 'user'}
-        <div class="row row--user">
+        <div class="row row--user rise-in" style="--d:{revealDelay(0)}ms">
          <div class="bubble bubble--user">{msg.text}</div>
        </div>
      {:else}
-        <div class="row row--assistant">
+        <div class="row row--assistant rise-in" style="--d:{revealDelay(0)}ms">
          <div class="bubble bubble--assistant">
-            {#if msg.parts.length === 0 && !msg.result && !msg.error}
+            {#if msg.parts.length === 0 && !msg.result && !msg.error && !msg.cancelled}
              <span class="thinking" aria-label="working">
                <span></span><span></span><span></span>
              </span>
            {/if}
            {#each msg.parts as part, j (j)}
-              {#if part.type === 'text'}
-                <span class="prose">{part.text}</span>
-              {:else}
-                <ToolChip name={part.name} command={part.command} />
-              {/if}
+              {#if part.type === 'text'}<span class="prose">{part.text}</span>{:else}<ToolChip name={part.name} command={part.command} />{/if}
            {/each}

            {#if msg.error}
-              <div class="turn-note turn-note--error">⚠ {msg.error}</div>
+              <div class="turn-note turn-note--error">
+                <span class="turn-note-tag">error</span>
+                <span class="turn-note-body">{msg.error}</span>
+              </div>
+            {:else if msg.cancelled}
+              <div class="turn-note turn-note--muted">
+                <span class="turn-note-tag">stopped</span>
+                <span class="turn-note-body">turn cancelled</span>
+              </div>
            {:else if msg.result}
              <div class="turn-note {msg.result.is_error ? 'turn-note--error' : 'turn-note--ok'}">
                <span class="turn-note-tag">{msg.result.is_error ? 'failed' : 'done'}</span>
@ -200,36 +199,61 @@
    {/each}
  </div>

-  <form
-    class="composer"
-    onsubmit={(e) => {
-      e.preventDefault();
-      send();
-    }}
-  >
-    {#if streaming}
-      <div class="working-bar" aria-live="polite">
-        <span class="working-dots"><span></span><span></span><span></span></span>
-        agent working — streaming live
-      </div>
-    {/if}
-    <div class="composer-row">
-      <textarea
-        bind:this={inputEl}
-        bind:value={draft}
-        onkeydown={onKeydown}
-        placeholder={sessionReady
-          ? 'Describe the problem…  (Enter to send · Shift+Enter for a new line)'
-          : 'Waiting for a session…'}
-        rows="1"
-        disabled={!sessionReady || streaming}
-        spellcheck="false"
-      ></textarea>
-      <button type="submit" class="send" disabled={!canSend}>
-        {streaming ? '…' : 'Send'}
-      </button>
+  <div class="dock">
+    <!-- quick-action preset bar: horizontally scrollable, one-tap prompts -->
+    <div class="presets" role="group" aria-label="Quick actions">
+      {#each PRESETS as p (p.label)}
+        <button
+          class="preset"
+          onclick={() => fire(p.prompt)}
+          disabled={turnActive || linkState === 'error'}
+          title={p.prompt}
+        >
+          <span class="preset-icon" aria-hidden="true">{p.icon}</span>
+          <span class="preset-label">{p.label}</span>
+        </button>
+      {/each}
    </div>
-  </form>
+
+    <form
+      class="composer"
+      onsubmit={(e) => {
+        e.preventDefault();
+        send();
+      }}
+    >
+      {#if turnActive}
+        <div class="working-bar" aria-live="polite">
+          <span class="working-dots"><span></span><span></span><span></span></span>
+          <span>agent working — streaming live</span>
+        </div>
+      {/if}
+      <div class="composer-row">
+        <textarea
+          bind:this={inputEl}
+          bind:value={draft}
+          onkeydown={onKeydown}
+          placeholder={inputReady
+            ? 'Describe the problem…  (Enter to send · Shift+Enter for a new line)'
+            : 'A turn is running — Stop it to type, or wait…'}
+          rows="1"
+          disabled={!inputReady}
+          spellcheck="false"
+          enterkeyhint="send"
+        ></textarea>
+        {#if turnActive}
+          <button type="button" class="stop" onclick={onStop} title="Stop the running turn">
+            <span class="stop-glyph" aria-hidden="true"></span>
+            Stop
+          </button>
+        {:else}
+          <button type="submit" class="send" disabled={!canSend}>
+            {sending ? '···' : 'Send'}
+          </button>
+        {/if}
+      </div>
+    </form>
+  </div>
 </div>

 <style>
@ -249,9 +273,10 @@
    display: flex;
    align-items: baseline;
    gap: 12px;
-    padding: 13px 18px;
+    padding: 12px 18px;
    border-bottom: 1px solid var(--line);
-    background: linear-gradient(180deg, rgba(255, 255, 255, 0.015), transparent);
+    background: linear-gradient(180deg, rgba(255, 255, 255, 0.018), transparent);
+    flex: none;
  }
  .chat-head-label {
    font-family: var(--mono);
@ -263,13 +288,16 @@
  .chat-head-hint {
    font-size: 12px;
    color: var(--ink-faint);
+    white-space: nowrap;
+    overflow: hidden;
+    text-overflow: ellipsis;
  }

  .stream {
    flex: 1;
    min-height: 0;
    overflow-y: auto;
-    padding: 20px 18px 8px;
+    padding: 20px 16px 10px;
    display: flex;
    flex-direction: column;
    gap: 14px;
@ -279,23 +307,27 @@
  /* empty state */
  .empty {
    margin: auto;
-    max-width: 460px;
+    max-width: 470px;
    text-align: center;
-    padding: 28px 12px;
+    padding: 24px 14px;
    color: var(--ink-dim);
  }
+  .empty.dim { opacity: 0.8; }
  .empty-mark {
-    font-size: 40px;
+    font-size: 42px;
    color: var(--cyan-dim);
    line-height: 1;
    margin-bottom: 14px;
-    text-shadow: 0 0 24px rgba(61, 209, 214, 0.25);
+    text-shadow: 0 0 26px rgba(61, 209, 214, 0.3);
+    animation: lamp-breathe 3.6s ease-in-out infinite;
  }
+  @keyframes lamp-breathe { 0%, 100% { opacity: 0.7; } 50% { opacity: 1; } }
  .empty-title {
    font-family: var(--mono);
    color: var(--ink);
    font-size: 15px;
    margin: 0 0 8px;
+    letter-spacing: 0.01em;
  }
  .empty-sub {
    font-size: 13px;
@ -303,32 +335,23 @@
    color: var(--ink-faint);
    margin: 0;
  }
-  .empty-sub strong {
-    color: var(--ink-dim);
-    font-weight: 600;
-  }
+  .empty-sub strong { color: var(--ink-dim); font-weight: 600; }

-  .row {
-    display: flex;
-  }
-  .row--user {
-    justify-content: flex-end;
-  }
-  .row--assistant {
-    justify-content: flex-start;
-  }
+  .row { display: flex; }
+  .row--user { justify-content: flex-end; }
+  .row--assistant { justify-content: flex-start; }

  .bubble {
-    max-width: 86%;
+    max-width: 88%;
    border-radius: 13px;
    padding: 11px 14px;
    font-size: 14px;
-    line-height: 1.6;
+    line-height: 1.62;
    word-wrap: break-word;
    overflow-wrap: anywhere;
  }
  .bubble--user {
-    background: linear-gradient(180deg, #15333a, #0f262c);
+    background: linear-gradient(180deg, #123036, #0d2329);
    border: 1px solid var(--cyan-dim);
    color: #d8f6f7;
    border-bottom-right-radius: 4px;
@ -341,12 +364,9 @@
    border-bottom-left-radius: 4px;
    color: var(--ink);
  }
-  /* prose renders inline so text and tool chips share the same flow */
-  .prose {
-    white-space: pre-wrap;
-  }
+  .prose { white-space: pre-wrap; }

-  /* in-flight assistant "thinking" dots */
+  /* in-flight "thinking" dots */
  .thinking,
  .working-dots {
    display: inline-flex;
@ -363,19 +383,15 @@
    animation: blink 1.2s infinite ease-in-out;
  }
  .thinking span:nth-child(2),
-  .working-dots span:nth-child(2) {
-    animation-delay: 0.18s;
-  }
+  .working-dots span:nth-child(2) { animation-delay: 0.18s; }
  .thinking span:nth-child(3),
-  .working-dots span:nth-child(3) {
-    animation-delay: 0.36s;
-  }
+  .working-dots span:nth-child(3) { animation-delay: 0.36s; }
  @keyframes blink {
    0%, 80%, 100% { opacity: 0.25; transform: translateY(0); }
    40% { opacity: 1; transform: translateY(-2px); }
  }

-  /* turn result / error footer inside the assistant bubble */
+  /* turn result / error / stopped footer inside the assistant bubble */
  .turn-note {
    margin-top: 10px;
    padding: 7px 10px;
@ -396,9 +412,16 @@
    color: #bff5d3;
  }
  .turn-note--error {
-    background: rgba(255, 77, 77, 0.08);
-    border: 1px solid var(--danger-deep);
-    color: #ffd5d5;
+    /* the error tint here is amber-leaning text on a faint warm wash, NOT the
+       reserved power-action red — a turn error is not a destructive action. */
+    background: rgba(245, 182, 87, 0.06);
+    border: 1px solid var(--amber-dim);
+    color: #f7d49a;
+  }
+  .turn-note--muted {
+    background: rgba(255, 255, 255, 0.02);
+    border: 1px solid var(--line-strong);
+    color: var(--ink-faint);
  }
  .turn-note-tag {
    text-transform: uppercase;
@ -409,20 +432,55 @@
    border: 1px solid currentColor;
    opacity: 0.85;
  }
-  .turn-note-body {
-    flex: 1;
-    min-width: 0;
-  }
-  .turn-note-time {
-    margin-left: auto;
-    color: var(--ink-faint);
+  .turn-note-body { flex: 1; min-width: 0; }
+  .turn-note-time { margin-left: auto; color: var(--ink-faint); }
+
+  /* ── dock: presets + composer, pinned to the bottom ────────────────────── */
+  .dock {
+    flex: none;
+    border-top: 1px solid var(--line);
+    background: linear-gradient(0deg, rgba(255, 255, 255, 0.015), transparent);
  }

-  /* ── composer ─────────────────────────────────────────────────────────── */
+  .presets {
+    display: flex;
+    gap: 8px;
+    overflow-x: auto;
+    padding: 11px 12px 4px;
+    scrollbar-width: none;
+    -webkit-overflow-scrolling: touch;
+    /* fade both edges to hint there's more to scroll in either direction */
+    mask-image: linear-gradient(90deg, transparent 0, #000 14px, #000 calc(100% - 18px), transparent 100%);
+  }
+  .presets::-webkit-scrollbar { display: none; }
+  .preset {
+    flex: none;
+    min-height: 38px;
+    display: inline-flex;
+    align-items: center;
+    gap: 7px;
+    padding: 0 13px;
+    border-radius: 999px;
+    border: 1px solid var(--line-strong);
+    background: var(--bg-2);
+    color: var(--ink-dim);
+    font-family: var(--mono);
+    font-size: 12.5px;
+    letter-spacing: 0.02em;
+    white-space: nowrap;
+    transition: border-color 0.15s, color 0.15s, background 0.15s, transform 0.06s;
+  }
+  .preset:hover:not(:disabled) {
+    border-color: var(--cyan-dim);
+    color: var(--ink);
+    background: var(--bg-3);
+  }
+  .preset:active:not(:disabled) { transform: translateY(1px); }
+  .preset:disabled { opacity: 0.4; }
+  .preset-icon { color: var(--cyan); font-size: 12px; }
+
  .composer {
-    border-top: 1px solid var(--line);
-    padding: 12px;
-    background: linear-gradient(0deg, rgba(255, 255, 255, 0.012), transparent);
+    padding: 8px 12px calc(12px + var(--safe-bottom));
  }
  .working-bar {
    display: flex;
@ -431,7 +489,7 @@
    font-family: var(--mono);
    font-size: 12px;
    color: var(--amber);
-    padding: 0 4px 9px;
+    padding: 2px 4px 9px;
    letter-spacing: 0.02em;
  }
  .composer-row {
@ -442,13 +500,13 @@
  textarea {
    flex: 1;
    resize: none;
-    max-height: 168px;
+    max-height: 160px;
    min-height: 48px;
    background: var(--bg-2);
    color: var(--ink);
    border: 1px solid var(--line-strong);
    border-radius: var(--radius-sm);
-    padding: 12px 13px;
+    padding: 13px 13px;
    font-family: var(--sans);
    /* 16px: anything smaller makes iOS Safari auto-zoom on focus (mobile is the
       primary client) — the zoom then shifts the composer out of view. */
@ -458,39 +516,60 @@
    transition: border-color 0.15s, box-shadow 0.15s;
    field-sizing: content; /* progressive: auto-grows where supported */
  }
-  textarea::placeholder {
-    color: var(--ink-faint);
-  }
+  textarea::placeholder { color: var(--ink-faint); }
  textarea:focus {
    border-color: var(--cyan-dim);
    box-shadow: 0 0 0 3px rgba(61, 209, 214, 0.12);
  }
-  textarea:disabled {
-    opacity: 0.55;
-  }
+  textarea:disabled { opacity: 0.55; }

-  .send {
+  .send,
+  .stop {
    flex: none;
    align-self: stretch;
-    min-width: 78px;
+    min-width: 82px;
+    min-height: 48px;
    padding: 0 18px;
    border-radius: var(--radius-sm);
-    border: 1px solid var(--cyan-dim);
-    background: linear-gradient(180deg, #19474b, #103539);
-    color: #d8f6f7;
    font-size: 13px;
    font-weight: 600;
-    letter-spacing: 0.04em;
-    transition: filter 0.15s, border-color 0.15s, opacity 0.15s;
+    letter-spacing: 0.05em;
+    transition: filter 0.15s, border-color 0.15s, opacity 0.15s, background 0.15s;
  }
-  .send:hover:not(:disabled) {
-    filter: brightness(1.22);
-    border-color: var(--cyan);
+  .send {
+    border: 1px solid var(--cyan-dim);
+    background: linear-gradient(180deg, #16464a, #0e3438);
+    color: #d8f6f7;
  }
+  .send:hover:not(:disabled) { filter: brightness(1.24); border-color: var(--cyan); }
  .send:disabled {
    opacity: 0.4;
    background: var(--bg-2);
    border-color: var(--line-strong);
    color: var(--ink-faint);
  }
+  /* Stop is NOT red: red is reserved for destructive VM power actions. Stop is
+     a calm neutral control with a square "halt" glyph. */
+  .stop {
+    display: inline-flex;
+    align-items: center;
+    justify-content: center;
+    gap: 8px;
+    border: 1px solid var(--line-bright);
+    background: var(--bg-3);
+    color: var(--ink);
+  }
+  .stop:hover { border-color: var(--ink-faint); filter: brightness(1.1); }
+  .stop-glyph {
+    width: 10px;
+    height: 10px;
+    border-radius: 2px;
+    background: var(--amber);
+    box-shadow: 0 0 8px rgba(245, 182, 87, 0.55);
+    animation: lamp-pulse 1s ease-in-out infinite;
+  }
+  @keyframes lamp-pulse {
+    0%, 100% { transform: scale(0.85); opacity: 0.8; }
+    50% { transform: scale(1.08); opacity: 1; }
+  }
 </style>
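
The `messages` derivation in the Chat.svelte change above copies every message on each `rev` bump because the transcript is mutated in place. The following is a minimal plain-JavaScript sketch of that copy step, outside Svelte. `snapshotMessages` and the sample `tx` object are hypothetical names used only for illustration; the real fold lives in code that this diff does not show.

```javascript
// Sketch only: `tx` stands in for the in-place-mutated transcript. The names
// snapshotMessages and tx are assumptions, not identifiers from the repo.
function snapshotMessages(tx) {
  if (!tx) return [];
  return tx.messages.map((m) =>
    // Assistant messages get a fresh parts array; user messages a fresh object.
    m.role === 'assistant' ? { ...m, parts: m.parts.slice() } : { ...m }
  );
}

const tx = {
  messages: [
    { key: 'u1', role: 'user', text: 'disk full' },
    { key: 'a1', role: 'assistant', parts: [{ type: 'text', text: 'Checking' }] },
  ],
};

const before = snapshotMessages(tx);
// Simulate a streamed tool event folded into the same assistant message.
tx.messages[1].parts.push({ type: 'tool', name: 'Bash', command: 'df -h' });
const after = snapshotMessages(tx);

console.log(before[1].parts.length, after[1].parts.length); // 1 2
console.log(before[1] === after[1]); // false
```

The copy is shallow: the part objects themselves are shared between snapshots, so a fold that appends to `part.text` in place also changes the text seen through an older snapshot. That is harmless as long as only the latest snapshot is rendered.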
--- a/frontend/src/VmControls.svelte
+++ b/frontend/src/VmControls.svelte
@ -293,7 +293,8 @@
    align-items: center;
    justify-content: center;
    gap: 8px;
-    padding: 9px 15px;
+    min-height: 44px; /* touch target */
+    padding: 10px 16px;
    border-radius: var(--radius-sm);
    font-size: 13px;
    font-weight: 600;
@ -408,7 +409,8 @@
  }
  .confirm-yes {
    flex: 1;
-    padding: 9px;
+    min-height: 44px;
+    padding: 10px;
    border-radius: var(--radius-sm);
    border: 1px solid var(--danger-bright);
    background: var(--danger);
@ -424,7 +426,8 @@
  }
  .confirm-no {
    flex: 1;
-    padding: 9px;
+    min-height: 44px;
+    padding: 10px;
    border-radius: var(--radius-sm);
    border: 1px solid var(--line-strong);
    background: var(--bg-2);
--- a/frontend/src/app.css
+++ b/frontend/src/app.css
@ -1,48 +1,70 @@
 /* ───────────────────────────────────────────────────────────────────────────
   devvm breakglass — global theme
-   A recovery console: dark, high-contrast, terminal-adjacent. Calm by default;
-   danger is the only loud thing on the screen. No external fonts/CDNs — system
-   monospace carries the identity, system sans carries readable prose.
+   Emergency recovery console / instrument panel. Dark, high-contrast, monospace
+   identity, calm by default. Danger (red) is reserved EXCLUSIVELY for the
+   destructive VM power actions — nothing else on the screen is ever red. No
+   external fonts/CDNs (air-gapped cluster): a refined system-monospace stack
+   carries the identity, system-sans carries readable prose. Distinctiveness is
+   earned through composition, the living "system pulse" lamp, motion, hairlines,
+   and the reserved danger treatment — not through a downloaded typeface.
   ─────────────────────────────────────────────────────────────────────────── */

 :root {
-  /* Surfaces — a near-black slate with cool undertone, layered for depth. */
-  --bg-0: #07090c;       /* page base */
-  --bg-1: #0c1015;       /* panel */
-  --bg-2: #11171e;       /* raised panel / input */
-  --bg-3: #161d26;       /* chips, hover */
-  --bg-term: #06080a;    /* command-output panels */
+  /* Surfaces — a near-black slate with a cool undertone, layered for depth. */
+  --bg-0: #06080b;       /* page base (darkened from #07090c for crisper AA) */
+  --bg-1: #0b0f14;       /* panel */
+  --bg-2: #10161d;       /* raised panel / input */
+  --bg-3: #161e27;       /* chips, hover */
+  --bg-term: #05070a;    /* command-output panels */

  /* Hairlines & text */
-  --line: #1d2630;
+  --line: #1c2530;
  --line-strong: #2a3744;
-  --ink: #e6edf3;        /* primary text */
-  --ink-dim: #9bb0c0;    /* secondary text */
-  --ink-faint: #5d7185;  /* labels, meta */
+  --line-bright: #3a4a5a;
+  --ink: #e9eff5;        /* primary text */
+  --ink-dim: #9bb0c0;    /* secondary text — 8.0:1 on bg-2 */
+  /* labels/meta — was #5d7185 (3.6:1, fails AA). Lifted to 6.1:1 on bg-2. */
+  --ink-faint: #8499ab;

-  /* Accents */
-  --cyan: #3dd1d6;       /* "system alive" — links, focus, session dot */
+  /* Accents — the "alive" cyan is the spine of the calm palette. */
+  --cyan: #3dd1d6;       /* "system alive" — links, focus, session pulse */
+  --cyan-bright: #62e3e7;
  --cyan-dim: #1f6f72;
+  --cyan-deep: #0e3133;
  --amber: #f5b657;      /* working / in-flight */
+  --amber-dim: #6a5226;
  --green: #5ddb8e;      /* healthy exit */
  --green-dim: #1f5f3d;

-  /* Danger — reserved EXCLUSIVELY for mutating actions. Nothing else is red. */
+  /* Danger: reserved EXCLUSIVELY for mutating power actions. Nothing else is red. */
  --danger: #ff4d4d;
  --danger-bright: #ff6363;
  --danger-deep: #7a1717;
  --danger-glow: rgba(255, 77, 77, 0.35);

-  --radius: 10px;
-  --radius-sm: 7px;
+  --radius: 11px;
+  --radius-sm: 8px;
+  --radius-lg: 16px;

-  --mono: ui-monospace, "JetBrains Mono", "SF Mono", "Cascadia Code",
-    "Fira Code", Menlo, Consolas, "Liberation Mono", monospace;
+  /* A refined, deliberately-ordered monospace stack. We lead with faces that
+     have real character (Berkeley Mono / JetBrains / Cascadia / SF Mono) and
+     fall back gracefully — but ship nothing; whatever the device has carries
+     the cockpit-readout identity. */
+  --mono: "Berkeley Mono", ui-monospace, "JetBrains Mono", "SF Mono",
+    "Cascadia Code", "Fira Code", "Source Code Pro", Menlo, Consolas,
+    "Liberation Mono", monospace;
  --sans: ui-sans-serif, system-ui, -apple-system, "Segoe UI", Roboto,
    "Helvetica Neue", Arial, sans-serif;

-  --shadow-panel: 0 1px 0 rgba(255, 255, 255, 0.02) inset,
-    0 16px 40px -24px rgba(0, 0, 0, 0.9);
+  --shadow-panel: 0 1px 0 rgba(255, 255, 255, 0.025) inset,
+    0 18px 44px -26px rgba(0, 0, 0, 0.95);
+  --shadow-sheet: 0 -22px 48px -12px rgba(0, 0, 0, 0.7);
+
+  /* Safe-area shorthands (notch / home-indicator). 0px fallback off-device. */
+  --safe-top: env(safe-area-inset-top, 0px);
+  --safe-bottom: env(safe-area-inset-bottom, 0px);
+  --safe-left: env(safe-area-inset-left, 0px);
+  --safe-right: env(safe-area-inset-right, 0px);

  color-scheme: dark;
 }
@ -55,23 +77,24 @@ html,
 body {
  margin: 0;
  height: 100%;
-  /* The page itself never scrolls — the chat stream scrolls internally. This
-     keeps the composer pinned and stops iOS rubber-banding the whole UI. */
+  /* The page itself never scrolls — only the chat stream scrolls internally.
+     This keeps the composer pinned and stops iOS rubber-banding the whole UI. */
  overflow: hidden;
  overscroll-behavior: none;
 }

 body {
  background-color: var(--bg-0);
-  /* Atmosphere: a soft cyan corner-glow over a faint scanline weave, so the
-     surface reads like backlit equipment rather than flat #000. */
+  /* Atmosphere: a soft cyan corner-glow + a faint warm counter-glow over a
+     hairline scanline weave, so the surface reads as backlit equipment rather
+     than flat black. Fixed so it doesn't drift when the chat scrolls. */
  background-image:
-    radial-gradient(120% 80% at 85% -10%, rgba(61, 209, 214, 0.07), transparent 55%),
-    radial-gradient(90% 70% at 10% 110%, rgba(245, 182, 87, 0.04), transparent 50%),
+    radial-gradient(120% 78% at 86% -12%, rgba(61, 209, 214, 0.08), transparent 55%),
+    radial-gradient(90% 70% at 8% 112%, rgba(245, 182, 87, 0.045), transparent 52%),
    repeating-linear-gradient(
      0deg,
-      rgba(255, 255, 255, 0.012) 0px,
-      rgba(255, 255, 255, 0.012) 1px,
+      rgba(255, 255, 255, 0.013) 0px,
+      rgba(255, 255, 255, 0.013) 1px,
      transparent 1px,
      transparent 3px
    );
@ -84,8 +107,8 @@ body {

 #app {
  /* 100dvh (dynamic viewport height) — NOT 100vh/100% — so the composer at the
-     bottom is never hidden behind a mobile browser's address/tool bar. Mobile is
-     the primary client for this tool. 100vh is the fallback for old engines. */
+     bottom is never hidden behind a mobile browser's address/tool bar. 100vh is
+     the fallback for engines without dvh. Mobile is the primary client. */
  height: 100vh;
  height: 100dvh;
 }
@ -94,7 +117,6 @@ button {
  font-family: var(--mono);
  cursor: pointer;
 }
-
 button:disabled {
  cursor: not-allowed;
 }
@ -119,10 +141,26 @@ button:disabled {
  background-clip: content-box;
 }
 *::-webkit-scrollbar-thumb:hover {
-  background: #3a4a5a;
+  background: var(--line-bright);
  background-clip: content-box;
 }

+/* ── Shared motion primitives ──────────────────────────────────────────────
+   One well-orchestrated entrance beats scattered micro-interactions: panels
+   and rows rise a few px with a soft fade, staggered via --d on each element. */
+@keyframes rise-in {
+  from { opacity: 0; transform: translateY(8px); }
+  to { opacity: 1; transform: translateY(0); }
+}
+@keyframes fade-in {
+  from { opacity: 0; }
+  to { opacity: 1; }
+}
+.rise-in {
+  animation: rise-in 0.5s cubic-bezier(0.22, 0.61, 0.36, 1) both;
+  animation-delay: var(--d, 0ms);
+}
+
@media (prefers-reduced-motion: reduce) {
  *,
  *::before,
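
The app.css comments above quote contrast ratios for `--ink-dim` and `--ink-faint` against `--bg-2`. The following is a small sketch for re-checking them with the WCAG 2.x relative-luminance and contrast-ratio formulas. The helper names `luminance` and `contrastRatio` are illustrative and not part of the repo; the hex values are copied from the diff. The results should land close to the quoted figures, with small differences down to rounding.

```javascript
// WCAG 2.x relative luminance of a #rrggbb color.
function luminance(hex) {
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  const [r, g, b] = channels.map((c) =>
    c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4)
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio: (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21.
function contrastRatio(fg, bg) {
  const a = luminance(fg);
  const b = luminance(bg);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

const BG_2 = '#10161d'; // --bg-2 after this change
const report = {
  inkDim: contrastRatio('#9bb0c0', BG_2),      // --ink-dim
  inkFaintNew: contrastRatio('#8499ab', BG_2), // --ink-faint after this change
  inkFaintOld: contrastRatio('#5d7185', BG_2), // --ink-faint before this change
};

for (const [name, ratio] of Object.entries(report)) {
  const verdict = ratio >= 4.5 ? 'passes' : 'fails';
  console.log(`${name}: ${ratio.toFixed(1)}:1, ${verdict} AA for normal text`);
}
```

The 4.5:1 threshold is the WCAG AA minimum for normal-size text, which is the bar the label and meta color had to clear.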
--- a/frontend/src/lib/api.js
+++ b/frontend/src/lib/api.js
@ -1,8 +1,41 @@
-// Same-origin API client. Auth is handled entirely by the edge proxy
-// (Authentik / basic-auth / bearer) — this UI never sends or stores a token.
-import { readEventStream } from './sse.js';
+// Same-origin API client for the breakglass UI.
+//
+// Auth is handled entirely by the edge proxy (Authentik / basic-auth / bearer):
+// this UI never sends or stores a token, and builds no login screen.
+//
+// The chat uses the tmux/attach model. The conversation lives SERVER-SIDE; we
+// only persist the session_id locally and ATTACH to it over an EventSource. The
+// browser's native EventSource auto-reconnects and sends Last-Event-ID, and the
+// server resumes from there — so there is ZERO reconnect logic here. We just
+// render events idempotently by id (see transcript.js).

-/** Open a fresh chat session. @returns {Promise<string>} session_id */
+const SESSION_KEY = 'breakglass.session_id';
+
+/** Read the persisted session id, or '' if none. */
+export function loadSessionId() {
+  try {
+    return localStorage.getItem(SESSION_KEY) || '';
+  } catch {
+    return '';
+  }
+}
+
+/** Persist the session id (best-effort; private-mode storage may throw). */
+export function saveSessionId(id) {
+  try {
+    if (id) localStorage.setItem(SESSION_KEY, id);
+    else localStorage.removeItem(SESSION_KEY);
+  } catch {
+    /* ignore — storage is a convenience, not a requirement */
+  }
+}
+
+/** Forget the persisted session id (the "New session" archive step). */
+export function clearSessionId() {
+  saveSessionId('');
+}
+
+/** Open a fresh server-side session. @returns {Promise<string>} session_id */
 export async function openSession() {
  const res = await fetch('/api/session', {
    method: 'POST',
@ -19,30 +52,89 @@ export async function openSession() {
 }

 /**
- * Run one chat turn. Streams events to onEvent until the backend sends
- * {kind:"done"} and the connection closes. Pass an AbortSignal to cancel.
+ * Attach to a session's event stream. Returns the live EventSource so the
+ * caller can close() it. Events arrive as:
+ *   - default `message` events: .data is JSON {kind, id, ...}
+ *   - a named `caught-up` event once the replay is drained (.data is {})
+ *   - native `error` events while reconnecting (EventSource retries itself)
 *
- * @param {{session_id: string, prompt: string, model?: string, signal?: AbortSignal}} opts
- * @param {(event: object) => void} onEvent
+ * @param {string} sessionId
+ * @param {{
+ *   onEvent: (e: object) => void,
+ *   onCaughtUp?: () => void,
+ *   onOpen?: () => void,
+ *   onError?: (e: Event) => void,
+ * }} handlers
+ * @returns {EventSource}
 */
-export async function streamChat({ session_id, prompt, model, signal }, onEvent) {
-  const payload = { session_id, prompt };
-  if (model) payload.model = model;
+export function attachStream(sessionId, { onEvent, onCaughtUp, onOpen, onError }) {
+  const es = new EventSource(`/api/session/${encodeURIComponent(sessionId)}/stream`);

-  const res = await fetch('/api/chat', {
-    method: 'POST',
-    headers: {
-      'content-type': 'application/json',
-      accept: 'text/event-stream',
-    },
-    body: JSON.stringify(payload),
-    signal,
-  });
-  await readEventStream(res, onEvent);
+  es.onopen = () => onOpen?.();
+
+  es.onmessage = (e) => {
+    if (!e || typeof e.data !== 'string' || e.data === '') return;
+    let obj;
+    try {
+      obj = JSON.parse(e.data);
+    } catch {
+      // A malformed frame must not abort an in-progress recovery stream.
+      return;
+    }
+    // EventSource exposes the SSE `id:` line as e.lastEventId. The server also
+    // embeds id in the JSON; prefer the JSON id, fall back to lastEventId.
+    if ((obj.id == null || obj.id === '') && e.lastEventId) obj.id = e.lastEventId;
+    onEvent(obj);
+  };
+
+  es.addEventListener('caught-up', () => onCaughtUp?.());
+
+  es.onerror = (e) => {
+    // EventSource auto-reconnects on a transient drop (readyState CONNECTING)
+    // and stops on a hard one (CLOSED); both are forwarded to the caller.
+    onError?.(e);
+  };
+
+  return es;
 }

 /**
- * List the PVE power verbs and which of them mutate VM state.
+ * Start a turn. Output arrives via the attach stream, NOT this response.
+ * @param {{session_id: string, prompt: string, model?: string}} opts
+ * @returns {Promise<{status:'started'|'busy'|'gone'}>}
+ *   started — accepted; busy — 409 (a turn already runs); gone — 404 (re-create).
+ */
+export async function sendPrompt({ session_id, prompt, model }) {
+  const payload = { prompt };
+  if (model) payload.model = model;
+  const res = await fetch(`/api/session/${encodeURIComponent(session_id)}/prompt`, {
+    method: 'POST',
+    headers: { 'content-type': 'application/json' },
+    body: JSON.stringify(payload),
+  });
+  if (res.status === 409) return { status: 'busy' };
+  if (res.status === 404) return { status: 'gone' };
+  if (!res.ok) throw new Error(`could not start the turn (HTTP ${res.status})`);
+  return { status: 'started' };
+}
+
+/**
+ * Cancel the in-flight turn (the Stop button).
+ * @param {string} sessionId
+ * @returns {Promise<boolean>} whether a turn was cancelled
+ */
+export async function cancelTurn(sessionId) {
+  const res = await fetch(`/api/session/${encodeURIComponent(sessionId)}/cancel`, {
+    method: 'POST',
+    headers: { 'content-type': 'application/json' },
+  });
+  if (!res.ok) throw new Error(`could not stop the turn (HTTP ${res.status})`);
+  const body = await res.json().catch(() => ({}));
+  return Boolean(body.cancelled);
+}
+
+/**
+ * List the PVE power verbs and which mutate VM state.
 * @returns {Promise<{verbs: string[], mutating: string[]}>}
 */
 export async function fetchVerbs() {
@ -58,27 +150,26 @@ export async function fetchVerbs() {
 }

 /**
- * Run a PVE power verb directly (no AI in the path). The backend returns 200
- * on success and 502 when the verb's exit code is non-zero, but the JSON body
- * carries {verb, exit_code, stdout, stderr, rejected} in BOTH cases — so we
- * read the body regardless of HTTP status and let the caller style on
- * exit_code / rejected.
+ * Run a PVE power verb directly (no AI in the path). The backend returns 200 on
+ * success and 502 when the verb's exit code is non-zero, but the JSON body
+ * carries {verb, exit_code, stdout, stderr, rejected} in BOTH cases — so we read
+ * the body regardless of HTTP status and let the caller style on exit_code.
 *
 * @param {string} verb
- * @returns {Promise<{verb: string, exit_code: number|null, stdout: string, stderr: string, rejected: boolean}>}
+ * @returns {Promise<{verb:string, exit_code:number|null, stdout:string, stderr:string, rejected:boolean}>}
 */
 export async function runVerb(verb) {
  const res = await fetch(`/api/pve/${encodeURIComponent(verb)}`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
  });
-  // 400 = unknown verb (FastAPI HTTPException) — has {detail}, not the verb shape.
  let body;
  try {
    body = await res.json();
  } catch {
    throw new Error(`VM control '${verb}' failed (HTTP ${res.status}, no body)`);
  }
+  // 400 = unknown verb (FastAPI HTTPException) — has {detail}, not the verb shape.
  if (res.status === 400) {
    throw new Error(body?.detail || `'${verb}' was rejected by the server`);
  }
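The `onerror` handler in `attachStream` forwards every native error event, so a caller that wants to react only to a terminal failure needs to look at the stream's `readyState`. A minimal sketch of that check follows, assuming the caller keeps the `EventSource` that `attachStream` returns. `classifyStreamError` is a hypothetical helper, not part of this change; the numeric constants are the standard `EventSource` values.

```javascript
// EventSource readyState values as defined by the HTML standard.
const CONNECTING = 0;
const CLOSED = 2;

// Hypothetical helper: classify an error by the state of the stream it came
// from. Accepts any object exposing `readyState`, so it runs outside a browser.
function classifyStreamError(source) {
  if (source.readyState === CLOSED) return 'terminal'; // browser gave up
  if (source.readyState === CONNECTING) return 'reconnecting'; // retry in flight
  return 'unknown';
}

// Stand-in objects in place of a real EventSource.
console.log(classifyStreamError({ readyState: CONNECTING }));
console.log(classifyStreamError({ readyState: CLOSED }));
```

In a browser the caller would pass the returned `EventSource` from inside its `onError` callback and show an error banner only for `'terminal'`.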
--- a/frontend/src/lib/sse.js
+++ b/frontend/src/lib/sse.js
@ -1,150 +0,0 @@
-// SSE frame parsing — the load-bearing core of the breakglass UI.
-//
-// The /api/chat endpoint returns a text/event-stream that we read with
-// fetch() + response.body.getReader() (NOT EventSource, which cannot POST).
-// The backend emits one frame per event as:
-//
-//     data: {json}\n\n
-//
-// getReader() hands us bytes at arbitrary boundaries: a single frame can be
-// split across reads, and one read can contain several frames. So we keep a
-// rolling text buffer, split it on the blank-line frame delimiter, and only
-// hand back the JSON payload of *complete* frames. Per the SSE spec a frame may
-// carry multiple `data:` lines (joined with "\n"); the backend emits single
-// line JSON today, but we handle the general case so a future multi-line
-// payload can't silently corrupt the stream.
-
-/**
- * Parse a single SSE event block (the text between blank lines) into its data
- * payload string, or null if the block carries no `data:` field (e.g. a bare
- * comment or a `:` heartbeat).
- * @param {string} block
- * @returns {string|null}
- */
-export function dataFromEventBlock(block) {
-  const dataLines = [];
-  for (const rawLine of block.split('\n')) {
-    const line = rawLine.replace(/\r$/, '');
-    if (line.startsWith(':')) continue; // SSE comment / heartbeat
-    if (line === 'data:' || line === 'data') {
-      dataLines.push('');
-    } else if (line.startsWith('data:')) {
-      // Spec: a single leading space after the colon is stripped.
-      let v = line.slice('data:'.length);
-      if (v.startsWith(' ')) v = v.slice(1);
-      dataLines.push(v);
-    }
-    // field lines we don't care about (event:, id:, retry:) are ignored
-  }
-  if (dataLines.length === 0) return null;
-  return dataLines.join('\n');
-}
-
-/**
- * A stateful splitter that turns an arbitrary sequence of decoded text chunks
- * into a sequence of complete SSE event-block strings. Frames are delimited by
- * a blank line; we tolerate both "\n\n" and "\r\n\r\n".
- */
-export class SSEFrameSplitter {
-  constructor() {
-    this.buffer = '';
-  }
-
-  /**
-   * Feed a decoded text chunk; returns the event blocks that are now complete.
-   * Any trailing partial frame stays buffered for the next chunk.
-   * @param {string} chunk
-   * @returns {string[]} complete event blocks (text between delimiters)
-   */
-  push(chunk) {
-    this.buffer += chunk;
-    const blocks = [];
-    // Normalise CRLF delimiters to LF so a single split rule covers both.
-    let idx;
-    // Process every complete frame currently in the buffer.
-    while ((idx = this._nextDelimiter()) !== -1) {
-      const block = this.buffer.slice(0, idx.start);
-      this.buffer = this.buffer.slice(idx.end);
-      if (block.length > 0) blocks.push(block);
-    }
-    return blocks;
-  }
-
-  /**
-   * On stream end, return whatever complete-looking content remains. A
-   * well-behaved backend always terminates the last frame with a blank line,
-   * so this is usually empty — but if the connection closed mid-trailing-frame
-   * with a parseable block, surface it rather than dropping data.
-   * @returns {string[]}
-   */
-  flush() {
-    const rest = this.buffer.trim();
-    this.buffer = '';
-    return rest ? [rest] : [];
-  }
-
-  _nextDelimiter() {
-    // Find the earliest of "\n\n", "\r\n\r\n", "\r\r".
-    const candidates = [
-      { token: '\r\n\r\n', i: this.buffer.indexOf('\r\n\r\n') },
-      { token: '\n\n', i: this.buffer.indexOf('\n\n') },
-      { token: '\r\r', i: this.buffer.indexOf('\r\r') },
-    ].filter((c) => c.i !== -1);
-    if (candidates.length === 0) return -1;
-    candidates.sort((a, b) => a.i - b.i);
-    const { token, i } = candidates[0];
-    return { start: i, end: i + token.length };
-  }
-}
-
-/**
- * Read an SSE Response body to completion, invoking onEvent for every parsed
- * JSON event object. Resolves when the stream ends. Throws if the response is
- * not ok or has no readable body (caller shows the error inline).
- *
- * @param {Response} response  a fetch() Response with a streaming body
- * @param {(event: object) => void} onEvent  called per parsed JSON event
- */
-export async function readEventStream(response, onEvent) {
-  if (!response.ok) {
-    throw new Error(`server returned ${response.status} ${response.statusText}`);
-  }
-  if (!response.body) {
-    throw new Error('response has no readable body (streaming unsupported)');
-  }
-
-  const reader = response.body.getReader();
-  const decoder = new TextDecoder();
-  const splitter = new SSEFrameSplitter();
-
-  const handleBlock = (block) => {
-    const payload = dataFromEventBlock(block);
-    if (payload == null || payload.trim() === '') return;
-    let obj;
-    try {
-      obj = JSON.parse(payload);
-    } catch {
-      // A malformed frame must not abort an in-progress recovery stream;
-      // skip it and keep reading.
-      return;
-    }
-    onEvent(obj);
-  };
-
-  try {
-    for (;;) {
-      const { value, done } = await reader.read();
-      if (done) break;
-      const text = decoder.decode(value, { stream: true });
-      for (const block of splitter.push(text)) handleBlock(block);
-    }
-  } finally {
-    reader.releaseLock?.();
-  }
-  // Drain any trailing bytes the decoder held, then any final frame.
-  const tail = decoder.decode();
-  if (tail) {
-    for (const block of splitter.push(tail)) handleBlock(block);
-  }
-  for (const block of splitter.flush()) handleBlock(block);
-}
--- a/frontend/src/lib/sse.test.mjs
+++ b/frontend/src/lib/sse.test.mjs
@ -1,152 +0,0 @@
-// Standalone test of the SSE frame parser — no test framework, just node.
-// Run: node src/lib/sse.test.mjs   (exits non-zero on any failure)
-//
-// These pin the protocol described in the API contract: frames are
-// `data: {json}\n\n`, the event `kind` is one of session/text/tool/result/
-// error/done, and bytes arrive at arbitrary boundaries via getReader().
-import { SSEFrameSplitter, dataFromEventBlock, readEventStream } from './sse.js';
-
-let failures = 0;
-function ok(name, cond) {
-  if (cond) {
-    console.log(`  ok  ${name}`);
-  } else {
-    failures++;
-    console.error(`FAIL  ${name}`);
-  }
-}
-function eq(name, got, want) {
-  const g = JSON.stringify(got);
-  const w = JSON.stringify(want);
-  ok(`${name}  (got ${g})`, g === w);
-}
-
-// --- dataFromEventBlock ---------------------------------------------------
-eq(
-  'extracts JSON payload from a data: line',
-  dataFromEventBlock('data: {"kind":"text","text":"hi"}'),
-  '{"kind":"text","text":"hi"}'
-);
-eq(
-  'strips exactly one space after the colon',
-  dataFromEventBlock('data:  leading-space-kept'),
-  ' leading-space-kept'
-);
-eq('ignores comment/heartbeat lines', dataFromEventBlock(': keep-alive'), null);
-eq(
-  'joins multi-line data fields with newline',
-  dataFromEventBlock('data: line1\ndata: line2'),
-  'line1\nline2'
-);
-
-// --- SSEFrameSplitter: whole frames --------------------------------------
-{
-  const s = new SSEFrameSplitter();
-  const blocks = s.push('data: {"kind":"session","session_id":"abc"}\n\n');
-  eq('one complete frame yields one block', blocks, [
-    'data: {"kind":"session","session_id":"abc"}',
-  ]);
-}
-
-// --- SSEFrameSplitter: multiple frames in one chunk ----------------------
-{
-  const s = new SSEFrameSplitter();
-  const blocks = s.push(
-    'data: {"kind":"text","text":"a"}\n\ndata: {"kind":"text","text":"b"}\n\n'
-  );
-  eq('two frames in one chunk yield two blocks', blocks.length, 2);
-  eq('first block', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"a"}');
-  eq('second block', dataFromEventBlock(blocks[1]), '{"kind":"text","text":"b"}');
-}
-
-// --- SSEFrameSplitter: frame split across chunks -------------------------
-{
-  const s = new SSEFrameSplitter();
-  let blocks = s.push('data: {"kind":"te');
-  eq('partial frame yields nothing yet', blocks, []);
-  blocks = s.push('xt","text":"split"}\n\n');
-  eq('completing the frame yields it whole', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"split"}');
-}
-
-// --- SSEFrameSplitter: delimiter split across chunks ---------------------
-{
-  const s = new SSEFrameSplitter();
-  let blocks = s.push('data: {"kind":"done"}\n');
-  eq('frame held while delimiter incomplete', blocks, []);
-  blocks = s.push('\n');
-  eq('frame released once blank line completes', dataFromEventBlock(blocks[0]), '{"kind":"done"}');
-}
-
-// --- SSEFrameSplitter: CRLF delimiters -----------------------------------
-{
-  const s = new SSEFrameSplitter();
-  const blocks = s.push('data: {"kind":"text","text":"crlf"}\r\n\r\n');
-  eq('CRLF-delimited frame parses', dataFromEventBlock(blocks[0]), '{"kind":"text","text":"crlf"}');
-}
-
-// --- end-to-end via readEventStream over a mock streaming Response --------
-function mockResponse(chunks) {
-  const enc = new TextEncoder();
-  let i = 0;
-  return {
-    ok: true,
-    status: 200,
-    body: {
-      getReader() {
-        return {
-          read() {
-            if (i < chunks.length) {
-              return Promise.resolve({ value: enc.encode(chunks[i++]), done: false });
-            }
-            return Promise.resolve({ value: undefined, done: true });
-          },
-          releaseLock() {},
-        };
-      },
-    },
-  };
-}
-
-await (async () => {
-  // A realistic turn, deliberately chopped at ugly boundaries:
-  //  - the session frame split mid-JSON
-  //  - two text frames glued together
-  //  - a tool frame
-  //  - a result frame and the terminal done frame in one chunk
-  const chunks = [
-    'data: {"kind":"sess',
-    'ion","session_id":"S1"}\n\n',
-    'data: {"kind":"text","text":"checking "}\n\ndata: {"kind":"text","text":"disk"}\n\n',
-    'data: {"kind":"tool","name":"Bash","input":{"command":"df -h"}}\n\n',
-    'data: {"kind":"result","is_error":false,"result":"ok","duration_ms":12}\n\ndata: {"kind":"done"}\n\n',
-  ];
-  const events = [];
-  await readEventStream(mockResponse(chunks), (e) => events.push(e));
-
-  eq('event count', events.length, 6);
-  eq('1: session id', events[0], { kind: 'session', session_id: 'S1' });
-  eq('2: first text', events[1], { kind: 'text', text: 'checking ' });
-  eq('3: second text', events[2], { kind: 'text', text: 'disk' });
-  eq('4: tool kind+name', { kind: events[3].kind, name: events[3].name }, { kind: 'tool', name: 'Bash' });
-  eq('4: tool command', events[3].input.command, 'df -h');
-  eq('5: result', events[4], { kind: 'result', is_error: false, result: 'ok', duration_ms: 12 });
-  eq('6: done terminal', events[5], { kind: 'done' });
-})();
-
-// malformed frame in the middle must be skipped, not abort the stream
-await (async () => {
-  const chunks = [
-    'data: {"kind":"text","text":"before"}\n\n',
-    'data: {this is not json}\n\n',
-    'data: {"kind":"done"}\n\n',
-  ];
-  const events = [];
-  await readEventStream(mockResponse(chunks), (e) => events.push(e));
-  eq('malformed frame skipped, stream continues', events.map((e) => e.kind), ['text', 'done']);
-})();
-
-if (failures) {
-  console.error(`\n${failures} assertion(s) FAILED`);
-  process.exit(1);
-}
-console.log('\nall SSE parser assertions passed');
--- a/frontend/src/lib/transcript.js
+++ b/frontend/src/lib/transcript.js
@ -0,0 +1,196 @@
+// transcript.js — the load-bearing core of the breakglass UI.
+//
+// The attach stream (EventSource) replays the conversation-so-far and then
+// tails live. Replayed events are byte-identical to live ones, and on a
+// reconnect the server re-replays from Last-Event-ID — so the SAME event id can
+// arrive more than once. This module folds a flat, possibly-duplicated event
+// sequence into an ordered list of render-ready messages, idempotently.
+//
+// Contract (every default `message` event's .data is one of these JSON shapes):
+//   {kind:"user",      text, id}            → opens a USER bubble
+//   {kind:"session",   session_id, id}      → informational (agent's session id)
+//   {kind:"text",      text, id}            → assistant prose; concatenated
+//   {kind:"tool",      name, input, id}     → inline tool chip (Bash → command)
+//   {kind:"result",    is_error, result, duration_ms, id} → closes the bubble
+//   {kind:"error",     error, id}           → error note on the bubble
+//   {kind:"cancelled", id}                  → muted "stopped" note
+//   {kind:"turn_end",  id}                  → the turn finished
+//
+// Grouping: a `user` event opens a user message; the session/text/tool events
+// that follow build ONE assistant message; result/error/cancelled annotate it;
+// turn_end ends it. Assistant events with no preceding user (e.g. a session
+// banner on a fresh attach) still get an assistant message so nothing is lost.
+//
+// Idempotency: every event carries a monotonic integer-ish id. We track the
+// max id folded so far and DROP any event whose id we've already passed — a
+// reconnect replay therefore never double-renders. Ids are compared
+// numerically when both parse as numbers, else as strings (defensive).
+
+/** @typedef {{type:'text',text:string}|{type:'tool',name:string,command:string,raw:any}} Part */
+/**
+ * @typedef {Object} Message
+ * @property {'user'|'assistant'} role
+ * @property {string} key                 stable key for keyed {#each}
+ * @property {string} [text]              user text
+ * @property {Part[]} [parts]             assistant parts, in emit order
+ * @property {{is_error:boolean,text:string,duration_ms:number|null}} [result]
+ * @property {string} [error]
+ * @property {boolean} [cancelled]
+ * @property {boolean} [ended]            turn_end seen for this message
+ */
+
+/** Compare two ids; numeric when both look numeric, else lexicographic. */
+export function idGreater(a, b) {
+  const na = Number(a);
+  const nb = Number(b);
+  if (Number.isFinite(na) && Number.isFinite(nb) && `${a}`.trim() !== '' && `${b}`.trim() !== '') {
+    return na > nb;
+  }
+  return String(a) > String(b);
+}
+
+/**
+ * Create an empty transcript-folding state.
+ * @returns {{messages: Message[], maxId: any, sawId: boolean, openAssistant: Message|null, activeUserSeen: boolean}}
+ */
+export function createTranscript() {
+  return {
+    messages: [],
+    maxId: null,
+    sawId: false,
+    openAssistant: null,
+    // a turn is "active" once a user event (or local prompt) has no following
+    // turn_end; the UI reads it from this flag (reduceEvent returns `changed`).
+    activeUserSeen: false,
+  };
+}
+
+function bubbleKey(prefix, id, fallbackIndex) {
+  if (id != null && `${id}`.trim() !== '') return `${prefix}:${id}`;
+  return `${prefix}:idx:${fallbackIndex}`;
+}
+
+/**
+ * Should this event be applied, given the max id folded so far? Returns the
+ * decision and the new max without mutating anything. Events WITHOUT an id are
+ * always applied (and don't move the watermark): the protocol always carries
+ * ids, but we never drop data on a malformed frame.
+ * @returns {{apply:boolean, maxId:any}}
+ */
+export function admit(maxId, id) {
+  if (id == null || `${id}`.trim() === '') return { apply: true, maxId };
+  if (maxId == null) return { apply: true, maxId: id };
+  if (idGreater(id, maxId)) return { apply: true, maxId: id };
+  return { apply: false, maxId }; // already seen — dedupe
+}
+
+/**
+ * Fold one event into the transcript state, mutating `state` in place.
+ * Returns true if the state changed (so callers can trigger a re-render).
+ *
+ * @param {ReturnType<typeof createTranscript>} state
+ * @param {any} ev parsed event object ({kind, id, ...})
+ * @returns {boolean} changed
+ */
+export function reduceEvent(state, ev) {
+  if (!ev || typeof ev !== 'object') return false;
+  const { apply, maxId } = admit(state.maxId, ev.id);
+  state.maxId = maxId;
+  if (!apply) return false;
+  if (ev.id != null && `${ev.id}`.trim() !== '') state.sawId = true;
+
+  const ensureAssistant = () => {
+    if (!state.openAssistant) {
+      const msg = {
+        role: 'assistant',
+        key: bubbleKey('a', ev.id, state.messages.length),
+        parts: [],
+        ended: false,
+      };
+      state.messages.push(msg);
+      state.openAssistant = msg;
+    }
+    return state.openAssistant;
+  };
+
+  switch (ev.kind) {
+    case 'user': {
+      // A new user turn. Close any dangling assistant bubble first.
+      state.openAssistant = null;
+      state.messages.push({
+        role: 'user',
+        key: bubbleKey('u', ev.id, state.messages.length),
+        text: typeof ev.text === 'string' ? ev.text : '',
+      });
+      state.activeUserSeen = true;
+      return true;
+    }
+    case 'session': {
+      // Informational — does not itself render a part, but it does open the
+      // assistant bubble for the turn so subsequent text lands in one place.
+      ensureAssistant();
+      return true;
+    }
+    case 'text': {
+      if (typeof ev.text !== 'string' || ev.text === '') return false;
+      const msg = ensureAssistant();
+      const tail = msg.parts[msg.parts.length - 1];
+      if (tail && tail.type === 'text') {
+        tail.text += ev.text; // concatenate consecutive prose
+      } else {
+        msg.parts.push({ type: 'text', text: ev.text });
+      }
+      return true;
+    }
+    case 'tool': {
+      const msg = ensureAssistant();
+      const command =
+        ev.input && typeof ev.input.command === 'string' ? ev.input.command : '';
+      msg.parts.push({
+        type: 'tool',
+        name: typeof ev.name === 'string' && ev.name ? ev.name : 'tool',
+        command,
+        raw: ev.input ?? null,
+      });
+      return true;
+    }
+    case 'result': {
+      const msg = ensureAssistant();
+      msg.result = {
+        is_error: Boolean(ev.is_error),
+        text: typeof ev.result === 'string' ? ev.result : '',
+        duration_ms: typeof ev.duration_ms === 'number' ? ev.duration_ms : null,
+      };
+      return true;
+    }
+    case 'error': {
+      const msg = ensureAssistant();
+      msg.error = typeof ev.error === 'string' && ev.error ? ev.error : 'unknown error';
+      return true;
+    }
+    case 'cancelled': {
+      const msg = ensureAssistant();
+      msg.cancelled = true;
+      return true;
+    }
+    case 'turn_end': {
+      if (state.openAssistant) state.openAssistant.ended = true;
+      state.openAssistant = null;
+      state.activeUserSeen = false;
+      return true;
+    }
+    default:
+      return false;
+  }
+}
+
+/**
+ * Convenience: fold an array of events into a fresh transcript (used by tests
+ * and by a from-scratch render). Returns the final state.
+ * @param {any[]} events
+ */
+export function foldAll(events) {
+  const state = createTranscript();
+  for (const ev of events) reduceEvent(state, ev);
+  return state;
+}
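The watermark rule that `admit` and `reduceEvent` apply can be seen in isolation. The following is a condensed sketch, not the module's code: `foldText` is a hypothetical helper that assumes every event carries a numeric id and only accumulates `text` events. It folds a stream that has been replayed from the top, as an EventSource reconnect might deliver it, to show that the prose is not doubled.

```javascript
// Hypothetical, simplified illustration of the id watermark used for dedupe.
// Assumes numeric ids on every event; the real module also handles id-less
// events and non-numeric ids.
function foldText(events) {
  let maxId = null; // highest id folded so far
  let text = '';
  for (const ev of events) {
    if (maxId !== null && ev.id <= maxId) continue; // already seen: drop
    maxId = ev.id;
    if (ev.kind === 'text') text += ev.text;
  }
  return { text, maxId };
}

const turn = [
  { kind: 'user', text: 'go', id: 1 },
  { kind: 'text', text: 'part-A ', id: 2 },
  { kind: 'text', text: 'part-B', id: 3 },
];

// A reconnect that re-replays from the top delivers every event twice.
const replayed = foldText([...turn, ...turn]);
console.log(replayed.text);
```

Because the second copy of each event has an id at or below the watermark, folding the doubled stream gives the same result as folding the turn once. The rule relies on ids arriving in increasing order within a connection.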
--- a/frontend/src/lib/transcript.test.mjs
+++ b/frontend/src/lib/transcript.test.mjs
@ -0,0 +1,162 @@
+// Standalone test of the transcript folder — no test framework, just node.
+// Run: node src/lib/transcript.test.mjs   (exits non-zero on any failure)
+//
+// These pin the attach-model contract: events carry monotonic ids, a reconnect
+// re-replays already-seen ids (which MUST be deduped), and events group into
+// user/assistant messages with consecutive prose concatenated.
+import {
+  admit,
+  idGreater,
+  reduceEvent,
+  createTranscript,
+  foldAll,
+} from './transcript.js';
+
+let failures = 0;
+function ok(name, cond) {
+  if (cond) {
+    console.log(`  ok  ${name}`);
+  } else {
+    failures++;
+    console.error(`FAIL  ${name}`);
+  }
+}
+function eq(name, got, want) {
+  const g = JSON.stringify(got);
+  const w = JSON.stringify(want);
+  ok(`${name}  (got ${g})`, g === w);
+}
+
+// --- id comparison --------------------------------------------------------
+ok('idGreater numeric', idGreater(10, 9) === true);
+ok('idGreater numeric not', idGreater(2, 10) === false); // not string "2" > "10"
+ok('idGreater string fallback', idGreater('b', 'a') === true);
+
+// --- admit / dedupe watermark --------------------------------------------
+{
+  let { apply, maxId } = admit(null, 1);
+  eq('first id admitted', { apply, maxId }, { apply: true, maxId: 1 });
+  ({ apply, maxId } = admit(5, 5));
+  ok('equal id rejected (already seen)', apply === false && maxId === 5);
+  ({ apply, maxId } = admit(5, 3));
+  ok('lower id rejected', apply === false && maxId === 5);
+  ({ apply, maxId } = admit(5, 6));
+  ok('higher id admitted, watermark moves', apply === true && maxId === 6);
+  ({ apply, maxId } = admit(5, undefined));
+  ok('id-less event always admitted, watermark held', apply === true && maxId === 5);
+}
+
+// --- a full turn groups into user + one assistant bubble ------------------
+{
+  const events = [
+    { kind: 'user', text: 'triage it', id: 1 },
+    { kind: 'session', session_id: 'S1', id: 2 },
+    { kind: 'text', text: 'Checking ', id: 3 },
+    { kind: 'text', text: 'disk usage.', id: 4 },
+    { kind: 'tool', name: 'Bash', input: { command: 'df -h' }, id: 5 },
+    { kind: 'result', is_error: false, result: 'ok', duration_ms: 1200, id: 6 },
+    { kind: 'turn_end', id: 7 },
+  ];
+  const s = foldAll(events);
+  eq('two messages: user + assistant', s.messages.length, 2);
+  eq('first is user with text', { r: s.messages[0].role, t: s.messages[0].text }, { r: 'user', t: 'triage it' });
+  const a = s.messages[1];
+  eq('assistant role', a.role, 'assistant');
+  // consecutive text concatenated into ONE part; tool is a separate part
+  eq('parts: one concatenated text + one tool', a.parts.map((p) => p.type), ['text', 'tool']);
+  eq('prose concatenated in order', a.parts[0].text, 'Checking disk usage.');
+  eq('tool command captured', a.parts[1].command, 'df -h');
+  eq('result attached', { e: a.result.is_error, ms: a.result.duration_ms }, { e: false, ms: 1200 });
+  ok('turn ended', a.ended === true);
+  ok('no longer active after turn_end', s.activeUserSeen === false);
+}
+
+// --- reconnect replay: re-feeding the SAME events must NOT double-render --
+{
+  const events = [
+    { kind: 'user', text: 'hi', id: 1 },
+    { kind: 'text', text: 'hello', id: 2 },
+    { kind: 'turn_end', id: 3 },
+  ];
+  const s = createTranscript();
+  for (const e of events) reduceEvent(s, e);
+  // simulate an EventSource reconnect that re-replays everything from the top
+  for (const e of events) reduceEvent(s, e);
+  eq('still exactly two messages after replay', s.messages.length, 2);
+  eq('assistant prose not doubled', s.messages[1].parts[0].text, 'hello');
+}
+
+// --- a partial replay (Last-Event-ID resume) continues the same bubble ----
+{
+  const s = createTranscript();
+  reduceEvent(s, { kind: 'user', text: 'go', id: 1 });
+  reduceEvent(s, { kind: 'text', text: 'part-A ', id: 2 });
+  // reconnect: server resumes after id 2; we must drop id<=2 if re-sent and
+  // keep appending to the open assistant bubble.
+  reduceEvent(s, { kind: 'text', text: 'part-A ', id: 2 }); // dup, dropped
+  reduceEvent(s, { kind: 'text', text: 'part-B', id: 3 }); // new, appended
+  reduceEvent(s, { kind: 'turn_end', id: 4 });
+  eq('resume appended to same bubble', s.messages[1].parts[0].text, 'part-A part-B');
+  eq('still two messages', s.messages.length, 2);
+}
+
+// --- error / cancelled annotate the open bubble ---------------------------
+{
+  const s = foldAll([
+    { kind: 'user', text: 'x', id: 1 },
+    { kind: 'text', text: 'working', id: 2 },
+    { kind: 'error', error: 'ssh timeout', id: 3 },
+    { kind: 'turn_end', id: 4 },
+  ]);
+  eq('error note on assistant bubble', s.messages[1].error, 'ssh timeout');
+}
+{
+  const s = foldAll([
+    { kind: 'user', text: 'x', id: 1 },
+    { kind: 'cancelled', id: 2 },
+    { kind: 'turn_end', id: 3 },
+  ]);
+  ok('cancelled flag on assistant bubble', s.messages[1].cancelled === true);
+}
+
+// --- active state: a user event with no turn_end means a turn is running ---
+{
+  const s = createTranscript();
+  reduceEvent(s, { kind: 'user', text: 'go', id: 1 });
+  reduceEvent(s, { kind: 'text', text: '...', id: 2 });
+  ok('active while no turn_end', s.activeUserSeen === true);
+  reduceEvent(s, { kind: 'turn_end', id: 3 });
+  ok('inactive after turn_end', s.activeUserSeen === false);
+}
+
+// --- assistant-only stream (session banner on a fresh attach) still renders -
+{
+  const s = foldAll([
+    { kind: 'session', session_id: 'S1', id: 1 },
+    { kind: 'text', text: 'standing by', id: 2 },
+    { kind: 'turn_end', id: 3 },
+  ]);
+  eq('lone assistant message created', s.messages.length, 1);
+  eq('assistant prose present', s.messages[0].parts[0].text, 'standing by');
+}
+
+// --- two sequential turns produce two assistant bubbles -------------------
+{
+  const s = foldAll([
+    { kind: 'user', text: 'q1', id: 1 },
+    { kind: 'text', text: 'a1', id: 2 },
+    { kind: 'turn_end', id: 3 },
+    { kind: 'user', text: 'q2', id: 4 },
+    { kind: 'text', text: 'a2', id: 5 },
+    { kind: 'turn_end', id: 6 },
+  ]);
+  eq('four messages (u,a,u,a)', s.messages.map((m) => m.role), ['user', 'assistant', 'user', 'assistant']);
+  eq('second answer in its own bubble', s.messages[3].parts[0].text, 'a2');
+  ok('message keys are unique', new Set(s.messages.map((m) => m.key)).size === 4);
+}
+
+if (failures) {
+  console.error(`\n${failures} assertion(s) FAILED`);
+  process.exit(1);
+}
+console.log('\nall transcript assertions passed');
--- a/tests/conftest.py
+++ b/tests/conftest.py
@ -43,3 +43,186 @@ def drain():
                break
            await asyncio.sleep(0.01)
    return _drain
+
+
+# --------------------------------------------------------------------------- #
+# AFK loop fixtures.
+#
+# Shared factories + in-memory fakes for the app.afk modules. EVERYTHING the AFK
+# tests touch is faked here — no test ever reaches a real T3 server, GitHub /
+# Forgejo, or the cluster. The fakes implement the module interfaces from the
+# contract and record their calls so tests can assert on them.
+# --------------------------------------------------------------------------- #
+from app.afk.types import (  # noqa: E402  (after the env setup above, like app_main)
+    CIStatus,
+    Config,
+    Issue,
+    RunState,
+    ThreadStatus,
+)
+
+
+@pytest.fixture
+def make_issue():
+    """Factory for ``Issue``. Defaults to a clean, dispatchable issue (trusted
+    label, nothing blocking); override any field per test."""
+    def _make(
+        number: int = 1,
+        repo: str = "infra",
+        labels: list[str] | None = None,
+        blocked_by: list[int] | None = None,
+        labeled_by_trusted: bool = True,
+        priority: int = 0,
+    ) -> Issue:
+        return Issue(
+            number=number,
+            repo=repo,
+            labels=["ready-for-agent"] if labels is None else labels,
+            blocked_by=[] if blocked_by is None else blocked_by,
+            labeled_by_trusted=labeled_by_trusted,
+            priority=priority,
+        )
+    return _make
+
+
+@pytest.fixture
+def make_config():
+    """Factory for ``Config``. Defaults to an ENABLED config (kill switch off,
+    a one-repo allowlist) so policy/state-machine tests exercise real behaviour;
+    the disabled production default is covered separately in the config tests."""
+    def _make(
+        allowlist: list[str] | None = None,
+        kill_switch: bool = False,
+        **overrides,
+    ) -> Config:
+        return Config(
+            allowlist=["infra"] if allowlist is None else allowlist,
+            kill_switch=kill_switch,
+            **overrides,
+        )
+    return _make
+
+
+@pytest.fixture
+def make_run_state():
+    """Factory for ``RunState``. Defaults to a freshly-dispatched run (thread
+    running, nothing pushed, no CI, no fix-forward attempts yet)."""
+    def _make(
+        thread_status: ThreadStatus | None = ThreadStatus.RUNNING,
+        ci_status: CIStatus | None = None,
+        pushed: bool = False,
+        fix_forward_attempts: int = 0,
+        elapsed_seconds: float = 0.0,
+    ) -> RunState:
+        return RunState(
+            thread_status=thread_status,
+            ci_status=ci_status,
+            pushed=pushed,
+            fix_forward_attempts=fix_forward_attempts,
+            elapsed_seconds=elapsed_seconds,
+        )
+    return _make
+
+
+class FakeT3Client:
+    """In-memory stand-in for ``t3_client.T3Client``. Records each dispatch and
+    hands back a deterministic thread id; ``snapshot`` returns whatever was
+    staged via ``set_snapshot``."""
+
+    def __init__(self) -> None:
+        self.dispatched: list[dict] = []
+        self._snapshot: dict = {"threads": []}
+        self._next_id = 0
+
+    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
+        thread_id = f"thread-{self._next_id}"
+        self._next_id += 1
+        self.dispatched.append(
+            {"repo": repo, "issue": issue, "prompt": prompt, "thread_id": thread_id}
+        )
+        return thread_id
+
+    def snapshot(self) -> dict:
+        return self._snapshot
+
+    def set_snapshot(self, snapshot: dict) -> None:
+        self._snapshot = snapshot
+
+
+class FakeTracker:
+    """In-memory stand-in for ``tracker.Tracker``. ``list_ready`` returns issues
+    staged via ``seed``; label/comment/close just record their calls."""
+
+    def __init__(self) -> None:
+        self._ready: dict[str, list[Issue]] = {}
+        self.label_ops: list[tuple[str, str, int, str]] = []  # (op, repo, issue, label)
+        self.comments: list[tuple[str, int, str]] = []
+        self.closed: list[tuple[str, int]] = []
+
+    def seed(self, repo: str, issues: list[Issue]) -> None:
+        self._ready[repo] = issues
+
+    def list_ready(self, repos: list[str]) -> list[Issue]:
+        out: list[Issue] = []
+        for repo in repos:
+            out.extend(self._ready.get(repo, []))
+        return out
+
+    def add_label(self, repo: str, issue: int, label: str) -> None:
+        self.label_ops.append(("add", repo, issue, label))
+
+    def remove_label(self, repo: str, issue: int, label: str) -> None:
+        self.label_ops.append(("remove", repo, issue, label))
+
+    def comment(self, repo: str, issue: int, body: str) -> None:
+        self.comments.append((repo, issue, body))
+
+    def close(self, repo: str, issue: int) -> None:
+        self.closed.append((repo, issue))
+
+
+class FakeCIWatcher:
+    """In-memory stand-in for ``ci_watcher.CIWatcher``. Returns the status staged
+    per ``(repo, commit)`` via ``set_status``; unknown commits read PENDING."""
+
+    def __init__(self) -> None:
+        self._statuses: dict[tuple[str, str], CIStatus] = {}
+
+    def set_status(self, repo: str, commit: str, status: CIStatus) -> None:
+        self._statuses[(repo, commit)] = status
+
+    def status(self, repo: str, commit: str) -> CIStatus:
+        return self._statuses.get((repo, commit), CIStatus.PENDING)
+
+
+class FakeNotifier:
+    """In-memory stand-in for ``notifier.Notifier``. Records every notification
+    so tests can assert escalations fired with the right kind/detail."""
+
+    def __init__(self) -> None:
+        self.sent: list[dict] = []
+
+    def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None:
+        self.sent.append(
+            {"kind": kind, "issue": issue, "thread_id": thread_id, "detail": detail}
+        )
+
+
+@pytest.fixture
+def fake_t3() -> FakeT3Client:
+    return FakeT3Client()
+
+
+@pytest.fixture
+def fake_tracker() -> FakeTracker:
+    return FakeTracker()
+
+
+@pytest.fixture
+def fake_ci() -> FakeCIWatcher:
+    return FakeCIWatcher()
+
+
+@pytest.fixture
+def fake_notifier() -> FakeNotifier:
+    return FakeNotifier()
--- a/tests/test_afk_ci_watcher.py
+++ b/tests/test_afk_ci_watcher.py
@ -0,0 +1,285 @@
+"""Tests for ``app.afk.ci_watcher`` — the commit → ``CIStatus`` adapter.
+
+The watcher folds two independent signals into one verdict the state machine
+reads: the **GHA run** for a pushed commit (build/test/lint) and the
+**deploy/rollout** that reaches the cluster (Woodpecker pipeline → Keel/k8s
+rollout). The CI/CD chain is GHA → ghcr → Woodpecker → Keel
+(``docs/2026-06-14-afk-implementation-pipeline-design.md``), so a commit is only
+truly GREEN once *both* the build passed AND its image actually rolled out.
+
+Every test injects FAKE clients — no test ever shells out to ``gh``,
+``woodpecker``, or ``kubectl``, or reaches the network. The fakes implement the
+``ci_watcher`` client Protocols and return staged ``StageResult`` values per
+``(repo, commit)``; the watcher's only job is to query them and fold the result,
+so the folding table is what these tests pin.
+"""
+import pytest
+
+from app.afk.ci_watcher import (
+    CIWatcher,
+    StageResult,
+)
+from app.afk.types import CIStatus
+
+
+# --------------------------------------------------------------------------- #
+# Fakes for the three injected clients.
+#
+# Each maps (repo, commit) → StageResult and records every query, so tests can
+# assert both the folded verdict AND that short-circuiting skips later stages
+# (a RED build must not even ask the rollout client).
+# --------------------------------------------------------------------------- #
+class _FakeStageClient:
+    """A recording stand-in for any of the three stage clients. ``default`` is
+    returned for an unstaged ``(repo, commit)`` — defaults to ``PENDING`` so an
+    un-seeded stage reads "not done yet", never a false GREEN."""
+
+    def __init__(self, default: StageResult = StageResult.PENDING) -> None:
+        self._results: dict[tuple[str, str], StageResult] = {}
+        self._default = default
+        self.queries: list[tuple[str, str]] = []
+
+    def set(self, repo: str, commit: str, result: StageResult) -> None:
+        self._results[(repo, commit)] = result
+
+    def _lookup(self, repo: str, commit: str) -> StageResult:
+        self.queries.append((repo, commit))
+        return self._results.get((repo, commit), self._default)
+
+
+class FakeGitHubChecks(_FakeStageClient):
+    def run_conclusion(self, repo: str, commit: str) -> StageResult:
+        return self._lookup(repo, commit)
+
+
+class FakeWoodpecker(_FakeStageClient):
+    def deploy_conclusion(self, repo: str, commit: str) -> StageResult:
+        return self._lookup(repo, commit)
+
+
+class FakeRollout(_FakeStageClient):
+    def rollout_status(self, repo: str, commit: str) -> StageResult:
+        return self._lookup(repo, commit)
+
+
+# --------------------------------------------------------------------------- #
+# Fixtures.
+# --------------------------------------------------------------------------- #
+REPO = "infra"
+COMMIT = "deadbeefcafe"
+
+
+@pytest.fixture
+def gha() -> FakeGitHubChecks:
+    return FakeGitHubChecks()
+
+
+@pytest.fixture
+def woodpecker() -> FakeWoodpecker:
+    return FakeWoodpecker()
+
+
+@pytest.fixture
+def rollout() -> FakeRollout:
+    return FakeRollout()
+
+
+@pytest.fixture
+def watcher(gha, woodpecker, rollout) -> CIWatcher:
+    return CIWatcher(github=gha, woodpecker=woodpecker, rollout=rollout)
+
+
+def _stage_all(gha, woodpecker, rollout, *, build, deploy, roll) -> None:
+    """Stage all three clients for the canonical ``(REPO, COMMIT)`` at once."""
+    gha.set(REPO, COMMIT, build)
+    woodpecker.set(REPO, COMMIT, deploy)
+    rollout.set(REPO, COMMIT, roll)
+
+
+# --------------------------------------------------------------------------- #
+# StageResult vocabulary.
+# --------------------------------------------------------------------------- #
+def test_stageresult_has_the_four_outcomes():
+    assert {s.name for s in StageResult} == {"NONE", "PENDING", "SUCCESS", "FAILURE"}
+
+
+# --------------------------------------------------------------------------- #
+# The happy path: every stage green ⇒ GREEN.
+# --------------------------------------------------------------------------- #
+def test_all_stages_success_is_green(watcher, gha, woodpecker, rollout):
+    _stage_all(gha, woodpecker, rollout,
+               build=StageResult.SUCCESS,
+               deploy=StageResult.SUCCESS,
+               roll=StageResult.SUCCESS)
+    assert watcher.status(REPO, COMMIT) is CIStatus.GREEN
+
+
+# --------------------------------------------------------------------------- #
+# GHA build stage gates everything below it.
+# --------------------------------------------------------------------------- #
+def test_build_failure_is_red(watcher, gha):
+    gha.set(REPO, COMMIT, StageResult.FAILURE)
+    assert watcher.status(REPO, COMMIT) is CIStatus.RED
+
+
+@pytest.mark.parametrize("build", [StageResult.NONE, StageResult.PENDING])
+def test_build_not_yet_concluded_is_pending(watcher, gha, build):
+    # No run yet (NONE) and in-progress (PENDING) both read PENDING — the state
+    # machine waits on either.
+    gha.set(REPO, COMMIT, build)
+    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
+
+
+def test_build_failure_short_circuits_before_deploy_and_rollout(
+    watcher, gha, woodpecker, rollout
+):
+    gha.set(REPO, COMMIT, StageResult.FAILURE)
+    # Even if later stages would (nonsensically) be green, a red build wins...
+    woodpecker.set(REPO, COMMIT, StageResult.SUCCESS)
+    rollout.set(REPO, COMMIT, StageResult.SUCCESS)
+    assert watcher.status(REPO, COMMIT) is CIStatus.RED
+    # ...and the later clients are never even queried.
+    assert woodpecker.queries == []
+    assert rollout.queries == []
+
+
+def test_build_pending_short_circuits_before_deploy_and_rollout(
+    watcher, gha, woodpecker, rollout
+):
+    gha.set(REPO, COMMIT, StageResult.PENDING)
+    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
+    assert woodpecker.queries == []
+    assert rollout.queries == []
+
+
+# --------------------------------------------------------------------------- #
+# Deploy (Woodpecker) stage — only consulted once the build is green.
+# --------------------------------------------------------------------------- #
+def test_deploy_failure_is_red_even_with_green_build(watcher, gha, woodpecker):
+    gha.set(REPO, COMMIT, StageResult.SUCCESS)
+    woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
+    assert watcher.status(REPO, COMMIT) is CIStatus.RED
+
+
+@pytest.mark.parametrize("deploy", [StageResult.NONE, StageResult.PENDING])
+def test_deploy_not_yet_concluded_is_pending(watcher, gha, woodpecker, deploy):
+    gha.set(REPO, COMMIT, StageResult.SUCCESS)
+    woodpecker.set(REPO, COMMIT, deploy)
+    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
+
+
+def test_deploy_failure_short_circuits_before_rollout(
+    watcher, gha, woodpecker, rollout
+):
+    gha.set(REPO, COMMIT, StageResult.SUCCESS)
+    woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
+    rollout.set(REPO, COMMIT, StageResult.SUCCESS)
+    assert watcher.status(REPO, COMMIT) is CIStatus.RED
+    assert rollout.queries == []
+    # The build WAS consulted (it had to pass to reach deploy).
+    assert gha.queries == [(REPO, COMMIT)]
+
+
+# --------------------------------------------------------------------------- #
+# Rollout stage — the final gate. Green build + green deploy is still only
+# PENDING until the image actually reaches the cluster.
+# --------------------------------------------------------------------------- #
+def test_rollout_failure_is_red(watcher, gha, woodpecker, rollout):
+    _stage_all(gha, woodpecker, rollout,
+               build=StageResult.SUCCESS,
+               deploy=StageResult.SUCCESS,
+               roll=StageResult.FAILURE)
+    assert watcher.status(REPO, COMMIT) is CIStatus.RED
+
+
+@pytest.mark.parametrize("roll", [StageResult.NONE, StageResult.PENDING])
+def test_green_build_and_deploy_but_unfinished_rollout_is_pending(
+    watcher, gha, woodpecker, rollout, roll
+):
+    _stage_all(gha, woodpecker, rollout,
+               build=StageResult.SUCCESS,
+               deploy=StageResult.SUCCESS,
+               roll=roll)
+    assert watcher.status(REPO, COMMIT) is CIStatus.PENDING
+
+
+def test_green_requires_all_three_stages_consulted(
+    watcher, gha, woodpecker, rollout
+):
+    _stage_all(gha, woodpecker, rollout,
+               build=StageResult.SUCCESS,
+               deploy=StageResult.SUCCESS,
+               roll=StageResult.SUCCESS)
+    assert watcher.status(REPO, COMMIT) is CIStatus.GREEN
+    assert gha.queries == [(REPO, COMMIT)]
+    assert woodpecker.queries == [(REPO, COMMIT)]
+    assert rollout.queries == [(REPO, COMMIT)]
+
+
+# --------------------------------------------------------------------------- #
+# Plumbing: the commit and repo are passed through verbatim to every client,
+# and an entirely un-seeded commit reads PENDING (not GREEN, not RED).
+# --------------------------------------------------------------------------- #
+def test_repo_and_commit_passed_through_to_clients(watcher, gha):
+    gha.set("realestate-crawler", "abc123", StageResult.FAILURE)
+    assert watcher.status("realestate-crawler", "abc123") is CIStatus.RED
+    assert gha.queries == [("realestate-crawler", "abc123")]
+
+
+def test_unknown_commit_defaults_to_pending(watcher):
+    # Nothing staged anywhere ⇒ the build stage reads PENDING by default ⇒ the
+    # whole verdict is PENDING. A never-pushed/just-pushed commit is never a
+    # false GREEN.
+    assert watcher.status(REPO, "never-seen") is CIStatus.PENDING
+
+
+# --------------------------------------------------------------------------- #
+# The rollout client is OPTIONAL: per the pilot facts, state.sqlite /
+# kubectl reads are optional, so a CIWatcher built without a rollout client must
+# still work, treating "build green + deploy green" as the terminal GREEN.
+# --------------------------------------------------------------------------- #
+def test_rollout_client_is_optional_deploy_green_is_green(gha, woodpecker):
+    w = CIWatcher(github=gha, woodpecker=woodpecker)  # no rollout client
+    gha.set(REPO, COMMIT, StageResult.SUCCESS)
+    woodpecker.set(REPO, COMMIT, StageResult.SUCCESS)
+    assert w.status(REPO, COMMIT) is CIStatus.GREEN
+
+
+def test_rollout_client_optional_still_honours_build_and_deploy_failures(
+    gha, woodpecker
+):
+    w = CIWatcher(github=gha, woodpecker=woodpecker)
+    gha.set(REPO, COMMIT, StageResult.SUCCESS)
+    woodpecker.set(REPO, COMMIT, StageResult.FAILURE)
+    assert w.status(REPO, COMMIT) is CIStatus.RED
+
+
+# --------------------------------------------------------------------------- #
+# Full folding table, exhaustive over (build, deploy, rollout), so the fold
+# (stages read in order; the first non-SUCCESS stage decides: FAILURE ⇒ red,
+# PENDING/NONE ⇒ pending; all-success ⇒ green) can never silently drift.
+# --------------------------------------------------------------------------- #
+_N, _P, _S, _F = (
+    StageResult.NONE,
+    StageResult.PENDING,
+    StageResult.SUCCESS,
+    StageResult.FAILURE,
+)
+
+
+def _expected(build: StageResult, deploy: StageResult, roll: StageResult) -> CIStatus:
+    # Reference fold, independent of the implementation, evaluated stage by stage.
+    for stage in (build, deploy, roll):
+        if stage is _F:
+            return CIStatus.RED
+        if stage in (_N, _P):
+            return CIStatus.PENDING
+    return CIStatus.GREEN
+
+
+@pytest.mark.parametrize("build", [_N, _P, _S, _F])
+@pytest.mark.parametrize("deploy", [_N, _P, _S, _F])
+@pytest.mark.parametrize("roll", [_N, _P, _S, _F])
+def test_full_folding_table(watcher, gha, woodpecker, rollout, build, deploy, roll):
+    _stage_all(gha, woodpecker, rollout, build=build, deploy=deploy, roll=roll)
+    assert watcher.status(REPO, COMMIT) is _expected(build, deploy, roll)
--- a/tests/test_afk_dispatch_policy.py
+++ b/tests/test_afk_dispatch_policy.py
@ -0,0 +1,374 @@
+"""Tests for ``app.afk.dispatch_policy.select_dispatchable`` — the pure gate that
+turns a pile of ready issues into the ordered set the loop may dispatch *now*.
+
+The function is PURE (no IO), so every test here is a plain in-memory call over
+the fakes/factories in ``conftest`` (``make_issue`` / ``make_config``); nothing
+touches a real T3 server, tracker, or cluster. The suite walks the full
+dispatchability matrix — trust gate, allowlist, per-repo lock, blocked_by,
+kill switch — plus the priority ordering and the one-agent-per-repo invariant.
+
+Ordering contract under test: **lower ``priority`` value first** (P0 before P1
+before P2 — most urgent wins), matching tracker conventions and
+``Issue.priority``'s own docstring, with a deterministic tiebreaker (ascending
+issue number) so the output is stable regardless of input order.
+"""
+import itertools
+
+import pytest
+
+from app.afk import dispatch_policy
+from app.afk.types import DispatchDecision, Issue
+
+
+# --------------------------------------------------------------------------- #
+# Helpers — keep assertions terse and intent-revealing.
+# --------------------------------------------------------------------------- #
+def _selected_numbers(decisions: list[DispatchDecision]) -> list[int]:
+    """The issue numbers, in the order the policy returned them."""
+    return [d.issue.number for d in decisions]
+
+
+def _selected_set(decisions: list[DispatchDecision]) -> set[int]:
+    return {d.issue.number for d in decisions}
+
+
+# --------------------------------------------------------------------------- #
+# Return shape & purity.
+# --------------------------------------------------------------------------- #
+def test_returns_list_of_dispatch_decisions(make_issue, make_config):
+    issue = make_issue(number=7, repo="infra")
+    decisions = dispatch_policy.select_dispatchable([issue], make_config(), set())
+    assert isinstance(decisions, list)
+    assert len(decisions) == 1
+    assert isinstance(decisions[0], DispatchDecision)
+    assert decisions[0].issue is issue
+    assert isinstance(decisions[0].reason, str) and decisions[0].reason  # non-empty
+
+
+def test_empty_input_yields_empty_output(make_config):
+    assert dispatch_policy.select_dispatchable([], make_config(), set()) == []
+
+
+def test_does_not_mutate_inputs(make_issue, make_config):
+    issues = [make_issue(number=1, priority=0), make_issue(number=2, priority=9)]
+    issues_snapshot = list(issues)
+    config = make_config(allowlist=["infra"])
+    in_flight: set[str] = set()
+
+    dispatch_policy.select_dispatchable(issues, config, in_flight)
+
+    # Caller's list (and its order) and the lock set are left untouched.
+    assert issues == issues_snapshot
+    assert [i.number for i in issues] == [1, 2]
+    assert in_flight == set()
+    assert config.allowlist == ["infra"]
+
+
+def test_decision_wraps_the_same_issue_object(make_issue, make_config):
+    issue = make_issue(number=42)
+    [decision] = dispatch_policy.select_dispatchable([issue], make_config(), set())
+    assert decision.issue is issue  # identity, not a copy
+
+
+# --------------------------------------------------------------------------- #
+# Kill switch — highest-precedence short-circuit.
+# --------------------------------------------------------------------------- #
+def test_kill_switch_returns_empty_even_with_perfect_issues(make_issue, make_config):
+    issues = [make_issue(number=n, repo="infra") for n in range(1, 6)]
+    config = make_config(allowlist=["infra"], kill_switch=True)
+    assert dispatch_policy.select_dispatchable(issues, config, set()) == []
+
+
+def test_kill_switch_off_dispatches(make_issue, make_config):
+    issue = make_issue(repo="infra")
+    config = make_config(allowlist=["infra"], kill_switch=False)
+    assert len(dispatch_policy.select_dispatchable([issue], config, set())) == 1
+
+
+def test_production_default_config_dispatches_nothing(make_issue):
+    """The shipped default (kill switch ON, empty allowlist) is inert: even a
+    pristine, trusted issue is never selected."""
+    from app.afk import config as afk_config
+
+    issue = make_issue(repo="infra")
+    assert dispatch_policy.select_dispatchable([issue], afk_config.default(), set()) == []
+
+
+# --------------------------------------------------------------------------- #
+# Trust gate.
+# --------------------------------------------------------------------------- #
+def test_untrusted_issue_is_skipped(make_issue, make_config):
+    issue = make_issue(repo="infra", labeled_by_trusted=False)
+    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []
+
+
+def test_trusted_issue_is_eligible(make_issue, make_config):
+    issue = make_issue(repo="infra", labeled_by_trusted=True)
+    assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1
+
+
+def test_trust_gate_filters_only_untrusted(make_issue, make_config):
+    trusted = make_issue(number=1, repo="infra", labeled_by_trusted=True)
+    untrusted = make_issue(number=2, repo="infra", labeled_by_trusted=False)
+    decisions = dispatch_policy.select_dispatchable(
+        [trusted, untrusted], make_config(allowlist=["infra"]), set()
+    )
+    assert _selected_set(decisions) == {1}
+
+
+# --------------------------------------------------------------------------- #
+# Allowlist membership.
+# --------------------------------------------------------------------------- #
+def test_repo_not_in_allowlist_is_skipped(make_issue, make_config):
+    issue = make_issue(repo="some-other-repo")
+    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []
+
+
+def test_empty_allowlist_dispatches_nothing(make_issue, make_config):
+    issue = make_issue(repo="infra")
+    # kill switch off but allowlist empty -> still inert (the two-gate posture).
+    config = make_config(allowlist=[], kill_switch=False)
+    assert dispatch_policy.select_dispatchable([issue], config, set()) == []
+
+
+def test_allowlist_selects_only_listed_repos(make_issue, make_config):
+    a = make_issue(number=1, repo="infra")
+    b = make_issue(number=2, repo="realestate-crawler")
+    c = make_issue(number=3, repo="not-allowed")
+    decisions = dispatch_policy.select_dispatchable(
+        [a, b, c], make_config(allowlist=["infra", "realestate-crawler"]), set()
+    )
+    assert _selected_set(decisions) == {1, 2}
+
+
+# --------------------------------------------------------------------------- #
+# Per-repo lock (in_flight_repos).
+# --------------------------------------------------------------------------- #
+def test_repo_already_in_flight_is_skipped(make_issue, make_config):
+    issue = make_issue(repo="infra")
+    decisions = dispatch_policy.select_dispatchable(
+        [issue], make_config(allowlist=["infra"]), in_flight_repos={"infra"}
+    )
+    assert decisions == []
+
+
+def test_in_flight_lock_is_per_repo(make_issue, make_config):
+    locked = make_issue(number=1, repo="infra")
+    free = make_issue(number=2, repo="realestate-crawler")
+    decisions = dispatch_policy.select_dispatchable(
+        [locked, free],
+        make_config(allowlist=["infra", "realestate-crawler"]),
+        in_flight_repos={"infra"},
+    )
+    assert _selected_set(decisions) == {2}  # only the unlocked repo's issue runs
+
+
+def test_all_repos_in_flight_dispatches_nothing(make_issue, make_config):
+    a = make_issue(number=1, repo="infra")
+    b = make_issue(number=2, repo="realestate-crawler")
+    decisions = dispatch_policy.select_dispatchable(
+        [a, b],
+        make_config(allowlist=["infra", "realestate-crawler"]),
+        in_flight_repos={"infra", "realestate-crawler"},
+    )
+    assert decisions == []
+
+
+# --------------------------------------------------------------------------- #
+# One-agent-per-repo invariant — at most ONE decision per repo per call.
+#
+# The whole design serialises agents within a repo (two would collide on the
+# working tree). A single call must therefore never hand back two issues for the
+# same repo, even when both are eligible and the repo is not yet in-flight.
+# --------------------------------------------------------------------------- #
+def test_at_most_one_decision_per_repo(make_issue, make_config):
+    urgent = make_issue(number=1, repo="infra", priority=1)
+    minor = make_issue(number=2, repo="infra", priority=9)
+    decisions = dispatch_policy.select_dispatchable(
+        [urgent, minor], make_config(allowlist=["infra"]), set()
+    )
+    assert len(decisions) == 1
+    assert decisions[0].issue.number == 1  # most urgent (lowest value) wins the slot
+
+
+def test_one_decision_per_repo_across_many_repos(make_issue, make_config):
+    issues = [
+        make_issue(number=10, repo="infra", priority=1),
+        make_issue(number=11, repo="infra", priority=5),
+        make_issue(number=20, repo="realestate-crawler", priority=3),
+        make_issue(number=21, repo="realestate-crawler", priority=2),
+    ]
+    decisions = dispatch_policy.select_dispatchable(
+        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
+    )
+    # One per repo, each the repo's most urgent (lowest-value) eligible issue:
+    # infra -> #10 (p1 < p5); realestate-crawler -> #21 (p2 < p3).
+    assert _selected_set(decisions) == {10, 21}
+    repos = [d.issue.repo for d in decisions]
+    assert len(repos) == len(set(repos))  # no repo appears twice
+
+
+def test_ineligible_higher_priority_does_not_consume_repo_slot(make_issue, make_config):
+    """A more-urgent issue that is itself ineligible (e.g. blocked) must not
+    suppress a less-urgent *eligible* issue in the same repo — the slot goes to
+    the best ELIGIBLE candidate, not merely the most urgent one."""
+    blocked_urgent = make_issue(number=1, repo="infra", priority=1, blocked_by=[99])
+    ready_minor = make_issue(number=2, repo="infra", priority=9)
+    decisions = dispatch_policy.select_dispatchable(
+        [blocked_urgent, ready_minor], make_config(allowlist=["infra"]), set()
+    )
+    assert _selected_numbers(decisions) == [2]
+
+
+# --------------------------------------------------------------------------- #
+# blocked_by gating — blocked_by holds OPEN blocker numbers.
+# --------------------------------------------------------------------------- #
+def test_blocked_issue_is_skipped(make_issue, make_config):
+    issue = make_issue(repo="infra", blocked_by=[101])
+    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []
+
+
+def test_unblocked_issue_with_empty_blocked_by_is_eligible(make_issue, make_config):
+    issue = make_issue(repo="infra", blocked_by=[])
+    assert len(dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set())) == 1
+
+
+@pytest.mark.parametrize("blockers", [[1], [1, 2], [5, 6, 7]])
+def test_any_open_blocker_blocks(make_issue, make_config, blockers):
+    issue = make_issue(repo="infra", blocked_by=blockers)
+    assert dispatch_policy.select_dispatchable([issue], make_config(allowlist=["infra"]), set()) == []
+
+
+def test_blocked_filters_only_blocked(make_issue, make_config):
+    ready = make_issue(number=1, repo="infra", blocked_by=[])
+    blocked = make_issue(number=2, repo="realestate-crawler", blocked_by=[7])
+    decisions = dispatch_policy.select_dispatchable(
+        [ready, blocked], make_config(allowlist=["infra", "realestate-crawler"]), set()
+    )
+    assert _selected_set(decisions) == {1}
+
+
+# --------------------------------------------------------------------------- #
+# Priority ordering — lower priority value first, deterministic tiebreaker.
+# --------------------------------------------------------------------------- #
+def test_lower_priority_value_first(make_issue, make_config):
+    p1 = make_issue(number=1, repo="infra", priority=1)
+    p5 = make_issue(number=2, repo="realestate-crawler", priority=5)
+    p9 = make_issue(number=3, repo="SparkyFitness", priority=9)
+    decisions = dispatch_policy.select_dispatchable(
+        [p1, p9, p5],
+        make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
+        set(),
+    )
+    assert _selected_numbers(decisions) == [1, 2, 3]  # priorities 1, 5, 9
+
+
+def test_ordering_independent_of_input_order(make_issue, make_config):
+    """Whatever order the caller supplies issues in, the dispatch order is the
+    same — sorted purely by the policy, not by arrival."""
+    base = [
+        ("infra", 10, 2),
+        ("realestate-crawler", 20, 8),
+        ("SparkyFitness", 30, 5),
+        ("health", 40, 1),
+    ]
+    allow = ["infra", "realestate-crawler", "SparkyFitness", "health"]
+    config = make_config(allowlist=allow)
+    expected = [40, 10, 30, 20]  # priorities 1,2,5,8 (most urgent first)
+
+    for perm in itertools.permutations(base):
+        issues = [make_issue(number=n, repo=r, priority=p) for (r, n, p) in perm]
+        decisions = dispatch_policy.select_dispatchable(issues, config, set())
+        assert _selected_numbers(decisions) == expected
+
+
+def test_priority_ties_break_deterministically_by_issue_number(make_issue, make_config):
+    """Equal priority across different repos -> a stable, total order. We tie-break
+    on ascending issue number so the result never depends on dict/set iteration
+    or input order."""
+    a = make_issue(number=30, repo="infra", priority=5)
+    b = make_issue(number=10, repo="realestate-crawler", priority=5)
+    c = make_issue(number=20, repo="SparkyFitness", priority=5)
+    config = make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"])
+
+    for perm in itertools.permutations([a, b, c]):
+        decisions = dispatch_policy.select_dispatchable(list(perm), config, set())
+        assert _selected_numbers(decisions) == [10, 20, 30]
+
+
+def test_negative_and_zero_priorities_order_correctly(make_issue, make_config):
+    neg = make_issue(number=1, repo="infra", priority=-5)
+    zero = make_issue(number=2, repo="realestate-crawler", priority=0)
+    pos = make_issue(number=3, repo="SparkyFitness", priority=3)
+    decisions = dispatch_policy.select_dispatchable(
+        [neg, zero, pos],
+        make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
+        set(),
+    )
+    assert _selected_numbers(decisions) == [1, 2, 3]  # -5 < 0 < 3 (most urgent first)
+
+
+# --------------------------------------------------------------------------- #
+# Reasons — human-readable, never parsed, but must be present and sensible.
+# --------------------------------------------------------------------------- #
+def test_every_decision_has_a_nonempty_reason(make_issue, make_config):
+    issues = [
+        make_issue(number=1, repo="infra", priority=3),
+        make_issue(number=2, repo="realestate-crawler", priority=1),
+    ]
+    decisions = dispatch_policy.select_dispatchable(
+        issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
+    )
+    assert decisions  # sanity
+    assert all(d.reason.strip() for d in decisions)
+
+
+# --------------------------------------------------------------------------- #
+# Combined matrix — every gate together. A single eligible needle in a haystack
+# of issues that each trip exactly one gate.
+# --------------------------------------------------------------------------- #
+def test_only_the_fully_eligible_issue_survives_all_gates(make_issue, make_config):
+    config = make_config(allowlist=["infra", "realestate-crawler"], kill_switch=False)
+    in_flight = {"realestate-crawler"}  # this repo is locked
+
+    issues = [
+        make_issue(number=1, repo="infra", priority=5),                      # ELIGIBLE
+        make_issue(number=2, repo="not-allowed", priority=9),                # allowlist
+        make_issue(number=3, repo="infra", priority=9, labeled_by_trusted=False),  # trust
+        make_issue(number=4, repo="infra", priority=9, blocked_by=[1]),      # blocked
+        make_issue(number=5, repo="realestate-crawler", priority=9),         # repo locked
+    ]
+    decisions = dispatch_policy.select_dispatchable(issues, config, in_flight)
+    assert _selected_numbers(decisions) == [1]
+    assert decisions[0].issue.repo == "infra"
+
+
+@pytest.mark.parametrize("trusted", [True, False])
+@pytest.mark.parametrize("allowed", [True, False])
+@pytest.mark.parametrize("blocked", [True, False])
+@pytest.mark.parametrize("locked", [True, False])
+@pytest.mark.parametrize("killed", [True, False])
+def test_full_eligibility_matrix(
+    make_issue, make_config, trusted, allowed, blocked, locked, killed
+):
+    """Exhaustive truth table: an issue is dispatched iff ALL gates pass and the
+    kill switch is off. 2**5 = 32 cases, single issue so ordering is moot."""
+    issue = make_issue(
+        number=1,
+        repo="infra",
+        priority=0,
+        labeled_by_trusted=trusted,
+        blocked_by=[99] if blocked else [],
+    )
+    config = make_config(
+        allowlist=["infra"] if allowed else ["other-repo"],
+        kill_switch=killed,
+    )
+    in_flight = {"infra"} if locked else set()
+
+    decisions = dispatch_policy.select_dispatchable([issue], config, in_flight)
+
+    should_dispatch = trusted and allowed and not blocked and not locked and not killed
+    assert len(decisions) == int(should_dispatch)
+    if should_dispatch:
+        assert decisions[0].issue is issue
--- a/tests/test_afk_notifier.py
+++ b/tests/test_afk_notifier.py
@ -0,0 +1,198 @@
+"""Tests for ``app.afk.notifier`` — the terminal-state doorbell.
+
+The notifier's whole job is to format a human-facing alert (Slack / ntfy) with a
+deep-link back to the T3 thread when a run reaches a terminal state — done,
+needs-human, or frozen — and hand it to an injected sender. Every test here
+injects a recording fake sender, so nothing is ever POSTed: we assert the
+*formatted payload* per kind, plus the deep-link, the kind vocabulary, and the
+guardrails (no thread → no link, unknown kind rejected, sender called exactly
+once, and ``notify()`` itself returning None).
+
+No real Slack/ntfy/T3 is touched — consistent with the rest of the AFK suite.
+"""
+import pytest
+
+from app.afk import notifier as notifier_mod
+from app.afk.notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN, Notification, Notifier
+from app.afk.types import Issue
+
+
+# --------------------------------------------------------------------------- #
+# A recording sender — captures the Notification instead of posting it.
+# --------------------------------------------------------------------------- #
+class RecordingSender:
+    """Injectable stand-in for the real Slack/ntfy POST. Records each payload so
+    a test can assert the formatting without any network."""
+
+    def __init__(self) -> None:
+        self.sent: list[Notification] = []
+
+    def __call__(self, notification: Notification) -> None:
+        self.sent.append(notification)
+
+
+@pytest.fixture
+def sender() -> RecordingSender:
+    return RecordingSender()
+
+
+def _issue(number: int = 42, repo: str = "infra") -> Issue:
+    return Issue(
+        number=number,
+        repo=repo,
+        labels=["ready-for-agent"],
+        blocked_by=[],
+        labeled_by_trusted=True,
+        priority=0,
+    )
+
+
+# --------------------------------------------------------------------------- #
+# Kind vocabulary — the three terminal states, and nothing else.
+# --------------------------------------------------------------------------- #
+def test_terminal_kinds_are_exactly_the_three_terminal_states():
+    assert KIND_DONE == "done"
+    assert KIND_NEEDS_HUMAN == "needs-human"
+    assert KIND_FROZEN == "frozen"
+    assert notifier_mod.TERMINAL_KINDS == {KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN}
+
+
+# --------------------------------------------------------------------------- #
+# Dispatch mechanics — sender injected, called exactly once, returns None.
+# --------------------------------------------------------------------------- #
+def test_notify_calls_sender_exactly_once_and_returns_none(sender):
+    n = Notifier(sender)
+    result = n.notify(KIND_DONE, _issue(), "thread-7", "all green")
+    assert result is None
+    assert len(sender.sent) == 1
+
+
+def test_notify_does_not_post_anything_itself(sender):
+    """The Notifier must never reach the network on its own: all egress goes
+    through the injected sender, which here only records what it is handed."""
+    n = Notifier(sender)
+    n.notify(KIND_FROZEN, _issue(), "thread-1", "budget exhausted")
+    # Exactly one payload, and it is a Notification (not a raw dict / HTTP response).
+    assert len(sender.sent) == 1
+    assert isinstance(sender.sent[0], Notification)
+
+
+# --------------------------------------------------------------------------- #
+# Deep-link — every payload links back to the T3 thread (when there is one).
+# --------------------------------------------------------------------------- #
+def test_payload_deep_links_to_the_t3_thread(sender):
+    n = Notifier(sender, base_url="https://t3.viktorbarzin.me")
+    n.notify(KIND_DONE, _issue(), "thread-abc", "done")
+    payload = sender.sent[0]
+    assert payload.link == "https://t3.viktorbarzin.me/?thread=thread-abc"
+    # The link is also surfaced in the human-readable body so it survives
+    # senders that drop structured fields (e.g. a plain ntfy message).
+    assert "https://t3.viktorbarzin.me/?thread=thread-abc" in payload.body
+
+
+def test_base_url_trailing_slash_is_normalised(sender):
+    n = Notifier(sender, base_url="https://t3.viktorbarzin.me/")
+    n.notify(KIND_DONE, _issue(), "thread-x", "done")
+    assert sender.sent[0].link == "https://t3.viktorbarzin.me/?thread=thread-x"
+
+
+def test_no_thread_id_means_no_link(sender):
+    """A run can reach 'needs-human' before any thread exists (e.g. dispatch
+    itself failed). Without a thread there is nothing to deep-link to, so the
+    link is None — but the doorbell still fires."""
+    n = Notifier(sender)
+    n.notify(KIND_NEEDS_HUMAN, _issue(), None, "dispatch failed")
+    payload = sender.sent[0]
+    assert payload.link is None
+    assert len(sender.sent) == 1
+    # No dangling "/?thread=" fragment leaks into the body either.
+    assert "?thread=" not in payload.body
+
+
+# --------------------------------------------------------------------------- #
+# Per-kind formatting — title / body / priority / tags differ per terminal kind.
+# --------------------------------------------------------------------------- #
+def test_done_payload_is_informational(sender):
+    n = Notifier(sender)
+    n.notify(KIND_DONE, _issue(number=7, repo="infra"), "thread-7", "merged + CI green")
+    p = sender.sent[0]
+    assert p.kind == KIND_DONE
+    assert p.issue_ref == "infra#7"
+    assert "infra#7" in p.title
+    assert "merged + CI green" in p.body
+    # A successful close is informational, not an escalation.
+    assert p.priority == "low"
+    assert "escalation" not in p.tags
+
+
+def test_needs_human_payload_is_an_escalation(sender):
+    n = Notifier(sender)
+    n.notify(KIND_NEEDS_HUMAN, _issue(number=9, repo="claude-agent-service"), "thread-9", "errored before push")
+    p = sender.sent[0]
+    assert p.kind == KIND_NEEDS_HUMAN
+    assert p.issue_ref == "claude-agent-service#9"
+    assert "claude-agent-service#9" in p.title
+    assert "errored before push" in p.body
+    assert p.priority == "high"
+    assert "escalation" in p.tags
+
+
+def test_frozen_payload_is_an_escalation(sender):
+    n = Notifier(sender)
+    n.notify(KIND_FROZEN, _issue(number=3, repo="infra"), "thread-3", "fix-forward budget exhausted")
+    p = sender.sent[0]
+    assert p.kind == KIND_FROZEN
+    assert "infra#3" in p.title
+    assert "fix-forward budget exhausted" in p.body
+    assert p.priority == "high"
+    assert "escalation" in p.tags
+
+
+def test_titles_distinguish_the_three_kinds(sender):
+    """An operator skimming a Slack channel must tell the three apart from the
+    title alone, without reading the body."""
+    n = Notifier(sender)
+    n.notify(KIND_DONE, _issue(), "t", "x")
+    n.notify(KIND_NEEDS_HUMAN, _issue(), "t", "x")
+    n.notify(KIND_FROZEN, _issue(), "t", "x")
+    titles = [p.title for p in sender.sent]
+    assert len({t.split(" ")[0] for t in titles}) == 3  # distinct leading marker per kind
+
+
+# --------------------------------------------------------------------------- #
+# Guardrail — only terminal kinds are sendable. An unknown kind is a bug.
+# --------------------------------------------------------------------------- #
+def test_unknown_kind_raises_and_sends_nothing(sender):
+    n = Notifier(sender)
+    with pytest.raises(ValueError):
+        n.notify("running", _issue(), "thread-1", "still working")
+    assert sender.sent == []
+
+
+# --------------------------------------------------------------------------- #
+# Pure formatter — render_notification builds the payload independently of any
+# sender, so the formatting is unit-testable on its own.
+# --------------------------------------------------------------------------- #
+def test_render_notification_is_pure_and_matches_notify(sender):
+    issue = _issue(number=11, repo="infra")
+    built = notifier_mod.render_notification(
+        KIND_FROZEN, issue, "thread-11", "stuck", base_url="https://t3.viktorbarzin.me"
+    )
+    assert isinstance(built, Notification)
+    assert built.link == "https://t3.viktorbarzin.me/?thread=thread-11"
+    # notify() must hand the sender a payload identical to the pure render.
+    Notifier(sender, base_url="https://t3.viktorbarzin.me").notify(
+        KIND_FROZEN, issue, "thread-11", "stuck"
+    )
+    assert sender.sent[0] == built
+
+
+def test_sender_exception_propagates():
+    """If the sender fails (Slack down), the notifier does not swallow it — the
+    loop decides what to do with a failed doorbell, not this adapter."""
+    def boom(_notification: Notification) -> None:
+        raise RuntimeError("slack 503")
+
+    n = Notifier(boom)
+    with pytest.raises(RuntimeError, match="slack 503"):
+        n.notify(KIND_DONE, _issue(), "thread-1", "done")
--- a/tests/test_afk_phase_checklist.py
+++ b/tests/test_afk_phase_checklist.py
@ -0,0 +1,247 @@
+"""Tests for ``app.afk.phase_checklist`` — the live progress checklist.
+
+``render(current, meta)`` is PURE: same inputs → byte-identical markdown, no I/O.
+It draws the seven-phase lifecycle (worktree → tests-red → green → pushed → CI →
+deployed → done) as a markdown task list, with phases *before* ``current`` checked
+off, ``current`` marked in-progress (or checked, at DONE), and later phases empty.
+
+Style matches the existing suite: plain ``assert`` functions, parametrized cases,
+and three full-output snapshots so the rendered shape is pinned, not just its
+line count.
+"""
+import pytest
+
+from app.afk.phase_checklist import render
+from app.afk.types import Phase
+
+
+# Lifecycle order, mirrored from the contract so a reordering of the enum that
+# the renderer didn't track shows up as a test failure rather than silent drift.
+PHASES_IN_ORDER = [
+    Phase.WORKTREE,
+    Phase.TESTS_RED,
+    Phase.GREEN,
+    Phase.PUSHED,
+    Phase.CI,
+    Phase.DEPLOYED,
+    Phase.DONE,
+]
+
+
+# --------------------------------------------------------------------------- #
+# Structure: one line per phase, in order, always all seven.
+# --------------------------------------------------------------------------- #
+def _checklist_lines(out: str) -> list[str]:
+    """The markdown task-list lines (``- [ ]`` / ``- [x]`` ...), in order."""
+    return [ln for ln in out.splitlines() if ln.lstrip().startswith("- [")]
+
+
+def test_renders_a_string():
+    assert isinstance(render(Phase.WORKTREE, {}), str)
+
+
+@pytest.mark.parametrize("current", PHASES_IN_ORDER)
+def test_every_phase_has_exactly_one_checklist_line(current):
+    lines = _checklist_lines(render(current, {}))
+    assert len(lines) == len(PHASES_IN_ORDER)
+
+
+@pytest.mark.parametrize("current", PHASES_IN_ORDER)
+def test_checklist_lines_are_in_lifecycle_order(current):
+    lines = _checklist_lines(render(current, {}))
+    # Each phase's human label appears, and in the lifecycle order.
+    positions = [
+        next(i for i, ln in enumerate(lines) if _has_label(ln, phase))
+        for phase in PHASES_IN_ORDER
+    ]
+    assert positions == sorted(positions)
+
+
+def _has_label(line: str, phase: Phase) -> bool:
+    """Whether a checklist line carries ``phase``'s headline word (case-insensitive
+    substring — the test asserts the label is *present*, not its exact decoration)."""
+    return _phase_label(phase).lower() in line.lower()
+
+
+def _phase_label(phase: Phase) -> str:
+    """The headline word(s) the renderer must use for a phase. Loose on purpose:
+    the test asserts the label is *present*, not the exact decoration."""
+    return {
+        Phase.WORKTREE: "worktree",
+        Phase.TESTS_RED: "test",
+        Phase.GREEN: "green",
+        Phase.PUSHED: "push",
+        Phase.CI: "CI",
+        Phase.DEPLOYED: "deploy",
+        Phase.DONE: "done",
+    }[phase]
+
+
+# --------------------------------------------------------------------------- #
+# Check/in-progress/empty partitioning around ``current``.
+# --------------------------------------------------------------------------- #
+def _classify(line: str) -> str:
+    """Bucket a checklist line by its marker: 'done' ``[x]``, 'todo' ``[ ]``, or
+    'active' (anything else, e.g. an in-progress glyph)."""
+    body = line.lstrip()
+    if body.startswith("- [x]"):
+        return "done"
+    if body.startswith("- [ ]"):
+        return "todo"
+    return "active"
+
+
+@pytest.mark.parametrize("idx,current", list(enumerate(PHASES_IN_ORDER)))
+def test_earlier_checked_current_active_later_empty(idx, current):
+    lines = _checklist_lines(render(current, {}))
+    buckets = [_classify(ln) for ln in lines]
+
+    # Everything strictly before the current phase is checked off.
+    assert all(b == "done" for b in buckets[:idx]), buckets
+
+    if current is Phase.DONE:
+        # Terminal phase: the whole list is checked, nothing left active/empty.
+        assert all(b == "done" for b in buckets), buckets
+    else:
+        # The current phase is the single in-progress marker...
+        assert buckets[idx] == "active", buckets
+        assert buckets.count("active") == 1, buckets
+        # ...and every phase after it is still an empty checkbox.
+        assert all(b == "todo" for b in buckets[idx + 1 :]), buckets
+
+
+def test_first_phase_has_nothing_checked_before_it():
+    lines = _checklist_lines(render(Phase.WORKTREE, {}))
+    assert _classify(lines[0]) == "active"
+    assert "done" not in [_classify(ln) for ln in lines]
+
+
+def test_done_checks_every_phase_including_done():
+    lines = _checklist_lines(render(Phase.DONE, {}))
+    assert all(_classify(ln) == "done" for ln in lines)
+    # The DONE line itself is checked, not merely the ones before it.
+    done_line = next(ln for ln in lines if _has_label(ln, Phase.DONE))
+    assert _classify(done_line) == "done"
+
+
+# --------------------------------------------------------------------------- #
+# Active-phase emphasis: the current phase is visually distinguishable.
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize("current", [p for p in PHASES_IN_ORDER if p is not Phase.DONE])
+def test_active_phase_line_differs_from_todo_and_done_markers(current):
+    lines = _checklist_lines(render(current, {}))
+    active = [ln for ln in lines if _classify(ln) == "active"]
+    assert len(active) == 1
+    # Not a plain checkbox in either state.
+    assert not active[0].lstrip().startswith("- [x]")
+    assert not active[0].lstrip().startswith("- [ ]")
+
+
+# --------------------------------------------------------------------------- #
+# meta rendering: optional context is surfaced, omission never explodes.
+# --------------------------------------------------------------------------- #
+def test_meta_empty_does_not_raise_and_still_lists_phases():
+    out = render(Phase.GREEN, {})
+    assert _checklist_lines(out)  # non-empty
+
+
+def test_meta_issue_and_repo_appear_in_output():
+    out = render(Phase.GREEN, {"repo": "infra", "issue": 42})
+    assert "infra" in out
+    assert "42" in out
+
+
+def test_meta_thread_id_appears_when_present():
+    out = render(Phase.PUSHED, {"thread_id": "thread-7"})
+    assert "thread-7" in out
+
+
+def test_meta_thread_id_absent_is_silent():
+    out = render(Phase.PUSHED, {})
+    assert "thread-" not in out
+
+
+def test_meta_fix_forward_attempt_surfaced():
+    out = render(Phase.CI, {"fix_forward_attempts": 3})
+    assert "3" in out
+
+
+def test_meta_unknown_keys_are_ignored():
+    # An unexpected key must not crash or leak its raw value as a stray line.
+    out = render(Phase.WORKTREE, {"totally_unknown_field": "should-not-appear"})
+    assert "should-not-appear" not in out
+
+
+# --------------------------------------------------------------------------- #
+# Determinism + idempotence (it's pure).
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize("current", PHASES_IN_ORDER)
+def test_render_is_deterministic(current):
+    meta = {"repo": "infra", "issue": 9, "thread_id": "thread-1"}
+    assert render(current, meta) == render(current, meta)
+
+
+def test_render_does_not_mutate_meta():
+    meta = {"repo": "infra", "issue": 1}
+    before = dict(meta)
+    render(Phase.GREEN, meta)
+    assert meta == before
+
+
+# --------------------------------------------------------------------------- #
+# Snapshots: pin the exact rendered shape for three representative phases. If the
+# format changes intentionally, update these strings; an accidental change to
+# wording/markers/order fails here loudly.
+# --------------------------------------------------------------------------- #
+WORKTREE_SNAPSHOT = """\
+### infra#7 — AFK run progress
+
+- [~] Worktree created
+- [ ] Failing test written (TDD red)
+- [ ] Implementation passing (TDD green)
+- [ ] Pushed to master
+- [ ] CI green on pushed commit
+- [ ] Deployed / rolled out
+- [ ] Done — issue closed
+"""
+
+
+def test_snapshot_worktree_phase():
+    out = render(Phase.WORKTREE, {"repo": "infra", "issue": 7})
+    assert out == WORKTREE_SNAPSHOT
+
+
+CI_SNAPSHOT = """\
+### infra#7 — AFK run progress (thread thread-3)
+
+- [x] Worktree created
+- [x] Failing test written (TDD red)
+- [x] Implementation passing (TDD green)
+- [x] Pushed to master
+- [~] CI green on pushed commit
+- [ ] Deployed / rolled out
+- [ ] Done — issue closed
+"""
+
+
+def test_snapshot_ci_phase_with_thread():
+    out = render(Phase.CI, {"repo": "infra", "issue": 7, "thread_id": "thread-3"})
+    assert out == CI_SNAPSHOT
+
+
+DONE_SNAPSHOT = """\
+### infra#7 — AFK run progress
+
+- [x] Worktree created
+- [x] Failing test written (TDD red)
+- [x] Implementation passing (TDD green)
+- [x] Pushed to master
+- [x] CI green on pushed commit
+- [x] Deployed / rolled out
+- [x] Done — issue closed
+"""
+
+
+def test_snapshot_done_phase():
+    out = render(Phase.DONE, {"repo": "infra", "issue": 7})
+    assert out == DONE_SNAPSHOT
--- a/tests/test_afk_poller.py
+++ b/tests/test_afk_poller.py
@ -0,0 +1,270 @@
+"""Integration tests for ``app.afk.poller`` — the CronJob dispatch tick.
+
+Unlike the unit suites, these wire the REAL pure cores (the actual
+``dispatch_policy.select_dispatchable``) to the in-memory adapter FAKES from
+``conftest`` (``FakeTracker`` / ``FakeT3Client``). No test touches a real T3
+server, GitHub/Forgejo, or the cluster — the poller is exercised end to end with
+fakes standing in only for the I/O edges.
+
+What the tick must do (the poller contract):
+
+  * **kill switch** — a disabled config dispatches nothing AND never calls the
+    tracker or T3 (the CronJob does no I/O when the loop is off);
+  * read the ready set via ``tracker.list_ready(config.allowlist)``;
+  * derive the **per-repo lock** from the ready set itself — a repo with an issue
+    already carrying the ``in_progress_label`` is in flight and is skipped (the
+    CronJob is stateless between ticks, so the tracker is the source of truth);
+  * run the real ``select_dispatchable`` over (ready issues, config, in-flight
+    repos) and, for each decision, ``t3_client.dispatch(...)`` then
+    ``tracker.add_label(repo, issue, in_progress_label)`` — label AFTER a
+    successful dispatch so a dispatch failure never leaves a phantom lock.
+"""
+import pytest
+
+from app.afk import poller
+from app.afk.types import Config
+
+
+# --------------------------------------------------------------------------- #
+# Helpers.
+# --------------------------------------------------------------------------- #
+def _poller(fake_tracker, fake_t3) -> poller.Poller:
+    """A Poller wired to the conftest fakes and the real dispatch policy."""
+    return poller.Poller(tracker=fake_tracker, t3_client=fake_t3)
+
+
+def _dispatched_pairs(fake_t3) -> set[tuple[str, int]]:
+    return {(d["repo"], d["issue"]) for d in fake_t3.dispatched}
+
+
+def _added_in_progress(fake_tracker, label: str = "agent-in-progress") -> set[tuple[str, int]]:
+    return {
+        (repo, issue)
+        for (op, repo, issue, lbl) in fake_tracker.label_ops
+        if op == "add" and lbl == label
+    }
+
+
+# --------------------------------------------------------------------------- #
+# Kill switch — no dispatch, no I/O at all.
+# --------------------------------------------------------------------------- #
+def test_kill_switch_dispatches_nothing(fake_tracker, fake_t3, make_issue):
+    fake_tracker.seed("infra", [make_issue(number=1, repo="infra")])
+    config = Config(allowlist=["infra"], kill_switch=True)
+
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+
+    assert result.dispatched == []
+    assert fake_t3.dispatched == []
+
+
+def test_kill_switch_does_not_even_read_the_tracker(fake_t3):
+    """When the loop is off the CronJob must do zero I/O. A tracker that explodes
+    if touched proves the tracker half; the test above covers the T3 half."""
+    class ExplodingTracker:
+        def list_ready(self, repos):
+            raise AssertionError("tracker must not be read when kill switch is on")
+
+    config = Config(allowlist=["infra"], kill_switch=True)
+    result = poller.Poller(tracker=ExplodingTracker(), t3_client=fake_t3).run_once(config)
+    assert result.dispatched == []
+
+
+# --------------------------------------------------------------------------- #
+# Empty allowlist: kill switch off (loop enabled) but nothing to run.
+# --------------------------------------------------------------------------- #
+def test_empty_allowlist_dispatches_nothing(fake_tracker, fake_t3, make_issue):
+    # list_ready([]) returns nothing, and even if it didn't the policy gates on
+    # the (empty) allowlist. The shipped default posture.
+    config = Config(allowlist=[], kill_switch=False)
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+    assert result.dispatched == []
+    assert fake_t3.dispatched == []
+
+
+# --------------------------------------------------------------------------- #
+# Happy path — one ready issue gets dispatched and labelled.
+# --------------------------------------------------------------------------- #
+def test_dispatches_a_ready_issue(fake_tracker, fake_t3, make_issue):
+    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
+    config = Config(allowlist=["infra"], kill_switch=False)
+
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+
+    assert _dispatched_pairs(fake_t3) == {("infra", 7)}
+    assert len(result.dispatched) == 1
+    assert result.dispatched[0].thread_id == "thread-0"
+    assert result.dispatched[0].issue.number == 7
+
+
+def test_labels_in_progress_after_dispatch(fake_tracker, fake_t3, make_issue):
+    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
+    config = Config(allowlist=["infra"], kill_switch=False)
+
+    _poller(fake_tracker, fake_t3).run_once(config)
+
+    assert _added_in_progress(fake_tracker) == {("infra", 7)}
+
+
+def test_in_progress_label_honours_config_override(fake_tracker, fake_t3, make_issue):
+    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
+    config = Config(allowlist=["infra"], kill_switch=False, in_progress_label="busy")
+
+    _poller(fake_tracker, fake_t3).run_once(config)
+
+    assert _added_in_progress(fake_tracker, "busy") == {("infra", 7)}
+
+
+def test_dispatch_prompt_references_the_issue(fake_tracker, fake_t3, make_issue):
+    """The agent runs full-access and fetches the body itself, so the prompt the
+    poller sends must at minimum point at the concrete repo#issue."""
+    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
+    config = Config(allowlist=["infra"], kill_switch=False)
+
+    _poller(fake_tracker, fake_t3).run_once(config)
+
+    prompt = fake_t3.dispatched[0]["prompt"]
+    assert "7" in prompt and "infra" in prompt
+    assert prompt.strip()  # non-empty
+
+
+# --------------------------------------------------------------------------- #
+# Per-repo lock — an issue already carrying the in-progress label means an agent
+# is in flight on that repo, so the repo is skipped this tick.
+# --------------------------------------------------------------------------- #
+def test_repo_with_in_progress_issue_is_locked(fake_tracker, fake_t3, make_issue):
+    in_flight = make_issue(
+        number=1, repo="infra", labels=["ready-for-agent", "agent-in-progress"]
+    )
+    waiting = make_issue(number=2, repo="infra", labels=["ready-for-agent"])
+    fake_tracker.seed("infra", [in_flight, waiting])
+    config = Config(allowlist=["infra"], kill_switch=False)
+
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+
+    # Repo already busy → nothing new dispatched, no new in-progress label.
+    assert result.dispatched == []
+    assert fake_t3.dispatched == []
+    assert _added_in_progress(fake_tracker) == set()
+
+
+def test_lock_is_per_repo_not_global(fake_tracker, fake_t3, make_issue):
+    # infra is busy; a different repo is free and should still dispatch.
+    fake_tracker.seed(
+        "infra",
+        [make_issue(number=1, repo="infra", labels=["ready-for-agent", "agent-in-progress"])],
+    )
+    fake_tracker.seed("dotfiles", [make_issue(number=2, repo="dotfiles")])
+    config = Config(allowlist=["infra", "dotfiles"], kill_switch=False)
+
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+
+    assert _dispatched_pairs(fake_t3) == {("dotfiles", 2)}
+    assert {d.issue.repo for d in result.dispatched} == {"dotfiles"}
+
+
+def test_custom_in_progress_label_drives_the_lock(fake_tracker, fake_t3, make_issue):
+    # The lock keys off config.in_progress_label, not the hardcoded default.
+    fake_tracker.seed(
+        "infra",
+        [make_issue(number=1, repo="infra", labels=["ready-for-agent", "busy"])],
+    )
+    config = Config(allowlist=["infra"], kill_switch=False, in_progress_label="busy")
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+    assert result.dispatched == []
+
+
+# --------------------------------------------------------------------------- #
+# One dispatch per repo per tick (the policy's one-agent-per-repo invariant,
+# observed through the poller): the most urgent (lowest-value) eligible issue
+# wins the slot.
+# --------------------------------------------------------------------------- #
+def test_one_dispatch_per_repo_per_tick(fake_tracker, fake_t3, make_issue):
+    fake_tracker.seed(
+        "infra",
+        [
+            make_issue(number=1, repo="infra", priority=1),  # most urgent (lowest value)
+            make_issue(number=2, repo="infra", priority=9),
+            make_issue(number=3, repo="infra", priority=5),
+        ],
+    )
+    config = Config(allowlist=["infra"], kill_switch=False)
+
+    _poller(fake_tracker, fake_t3).run_once(config)
+
+    assert _dispatched_pairs(fake_t3) == {("infra", 1)}
+    assert _added_in_progress(fake_tracker) == {("infra", 1)}
+
+
+# --------------------------------------------------------------------------- #
+# Gating still applies through the poller (the pure policy enforces it; the
+# poller must not bypass it).
+# --------------------------------------------------------------------------- #
+def test_untrusted_issue_is_not_dispatched(fake_tracker, fake_t3, make_issue):
+    fake_tracker.seed(
+        "infra", [make_issue(number=1, repo="infra", labeled_by_trusted=False)]
+    )
+    config = Config(allowlist=["infra"], kill_switch=False)
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+    assert result.dispatched == []
+    assert fake_t3.dispatched == []
+
+
+def test_blocked_issue_is_not_dispatched(fake_tracker, fake_t3, make_issue):
+    fake_tracker.seed(
+        "infra", [make_issue(number=2, repo="infra", blocked_by=[1])]
+    )
+    config = Config(allowlist=["infra"], kill_switch=False)
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+    assert result.dispatched == []
+
+
+def test_repo_outside_allowlist_is_not_dispatched(fake_tracker, fake_t3, make_issue):
+    # list_ready only queries the allowlist, but even if a stray repo's issues
+    # arrive the policy's allowlist gate drops them.
+    fake_tracker.seed("secret", [make_issue(number=1, repo="secret")])
+    config = Config(allowlist=["infra"], kill_switch=False)
+    result = _poller(fake_tracker, fake_t3).run_once(config)
+    assert result.dispatched == []
+
+
+# --------------------------------------------------------------------------- #
+# Dispatch failure must not leave a phantom lock (label only AFTER success).
+# --------------------------------------------------------------------------- #
+def test_dispatch_failure_does_not_label_in_progress(fake_tracker, make_issue):
+    class FailingT3:
+        def __init__(self):
+            self.dispatched = []
+
+        def dispatch(self, repo, issue, prompt):
+            raise RuntimeError("T3 down")
+
+    fake_tracker.seed("infra", [make_issue(number=7, repo="infra")])
+    config = Config(allowlist=["infra"], kill_switch=False)
+
+    with pytest.raises(RuntimeError):
+        poller.Poller(tracker=fake_tracker, t3_client=FailingT3()).run_once(config)
+
+    # No in-progress label was applied — the issue stays purely ready, so the
+    # next tick retries it rather than treating it as locked.
+    assert _added_in_progress(fake_tracker) == set()
+
+
+# --------------------------------------------------------------------------- #
+# list_ready is called with exactly the allowlist (not all repos).
+# --------------------------------------------------------------------------- #
+def test_queries_only_the_allowlisted_repos(fake_t3, make_issue):
+    seen_repos: list[list[str]] = []
+
+    class RecordingTracker:
+        def list_ready(self, repos):
+            seen_repos.append(list(repos))
+            return []
+
+        def add_label(self, *a):  # pragma: no cover - not reached here
+            raise AssertionError("nothing to label")
+
+    config = Config(allowlist=["infra", "dotfiles"], kill_switch=False)
+    poller.Poller(tracker=RecordingTracker(), t3_client=fake_t3).run_once(config)
+
+    assert seen_repos == [["infra", "dotfiles"]]
--- a/tests/test_afk_run_state_machine.py
+++ b/tests/test_afk_run_state_machine.py
@ -0,0 +1,190 @@
+"""Tests for ``app.afk.run_state_machine.next_action`` — the pure decision
+function that turns one assembled ``RunState`` into the next ``Action``.
+
+The function encodes ADR-0002's run lifecycle:
+
+  * healthy (pushed AND CI green)                 -> CLOSE_SUCCESS
+  * cannot reach green before push (errored /
+    stalled with nothing pushed)                  -> ESCALATE_PREPUSH
+  * pushed but CI red, budget remaining           -> FIX_FORWARD
+  * pushed but CI red, budget exhausted           -> FREEZE_ESCALATE
+  * anything still in flight                       -> WAIT
+
+It is PURE: no I/O, no clock, no globals — it reads only its two arguments, so
+every case is a plain table assertion. ``make_config`` / ``make_run_state`` come
+from ``conftest.py`` (config defaults to ENABLED, run state to a fresh dispatch).
+"""
+import pytest
+
+from app.afk.run_state_machine import next_action
+from app.afk.types import Action, CIStatus, ThreadStatus
+
+
+# --------------------------------------------------------------------------- #
+# Healthy terminal: pushed + CI green -> close, regardless of thread status.
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize(
+    "thread_status",
+    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
+)
+def test_pushed_and_green_closes_success(make_config, make_run_state, thread_status):
+    state = make_run_state(
+        thread_status=thread_status, ci_status=CIStatus.GREEN, pushed=True
+    )
+    assert next_action(state, make_config()) is Action.CLOSE_SUCCESS
+
+
+# --------------------------------------------------------------------------- #
+# Pre-push escalation: nothing pushed and the turn is no longer going to push
+# (errored, or finished/stalled clean) -> hand back to a human.
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize("thread_status", [ThreadStatus.ERROR, ThreadStatus.IDLE])
+@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
+def test_not_pushed_terminal_thread_escalates_prepush(
+    make_config, make_run_state, thread_status, ci_status
+):
+    state = make_run_state(
+        thread_status=thread_status, ci_status=ci_status, pushed=False
+    )
+    assert next_action(state, make_config()) is Action.ESCALATE_PREPUSH
+
+
+# --------------------------------------------------------------------------- #
+# Still working toward a first push -> WAIT (not yet an escalation).
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize("thread_status", [ThreadStatus.RUNNING, None])
+@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
+def test_not_pushed_in_flight_waits(
+    make_config, make_run_state, thread_status, ci_status
+):
+    state = make_run_state(
+        thread_status=thread_status, ci_status=ci_status, pushed=False
+    )
+    assert next_action(state, make_config()) is Action.WAIT
+
+
+# --------------------------------------------------------------------------- #
+# Pushed, CI not yet decided -> WAIT for the verdict, whatever the thread does.
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize(
+    "thread_status",
+    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
+)
+@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING])
+def test_pushed_ci_pending_waits(
+    make_config, make_run_state, thread_status, ci_status
+):
+    state = make_run_state(
+        thread_status=thread_status, ci_status=ci_status, pushed=True
+    )
+    assert next_action(state, make_config()) is Action.WAIT
+
+
+# --------------------------------------------------------------------------- #
+# Pushed + CI red: fix-forward while BOTH budgets remain, else freeze.
+# Boundaries are strict-less-than on attempts AND elapsed; at/over either freezes.
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize(
+    ("attempts", "elapsed", "expected"),
+    [
+        # fresh red, plenty of budget -> fix forward
+        (0, 0.0, Action.FIX_FORWARD),
+        (1, 10.0, Action.FIX_FORWARD),
+        # one attempt below the cap, one second inside the clock -> still fix forward
+        (4, 3599.0, Action.FIX_FORWARD),
+        # attempts hit the cap (5) -> freeze
+        (5, 0.0, Action.FREEZE_ESCALATE),
+        (6, 0.0, Action.FREEZE_ESCALATE),
+        # clock hits the cap (3600s) -> freeze even with attempts to spare
+        (0, 3600.0, Action.FREEZE_ESCALATE),
+        (0, 7200.0, Action.FREEZE_ESCALATE),
+        # both exhausted -> freeze
+        (5, 3600.0, Action.FREEZE_ESCALATE),
+    ],
+)
+def test_pushed_red_fix_forward_until_budget_exhausted(
+    make_config, make_run_state, attempts, elapsed, expected
+):
+    state = make_run_state(
+        thread_status=ThreadStatus.IDLE,
+        ci_status=CIStatus.RED,
+        pushed=True,
+        fix_forward_attempts=attempts,
+        elapsed_seconds=elapsed,
+    )
+    assert next_action(state, make_config()) is expected
+
+
+# --------------------------------------------------------------------------- #
+# Fix-forward budget is honoured from config, not hardcoded.
+# --------------------------------------------------------------------------- #
+def test_fix_forward_attempts_cap_comes_from_config(make_config, make_run_state):
+    config = make_config(fix_forward_max_attempts=2)
+    red = dict(thread_status=ThreadStatus.IDLE, ci_status=CIStatus.RED, pushed=True)
+    assert next_action(make_run_state(fix_forward_attempts=1, **red), config) is Action.FIX_FORWARD
+    assert next_action(make_run_state(fix_forward_attempts=2, **red), config) is Action.FREEZE_ESCALATE
+
+
+def test_fix_forward_seconds_cap_comes_from_config(make_config, make_run_state):
+    config = make_config(fix_forward_max_seconds=120)
+    red = dict(thread_status=ThreadStatus.IDLE, ci_status=CIStatus.RED, pushed=True)
+    assert next_action(make_run_state(elapsed_seconds=119.0, **red), config) is Action.FIX_FORWARD
+    assert next_action(make_run_state(elapsed_seconds=120.0, **red), config) is Action.FREEZE_ESCALATE
+
+
+# --------------------------------------------------------------------------- #
+# Per spec, pushed + red CI is keyed only on (pushed AND red) + budget: thread
+# status doesn't gate it, even while the thread is still RUNNING a fix.
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize(
+    "thread_status",
+    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
+)
+def test_pushed_red_with_budget_fixes_forward_for_any_thread_status(
+    make_config, make_run_state, thread_status
+):
+    state = make_run_state(
+        thread_status=thread_status,
+        ci_status=CIStatus.RED,
+        pushed=True,
+        fix_forward_attempts=0,
+        elapsed_seconds=0.0,
+    )
+    assert next_action(state, make_config()) is Action.FIX_FORWARD
+
+
+# --------------------------------------------------------------------------- #
+# Full cross-product sanity sweep: next_action is TOTAL — it returns a real
+# Action for every reachable combination, and matches the reference table.
+# --------------------------------------------------------------------------- #
+def _expected(thread_status, ci_status, pushed):
+    """Reference implementation of the decision table, written independently of
+    the module under test, to cross-check every combination."""
+    if pushed and ci_status is CIStatus.GREEN:
+        return Action.CLOSE_SUCCESS
+    if pushed and ci_status is CIStatus.RED:
+        return Action.FIX_FORWARD  # budget always available in this sweep
+    if not pushed and thread_status in (ThreadStatus.ERROR, ThreadStatus.IDLE):
+        return Action.ESCALATE_PREPUSH
+    return Action.WAIT
+
+
+@pytest.mark.parametrize(
+    "thread_status",
+    [ThreadStatus.RUNNING, ThreadStatus.IDLE, ThreadStatus.ERROR, None],
+)
+@pytest.mark.parametrize("ci_status", [None, CIStatus.PENDING, CIStatus.GREEN, CIStatus.RED])
+@pytest.mark.parametrize("pushed", [True, False])
+def test_decision_table_is_total(
+    make_config, make_run_state, thread_status, ci_status, pushed
+):
+    state = make_run_state(
+        thread_status=thread_status,
+        ci_status=ci_status,
+        pushed=pushed,
+        fix_forward_attempts=0,
+        elapsed_seconds=0.0,
+    )
+    result = next_action(state, make_config())
+    assert isinstance(result, Action)
+    assert result is _expected(thread_status, ci_status, pushed)
--- a/tests/test_afk_t3_client.py
+++ b/tests/test_afk_t3_client.py
@ -0,0 +1,265 @@
+"""Tests for ``app.afk.t3_client`` — the in-cluster T3 dispatch/snapshot adapter.
+
+Everything runs against an in-memory FAKE HTTP transport; no test touches a real
+T3 server. These assertions pin the **real** orchestration wire contract
+(reverse-engineered from T3 v0.0.27 and verified live against t3-afk on
+2026-06-15), kept deliberately strict because the previous adapter passed a
+laxer fake while the real server returned HTTP 400. The fake therefore *rejects*
+a command without a ``type`` discriminator, so a regression to the old
+``{"command": "..."}`` shape fails loudly here.
+
+Pinned facts:
+  * the dispatch body is a BARE command keyed by ``type`` (not ``command``);
+  * the CLIENT mints ``threadId``/``commandId``/``messageId`` + ``createdAt``;
+    ``dispatch`` returns the id it generated (the server replies ``{sequence}``);
+  * a thread lives in a project, so ``dispatch`` ensures the repo's project
+    (snapshot GET → ``project.create`` iff absent) before ``thread.create``;
+  * ``ISSUE_IMPLEMENTER_PREAMBLE`` is prepended to the opening turn's text;
+  * ``send_turn`` posts a follow-up turn (no preamble) on an existing thread;
+  * every request carries ``Authorization: Bearer <token>``, re-read per call.
+"""
+import pytest
+
+from app.afk import t3_client
+from app.afk.issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE
+
+_MODEL = "claude-sonnet-4-6"
+
+
+# --------------------------------------------------------------------------- #
+# Fake HTTP transport — httpx-shaped, but it ENFORCES the command envelope so a
+# malformed command (the old bug) raises instead of silently passing.
+# --------------------------------------------------------------------------- #
+class FakeResponse:
+    def __init__(self, payload: dict, status_code: int = 200) -> None:
+        self._payload = payload
+        self.status_code = status_code
+
+    def json(self) -> dict:
+        return self._payload
+
+    def raise_for_status(self) -> None:
+        if self.status_code >= 400:
+            raise RuntimeError(f"HTTP {self.status_code}")
+
+
+class FakeHttp:
+    """Records each POST/GET; GETs replay staged snapshots (default: no projects,
+    so ``dispatch`` creates one). POST bodies are validated as real commands."""
+
+    def __init__(self, get_responses: list[dict] | None = None) -> None:
+        self.get_responses = list(get_responses or [])
+        self.posts: list[dict] = []
+        self.gets: list[dict] = []
+
+    def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
+        assert isinstance(json.get("type"), str) and json["type"], (
+            f"command must carry a non-empty `type` discriminator, got {json!r}"
+        )
+        self.posts.append({"url": url, "json": json, "headers": headers})
+        return FakeResponse({"sequence": len(self.posts)})  # the real server reply
+
+    def get(self, url: str, headers: dict) -> FakeResponse:
+        self.gets.append({"url": url, "headers": headers})
+        body = self.get_responses.pop(0) if self.get_responses else {"projects": []}
+        return FakeResponse(body)
+
+    # Convenience views over recorded POSTs, keyed by command type.
+    def commands(self, type_: str) -> list[dict]:
+        return [c["json"] for c in self.posts if c["json"]["type"] == type_]
+
+
+def _ids():
+    """Deterministic id factory: id-1, id-2, … so tests can reason about minting."""
+    n = {"i": 0}
+
+    def f() -> str:
+        n["i"] += 1
+        return f"id-{n['i']}"
+
+    return f
+
+
+def _resolver(repo: str) -> t3_client.ProjectRef:
+    """Predictable repo -> project mapping for assertions."""
+    return t3_client.ProjectRef(f"proj-{repo}", f"/data/{repo}", repo)
+
+
+def _client(http: FakeHttp, *, base_url="http://t3-afk:8080", token="tok-1", **kw):
+    return t3_client.T3Client(
+        base_url=base_url,
+        http=http,
+        bearer_provider=lambda: token,
+        project_resolver=_resolver,
+        id_factory=kw.pop("id_factory", _ids()),
+        clock=kw.pop("clock", lambda: "2026-06-15T00:00:00+00:00"),
+        model=_MODEL,
+    )
+
+
+def _dispatch(http: FakeHttp, *, repo="infra", issue=42, prompt="Do the thing.", **kw):
+    return _client(http, **kw).dispatch(repo=repo, issue=issue, prompt=prompt)
+
+
+# --------------------------------------------------------------------------- #
+# dispatch — ensure-project, then create, then turn.
+# --------------------------------------------------------------------------- #
+def test_dispatch_ensures_project_then_creates_thread_then_turn_when_project_absent():
+    http = FakeHttp(get_responses=[{"projects": []}])
+    _dispatch(http)
+    # one snapshot GET (the existence check) + three POSTs in order.
+    assert len(http.gets) == 1
+    types = [c["json"]["type"] for c in http.posts]
+    assert types == ["project.create", "thread.create", "thread.turn.start"]
+    for call in http.posts:
+        assert call["url"] == "http://t3-afk:8080/api/orchestration/dispatch"
+
+
+def test_dispatch_skips_project_create_when_project_already_exists():
+    http = FakeHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
+    _dispatch(http, repo="infra")
+    types = [c["json"]["type"] for c in http.posts]
+    assert types == ["thread.create", "thread.turn.start"]  # idempotent: no re-create
+
+
+def test_dispatch_uses_type_discriminator_not_command_string():
+    # Regression guard for the original bug: discriminator is `type`, and there is
+    # no legacy top-level `command` string key on any command.
+    http = FakeHttp()
+    _dispatch(http)
+    for c in http.posts:
+        assert "type" in c["json"]
+        assert not isinstance(c["json"].get("command"), str)
+
+
+# --------------------------------------------------------------------------- #
+# dispatch — thread.create real field set.
+# --------------------------------------------------------------------------- #
+def test_thread_create_carries_real_required_fields():
+    http = FakeHttp()
+    _dispatch(http, repo="infra")
+    create = http.commands("thread.create")[0]
+    assert create["projectId"] == "proj-infra"
+    assert create["modelSelection"] == {"instanceId": "claudeAgent", "model": _MODEL}
+    assert create["runtimeMode"] == "full-access"
+    assert create["interactionMode"] == "default"
+    # NullOr fields are present (not omitted) — the schema requires the keys.
+    assert create["branch"] is None
+    assert create["worktreePath"] is None
+    # client-minted identity + timestamp.
+    assert isinstance(create["commandId"], str) and create["commandId"]
+    assert isinstance(create["threadId"], str) and create["threadId"]
+    assert create["createdAt"] == "2026-06-15T00:00:00+00:00"
+
+
+def test_dispatch_returns_client_minted_thread_id_not_a_server_value():
+    http = FakeHttp()
+    returned = _dispatch(http)
+    create = http.commands("thread.create")[0]
+    turn = http.commands("thread.turn.start")[0]
+    # The returned id is the one WE put on thread.create (server only sends {sequence}).
+    assert returned == create["threadId"] == turn["threadId"]
+
+
+# --------------------------------------------------------------------------- #
+# dispatch — thread.turn.start real message shape + preamble.
+# --------------------------------------------------------------------------- #
+def test_turn_message_has_real_shape_and_prepends_preamble():
+    http = FakeHttp()
+    _dispatch(http, prompt="Implement issue 42 body here.")
+    turn = http.commands("thread.turn.start")[0]
+    msg = turn["message"]
+    assert msg["role"] == "user"
+    assert isinstance(msg["messageId"], str) and msg["messageId"]
+    assert msg["attachments"] == []
+    assert msg["text"] == ISSUE_IMPLEMENTER_PREAMBLE + "Implement issue 42 body here."
+    assert turn["runtimeMode"] == "full-access"
+    assert turn["interactionMode"] == "default"
+
+
+def test_preamble_only_on_turn_not_on_create():
+    http = FakeHttp()
+    _dispatch(http)
+    assert "message" not in http.commands("thread.create")[0]
+
+
+# --------------------------------------------------------------------------- #
+# send_turn — follow-up turn on an existing thread (multi-turn), no preamble.
+# --------------------------------------------------------------------------- #
+def test_send_turn_posts_single_turn_to_existing_thread_without_preamble():
+    http = FakeHttp()
+    _client(http).send_turn("thread-xyz", "Just this follow-up.")
+    assert [c["json"]["type"] for c in http.posts] == ["thread.turn.start"]
+    turn = http.commands("thread.turn.start")[0]
+    assert turn["threadId"] == "thread-xyz"
+    assert turn["message"]["text"] == "Just this follow-up."  # verbatim, no preamble
+    assert http.gets == []  # no project work for a follow-up
+
+
+# --------------------------------------------------------------------------- #
+# Auth — bearer on every request, re-read per call.
+# --------------------------------------------------------------------------- #
+def test_every_request_sends_bearer():
+    http = FakeHttp()
+    _dispatch(http, token="secret-token")
+    for call in http.posts:
+        assert call["headers"]["Authorization"] == "Bearer secret-token"
+    for call in http.gets:
+        assert call["headers"]["Authorization"] == "Bearer secret-token"
+
+
+def test_bearer_is_reread_per_request_so_rotation_is_honoured():
+    tokens = iter(["tok-A", "tok-B", "tok-C", "tok-D", "tok-E"])
+    http = FakeHttp()
+    client = t3_client.T3Client(
+        base_url="http://t3-afk:8080",
+        http=http,
+        bearer_provider=lambda: next(tokens),
+        project_resolver=_resolver,
+        id_factory=_ids(),
+        clock=lambda: "t",
+    )
+    client.dispatch(repo="infra", issue=1, prompt="x")
+    # GET(ensure) then POST(project.create) then POST(create) then POST(turn) —
+    # each pulled a fresh token in call order.
+    assert http.gets[0]["headers"]["Authorization"] == "Bearer tok-A"
+    assert http.posts[0]["headers"]["Authorization"] == "Bearer tok-B"
+    assert http.posts[1]["headers"]["Authorization"] == "Bearer tok-C"
+    assert http.posts[2]["headers"]["Authorization"] == "Bearer tok-D"
+
+
+# --------------------------------------------------------------------------- #
+# snapshot — GET + parse.
+# --------------------------------------------------------------------------- #
+def test_snapshot_gets_endpoint_and_returns_parsed_body():
+    fleet = {"threads": [{"id": "t1", "latestTurn": {"state": "running"}}], "projects": []}
+    http = FakeHttp(get_responses=[fleet])
+    result = _client(http).snapshot()
+    assert result == fleet
+    assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"
+    assert http.posts == []
+
+
+# --------------------------------------------------------------------------- #
+# base_url normalisation + error surfacing.
+# --------------------------------------------------------------------------- #
+def test_trailing_slash_in_base_url_is_normalised():
+    http = FakeHttp()
+    client = _client(http, base_url="http://t3-afk:8080/")
+    client.dispatch(repo="infra", issue=1, prompt="x")
+    assert http.posts[0]["url"] == "http://t3-afk:8080/api/orchestration/dispatch"
+    assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"
+
+
+def test_dispatch_raises_and_short_circuits_when_a_post_errors():
+    class ErroringHttp(FakeHttp):
+        def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
+            super().post(url, json, headers)  # validates + records
+            return FakeResponse({}, status_code=500)
+
+    http = ErroringHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
+    with pytest.raises(RuntimeError):
+        _dispatch(http, repo="infra")
+    # Project already existed, so the FIRST post is thread.create — and it failed,
+    # so thread.turn.start never fired.
+    assert [c["json"]["type"] for c in http.posts] == ["thread.create"]
--- a/tests/test_afk_t3_live.py
+++ b/tests/test_afk_t3_live.py
@ -0,0 +1,92 @@
+"""LIVE smoke test for ``app.afk.t3_client`` against a real T3 instance.
+
+Skipped by default. The unit tests (``test_afk_t3_client``) pin the wire shape
+against a contract-accurate fake; this file proves the *same code* actually talks
+to a live T3 — the guard that "green tests" mean "wired to T3", which the earlier
+fake-only suite did NOT provide (it was green while the real server 400'd).
+
+It is opt-in because the orchestration API is in-cluster (ClusterIP + an
+Authentik-gated ingress), so it can't run in CI without cluster access. Run it
+from inside the cluster, or via a port-forward, with a bearer minted on the pod::
+
+    # bearer (on the t3-afk pod, as the node user):
+    #   t3 auth session issue --token-only --base-dir /data/t3 --ttl 30m
+    kubectl -n t3-afk port-forward deploy/t3-afk 3773:3773 &
+    T3_AFK_BASE_URL=http://127.0.0.1:3773 T3_AFK_TOKEN=<bearer> \
+        python3 -m pytest tests/test_afk_t3_live.py -v
+
+The read-only snapshot check is always safe. The full dispatch round-trip
+(create thread + turn + verify it appears, then delete it) only runs with
+``T3_AFK_SMOKE_DISPATCH=1`` since it spends a (tiny) agent turn.
+"""
+import os
+import time
+
+import pytest
+
+from app.afk import t3_client
+
+_BASE_URL = os.environ.get("T3_AFK_BASE_URL")
+_TOKEN = os.environ.get("T3_AFK_TOKEN")
+
+pytestmark = pytest.mark.skipif(
+    not (_BASE_URL and _TOKEN),
+    reason="set T3_AFK_BASE_URL + T3_AFK_TOKEN to run the live T3 smoke test",
+)
+
+
+def _real_client():
+    import httpx  # local import so the module imports fine without httpx installed
+
+    return t3_client.T3Client(
+        base_url=_BASE_URL,
+        http=httpx.Client(timeout=30.0),
+        bearer_provider=lambda: _TOKEN,
+    )
+
+
+def test_live_snapshot_has_the_real_shape():
+    """A real snapshot parses and carries the keys the watcher/adapter depend on:
+    ``threads`` + ``projects``, and any thread exposes ``latestTurn`` (the
+    liveness source) — not a top-level ``status``."""
+    snap = _real_client().snapshot()
+    assert isinstance(snap, dict)
+    assert "threads" in snap and "projects" in snap
+    for thread in snap["threads"]:
+        assert "id" in thread
+        # liveness lives under latestTurn.state (the contract this suite guards)
+        assert "status" not in thread, "real threads have no top-level status field"
+
+
+@pytest.mark.skipif(
+    os.environ.get("T3_AFK_SMOKE_DISPATCH") != "1",
+    reason="set T3_AFK_SMOKE_DISPATCH=1 to run the dispatch round-trip (spends a turn)",
+)
+def test_live_dispatch_round_trip_then_cleanup():
+    """End-to-end against the real server: ``dispatch`` (ensure-project + create +
+    turn) succeeds and the new thread shows up in the snapshot. Cleans up the
+    thread it created so the cockpit isn't littered."""
+    import httpx
+
+    repo = "afk-smoke/roundtrip"
+    client = _real_client()
+    thread_id = client.dispatch(repo, 1, "Reply with just: ok. Do not use any tools.")
+    assert isinstance(thread_id, str) and thread_id
+
+    # The thread must appear in the fleet read-model (poll briefly — dispatch is
+    # accepted asynchronously).
+    found = False
+    for _ in range(10):
+        if any(t.get("id") == thread_id for t in client.snapshot().get("threads", [])):
+            found = True
+            break
+        time.sleep(1.0)
+    assert found, f"dispatched thread {thread_id} never appeared in the snapshot"
+
+    # Cleanup: delete the throwaway thread (raw command — not part of the adapter).
+    httpx.post(
+        f"{_BASE_URL.rstrip('/')}/api/orchestration/dispatch",
+        headers={"Authorization": f"Bearer {_TOKEN}"},
+        json={"type": "thread.delete", "commandId": t3_client._uuid(), "threadId": thread_id},
+        timeout=30.0,
+    ).raise_for_status()
--- a/tests/test_afk_tracker.py
+++ b/tests/test_afk_tracker.py
@ -0,0 +1,493 @@
+"""Tests for ``app.afk.tracker`` — the GitHub issues adapter.
+
+The ``Tracker`` is the loop's read/write port onto the issue tracker. It wraps
+an injected GitHub client (the real one shells out to ``gh``; here we inject a
+FAKE that records calls and replays staged data) and holds all the *business*
+logic the loop depends on: turning raw issues into ``Issue`` records with
+``blocked_by`` parsed, ``labeled_by_trusted`` decided fail-closed from the label
+event actor, and ``priority`` read off a priority label. No test here reaches a
+real ``gh``, GitHub/Forgejo, or the network.
+"""
+import pytest
+
+from app.afk.tracker import (
+    DEFAULT_TRUSTED_ASSOCIATIONS,
+    GitHubClient,
+    Tracker,
+)
+from app.afk.types import Issue
+
+
+# --------------------------------------------------------------------------- #
+# Fake GitHub client — the injected port. Records every mutating call and
+# replays issues / label-events staged per repo. Implements the GitHubClient
+# Protocol the Tracker depends on.
+# --------------------------------------------------------------------------- #
+class FakeGitHub:
+    def __init__(self) -> None:
+        # repo -> list of raw issue dicts (gh issue list --json shape)
+        self._issues: dict[str, list[dict]] = {}
+        # (repo, number) -> list of label-event dicts (who added which label)
+        self._events: dict[tuple[str, int], list[dict]] = {}
+        # recorded mutations
+        self.labels_added: list[tuple[str, int, str]] = []
+        self.labels_removed: list[tuple[str, int, str]] = []
+        self.comments: list[tuple[str, int, str]] = []
+        self.closed: list[tuple[str, int]] = []
+
+    # --- staging helpers (test-only) --- #
+    def seed_issues(self, repo: str, issues: list[dict]) -> None:
+        self._issues[repo] = issues
+
+    def seed_label_events(self, repo: str, number: int, events: list[dict]) -> None:
+        self._events[(repo, number)] = events
+
+    # --- GitHubClient surface --- #
+    def list_issues(self, repo: str, label: str) -> list[dict]:
+        return [
+            issue
+            for issue in self._issues.get(repo, [])
+            if label in [lbl["name"] for lbl in issue.get("labels", [])]
+        ]
+
+    def label_events(self, repo: str, number: int) -> list[dict]:
+        return list(self._events.get((repo, number), []))
+
+    def add_label(self, repo: str, number: int, label: str) -> None:
+        self.labels_added.append((repo, number, label))
+
+    def remove_label(self, repo: str, number: int, label: str) -> None:
+        self.labels_removed.append((repo, number, label))
+
+    def comment(self, repo: str, number: int, body: str) -> None:
+        self.comments.append((repo, number, body))
+
+    def close(self, repo: str, number: int) -> None:
+        self.closed.append((repo, number))
+
+
+# --------------------------------------------------------------------------- #
+# Raw-issue / event builders matching the gh JSON shapes the real client emits.
+# --------------------------------------------------------------------------- #
+def _raw_issue(
+    number: int = 1,
+    labels: list[str] | None = None,
+    body: str = "",
+) -> dict:
+    return {
+        "number": number,
+        "labels": [{"name": name} for name in (labels or ["ready-for-agent"])],
+        "body": body,
+    }
+
+
+def _label_event(label: str, association: str = "OWNER", actor: str = "viktorbarzin") -> dict:
+    # Mirrors the `gh api .../timeline` "labeled" event shape we care about.
+    return {
+        "event": "labeled",
+        "label": {"name": label},
+        "actor": {"login": actor},
+        "author_association": association,
+    }
+
+
+@pytest.fixture
+def gh() -> FakeGitHub:
+    return FakeGitHub()
+
+
+@pytest.fixture
+def tracker(gh: FakeGitHub) -> Tracker:
+    return Tracker(gh)
+
+
+# --------------------------------------------------------------------------- #
+# Construction / contract.
+# --------------------------------------------------------------------------- #
+def test_tracker_wraps_injected_client(gh: FakeGitHub):
+    t = Tracker(gh)
+    assert t.client is gh
+
+
+def test_fake_satisfies_protocol(gh: FakeGitHub):
+    # The fake must be usable where a GitHubClient is expected (structural typing).
+    assert isinstance(gh, GitHubClient)
+
+
+def test_default_trusted_associations_are_collaborator_or_above():
+    assert DEFAULT_TRUSTED_ASSOCIATIONS == frozenset({"OWNER", "MEMBER", "COLLABORATOR"})
+
+
+# --------------------------------------------------------------------------- #
+# list_ready — the read path that builds Issue records.
+# --------------------------------------------------------------------------- #
+def test_list_ready_returns_issue_objects(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=7)])
+    gh.seed_label_events("infra", 7, [_label_event("ready-for-agent")])
+
+    issues = tracker.list_ready(["infra"])
+
+    assert len(issues) == 1
+    issue = issues[0]
+    assert isinstance(issue, Issue)
+    assert issue.number == 7
+    assert issue.repo == "infra"
+    assert issue.labels == ["ready-for-agent"]
+
+
+def test_list_ready_spans_multiple_repos(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_issues("crawler", [_raw_issue(number=2)])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
+    gh.seed_label_events("crawler", 2, [_label_event("ready-for-agent")])
+
+    issues = tracker.list_ready(["infra", "crawler"])
+
+    assert {(i.repo, i.number) for i in issues} == {("infra", 1), ("crawler", 2)}
+
+
+def test_list_ready_empty_when_no_ready_issues(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=1, labels=["bug"])])
+    assert tracker.list_ready(["infra"]) == []
+
+
+def test_list_ready_queries_with_configured_ready_label(gh: FakeGitHub):
+    # A Tracker built with a custom ready label must query the client for *that*
+    # label, not the default.
+    seen: dict[str, str] = {}
+
+    class _RecordingGitHub(FakeGitHub):
+        def list_issues(self, repo: str, label: str) -> list[dict]:
+            seen["label"] = label
+            return super().list_issues(repo, label)
+
+    rec = _RecordingGitHub()
+    rec.seed_issues("infra", [_raw_issue(number=1, labels=["queue-me"])])
+    rec.seed_label_events("infra", 1, [_label_event("queue-me")])
+    t = Tracker(rec, ready_label="queue-me")
+
+    issues = t.list_ready(["infra"])
+
+    assert seen["label"] == "queue-me"
+    assert len(issues) == 1
+
+
+# --------------------------------------------------------------------------- #
+# Trust gate — labeled_by_trusted is decided from the label-event actor,
+# fail-closed.
+# --------------------------------------------------------------------------- #
+def test_owner_labeled_issue_is_trusted(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association="OWNER")])
+
+    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True
+
+
+@pytest.mark.parametrize("association", ["MEMBER", "COLLABORATOR"])
+def test_collaborator_and_member_are_trusted(gh: FakeGitHub, tracker: Tracker, association: str):
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association=association)])
+
+    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True
+
+
+@pytest.mark.parametrize("association", ["NONE", "CONTRIBUTOR", "FIRST_TIME_CONTRIBUTOR", ""])
+def test_untrusted_association_is_not_trusted(gh: FakeGitHub, tracker: Tracker, association: str):
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association=association)])
+
+    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False
+
+
+def test_missing_label_event_is_not_trusted(gh: FakeGitHub, tracker: Tracker):
+    # The issue carries the ready label, but no event records WHO applied it —
+    # fail closed: an unattributable label is never trusted.
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_label_events("infra", 1, [])
+
+    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False
+
+
+def test_trust_uses_latest_application_of_ready_label(gh: FakeGitHub, tracker: Tracker):
+    # If the ready label was removed and re-added, the MOST RECENT application
+    # decides trust — a trusted re-label after an untrusted one is trusted.
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_label_events(
+        "infra",
+        1,
+        [
+            _label_event("ready-for-agent", association="NONE", actor="drive-by"),
+            _label_event("ready-for-agent", association="OWNER", actor="viktorbarzin"),
+        ],
+    )
+
+    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is True
+
+
+def test_trust_ignores_events_for_other_labels(gh: FakeGitHub, tracker: Tracker):
+    # A trusted actor labeling something else must not make the ready label trusted.
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_label_events(
+        "infra",
+        1,
+        [
+            _label_event("priority:high", association="OWNER"),
+            _label_event("ready-for-agent", association="NONE", actor="drive-by"),
+        ],
+    )
+
+    assert tracker.list_ready(["infra"])[0].labeled_by_trusted is False
+
+
+def test_custom_trusted_associations_override_default(gh: FakeGitHub):
+    # Tighten the trust set to OWNER only: a COLLABORATOR label is no longer trusted.
+    t = Tracker(gh, trusted_associations=frozenset({"OWNER"}))
+    gh.seed_issues("infra", [_raw_issue(number=1)])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent", association="COLLABORATOR")])
+
+    assert t.list_ready(["infra"])[0].labeled_by_trusted is False
+
+
+# --------------------------------------------------------------------------- #
+# blocked_by — parsed from the issue body's "Blocked by" references.
+# --------------------------------------------------------------------------- #
+def test_blocked_by_empty_when_body_has_no_references(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=1, body="just implement the thing")])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].blocked_by == []
+
+
+def test_blocked_by_parses_single_reference(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=5, body="Blocked by #3")])
+    gh.seed_label_events("infra", 5, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].blocked_by == [3]
+
+
+def test_blocked_by_parses_multiple_references(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=9, body="Blocked by #3, #4 and #10")])
+    gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].blocked_by == [3, 4, 10]
+
+
+def test_blocked_by_is_case_insensitive_and_dedupes(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=9, body="blocked BY #3 and Blocked by #3, #4")])
+    gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].blocked_by == [3, 4]
+
+
+def test_blocked_by_ignores_plain_issue_mentions(gh: FakeGitHub, tracker: Tracker):
+    # A bare "#7" that is not part of a "Blocked by" clause is NOT a blocker.
+    gh.seed_issues("infra", [_raw_issue(number=9, body="See #7 for context. Blocked by #3")])
+    gh.seed_label_events("infra", 9, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].blocked_by == [3]
+
+
+def test_blocked_by_tolerates_missing_body(gh: FakeGitHub, tracker: Tracker):
+    issue = _raw_issue(number=1)
+    issue["body"] = None  # gh returns null for an empty body
+    gh.seed_issues("infra", [issue])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].blocked_by == []
+
+
+# --------------------------------------------------------------------------- #
+# priority — read off a priority label (lower number runs first).
+# --------------------------------------------------------------------------- #
+def test_priority_defaults_to_zero_without_priority_label(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=1, labels=["ready-for-agent"])])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].priority == 0
+
+
+def test_priority_read_from_priority_label(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues("infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:2"])])
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].priority == 2
+
+
+def test_priority_lowest_label_wins_when_several(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues(
+        "infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:5", "priority:1"])]
+    )
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].priority == 1
+
+
+def test_priority_ignores_non_numeric_priority_label(gh: FakeGitHub, tracker: Tracker):
+    gh.seed_issues(
+        "infra", [_raw_issue(number=1, labels=["ready-for-agent", "priority:high"])]
+    )
+    gh.seed_label_events("infra", 1, [_label_event("ready-for-agent")])
+
+    assert tracker.list_ready(["infra"])[0].priority == 0
+
+
+# --------------------------------------------------------------------------- #
+# Mutations delegate to the injected client.
+# --------------------------------------------------------------------------- #
+def test_add_label_delegates(gh: FakeGitHub, tracker: Tracker):
+    tracker.add_label("infra", 7, "agent-in-progress")
+    assert gh.labels_added == [("infra", 7, "agent-in-progress")]
+
+
+def test_remove_label_delegates(gh: FakeGitHub, tracker: Tracker):
+    tracker.remove_label("infra", 7, "agent-in-progress")
+    assert gh.labels_removed == [("infra", 7, "agent-in-progress")]
+
+
+def test_comment_delegates(gh: FakeGitHub, tracker: Tracker):
+    tracker.comment("infra", 7, "phase: tests-red done")
+    assert gh.comments == [("infra", 7, "phase: tests-red done")]
+
+
+def test_close_delegates(gh: FakeGitHub, tracker: Tracker):
+    tracker.close("infra", 7)
+    assert gh.closed == [("infra", 7)]
+
+
+# --------------------------------------------------------------------------- #
+# The concrete gh-CLI-backed client builds no-shell argv and parses JSON; we
+# inject a fake runner so no real `gh` is ever spawned.
+# --------------------------------------------------------------------------- #
+from app.afk.tracker import GhCliClient  # noqa: E402
+
+
+class _FakeRunner:
+    """Stand-in for the subprocess runner GhCliClient shells out through.
+
+    Records every argv and returns staged stdout per command, so we can pin the
+    exact `gh` invocations without spawning a process.
+    """
+
+    def __init__(self, responses: dict[tuple[str, ...], str] | None = None) -> None:
+        self.calls: list[tuple[str, ...]] = []
+        self._responses = responses or {}
+
+    def __call__(self, argv: list[str]) -> str:
+        key = tuple(argv)
+        self.calls.append(key)
+        return self._responses.get(key, "")
+
+
+def test_gh_cli_list_issues_builds_no_shell_argv_and_parses_json():
+    argv = (
+        "gh", "issue", "list", "--repo", "owner/infra",
+        "--label", "ready-for-agent", "--state", "open",
+        "--json", "number,labels,body", "--limit", "100",
+    )
+    runner = _FakeRunner({argv: '[{"number": 4, "labels": [{"name": "ready-for-agent"}], "body": "x"}]'})
+    client = GhCliClient(repo_owner="owner", run=runner)
+
+    issues = client.list_issues("infra", "ready-for-agent")
+
+    assert runner.calls == [argv]
+    assert issues == [{"number": 4, "labels": [{"name": "ready-for-agent"}], "body": "x"}]
+
+
+def test_gh_cli_list_issues_empty_output_is_empty_list():
+    runner = _FakeRunner()  # returns "" for everything
+    client = GhCliClient(repo_owner="owner", run=runner)
+    assert client.list_issues("infra", "ready-for-agent") == []
+
+
+def test_gh_cli_label_events_filters_labeled_events():
+    timeline = (
+        '[{"event": "commented"},'
+        ' {"event": "labeled", "label": {"name": "ready-for-agent"},'
+        '  "actor": {"login": "viktorbarzin"}, "author_association": "OWNER"}]'
+    )
+    argv = (
+        "gh", "api",
+        "repos/owner/infra/issues/4/timeline",
+        "--paginate",
+        "-H", "Accept: application/vnd.github+json",
+    )
+    runner = _FakeRunner({argv: timeline})
+    client = GhCliClient(repo_owner="owner", run=runner)
+
+    events = client.label_events("infra", 4)
+
+    assert runner.calls == [argv]
+    assert [e["event"] for e in events] == ["labeled"]
+    assert events[0]["label"]["name"] == "ready-for-agent"
+
+
+def test_gh_cli_add_label_builds_argv():
+    runner = _FakeRunner()
+    client = GhCliClient(repo_owner="owner", run=runner)
+    client.add_label("infra", 4, "agent-in-progress")
+    assert runner.calls == [
+        ("gh", "issue", "edit", "4", "--repo", "owner/infra", "--add-label", "agent-in-progress")
+    ]
+
+
+def test_gh_cli_remove_label_builds_argv():
+    runner = _FakeRunner()
+    client = GhCliClient(repo_owner="owner", run=runner)
+    client.remove_label("infra", 4, "agent-in-progress")
+    assert runner.calls == [
+        ("gh", "issue", "edit", "4", "--repo", "owner/infra", "--remove-label", "agent-in-progress")
+    ]
+
+
+def test_gh_cli_comment_builds_argv():
+    runner = _FakeRunner()
+    client = GhCliClient(repo_owner="owner", run=runner)
+    client.comment("infra", 4, "phase update")
+    assert runner.calls == [
+        ("gh", "issue", "comment", "4", "--repo", "owner/infra", "--body", "phase update")
+    ]
+
+
+def test_gh_cli_close_builds_argv():
+    runner = _FakeRunner()
+    client = GhCliClient(repo_owner="owner", run=runner)
+    client.close("infra", 4)
+    assert runner.calls == [
+        ("gh", "issue", "close", "4", "--repo", "owner/infra")
+    ]
+
+
+def test_gh_cli_end_to_end_through_tracker():
+    # Wire the gh-CLI client (fake runner) behind a real Tracker and confirm a
+    # full read produces a correctly-decoded, trusted, blocked Issue.
+    list_argv = (
+        "gh", "issue", "list", "--repo", "owner/infra",
+        "--label", "ready-for-agent", "--state", "open",
+        "--json", "number,labels,body", "--limit", "100",
+    )
+    timeline_argv = (
+        "gh", "api",
+        "repos/owner/infra/issues/12/timeline",
+        "--paginate",
+        "-H", "Accept: application/vnd.github+json",
+    )
+    runner = _FakeRunner({
+        list_argv: (
+            '[{"number": 12,'
+            '  "labels": [{"name": "ready-for-agent"}, {"name": "priority:3"}],'
+            '  "body": "Blocked by #11"}]'
+        ),
+        timeline_argv: (
+            '[{"event": "labeled", "label": {"name": "ready-for-agent"},'
+            '  "actor": {"login": "viktorbarzin"}, "author_association": "OWNER"}]'
+        ),
+    })
+    tracker = Tracker(GhCliClient(repo_owner="owner", run=runner))
+
+    issue = tracker.list_ready(["infra"])[0]
+
+    assert issue.number == 12
+    assert issue.repo == "infra"
+    assert issue.blocked_by == [11]
+    assert issue.priority == 3
+    assert issue.labeled_by_trusted is True
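Reviewer note: the `blocked_by` and `priority` tests in this file pin the parsing rules closely enough to sketch them. What follows is a minimal illustration under assumptions, not the code in `app/afk/tracker.py`, which this diff does not show. The names `parse_blocked_by` and `parse_priority` are hypothetical, and labels are taken as plain name strings rather than the `{"name": ...}` objects that `gh` returns.

```python
import re

# Hypothetical helpers; the real Tracker internals are not shown in this diff.
# A "Blocked by" clause is one reference followed by more references joined
# by ",", "and", or ", and". Bare "#7" mentions elsewhere never match.
_BLOCKED_BY = re.compile(
    r"blocked\s+by\s+(#\d+(?:\s*(?:,\s*and|,|and)\s*#\d+)*)",
    re.IGNORECASE,
)
_PRIORITY = re.compile(r"priority:(\d+)")


def parse_blocked_by(body):
    """Return issue numbers named in "Blocked by" clauses, in first-seen order."""
    if not body:  # a missing body arrives as None
        return []
    numbers = []
    for clause in _BLOCKED_BY.findall(body):
        for ref in re.findall(r"#(\d+)", clause):
            number = int(ref)
            if number not in numbers:  # dedupe while keeping order
                numbers.append(number)
    return numbers


def parse_priority(label_names):
    """Return the lowest numeric priority label, or 0 when there is none."""
    values = [
        int(match.group(1))
        for name in label_names
        if (match := _PRIORITY.fullmatch(name))
    ]
    return min(values, default=0)
```

Matching the whole clause first and extracting numbers second is what keeps an unrelated `#7` out of the result. Using `fullmatch` lets a non-numeric label such as `priority:high` fall through to the default.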
--- a/tests/test_afk_watcher.py
+++ b/tests/test_afk_watcher.py
@ -0,0 +1,403 @@
+"""Integration tests for ``app.afk.watcher`` — the in-flight run driver.
+
+These wire the REAL pure cores (the actual ``run_state_machine.next_action`` and
+``phase_checklist.render``) to the in-memory adapter FAKES from ``conftest``
+(``FakeT3Client`` / ``FakeTracker`` / ``FakeCIWatcher`` / ``FakeNotifier``). No
+test touches a real T3 server, GitHub/Forgejo, the cluster, or Slack — the
+watcher is exercised end to end with fakes only at the I/O edges.
+
+What one watch tick must do (the watcher contract), given an in-flight run
+``(issue, thread_id, commit, bookkeeping)``:
+
+  * assemble a ``RunState`` from ``t3_client.snapshot()`` (the thread's liveness)
+    + ``ci_watcher.status(repo, commit)`` (the CI verdict, only when something is
+    pushed) + the run's own ``pushed`` / ``fix_forward_attempts`` /
+    ``elapsed_seconds`` bookkeeping, and feed it to the pure state machine;
+  * **CLOSE_SUCCESS** → ``tracker.close``, drop the in-progress label, post the
+    DONE checklist, and ring the ``done`` doorbell;
+  * **ESCALATE_PREPUSH / FREEZE_ESCALATE** → drop the in-progress label, relabel
+    ``ready-for-human``, ring the ``needs-human`` / ``frozen`` doorbell, post the
+    checklist — the run is handed back to a human;
+  * **FIX_FORWARD** → dispatch a corrective turn (``t3_client.dispatch``), bump
+    the fix-forward attempt count, keep the run in flight, refresh the checklist;
+    NOT terminal, so no doorbell and no label churn;
+  * **WAIT** → just refresh the progress checklist and keep waiting; no labels,
+    no close, no doorbell, no dispatch.
+"""
+import pytest
+
+from app.afk import watcher
+from app.afk.notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN
+from app.afk.types import CIStatus, Issue
+
+
+# --------------------------------------------------------------------------- #
+# Helpers.
+# --------------------------------------------------------------------------- #
+READY_FOR_HUMAN = "ready-for-human"
+
+
+def _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier) -> watcher.Watcher:
+    return watcher.Watcher(
+        t3_client=fake_t3,
+        tracker=fake_tracker,
+        ci_watcher=fake_ci,
+        notifier=fake_notifier,
+    )
+
+
+def _run(
+    issue: Issue,
+    thread_id: str = "thread-0",
+    commit: str | None = None,
+    fix_forward_attempts: int = 0,
+    elapsed_seconds: float = 0.0,
+) -> watcher.InFlightRun:
+    return watcher.InFlightRun(
+        issue=issue,
+        thread_id=thread_id,
+        commit=commit,
+        fix_forward_attempts=fix_forward_attempts,
+        elapsed_seconds=elapsed_seconds,
+    )
+
+
+# Map the tests' abstract liveness vocab to T3's REAL ``latestTurn.state`` strings
+# so call sites stay readable while the snapshot carries the true shape the
+# watcher parses (a finished turn is "completed", a failed one "errored",
+# "running" is itself real). Unknown values pass through verbatim.
+_REAL_STATE = {"idle": "completed", "error": "errored"}
+
+
+def _snapshot(thread_id: str, status: str) -> dict:
+    """A fleet snapshot with one thread whose latest turn is in ``status`` — real
+    shape ``threads[].latestTurn.state`` (not a top-level ``status`` field)."""
+    return {
+        "threads": [
+            {"id": thread_id, "latestTurn": {"state": _REAL_STATE.get(status, status)}}
+        ]
+    }
+
+
+def _labels(fake_tracker):
+    return [(op, repo, num, lbl) for (op, repo, num, lbl) in fake_tracker.label_ops]
+
+
+def _kinds(fake_notifier):
+    return [n["kind"] for n in fake_notifier.sent]
+
+
+# --------------------------------------------------------------------------- #
+# WAIT — agent still working, nothing pushed: refresh the checklist, no action.
+# --------------------------------------------------------------------------- #
+def test_wait_refreshes_checklist_and_does_nothing_else(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "running"))
+
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue), make_config()
+    )
+
+    assert result.action.value == "wait"
+    assert result.terminal is False
+    assert fake_tracker.closed == []
+    assert _labels(fake_tracker) == []          # no label churn while waiting
+    assert fake_notifier.sent == []             # no doorbell
+    assert fake_t3.dispatched == []             # no corrective turn
+    # The progress checklist was posted as a comment.
+    assert len(fake_tracker.comments) == 1
+    repo, num, body = fake_tracker.comments[0]
+    assert (repo, num) == ("infra", 7)
+    assert "AFK run progress" in body
+
+
+def test_wait_when_thread_missing_from_snapshot(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    # No snapshot entry for this thread yet -> thread_status None -> WAIT.
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot({"threads": []})
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue), make_config()
+    )
+    assert result.action.value == "wait"
+    assert result.terminal is False
+
+
+def test_pushed_ci_pending_waits(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "running"))
+    # commit present (pushed) but CI not yet decided -> PENDING -> WAIT.
+    fake_ci.set_status("infra", "deadbeef", CIStatus.PENDING)
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit="deadbeef"), make_config()
+    )
+    assert result.action.value == "wait"
+    assert fake_tracker.closed == []
+
+
+# --------------------------------------------------------------------------- #
+# CLOSE_SUCCESS — pushed + CI green: close, unlabel, DONE checklist, doorbell.
+# --------------------------------------------------------------------------- #
+def test_close_success_closes_and_unlabels_and_notifies(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
+    fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
+
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit="cafef00d"), make_config()
+    )
+
+    assert result.action.value == "close_success"
+    assert result.terminal is True
+    assert fake_tracker.closed == [("infra", 7)]
+    # in-progress label removed (no ready-for-human on the happy path).
+    assert ("remove", "infra", 7, "agent-in-progress") in _labels(fake_tracker)
+    assert ("add", "infra", 7, READY_FOR_HUMAN) not in _labels(fake_tracker)
+    # done doorbell fired with the thread deep-link target.
+    assert _kinds(fake_notifier) == [KIND_DONE]
+    assert fake_notifier.sent[0]["thread_id"] == "thread-0"
+    assert fake_notifier.sent[0]["issue"] is issue
+
+
+def test_close_success_posts_done_checklist(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
+    fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
+
+    _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit="cafef00d"), make_config()
+    )
+
+    # The final checklist shows the run DONE — every phase checked.
+    body = fake_tracker.comments[-1][2]
+    assert "Done — issue closed" in body
+    assert "- [ ]" not in body  # nothing left unchecked at DONE
+
+
+# --------------------------------------------------------------------------- #
+# ESCALATE_PREPUSH — agent stalled/errored before any push: hand to a human.
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize("thread_state", ["errored", "completed"])
+def test_escalate_prepush_relabels_and_notifies(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, thread_state
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", thread_state))
+
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+
+    assert result.action.value == "escalate_prepush"
+    assert result.terminal is True
+    assert fake_tracker.closed == []  # NOT closed — needs a human
+    labels = _labels(fake_tracker)
+    assert ("remove", "infra", 7, "agent-in-progress") in labels
+    assert ("add", "infra", 7, READY_FOR_HUMAN) in labels
+    assert _kinds(fake_notifier) == [KIND_NEEDS_HUMAN]
+
+
+# --------------------------------------------------------------------------- #
+# FREEZE_ESCALATE — pushed, CI red, fix-forward budget exhausted: freeze + page.
+# --------------------------------------------------------------------------- #
+def test_freeze_escalate_relabels_and_notifies(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
+    fake_ci.set_status("infra", "badc0de", CIStatus.RED)
+    config = make_config(fix_forward_max_attempts=3)
+
+    # attempts already at the cap -> budget exhausted -> FREEZE_ESCALATE.
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit="badc0de", fix_forward_attempts=3), config
+    )
+
+    assert result.action.value == "freeze_escalate"
+    assert result.terminal is True
+    assert fake_tracker.closed == []
+    labels = _labels(fake_tracker)
+    assert ("remove", "infra", 7, "agent-in-progress") in labels
+    assert ("add", "infra", 7, READY_FOR_HUMAN) in labels
+    assert _kinds(fake_notifier) == [KIND_FROZEN]
+
+
+# --------------------------------------------------------------------------- #
+# FIX_FORWARD — pushed, CI red, budget remaining: corrective turn, stay in flight.
+# --------------------------------------------------------------------------- #
+def test_fix_forward_dispatches_corrective_turn(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
+    fake_ci.set_status("infra", "badc0de", CIStatus.RED)
+    config = make_config(fix_forward_max_attempts=5)
+
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit="badc0de", fix_forward_attempts=1), config
+    )
+
+    assert result.action.value == "fix_forward"
+    assert result.terminal is False
+    # A corrective turn was dispatched against the same repo/issue.
+    assert len(fake_t3.dispatched) == 1
+    assert (fake_t3.dispatched[0]["repo"], fake_t3.dispatched[0]["issue"]) == ("infra", 7)
+    # Attempt count advanced and is surfaced on the result for the caller's
+    # bookkeeping on the next tick.
+    assert result.fix_forward_attempts == 2
+    # Not terminal: no close, no ready-for-human, no doorbell.
+    assert fake_tracker.closed == []
+    assert ("add", "infra", 7, READY_FOR_HUMAN) not in _labels(fake_tracker)
+    assert fake_notifier.sent == []
+
+
+def test_fix_forward_updates_thread_id_to_corrective_turn(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    # The corrective dispatch spawns a new thread; the result carries the new id
+    # so the next tick polls the right thread.
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
+    fake_ci.set_status("infra", "badc0de", CIStatus.RED)
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, thread_id="thread-old", commit="badc0de"), make_config()
+    )
+    assert result.thread_id == "thread-0"  # FakeT3Client hands back thread-0
+    assert result.thread_id != "thread-old"
+
+
+def test_fix_forward_note_appears_in_checklist(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
+    fake_ci.set_status("infra", "badc0de", CIStatus.RED)
+    _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit="badc0de", fix_forward_attempts=1), make_config()
+    )
+    body = fake_tracker.comments[-1][2]
+    assert "Fix-forward" in body
+
+
+# --------------------------------------------------------------------------- #
+# Unknown / unrecognised thread status folds to "keep waiting" (fail-safe).
+# --------------------------------------------------------------------------- #
+def test_unknown_thread_status_waits(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "provisioning"))  # not a known status
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+    # Unknown status must not escalate or close — treat as "no status yet".
+    assert result.action.value == "wait"
+    assert fake_tracker.closed == []
+    assert fake_notifier.sent == []
+
+
+# --------------------------------------------------------------------------- #
+# Real T3 ``latestTurn.state`` strings map to the right liveness (contract guard
+# against the snapshot-shape drift that the previous adapter/fake masked).
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize("state", ["running", "in_progress", "pending", "queued", "pendingInit"])
+def test_real_in_progress_states_keep_waiting(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, state
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": state}}]})
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+    assert result.action.value == "wait"  # still working -> keep polling
+
+
+def test_real_errored_state_escalates_when_nothing_pushed(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    # The real failure state is "errored" (not "error"); with nothing pushed it
+    # is a pre-push escalation, not a freeze.
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": "errored"}}]})
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+    assert result.action.value == "escalate_prepush"
+
+
+def test_thread_present_but_no_turn_yet_waits(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    # A freshly-created thread has no latestTurn -> no usable status yet -> WAIT.
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot({"threads": [{"id": "thread-0"}]})
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+    assert result.action.value == "wait"
+
+
+# --------------------------------------------------------------------------- #
+# Terminal cleanup happens exactly once: a terminal tick posts exactly one
+# checklist comment (no double-commenting on the way out).
+# --------------------------------------------------------------------------- #
+def test_terminal_tick_posts_exactly_one_checklist(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "idle"))
+    fake_ci.set_status("infra", "cafef00d", CIStatus.GREEN)
+    _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit="cafef00d"), make_config()
+    )
+    assert len(fake_tracker.comments) == 1
+
+
+# --------------------------------------------------------------------------- #
+# CI status is only queried when something is pushed (don't hit CI for an
+# unpushed run — there's no commit to check).
+# --------------------------------------------------------------------------- #
+def test_ci_not_queried_when_nothing_pushed(
+    fake_t3, fake_tracker, fake_notifier, make_issue, make_config
+):
+    class ExplodingCI:
+        def status(self, repo, commit):
+            raise AssertionError("CI must not be queried with no pushed commit")
+
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "running"))
+    result = watcher.Watcher(
+        t3_client=fake_t3,
+        tracker=fake_tracker,
+        ci_watcher=ExplodingCI(),
+        notifier=fake_notifier,
+    ).tick(_run(issue, commit=None), make_config())
+    assert result.action.value == "wait"
+
+
+# --------------------------------------------------------------------------- #
+# ready-for-human label is configurable.
+# --------------------------------------------------------------------------- #
+def test_ready_for_human_label_is_configurable(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot(_snapshot("thread-0", "error"))
+    w = watcher.Watcher(
+        t3_client=fake_t3,
+        tracker=fake_tracker,
+        ci_watcher=fake_ci,
+        notifier=fake_notifier,
+        ready_for_human_label="needs-eyes",
+    )
+    w.tick(_run(issue, commit=None), make_config())
+    assert ("add", "infra", 7, "needs-eyes") in _labels(fake_tracker)
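Reviewer note: the cases these watcher tests pin can be read as a small decision table. The sketch below is a hypothetical simplification for orientation only. The real `run_state_machine.next_action` is not shown in this diff and also receives `elapsed_seconds`, which is ignored here. The `decide` function, its parameters, and the string CI verdicts are assumptions. Only the action values are taken from the assertions above.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    CLOSE_SUCCESS = "close_success"
    ESCALATE_PREPUSH = "escalate_prepush"
    FIX_FORWARD = "fix_forward"
    FREEZE_ESCALATE = "freeze_escalate"


# Hypothetical stand-in for the pure state machine, covering only the cases
# the tests above exercise. `ci` is "green", "red", "pending", or None.
def decide(thread_state, pushed, ci, attempts, max_attempts):
    if pushed:
        if ci == "green":
            return Action.CLOSE_SUCCESS
        if ci == "red":
            # Budget remaining -> corrective turn; exhausted -> freeze.
            if attempts < max_attempts:
                return Action.FIX_FORWARD
            return Action.FREEZE_ESCALATE
        return Action.WAIT  # CI pending or not yet known
    # Nothing pushed: a finished or failed turn means the agent stopped short.
    if thread_state in ("completed", "errored"):
        return Action.ESCALATE_PREPUSH
    return Action.WAIT  # running, queued, missing, or unrecognised state
```

Falling through to `WAIT` for any state the function does not recognise mirrors the fail-safe the tests require: an unknown status never closes or escalates a run.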
--- a/tests/test_breakglass.py
+++ b/tests/test_breakglass.py
@ -1,174 +1,251 @@
-"""Tests for the breakglass app: verb whitelist, SSE translation, auth, routes."""
+"""Tests for the breakglass app: session manager (attach model), verb whitelist,
+SSE translation, auth, routes."""
 import os

 os.environ.setdefault("API_BEARER_TOKEN", "test-token")
+# Each turn chdirs into a per-session workspace; point it somewhere writable for
+# tests (prod uses the /workspace emptyDir). Must be set before the app imports config.
+os.environ.setdefault("BREAKGLASS_SESSIONS_DIR", "/tmp/bg-test-sessions")

 import pytest
 from fastapi.testclient import TestClient

-from app.breakglass import agent_session, pve
+from app.breakglass import agent_session, pve, session as sessionmod
 from app.breakglass.server import app


 # --------------------------------------------------------------------------- #
-# PVE verb whitelist — the security boundary mirrored client-side.
+# Fakes for the claude subprocess a turn spawns.
 # --------------------------------------------------------------------------- #
+class _FakeStdout:
+    def __init__(self, lines):
+        self._lines = [(line + "\n").encode() for line in lines]
+        self._i = 0

+    def __aiter__(self):
+        return self
+
+    async def __anext__(self):
+        if self._i >= len(self._lines):
+            raise StopAsyncIteration
+        line = self._lines[self._i]
+        self._i += 1
+        return line
+
+
+class _FakeStderr:
+    async def read(self):
+        return b""
+
+
+class _FakeProc:
+    def __init__(self, lines, rc=0):
+        self.stdout = _FakeStdout(lines)
+        self.stderr = _FakeStderr()
+        self.returncode = None
+        self._rc = rc
+
+    async def wait(self):
+        self.returncode = self._rc
+        return self._rc
+
+    def kill(self):
+        self.returncode = -9
+
+
+def _patch_proc(monkeypatch, lines, rc=0):
+    async def _fake_spawn(*argv, **kwargs):
+        return _FakeProc(lines, rc)
+    monkeypatch.setattr(sessionmod.asyncio, "create_subprocess_exec", _fake_spawn)
+
+
+_TURN_LINES = [
+    '{"type":"system","subtype":"init","session_id":"s"}',
+    '{"type":"system","subtype":"thinking_tokens","estimated_tokens":5}',
+    '{"type":"assistant","message":{"content":[{"type":"text","text":"checking disk"}]}}',
+    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","input":{"command":"df -h"}}]}}',
+    '{"type":"result","is_error":false,"result":"done","duration_ms":12}',
+]
+
+
+# --------------------------------------------------------------------------- #
+# Session: event log + broadcast + replay/Last-Event-ID.
+# --------------------------------------------------------------------------- #
+def test_add_event_assigns_sequential_ids():
+    s = sessionmod.Session("s1")
+    a = s.add_event({"kind": "user", "text": "hi"})
+    b = s.add_event({"kind": "text", "text": "yo"})
+    assert a["id"] == 0 and b["id"] == 1
+    assert [e["kind"] for e in s.events] == ["user", "text"]
+
+
+def test_subscribe_receives_broadcast():
+    s = sessionmod.Session("s1")
+    q = s.subscribe()
+    s.add_event({"kind": "text", "text": "live"})
+    assert q.get_nowait()["text"] == "live"
+    s.unsubscribe(q)
+    s.add_event({"kind": "text", "text": "after"})
+    assert q.empty()
+
+
+@pytest.mark.asyncio
+async def test_attach_replays_then_signals_caught_up():
+    s = sessionmod.Session("s1")
+    s.add_event({"kind": "user", "text": "diagnose"})
+    s.add_event({"kind": "text", "text": "looking"})
+    frames = []
+    async for frame in sessionmod.attach_stream(s, last_event_id=None):
+        frames.append(frame)
+        if "caught-up" in frame:
+            break
+    body = "".join(frames)
+    assert "diagnose" in body and "looking" in body
+    assert "id: 0" in body and "id: 1" in body
+    assert "event: caught-up" in frames[-1]
+
+
+@pytest.mark.asyncio
+async def test_attach_reconnect_replays_only_missed():
+    s = sessionmod.Session("s1")
+    for i in range(3):
+        s.add_event({"kind": "text", "text": f"e{i}"})  # ids 0,1,2
+    frames = []
+    async for frame in sessionmod.attach_stream(s, last_event_id=0):  # already saw id 0
+        frames.append(frame)
+        if "caught-up" in frame:
+            break
+    body = "".join(frames)
+    assert "e0" not in body  # not re-sent
+    assert "e1" in body and "e2" in body
+
+
+# --------------------------------------------------------------------------- #
+# Session: running a detached turn (mocked subprocess).
+# --------------------------------------------------------------------------- #
+@pytest.mark.asyncio
+async def test_turn_streams_events_into_log(monkeypatch):
+    _patch_proc(monkeypatch, _TURN_LINES)
+    s = sessionmod.Session("s1")
+    assert s.start_turn("diagnose the devvm") is True
+    await s._turn  # wait for the detached turn to finish
+    kinds = [e["kind"] for e in s.events]
+    assert kinds[0] == "user"
+    assert "session" in kinds and "text" in kinds and "tool" in kinds
+    assert "result" in kinds and kinds[-1] == "turn_end"
+    assert "thinking_tokens" not in kinds
+
+
+@pytest.mark.asyncio
+async def test_one_turn_at_a_time(monkeypatch):
+    _patch_proc(monkeypatch, _TURN_LINES)
+    s = sessionmod.Session("s1")
+    assert s.start_turn("first") is True
+    assert s.start_turn("second") is False  # task not done yet
+    await s._turn
+
+
+@pytest.mark.asyncio
+async def test_resume_after_first_turn(monkeypatch):
+    captured = {"argvs": []}
+
+    async def _fake_spawn(*argv, **kwargs):
+        captured["argvs"].append(argv)
+        return _FakeProc(_TURN_LINES)
+
+    monkeypatch.setattr(sessionmod.asyncio, "create_subprocess_exec", _fake_spawn)
+    s = sessionmod.Session("s1")
+    s.start_turn("first"); await s._turn
+    s.start_turn("second"); await s._turn
+    assert "--session-id" in captured["argvs"][0]
+    assert "--resume" in captured["argvs"][1]
+
+
+# --------------------------------------------------------------------------- #
+# SessionManager.
+# --------------------------------------------------------------------------- #
+def test_manager_create_get():
+    m = sessionmod.SessionManager()
+    s = m.create()
+    assert m.get(s.id) is s
+    assert m.get("nope") is None
+    assert m.get_or_create(s.id) is s
+    assert m.get_or_create(None).id != s.id
+
+
+# --------------------------------------------------------------------------- #
+# PVE verb whitelist (unchanged security boundary).
+# --------------------------------------------------------------------------- #
 def test_allowed_verbs_match_host_script():
-    assert pve.ALLOWED_VERBS == {
-        "status", "forensics", "reset", "stop", "start", "cycle"
-    }
+    assert pve.ALLOWED_VERBS == {"status", "forensics", "reset", "stop", "start", "cycle"}
    assert pve.MUTATING_VERBS == {"reset", "stop", "start", "cycle"}
-    assert pve.MUTATING_VERBS < pve.ALLOWED_VERBS


-@pytest.mark.parametrize("bad", [
-    "rm -rf /", "status; rm -rf /", "status 103", "shutdown", "", "STATUS",
-    "cycle 999", "$(reboot)", "../start",
-])
+@pytest.mark.parametrize("bad", ["rm -rf /", "status; reboot", "status 103", "", "STATUS"])
@pytest.mark.asyncio
 async def test_run_verb_rejects_non_whitelisted_without_ssh(bad, monkeypatch):
-    """A bad verb must be rejected locally — never spawning a subprocess."""
-    called = False
-
    async def _boom(*a, **k):
-        nonlocal called
-        called = True
        raise AssertionError("ssh must not run for a rejected verb")
-
    monkeypatch.setattr(pve.asyncio, "create_subprocess_exec", _boom)
    result = await pve.run_verb(bad)
    assert result["rejected"] is True
-    assert result["exit_code"] is None
-    assert called is False
-
-
-@pytest.mark.asyncio
-async def test_run_verb_allowed_invokes_ssh_with_bare_verb(monkeypatch):
-    captured = {}
-
-    class _FakeProc:
-        returncode = 0
-
-        async def communicate(self):
-            return (b"status: running\n", b"")
-
-    async def _fake_exec(*argv, **kwargs):
-        captured["argv"] = argv
-        return _FakeProc()
-
-    monkeypatch.setattr(pve.asyncio, "create_subprocess_exec", _fake_exec)
-    result = await pve.run_verb("status")
-    assert result["rejected"] is False
-    assert result["exit_code"] == 0
-    assert "running" in result["stdout"]
-    # The verb is the LAST argv element, passed as a single token (no shell).
-    assert captured["argv"][-1] == "status"
-    assert captured["argv"][0] == "ssh"


 # --------------------------------------------------------------------------- #
-# stream-json -> UI event translation (pure function).
+# translate_event (pure).
 # --------------------------------------------------------------------------- #
-
-def test_translate_init_to_session():
-    ev = agent_session.translate_event(
+def test_translate_init_and_noise_and_blocks():
+    assert agent_session.translate_event(
        {"type": "system", "subtype": "init", "session_id": "abc"}
+    ) == {"kind": "session", "session_id": "abc"}
+    assert agent_session.translate_event({"type": "system", "subtype": "hook_started"}) is None
+    assert agent_session.translate_event(
+        {"type": "assistant", "message": {"content": [{"type": "text", "text": "hi"}]}}
+    ) == {"kind": "text", "text": "hi"}
+    tool = agent_session.translate_event(
+        {"type": "assistant", "message": {"content": [{"type": "tool_use", "name": "Bash", "input": {"command": "df -h"}}]}}
    )
-    assert ev == {"kind": "session", "session_id": "abc"}
-
-
-@pytest.mark.parametrize("noise", [
-    {"type": "system", "subtype": "hook_started"},
-    {"type": "system", "subtype": "thinking_tokens", "estimated_tokens": 5},
-    {"type": "user", "message": {"content": []}},
-    {"type": "unknown"},
-])
-def test_translate_drops_noise(noise):
-    assert agent_session.translate_event(noise) is None
-
-
-def test_translate_assistant_text():
-    ev = agent_session.translate_event({
-        "type": "assistant",
-        "message": {"content": [{"type": "text", "text": "checking disk"}]},
-    })
-    assert ev == {"kind": "text", "text": "checking disk"}
-
-
-def test_translate_assistant_tool_use():
-    ev = agent_session.translate_event({
-        "type": "assistant",
-        "message": {"content": [
-            {"type": "tool_use", "name": "Bash", "input": {"command": "df -h"}}
-        ]},
-    })
-    assert ev["kind"] == "tool"
-    assert ev["name"] == "Bash"
-    assert ev["input"]["command"] == "df -h"
-
-
-def test_translate_result():
-    ev = agent_session.translate_event({
-        "type": "result", "is_error": False, "result": "done", "duration_ms": 1234,
-    })
-    assert ev == {"kind": "result", "is_error": False, "result": "done", "duration_ms": 1234}
+    assert tool["kind"] == "tool" and tool["input"]["command"] == "df -h"


 # --------------------------------------------------------------------------- #
 # Routes + auth.
 # --------------------------------------------------------------------------- #
-
 client = TestClient(app)
 AUTH = {"Authorization": "Bearer test-token"}


 def test_health_no_auth():
-    r = client.get("/health")
-    assert r.status_code == 200
-    assert r.json()["service"] == "claude-breakglass"
+    assert client.get("/health").json()["service"] == "claude-breakglass"


 def test_api_requires_auth():
    assert client.post("/api/session").status_code == 401
    assert client.get("/api/pve/verbs").status_code == 401
+    assert client.post("/api/session/x/prompt", json={"prompt": "hi"}).status_code == 401


-def test_api_accepts_bearer():
+def test_session_create_and_unknown_session_404():
    r = client.post("/api/session", headers=AUTH)
-    assert r.status_code == 200
-    assert "session_id" in r.json()
+    assert r.status_code == 200 and "session_id" in r.json()
+    assert client.post("/api/session/nope/prompt", headers=AUTH, json={"prompt": "x"}).status_code == 404
+    assert client.post("/api/session/nope/cancel", headers=AUTH).status_code == 404


-def test_api_accepts_authentik_header():
-    r = client.post("/api/session", headers={"X-authentik-username": "me@viktorbarzin.me"})
-    assert r.status_code == 200
+def test_prompt_starts_turn(monkeypatch):
+    monkeypatch.setattr(sessionmod.Session, "start_turn", lambda self, *a, **k: True)
+    sid = client.post("/api/session", headers=AUTH).json()["session_id"]
+    r = client.post(f"/api/session/{sid}/prompt", headers=AUTH, json={"prompt": "diagnose"})
+    assert r.status_code == 200 and r.json()["status"] == "started"


-def test_pve_verb_route_rejects_unknown():
-    r = client.post("/api/pve/destroy", headers=AUTH)
-    assert r.status_code == 400
+def test_prompt_409_when_turn_active(monkeypatch):
+    monkeypatch.setattr(sessionmod.Session, "start_turn", lambda self, *a, **k: False)
+    sid = client.post("/api/session", headers=AUTH).json()["session_id"]
+    r = client.post(f"/api/session/{sid}/prompt", headers=AUTH, json={"prompt": "x"})
+    assert r.status_code == 409


-def test_pve_verbs_listing():
-    r = client.get("/api/pve/verbs", headers=AUTH)
-    assert r.status_code == 200
-    body = r.json()
-    assert set(body["verbs"]) == pve.ALLOWED_VERBS
-    assert set(body["mutating"]) == pve.MUTATING_VERBS
-
-
-def test_chat_streams_sse(monkeypatch):
-    async def _fake_turn(session_id, prompt, model=None):
-        yield {"kind": "session", "session_id": session_id}
-        yield {"kind": "text", "text": "hello"}
-        yield {"kind": "result", "is_error": False, "result": "ok"}
-
-    monkeypatch.setattr(agent_session, "run_turn", _fake_turn)
-    r = client.post("/api/chat", headers=AUTH,
-                    json={"session_id": "s1", "prompt": "diagnose"})
-    assert r.status_code == 200
-    assert "text/event-stream" in r.headers["content-type"]
-    body = r.text
-    assert "hello" in body
-    assert '"kind": "done"' in body  # terminal frame always emitted
+def test_pve_verbs_listing_and_unknown_rejected():
+    assert set(client.get("/api/pve/verbs", headers=AUTH).json()["verbs"]) == pve.ALLOWED_VERBS
+    assert client.post("/api/pve/destroy", headers=AUTH).status_code == 400
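The session tests in this file pin down three behaviours: events receive sequential ids starting at 0, live subscribers are handed each new event, and a reconnecting client is replayed only the events after its `Last-Event-ID`. A minimal sketch of a log with those properties is given here. `EventLog`, `missed_since`, and `sse_frame` are hypothetical names chosen for illustration, the SSE frame layout is an assumption, and none of this is the actual `app.breakglass.session` code.

```python
import asyncio
import json
from typing import Any, Optional


class EventLog:
    """Hypothetical stand-in for the per-session event log (not the real Session)."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []
        self._subscribers: list[asyncio.Queue] = []

    def add_event(self, event: dict[str, Any]) -> dict[str, Any]:
        # The id is the event's position in the log: 0, 1, 2, ...
        stamped = {**event, "id": len(self.events)}
        self.events.append(stamped)
        # Fan the event out to every live subscriber.
        for queue in self._subscribers:
            queue.put_nowait(stamped)
        return stamped

    def subscribe(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.append(queue)
        return queue

    def unsubscribe(self, queue: asyncio.Queue) -> None:
        if queue in self._subscribers:
            self._subscribers.remove(queue)

    def missed_since(self, last_event_id: Optional[int]) -> list[dict[str, Any]]:
        # None means a fresh client, so everything is replayed.
        start = 0 if last_event_id is None else last_event_id + 1
        return self.events[start:]


def sse_frame(event: dict[str, Any]) -> str:
    # Assumed frame layout: an "id:" line, so the client can send it back
    # as Last-Event-ID on reconnect, followed by the JSON payload.
    return f"id: {event['id']}\ndata: {json.dumps(event, ensure_ascii=False)}\n\n"
```

Deriving the id from the log position keeps replay a plain list slice, which is one way to get the "already saw id 0, resend 1 and 2" behaviour the reconnect test checks.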
--- a/tests/test_conversational.py
+++ b/tests/test_conversational.py
@@ -0,0 +1,256 @@
+"""Tests for the conversational (no-tools, multi-turn) brain endpoint.
+
+This is the portal-assistant "Brain": a lean path that drives the Claude CLI with
+a no-tools conversational agent and per-conversation `--resume`, used by the voice
+gateway. Unlike /v1/chat/completions it does NOT clone a workspace or run a
+tool-enabled agent (see portal-assistant ADR-0002).
+"""
+import json
+from unittest.mock import AsyncMock, patch
+
+import pytest
+from httpx import ASGITransport, AsyncClient
+
+from app import conversational
+from app.main import app
+
+
+# --------------------------------------------------------------------------- #
+# argv builder
+# --------------------------------------------------------------------------- #
+def test_conversational_argv_new_session():
+    argv = conversational_argv_call(resume=False)
+    assert argv[0] == "claude"
+    assert "-p" in argv
+    assert argv[argv.index("--agent") + 1] == "conversational"
+    # a new conversation opens with --session-id, never --resume
+    assert argv[argv.index("--session-id") + 1] == "sess-1"
+    assert "--resume" not in argv
+    # SECURITY: a public-facing endpoint must NOT skip tool permissions
+    assert "--dangerously-skip-permissions" not in argv
+    assert argv[argv.index("--model") + 1] == "sonnet"
+    assert argv[argv.index("--output-format") + 1] == "json"
+    # latency: trims project CLAUDE.md/MCP + dynamic system-prompt sections off
+    # the no-tools voice turn (~45k -> ~23k input tokens, ~1.3s faster TTFT)
+    assert argv[argv.index("--setting-sources") + 1] == "user"
+    assert "--exclude-dynamic-system-prompt-sections" in argv
+    assert argv[-1] == "Hi there"
+
+
+def test_conversational_argv_resume_continues_session():
+    argv = conversational_argv_call(resume=True)
+    # a follow-up turn resumes the existing claude session
+    assert argv[argv.index("--resume") + 1] == "sess-1"
+    assert "--session-id" not in argv
+
+
+def conversational_argv_call(resume: bool):
+    from app.conversational import conversational_argv
+    return conversational_argv(
+        session_id="sess-1", message="Hi there", model="sonnet", resume=resume
+    )
+
+
+# --------------------------------------------------------------------------- #
+# endpoint
+# --------------------------------------------------------------------------- #
+class _AsyncLineIter:
+    """Async iterator over a list of byte lines — mimics `proc.stdout`."""
+
+    def __init__(self, lines: list[bytes]):
+        self._lines = list(lines)
+        self._i = 0
+
+    def __aiter__(self):
+        return self
+
+    async def __anext__(self):
+        if self._i >= len(self._lines):
+            raise StopAsyncIteration
+        line = self._lines[self._i]
+        self._i += 1
+        return line
+
+
+def _mock_subprocess_returning(output: bytes, returncode: int = 0):
+    proc = AsyncMock()
+    lines = [chunk + b"\n" for chunk in output.split(b"\n") if chunk]
+    proc.stdout = _AsyncLineIter(lines)
+    proc.stderr = AsyncMock()
+    proc.stderr.read = AsyncMock(return_value=b"")
+    proc.wait = AsyncMock(return_value=returncode)
+    proc.returncode = returncode
+    return proc
+
+
+@pytest.fixture(autouse=True)
+def _reset_sessions():
+    conversational.reset_started()
+    yield
+    conversational.reset_started()
+
+
+@pytest.fixture
+def auth_header():
+    return {"Authorization": "Bearer test-token"}
+
+
+@pytest.mark.asyncio
+async def test_conversational_happy_path(auth_header):
+    """A message in → the assistant's reply out, keyed to the session."""
+    cli_output = json.dumps({
+        "type": "result",
+        "is_error": False,
+        "result": "Здравейте! Как мога да помогна?",
+        "session_id": "sess-1",
+    }).encode()
+    mock_proc = _mock_subprocess_returning(cli_output, returncode=0)
+
+    with patch("app.conversational.asyncio.create_subprocess_exec", return_value=mock_proc):
+        transport = ASGITransport(app=app)
+        async with AsyncClient(transport=transport, base_url="http://test") as client:
+            response = await client.post(
+                "/v1/conversational",
+                json={"session_id": "sess-1", "message": "Здравей"},
+                headers=auth_header,
+            )
+
+    assert response.status_code == 200, response.text
+    body = response.json()
+    assert body["session_id"] == "sess-1"
+    assert body["reply"] == "Здравейте! Как мога да помогна?"
+
+
+@pytest.mark.asyncio
+async def test_conversational_resumes_on_second_turn(auth_header):
+    """First turn opens the session (--session-id); a second turn on the same
+    session id resumes it (--resume) — this is what makes it a conversation."""
+    calls: list[tuple] = []
+
+    def fake_spawn(*args, **kwargs):
+        calls.append(args)
+        out = json.dumps({"type": "result", "is_error": False, "result": "ok"}).encode()
+        return _mock_subprocess_returning(out, returncode=0)
+
+    with patch("app.conversational.asyncio.create_subprocess_exec", side_effect=fake_spawn):
+        transport = ASGITransport(app=app)
+        async with AsyncClient(transport=transport, base_url="http://test") as client:
+            for _ in range(2):
+                r = await client.post(
+                    "/v1/conversational",
+                    json={"session_id": "sess-X", "message": "hi"},
+                    headers=auth_header,
+                )
+                assert r.status_code == 200, r.text
+
+    assert "--session-id" in calls[0] and "--resume" not in calls[0]
+    assert "--resume" in calls[1] and "--session-id" not in calls[1]
+
+
+@pytest.mark.asyncio
+async def test_conversational_requires_auth():
+    """No bearer token → 401, same as the other endpoints."""
+    transport = ASGITransport(app=app)
+    async with AsyncClient(transport=transport, base_url="http://test") as client:
+        r = await client.post(
+            "/v1/conversational",
+            json={"session_id": "s", "message": "hi"},
+        )
+    assert r.status_code == 401
+
+
+@pytest.mark.asyncio
+async def test_conversational_returns_503_on_failure(auth_header):
+    """A non-zero claude exit surfaces as 503 execution-failed."""
+    mock_proc = _mock_subprocess_returning(b"", returncode=7)
+    mock_proc.stderr.read = AsyncMock(return_value=b"boom")
+
+    with patch("app.conversational.asyncio.create_subprocess_exec", return_value=mock_proc):
+        transport = ASGITransport(app=app)
+        async with AsyncClient(transport=transport, base_url="http://test") as client:
+            r = await client.post(
+                "/v1/conversational",
+                json={"session_id": "s", "message": "x"},
+                headers=auth_header,
+            )
+    assert r.status_code == 503
+    assert r.json()["error"] == "execution failed"
+
+
+# --------------------------------------------------------------------------- #
+# streaming helpers (OpenAI-compatible token relay for the realtime voice agent)
+# --------------------------------------------------------------------------- #
+from collections import namedtuple  # noqa: E402
+
+_Msg = namedtuple("_Msg", "role content")
+
+
+def test_stream_argv_uses_stream_json_and_is_stateless():
+    argv = conversational.stream_argv("hello", "sonnet")
+    assert argv[:2] == ["claude", "-p"]
+    assert "--agent" in argv and "conversational" in argv
+    assert "stream-json" in argv
+    assert "--include-partial-messages" in argv
+    assert "--verbose" in argv
+    assert "--model" in argv and "sonnet" in argv
+    # latency: same lean-context trim as the gateway path
+    assert argv[argv.index("--setting-sources") + 1] == "user"
+    assert "--exclude-dynamic-system-prompt-sections" in argv
+    assert argv[-1] == "hello"
+    # stateless + no tools
+    assert "--resume" not in argv and "--session-id" not in argv
+    assert "--dangerously-skip-permissions" not in argv
+
+
+def test_delta_text_extracts_content_block_delta():
+    line = json.dumps({
+        "type": "stream_event",
+        "event": {"type": "content_block_delta",
+                  "delta": {"type": "text_delta", "text": "Слон"}},
+    })
+    assert conversational.delta_text(line) == "Слон"
+
+
+def test_delta_text_ignores_non_text_events():
+    for ev in [
+        {"type": "system"},
+        {"type": "stream_event", "event": {"type": "message_start"}},
+        {"type": "stream_event", "event": {"type": "content_block_delta",
+            "delta": {"type": "input_json_delta", "partial_json": "{"}}},
+        {"type": "result"},
+    ]:
+        assert conversational.delta_text(json.dumps(ev)) is None
+    assert conversational.delta_text("") is None
+    assert conversational.delta_text("not json") is None
+
+
+def test_openai_chunk_valid_sse_and_keeps_cyrillic():
+    s = conversational.openai_chunk("chatcmpl-x", "sonnet", 123, content="две")
+    assert s.startswith("data: ") and s.endswith("\n\n")
+    payload = json.loads(s[len("data: "):].strip())
+    assert payload["object"] == "chat.completion.chunk"
+    assert payload["choices"][0]["delta"]["content"] == "две"
+    assert payload["choices"][0]["finish_reason"] is None
+    assert "две" in s  # not unicode-escaped
+
+
+def test_openai_chunk_role_and_finish():
+    role = conversational.openai_chunk("id", "m", 1, role="assistant")
+    assert json.loads(role[6:].strip())["choices"][0]["delta"] == {"role": "assistant"}
+    stop = conversational.openai_chunk("id", "m", 1, finish_reason="stop")
+    c = json.loads(stop[6:].strip())["choices"][0]
+    assert c["finish_reason"] == "stop" and c["delta"] == {}
+
+
+def test_synthesise_chat_prompt_keeps_assistant_turns():
+    msgs = [
+        _Msg("system", "Be brief."),
+        _Msg("user", "Здравей"),
+        _Msg("assistant", "Здравей! Как си?"),
+        _Msg("user", "Добре, ти?"),
+    ]
+    p = conversational.synthesise_chat_prompt(msgs)
+    assert "Be brief." in p
+    assert "User: Здравей" in p
+    assert "Assistant: Здравей! Как си?" in p
+    assert p.strip().endswith("User: Добре, ти?")
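The argv tests at the top of this file describe the builder's contract closely enough to sketch it. The version here is a reconstruction from those assertions only, not the code in `app/conversational.py`: the flag order is an assumption (the tests fix only `argv[0]`, the flag/value pairs, and the message in last position), and the function is named `conversational_argv_sketch` to keep it distinct from the real one.

```python
def conversational_argv_sketch(
    session_id: str, message: str, model: str, resume: bool
) -> list[str]:
    """Build a claude CLI argv for one no-tools conversational turn (sketch)."""
    argv = [
        "claude",
        "-p",
        "--agent", "conversational",
        "--model", model,
        "--output-format", "json",
        # Lean-context trim from the latency commit: load user settings only
        # and leave out the dynamic system-prompt sections.
        "--setting-sources", "user",
        "--exclude-dynamic-system-prompt-sections",
    ]
    # The first turn opens the session, later turns resume it. The two flags
    # are mutually exclusive in the tests, so exactly one is emitted.
    if resume:
        argv += ["--resume", session_id]
    else:
        argv += ["--session-id", session_id]
    # The prompt goes last. "--dangerously-skip-permissions" is never added.
    argv.append(message)
    return argv
```

Passing the message as a single argv element, with no shell in between, means user text is never parsed as flags or shell syntax as long as the process is spawned with an exec-style call such as the `create_subprocess_exec` these tests patch.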
--- a/tests/test_openai_compat.py
+++ b/tests/test_openai_compat.py
@@ -98,14 +98,15 @@ async def test_chat_completions_happy_path(auth_header):


@pytest.mark.asyncio
-async def test_chat_completions_rejects_streaming(auth_header):
-    """stream=true is not supported and must 400 with a clear message."""
+async def test_chat_completions_streaming_rejects_unsupported_model(auth_header):
+    """Streaming is supported now; model validation still runs first, so an
+    unsupported model 400s before any CLI is spawned."""
    transport = ASGITransport(app=app)
    async with AsyncClient(transport=transport, base_url="http://test") as client:
        response = await client.post(
            "/v1/chat/completions",
            json={
-                "model": "haiku",
+                "model": "gpt-4",
                "messages": [{"role": "user", "content": "hi"}],
                "stream": True,
            },
@@ -113,7 +114,7 @@ async def test_chat_completions_rejects_streaming(auth_header):
        )
    assert response.status_code == 400
    body = response.json()
-    assert "streaming not supported" in json.dumps(body).lower()
+    assert "unsupported model" in json.dumps(body).lower()


@pytest.mark.asyncio
@@ -370,3 +371,58 @@ async def test_chat_completions_response_model_echoes_default_when_missing(auth_
    )
    assert status == 200
    assert body["model"] == "sonnet"
+
+
+def _delta_line(text: str) -> str:
+    return json.dumps({
+        "type": "stream_event",
+        "event": {"type": "content_block_delta",
+                  "delta": {"type": "text_delta", "text": text}},
+    })
+
+
+@pytest.mark.asyncio
+async def test_chat_completions_streaming_relays_token_sse(auth_header):
+    """stream=true relays CLI stream-json token deltas as OpenAI SSE chunks."""
+    cli_output = "\n".join([
+        json.dumps({"type": "system"}),
+        json.dumps({"type": "stream_event", "event": {"type": "message_start"}}),
+        _delta_line("Две"),
+        _delta_line(" точки."),
+        json.dumps({"type": "result", "subtype": "success"}),
+    ]).encode()
+    mock_proc = _mock_subprocess_returning(cli_output, returncode=0)
+
+    with patch("app.main.asyncio.create_subprocess_exec", return_value=mock_proc):
+        transport = ASGITransport(app=app)
+        async with AsyncClient(transport=transport, base_url="http://test") as client:
+            response = await client.post(
+                "/v1/chat/completions",
+                json={
+                    "model": "sonnet",
+                    "stream": True,
+                    "messages": [{"role": "user", "content": "Колко е?"}],
+                },
+                headers=auth_header,
+            )
+
+    assert response.status_code == 200, response.text
+    assert response.headers["content-type"].startswith("text/event-stream")
+    body = response.text
+    assert "chat.completion.chunk" in body
+    assert body.rstrip().endswith("data: [DONE]")
+
+    # Reassemble the streamed assistant content from the delta chunks.
+    content = ""
+    saw_role = False
+    for line in body.splitlines():
+        if not line.startswith("data: ") or line.strip() == "data: [DONE]":
+            continue
+        payload = json.loads(line[len("data: "):])
+        assert payload["object"] == "chat.completion.chunk"
+        delta = payload["choices"][0]["delta"]
+        if delta.get("role") == "assistant":
+            saw_role = True
+        content += delta.get("content", "")
+    assert saw_role
+    assert content == "Две точки."
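The streaming test above expects a fixed frame order: a role chunk, one content chunk per text delta, a `finish_reason=stop` chunk, then `[DONE]`. A rough sketch of a relay that produces that order from CLI stream-json lines is shown here. `relay_sse` and `_chunk` are hypothetical names, the payload omits the `id`, `model`, and `created` fields a real chunk would carry, and the endpoint in `app/main.py` may be structured quite differently.

```python
import json
from typing import Iterable, Iterator, Optional


def _chunk(delta: dict, finish_reason: Optional[str] = None) -> str:
    # One OpenAI-style SSE frame. ensure_ascii=False keeps Cyrillic readable.
    payload = {
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return "data: " + json.dumps(payload, ensure_ascii=False) + "\n\n"


def relay_sse(cli_lines: Iterable[str]) -> Iterator[str]:
    """Turn stream-json lines into OpenAI chat.completion.chunk SSE frames."""
    yield _chunk({"role": "assistant"})
    for line in cli_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # blank or non-JSON lines are skipped
        if not isinstance(event, dict) or event.get("type") != "stream_event":
            continue  # system/result lines carry no tokens
        inner = event.get("event")
        if not isinstance(inner, dict) or inner.get("type") != "content_block_delta":
            continue
        delta = inner.get("delta")
        if isinstance(delta, dict) and delta.get("type") == "text_delta":
            yield _chunk({"content": delta.get("text", "")})
    yield _chunk({}, finish_reason="stop")
    yield "data: [DONE]\n\n"
```

Filtering on `text_delta` drops `input_json_delta` and lifecycle events such as `message_start`, so only spoken text reaches the client.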
Author	SHA1	Message	Date
Viktor Barzin	eccf0dd407	conversational: trim per-turn context to cut brain TTFT ~1.3s The no-tools conversational agent was dragging the full project context (this repo's CLAUDE.md, the MCP server configs, local settings) plus the dynamic system-prompt sections into every voice turn — ~45k input tokens -> ~3.4s time-to-first-token (measured against the live pod, 2026-06-21). Add --setting-sources user + --exclude-dynamic-system-prompt-sections to both the gateway (json) and realtime (stream-json) conversational argvs: context drops to ~23k and TTFT to ~2.1s (~1.3s/turn faster) with no change to the reply. Helps the portal-assistant v1 gateway AND the v2 realtime agent (both run the same turn). The /execute agent path is untouched. Investigation ruled out the assumed culprits: CLI startup is only ~0.5s, and a warm prompt cache does NOT lower TTFT (turn 2 read all 45k from cache yet TTFT was unchanged) — the cost was the context size, not the spawn. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-21 18:00:21 +00:00
Viktor Barzin	a29bffdda3	chat-completions: stream conversational turns (SSE token relay) for realtime voice Adds stream=true support to POST /v1/chat/completions (it previously 400'd). When streaming, it runs the no-tools `conversational` agent via `claude -p --output-format stream-json --include-partial-messages --verbose` and relays each content_block_delta as an OpenAI chat.completion.chunk SSE event, ending with finish_reason=stop + [DONE]. Free CLI/subscription auth, no tools, no API key. Stateless by design: the full message history is flattened into the prompt (prior assistant turns kept), so an OpenAI-style client that re-sends history each turn — e.g. Pipecat's OpenAILLMService — can stream from us directly. The non-streaming path (recruiter-triage workspace agent) is unchanged. This is phase 1 of the Pipecat realtime full-duplex voice-agent rebuild for portal-assistant (continuous audio, VAD endpointing, barge-in, ~seconds to first words). New pure helpers (stream_argv/delta_text/openai_chunk/ synthesise_chat_prompt) are unit-tested; the SSE endpoint has a mocked-subprocess integration test. 429 passing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 22:22:38 +00:00
Viktor Barzin	4e48214c0b	Merge portal-assistant-brain: no-tools conversational endpoint Adds POST /v1/conversational + a no-tools `conversational` agent for the portal-assistant voice gateway: a lean Claude path (persistent --resume, no workspace clone, no --dangerously-skip-permissions) on the subscription token. See portal-assistant ADR-0002. 6 new tests; full suite green (422 passed). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 19:51:34 +00:00
Viktor Barzin	33ff0868c3	conversational: add no-tools multi-turn Brain endpoint for portal-assistant The portal-assistant voice gateway needs a Claude that is conversational, free (on the cluster subscription, no metered API), and safe to sit behind a public edge. Add POST /v1/conversational: it drives a new no-tools `conversational` agent with per-conversation --resume so a voice turn keeps context, and is lean on purpose — no workspace clone, no tools, and crucially NO --dangerously-skip-permissions (so even a leaked agent can't execute anything). This is deliberately NOT /v1/chat/completions, which clones the git-crypt infra repo and runs a Bash-enabled agent per turn (portal-assistant ADR-0002). The conversational agent replies in the speaker's language (Bulgarian/English), short and TTS-friendly. Tests cover the argv builder (new vs resume), the happy path, multi-turn resume across calls, auth, and failure → 503. Full suite green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-17 18:38:44 +00:00
Viktor Barzin	e34640cc47	afk: wire the T3 adapter to the REAL orchestration contract + fix priority The T3 dispatch adapter was written against a guessed wire shape that the test fake accepted but the live t3-afk server 400s — so the previously-green suite did NOT mean the loop was actually wired to T3. Reverse-engineered the real contract from the v0.0.27 binary, verified it live against t3-afk (including multi-turn), and rewrote the adapter to match: - dispatch sends BARE commands keyed by `type` (not a `command` string), with client-minted threadId/commandId/messageId + createdAt; the server replies {sequence}, so dispatch returns the id it generated (never one parsed back). - a thread lives in a project (workspaceRoot = the repo checkout the agent runs in), so dispatch ensures the repo's project (snapshot -> project.create iff absent) before thread.create + thread.turn.start. - add send_turn() for follow-up turns on an existing thread — multi-turn context retention is verified live (turn 2 recalled turn 1). - watcher reads thread liveness from latestTurn.state (completed->idle, running/in_progress/pending->running, errored->error), not a non-existent top-level `status` field. Guard against recurrence: the test fake now REJECTS any command lacking a `type` discriminator (the original bug fails loudly), plus an opt-in live smoke test (tests/test_afk_t3_live.py) so "green" can mean "wired to T3". Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching tracker conventions and Issue.priority's own docstring — it had deliberately diverged to higher-first. Loop still ships DISABLED (kill switch on, empty allowlist). 416 tests pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 22:27:00 +00:00
Viktor Barzin	2ef0db9a96	afk: add the autonomous issue-implementer loop (SHIPS DISABLED)

Adds app/afk/, the "away-from-keyboard" control plane that watches the issue tracker for ready-for-agent issues, dispatches each to a fresh full-access T3 thread (with the issue-implementer preamble prepended, because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed, escalating or fix-forwarding via a small pure state machine.

The loop is split into pure cores (no I/O, exhaustively unit-tested) and thin injected adapters (the only edges that ever touch T3, the tracker, CI, or Slack; faked in every test, so nothing here talks to a real server, GitHub/Forgejo, or the cluster):

pure: types, dispatch_policy, run_state_machine, phase_checklist, config, issue_implementer_prompt
adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher, notifier
loops:
- poller (CronJob tick #1): list_ready -> select_dispatchable -> dispatch + stamp the in-progress lock (label only AFTER a successful dispatch, so a failed dispatch never leaves a phantom lock). Per-repo lock derived from the ready set, since the CronJob is stateless between ticks.
- watcher (CronJob tick #2): assemble RunState from snapshot + CI -> next_action -> act (close on success; relabel ready-for-human + ring the doorbell on the two escalations; dispatch a corrective turn on fix-forward; refresh the progress checklist).

SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an empty allowlist, so a freshly-loaded config dispatches nothing and does zero I/O. The package is not imported by the running service and has no auto-enable path. Arming it is a deliberate, later, manual step requiring BOTH gates (clear the kill switch AND enrol the exact repos) so one fat-fingered env var can't arm every repo.

Test-first throughout: 412 tests pass (poller + watcher add integration tests wiring the real pure cores to in-memory fakes). mypy clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-15 21:15:11 +00:00
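The two-gate arming rule and the "label only after a successful dispatch" ordering might look something like the sketch below. It is a simplified illustration under assumptions: `Config` and `kill_switch` are named in the commit, but the `allowlist` field name, the `Issue` shape, `armed_for` and `poll_once` are hypothetical, and the per-repo lock and dispatch policy are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Config:
    # Both defaults are "off": a freshly-loaded config dispatches nothing.
    kill_switch: bool = True
    allowlist: frozenset = field(default_factory=frozenset)

    def armed_for(self, repo: str) -> bool:
        """A repo is armed only when BOTH gates are open."""
        return (not self.kill_switch) and repo in self.allowlist


@dataclass(frozen=True)
class Issue:
    # Hypothetical minimal issue shape for the sketch.
    repo: str
    number: int


def poll_once(
    config: Config,
    ready: Iterable[Issue],
    dispatch: Callable[[Issue], str],
    stamp_lock: Callable[[Issue, str], None],
) -> List[Issue]:
    """One poller tick over the ready issues, using injected adapters."""
    dispatched = []
    for issue in ready:
        if not config.armed_for(issue.repo):
            continue  # disarmed repos never reach an adapter, so no I/O
        try:
            thread_id = dispatch(issue)
        except Exception:
            continue  # failed dispatch: no label is stamped, no phantom lock
        stamp_lock(issue, thread_id)  # label only AFTER a successful dispatch
        dispatched.append(issue)
    return dispatched
```

Passing `dispatch` and `stamp_lock` in as callables mirrors the injected-adapter split described above, so a test can swap in in-memory fakes.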
Viktor Barzin	171857da6b	Merge remote-tracking branch 'origin/master' into wizard/bg-v2	2026-06-14 19:19:14 +00:00
Viktor Barzin	5b5daa4bea	breakglass UI v2: attachable sessions (tmux model) + mobile-first redesign

Full audit-driven rework. Keeps the proven SSE-translation + verb logic; everything else upgraded for phone-primary use.

Backend: server owns the session, clients attach (Viktor's tmux idea).
- session.py: SessionManager + Session with an event log, subscriber pub/sub, and turns that run DETACHED (keep going if the client disconnects).
- GET /api/session/{id}/stream = attach (SSE): replays the transcript then tails live; per-event id: lines so an EventSource auto-reconnect resumes from Last-Event-ID (free re-attach). POST /{id}/prompt starts a detached turn; POST /{id}/cancel = Stop. Replaces the old one-shot /api/chat.
- agent_session trimmed to the argv + translate_event helpers; 21 new/updated tests (replay, Last-Event-ID resume, broadcast, detached turn, resume, cancel, routes), 53 green.

Frontend: mobile-first via the frontend-design skill (emergency-console aesthetic).
- EventSource attach (native auto-reconnect, zero client reconnect logic); transcript.js folds events->messages with id-dedupe so replays never double-render (30 unit assertions).
- Installable PWA: manifest + icons (wrench/break-glass mark) + apple-mobile-web-app meta + theme-color; viewport-fit=cover + safe-area; 100dvh; 16px composer (no iOS zoom).
- One-tap diagnosis presets (Triage / Memory-OOM / Disk / Services / QEMU-wedged) mapped to the devvm's real failure modes; Stop button while a turn runs.
- Foldable VM-control sheet, cycle the dominant recovery action w/ confirm, output capped 46vh.
- a11y: fixed --ink-faint contrast 3.6:1 -> 6.1:1 (WCAG AA); >=44px tap targets.

Deleted the obsolete fetch-reader sse.js (EventSource replaces it).

Verified: 53 backend tests + 30 transcript assertions; Playwright @390x844 (input on-screen y=721-821, presets/sheet/fold/cap); local integration smoke vs the real backend (attach->caught-up, 404, verbs, PWA served).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-14 19:19:03 +00:00
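The attach-and-resume behaviour can be illustrated with a small event log that replays everything after a given id. This is only a sketch: the commit names `Session` and the per-event `id:` lines, but the integer ids, `append`, `replay_after` and `sse_frame` here are assumptions, and the pub/sub tailing and detached turns are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Session:
    """Server-owned event log; clients attach and replay from an offset."""

    events: List[Tuple[int, str]] = field(default_factory=list)

    def append(self, data: str) -> int:
        # Assumption: ids are monotonically increasing integers starting at 1.
        event_id = len(self.events) + 1
        self.events.append((event_id, data))
        return event_id

    def replay_after(self, last_event_id: int = 0) -> List[Tuple[int, str]]:
        """Events a (re)attaching client has not seen yet.

        `last_event_id` would come from the Last-Event-ID request header that
        EventSource sends on auto-reconnect; 0 means a full transcript replay.
        """
        return [event for event in self.events if event[0] > last_event_id]


def sse_frame(event_id: int, data: str) -> str:
    """Format one Server-Sent Events frame with an `id:` line."""
    # Each line of a multi-line payload needs its own `data:` field.
    body = "".join(f"data: {line}\n" for line in data.split("\n"))
    return f"id: {event_id}\n{body}\n"
```

A fresh attach would stream `replay_after(0)` and a reconnect would stream `replay_after(n)`, which is why the client needs no reconnect logic of its own.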
Viktor Barzin	be81005186	docs: capture AFK implementation pipeline design + ADRs 0002-0004

Record the architecture for moving code implementation AFK, decided in a design/grilling session. The owner wants the human-in-the-loop boundary to stop at design + spec: once an issue is triaged ready-for-agent, an agent should implement it test-first, push it, and see it to a healthy deploy on its own, escalating only when it can't proceed.

Decisions captured:
- claude-agent-service is the control plane (poller + watcher + safety); a dedicated in-cluster T3 Code instance is the executor + cockpit, because T3 can only show sessions it launched itself -> we dispatch into it (ADR 0003).
- AFK code pushes straight to master; on a broken deploy it fix-forwards then freezes the broken state for forensics rather than reverting (ADR 0002).
- Implementation agents use persistent per-repo checkouts + git worktrees on SSD-NFS for warm caches, reversing the throwaway-clone rule for this path because concurrency is serial-within-repo (ADR 0004).

Pilot-gated: five integration unknowns must be validated against a dedicated T3 instance before the poller is wired. No code yet.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 19:09:12 +00:00