afk: add the autonomous issue-implementer loop (SHIPS DISABLED)

Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.

The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):

  pure:     types, dispatch_policy, run_state_machine, phase_checklist,
            config, issue_implementer_prompt
  adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
            notifier
  loops:    poller  — CronJob tick #1: list_ready -> select_dispatchable
                      -> dispatch + stamp the in-progress lock (label only
                      AFTER a successful dispatch, so a failed dispatch
                      never leaves a phantom lock). Per-repo lock derived
                      from the ready set, since the CronJob is stateless
                      between ticks.
            watcher — CronJob tick #2: assemble RunState from snapshot +
                      CI -> next_action -> act (close on success; relabel
                      ready-for-human + ring the doorbell on the two
                      escalations; dispatch a corrective turn on
                      fix-forward; refresh the progress checklist).

SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.

Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-15 21:15:11 +00:00
parent 171857da6b
commit 2ef0db9a96
23 changed files with 4717 additions and 0 deletions

app/afk/__init__.py Normal file

@@ -0,0 +1,43 @@
"""AFK loop: the autonomous issue-implementer control plane.

This package is the "away-from-keyboard" automation that watches the issue
tracker for ``ready-for-agent`` issues, dispatches each to a fresh **T3** thread
(the full-access ``claudeAgent`` runtime) with the issue-implementer preamble
prepended, then drives the resulting run through its lifecycle (tests-red →
green → pushed → CI → deployed), escalating or fix-forwarding per a small,
testable state machine. It owns no agent behaviour itself; the agent's standing
rules are injected as a prompt preamble (``issue_implementer_prompt``) because
T3 does NOT honour ``~/.claude/CLAUDE.md``.

The whole loop ships **DISABLED**, by two independent gates: ``Config`` defaults
to ``kill_switch=True`` AND an empty ``allowlist`` (see ``config.py``). Importing
this package, scheduling the CronJob entrypoints, or constructing the default
``Config`` therefore dispatches NOTHING and performs zero I/O: a disabled tick
is wholly inert. The package is also not imported by the running service
(``app.main``), so wiring it in changes nothing on its own.

>>> ENABLING IS A DELIBERATE MANUAL STEP, PERFORMED LATER, NEVER BY THIS CODE. <<<

Arming the loop takes BOTH of the following, on purpose (either alone stays
inert, so one fat-fingered env var can't arm every repo):

1. clear the kill switch (``AFK_KILL_SWITCH=false`` / ConfigMap ``kill_switch: "false"``), AND
2. enrol the exact repos (``AFK_ALLOWLIST=repo-a,repo-b`` / ConfigMap ``allowlist``).

There is no auto-enable path anywhere in this package; do not add one here.
Every test in the suite runs against fakes; this package never talks to a real
T3 server, GitHub/Forgejo, the cluster, or Slack.

Module map (each is independently testable against the interfaces in
``types.py``):

* ``types``: shared dataclasses + enums (the contract).
* ``config``: disabled-by-default Config + env/configmap loaders.
* ``issue_implementer_prompt``: the preamble prepended to every dispatch.
* ``dispatch_policy``: which ready issues to dispatch right now (pure).
* ``run_state_machine``: snapshot + CI status → next Action (pure).
* ``phase_checklist``: render the run's progress as a markdown checklist (pure).
* ``t3_client``: the two-POST T3 dispatch + snapshot reader.
* ``tracker``: issue-tracker reads/labels/comments/close.
* ``ci_watcher``: commit → CI status.
* ``notifier``: escalation/notification sink.
* ``poller``: CronJob tick #1: select + dispatch ready issues.
* ``watcher``: CronJob tick #2: drive one in-flight run to a verdict.
"""

app/afk/ci_watcher.py Normal file

@@ -0,0 +1,141 @@
"""CI watcher: fold a pushed commit's pipeline into a single ``CIStatus``.

A commit the agent pushed to ``master`` is only "done" once it has both *built*
and *deployed*: the CI/CD chain is GHA → ghcr → Woodpecker → Keel
(``docs/2026-06-14-afk-implementation-pipeline-design.md``). This adapter
collapses that multi-stage reality into the three-value verdict the state
machine speaks (:class:`~app.afk.types.CIStatus`): ``PENDING`` / ``GREEN`` /
``RED``.

It checks three stages in order and stops at the first that decides the verdict:

1. **build**: the GitHub Actions run for the commit (build + test + lint);
2. **deploy**: the Woodpecker pipeline that ships the built image;
3. **rollout**: the image actually reaching the cluster (Keel/k8s rollout).

Folding rule, applied stage by stage: a ``FAILURE`` anywhere is ``RED`` (and we
short-circuit: a red build is never "rolled out", and we don't bother the later
clients); a stage that hasn't concluded (``NONE`` = no run yet, ``PENDING`` =
in progress) makes the whole verdict ``PENDING`` (the state machine waits on
either); only when *every* stage has succeeded is the commit ``GREEN``.

The three stage clients are **injected**, each behind a tiny structural
:class:`typing.Protocol`, so this module never imports ``gh`` / ``woodpecker`` /
``kubectl`` and the tests drive it entirely with fakes. The rollout client is
**optional**: the pilot keeps cluster/``state.sqlite`` reads optional, so a
watcher built without one treats a green deploy as the terminal ``GREEN``. The
real client wiring (subprocess argv, JSON parsing, kubectl-exec) lives in the
adapters that *implement* these Protocols, not here; keeping this module pure
keeps the folding logic the only thing under test.
"""

from enum import Enum
from typing import Protocol

from .types import CIStatus


class StageResult(Enum):
    """Outcome of one CI/CD stage for a commit, before folding into ``CIStatus``.

    Each injected client returns one of these per ``(repo, commit)``:

    * ``NONE``: no run exists yet for this commit (e.g. the webhook hasn't fired);
    * ``PENDING``: a run exists and is still in progress;
    * ``SUCCESS``: the stage concluded green;
    * ``FAILURE``: the stage concluded red.

    ``NONE`` and ``PENDING`` are distinct on purpose so a client can report
    "nothing here yet" vs "running" even though both fold to ``CIStatus.PENDING``;
    keeping them separate lets callers/log lines tell the two apart.
    """

    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


# --------------------------------------------------------------------------- #
# Injected client Protocols: structural, so any object with the right method
# (real adapter or test fake) satisfies them. No ``Any``: every method is typed
# (repo, commit) -> StageResult.
# --------------------------------------------------------------------------- #
class GitHubChecksClient(Protocol):
    """Reads the GitHub Actions run (build + test + lint) for a commit."""

    def run_conclusion(self, repo: str, commit: str) -> StageResult: ...


class WoodpeckerClient(Protocol):
    """Reads the Woodpecker deploy pipeline triggered for a commit's image."""

    def deploy_conclusion(self, repo: str, commit: str) -> StageResult: ...


class RolloutClient(Protocol):
    """Reads whether the commit's image has rolled out to the cluster."""

    def rollout_status(self, repo: str, commit: str) -> StageResult: ...


class CIWatcher:
    """Folds build → deploy → rollout into a single :class:`CIStatus`.

    Inject the three stage clients (``github`` and ``woodpecker`` are required;
    ``rollout`` is optional; omit it to stop the verdict at the deploy stage,
    matching the pilot's "cluster reads optional" posture). The clients are the
    only I/O surface, so production passes real adapters and tests pass fakes;
    :meth:`status` itself is pure.
    """

    def __init__(
        self,
        github: GitHubChecksClient,
        woodpecker: WoodpeckerClient,
        rollout: RolloutClient | None = None,
    ) -> None:
        self._github = github
        self._woodpecker = woodpecker
        self._rollout = rollout

    def status(self, repo: str, commit: str) -> CIStatus:
        """Return the folded CI verdict for ``commit`` in ``repo``.

        Stages are queried lazily in order and the first decisive one wins: a
        ``FAILURE`` yields ``RED``, an unconcluded stage (``NONE``/``PENDING``)
        yields ``PENDING``, and only when every stage has ``SUCCESS`` does the
        verdict reach ``GREEN``. Short-circuiting is real: a stage is only
        queried if every earlier stage succeeded, so a red/pending build never
        touches the deploy or rollout client (the assertions in the tests, and
        avoiding a needless kubectl-exec, both depend on this). With no rollout
        client the deploy stage is terminal.
        """
        # Each entry is a thunk so a later stage's client is never called once an
        # earlier stage has already decided the verdict.
        probes = [
            lambda: self._github.run_conclusion(repo, commit),
            lambda: self._woodpecker.deploy_conclusion(repo, commit),
        ]
        if self._rollout is not None:
            rollout = self._rollout  # bind for the closure (narrowed, non-None)
            probes.append(lambda: rollout.rollout_status(repo, commit))
        for probe in probes:
            verdict = _stage_verdict(probe())
            if verdict is not None:
                return verdict  # FAILURE → RED, NONE/PENDING → PENDING
        return CIStatus.GREEN


def _stage_verdict(stage: StageResult) -> CIStatus | None:
    """Decisive verdict for a single stage, or ``None`` to "keep going".

    ``FAILURE`` decides ``RED``; an unconcluded stage (``NONE``/``PENDING``)
    decides ``PENDING``; ``SUCCESS`` is non-decisive (``None``): the next stage
    gets to speak, and only the last stage's success folds to ``GREEN``.
    """
    if stage is StageResult.FAILURE:
        return CIStatus.RED
    if stage in (StageResult.NONE, StageResult.PENDING):
        return CIStatus.PENDING
    return None
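
The folding rule in `ci_watcher.py` can be tried out in isolation. The sketch below is a minimal, self-contained illustration and not the package's code: it redeclares stand-ins for `StageResult` and `CIStatus` (the `CIStatus` values are assumed), and `fold` and `RecordingStage` are hypothetical names introduced only for this example. It shows the short-circuit: once the build stage reports a failure, the deploy stage is never queried.

```python
from enum import Enum
from typing import Callable


class StageResult(Enum):
    NONE = "none"
    PENDING = "pending"
    SUCCESS = "success"
    FAILURE = "failure"


class CIStatus(Enum):  # stand-in for app.afk.types.CIStatus (values assumed)
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


def fold(probes: list[Callable[[], StageResult]]) -> CIStatus:
    """Query stages lazily, in order; the first decisive stage wins."""
    for probe in probes:
        result = probe()
        if result is StageResult.FAILURE:
            return CIStatus.RED
        if result in (StageResult.NONE, StageResult.PENDING):
            return CIStatus.PENDING
    # Every stage reported SUCCESS.
    return CIStatus.GREEN


class RecordingStage:
    """Hypothetical fake: returns a canned result and counts its calls."""

    def __init__(self, result: StageResult) -> None:
        self.result = result
        self.calls = 0

    def __call__(self) -> StageResult:
        self.calls += 1
        return self.result


build = RecordingStage(StageResult.FAILURE)
deploy = RecordingStage(StageResult.SUCCESS)
red_verdict = fold([build, deploy])  # deploy is never called

all_green = fold([RecordingStage(StageResult.SUCCESS) for _ in range(3)])
waiting = fold([RecordingStage(StageResult.SUCCESS), RecordingStage(StageResult.NONE)])
```

Passing callables rather than already-computed results is what keeps a later stage's client untouched once an earlier stage has decided the verdict.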

app/afk/config.py Normal file

@@ -0,0 +1,127 @@
"""Config loader for the AFK loop: DISABLED BY DEFAULT.

The whole loop ships off. A bare ``Config()`` (and therefore ``default()``,
``from_env()`` with nothing set, and ``from_configmap({})``) has
``kill_switch=True`` and an empty ``allowlist``, so nothing is ever
dispatched until an operator deliberately turns it on. Enabling is a TWO-part
manual step, on purpose:

1. set ``AFK_KILL_SWITCH=false`` (or ``kill_switch: "false"`` in the
   ConfigMap), AND
2. populate ``AFK_ALLOWLIST`` with the exact repos that may be automated.

Either alone is inert: the kill switch off with an empty allowlist still
dispatches nothing, and a full allowlist with the kill switch on is frozen.
Both gates exist so a single fat-fingered env var can't accidentally arm the
loop across every repo.

``from_env`` reads process env; ``from_configmap`` reads an already-parsed
string→string mapping (the shape a mounted ConfigMap gives you). They share one
parser so the two paths can't drift. Lists are comma-separated; booleans accept
the usual truthy spellings.

This module owns only *loading* a ``Config``; the dataclass itself lives in
``types`` and policy decisions live in ``dispatch_policy`` / ``run_state_machine``.
"""

import os
from collections.abc import Mapping

from .types import Config

# Env var names, also the ConfigMap keys (one source of truth for both paths).
ENV_ALLOWLIST = "AFK_ALLOWLIST"
ENV_KILL_SWITCH = "AFK_KILL_SWITCH"
ENV_IN_PROGRESS_LABEL = "AFK_IN_PROGRESS_LABEL"
ENV_READY_LABEL = "AFK_READY_LABEL"
ENV_BUDGET_USD = "AFK_BUDGET_USD"
ENV_FIX_FORWARD_MAX_ATTEMPTS = "AFK_FIX_FORWARD_MAX_ATTEMPTS"
ENV_FIX_FORWARD_MAX_SECONDS = "AFK_FIX_FORWARD_MAX_SECONDS"

# Spellings accepted as boolean true / false (case-insensitive). Anything else
# raises rather than silently defaulting: an unparseable kill-switch value must
# never be guessed safe-or-unsafe.
_TRUE = frozenset({"1", "true", "yes", "on"})
_FALSE = frozenset({"0", "false", "no", "off"})


def default() -> Config:
    """The disabled default Config: kill switch ON, allowlist EMPTY.

    Equivalent to ``Config(allowlist=[], kill_switch=True)``; provided as a named
    entry point so callers don't hardcode the disabled posture themselves.
    """
    return Config(allowlist=[], kill_switch=True)


def from_env(env: Mapping[str, str] | None = None) -> Config:
    """Build a Config from environment variables (defaults to ``os.environ``).

    Unset variables fall back to the disabled/contract defaults, so an
    unconfigured process stays off.
    """
    return _from_mapping(os.environ if env is None else env)


def from_configmap(data: Mapping[str, str]) -> Config:
    """Build a Config from a parsed ConfigMap (string→string mapping).

    Identical semantics to ``from_env`` (same keys, same parser) but sourced
    from a mounted ConfigMap's ``data`` rather than process env. An empty mapping
    yields the disabled default.
    """
    return _from_mapping(data)


# --------------------------------------------------------------------------- #
# Internals: one shared parser so env and ConfigMap paths can't diverge.
# --------------------------------------------------------------------------- #
def _from_mapping(data: Mapping[str, str]) -> Config:
    base = default()
    return Config(
        allowlist=_parse_list(data.get(ENV_ALLOWLIST), base.allowlist),
        kill_switch=_parse_bool(data.get(ENV_KILL_SWITCH), base.kill_switch),
        in_progress_label=_nonempty(data.get(ENV_IN_PROGRESS_LABEL), base.in_progress_label),
        ready_label=_nonempty(data.get(ENV_READY_LABEL), base.ready_label),
        budget_usd=_parse_float(data.get(ENV_BUDGET_USD), base.budget_usd),
        fix_forward_max_attempts=_parse_int(
            data.get(ENV_FIX_FORWARD_MAX_ATTEMPTS), base.fix_forward_max_attempts
        ),
        fix_forward_max_seconds=_parse_int(
            data.get(ENV_FIX_FORWARD_MAX_SECONDS), base.fix_forward_max_seconds
        ),
    )


def _parse_list(raw: str | None, fallback: list[str]) -> list[str]:
    if raw is None:
        return list(fallback)
    return [item.strip() for item in raw.split(",") if item.strip()]


def _parse_bool(raw: str | None, fallback: bool) -> bool:
    if raw is None:
        return fallback
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"unparseable boolean for AFK config: {raw!r}")


def _parse_int(raw: str | None, fallback: int) -> int:
    if raw is None or not raw.strip():
        return fallback
    return int(raw.strip())


def _parse_float(raw: str | None, fallback: float) -> float:
    if raw is None or not raw.strip():
        return fallback
    return float(raw.strip())


def _nonempty(raw: str | None, fallback: str) -> str:
    if raw is None or not raw.strip():
        return fallback
    return raw.strip()
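
The two-gate posture described in `config.py` can be illustrated with a small standalone sketch. This is an assumption-laden illustration rather than the package's API: `can_dispatch` is a hypothetical helper that combines both gates in one place (in the real package the gates are loaded here and enforced in `dispatch_policy`), and it reuses only the variable names and boolean spellings shown in the diff.

```python
from collections.abc import Mapping

_TRUE = frozenset({"1", "true", "yes", "on"})
_FALSE = frozenset({"0", "false", "no", "off"})


def can_dispatch(env: Mapping[str, str], repo: str) -> bool:
    """Hypothetical helper: True only when BOTH gates are open for ``repo``."""
    raw_switch = env.get("AFK_KILL_SWITCH")
    if raw_switch is None:
        kill_switch = True  # unset falls back to the disabled default
    else:
        value = raw_switch.strip().lower()
        if value in _TRUE:
            kill_switch = True
        elif value in _FALSE:
            kill_switch = False
        else:
            # Never guess whether an unparseable value is safe or unsafe.
            raise ValueError(f"unparseable boolean: {raw_switch!r}")

    raw_list = env.get("AFK_ALLOWLIST")
    allowlist = [] if raw_list is None else [
        item.strip() for item in raw_list.split(",") if item.strip()
    ]
    return (not kill_switch) and repo in allowlist


nothing_set = can_dispatch({}, "repo-a")
switch_only = can_dispatch({"AFK_KILL_SWITCH": "false"}, "repo-a")
allowlist_only = can_dispatch({"AFK_ALLOWLIST": "repo-a"}, "repo-a")
both_gates = can_dispatch(
    {"AFK_KILL_SWITCH": "false", "AFK_ALLOWLIST": "repo-a, repo-b"}, "repo-b"
)
```

Only `both_gates` evaluates to `True`; each gate on its own leaves the loop inert, which is the property the module docstring is protecting.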

app/afk/dispatch_policy.py Normal file

@@ -0,0 +1,117 @@
"""Dispatch policy: the PURE gate deciding which ready issues to run *now*.

``select_dispatchable`` is the loop's first decision each tick: given every
issue the tracker reported ready, the loop config, and the set of repos that
already have an agent in flight, it returns the ordered list of issues to
dispatch this round. It does **no IO** (no tracker calls, no T3, no clock), so
it is exhaustively unit-testable and the loop stays a thin shell around it.

What it encapsulates (the dispatch predicate from the AFK pipeline design doc):

* **Kill switch**: ``config.kill_switch`` short-circuits to ``[]`` before any
  per-issue work. The whole loop ships disabled; this is the master off.
* **Trust gate**: only ``issue.labeled_by_trusted`` issues are eligible. On a
  private repo the gating label *is* the authorization, so an issue made ready
  by an untrusted/bot actor must never auto-run (prompt-injection defense).
* **Allowlist**: ``issue.repo`` must be in ``config.allowlist``. An empty
  allowlist dispatches nothing even with the kill switch off (the deliberate
  two-gate posture: arming the loop takes both).
* **Per-repo lock**: any repo already in ``in_flight_repos`` is skipped; at
  most one agent runs per repo (two would collide on the working tree).
* **blocked_by gating**: ``issue.blocked_by`` lists the issue numbers of
  blockers that are still OPEN, so a non-empty list means "still blocked" and
  the issue is skipped.
* **One-agent-per-repo within the batch**: because a repo hosts only one
  in-flight agent, a single call returns at most ONE decision per repo: the
  highest-priority eligible issue in that repo wins the slot. (A higher-priority
  issue that is itself ineligible does not consume the slot; the best
  *eligible* candidate does.)
* **Priority ordering**: the surviving per-repo winners are returned
  highest-``priority``-first, with a deterministic tiebreaker (ascending issue
  number) so the output is a total, stable order independent of input order.

PRIORITY DIRECTION (note the deliberate divergence): ``Issue.priority``'s
docstring in ``types`` says "lower runs first", but this module follows the
explicit dispatch-policy specification, which orders **higher priority first**.
The ordering lives here (the one place that consumes ``priority`` for dispatch),
so this module is the source of truth for the direction.

Pure: it never mutates its inputs. The caller's issue list, the config, and the
``in_flight_repos`` set are all left exactly as passed.
"""

from .types import Config, DispatchDecision, Issue


def select_dispatchable(
    issues: list[Issue],
    config: Config,
    in_flight_repos: set[str],
) -> list[DispatchDecision]:
    """Return the ordered issues to dispatch this tick (see module docstring).

    Empty when the kill switch is on, the allowlist excludes everything, or no
    issue clears every gate. At most one decision per repo; ordered
    highest-priority-first, ties broken by ascending issue number.
    """
    # Kill switch: master off-ramp, evaluated before any per-issue work.
    if config.kill_switch:
        return []
    allowlist = frozenset(config.allowlist)
    # First pass: keep only issues that clear every per-issue gate. Repos already
    # in flight are excluded here, so the lock is enforced before slot selection.
    eligible: list[Issue] = [
        issue
        for issue in issues
        if _is_eligible(issue, allowlist, in_flight_repos)
    ]
    # One slot per repo: among the eligible issues sharing a repo, the best
    # candidate (the global sort order) takes it; the rest are dropped this tick.
    best_per_repo: dict[str, Issue] = {}
    for issue in sorted(eligible, key=_dispatch_sort_key):
        best_per_repo.setdefault(issue.repo, issue)
    # Final order: the per-repo winners, highest priority first (total + stable).
    winners = sorted(best_per_repo.values(), key=_dispatch_sort_key)
    return [DispatchDecision(issue=issue, reason=_reason(issue)) for issue in winners]


# --------------------------------------------------------------------------- #
# Internals.
# --------------------------------------------------------------------------- #
def _is_eligible(
    issue: Issue,
    allowlist: frozenset[str],
    in_flight_repos: set[str],
) -> bool:
    """True iff the issue clears the trust, allowlist, per-repo-lock, and
    blocked_by gates. Kept boolean (not "which gate failed") because the policy
    only ever needs the survivors; reasons are attached to survivors only."""
    if not issue.labeled_by_trusted:
        return False
    if issue.repo not in allowlist:
        return False
    if issue.repo in in_flight_repos:
        return False
    if issue.blocked_by:  # non-empty == at least one OPEN blocker remains
        return False
    return True


def _dispatch_sort_key(issue: Issue) -> tuple[int, int]:
    """Sort key giving a total, deterministic order: highest ``priority`` first
    (negated so a plain ascending sort puts it on top), then lowest issue number
    as the tiebreaker so equal-priority issues never depend on input/iteration
    order."""
    return (-issue.priority, issue.number)


def _reason(issue: Issue) -> str:
    """Human-readable justification, logged and surfaced in notifications, never
    parsed. Records that every gate passed and the priority that ordered it."""
    return (
        f"{issue.repo}#{issue.number}: eligible "
        f"(trusted, allowlisted, unblocked, repo free) — priority {issue.priority}"
    )
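
A worked example may help make the gates and the one-slot-per-repo rule in `dispatch_policy.py` concrete. The sketch below is self-contained and deliberately simplified: the `Issue` dataclass is a stand-in carrying only the fields the policy reads, and `select` is a hypothetical condensed version that returns issues directly instead of `DispatchDecision` objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:  # stand-in: only the fields the policy reads
    repo: str
    number: int
    priority: int
    labeled_by_trusted: bool = True
    blocked_by: tuple[int, ...] = ()


def _sort_key(issue: Issue) -> tuple[int, int]:
    # Highest priority first, then lowest issue number as the tiebreaker.
    return (-issue.priority, issue.number)


def select(issues, allowlist, in_flight_repos, kill_switch=False):
    """Hypothetical condensed policy: gates, one slot per repo, then ordering."""
    if kill_switch:
        return []
    eligible = [
        issue
        for issue in issues
        if issue.labeled_by_trusted
        and issue.repo in allowlist
        and issue.repo not in in_flight_repos
        and not issue.blocked_by
    ]
    best_per_repo = {}
    for issue in sorted(eligible, key=_sort_key):
        best_per_repo.setdefault(issue.repo, issue)  # first seen == best
    return sorted(best_per_repo.values(), key=_sort_key)


issues = [
    Issue("infra", 7, priority=1),
    Issue("infra", 3, priority=5, blocked_by=(2,)),        # top priority, but blocked
    Issue("infra", 9, priority=2),
    Issue("web", 4, priority=2),
    Issue("web", 1, priority=2, labeled_by_trusted=False),  # untrusted label
    Issue("docs", 5, priority=9),                           # not allowlisted
    Issue("api", 8, priority=9),                            # repo already in flight
]
picked = select(issues, allowlist={"infra", "web", "api"}, in_flight_repos={"api"})
picked_refs = [(issue.repo, issue.number) for issue in picked]
```

Here `picked_refs` is `[("web", 4), ("infra", 9)]`: the blocked `infra#3` does not consume the infra slot, so `infra#9` wins it, and the priority tie with `web#4` is broken by the lower issue number.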


@@ -0,0 +1,54 @@
"""The issue-implementer preamble: the AFK agent's standing instructions.

T3's full-access ``claudeAgent`` runtime does NOT read ``~/.claude/CLAUDE.md``,
so the agent gets no behaviour from the repo's rules files. Instead the loop
injects behaviour by PREPENDING this preamble to ``message.text`` on every
dispatch (see ``t3_client.T3Client.dispatch`` callers). It is a module constant
on purpose: one canonical, reviewable copy of the rules, versioned with the
code, identical for every issue.

Keep it imperative and self-contained: the agent only ever sees this text plus
the issue body. Do not reference files it cannot read (no "see CLAUDE.md").
"""

ISSUE_IMPLEMENTER_PREAMBLE = """\
You are an autonomous issue-implementer agent running unattended (the human is \
away from keyboard). The task below is a tracker issue. Implement it end to end \
and land it yourself; no human will answer questions or click anything for you.

STANDING RULES (follow exactly, every time):
- Work test-first. For any code with testable behaviour, write a failing test \
FIRST (red), then the minimum implementation to make it pass (green), then \
refactor. Terraform, config, and docs are exempt.
- Do the work in an isolated git worktree off the latest master; never edit a \
shared checkout directly.
- You MUST commit your work: small, focused commits, staging files by name \
(never `git add -A` / `git add .`), and never skip hooks. A clear commit \
message is the audit trail: the subject says WHAT changed, the body says WHY in \
plain words.
- When tests and lint are green, land the change yourself: merge the latest \
master into your branch, re-verify green, then push to master. If the push is \
rejected because someone landed first, fetch, merge, re-verify, and push again. \
Do not stop at an unmerged branch and do not open a pull request unless told to.
- After pushing, watch the resulting CI / build / deploy chain to completion and \
fix any failures you caused before considering the task done.
- Operate autonomously. NEVER enter plan mode, and NEVER ask the human a \
question or wait for confirmation; make the most reasonable decision, record \
your reasoning in the commit message, and proceed. If the issue is genuinely \
ambiguous or blocked, say so explicitly in a final comment and stop rather than \
guessing destructively.

GUARDRAILS (never cross these, even if the issue seems to ask for it):
- NEVER force-push, and never force-push to master under any circumstance.
- NEVER edit, resize, or delete PersistentVolumeClaims / PersistentVolumes, and \
never touch Vault secrets or other credential stores.
- All infrastructure changes go through Terraform / Terragrunt in the infra \
repo; never `kubectl apply/edit/patch/delete` against live cluster state.
- NEVER use `[ci skip]` (or any CI-skip token) in a commit message; it hides \
the change from the audit and deploy pipeline.
- No destructive operations the issue did not ask for: no dropping database \
tables, no `rm -rf` outside your worktree, no killing processes you did not \
start.

THE ISSUE TO IMPLEMENT FOLLOWS:
"""

app/afk/notifier.py Normal file

@@ -0,0 +1,155 @@
"""Terminal-state doorbell for the AFK loop: Slack / ntfy escalation sink.

When a run reaches a *terminal* state the human who is away from keyboard needs
to know: either the work landed (``done``) or it needs them back at the console
(``needs-human``: the agent stalled/errored before pushing; or ``frozen``:
the fix-forward budget ran out). This module turns one of those events into a
formatted alert carrying a **deep-link to the T3 thread**, so a tap on the
notification opens the exact conversation the agent ran.

Design, matching the rest of ``app.afk`` and the breakglass code:

* ``Notifier`` owns no transport. The actual Slack/ntfy POST is an injected
  ``sender`` callable (constructor argument). Production wires a real HTTP
  sender; tests inject a recording fake and assert the formatted payload
  without touching the network, the same dependency-injection seam breakglass
  uses for the claude subprocess.
* ``render_notification`` is a pure function that builds the payload; ``notify``
  is just "render, then hand to the sender". Keeping the formatting pure makes
  it unit-testable on its own and guarantees ``notify`` sends exactly what
  ``render_notification`` returns.
* The kind vocabulary is CLOSED: only the three terminal kinds are sendable.
  An unknown kind raises rather than firing a mystery doorbell; a non-terminal
  kind reaching here is a caller bug, not something to paper over.
* The notifier never swallows a sender failure. If Slack is down the exception
  propagates; the loop decides whether to retry or give up, not this adapter.

The whole AFK loop ships DISABLED (see ``config.py``); this module is inert
until the loop is deliberately armed and a real sender is wired in.
"""

from collections.abc import Callable
from dataclasses import dataclass, field

from .types import Issue

# --------------------------------------------------------------------------- #
# Kind vocabulary: the terminal states a run can reach. One source of truth
# shared by callers (the state machine maps Action -> kind) and tests.
# --------------------------------------------------------------------------- #
KIND_DONE = "done"  # landed: merged + CI green, issue closeable
KIND_NEEDS_HUMAN = "needs-human"  # stalled/errored before pushing (pre-push escalation)
KIND_FROZEN = "frozen"  # fix-forward budget (attempts/wall-clock) exhausted

#: The only kinds ``notify`` will send. Anything else is a caller bug.
TERMINAL_KINDS: frozenset[str] = frozenset({KIND_DONE, KIND_NEEDS_HUMAN, KIND_FROZEN})

# Default T3 web UI. Threads deep-link off this; overridable per-Notifier so the
# host isn't hardcoded into the formatter (re-IP / staging / tests).
DEFAULT_BASE_URL = "https://t3.viktorbarzin.me"

# Per-kind presentation. The leading marker makes the three distinguishable from
# the title alone in a crowded Slack channel without emoji; priority/tags drive
# how the sender routes it (a successful close is quiet; the two escalations are
# loud and tagged so on-call filters can page on them).
_PRESENTATION: dict[str, tuple[str, str, str, tuple[str, ...]]] = {
    # kind -> (marker, headline, priority, tags)
    KIND_DONE: ("[DONE]", "landed", "low", ("afk", "done")),
    KIND_NEEDS_HUMAN: ("[NEEDS-HUMAN]", "needs a human", "high", ("afk", "escalation", "needs-human")),
    KIND_FROZEN: ("[FROZEN]", "frozen — budget exhausted", "high", ("afk", "escalation", "frozen")),
}

#: A sink that delivers a built notification (HTTP POST in prod, recorder in tests).
Sender = Callable[["Notification"], None]


@dataclass
class Notification:
    """The fully-formatted alert handed to the sender.

    A structured payload (not a raw dict) so the sender can map fields onto its
    own schema: ``title``/``body`` for Slack blocks or an ntfy message,
    ``priority``/``tags`` for routing, ``link`` for the click-through. ``link``
    is ``None`` when there is no thread to point at (e.g. dispatch failed before
    a thread existed); the deep-link is also embedded in ``body`` so it survives
    senders that only carry a plain message.
    """

    kind: str
    issue_ref: str  # "<repo>#<number>", e.g. "infra#42"
    title: str
    body: str
    link: str | None
    priority: str  # "low" | "high": escalation loudness for the sender
    tags: list[str] = field(default_factory=list)


def _deep_link(base_url: str, thread_id: str | None) -> str | None:
    """Build the T3 thread deep-link, or ``None`` when there is no thread."""
    if not thread_id:
        return None
    return f"{base_url.rstrip('/')}/?thread={thread_id}"


def render_notification(
    kind: str,
    issue: Issue,
    thread_id: str | None,
    detail: str,
    *,
    base_url: str = DEFAULT_BASE_URL,
) -> Notification:
    """Build the :class:`Notification` for a terminal event (pure, no I/O).

    Raises ``ValueError`` if ``kind`` is not one of :data:`TERMINAL_KINDS`: only
    terminal states ring the doorbell, and a non-terminal kind reaching here is a
    bug we surface rather than silently send.
    """
    if kind not in TERMINAL_KINDS:
        raise ValueError(
            f"notifier only sends terminal kinds {sorted(TERMINAL_KINDS)}, got {kind!r}"
        )
    marker, headline, priority, tags = _PRESENTATION[kind]
    issue_ref = f"{issue.repo}#{issue.number}"
    link = _deep_link(base_url, thread_id)
    title = f"{marker} {issue_ref} {headline}"
    body_lines = [detail]
    if link is not None:
        body_lines.append(f"Thread: {link}")
    body = "\n".join(body_lines)
    return Notification(
        kind=kind,
        issue_ref=issue_ref,
        title=title,
        body=body,
        link=link,
        priority=priority,
        tags=list(tags),
    )


class Notifier:
    """Sends terminal-state doorbells through an injected ``sender``.

    The ``sender`` is the only egress: ``notify`` formats the payload (via
    :func:`render_notification`) and hands it over. No transport lives here, so a
    test injects a recording fake and asserts the payload without posting.
    """

    def __init__(self, sender: Sender, *, base_url: str = DEFAULT_BASE_URL) -> None:
        self._sender = sender
        self._base_url = base_url

    def notify(self, kind: str, issue: Issue, thread_id: str | None, detail: str) -> None:
        """Format a terminal-state alert and deliver it via the injected sender.

        Raises ``ValueError`` for a non-terminal ``kind`` (before any send), and
        lets a sender failure propagate (see the module docstring).
        """
        notification = render_notification(
            kind, issue, thread_id, detail, base_url=self._base_url
        )
        self._sender(notification)
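
The injected-sender seam in `notifier.py` is easiest to see with a recording fake. The following is a trimmed, self-contained sketch under stated assumptions, not the module itself: `Notification` is cut down to three fields, `render` takes a plain `issue_ref` string instead of an `Issue`, the title format is simplified, and the base URL is a placeholder host chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

TERMINAL_KINDS = frozenset({"done", "needs-human", "frozen"})


@dataclass
class Notification:  # trimmed stand-in: title, body, link only
    title: str
    body: str
    link: Optional[str]


def render(kind: str, issue_ref: str, thread_id: Optional[str], detail: str,
           base_url: str = "https://t3.example.invalid") -> Notification:
    """Pure formatter: rejects non-terminal kinds, embeds the deep-link in the body."""
    if kind not in TERMINAL_KINDS:
        raise ValueError(f"not a terminal kind: {kind!r}")
    link = f"{base_url.rstrip('/')}/?thread={thread_id}" if thread_id else None
    body = detail if link is None else f"{detail}\nThread: {link}"
    return Notification(title=f"[{kind.upper()}] {issue_ref}", body=body, link=link)


class Notifier:
    """Render, then hand to the injected sender; no transport lives here."""

    def __init__(self, sender: Callable[[Notification], None]) -> None:
        self._sender = sender

    def notify(self, kind: str, issue_ref: str, thread_id: Optional[str], detail: str) -> None:
        self._sender(render(kind, issue_ref, thread_id, detail))


sent: list[Notification] = []  # the recording fake is simply list.append
notifier = Notifier(sent.append)
notifier.notify("frozen", "infra#42", "abc123", "fix-forward budget exhausted")
```

Because rendering happens before the sender is called, an unknown kind raises without anything being recorded, which mirrors the "raises before any send" behaviour the docstrings describe.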

app/afk/phase_checklist.py Normal file

@ -0,0 +1,116 @@
"""Render an AFK run's progress as a live markdown checklist.

``render(current, meta)`` is a PURE function: it maps a ``Phase`` plus a bag of
optional context (``meta``) to a markdown task list, with no I/O and no hidden
state. The loop posts the result as an issue comment so a human glancing at the
tracker can see exactly how far an unattended run has got: worktree created,
test written, green, pushed, CI, deployed, done.

The list always shows all seven lifecycle phases in order. Phases strictly
*before* ``current`` are checked (``- [x]``); ``current`` is marked in-progress
(``- [~]``); later phases are empty (``- [ ]``). ``Phase.DONE`` is terminal: at
that point every line, including DONE itself, is checked.

``meta`` is best-effort decoration only. Recognised keys (all optional):
``repo`` / ``issue`` (header title), ``thread_id`` (header suffix), and
``fix_forward_attempts`` (a note line when non-zero). Unknown keys are ignored,
and a missing key never raises; the checklist degrades gracefully to just the
phase list. Nothing here mutates ``meta``.
"""

from typing import Any

from .types import Phase

# Lifecycle order: the single source of truth for both ordering and the
# checked/active/empty partition. Must stay in sync with ``Phase`` (the
# checklist tests assert every phase appears, so a divergence is caught).
_ORDER: tuple[Phase, ...] = (
    Phase.WORKTREE,
    Phase.TESTS_RED,
    Phase.GREEN,
    Phase.PUSHED,
    Phase.CI,
    Phase.DEPLOYED,
    Phase.DONE,
)

# Human-readable label per phase (what shows on each checklist line).
_LABELS: dict[Phase, str] = {
    Phase.WORKTREE: "Worktree created",
    Phase.TESTS_RED: "Failing test written (TDD red)",
    Phase.GREEN: "Implementation passing (TDD green)",
    Phase.PUSHED: "Pushed to master",
    Phase.CI: "CI green on pushed commit",
    Phase.DEPLOYED: "Deployed / rolled out",
    Phase.DONE: "Done — issue closed",
}

# Task-list markers. ``[~]`` (in-progress) is an informal convention, not part
# of the GitHub-flavoured markdown task-list syntax (which only defines ``[x]``
# and ``[ ]``), so renderers typically show it as literal text. Crucially, it is
# neither ``[x]`` nor ``[ ]``, so the active line is always visually distinct
# from a checked or empty box.
_DONE = "- [x]"
_ACTIVE = "- [~]"
_TODO = "- [ ]"


def render(current: Phase, meta: dict[str, Any]) -> str:
    """Render the run's progress checklist as markdown (see module docstring).

    ``current`` is the phase the run is in right now; ``meta`` supplies optional
    header/context fields. Pure: identical inputs yield byte-identical output and
    ``meta`` is never mutated.
    """
    current_index = _ORDER.index(current)
    is_done = current is Phase.DONE

    lines = [_header(meta), ""]
    for index, phase in enumerate(_ORDER):
        lines.append(f"{_marker(index, current_index, is_done)} {_LABELS[phase]}")

    note = _fix_forward_note(meta)
    if note is not None:
        lines.extend(["", note])

    # Trailing newline so the block sits cleanly when concatenated into a comment.
    return "\n".join(lines) + "\n"


def _marker(index: int, current_index: int, is_done: bool) -> str:
    """The checkbox marker for the phase at ``index`` given the current phase.

    Earlier phases are checked; the current phase is in-progress; later phases
    are empty. When the run is DONE, every phase (including DONE) is checked.
    """
    if is_done or index < current_index:
        return _DONE
    if index == current_index:
        return _ACTIVE
    return _TODO


def _header(meta: dict[str, Any]) -> str:
    """The ``###`` title line. Includes ``repo#issue`` when both are present and
    a ``(thread ...)`` suffix when a thread id is known; degrades to a bare title
    otherwise."""
    repo = meta.get("repo")
    issue = meta.get("issue")
    if repo is not None and issue is not None:
        title = f"{repo}#{issue} — AFK run progress"
    else:
        title = "AFK run progress"

    thread_id = meta.get("thread_id")
    if thread_id:
        title = f"{title} (thread {thread_id})"
    return f"### {title}"


def _fix_forward_note(meta: dict[str, Any]) -> str | None:
    """A note line when one or more fix-forward attempts have happened, else
    ``None`` (no line). Zero/absent attempts add nothing, so the clean path stays
    uncluttered."""
    attempts = meta.get("fix_forward_attempts")
    if not attempts:
        return None
    plural = "attempt" if attempts == 1 else "attempts"
    return f"_Fix-forward: {attempts} {plural}._"
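
To make the checked/active/empty partition concrete, here is a minimal, self-contained sketch of the marker rule described in the module docstring. It is not the module itself: `Phase` is a local stand-in for `app.afk.types.Phase`, and the `markers` helper is a hypothetical name introduced only for illustration.

```python
from enum import Enum


class Phase(Enum):
    # Local stand-in for app.afk.types.Phase so the sketch runs on its own.
    WORKTREE = "worktree"
    TESTS_RED = "tests_red"
    GREEN = "green"
    PUSHED = "pushed"
    CI = "ci"
    DEPLOYED = "deployed"
    DONE = "done"


# Enum iteration follows definition order, which is the lifecycle order here.
ORDER = tuple(Phase)


def markers(current: Phase) -> list[str]:
    """One checkbox marker per phase: checked before, active at, empty after."""
    if current is Phase.DONE:
        # Terminal: every line, DONE included, is checked.
        return ["- [x]"] * len(ORDER)
    at = ORDER.index(current)
    return [
        "- [x]" if i < at else "- [~]" if i == at else "- [ ]"
        for i in range(len(ORDER))
    ]


print(markers(Phase.GREEN))
print(markers(Phase.DONE))
```

For a run currently in `GREEN`, the first two phases come back checked, the third active, and the remaining four empty; for `DONE` all seven are checked.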

app/afk/poller.py

@ -0,0 +1,166 @@
"""CronJob entrypoint: one dispatch tick of the AFK loop.

The poller is the *first half* of the loop: the part that decides what to start.
It runs once per CronJob invocation (the loop is stateless between ticks: the
issue tracker, not in-process memory, is the source of truth for what's already
in flight). Each tick:

1. **kill switch**: if ``config.kill_switch`` is set the tick does NOTHING,
   not even a tracker read. A disabled loop must be inert: zero I/O, zero
   dispatches. (The pure policy also short-circuits on the kill switch, but the
   poller bails first so a disabled CronJob never touches the network.)
2. read the ready set: ``tracker.list_ready(config.allowlist)``, i.e. every open
   issue carrying the ready label across the allowlisted repos.
3. derive the **per-repo lock**: a repo is "in flight" if any ready issue
   already carries ``config.in_progress_label`` (the poller stamps that label
   when it dispatches, so on the next tick the still-open issue re-appears and
   locks the repo). At most one agent per repo, because two would collide on
   the working tree.
4. run the pure ``dispatch_policy.select_dispatchable`` over (ready issues,
   config, in-flight repos) to get the ordered set to start this tick.
5. for each decision: ``t3_client.dispatch(repo, issue, prompt)`` to spawn the
   worker thread, THEN ``tracker.add_label(repo, issue, in_progress_label)``.
   The label goes on strictly *after* a successful dispatch, so a dispatch that
   raises never leaves a phantom lock that would freeze the repo forever.

It owns no policy of its own: the decision lives in ``dispatch_policy`` and the
agent's behaviour rides in the dispatched prompt's preamble (``t3_client``). The
two adapters (tracker, T3) are injected behind structural Protocols, so
production wires the real ``Tracker`` / ``T3Client`` and the tests wire the
in-memory fakes; nothing here opens a socket on its own.

DISABLED BY DEFAULT: a freshly-loaded ``Config`` has ``kill_switch=True`` and an
empty allowlist (see ``config.py``), so importing or scheduling this poller
dispatches nothing. Arming the loop (clearing the kill switch AND enrolling a
repo) is a deliberate manual step, performed later, never by this code.
"""

from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Protocol

from . import dispatch_policy
from .types import Config, DispatchDecision, Issue

# --------------------------------------------------------------------------- #
# Injected adapter Protocols: the I/O edges. Structural, so the real
# ``Tracker`` / ``T3Client`` and the test fakes both satisfy them with no
# explicit subclassing. Only the methods the poller actually calls appear here.
# --------------------------------------------------------------------------- #


class TrackerPort(Protocol):
    """The slice of ``tracker.Tracker`` the dispatch tick needs."""

    def list_ready(self, repos: list[str]) -> list[Issue]: ...

    def add_label(self, repo: str, issue: int, label: str) -> None: ...


class T3Port(Protocol):
    """The slice of ``t3_client.T3Client`` the dispatch tick needs."""

    def dispatch(self, repo: str, issue: int, prompt: str) -> str: ...


#: The pure dispatch gate's signature, injected so the tick can be tested with a
#: stub policy without reaching into module internals. Defaults to the real one.
DispatchFn = Callable[[list[Issue], Config, set[str]], list[DispatchDecision]]


@dataclass
class Dispatched:
    """One issue the tick actually started, with the T3 thread it spawned.

    Returned (not just logged) so the caller and the tests can see exactly
    what was launched. ``thread_id`` is what the watcher half later polls to
    drive this run to completion; ``reason`` carries the policy's human-readable
    justification through unchanged.
    """

    issue: Issue
    thread_id: str
    reason: str


@dataclass
class PollResult:
    """The outcome of one dispatch tick.

    ``dispatched`` is empty whenever the loop is disabled, the allowlist is
    empty, every repo is already in flight, or nothing clears the dispatch gate,
    i.e. the common steady state of a quiet tick.
    """

    dispatched: list[Dispatched] = field(default_factory=list)


class Poller:
    """Runs one dispatch tick over injected tracker + T3 adapters.

    ``dispatch`` defaults to the real pure ``select_dispatchable`` policy; it is
    injectable purely so a test can substitute a stub without monkeypatching.
    The poller holds no state between ticks; each ``run_once`` is self-contained.
    """

    def __init__(
        self,
        tracker: TrackerPort,
        t3_client: T3Port,
        dispatch: DispatchFn = dispatch_policy.select_dispatchable,
    ) -> None:
        self._tracker = tracker
        self._t3 = t3_client
        self._dispatch = dispatch

    def run_once(self, config: Config) -> PollResult:
        """Execute one dispatch tick (see module docstring). Returns what it
        started; an empty result is the normal quiet-tick outcome."""
        # Kill switch: bail before any I/O; a disabled loop touches nothing.
        if config.kill_switch:
            return PollResult()

        ready = self._tracker.list_ready(config.allowlist)
        in_flight = _in_flight_repos(ready, config.in_progress_label)

        result = PollResult()
        for decision in self._dispatch(ready, config, in_flight):
            issue = decision.issue
            # Dispatch FIRST; only stamp the lock once the thread exists, so a
            # failed dispatch leaves the issue purely ready for the next tick to
            # retry rather than wedged behind a phantom in-progress label.
            thread_id = self._t3.dispatch(
                issue.repo, issue.number, _dispatch_prompt(issue)
            )
            self._tracker.add_label(issue.repo, issue.number, config.in_progress_label)
            result.dispatched.append(
                Dispatched(issue=issue, thread_id=thread_id, reason=decision.reason)
            )
        return result


# --------------------------------------------------------------------------- #
# Internals: pure helpers.
# --------------------------------------------------------------------------- #


def _in_flight_repos(ready: list[Issue], in_progress_label: str) -> set[str]:
    """Repos that already have an agent in flight, read off the ready set.

    A repo is in flight if any of its ready issues still carries the in-progress
    label, the stamp the poller applied on a previous tick's dispatch. Because
    the dispatched issue keeps its ready label until the watcher closes/relabels
    it, it re-appears here and locks the repo until the run finishes.
    """
    return {issue.repo for issue in ready if in_progress_label in issue.labels}


def _dispatch_prompt(issue: Issue) -> str:
    """The turn prompt for one issue's worker thread.

    The full-access agent fetches the issue body itself (it has ``gh``), so the
    prompt only needs to point unambiguously at the concrete ``repo#number``; the
    standing rules are prepended by ``t3_client`` as the issue-implementer
    preamble. Kept deliberately terse: one canonical instruction, no per-issue
    templating to drift.
    """
    return (
        f"Implement issue #{issue.number} in the `{issue.repo}` repository. "
        f"Fetch the issue with `gh issue view {issue.number} --repo {issue.repo}` "
        f"(and its comments) to get the full task, then implement it end to end."
    )
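
The dispatch-then-label ordering in step 5 is the property most worth seeing in isolation. The following is a minimal sketch under stated assumptions, not the poller's own test suite: `FakeTracker`, `FailingT3`, `WorkingT3` and `dispatch_then_lock` are hypothetical names, and the repo, issue number and thread id format are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTracker:
    """Records label writes instead of calling a real tracker."""

    labels: list[tuple[str, int, str]] = field(default_factory=list)

    def add_label(self, repo: str, issue: int, label: str) -> None:
        self.labels.append((repo, issue, label))


class FailingT3:
    """Hypothetical T3 fake whose dispatch always raises."""

    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        raise RuntimeError("T3 unreachable")


class WorkingT3:
    """Hypothetical T3 fake that returns a made-up thread id."""

    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        return f"thread-{repo}-{issue}"


def dispatch_then_lock(t3, tracker, repo: str, issue: int, label: str) -> str:
    # Same ordering as the loop body of run_once: spawn first, stamp after.
    thread_id = t3.dispatch(repo, issue, "prompt")
    tracker.add_label(repo, issue, label)
    return thread_id


tracker = FakeTracker()
try:
    dispatch_then_lock(FailingT3(), tracker, "infra", 7, "agent-in-progress")
except RuntimeError:
    pass
# The failed dispatch never reached add_label, so there is no phantom lock.
labels_after_failure = list(tracker.labels)

thread_id = dispatch_then_lock(WorkingT3(), tracker, "infra", 7, "agent-in-progress")
print(labels_after_failure, thread_id, tracker.labels)
```

Reversing the two calls inside `dispatch_then_lock` would leave the label behind on failure, which is exactly the wedged state the docstring warns about.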


@ -0,0 +1,84 @@
"""Run state machine: assembled ``RunState`` -> next ``Action`` (ADR-0002).

This is the heart of the AFK loop's per-issue control: each tick the loop
assembles a :class:`~app.afk.types.RunState` (thread liveness from the
orchestration snapshot, CI verdict from the watcher, plus its own ``pushed`` /
``fix_forward_attempts`` / ``elapsed_seconds`` bookkeeping) and calls
:func:`next_action` to decide what to do next.

The function is **pure**: it reads only its two arguments, never the clock, the
network, or any global. That keeps the lifecycle policy a plain decision table
the test suite can exhaust combinatorially; the loop owns all the I/O (closing
issues, dispatching corrective turns, escalating) based on the Action returned.

The decision table (first match wins):

* pushed AND CI green                  -> CLOSE_SUCCESS
    The run is healthy and verified; close the issue. The thread's own status
    is irrelevant once a pushed commit is green.
* pushed AND CI red, budget remaining  -> FIX_FORWARD
    A pushed commit broke CI. Dispatch another corrective turn, but only
    while BOTH budgets hold: ``fix_forward_attempts < fix_forward_max_attempts``
    AND ``elapsed_seconds < fix_forward_max_seconds`` (strict; at/over either
    bound is exhausted).
* pushed AND CI red, budget exhausted  -> FREEZE_ESCALATE
    Out of fix-forward attempts or wall-clock; stop churning and hand to a
    human with the broken commit left in place.
* not pushed AND thread ERROR/IDLE     -> ESCALATE_PREPUSH
    The agent will never reach green: it errored, or its turn finished /
    stalled with nothing pushed. There is no pushed commit to fix forward, so
    escalate before-push (a different remediation path than FREEZE_ESCALATE).
* everything else                      -> WAIT
    Still in flight: working toward a first push (thread running / unknown), or
    pushed with CI not yet decided. Poll again next tick.
"""

from .types import Action, CIStatus, Config, RunState, ThreadStatus

# Thread states that mean the agent is finished with this turn: it will not push
# any further on its own. Reaching one of these with nothing pushed is terminal
# (escalate), whereas RUNNING / None (no snapshot entry yet) means keep waiting.
_TERMINAL_THREAD_STATES: frozenset[ThreadStatus] = frozenset(
    {ThreadStatus.ERROR, ThreadStatus.IDLE}
)


def next_action(state: RunState, config: Config) -> Action:
    """Decide the next :class:`Action` for one issue's run.

    Pure and total: every reachable ``(thread_status, ci_status, pushed,
    attempts, elapsed)`` combination maps to exactly one Action via the table in
    the module docstring. See that table for the rationale of each branch.
    """
    if state.pushed:
        # A commit is out; the CI verdict on it drives everything from here.
        if state.ci_status is CIStatus.GREEN:
            return Action.CLOSE_SUCCESS
        if state.ci_status is CIStatus.RED:
            return (
                Action.FIX_FORWARD
                if _fix_forward_budget_remaining(state, config)
                else Action.FREEZE_ESCALATE
            )
        # CI pending / not yet reported -> wait for the verdict.
        return Action.WAIT

    # Nothing pushed yet. If the turn is over (errored or gone idle) the run can
    # never reach green on its own -> escalate before-push; otherwise it is still
    # working toward a first push -> wait.
    if state.thread_status in _TERMINAL_THREAD_STATES:
        return Action.ESCALATE_PREPUSH
    return Action.WAIT


def _fix_forward_budget_remaining(state: RunState, config: Config) -> bool:
    """True while another fix-forward turn is allowed.

    Both bounds must hold (strict ``<``): the run has spent fewer than
    ``fix_forward_max_attempts`` corrective turns AND fewer than
    ``fix_forward_max_seconds`` of wall-clock. Hitting either cap exhausts the
    budget.
    """
    return (
        state.fix_forward_attempts < config.fix_forward_max_attempts
        and state.elapsed_seconds < config.fix_forward_max_seconds
    )
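
Because the decision table is pure, it can be swept exhaustively. The sketch below is a condensed, self-contained stand-in rather than the module's actual tests: `Thread`, `CI` and `decide` are hypothetical local names, Actions are plain strings, and the budget defaults (5 attempts, 3600 seconds) are taken from the `Config` defaults shown in `types.py`.

```python
from enum import Enum
from itertools import product


class Thread(Enum):  # stand-in for ThreadStatus
    RUNNING = "running"
    IDLE = "idle"
    ERROR = "error"


class CI(Enum):  # stand-in for CIStatus
    PENDING = "pending"
    GREEN = "green"
    RED = "red"


def decide(thread, ci, pushed, attempts, elapsed, max_attempts=5, max_seconds=3600):
    """Condensed copy of the decision table; returns the Action name."""
    if pushed:
        if ci is CI.GREEN:
            return "CLOSE_SUCCESS"
        if ci is CI.RED:
            # Strict bounds: reaching either cap exhausts the budget.
            in_budget = attempts < max_attempts and elapsed < max_seconds
            return "FIX_FORWARD" if in_budget else "FREEZE_ESCALATE"
        return "WAIT"  # CI pending or not yet known
    if thread in (Thread.ERROR, Thread.IDLE):
        return "ESCALATE_PREPUSH"
    return "WAIT"


# Sweep every status combination (None = not yet known) at the budget edges.
table = {
    combo: decide(*combo)
    for combo in product(
        [None, *Thread],        # thread_status
        [None, *CI],            # ci_status
        [False, True],          # pushed
        [0, 4, 5],              # fix_forward_attempts
        [0.0, 3599.0, 3600.0],  # elapsed_seconds
    )
}
print(len(table), sorted(set(table.values())))
```

The sweep covers 288 combinations and reaches all five Actions; the boundary values 5 and 3600.0 are the ones that tip a red pushed run from `FIX_FORWARD` into `FREEZE_ESCALATE`.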

app/afk/t3_client.py

@ -0,0 +1,159 @@
"""Adapter for the in-cluster T3 Code instance — the AFK executor + cockpit.

The control plane keeps the brain; T3 runs the agent. This module is the thin
wire between them: it turns "implement issue N of repo R with this prompt" into
the TWO HTTP commands T3's orchestration API needs, and reads the fleet
snapshot the watcher polls. It owns no AFK behaviour: the agent's standing
rules ride in as the ``ISSUE_IMPLEMENTER_PREAMBLE`` prepended to the turn
message, because T3's full-access ``claudeAgent`` runtime does NOT honour
``~/.claude/CLAUDE.md`` (see ``issue_implementer_prompt``).

Two operations, both against the dedicated in-cluster T3 pod:

* ``dispatch(repo, issue, prompt) -> thread_id`` POSTs ``thread.create``
  then ``thread.turn.start`` to ``/api/orchestration/dispatch``. The create
  command selects the ``claudeAgent`` instance in ``full-access`` runtime mode
  and returns a thread id; the turn command targets that thread and delivers
  ``ISSUE_IMPLEMENTER_PREAMBLE + prompt`` as ``message.text``. One dispatch =
  one worktree-isolated worker.
* ``snapshot() -> dict`` GETs ``/api/orchestration/snapshot``, the full fleet
  read-model. T3 has no outbound webhooks, so the watcher polls this for
  per-thread ``running``/``idle``/``error`` status.

The HTTP transport and the bearer provider are **injected** (constructor
args), so the production wiring hands in an ``httpx.Client`` plus a Vault-backed
token reader, while tests hand in an in-memory fake; nothing here ever opens a
socket on its own. The bearer is re-read from the provider on **every** request
because T3's ``orchestration:operate`` token expires hourly and is refreshed out
of band.
"""

from collections.abc import Callable
from typing import Protocol

from .issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE

# Orchestration API paths, relative to the configured base URL.
_DISPATCH_PATH = "/api/orchestration/dispatch"
_SNAPSHOT_PATH = "/api/orchestration/snapshot"

# Pilot-baked dispatch envelope: which backend instance runs the thread and in
# which runtime mode. Constants (not config): every AFK thread is identical.
_INSTANCE_ID = "claudeAgent"
_RUNTIME_MODE = "full-access"

# JSON shapes. Command bodies and the snapshot read-model are open string-keyed
# objects; ``object`` values keep us honest without a bare ``Any``.
type Json = dict[str, object]


class HttpResponse(Protocol):
    """The httpx-shaped response surface this adapter relies on.

    Both ``httpx.Response`` and the test fake satisfy it: ``raise_for_status``
    turns a non-2xx into an exception (so a failed ``thread.create`` aborts
    before ``thread.turn.start`` ever fires) and ``json`` parses the body.
    """

    def raise_for_status(self) -> object: ...

    def json(self) -> Json: ...


class HttpClient(Protocol):
    """Minimal injected transport: a JSON ``post`` and a ``get``, both taking
    explicit headers. Deliberately a strict subset of ``httpx.Client`` so the
    real client passes one straight through and tests pass a recorder."""

    def post(self, url: str, json: Json, headers: dict[str, str]) -> HttpResponse: ...

    def get(self, url: str, headers: dict[str, str]) -> HttpResponse: ...


class T3Client:
    """Dispatch/snapshot adapter for one in-cluster T3 instance.

    ``base_url`` is the T3 service root (a trailing slash is tolerated);
    ``http`` is the injected transport; ``bearer_provider`` returns the current
    ``orchestration:operate`` token, re-read per request for hourly rotation.
    """

    def __init__(
        self,
        base_url: str,
        http: HttpClient,
        bearer_provider: Callable[[], str],
    ) -> None:
        self._base_url = base_url.rstrip("/")
        self._http = http
        self._bearer_provider = bearer_provider

    # ----------------------------------------------------------------- #
    # Public API (the ``t3_client.T3Client`` contract).
    # ----------------------------------------------------------------- #

    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        """Spawn one worker thread for ``issue`` of ``repo`` and return its id.

        Two POSTs to ``/api/orchestration/dispatch``: ``thread.create`` (selects
        the ``claudeAgent`` instance, ``full-access`` runtime) yields the thread
        id; ``thread.turn.start`` then delivers ``ISSUE_IMPLEMENTER_PREAMBLE +
        prompt`` to that thread. A failed create raises and short-circuits the
        turn (we never fire a turn at a thread that wasn't created).
        """
        create_resp = self._post(
            _DISPATCH_PATH,
            {
                "command": "thread.create",
                "repo": repo,
                "issue": issue,
                "modelSelection": {"instanceId": _INSTANCE_ID},
                "runtimeMode": _RUNTIME_MODE,
            },
        )
        thread_id = self._thread_id_of(create_resp.json())

        self._post(
            _DISPATCH_PATH,
            {
                "command": "thread.turn.start",
                "threadId": thread_id,
                "message": {"text": ISSUE_IMPLEMENTER_PREAMBLE + prompt},
            },
        )
        return thread_id

    def snapshot(self) -> Json:
        """Return the parsed fleet read-model from ``/api/orchestration/snapshot``."""
        return self._get(_SNAPSHOT_PATH).json()

    # ----------------------------------------------------------------- #
    # Internals.
    # ----------------------------------------------------------------- #

    def _post(self, path: str, body: Json) -> HttpResponse:
        resp = self._http.post(self._url(path), json=body, headers=self._headers())
        resp.raise_for_status()
        return resp

    def _get(self, path: str) -> HttpResponse:
        resp = self._http.get(self._url(path), headers=self._headers())
        resp.raise_for_status()
        return resp

    def _url(self, path: str) -> str:
        return f"{self._base_url}{path}"

    def _headers(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self._bearer_provider()}"}

    @staticmethod
    def _thread_id_of(create_response: Json) -> str:
        """Extract the new thread id from a ``thread.create`` reply.

        T3 returns it as ``threadId``; we fail loudly on a malformed reply rather
        than dispatch a turn at an empty/None id.
        """
        thread_id = create_response.get("threadId")
        if not isinstance(thread_id, str) or not thread_id:
            raise ValueError(
                f"thread.create response missing a usable threadId: {create_response!r}"
            )
        return thread_id
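
The two-POST dispatch and the per-request bearer re-read can be exercised without a network. This is a rough sketch under stated assumptions, not the adapter's real test fake: `FakeResponse`, `RecordingHttp`, `rotating_bearer` and the free function `dispatch` are hypothetical stand-ins, the `t3.example` URL and the `t-1` thread id are invented, and the message text is a placeholder for `ISSUE_IMPLEMENTER_PREAMBLE + prompt`.

```python
from collections.abc import Callable
from itertools import count


class FakeResponse:
    """Minimal httpx-shaped reply: always 2xx, fixed JSON body."""

    def __init__(self, body: dict[str, object]) -> None:
        self._body = body

    def raise_for_status(self) -> None:
        return None

    def json(self) -> dict[str, object]:
        return self._body


class RecordingHttp:
    """In-memory transport that records each POST and replies with one thread id."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, object], dict[str, str]]] = []

    def post(self, url, json, headers):
        self.calls.append((url, json, headers))
        return FakeResponse({"threadId": "t-1"})


def rotating_bearer() -> Callable[[], str]:
    """Hypothetical provider: a different token on every call."""
    tokens = count(1)
    return lambda: f"token-{next(tokens)}"


def dispatch(http, bearer, base_url: str, repo: str, issue: int, text: str) -> str:
    """Condensed stand-in for T3Client.dispatch: create, then start the turn."""
    url = f"{base_url.rstrip('/')}/api/orchestration/dispatch"

    def post(body):
        # The bearer is read afresh for every request, never cached.
        headers = {"Authorization": f"Bearer {bearer()}"}
        resp = http.post(url, json=body, headers=headers)
        resp.raise_for_status()
        return resp

    created = post({
        "command": "thread.create",
        "repo": repo,
        "issue": issue,
        "modelSelection": {"instanceId": "claudeAgent"},
        "runtimeMode": "full-access",
    })
    thread_id = str(created.json()["threadId"])
    post({
        "command": "thread.turn.start",
        "threadId": thread_id,
        "message": {"text": text},
    })
    return thread_id


http = RecordingHttp()
thread_id = dispatch(http, rotating_bearer(), "http://t3.example/", "infra", 7, "preamble + prompt")
print(thread_id, [body["command"] for _, body, _ in http.calls])
```

The recorder sees `thread.create` before `thread.turn.start`, each carrying a different bearer, which is the behaviour the docstring attributes to hourly token rotation.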

app/afk/tracker.py

@ -0,0 +1,243 @@
"""Issue-tracker adapter — the loop's read/write port onto GitHub issues.

``Tracker`` is the only place the AFK loop touches the issue tracker. It wraps an
injected ``GitHubClient`` (the port) so the policy/state-machine code and the
tests never depend on a real ``gh`` or the network: production injects
``GhCliClient`` (shells out to ``gh`` with no-shell argv); tests inject a fake.

The split is deliberate. The ``GitHubClient`` port speaks only in *primitives*
(list raw issues for a label, fetch a single issue's label events, and the four
mutations). All the loop-specific *decisions* live on ``Tracker``:

* ``labeled_by_trusted``: decided **fail-closed** from the actor who made the
  most-recent application of the ready label. On private repos only
  collaborators can label, so the label *is* the authorization (design doc,
  "Trigger & dispatch predicate"); an unattributable label is never trusted.
* ``blocked_by``: the issue numbers in the body's "Blocked by #N" clauses
  (the per-issue dependency the design doc gates dispatch on).
* ``priority``: read off a ``priority:<n>`` label, lowest wins (lower runs
  first, matching ``Issue.priority`` semantics in ``types``).

Keeping the decisions here, not in the client, is what lets the whole read path
be tested against a thin fake. Mutations (``add_label`` / ``remove_label`` /
``comment`` / ``close``) are pass-throughs the loop drives during a run.
"""

import json
import re
from collections.abc import Callable
from subprocess import PIPE, run
from typing import Protocol, runtime_checkable

from .types import Issue

# Trusted author associations: GitHub tags each issue event actor with their
# association to the repo. Only these may arm an issue for the AFK loop; this is
# the trust gate from the design doc. Overridable per Tracker for a tighter policy.
DEFAULT_TRUSTED_ASSOCIATIONS: frozenset[str] = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})

# Default gating label; mirrors Config.ready_label so a Tracker built without an
# explicit override matches the production default.
DEFAULT_READY_LABEL = "ready-for-agent"

# "Blocked by #3, #4 and #10" → [3, 4, 10]. We match a "blocked by" lead-in
# (case-insensitive) and then harvest every "#<n>" in the clause that follows,
# up to the next line break, so a bare "#7 for context" elsewhere is ignored.
_BLOCKED_BY_CLAUSE = re.compile(r"blocked\s+by\b([^\n\r]*)", re.IGNORECASE)
_ISSUE_REF = re.compile(r"#(\d+)")

# "priority:2" → 2. Anything non-numeric (e.g. "priority:high") is not a numeric
# priority and is skipped.
_PRIORITY_LABEL = re.compile(r"^priority:(\d+)$")


@runtime_checkable
class GitHubClient(Protocol):
    """The primitive surface ``Tracker`` depends on: one issue tracker, faked
    in tests. Implementations must not embed loop policy; they only fetch raw
    data and perform the four mutations.

    ``list_issues`` returns the ``gh issue list --json number,labels,body`` shape
    (``labels`` is a list of ``{"name": ...}``; ``body`` may be ``None``).
    ``label_events`` returns the ``labeled`` timeline events for one issue, each
    with ``label.name``, ``actor.login`` and ``author_association``.
    """

    def list_issues(self, repo: str, label: str) -> list[dict]: ...

    def label_events(self, repo: str, number: int) -> list[dict]: ...

    def add_label(self, repo: str, number: int, label: str) -> None: ...

    def remove_label(self, repo: str, number: int, label: str) -> None: ...

    def comment(self, repo: str, number: int, body: str) -> None: ...

    def close(self, repo: str, number: int) -> None: ...


class Tracker:
    """Adapter that turns raw issue-tracker data into ``Issue`` records and
    relays mutations, over an injected :class:`GitHubClient`."""

    def __init__(
        self,
        client: GitHubClient,
        ready_label: str = DEFAULT_READY_LABEL,
        trusted_associations: frozenset[str] = DEFAULT_TRUSTED_ASSOCIATIONS,
    ) -> None:
        self.client = client
        self.ready_label = ready_label
        self.trusted_associations = trusted_associations

    # ----------------------------------------------------------------- reads #

    def list_ready(self, repos: list[str]) -> list[Issue]:
        """Every ready-labeled open issue across ``repos``, as ``Issue`` records.

        Ordering follows the client's per-repo order; dispatch ordering by
        priority is the dispatch policy's job, not the tracker's.
        """
        issues: list[Issue] = []
        for repo in repos:
            for raw in self.client.list_issues(repo, self.ready_label):
                issues.append(self._to_issue(repo, raw))
        return issues

    def _to_issue(self, repo: str, raw: dict) -> Issue:
        number = int(raw["number"])
        labels = [lbl["name"] for lbl in raw.get("labels", [])]
        return Issue(
            number=number,
            repo=repo,
            labels=labels,
            blocked_by=_parse_blocked_by(raw.get("body")),
            labeled_by_trusted=self._is_labeled_by_trusted(repo, number),
            priority=_parse_priority(labels),
        )

    def _is_labeled_by_trusted(self, repo: str, number: int) -> bool:
        """True iff the MOST RECENT application of the ready label was made by a
        trusted actor. Fail-closed: no attributable application means not
        trusted."""
        last_association: str | None = None
        for event in self.client.label_events(repo, number):
            if event.get("event") != "labeled":
                continue
            if (event.get("label") or {}).get("name") != self.ready_label:
                continue
            last_association = event.get("author_association")
        return last_association in self.trusted_associations

    # ------------------------------------------------------------- mutations #

    def add_label(self, repo: str, issue: int, label: str) -> None:
        self.client.add_label(repo, issue, label)

    def remove_label(self, repo: str, issue: int, label: str) -> None:
        self.client.remove_label(repo, issue, label)

    def comment(self, repo: str, issue: int, body: str) -> None:
        self.client.comment(repo, issue, body)

    def close(self, repo: str, issue: int) -> None:
        self.client.close(repo, issue)


# --------------------------------------------------------------------------- #
# Parsing helpers: pure functions, no I/O.
# --------------------------------------------------------------------------- #


def _parse_blocked_by(body: str | None) -> list[int]:
    """Issue numbers referenced in the body's "Blocked by #N" clauses.

    Order-preserving and de-duplicated; bare "#N" mentions outside a "blocked by"
    clause are ignored. A missing/empty body yields ``[]``.
    """
    if not body:
        return []
    seen: dict[int, None] = {}  # insertion-ordered set
    for clause in _BLOCKED_BY_CLAUSE.findall(body):
        for ref in _ISSUE_REF.findall(clause):
            seen.setdefault(int(ref), None)
    return list(seen)


def _parse_priority(labels: list[str]) -> int:
    """Numeric priority from a ``priority:<n>`` label, lowest wins; 0 if none."""
    priorities = [
        int(match.group(1))
        for label in labels
        if (match := _PRIORITY_LABEL.match(label))
    ]
    return min(priorities) if priorities else 0


# --------------------------------------------------------------------------- #
# Concrete client: shells out to `gh`. Injected `run` keeps it testable.
# --------------------------------------------------------------------------- #


def _default_run(argv: list[str]) -> str:
    """Run ``argv`` with no shell and return stdout (text). Raises on non-zero.

    List argv (never a shell string), matching the no-injection-surface pattern
    the breakglass/main subprocess helpers use: the repo/label/body values are
    never interpreted by a shell.
    """
    proc = run(argv, stdout=PIPE, stderr=PIPE, text=True, check=False)
    if proc.returncode != 0:
        raise RuntimeError(f"{argv[0]} failed ({proc.returncode}): {proc.stderr[:200]}")
    return proc.stdout


class GhCliClient:
    """:class:`GitHubClient` backed by the ``gh`` CLI.

    ``repo_owner`` is the GitHub owner/org the sub-project repos live under, so a
    bare repo name (``"infra"``) becomes the ``--repo owner/infra`` slug ``gh``
    wants. ``run`` is the subprocess runner (defaults to the real no-shell one);
    tests inject a fake to capture argv without spawning ``gh``.
    """

    def __init__(self, repo_owner: str, run: Callable[[list[str]], str] = _default_run) -> None:
        self.repo_owner = repo_owner
        self._run = run

    def _slug(self, repo: str) -> str:
        return f"{self.repo_owner}/{repo}"

    def list_issues(self, repo: str, label: str) -> list[dict]:
        out = self._run([
            "gh", "issue", "list", "--repo", self._slug(repo),
            "--label", label, "--state", "open",
            "--json", "number,labels,body", "--limit", "100",
        ])
        return _loads_list(out)

    def label_events(self, repo: str, number: int) -> list[dict]:
        out = self._run([
            "gh", "api",
            f"repos/{self._slug(repo)}/issues/{number}/timeline",
            "--paginate",
            "-H", "Accept: application/vnd.github+json",
        ])
        events = _loads_list(out)
        return [e for e in events if e.get("event") == "labeled"]

    def add_label(self, repo: str, number: int, label: str) -> None:
        self._run([
            "gh", "issue", "edit", str(number), "--repo", self._slug(repo),
            "--add-label", label,
        ])

    def remove_label(self, repo: str, number: int, label: str) -> None:
        self._run([
            "gh", "issue", "edit", str(number), "--repo", self._slug(repo),
            "--remove-label", label,
        ])

    def comment(self, repo: str, number: int, body: str) -> None:
        self._run([
            "gh", "issue", "comment", str(number), "--repo", self._slug(repo),
            "--body", body,
        ])

    def close(self, repo: str, number: int) -> None:
        self._run(["gh", "issue", "close", str(number), "--repo", self._slug(repo)])


def _loads_list(out: str) -> list[dict]:
    """Parse ``gh`` JSON stdout into a list of dicts. Empty stdout → ``[]``."""
    text = out.strip()
    if not text:
        return []
    return json.loads(text)
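
The two read-path decisions that are easiest to get wrong, blocked-by parsing and the fail-closed trust gate, can be checked on small inputs. The sketch below copies the regexes and the default trusted set from the module but is otherwise a standalone illustration: the function names are local, and the issue body and event dicts are invented sample data, not real `gh` output.

```python
import re

BLOCKED_BY_CLAUSE = re.compile(r"blocked\s+by\b([^\n\r]*)", re.IGNORECASE)
ISSUE_REF = re.compile(r"#(\d+)")
TRUSTED = frozenset({"OWNER", "MEMBER", "COLLABORATOR"})


def parse_blocked_by(body):
    """Issue numbers inside "blocked by" clauses, ordered and de-duplicated."""
    if not body:
        return []
    seen = {}  # dict used as an insertion-ordered set
    for clause in BLOCKED_BY_CLAUSE.findall(body):
        for ref in ISSUE_REF.findall(clause):
            seen.setdefault(int(ref), None)
    return list(seen)


def labeled_by_trusted(events, ready_label="ready-for-agent"):
    """Fail-closed: only the LAST ready-label application counts."""
    last = None
    for event in events:
        if event.get("event") != "labeled":
            continue
        if (event.get("label") or {}).get("name") != ready_label:
            continue
        last = event.get("author_association")
    return last in TRUSTED


# Invented sample body: one clause, one unrelated mention, one repeat.
body = "Blocked by #3, #4 and #10\nSee #7 for context.\nblocked by #4"
blocked = parse_blocked_by(body)

# Invented sample events: a trusted application later overridden by an outsider.
events = [
    {"event": "labeled", "label": {"name": "ready-for-agent"}, "author_association": "OWNER"},
    {"event": "labeled", "label": {"name": "ready-for-agent"}, "author_association": "NONE"},
]
print(blocked, labeled_by_trusted(events[:1]), labeled_by_trusted(events))
```

The unrelated `#7` is ignored and the repeated `#4` collapses, giving `[3, 4, 10]`. The trust check returns `True` for the owner's application alone and `False` once a later untrusted application exists, and an empty event list is also untrusted.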

app/afk/types.py

@ -0,0 +1,134 @@
"""Shared types for the AFK loop — the contract every module builds against.
Stdlib only (``dataclasses`` + ``enum``), matching the breakglass code: no
pydantic, modern ``X | None`` unions, precise field types. Every other module in
``app.afk`` imports its inputs/outputs from here so the pieces stay aligned; the
module-level docstrings in ``__init__`` list which functions consume which type.
Nothing here has behaviour these are pure data carriers and closed enums. Keep
it that way: logic lives in ``dispatch_policy`` / ``run_state_machine`` / the
client modules, never on the dataclasses.
"""
from dataclasses import dataclass
from enum import Enum
# --------------------------------------------------------------------------- #
# Enums — closed vocabularies the state machine and clients speak in.
# --------------------------------------------------------------------------- #
class ThreadStatus(Enum):
"""Liveness of a T3 thread, as projected from the orchestration snapshot.
``RUNNING`` the agent is still working the turn; ``IDLE`` the turn
finished cleanly (it has gone quiet); ``ERROR`` the thread/turn failed.
"""
RUNNING = "running"
IDLE = "idle"
ERROR = "error"
class CIStatus(Enum):
"""CI verdict for a pushed commit. ``PENDING`` covers both "no run yet" and
"in progress" the state machine waits on either."""
PENDING = "pending"
GREEN = "green"
RED = "red"
class Phase(Enum):
"""Where a single issue's run is in its lifecycle. Ordered: each phase is a
gate the run passes through on the way to ``DONE``. ``phase_checklist``
renders these; the loop advances through them as evidence arrives."""
WORKTREE = "worktree" # isolated workspace created
TESTS_RED = "tests_red" # failing test written first (TDD red)
GREEN = "green" # implementation makes tests pass (TDD green)
PUSHED = "pushed" # commit(s) pushed to master
CI = "ci" # CI pipeline running on the pushed commit
DEPLOYED = "deployed" # deploy/rollout reached the cluster
DONE = "done" # verified complete; issue can be closed
class Action(Enum):
"""The decision ``run_state_machine.next_action`` returns for one tick.
``WAIT`` nothing to do yet, poll again; ``CLOSE_SUCCESS`` run is green,
CI passed, close the issue; ``ESCALATE_PREPUSH`` the agent errored/stalled
before pushing anything, hand back to a human; ``FIX_FORWARD`` CI went red
on a pushed commit, dispatch another corrective turn; ``FREEZE_ESCALATE``
fix-forward budget exhausted (attempts or wall-clock), stop and escalate.
"""
WAIT = "wait"
CLOSE_SUCCESS = "close_success"
ESCALATE_PREPUSH = "escalate_prepush"
FIX_FORWARD = "fix_forward"
FREEZE_ESCALATE = "freeze_escalate"
# --------------------------------------------------------------------------- #
# Data carriers.
# --------------------------------------------------------------------------- #
@dataclass
class Issue:
"""A tracker issue the loop might dispatch.
``labeled_by_trusted`` records whether the gating label was applied by a
trusted identity the loop must never dispatch an issue made ready by an
untrusted actor (prompt-injection / drive-by). ``blocked_by`` lists issue
numbers that must close first; ``priority`` orders the ready set (lower runs
first, matching tracker conventions).
"""
number: int
repo: str
labels: list[str]
blocked_by: list[int]
labeled_by_trusted: bool
priority: int
@dataclass
class DispatchDecision:
"""An issue the dispatch policy selected to run now, with a human-readable
``reason`` (logged + surfaced in notifications, never parsed)."""
issue: Issue
reason: str


@dataclass
class Config:
    """Loop configuration. DISABLED BY DEFAULT: ``kill_switch=True`` and an
    empty ``allowlist`` mean a freshly-constructed Config dispatches nothing.
    Enabling is a deliberate manual step (see ``config.from_env`` /
    ``from_configmap``).
    """

    allowlist: list[str]
    kill_switch: bool
    in_progress_label: str = "agent-in-progress"
    ready_label: str = "ready-for-agent"
    budget_usd: float = 100.0
    fix_forward_max_attempts: int = 5
    fix_forward_max_seconds: int = 3600


@dataclass
class RunState:
    """Everything the state machine needs to decide one issue's next move.

    Assembled each tick from the orchestration snapshot (``thread_status``), the
    CI watcher (``ci_status``), and the loop's own bookkeeping (``pushed``,
    ``fix_forward_attempts``, ``elapsed_seconds``). ``thread_status`` /
    ``ci_status`` are ``None`` when not yet known (no snapshot entry / nothing
    pushed to check yet).
    """

    thread_status: ThreadStatus | None
    ci_status: CIStatus | None
    pushed: bool
    fix_forward_attempts: int
    elapsed_seconds: float
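
The `run_state_machine` module is not part of this hunk, so the following is only a sketch of how `next_action` might map a `RunState` to an `Action`, read off the `Action` docstring above. The names `next_action_sketch` and `Budget`, the `PENDING` and `RED` members of `CIStatus`, and the rule that an idle thread with nothing pushed counts as "stalled" are all assumptions made for illustration, not the shipped policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ThreadStatus(Enum):
    RUNNING = "running"
    IDLE = "idle"
    ERROR = "error"


class CIStatus(Enum):
    PENDING = "pending"  # assumed member; only GREEN appears in the diff
    GREEN = "green"
    RED = "red"          # assumed member


class Action(Enum):
    WAIT = "wait"
    CLOSE_SUCCESS = "close_success"
    ESCALATE_PREPUSH = "escalate_prepush"
    FIX_FORWARD = "fix_forward"
    FREEZE_ESCALATE = "freeze_escalate"


@dataclass
class Budget:
    """Hypothetical stand-in for the two fix-forward fields of ``Config``."""
    fix_forward_max_attempts: int = 5
    fix_forward_max_seconds: int = 3600


@dataclass
class RunState:
    thread_status: Optional[ThreadStatus]
    ci_status: Optional[CIStatus]
    pushed: bool
    fix_forward_attempts: int
    elapsed_seconds: float


def next_action_sketch(state: RunState, budget: Budget) -> Action:
    """Pure decision function: no I/O, one Action per tick."""
    if not state.pushed:
        # Assumption: ERROR, or IDLE with nothing pushed, means errored/stalled.
        if state.thread_status in (ThreadStatus.ERROR, ThreadStatus.IDLE):
            return Action.ESCALATE_PREPUSH
        return Action.WAIT  # RUNNING, or status unknown (None): poll again
    if state.ci_status is CIStatus.GREEN:
        return Action.CLOSE_SUCCESS
    if state.ci_status is CIStatus.RED:
        exhausted = (
            state.fix_forward_attempts >= budget.fix_forward_max_attempts
            or state.elapsed_seconds >= budget.fix_forward_max_seconds
        )
        return Action.FREEZE_ESCALATE if exhausted else Action.FIX_FORWARD
    return Action.WAIT  # CI pending or not yet known
```

Because the function takes plain data and returns an enum member, every branch can be unit-tested without any adapter or fake.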

app/afk/watcher.py Normal file

@ -0,0 +1,342 @@
"""CronJob entrypoint: drive ONE in-flight AFK run by a single tick.

The watcher is the *second half* of the loop: the part that drives a run the
poller already started through to a terminal state. Given one in-flight run
(``InFlightRun``: the issue, the T3 thread to poll, the pushed commit if any,
and the fix-forward bookkeeping), one ``tick``:

1. **assemble a ``RunState``** from the live edges + the run's bookkeeping:

   * ``thread_status`` from ``t3_client.snapshot()``, by finding this run's
     thread and mapping T3's ``running``/``idle``/``error`` to a
     ``ThreadStatus`` (missing thread, or any unrecognised status, folds to
     ``None`` -> "no status yet" -> the state machine WAITs; we never escalate
     or close on a status we don't understand);
   * ``ci_status`` <- ``ci_watcher.status(repo, commit)`` *only* when a commit
     is pushed (no commit -> nothing to check -> ``None``);
   * ``pushed`` / ``fix_forward_attempts`` / ``elapsed_seconds`` straight
     from the run.

2. **decide** via the pure ``run_state_machine.next_action`` (it owns the
   lifecycle policy; the watcher owns only the I/O the decision implies).

3. **act** on the returned ``Action``:

   * ``CLOSE_SUCCESS`` -> ``tracker.close`` + drop the in-progress label +
     DONE checklist + ``done`` doorbell. The run landed.
   * ``ESCALATE_PREPUSH`` / ``FREEZE_ESCALATE`` -> drop the in-progress label,
     add the ``ready-for-human`` label, post the checklist, ring the
     ``needs-human`` / ``frozen`` doorbell. The run is handed to a human; the
     issue is left OPEN (not closed) with the work in place.
   * ``FIX_FORWARD`` -> dispatch a corrective turn (``t3_client.dispatch``),
     bump the fix-forward attempt count, refresh the checklist, and keep the
     run in flight (NOT terminal: no label churn, and no doorbell, since the
     notifier only speaks terminal kinds). The new thread id rides back on the
     result so the next tick polls the corrective turn.
   * ``WAIT`` -> just refresh the progress checklist and keep waiting.

Every adapter (T3, tracker, CI, notifier) is injected behind a structural
Protocol, so production wires the real clients and the tests wire the in-memory
fakes; this module opens no socket and reads no message bodies. (The pilot keeps
T3 ``state.sqlite`` message-body reads out of the core loop: snapshot status +
CI status are all the state machine needs, so this watcher never execs into the
pod; that observability nicety is a separate, optional concern.)

DISABLED BY DEFAULT applies transitively: the poller never starts a run while
the loop is off (``config.kill_switch`` / empty allowlist; see ``config.py``),
so with the shipped defaults there is never an ``InFlightRun`` to tick.
"""

from dataclasses import dataclass
from typing import Protocol
from . import phase_checklist, run_state_machine
from .notifier import KIND_DONE, KIND_FROZEN, KIND_NEEDS_HUMAN
from .poller import T3Port as _DispatchPort # dispatch(repo, issue, prompt) -> id
from .types import Action, CIStatus, Config, Issue, Phase, RunState, ThreadStatus

# T3 snapshot status string -> ThreadStatus. Anything not in here (a status T3
# adds later, or a malformed entry) maps to None ("no usable status yet"), so
# the state machine waits rather than acting on something it can't interpret.
_THREAD_STATUS_BY_STRING: dict[str, ThreadStatus] = {
    "running": ThreadStatus.RUNNING,
    "idle": ThreadStatus.IDLE,
    "error": ThreadStatus.ERROR,
}

# Action -> the terminal doorbell kind to ring. Only the terminal actions appear;
# WAIT / FIX_FORWARD are non-terminal and ring nothing (the notifier rejects a
# non-terminal kind on purpose; see ``notifier.TERMINAL_KINDS``).
_TERMINAL_KIND_BY_ACTION: dict[Action, str] = {
    Action.CLOSE_SUCCESS: KIND_DONE,
    Action.ESCALATE_PREPUSH: KIND_NEEDS_HUMAN,
    Action.FREEZE_ESCALATE: KIND_FROZEN,
}

# Default label applied when a run is handed back to a human. Mirrors the
# tracker's ``ready-for-agent`` convention; overridable per-Watcher.
DEFAULT_READY_FOR_HUMAN_LABEL = "ready-for-human"
# --------------------------------------------------------------------------- #
# Injected adapter Protocols — structural, so the real clients and the test
# fakes both satisfy them with no subclassing. Only the methods the watcher
# actually calls appear. ``DispatchPort`` is reused from ``poller``.
# --------------------------------------------------------------------------- #
class SnapshotPort(_DispatchPort, Protocol):
    """T3 surface the watcher needs: ``dispatch`` (for the corrective turn) plus
    ``snapshot`` (for thread liveness)."""

    def snapshot(self) -> dict: ...


class TrackerPort(Protocol):
    """The slice of ``tracker.Tracker`` the watch tick needs."""

    def add_label(self, repo: str, issue: int, label: str) -> None: ...

    def remove_label(self, repo: str, issue: int, label: str) -> None: ...

    def comment(self, repo: str, issue: int, body: str) -> None: ...

    def close(self, repo: str, issue: int) -> None: ...


class CIPort(Protocol):
    """The slice of ``ci_watcher.CIWatcher`` the watch tick needs."""

    def status(self, repo: str, commit: str) -> CIStatus: ...


class NotifierPort(Protocol):
    """The slice of ``notifier.Notifier`` the watch tick needs."""

    def notify(
        self, kind: str, issue: Issue, thread_id: str | None, detail: str
    ) -> None: ...


@dataclass
class InFlightRun:
    """One run the watcher is driving, as the loop tracks it between ticks.

    ``thread_id`` is the T3 thread to poll this tick; ``commit`` is the pushed
    commit CI watches (``None`` until the agent has pushed). ``fix_forward_attempts``
    and ``elapsed_seconds`` are the loop's own bookkeeping, fed straight into the
    assembled ``RunState``; ``pushed`` is derived as ``commit is not None``.
    """

    issue: Issue
    thread_id: str
    commit: str | None
    fix_forward_attempts: int = 0
    elapsed_seconds: float = 0.0


@dataclass
class TickResult:
    """The outcome of one watch tick.

    ``action`` is the state machine's verdict; ``terminal`` is True iff the run
    reached an end state (closed or handed to a human) and should no longer be
    ticked. ``thread_id`` / ``fix_forward_attempts`` carry the (possibly updated)
    bookkeeping the caller threads into the next ``InFlightRun``; they change
    only on a FIX_FORWARD (new corrective thread, incremented attempts) and are
    otherwise echoed back unchanged.
    """

    action: Action
    terminal: bool
    thread_id: str
    fix_forward_attempts: int


class Watcher:
    """Drives one in-flight run per ``tick`` over injected adapters.

    The three escalation-vs-success decisions live in the pure
    ``run_state_machine``; this class only performs the I/O each decision
    implies. ``ready_for_human_label`` is the label stamped on a run handed back
    to a human (default :data:`DEFAULT_READY_FOR_HUMAN_LABEL`).
    """

    def __init__(
        self,
        t3_client: SnapshotPort,
        tracker: TrackerPort,
        ci_watcher: CIPort,
        notifier: NotifierPort,
        ready_for_human_label: str = DEFAULT_READY_FOR_HUMAN_LABEL,
    ) -> None:
        self._t3 = t3_client
        self._tracker = tracker
        self._ci = ci_watcher
        self._notifier = notifier
        self._ready_for_human_label = ready_for_human_label

    def tick(self, run: InFlightRun, config: Config) -> TickResult:
        """Drive ``run`` one step (see module docstring)."""
        state = self._assemble_state(run)
        action = run_state_machine.next_action(state, config)
        if action is Action.CLOSE_SUCCESS:
            return self._close_success(run, config)
        if action in (Action.ESCALATE_PREPUSH, Action.FREEZE_ESCALATE):
            return self._escalate(run, state, action, config)
        if action is Action.FIX_FORWARD:
            return self._fix_forward(run, state)
        # WAIT: still in flight, so just show progress and poll again next tick.
        return self._wait(run, state, action)

    # ----------------------------------------------------------------- #
    # RunState assembly.
    # ----------------------------------------------------------------- #
    def _assemble_state(self, run: InFlightRun) -> RunState:
        thread_status = self._thread_status(run.thread_id)
        # Only fold CI when there's a commit to check: an unpushed run has no
        # pipeline, and we must not query CI (the assertion in the tests, and
        # avoiding a needless API call, both rely on this).
        ci_status = (
            self._ci.status(run.issue.repo, run.commit)
            if run.commit is not None
            else None
        )
        return RunState(
            thread_status=thread_status,
            ci_status=ci_status,
            pushed=run.commit is not None,
            fix_forward_attempts=run.fix_forward_attempts,
            elapsed_seconds=run.elapsed_seconds,
        )

    def _thread_status(self, thread_id: str) -> ThreadStatus | None:
        """This thread's liveness from the fleet snapshot, or ``None`` when the
        thread is absent or its status string is one we don't recognise."""
        for thread in self._t3.snapshot().get("threads", []):
            if thread.get("id") == thread_id:
                return _THREAD_STATUS_BY_STRING.get(thread.get("status"))
        return None

    # ----------------------------------------------------------------- #
    # Per-action handlers.
    # ----------------------------------------------------------------- #
    def _close_success(self, run: InFlightRun, config: Config) -> TickResult:
        """Landed: close the issue, drop the lock, post DONE, ring the doorbell."""
        self._post_checklist(run, Phase.DONE)
        self._tracker.remove_label(
            run.issue.repo, run.issue.number, config.in_progress_label
        )
        self._tracker.close(run.issue.repo, run.issue.number)
        self._notify(run, Action.CLOSE_SUCCESS, "Run landed: pushed and CI green.")
        return _terminal(Action.CLOSE_SUCCESS, run)

    def _escalate(
        self, run: InFlightRun, state: RunState, action: Action, config: Config
    ) -> TickResult:
        """Hand back to a human: drop the lock, add ready-for-human, post the
        checklist, ring the matching doorbell. The issue stays OPEN."""
        self._post_checklist(run, _phase_for(state))
        self._tracker.remove_label(
            run.issue.repo, run.issue.number, config.in_progress_label
        )
        self._tracker.add_label(
            run.issue.repo, run.issue.number, self._ready_for_human_label
        )
        self._notify(run, action, _escalation_detail(action, state))
        return _terminal(action, run)

    def _fix_forward(self, run: InFlightRun, state: RunState) -> TickResult:
        """CI red with budget left: dispatch a corrective turn and stay in flight.

        Not terminal: no doorbell (the notifier only speaks terminal kinds) and
        no label churn (the in-progress lock stays put). The corrective dispatch
        spawns a fresh thread; its id and the incremented attempt count ride back
        so the next tick tracks the right thread.
        """
        attempts = run.fix_forward_attempts + 1
        new_thread_id = self._t3.dispatch(
            run.issue.repo, run.issue.number, _fix_forward_prompt(run)
        )
        self._post_checklist(run, Phase.CI, fix_forward_attempts=attempts)
        return TickResult(
            action=Action.FIX_FORWARD,
            terminal=False,
            thread_id=new_thread_id,
            fix_forward_attempts=attempts,
        )

    def _wait(self, run: InFlightRun, state: RunState, action: Action) -> TickResult:
        """Still working: refresh the progress checklist, change nothing else."""
        self._post_checklist(run, _phase_for(state))
        return TickResult(
            action=action,
            terminal=False,
            thread_id=run.thread_id,
            fix_forward_attempts=run.fix_forward_attempts,
        )

    # ----------------------------------------------------------------- #
    # I/O helpers.
    # ----------------------------------------------------------------- #
    def _post_checklist(
        self, run: InFlightRun, phase: Phase, *, fix_forward_attempts: int | None = None
    ) -> None:
        # Fall back to the run's own count unless the caller passes a fresh one.
        attempts = (
            run.fix_forward_attempts
            if fix_forward_attempts is None
            else fix_forward_attempts
        )
        body = phase_checklist.render(
            phase,
            {
                "repo": run.issue.repo,
                "issue": run.issue.number,
                "thread_id": run.thread_id,
                "fix_forward_attempts": attempts,
            },
        )
        self._tracker.comment(run.issue.repo, run.issue.number, body)

    def _notify(self, run: InFlightRun, action: Action, detail: str) -> None:
        self._notifier.notify(
            _TERMINAL_KIND_BY_ACTION[action], run.issue, run.thread_id, detail
        )


# --------------------------------------------------------------------------- #
# Pure helpers.
# --------------------------------------------------------------------------- #
def _terminal(action: Action, run: InFlightRun) -> TickResult:
    """A terminal :class:`TickResult` echoing the run's bookkeeping unchanged."""
    return TickResult(
        action=action,
        terminal=True,
        thread_id=run.thread_id,
        fix_forward_attempts=run.fix_forward_attempts,
    )


def _phase_for(state: RunState) -> Phase:
    """Best-effort current lifecycle phase from the evidence in ``state``.

    The checklist is decoration only (the loop reads no agent message bodies), so
    this maps the observable signals (pushed? CI verdict?) onto the closest
    phase: nothing pushed -> still working toward the implementation (GREEN);
    pushed -> the CI phase is where attention sits until it goes green. A green CI
    is rendered as DONE by the close path, not here.
    """
    if not state.pushed:
        return Phase.GREEN
    return Phase.CI


def _escalation_detail(action: Action, state: RunState) -> str:
    """Human-readable escalation reason for the doorbell + logs (never parsed)."""
    if action is Action.ESCALATE_PREPUSH:
        status = state.thread_status.value if state.thread_status else "unknown"
        return (
            "Agent stalled or errored before pushing any commit "
            f"(thread {status}). "
            "Handed back for a human."
        )
    return (
        "Fix-forward budget exhausted with CI still red "
        f"({state.fix_forward_attempts} attempts, {state.elapsed_seconds:.0f}s). "
        "Frozen for a human."
    )


def _fix_forward_prompt(run: InFlightRun) -> str:
    """The corrective-turn prompt: point the agent at the red CI on its commit."""
    return (
        f"CI is RED on your pushed commit {run.commit} for issue #{run.issue.number} "
        f"in `{run.issue.repo}`. Investigate the failing run, fix the cause, and "
        "push the fix to master. Then watch CI again until it is green."
    )