afk: add the autonomous issue-implementer loop (SHIPS DISABLED)

Adds app/afk/ — the "away-from-keyboard" control plane that watches the
issue tracker for ready-for-agent issues, dispatches each to a fresh
full-access T3 thread (with the issue-implementer preamble prepended,
because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting
run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed,
escalating or fix-forwarding via a small pure state machine.

The loop is split into pure cores (no I/O, exhaustively unit-tested) and
thin injected adapters (the only edges that ever touch T3, the tracker,
CI, or Slack — faked in every test, so nothing here talks to a real
server, GitHub/Forgejo, or the cluster):

  pure:     types, dispatch_policy, run_state_machine, phase_checklist,
            config, issue_implementer_prompt
  adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher,
            notifier
  loops:    poller — CronJob tick #1: list_ready -> select_dispatchable
              -> dispatch + stamp the in-progress lock (label only
              AFTER a successful dispatch, so a failed dispatch
              never leaves a phantom lock). Per-repo lock derived
              from the ready set, since the CronJob is stateless
              between ticks.
            watcher — CronJob tick #2: assemble RunState from snapshot +
              CI -> next_action -> act (close on success; relabel
              ready-for-human + ring the doorbell on the two
              escalations; dispatch a corrective turn on
              fix-forward; refresh the progress checklist).

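The poller's "label only AFTER a successful dispatch" rule can be sketched as follows. This is a minimal illustration, not the code in app/afk/poller: `FakeTracker`, `mark_in_progress`, and `poll_once` are hypothetical names, and the `select_dispatchable` policy step is left out because its rules are not spelled out here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTracker:
    """In-memory stand-in for the tracker adapter (hypothetical shape)."""

    ready: list[tuple[str, int]]
    locked: list[tuple[str, int]] = field(default_factory=list)

    def list_ready(self) -> list[tuple[str, int]]:
        return list(self.ready)

    def mark_in_progress(self, repo: str, issue: int) -> None:
        self.locked.append((repo, issue))


def poll_once(tracker, dispatch) -> list[str]:
    """One stateless tick: dispatch each ready issue, then stamp the lock."""
    thread_ids = []
    for repo, issue in tracker.list_ready():
        try:
            thread_id = dispatch(repo, issue)
        except Exception:
            # Dispatch failed: leave the issue unlabelled so a later tick
            # can retry it. No phantom lock is written.
            continue
        # The lock label is written only once dispatch has returned an id.
        tracker.mark_in_progress(repo, issue)
        thread_ids.append(thread_id)
    return thread_ids
```

Because the label is the only state that survives between ticks, writing it after the dispatch means a crash or a failed POST leaves the issue in the ready set rather than stuck as in-progress.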
SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an
empty allowlist, so a freshly-loaded config dispatches nothing and does
zero I/O. The package is not imported by the running service and has no
auto-enable path. Arming it is a deliberate, later, manual step requiring
BOTH gates (clear the kill switch AND enrol the exact repos) so one
fat-fingered env var can't arm every repo.
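The two-gate arming rule might look roughly like this. It is a sketch under assumptions: `kill_switch=True` as the default comes from the text above, while the `allowlist` field name, the `frozenset` type, and the `may_dispatch` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical shape; only the kill_switch default is from the commit."""

    kill_switch: bool = True  # ships disabled
    allowlist: frozenset[str] = frozenset()  # ships with no repos enrolled


def may_dispatch(config: Config, repo: str) -> bool:
    """Armed only when the kill switch is cleared AND the repo is enrolled."""
    return (not config.kill_switch) and repo in config.allowlist
```

With this shape, clearing the kill switch alone arms nothing, and enrolling a repo alone arms nothing; both changes have to be made deliberately.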

Test-first throughout: 412 tests pass (poller + watcher add integration
tests wiring the real pure cores to in-memory fakes). mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 171857da6b
commit 2ef0db9a96
23 changed files with 4717 additions and 0 deletions
159 app/afk/t3_client.py Normal file
@@ -0,0 +1,159 @@
"""Adapter for the in-cluster T3 Code instance — the AFK executor + cockpit.

The control plane keeps the brain; T3 runs the agent. This module is the thin
wire between them: it turns "implement issue N of repo R with this prompt" into
the TWO HTTP commands T3's orchestration API needs, and reads the fleet
snapshot the watcher polls. It owns no AFK behaviour — the agent's standing
rules ride in as the ``ISSUE_IMPLEMENTER_PREAMBLE`` prepended to the turn
message, because T3's full-access ``claudeAgent`` runtime does NOT honour
``~/.claude/CLAUDE.md`` (see ``issue_implementer_prompt``).

Two operations, both against the dedicated in-cluster T3 pod:

* ``dispatch(repo, issue, prompt) -> thread_id`` — POSTs ``thread.create``
  then ``thread.turn.start`` to ``/api/orchestration/dispatch``. The create
  command selects the ``claudeAgent`` instance in ``full-access`` runtime mode
  and returns a thread id; the turn command targets that thread and delivers
  ``ISSUE_IMPLEMENTER_PREAMBLE + prompt`` as ``message.text``. One dispatch =
  one worktree-isolated worker.
* ``snapshot() -> dict`` — GETs ``/api/orchestration/snapshot``, the full fleet
  read-model. T3 has no outbound webhooks, so the watcher polls this for
  per-thread ``running``/``idle``/``error`` status.

The HTTP transport and the bearer provider are **injected** (constructor
args), so the production wiring hands in an ``httpx.Client`` plus a Vault-backed
token reader, while tests hand in an in-memory fake — nothing here ever opens a
socket on its own. The bearer is re-read from the provider on **every** request
because T3's ``orchestration:operate`` token expires hourly and is refreshed out
of band.
"""
from collections.abc import Callable
from typing import Protocol

from .issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE

# Orchestration API paths, relative to the configured base URL.
_DISPATCH_PATH = "/api/orchestration/dispatch"
_SNAPSHOT_PATH = "/api/orchestration/snapshot"

# Pilot-baked dispatch envelope: which backend instance runs the thread and in
# which runtime mode. Constants (not config) — every AFK thread is identical.
_INSTANCE_ID = "claudeAgent"
_RUNTIME_MODE = "full-access"

# JSON shapes. Command bodies and the snapshot read-model are open string-keyed
# objects; ``object`` values keep us honest without a bare ``Any``.
type Json = dict[str, object]


class HttpResponse(Protocol):
    """The httpx-shaped response surface this adapter relies on.

    Both ``httpx.Response`` and the test fake satisfy it: ``raise_for_status``
    turns a non-2xx into an exception (so a failed ``thread.create`` aborts
    before ``thread.turn.start`` ever fires) and ``json`` parses the body.
    """

    def raise_for_status(self) -> object: ...

    def json(self) -> Json: ...


class HttpClient(Protocol):
    """Minimal injected transport: a JSON ``post`` and a ``get``, both taking
    explicit headers. Deliberately a strict subset of ``httpx.Client`` so the
    real client passes one straight through and tests pass a recorder."""

    def post(self, url: str, json: Json, headers: dict[str, str]) -> HttpResponse: ...

    def get(self, url: str, headers: dict[str, str]) -> HttpResponse: ...


class T3Client:
    """Dispatch/snapshot adapter for one in-cluster T3 instance.

    ``base_url`` is the T3 service root (a trailing slash is tolerated);
    ``http`` is the injected transport; ``bearer_provider`` returns the current
    ``orchestration:operate`` token, re-read per request for hourly rotation.
    """

    def __init__(
        self,
        base_url: str,
        http: HttpClient,
        bearer_provider: Callable[[], str],
    ) -> None:
        self._base_url = base_url.rstrip("/")
        self._http = http
        self._bearer_provider = bearer_provider

    # ----------------------------------------------------------------- #
    # Public API (the ``t3_client.T3Client`` contract).
    # ----------------------------------------------------------------- #
    def dispatch(self, repo: str, issue: int, prompt: str) -> str:
        """Spawn one worker thread for ``issue`` of ``repo`` and return its id.

        Two POSTs to ``/api/orchestration/dispatch``: ``thread.create`` (selects
        the ``claudeAgent`` instance, ``full-access`` runtime) yields the thread
        id; ``thread.turn.start`` then delivers ``ISSUE_IMPLEMENTER_PREAMBLE +
        prompt`` to that thread. A failed create raises and short-circuits the
        turn (we never fire a turn at a thread that wasn't created).
        """
        create_resp = self._post(
            _DISPATCH_PATH,
            {
                "command": "thread.create",
                "repo": repo,
                "issue": issue,
                "modelSelection": {"instanceId": _INSTANCE_ID},
                "runtimeMode": _RUNTIME_MODE,
            },
        )
        thread_id = self._thread_id_of(create_resp.json())

        self._post(
            _DISPATCH_PATH,
            {
                "command": "thread.turn.start",
                "threadId": thread_id,
                "message": {"text": ISSUE_IMPLEMENTER_PREAMBLE + prompt},
            },
        )
        return thread_id

    def snapshot(self) -> Json:
        """Return the parsed fleet read-model from ``/api/orchestration/snapshot``."""
        return self._get(_SNAPSHOT_PATH).json()

    # ----------------------------------------------------------------- #
    # Internals.
    # ----------------------------------------------------------------- #
    def _post(self, path: str, body: Json) -> HttpResponse:
        resp = self._http.post(self._url(path), json=body, headers=self._headers())
        resp.raise_for_status()
        return resp

    def _get(self, path: str) -> HttpResponse:
        resp = self._http.get(self._url(path), headers=self._headers())
        resp.raise_for_status()
        return resp

    def _url(self, path: str) -> str:
        return f"{self._base_url}{path}"

    def _headers(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self._bearer_provider()}"}

    @staticmethod
    def _thread_id_of(create_response: Json) -> str:
        """Extract the new thread id from a ``thread.create`` reply.

        T3 returns it as ``threadId``; we fail loudly on a malformed reply rather
        than dispatch a turn at an empty/None id.
        """
        thread_id = create_response.get("threadId")
        if not isinstance(thread_id, str) or not thread_id:
            raise ValueError(
                f"thread.create response missing a usable threadId: {create_response!r}"
            )
        return thread_id
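For readers who want to exercise the adapter without a T3 pod, the injected transport could be faked along these lines. This is a sketch, not the fake used in the repo's test suite: `FakeResponse`, `RecordingHttp`, and `rotating_bearer` are hypothetical names, chosen only to satisfy the `HttpResponse` and `HttpClient` protocols in this file.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Satisfies HttpResponse: raise_for_status() plus json()."""

    body: dict[str, object]
    status: int = 200

    def raise_for_status(self) -> object:
        if not 200 <= self.status < 300:
            raise RuntimeError(f"HTTP {self.status}")
        return self

    def json(self) -> dict[str, object]:
        return self.body


@dataclass
class RecordingHttp:
    """Satisfies HttpClient: records each call, replays queued replies in order."""

    replies: list[FakeResponse]
    calls: list[tuple] = field(default_factory=list)

    def post(self, url, json, headers):
        self.calls.append(("POST", url, json, headers))
        return self.replies.pop(0)

    def get(self, url, headers):
        self.calls.append(("GET", url, None, headers))
        return self.replies.pop(0)


def rotating_bearer(tokens: list[str]):
    """Return a provider that hands out the next token on every call."""
    remaining = iter(tokens)
    return lambda: next(remaining)
```

Wired up as `T3Client("http://t3/", RecordingHttp([...]), rotating_bearer(["a", "b"]))`, a single `dispatch` call should record two POSTs to `http://t3/api/orchestration/dispatch`, the first carrying `Bearer a` and the second `Bearer b`, since the adapter strips the trailing slash and re-reads the bearer on every request. Queuing a `FakeResponse({}, status=500)` as the first reply should make `dispatch` raise before the second POST is recorded.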