claude-agent-service/app/afk/t3_client.py

160 lines
6.5 KiB
Python
Raw Normal View History

afk: add the autonomous issue-implementer loop (SHIPS DISABLED) Adds app/afk/ — the "away-from-keyboard" control plane that watches the issue tracker for ready-for-agent issues, dispatches each to a fresh full-access T3 thread (with the issue-implementer preamble prepended, because T3 does not honour ~/.claude/CLAUDE.md), and drives the resulting run through its lifecycle: tests-red -> green -> pushed -> CI -> deployed, escalating or fix-forwarding via a small pure state machine. The loop is split into pure cores (no I/O, exhaustively unit-tested) and thin injected adapters (the only edges that ever touch T3, the tracker, CI, or Slack — faked in every test, so nothing here talks to a real server, GitHub/Forgejo, or the cluster): pure: types, dispatch_policy, run_state_machine, phase_checklist, config, issue_implementer_prompt adapters: t3_client (two-POST dispatch + snapshot), tracker, ci_watcher, notifier loops: poller — CronJob tick #1: list_ready -> select_dispatchable -> dispatch + stamp the in-progress lock (label only AFTER a successful dispatch, so a failed dispatch never leaves a phantom lock). Per-repo lock derived from the ready set, since the CronJob is stateless between ticks. watcher — CronJob tick #2: assemble RunState from snapshot + CI -> next_action -> act (close on success; relabel ready-for-human + ring the doorbell on the two escalations; dispatch a corrective turn on fix-forward; refresh the progress checklist). SHIPS DISABLED, on purpose: Config defaults to kill_switch=True AND an empty allowlist, so a freshly-loaded config dispatches nothing and does zero I/O. The package is not imported by the running service and has no auto-enable path. Arming it is a deliberate, later, manual step requiring BOTH gates (clear the kill switch AND enrol the exact repos) so one fat-fingered env var can't arm every repo. Test-first throughout: 412 tests pass (poller + watcher add integration tests wiring the real pure cores to in-memory fakes). mypy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:15:11 +00:00
"""Adapter for the in-cluster T3 Code instance — the AFK executor + cockpit.
The control plane keeps the brain; T3 runs the agent. This module is the thin
wire between them: it turns "implement issue N of repo R with this prompt" into
the TWO HTTP commands T3's orchestration API needs, and reads the fleet
snapshot the watcher polls. It owns no AFK behaviour the agent's standing
rules ride in as the ``ISSUE_IMPLEMENTER_PREAMBLE`` prepended to the turn
message, because T3's full-access ``claudeAgent`` runtime does NOT honour
``~/.claude/CLAUDE.md`` (see ``issue_implementer_prompt``).
Two operations, both against the dedicated in-cluster T3 pod:
* ``dispatch(repo, issue, prompt) -> thread_id`` POSTs ``thread.create``
then ``thread.turn.start`` to ``/api/orchestration/dispatch``. The create
command selects the ``claudeAgent`` instance in ``full-access`` runtime mode
and returns a thread id; the turn command targets that thread and delivers
``ISSUE_IMPLEMENTER_PREAMBLE + prompt`` as ``message.text``. One dispatch =
one worktree-isolated worker.
* ``snapshot() -> dict`` GETs ``/api/orchestration/snapshot``, the full fleet
read-model. T3 has no outbound webhooks, so the watcher polls this for
per-thread ``running``/``idle``/``error`` status.
The HTTP transport and the bearer provider are **injected** (constructor
args), so the production wiring hands in an ``httpx.Client`` plus a Vault-backed
token reader, while tests hand in an in-memory fake nothing here ever opens a
socket on its own. The bearer is re-read from the provider on **every** request
because T3's ``orchestration:operate`` token expires hourly and is refreshed out
of band.
"""
from collections.abc import Callable
from typing import Protocol
from .issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE
# Orchestration API paths, relative to the configured base URL.
_DISPATCH_PATH = "/api/orchestration/dispatch"
_SNAPSHOT_PATH = "/api/orchestration/snapshot"
# Pilot-baked dispatch envelope: which backend instance runs the thread and in
# which runtime mode. Constants (not config) — every AFK thread is identical.
_INSTANCE_ID = "claudeAgent"
_RUNTIME_MODE = "full-access"
# JSON shapes. Command bodies and the snapshot read-model are open string-keyed
# objects; ``object`` values keep us honest without a bare ``Any``.
type Json = dict[str, object]
class HttpResponse(Protocol):
"""The httpx-shaped response surface this adapter relies on.
Both ``httpx.Response`` and the test fake satisfy it: ``raise_for_status``
turns a non-2xx into an exception (so a failed ``thread.create`` aborts
before ``thread.turn.start`` ever fires) and ``json`` parses the body.
"""
def raise_for_status(self) -> object: ...
def json(self) -> Json: ...
class HttpClient(Protocol):
"""Minimal injected transport: a JSON ``post`` and a ``get``, both taking
explicit headers. Deliberately a strict subset of ``httpx.Client`` so the
real client passes one straight through and tests pass a recorder."""
def post(self, url: str, json: Json, headers: dict[str, str]) -> HttpResponse: ...
def get(self, url: str, headers: dict[str, str]) -> HttpResponse: ...
class T3Client:
"""Dispatch/snapshot adapter for one in-cluster T3 instance.
``base_url`` is the T3 service root (a trailing slash is tolerated);
``http`` is the injected transport; ``bearer_provider`` returns the current
``orchestration:operate`` token, re-read per request for hourly rotation.
"""
def __init__(
self,
base_url: str,
http: HttpClient,
bearer_provider: Callable[[], str],
) -> None:
self._base_url = base_url.rstrip("/")
self._http = http
self._bearer_provider = bearer_provider
# ----------------------------------------------------------------- #
# Public API (the ``t3_client.T3Client`` contract).
# ----------------------------------------------------------------- #
def dispatch(self, repo: str, issue: int, prompt: str) -> str:
"""Spawn one worker thread for ``issue`` of ``repo`` and return its id.
Two POSTs to ``/api/orchestration/dispatch``: ``thread.create`` (selects
the ``claudeAgent`` instance, ``full-access`` runtime) yields the thread
id; ``thread.turn.start`` then delivers ``ISSUE_IMPLEMENTER_PREAMBLE +
prompt`` to that thread. A failed create raises and short-circuits the
turn (we never fire a turn at a thread that wasn't created).
"""
create_resp = self._post(
_DISPATCH_PATH,
{
"command": "thread.create",
"repo": repo,
"issue": issue,
"modelSelection": {"instanceId": _INSTANCE_ID},
"runtimeMode": _RUNTIME_MODE,
},
)
thread_id = self._thread_id_of(create_resp.json())
self._post(
_DISPATCH_PATH,
{
"command": "thread.turn.start",
"threadId": thread_id,
"message": {"text": ISSUE_IMPLEMENTER_PREAMBLE + prompt},
},
)
return thread_id
def snapshot(self) -> Json:
"""Return the parsed fleet read-model from ``/api/orchestration/snapshot``."""
return self._get(_SNAPSHOT_PATH).json()
# ----------------------------------------------------------------- #
# Internals.
# ----------------------------------------------------------------- #
def _post(self, path: str, body: Json) -> HttpResponse:
resp = self._http.post(self._url(path), json=body, headers=self._headers())
resp.raise_for_status()
return resp
def _get(self, path: str) -> HttpResponse:
resp = self._http.get(self._url(path), headers=self._headers())
resp.raise_for_status()
return resp
def _url(self, path: str) -> str:
return f"{self._base_url}{path}"
def _headers(self) -> dict[str, str]:
return {"Authorization": f"Bearer {self._bearer_provider()}"}
@staticmethod
def _thread_id_of(create_response: Json) -> str:
"""Extract the new thread id from a ``thread.create`` reply.
T3 returns it as ``threadId``; we fail loudly on a malformed reply rather
than dispatch a turn at an empty/None id.
"""
thread_id = create_response.get("threadId")
if not isinstance(thread_id, str) or not thread_id:
raise ValueError(
f"thread.create response missing a usable threadId: {create_response!r}"
)
return thread_id