afk: wire the T3 adapter to the REAL orchestration contract + fix priority

The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:

- dispatch sends BARE commands keyed by `type` (not a `command` string), with
  client-minted threadId/commandId/messageId + createdAt; the server replies
  {sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
  in), so dispatch ensures the repo's project (snapshot -> project.create iff
  absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
  retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
  running/in_progress/pending->running, errored->error), not a non-existent
  top-level `status` field.
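
The first and last bullets could be sketched roughly as follows. This is an illustration under stated assumptions, not the adapter's actual code: the field names `type`, `threadId`, `commandId`, `messageId`, `createdAt` and `latestTurn.state` come from the message above, while the function names, the `text` payload field, the UUID ids, and the ISO 8601 timestamp are hypothetical.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

# State mapping as listed in the commit message; any other value is "unknown".
_LIVENESS = {
    "completed": "idle",
    "running": "running",
    "in_progress": "running",
    "pending": "running",
    "errored": "error",
}


def build_turn_start(thread_id: str, text: str) -> dict:
    """Hypothetical builder for a bare ``thread.turn.start`` command."""
    return {
        "type": "thread.turn.start",  # discriminator key, not a `command` string
        "threadId": thread_id,
        "commandId": str(uuid.uuid4()),  # assumption: ids are client-minted UUIDs
        "messageId": str(uuid.uuid4()),
        "createdAt": datetime.now(timezone.utc).isoformat(),  # assumption: ISO 8601
        "text": text,  # assumption: illustrative payload field name
    }


def thread_liveness(snapshot: dict, thread_id: str) -> Optional[str]:
    """Return "idle", "running" or "error"; None if the state is missing or unknown."""
    for thread in snapshot.get("threads", []):
        if thread.get("id") == thread_id:
            # A freshly created thread may have no latestTurn at all.
            state = (thread.get("latestTurn") or {}).get("state")
            return _LIVENESS.get(state)
    return None
```

Because the server only replies with `{sequence}`, a caller following this sketch would keep the `commandId` and `threadId` it minted itself instead of expecting them back in the response.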

Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".
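
A minimal sketch of such a guard, assuming a fake with a single `dispatch` method. The class name, the exception type, and the error text are hypothetical; only the rule itself (no `type`, no acceptance) and the `{sequence}` reply come from the message above.

```python
class FakeT3Dispatch:
    """Hypothetical stand-in for the test fake's dispatch endpoint."""

    def __init__(self) -> None:
        self.commands = []  # every accepted command, in arrival order

    def dispatch(self, command: dict) -> dict:
        # Reject anything without a non-empty string `type`, so a guessed
        # wire shape fails loudly instead of being silently accepted.
        kind = command.get("type")
        if not isinstance(kind, str) or not kind:
            raise ValueError("400: command is missing a `type` discriminator")
        self.commands.append(command)
        # Reply with a sequence number only, never an echoed id.
        return {"sequence": len(self.commands)}
```

With this in place, a command shaped like `{"command": "thread.create"}` raises instead of passing, which is the failure the original suite never surfaced.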

Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.
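
The lower-value-first ordering can be illustrated as below. The `Issue` dataclass here is a simplified, hypothetical stand-in for the real one, and breaking ties by issue number is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical, simplified issue record."""

    number: int
    priority: int  # 0 means P0 (most urgent), 1 means P1, and so on


def order_for_dispatch(issues):
    """Sort so the lowest priority value is dispatched first."""
    # Assumption: ties are broken by issue number, oldest first.
    return sorted(issues, key=lambda issue: (issue.priority, issue.number))
```

For example, issues with priorities 1, 0, 0 and numbers 3, 9, 4 would come out in the order 4, 9, 3.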

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-15 22:27:00 +00:00
parent 2ef0db9a96
commit e34640cc47
8 changed files with 555 additions and 272 deletions

@@ -62,8 +62,21 @@ def _run(
     )
 
 
+# Map the tests' abstract liveness vocab to T3's REAL ``latestTurn.state`` strings
+# so call sites stay readable while the snapshot carries the true shape the
+# watcher parses (a finished turn is "completed", a failed one "errored",
+# "running" is itself real). Unknown values pass through verbatim.
+_REAL_STATE = {"idle": "completed", "error": "errored"}
+
+
 def _snapshot(thread_id: str, status: str) -> dict:
-    return {"threads": [{"id": thread_id, "status": status}]}
+    """A fleet snapshot with one thread whose latest turn is in ``status`` — real
+    shape ``threads[].latestTurn.state`` (not a top-level ``status`` field)."""
+    return {
+        "threads": [
+            {"id": thread_id, "latestTurn": {"state": _REAL_STATE.get(status, status)}}
+        ]
+    }
 
 
 def _labels(fake_tracker):
@@ -173,7 +186,7 @@ def test_close_success_posts_done_checklist(
 # --------------------------------------------------------------------------- #
 # ESCALATE_PREPUSH — agent stalled/errored before any push: hand to a human.
 # --------------------------------------------------------------------------- #
-@pytest.mark.parametrize("thread_state", ["error", "idle"])
+@pytest.mark.parametrize("thread_state", ["errored", "completed"])
 def test_escalate_prepush_relabels_and_notifies(
     fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, thread_state
 ):
@@ -292,6 +305,47 @@ def test_unknown_thread_status_waits(
     assert fake_notifier.sent == []
 
 
+# --------------------------------------------------------------------------- #
+# Real T3 ``latestTurn.state`` strings map to the right liveness (contract guard
+# against the snapshot-shape drift that the previous adapter/fake masked).
+# --------------------------------------------------------------------------- #
+@pytest.mark.parametrize("state", ["running", "in_progress", "pending", "queued", "pendingInit"])
+def test_real_in_progress_states_keep_waiting(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, state
+):
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": state}}]})
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+    assert result.action.value == "wait"  # still working -> keep polling
+
+
+def test_real_errored_state_escalates_when_nothing_pushed(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    # The real failure state is "errored" (not "error"); with nothing pushed it
+    # is a pre-push escalation, not a freeze.
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": "errored"}}]})
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+    assert result.action.value == "escalate_prepush"
+
+
+def test_thread_present_but_no_turn_yet_waits(
+    fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
+):
+    # A freshly-created thread has no latestTurn -> no usable status yet -> WAIT.
+    issue = make_issue(number=7, repo="infra")
+    fake_t3.set_snapshot({"threads": [{"id": "thread-0"}]})
+    result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
+        _run(issue, commit=None), make_config()
+    )
+    assert result.action.value == "wait"
+
+
 # --------------------------------------------------------------------------- #
 # Terminal cleanup only happens once / cleanly: a terminal tick posts exactly
 # one checklist comment (no double-commenting on the way out).