afk: wire the T3 adapter to the REAL orchestration contract + fix priority
Some checks failed
Build and Push / lint-and-test (push) Has been cancelled
Build and Push / build (push) Has been cancelled
Build and Push / deploy (push) Has been cancelled
Build and Push / notify-failure (push) Has been cancelled

The T3 dispatch adapter was written against a guessed wire shape that the test
fake accepted but the live t3-afk server 400s — so the previously-green suite did
NOT mean the loop was actually wired to T3. Reverse-engineered the real contract
from the v0.0.27 binary, verified it live against t3-afk (including multi-turn),
and rewrote the adapter to match:

- dispatch sends BARE commands keyed by `type` (not a `command` string), with
  client-minted threadId/commandId/messageId + createdAt; the server replies
  {sequence}, so dispatch returns the id it generated (never one parsed back).
- a thread lives in a project (workspaceRoot = the repo checkout the agent runs
  in), so dispatch ensures the repo's project (snapshot -> project.create iff
  absent) before thread.create + thread.turn.start.
- add send_turn() for follow-up turns on an existing thread — multi-turn context
  retention is verified live (turn 2 recalled turn 1).
- watcher reads thread liveness from latestTurn.state (completed->idle,
  running/in_progress/pending->running, errored->error), not a non-existent
  top-level `status` field.

Guard against recurrence: the test fake now REJECTS any command lacking a `type`
discriminator (the original bug fails loudly), plus an opt-in live smoke test
(tests/test_afk_t3_live.py) so "green" can mean "wired to T3".

Also align dispatch_policy to lower-priority-value-first (P0 before P1), matching
tracker conventions and Issue.priority's own docstring — it had deliberately
diverged to higher-first. Loop still ships DISABLED (kill switch on, empty
allowlist). 416 tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-15 22:27:00 +00:00
parent 2ef0db9a96
commit e34640cc47
8 changed files with 555 additions and 272 deletions

View file

@ -7,11 +7,10 @@ touches a real T3 server, tracker, or cluster. The suite walks the full
dispatchability matrix trust gate, allowlist, per-repo lock, blocked_by,
kill switch plus the priority ordering and the one-agent-per-repo invariant.
Ordering contract under test: **higher ``priority`` first** (per the AFK module
spec), with a deterministic tiebreaker so the output is stable regardless of
input order. NOTE: ``Issue.priority``'s own docstring says "lower runs first";
this module follows the explicit dispatch-policy spec instead see the module
docstring in ``dispatch_policy.py``.
Ordering contract under test: **lower ``priority`` value first** (P0 before P1
before P2 most urgent wins), matching tracker conventions and
``Issue.priority``'s own docstring, with a deterministic tiebreaker (ascending
issue number) so the output is stable regardless of input order.
"""
import itertools
@ -183,13 +182,13 @@ def test_all_repos_in_flight_dispatches_nothing(make_issue, make_config):
# same repo, even when both are eligible and the repo is not yet in-flight.
# --------------------------------------------------------------------------- #
def test_at_most_one_decision_per_repo(make_issue, make_config):
lo = make_issue(number=1, repo="infra", priority=1)
hi = make_issue(number=2, repo="infra", priority=9)
urgent = make_issue(number=1, repo="infra", priority=1)
minor = make_issue(number=2, repo="infra", priority=9)
decisions = dispatch_policy.select_dispatchable(
[lo, hi], make_config(allowlist=["infra"]), set()
[urgent, minor], make_config(allowlist=["infra"]), set()
)
assert len(decisions) == 1
assert decisions[0].issue.number == 2 # the higher-priority one wins the slot
assert decisions[0].issue.number == 1 # most urgent (lowest value) wins the slot
def test_one_decision_per_repo_across_many_repos(make_issue, make_config):
@ -202,20 +201,21 @@ def test_one_decision_per_repo_across_many_repos(make_issue, make_config):
decisions = dispatch_policy.select_dispatchable(
issues, make_config(allowlist=["infra", "realestate-crawler"]), set()
)
# One per repo, each the repo's highest-priority eligible issue.
assert _selected_set(decisions) == {11, 20}
# One per repo, each the repo's most urgent (lowest-value) eligible issue:
# infra -> #10 (p1 < p5); realestate-crawler -> #21 (p2 < p3).
assert _selected_set(decisions) == {10, 21}
repos = [d.issue.repo for d in decisions]
assert len(repos) == len(set(repos)) # no repo appears twice
def test_ineligible_higher_priority_does_not_consume_repo_slot(make_issue, make_config):
"""A higher-priority issue that is itself ineligible (e.g. blocked) must not
suppress a lower-priority *eligible* issue in the same repo the slot goes
to the best ELIGIBLE candidate, not merely the highest-priority one."""
blocked_hi = make_issue(number=1, repo="infra", priority=9, blocked_by=[99])
ready_lo = make_issue(number=2, repo="infra", priority=1)
"""A more-urgent issue that is itself ineligible (e.g. blocked) must not
suppress a less-urgent *eligible* issue in the same repo the slot goes to
the best ELIGIBLE candidate, not merely the most urgent one."""
blocked_urgent = make_issue(number=1, repo="infra", priority=1, blocked_by=[99])
ready_minor = make_issue(number=2, repo="infra", priority=9)
decisions = dispatch_policy.select_dispatchable(
[blocked_hi, ready_lo], make_config(allowlist=["infra"]), set()
[blocked_urgent, ready_minor], make_config(allowlist=["infra"]), set()
)
assert _selected_numbers(decisions) == [2]
@ -249,18 +249,18 @@ def test_blocked_filters_only_blocked(make_issue, make_config):
# --------------------------------------------------------------------------- #
# Priority ordering — higher priority first, deterministic tiebreaker.
# Priority ordering — lower priority value first, deterministic tiebreaker.
# --------------------------------------------------------------------------- #
def test_higher_priority_first(make_issue, make_config):
lo = make_issue(number=1, repo="infra", priority=1)
mid = make_issue(number=2, repo="realestate-crawler", priority=5)
hi = make_issue(number=3, repo="SparkyFitness", priority=9)
def test_lower_priority_value_first(make_issue, make_config):
p1 = make_issue(number=1, repo="infra", priority=1)
p5 = make_issue(number=2, repo="realestate-crawler", priority=5)
p9 = make_issue(number=3, repo="SparkyFitness", priority=9)
decisions = dispatch_policy.select_dispatchable(
[lo, hi, mid],
[p1, p9, p5],
make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
set(),
)
assert _selected_numbers(decisions) == [3, 2, 1] # 9, 5, 1
assert _selected_numbers(decisions) == [1, 2, 3] # priorities 1, 5, 9
def test_ordering_independent_of_input_order(make_issue, make_config):
@ -274,7 +274,7 @@ def test_ordering_independent_of_input_order(make_issue, make_config):
]
allow = ["infra", "realestate-crawler", "SparkyFitness", "health"]
config = make_config(allowlist=allow)
expected = [20, 30, 10, 40] # priorities 8,5,2,1
expected = [40, 10, 30, 20] # priorities 1,2,5,8 (most urgent first)
for perm in itertools.permutations(base):
issues = [make_issue(number=n, repo=r, priority=p) for (r, n, p) in perm]
@ -305,7 +305,7 @@ def test_negative_and_zero_priorities_order_correctly(make_issue, make_config):
make_config(allowlist=["infra", "realestate-crawler", "SparkyFitness"]),
set(),
)
assert _selected_numbers(decisions) == [3, 2, 1] # 3 > 0 > -5
assert _selected_numbers(decisions) == [1, 2, 3] # -5 < 0 < 3 (most urgent first)
# --------------------------------------------------------------------------- #

View file

@ -176,14 +176,15 @@ def test_custom_in_progress_label_drives_the_lock(fake_tracker, fake_t3, make_is
# --------------------------------------------------------------------------- #
# One dispatch per repo per tick (the policy's one-agent-per-repo invariant,
# observed through the poller): highest-priority eligible issue wins the slot.
# observed through the poller): the most urgent (lowest-value) eligible issue
# wins the slot.
# --------------------------------------------------------------------------- #
def test_one_dispatch_per_repo_per_tick(fake_tracker, fake_t3, make_issue):
fake_tracker.seed(
"infra",
[
make_issue(number=1, repo="infra", priority=1),
make_issue(number=2, repo="infra", priority=9), # highest priority
make_issue(number=1, repo="infra", priority=1), # most urgent (lowest value)
make_issue(number=2, repo="infra", priority=9),
make_issue(number=3, repo="infra", priority=5),
],
)
@ -191,8 +192,8 @@ def test_one_dispatch_per_repo_per_tick(fake_tracker, fake_t3, make_issue):
_poller(fake_tracker, fake_t3).run_once(config)
assert _dispatched_pairs(fake_t3) == {("infra", 2)}
assert _added_in_progress(fake_tracker) == {("infra", 2)}
assert _dispatched_pairs(fake_t3) == {("infra", 1)}
assert _added_in_progress(fake_tracker) == {("infra", 1)}
# --------------------------------------------------------------------------- #

View file

@ -1,30 +1,34 @@
"""Tests for ``app.afk.t3_client`` — the in-cluster T3 dispatch/snapshot adapter.
Everything here runs against an in-memory FAKE HTTP transport (``FakeHttp``);
no test touches a real T3 server, GitHub/Forgejo, or the cluster. The fake
records every request and replays staged responses, so the assertions pin the
wire contract the control plane depends on:
Everything runs against an in-memory FAKE HTTP transport; no test touches a real
T3 server. These assertions pin the **real** orchestration wire contract
(reverse-engineered from T3 v0.0.27 and verified live against t3-afk on
2026-06-15) deliberately strict, because the previous version of this adapter
passed a laxer fake while 400-ing the real server. The fake therefore *rejects*
a command without a ``type`` discriminator, so a regression to the old
``{"command": "..."}` shape fails loudly here.
* ``dispatch`` issues exactly TWO POSTs to ``/api/orchestration/dispatch``
``thread.create`` then ``thread.turn.start`` carrying
``modelSelection.instanceId == "claudeAgent"`` and ``runtimeMode ==
"full-access"``, with ``ISSUE_IMPLEMENTER_PREAMBLE`` PREPENDED to
``message.text`` and the thread id from the first response threaded into the
second.
* each request carries the ``Authorization: Bearer <token>`` header from the
injected bearer provider (re-read per call, so token refresh is honoured).
* ``snapshot`` GETs ``/api/orchestration/snapshot`` and returns the parsed body.
Pinned facts:
* the dispatch body is a BARE command keyed by ``type`` (not ``command``);
* the CLIENT mints ``threadId``/``commandId``/``messageId`` + ``createdAt``;
``dispatch`` returns the id it generated (the server replies ``{sequence}``);
* a thread lives in a project, so ``dispatch`` ensures the repo's project
(snapshot GET ``project.create`` iff absent) before ``thread.create``;
* ``ISSUE_IMPLEMENTER_PREAMBLE`` is prepended to the opening turn's text;
* ``send_turn`` posts a follow-up turn (no preamble) on an existing thread;
* every request carries ``Authorization: Bearer <token>``, re-read per call.
"""
import pytest
from app.afk import t3_client
from app.afk.issue_implementer_prompt import ISSUE_IMPLEMENTER_PREAMBLE
_MODEL = "claude-sonnet-4-6"
# --------------------------------------------------------------------------- #
# Fake HTTP transport — httpx-shaped (``post``/``get`` → response with
# ``.json()`` + ``.raise_for_status()``), so the real client can hand the
# adapter a plain ``httpx.Client`` while tests hand it this recorder.
# Fake HTTP transport — httpx-shaped, but it ENFORCES the command envelope so a
# malformed command (the old bug) raises instead of silently passing.
# --------------------------------------------------------------------------- #
class FakeResponse:
def __init__(self, payload: dict, status_code: int = 200) -> None:
@ -40,209 +44,222 @@ class FakeResponse:
class FakeHttp:
"""Records each POST/GET and replays queued responses in order.
"""Records each POST/GET; GETs replay staged snapshots (default: no projects,
so ``dispatch`` creates one). POST bodies are validated as real commands."""
``post`` pops from ``post_responses`` (FIFO); ``get`` pops from
``get_responses``. Each recorded call captures the url, json body, and
headers so tests can assert the two-command dispatch shape and the bearer.
"""
def __init__(
self,
post_responses: list[dict] | None = None,
get_responses: list[dict] | None = None,
) -> None:
self.post_responses = list(post_responses or [])
def __init__(self, get_responses: list[dict] | None = None) -> None:
self.get_responses = list(get_responses or [])
self.posts: list[dict] = []
self.gets: list[dict] = []
def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
assert isinstance(json.get("type"), str) and json["type"], (
f"command must carry a non-empty `type` discriminator, got {json!r}"
)
self.posts.append({"url": url, "json": json, "headers": headers})
if not self.post_responses:
raise AssertionError("unexpected POST — no response staged")
return FakeResponse(self.post_responses.pop(0))
return FakeResponse({"sequence": len(self.posts)}) # the real server reply
def get(self, url: str, headers: dict) -> FakeResponse:
self.gets.append({"url": url, "headers": headers})
if not self.get_responses:
raise AssertionError("unexpected GET — no response staged")
return FakeResponse(self.get_responses.pop(0))
body = self.get_responses.pop(0) if self.get_responses else {"projects": []}
return FakeResponse(body)
# Convenience views over recorded POSTs, keyed by command type.
def commands(self, type_: str) -> list[dict]:
return [c["json"] for c in self.posts if c["json"]["type"] == type_]
# Two thread.create / thread.turn.start replies the happy-path dispatch needs.
_CREATE_REPLY = {"threadId": "thread-abc"}
_TURN_REPLY = {"ok": True}
def _ids():
"""Deterministic id factory: id-1, id-2, … so tests can reason about minting."""
n = {"i": 0}
def f() -> str:
n["i"] += 1
return f"id-{n['i']}"
return f
def _client(http: FakeHttp, *, base_url: str = "http://t3-afk:8080", token: str = "tok-1"):
def _resolver(repo: str) -> t3_client.ProjectRef:
"""Predictable repo -> project mapping for assertions."""
return t3_client.ProjectRef(f"proj-{repo}", f"/data/{repo}", repo)
def _client(http: FakeHttp, *, base_url="http://t3-afk:8080", token="tok-1", **kw):
return t3_client.T3Client(
base_url=base_url,
http=http,
bearer_provider=lambda: token,
project_resolver=_resolver,
id_factory=kw.pop("id_factory", _ids()),
clock=kw.pop("clock", lambda: "2026-06-15T00:00:00+00:00"),
model=_MODEL,
)
def _dispatch(http: FakeHttp, **kw) -> str:
repo = kw.pop("repo", "infra")
issue = kw.pop("issue", 42)
prompt = kw.pop("prompt", "Do the thing.")
def _dispatch(http: FakeHttp, *, repo="infra", issue=42, prompt="Do the thing.", **kw):
return _client(http, **kw).dispatch(repo=repo, issue=issue, prompt=prompt)
# --------------------------------------------------------------------------- #
# dispatch — the two-POST shape.
# dispatch — ensure-project, then create, then turn.
# --------------------------------------------------------------------------- #
def test_dispatch_issues_exactly_two_posts_to_dispatch_endpoint():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
def test_dispatch_ensures_project_then_creates_thread_then_turn_when_project_absent():
http = FakeHttp(get_responses=[{"projects": []}])
_dispatch(http)
assert len(http.posts) == 2
assert http.gets == []
# one snapshot GET (the existence check) + three POSTs in order.
assert len(http.gets) == 1
types = [c["json"]["type"] for c in http.posts]
assert types == ["project.create", "thread.create", "thread.turn.start"]
for call in http.posts:
assert call["url"] == "http://t3-afk:8080/api/orchestration/dispatch"
def test_dispatch_first_command_is_thread_create():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
def test_dispatch_skips_project_create_when_project_already_exists():
http = FakeHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
_dispatch(http, repo="infra")
types = [c["json"]["type"] for c in http.posts]
assert types == ["thread.create", "thread.turn.start"] # idempotent: no re-create
def test_dispatch_uses_type_discriminator_not_command_string():
# Regression guard for the original bug: discriminator is `type`, and there is
# no legacy top-level `command` string key on any command.
http = FakeHttp()
_dispatch(http)
assert http.posts[0]["json"]["command"] == "thread.create"
def test_dispatch_second_command_is_thread_turn_start():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
_dispatch(http)
assert http.posts[1]["json"]["command"] == "thread.turn.start"
def test_dispatch_returns_thread_id_from_create_response():
http = FakeHttp(post_responses=[{"threadId": "thread-xyz"}, _TURN_REPLY])
assert _dispatch(http) == "thread-xyz"
def test_dispatch_threads_created_id_into_turn_start():
http = FakeHttp(post_responses=[{"threadId": "thread-xyz"}, _TURN_REPLY])
_dispatch(http)
# The second command must target the thread the first call created.
assert http.posts[1]["json"]["threadId"] == "thread-xyz"
for c in http.posts:
assert "type" in c["json"]
assert not isinstance(c["json"].get("command"), str)
# --------------------------------------------------------------------------- #
# dispatch — model selection / runtime envelope (the pilot-baked constants).
# dispatch — thread.create real field set.
# --------------------------------------------------------------------------- #
def test_dispatch_uses_claude_agent_instance_and_full_access_runtime():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
_dispatch(http)
create_body = http.posts[0]["json"]
assert create_body["modelSelection"]["instanceId"] == "claudeAgent"
assert create_body["runtimeMode"] == "full-access"
def test_thread_create_carries_real_required_fields():
http = FakeHttp()
_dispatch(http, repo="infra")
create = http.commands("thread.create")[0]
assert create["projectId"] == "proj-infra"
assert create["modelSelection"] == {"instanceId": "claudeAgent", "model": _MODEL}
assert create["runtimeMode"] == "full-access"
assert create["interactionMode"] == "default"
# NullOr fields are present (not omitted) — the schema requires the keys.
assert create["branch"] is None
assert create["worktreePath"] is None
# client-minted identity + timestamp.
assert isinstance(create["commandId"], str) and create["commandId"]
assert isinstance(create["threadId"], str) and create["threadId"]
assert create["createdAt"] == "2026-06-15T00:00:00+00:00"
def test_dispatch_create_carries_repo_and_issue():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
_dispatch(http, repo="claude-agent-service", issue=7)
create_body = http.posts[0]["json"]
assert create_body["repo"] == "claude-agent-service"
assert create_body["issue"] == 7
def test_dispatch_returns_client_minted_thread_id_not_a_server_value():
http = FakeHttp()
returned = _dispatch(http)
create = http.commands("thread.create")[0]
turn = http.commands("thread.turn.start")[0]
# The returned id is the one WE put on thread.create (server only sends {sequence}).
assert returned == create["threadId"] == turn["threadId"]
# --------------------------------------------------------------------------- #
# dispatch — the preamble PREPEND (behaviour injection).
# dispatch — thread.turn.start real message shape + preamble.
# --------------------------------------------------------------------------- #
def test_dispatch_prepends_issue_implementer_preamble_to_message_text():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
def test_turn_message_has_real_shape_and_prepends_preamble():
http = FakeHttp()
_dispatch(http, prompt="Implement issue 42 body here.")
text = http.posts[1]["json"]["message"]["text"]
assert text == ISSUE_IMPLEMENTER_PREAMBLE + "Implement issue 42 body here."
turn = http.commands("thread.turn.start")[0]
msg = turn["message"]
assert msg["role"] == "user"
assert isinstance(msg["messageId"], str) and msg["messageId"]
assert msg["attachments"] == []
assert msg["text"] == ISSUE_IMPLEMENTER_PREAMBLE + "Implement issue 42 body here."
assert turn["runtimeMode"] == "full-access"
assert turn["interactionMode"] == "default"
def test_dispatch_preamble_comes_strictly_before_the_prompt():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
_dispatch(http, prompt="UNIQUE-PROMPT-MARKER")
text = http.posts[1]["json"]["message"]["text"]
assert text.startswith(ISSUE_IMPLEMENTER_PREAMBLE)
assert text.index(ISSUE_IMPLEMENTER_PREAMBLE) < text.index("UNIQUE-PROMPT-MARKER")
# The raw prompt is preserved verbatim after the preamble.
assert text.endswith("UNIQUE-PROMPT-MARKER")
def test_dispatch_does_not_prepend_preamble_to_create_command():
# The preamble belongs only on the turn message, not the thread.create call.
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
def test_preamble_only_on_turn_not_on_create():
http = FakeHttp()
_dispatch(http)
assert "message" not in http.posts[0]["json"]
assert "message" not in http.commands("thread.create")[0]
# --------------------------------------------------------------------------- #
# Auth — bearer header, read from the injected provider each call.
# send_turn — follow-up turn on an existing thread (multi-turn), no preamble.
# --------------------------------------------------------------------------- #
def test_dispatch_sends_bearer_on_both_posts():
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
def test_send_turn_posts_single_turn_to_existing_thread_without_preamble():
http = FakeHttp()
_client(http).send_turn("thread-xyz", "Just this follow-up.")
assert [c["json"]["type"] for c in http.posts] == ["thread.turn.start"]
turn = http.commands("thread.turn.start")[0]
assert turn["threadId"] == "thread-xyz"
assert turn["message"]["text"] == "Just this follow-up." # verbatim, no preamble
assert http.gets == [] # no project work for a follow-up
# --------------------------------------------------------------------------- #
# Auth — bearer on every request, re-read per call.
# --------------------------------------------------------------------------- #
def test_every_request_sends_bearer():
http = FakeHttp()
_dispatch(http, token="secret-token")
for call in http.posts:
assert call["headers"]["Authorization"] == "Bearer secret-token"
for call in http.gets:
assert call["headers"]["Authorization"] == "Bearer secret-token"
def test_bearer_provider_is_called_per_request_so_refresh_is_honoured():
# A rotating provider proves the token isn't captured once at construction
# (T3's orchestration token expires hourly and must be re-read).
tokens = iter(["tok-A", "tok-B", "tok-C"])
http = FakeHttp(post_responses=[_CREATE_REPLY, _TURN_REPLY])
def test_bearer_is_reread_per_request_so_rotation_is_honoured():
tokens = iter(["tok-A", "tok-B", "tok-C", "tok-D", "tok-E"])
http = FakeHttp()
client = t3_client.T3Client(
base_url="http://t3-afk:8080",
http=http,
bearer_provider=lambda: next(tokens),
project_resolver=_resolver,
id_factory=_ids(),
clock=lambda: "t",
)
client.dispatch(repo="infra", issue=1, prompt="x")
assert http.posts[0]["headers"]["Authorization"] == "Bearer tok-A"
assert http.posts[1]["headers"]["Authorization"] == "Bearer tok-B"
# GET(ensure) then POST(project.create) then POST(create) then POST(turn) —
# each pulled a fresh token in call order.
assert http.gets[0]["headers"]["Authorization"] == "Bearer tok-A"
assert http.posts[0]["headers"]["Authorization"] == "Bearer tok-B"
assert http.posts[1]["headers"]["Authorization"] == "Bearer tok-C"
assert http.posts[2]["headers"]["Authorization"] == "Bearer tok-D"
# --------------------------------------------------------------------------- #
# snapshot — GET + parse.
# --------------------------------------------------------------------------- #
def test_snapshot_gets_snapshot_endpoint_and_returns_parsed_body():
fleet = {"threads": [{"id": "thread-abc", "status": "running"}]}
def test_snapshot_gets_endpoint_and_returns_parsed_body():
fleet = {"threads": [{"id": "t1", "latestTurn": {"state": "running"}}], "projects": []}
http = FakeHttp(get_responses=[fleet])
result = _client(http).snapshot()
assert result == fleet
assert len(http.gets) == 1
assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"
assert http.posts == []
def test_snapshot_sends_bearer():
http = FakeHttp(get_responses=[{"threads": []}])
_client(http, token="snap-token").snapshot()
assert http.gets[0]["headers"]["Authorization"] == "Bearer snap-token"
# --------------------------------------------------------------------------- #
# base_url handling — a trailing slash must not produce a double slash.
# base_url normalisation + error surfacing.
# --------------------------------------------------------------------------- #
def test_trailing_slash_in_base_url_is_normalised():
http = FakeHttp(
post_responses=[_CREATE_REPLY, _TURN_REPLY],
get_responses=[{"threads": []}],
)
http = FakeHttp()
client = _client(http, base_url="http://t3-afk:8080/")
client.dispatch(repo="infra", issue=1, prompt="x")
client.snapshot()
assert http.posts[0]["url"] == "http://t3-afk:8080/api/orchestration/dispatch"
assert http.gets[0]["url"] == "http://t3-afk:8080/api/orchestration/snapshot"
# --------------------------------------------------------------------------- #
# Error surfacing — a non-2xx response must raise, not be swallowed.
# --------------------------------------------------------------------------- #
def test_dispatch_raises_when_a_post_returns_an_error_status():
def test_dispatch_raises_and_short_circuits_when_a_post_errors():
class ErroringHttp(FakeHttp):
def post(self, url: str, json: dict, headers: dict) -> FakeResponse:
self.posts.append({"url": url, "json": json, "headers": headers})
super().post(url, json, headers) # validates + records
return FakeResponse({}, status_code=500)
http = ErroringHttp()
http = ErroringHttp(get_responses=[{"projects": [{"id": "proj-infra"}]}])
with pytest.raises(RuntimeError):
_dispatch(http)
# It failed on the FIRST call — never blindly fired thread.turn.start after
# a failed thread.create.
assert len(http.posts) == 1
_dispatch(http, repo="infra")
# Project already existed, so the FIRST post is thread.create — and it failed,
# so thread.turn.start never fired.
assert [c["json"]["type"] for c in http.posts] == ["thread.create"]

92
tests/test_afk_t3_live.py Normal file
View file

@ -0,0 +1,92 @@
"""LIVE smoke test for ``app.afk.t3_client`` against a real T3 instance.
Skipped by default. The unit tests (``test_afk_t3_client``) pin the wire shape
against a contract-accurate fake; this file proves the *same code* actually talks
to a live T3 the guard that "green tests" mean "wired to T3", which the earlier
fake-only suite did NOT provide (it was green while the real server 400'd).
It is opt-in because the orchestration API is in-cluster (ClusterIP + an
Authentik-gated ingress), so it can't run in CI without cluster access. Run it
from inside the cluster, or via a port-forward, with a bearer minted on the pod::
# bearer (on the t3-afk pod, as the node user):
# t3 auth session issue --token-only --base-dir /data/t3 --ttl 30m
kubectl -n t3-afk port-forward deploy/t3-afk 3773:3773 &
T3_AFK_BASE_URL=http://127.0.0.1:3773 T3_AFK_TOKEN=<bearer> \
python3 -m pytest tests/test_afk_t3_live.py -v
The read-only snapshot check is always safe. The full dispatch round-trip
(create thread + turn + verify it appears, then delete it) only runs with
``T3_AFK_SMOKE_DISPATCH=1`` since it spends a (tiny) agent turn.
"""
import os
import time
import pytest
from app.afk import t3_client
_BASE_URL = os.environ.get("T3_AFK_BASE_URL")
_TOKEN = os.environ.get("T3_AFK_TOKEN")
pytestmark = pytest.mark.skipif(
not (_BASE_URL and _TOKEN),
reason="set T3_AFK_BASE_URL + T3_AFK_TOKEN to run the live T3 smoke test",
)
def _real_client():
import httpx # local import so the module imports fine without httpx installed
return t3_client.T3Client(
base_url=_BASE_URL,
http=httpx.Client(timeout=30.0),
bearer_provider=lambda: _TOKEN,
)
def test_live_snapshot_has_the_real_shape():
"""A real snapshot parses and carries the keys the watcher/adapter depend on:
``threads`` + ``projects``, and any thread exposes ``latestTurn`` (the
liveness source) not a top-level ``status``."""
snap = _real_client().snapshot()
assert isinstance(snap, dict)
assert "threads" in snap and "projects" in snap
for thread in snap["threads"]:
assert "id" in thread
# liveness lives under latestTurn.state (the contract this suite guards)
assert "status" not in thread, "real threads have no top-level status field"
@pytest.mark.skipif(
os.environ.get("T3_AFK_SMOKE_DISPATCH") != "1",
reason="set T3_AFK_SMOKE_DISPATCH=1 to run the dispatch round-trip (spends a turn)",
)
def test_live_dispatch_round_trip_then_cleanup():
"""End-to-end against the real server: ``dispatch`` (ensure-project + create +
turn) succeeds and the new thread shows up in the snapshot. Cleans up the
thread it created so the cockpit isn't littered."""
import httpx
repo = "afk-smoke/roundtrip"
client = _real_client()
thread_id = client.dispatch(repo, 1, "Reply with just: ok. Do not use any tools.")
assert isinstance(thread_id, str) and thread_id
# The thread must appear in the fleet read-model (poll briefly — dispatch is
# accepted asynchronously).
found = False
for _ in range(10):
if any(t.get("id") == thread_id for t in client.snapshot().get("threads", [])):
found = True
break
time.sleep(1.0)
assert found, f"dispatched thread {thread_id} never appeared in the snapshot"
# Cleanup: delete the throwaway thread (raw command — not part of the adapter).
httpx.post(
f"{_BASE_URL.rstrip('/')}/api/orchestration/dispatch",
headers={"Authorization": f"Bearer {_TOKEN}"},
json={"type": "thread.delete", "commandId": t3_client._uuid(), "threadId": thread_id},
timeout=30.0,
).raise_for_status()

View file

@ -62,8 +62,21 @@ def _run(
)
# Map the tests' abstract liveness vocab to T3's REAL ``latestTurn.state`` strings
# so call sites stay readable while the snapshot carries the true shape the
# watcher parses (a finished turn is "completed", a failed one "errored",
# "running" is itself real). Unknown values pass through verbatim.
_REAL_STATE = {"idle": "completed", "error": "errored"}
def _snapshot(thread_id: str, status: str) -> dict:
return {"threads": [{"id": thread_id, "status": status}]}
"""A fleet snapshot with one thread whose latest turn is in ``status`` — real
shape ``threads[].latestTurn.state`` (not a top-level ``status`` field)."""
return {
"threads": [
{"id": thread_id, "latestTurn": {"state": _REAL_STATE.get(status, status)}}
]
}
def _labels(fake_tracker):
@ -173,7 +186,7 @@ def test_close_success_posts_done_checklist(
# --------------------------------------------------------------------------- #
# ESCALATE_PREPUSH — agent stalled/errored before any push: hand to a human.
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("thread_state", ["error", "idle"])
@pytest.mark.parametrize("thread_state", ["errored", "completed"])
def test_escalate_prepush_relabels_and_notifies(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, thread_state
):
@ -292,6 +305,47 @@ def test_unknown_thread_status_waits(
assert fake_notifier.sent == []
# --------------------------------------------------------------------------- #
# Real T3 ``latestTurn.state`` strings map to the right liveness (contract guard
# against the snapshot-shape drift that the previous adapter/fake masked).
# --------------------------------------------------------------------------- #
@pytest.mark.parametrize("state", ["running", "in_progress", "pending", "queued", "pendingInit"])
def test_real_in_progress_states_keep_waiting(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config, state
):
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": state}}]})
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
assert result.action.value == "wait" # still working -> keep polling
def test_real_errored_state_escalates_when_nothing_pushed(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
# The real failure state is "errored" (not "error"); with nothing pushed it
# is a pre-push escalation, not a freeze.
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot({"threads": [{"id": "thread-0", "latestTurn": {"state": "errored"}}]})
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
assert result.action.value == "escalate_prepush"
def test_thread_present_but_no_turn_yet_waits(
fake_t3, fake_tracker, fake_ci, fake_notifier, make_issue, make_config
):
# A freshly-created thread has no latestTurn -> no usable status yet -> WAIT.
issue = make_issue(number=7, repo="infra")
fake_t3.set_snapshot({"threads": [{"id": "thread-0"}]})
result = _watcher(fake_t3, fake_tracker, fake_ci, fake_notifier).tick(
_run(issue, commit=None), make_config()
)
assert result.action.value == "wait"
# --------------------------------------------------------------------------- #
# Terminal cleanup only happens once / cleanly: a terminal tick posts exactly
# one checklist comment (no double-commenting on the way out).